Fastest approach to returning a vector of elements choose 2

narnold0 · February 21, 2024, 3:31am

Hi there. I’m trying to do the following, given a vector of elements, I wish to get the vector of tuples containing the elements choose 2, with the order of the tuples obviously not mattering.

Importantly, I am trying to not use Combinatorics. The clear approach is to use two for loops, which is what I’ve been doing, but I need to do it faster. Is this the fastest way to conceivably do this? For context, this is using Float32s.

Say the length of the input vector is n. Given the index of a spot in the length n-choose-2 vector that is being returned, even inefficiently its around ~1.5ns to calculate the indexes for the two element in the input vector that should be present at that index in the output tuple.
For some actual times with for-loops, for a 500 length vector input, it takes ~815us. Here is a profile.

using LinearAlgebra
using BenchmarkTools
using GeometryBasics
using StatProfilerHTML


function emptyvec(T, n)
    
    return Vector{T}(undef, n)
    
end 

function choose2(vec::Vector{Line{2, Float32}})
    
 
    n = length(vec)    
    pairs = emptyvec(Tuple{typeof(vec[1]), typeof(vec[1])}, Int32(n*(n-1)/2))
    index = 1
    
    for i in 1:n
        for j in i+1:n
            pairs[index] = (vec[i], vec[j])
            index += 1
        end
    end
  return pairs
     
end


random_vector(n) = [ GeometryBasics.Line(GeometryBasics.Point(Float32(rand()), Float32(rand())), GeometryBasics.Point(Float32(rand()), Float32(rand()))) for _ in 1:n]

vec = random_vector(500)

The resulting vector of pairs is 124750 elements long. Using the time to calc the indices along with this, the time spent calculating is at least on the order of ~190us, which suggested to me that I could maybe do it faster than what I have above with another methodology. I tried to use this calculation for the indices and just loop over the indices of the output vector in order to make the pairs, but it was much slower. I tried map, and replace!, but I haven’t been able to do any better than the original way with for loops.

abraemer · February 21, 2024, 6:40am

I don’t think you can do it significantly faster. There are ~120.000 elements in the final vector and your CPU can do ~3 instructions per ns. If each element took just 1 instruction it would still take ~40.000ns = 40μs, so 190μs for seems pretty reasonable to me.

Ofc if you don’t need an actual Vector of these results you can do much faster by just constructing a generator instead:

julia> function choose2_generator(vec::Vector)
    return ((vec[i], vec[j]) for i in 1:length(vec) for j in 1:i)
end

julia> v = rand(500);

julia> @btime choose2_generator($v);
  3.702 ns (0 allocations: 0 bytes)

SteffenPL · February 21, 2024, 8:18am

I think there might be a few things one could try:

You can add @inbounds for a little bit of extra speed (in my case around 9% for n = 1000).
If you have n > 10_000, then you could use multi-threading (I just did a naive test with Threads.@threads. There seems to be something to gain.)

Threads.@threads for i in 1:n
        vi = vec[i]
        start = div((i-1)*(i-2),2)
        for j in 1:i-1
            pairs[start+j] = (vi, vec[j])
        end
    end

For n = 10_000 I got:
Threaded: 14.821 ms (243 allocations: 190.72 MiB)
Serial: 43.672 ms (2 allocations: 190.70 MiB)

Note that the implementation can clearly be improved, as the threads have very different workloads.

Dan · February 21, 2024, 11:29am

The speed of choose2 is bounded below by the amount of memory that needs to be read and written.

The following ‘trick’ will take about 0 time, by not materializing the output vector, but allowing lazy on-the-fly calculation of the tuples:

julia> using MappedArrays

julia> @inline function iuppert(k::Integer,n::Integer)
         i = n - 1 - floor(Int,sqrt(-8*k + 4*n*(n-1) + 1)/2 - 0.5)
         j = k + i + ( (n-i+1)*(n-i) - n*(n-1) )÷2
         return i, j
       end
iuppert (generic function with 1 method)

julia> function choose22(vec)
           n = length(vec)
           mappedarray(1:n*(n-1)÷2) do I
               i,j = iuppert(I,n)
               (vec[i], vec[j])
           end
       end
choose22 (generic function with 1 method)

julia> vec = rand(4)
4-element Vector{Float64}:
 0.5316777625406868
 0.5674443769426843
 0.6308345621666326
 0.7776348322244716

julia> choose22(vec)
6-element mappedarray(var"#86#87"{Vector{Float64}, Int64}([0.5316777625406868, 0.5674443769426843, 0.6308345621666326, 0.7776348322244716], 4), ::UnitRange{Int64}) with eltype Tuple{Float64, Float64}:
 (0.5316777625406868, 0.5674443769426843)
 (0.5316777625406868, 0.6308345621666326)
 (0.5316777625406868, 0.7776348322244716)
 (0.5674443769426843, 0.6308345621666326)
 (0.5674443769426843, 0.7776348322244716)
 (0.6308345621666326, 0.7776348322244716)

The iuppert function is taken from another answer by @lmiq .

PS this tuple vector is read-only unfortunately and is linked to the original vec.

rocco_sprmnt21 · February 21, 2024, 2:05pm

a slightly different wording used in the same post @Dan is probably referring to

v=1:10
n=size(v)[1]
function n_over_2(v,idx)
    j = Int(ceil(sqrt(2 * idx + 0.25) - 0.5))+1
    i = Int(idx - (j-2) * (j-1) / 2)
    (v[i],v[j])
end

n_over_2.([v],1:n*(n-1)/2)

but perhaps it is less suitable for this case

narnold0 · February 21, 2024, 8:56pm

Thank you! You’re right, I didn’t need a vector. That makes more sense.

I was a bit confused by what the docs say on generators. Is it just another name for an iterator? Are they are specific type of iterator?

narnold0 · February 21, 2024, 8:58pm

Many thanks for these. Given those numbers it seems one can gain much more from threading comapred to inbounds? Can you use both, like

@inbounds Threads.@threads

adienes · February 21, 2024, 9:01pm

basically an “iterator” is just a fuzzy concept for types that implement a few iteration related methods (including iterate)

a “generator” here refers to the type Base.Generator which has implemented all those methods, and is also privileged in the sense that it is the type of the object constructed from comprehensions (i for i in ...)

narnold0 · February 21, 2024, 9:08pm

Wow. That’s really interersting. Read-only is fine in my case, exactly what I want I think. But, I’m guessing that this is not possible without the use of MappedArrays? My initial goal was to remove external dependencies, but that seems quite powerful.

narnold0 · February 21, 2024, 9:14pm

Ahh okay. Thank you, that makes sense. I feel like the Julia documentation could do a better job of being explicit with that kind of information and detail. The section on them is pretty minimal for some reason.

narnold0 · February 21, 2024, 9:55pm

Is it possible to make a generator with if statements in it?

adienes · February 21, 2024, 10:04pm

you mean like (i for i in 1:10 if iseven(i)) ?

Jeff_Emanuel · February 21, 2024, 10:08pm

Probably you want the ternary operator.

narnold0 · February 21, 2024, 10:08pm

But can one do that like in a full-multi line if statement? Say I had a f(i) that I had to call two times in the if statement in the example you provided. In a multiline statement I could assign it to a variable and use that, but how can one do such a thing in a single line if statement without calling something twice? Sorry if thats a dumb question lol

Jeff_Emanuel · February 21, 2024, 10:11pm

julia> f(x) = rand()
f (generic function with 1 method)

julia> collect( let u = f(i); i%2==1 ? u : u*3; end for i=1:10)
10-element Vector{Float64}:
 0.07940946494560674
 1.6035072917117752
 0.4815119250120512
 1.381748619911222
 0.9124211172296329
 1.665817797002973
 0.49427982995996966
 1.7073952286259733
 0.06165322397334305
 0.05246104092133874

adienes · February 21, 2024, 10:29pm

it is a good question, and one that has been discussed in the past

I don’t think such a feature was ever ruled out, but it has also not yet been implemented

If you have this need, it may be difficult to do it as a snazzy comprehension one-liner.

You may notice that (i for i in 1:10 if iseven(i)) internally uses Iterators.Filter, but you are free to call Iterators.Filter directly

(foo(x) for x in itr if cond(foo(x)))

becomes

Iterators.Filter(cond, (foo(x) for x in itr))

narnold0 · February 22, 2024, 12:38am

As an example: right after getting the pairs choose 2, I had used another vector in the same manner to store information obtained from these. I tried a couple different forms but none worked. It has variables being declared and also conditionals from these, so I don’t know how I could conceivably produce a generator with the same logic and effect.

intersections = Vector{Tuple{Tuple{GeometryBasics.Line{2, Float32}, GeometryBasics.Line{2, Float32}}, GeometryBasics.Point{2,Float32}}}() 
  
    for (seg1, seg2) in choose2_generator(segments)  
    x1, y1, x2, y2, x3, y3, x4, y4, t, u = line_data1(seg1, seg2,)
    
        if is_intersecting(t, u)
            point = lineintersection(x1, y1, x1-x2, y1-y2, x3, y3, x3-x4, y3-y4, t, u)
            if point != false && not_on_endpoint(seg1, seg2, point)
    
                push!(intersections, ((seg1, seg2) , point) )
            end
   
        end
            
    end 
    
    return intersections

PeterSimon · February 22, 2024, 1:34am

If you define a function first:

function intersect_p(seg1, seg2)
    x1, y1, x2, y2, x3, y3, x4, y4, t, u = line_data1(seg1, seg2)
    is_intersecting(t, u) || return nothing
    point = lineintersection(x1, y1, x1-x2, y1-y2, x3, y3, x3-x4, y3-y4, t, u)
    if point != false && not_on_endpoint(seg1, seg2, point)
        return point
    else
        return nothing
    end
end

then you can create a generator that performs your desired function using filter as suggested by @adienes above:

myit = Iterators.filter((vec[i], vec[j], intersect_p(vec[i],vec[j])) for i in 1:length(vec) for j in 1:i) do x
    !isnothing(x[3])
end

narnold0 · February 22, 2024, 4:53am

Thank you. I tried this and, and it is much much faster itself. But, trying the generator out now with other functions, in graphing the points, getting the number of points, getting the number of unique points, in creating a matrix etc, the generator seems to allocate much more, and it takes longer such that the gain from these new functions is outdone by the large increase in time when manipulating them. Is there a tradeoff with the generators for using them in this regard?

PeterSimon · February 22, 2024, 5:48am

I assume that the allocations and increased execution time are due to the functions being called in intercept_p. The generator puts off calling them until it is iterated, but sooner or later you have to pay for the calculations you are doing. You could try to optimize these functions for memory and speed, perhaps with help from this forum. ~~You could also multi thread the loop over the generator.~~

Topic		Replies	Views
A generator with two "for" keywords is slow Performance	14	1780	December 4, 2017
Generators vs loops vs broadcasting: Calculate PI via Monte Carlo Sampling Performance	7	1501	June 20, 2021
Performance of creating a tuple with a for loop Performance	26	2633	September 15, 2020
A good way to iterate through sequences? General Usage question	19	2338	June 26, 2018
Prefered pattern for generators? Performance	12	948	January 3, 2019

Fastest approach to returning a vector of elements choose 2

Related topics