Fastest approach to returning a vector of elements choose 2

Hi there. I’m trying to do the following, given a vector of elements, I wish to get the vector of tuples containing the elements choose 2, with the order of the tuples obviously not mattering.

Importantly, I am trying to not use Combinatorics. The clear approach is to use two for loops, which is what I’ve been doing, but I need to do it faster. Is this the fastest way to conceivably do this? For context, this is using Float32s.

Say the length of the input vector is n. Given the index of a spot in the length n-choose-2 vector that is being returned, even inefficiently its around ~1.5ns to calculate the indexes for the two element in the input vector that should be present at that index in the output tuple.
For some actual times with for-loops, for a 500 length vector input, it takes ~815us. Here is a profile.


bench

using LinearAlgebra
using BenchmarkTools
using GeometryBasics
using StatProfilerHTML


function emptyvec(T, n)
    
    return Vector{T}(undef, n)
    
end 

function choose2(vec::Vector{Line{2, Float32}})
    
 
    n = length(vec)    
    pairs = emptyvec(Tuple{typeof(vec[1]), typeof(vec[1])}, Int32(n*(n-1)/2))
    index = 1
    
    for i in 1:n
        for j in i+1:n
            pairs[index] = (vec[i], vec[j])
            index += 1
        end
    end
  return pairs
     
end


random_vector(n) = [ GeometryBasics.Line(GeometryBasics.Point(Float32(rand()), Float32(rand())), GeometryBasics.Point(Float32(rand()), Float32(rand()))) for _ in 1:n]

vec = random_vector(500)

The resulting vector of pairs is 124750 elements long. Using the time to calc the indices along with this, the time spent calculating is at least on the order of ~190us, which suggested to me that I could maybe do it faster than what I have above with another methodology. I tried to use this calculation for the indices and just loop over the indices of the output vector in order to make the pairs, but it was much slower. I tried map, and replace!, but I haven’t been able to do any better than the original way with for loops.

I don’t think you can do it significantly faster. There are ~120.000 elements in the final vector and your CPU can do ~3 instructions per ns. If each element took just 1 instruction it would still take ~40.000ns = 40μs, so 190μs for seems pretty reasonable to me.

Ofc if you don’t need an actual Vector of these results you can do much faster by just constructing a generator instead:

julia> function choose2_generator(vec::Vector)
    return ((vec[i], vec[j]) for i in 1:length(vec) for j in 1:i)
end

julia> v = rand(500);

julia> @btime choose2_generator($v);
  3.702 ns (0 allocations: 0 bytes)
1 Like

I think there might be a few things one could try:

  • You can add @inbounds for a little bit of extra speed (in my case around 9% for n = 1000).
  • If you have n > 10_000, then you could use multi-threading (I just did a naive test with Threads.@threads. There seems to be something to gain.)
Threads.@threads for i in 1:n
        vi = vec[i]
        start = div((i-1)*(i-2),2)
        for j in 1:i-1
            pairs[start+j] = (vi, vec[j])
        end
    end

For n = 10_000 I got:
Threaded: 14.821 ms (243 allocations: 190.72 MiB)
Serial: 43.672 ms (2 allocations: 190.70 MiB)

Note that the implementation can clearly be improved, as the threads have very different workloads.

1 Like

The speed of choose2 is bounded below by the amount of memory that needs to be read and written.

The following ‘trick’ will take about 0 time, by not materializing the output vector, but allowing lazy on-the-fly calculation of the tuples:

julia> using MappedArrays

julia> @inline function iuppert(k::Integer,n::Integer)
         i = n - 1 - floor(Int,sqrt(-8*k + 4*n*(n-1) + 1)/2 - 0.5)
         j = k + i + ( (n-i+1)*(n-i) - n*(n-1) )÷2
         return i, j
       end
iuppert (generic function with 1 method)

julia> function choose22(vec)
           n = length(vec)
           mappedarray(1:n*(n-1)÷2) do I
               i,j = iuppert(I,n)
               (vec[i], vec[j])
           end
       end
choose22 (generic function with 1 method)
julia> vec = rand(4)
4-element Vector{Float64}:
 0.5316777625406868
 0.5674443769426843
 0.6308345621666326
 0.7776348322244716

julia> choose22(vec)
6-element mappedarray(var"#86#87"{Vector{Float64}, Int64}([0.5316777625406868, 0.5674443769426843, 0.6308345621666326, 0.7776348322244716], 4), ::UnitRange{Int64}) with eltype Tuple{Float64, Float64}:
 (0.5316777625406868, 0.5674443769426843)
 (0.5316777625406868, 0.6308345621666326)
 (0.5316777625406868, 0.7776348322244716)
 (0.5674443769426843, 0.6308345621666326)
 (0.5674443769426843, 0.7776348322244716)
 (0.6308345621666326, 0.7776348322244716)

The iuppert function is taken from another answer by @lmiq .

PS this tuple vector is read-only unfortunately and is linked to the original vec.

1 Like

a slightly different wording used in the same post @Dan is probably referring to

v=1:10
n=size(v)[1]
function n_over_2(v,idx)
    j = Int(ceil(sqrt(2 * idx + 0.25) - 0.5))+1
    i = Int(idx - (j-2) * (j-1) / 2)
    (v[i],v[j])
end

n_over_2.([v],1:n*(n-1)/2)

but perhaps it is less suitable for this case

1 Like

Thank you! You’re right, I didn’t need a vector. That makes more sense.

I was a bit confused by what the docs say on generators. Is it just another name for an iterator? Are they are specific type of iterator?

Many thanks for these. Given those numbers it seems one can gain much more from threading comapred to inbounds? Can you use both, like

@inbounds Threads.@threads 

basically an “iterator” is just a fuzzy concept for types that implement a few iteration related methods (including iterate)

a “generator” here refers to the type Base.Generator which has implemented all those methods, and is also privileged in the sense that it is the type of the object constructed from comprehensions (i for i in ...)

1 Like

Wow. That’s really interersting. Read-only is fine in my case, exactly what I want I think. But, I’m guessing that this is not possible without the use of MappedArrays? My initial goal was to remove external dependencies, but that seems quite powerful.

Ahh okay. Thank you, that makes sense. I feel like the Julia documentation could do a better job of being explicit with that kind of information and detail. The section on them is pretty minimal for some reason.

1 Like

Is it possible to make a generator with if statements in it?

you mean like (i for i in 1:10 if iseven(i)) ?

1 Like

Probably you want the ternary operator.

1 Like

But can one do that like in a full-multi line if statement? Say I had a f(i) that I had to call two times in the if statement in the example you provided. In a multiline statement I could assign it to a variable and use that, but how can one do such a thing in a single line if statement without calling something twice? Sorry if thats a dumb question lol

julia> f(x) = rand()
f (generic function with 1 method)

julia> collect( let u = f(i); i%2==1 ? u : u*3; end for i=1:10)
10-element Vector{Float64}:
 0.07940946494560674
 1.6035072917117752
 0.4815119250120512
 1.381748619911222
 0.9124211172296329
 1.665817797002973
 0.49427982995996966
 1.7073952286259733
 0.06165322397334305
 0.05246104092133874
1 Like

it is a good question, and one that has been discussed in the past

I don’t think such a feature was ever ruled out, but it has also not yet been implemented

If you have this need, it may be difficult to do it as a snazzy comprehension one-liner.

You may notice that (i for i in 1:10 if iseven(i)) internally uses Iterators.Filter, but you are free to call Iterators.Filter directly

  • (foo(x) for x in itr if cond(foo(x)))

becomes

  • Iterators.Filter(cond, (foo(x) for x in itr))
1 Like

As an example: right after getting the pairs choose 2, I had used another vector in the same manner to store information obtained from these. I tried a couple different forms but none worked. It has variables being declared and also conditionals from these, so I don’t know how I could conceivably produce a generator with the same logic and effect.

intersections = Vector{Tuple{Tuple{GeometryBasics.Line{2, Float32}, GeometryBasics.Line{2, Float32}}, GeometryBasics.Point{2,Float32}}}() 
  
    for (seg1, seg2) in choose2_generator(segments)  
    x1, y1, x2, y2, x3, y3, x4, y4, t, u = line_data1(seg1, seg2,)
    
        if is_intersecting(t, u)
            point = lineintersection(x1, y1, x1-x2, y1-y2, x3, y3, x3-x4, y3-y4, t, u)
            if point != false && not_on_endpoint(seg1, seg2, point)
    
                push!(intersections, ((seg1, seg2) , point) )
            end
   
        end
            
    end 
    
    return intersections  

If you define a function first:

function intersect_p(seg1, seg2)
    x1, y1, x2, y2, x3, y3, x4, y4, t, u = line_data1(seg1, seg2)
    is_intersecting(t, u) || return nothing
    point = lineintersection(x1, y1, x1-x2, y1-y2, x3, y3, x3-x4, y3-y4, t, u)
    if point != false && not_on_endpoint(seg1, seg2, point)
        return point
    else
        return nothing
    end
end

then you can create a generator that performs your desired function using filter as suggested by @adienes above:

myit = Iterators.filter((vec[i], vec[j], intersect_p(vec[i],vec[j])) for i in 1:length(vec) for j in 1:i) do x
    !isnothing(x[3])
end
1 Like

Thank you. I tried this and, and it is much much faster itself. But, trying the generator out now with other functions, in graphing the points, getting the number of points, getting the number of unique points, in creating a matrix etc, the generator seems to allocate much more, and it takes longer such that the gain from these new functions is outdone by the large increase in time when manipulating them. Is there a tradeoff with the generators for using them in this regard?

I assume that the allocations and increased execution time are due to the functions being called in intercept_p. The generator puts off calling them until it is iterated, but sooner or later you have to pay for the calculations you are doing. You could try to optimize these functions for memory and speed, perhaps with help from this forum. You could also multi thread the loop over the generator.