Thanks, very useful. I will try it in my code. I like the one-liner, which can be memorized easily, but also the looping function could see good use, as I will probably be throwing this into functions regularly.
On my laptop, with each version placed in a function, the code above performs as follows:
vcat([a[b.==s] for s in selection]...)
281.848 ns (11 allocations: 720 bytes)
3.334 ms (62 allocations: 1.71 MiB) on actual data (see below)
a[b .∈ Ref(selection)]
77.137 ns (3 allocations: 208 bytes)
4.472 ms (5 allocations: 149.33 KiB) on actual data
loop function
60.285 ns (1 allocation: 112 bytes)
10.826 ms (1 allocation: 4.25 KiB) on actual data
[a[i] for i in eachindex(a,b) if b[i] in selection]
135.664 ns (3 allocations: 208 bytes)
4.537 ms (7 allocations: 11.12 KiB) on actual data
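One thing that may be worth checking before comparing timings is whether the variants return the same thing. A minimal sketch on small made-up vectors (the values of `a`, `b` and `selection` below are illustrative assumptions, not the actual data) suggests that the `vcat` version groups results by the order of `selection`, while the other two keep the original order of `a`:

```julia
# Made-up sample data, only for illustration
a = [10.0, 20.0, 30.0, 40.0, 50.0]
b = [3, 1, 2, 1, 3]
selection = [3, 1]

# Collects all matches for selection[1], then all for selection[2], ...
r_vcat = vcat([a[b .== s] for s in selection]...)

# Boolean mask over b; keeps the original order of a
r_mask = a[b .∈ Ref(selection)]

# Filtered comprehension; also keeps the original order of a
r_comp = [a[i] for i in eachindex(a, b) if b[i] in selection]

println(r_vcat)  # [10.0, 50.0, 20.0, 40.0]
println(r_mask)  # [10.0, 20.0, 40.0, 50.0]
println(r_comp)  # [10.0, 20.0, 40.0, 50.0]
```

So the three agree as sets of values, but not necessarily element by element.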
I also tried this shorter version of Jonas’ loop function:
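function test5(a, b, selection)
    # Zero-filled buffer as long as the whole input
    out = zeros(eltype(a), length(b))
    k = 1
    for i ∈ eachindex(a, b)
        if b[i] ∈ selection
            out[k] = a[i]
            k += 1
        end
    end
    # Drops the unused zero padding; caveat: this also drops genuine zeros copied from a
    filter!(x -> x != 0.0, out)
    return out
end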
56.548 ns (1 allocation: 128 bytes)
5.489 ms (3 allocations: 4.40 MiB) on actual data
(This version unnecessarily allocates a buffer for the entire vector, which is why Jonas' first loop is needed, but it is faster.)
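Since the `filter!` call in `test5` would also remove any real zeros that were copied from `a` (a latitude of exactly 0.0, say), one possible alternative is to trim the buffer with `resize!` instead. This is only a sketch: the name `test5_resize` is hypothetical, it still allocates the full-length buffer, and I have not benchmarked it.

```julia
# Hypothetical variant of test5: trims the buffer instead of filtering out zeros
function test5_resize(a, b, selection)
    out = Vector{eltype(a)}(undef, length(b))  # uninitialized scratch buffer
    k = 0                                      # number of matches written so far
    for i in eachindex(a, b)
        if b[i] in selection
            k += 1
            out[k] = a[i]
        end
    end
    resize!(out, k)  # keep only the k entries that were actually written
    return out
end

# A genuine 0.0 in a survives, because nothing is filtered by value
result = test5_resize([0.0, 1.5, 2.5], [1, 2, 1], [1])
println(result)  # [0.0, 2.5]
```

Because the buffer is never read before it is written, it does not need to be zero-initialized.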
On the other hand, my vectors may not be sorted at all (sorted by time, while vector a could be latitude, not lined up with the order of b but the same length). I still have to figure out eachindex(a,b) and its uses.
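As far as I understand it so far, `eachindex(a, b)` gives indices that are valid for both arrays and throws a `DimensionMismatch` if their sizes differ, so it pairs `a[i]` with `b[i]` purely by position and does not care whether either vector is sorted. A small sketch with made-up values (the vectors below are assumptions for illustration only):

```julia
a = [48.1, 52.5, 40.7]   # e.g. latitudes, in time order
b = [2, 1, 2]            # labels, same length, same time order

# Indices shared by a and b; for two plain vectors this is 1:3
for i in eachindex(a, b)
    println(i, ": a = ", a[i], ", b = ", b[i])
end

# With mismatched lengths, eachindex refuses instead of indexing out of bounds
short = [1, 2]
mismatch = try
    eachindex(a, short)
    false
catch err
    err isa DimensionMismatch
end
println(mismatch)  # true
```

So the loop versions only need `a` and `b` to have the same length and to describe the same record at each position.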
Update: I have now also listed the performance when applied to one iteration of my actual data, where a and b are of length 1153405 and selection is of length 12.