Hi! First of all, you shouldn’t write broadcasting like this. This is what dot notation is for; see the blog post "More Dots: Syntactic Loop Fusion in Julia".
Instead, you should write it like this:
minimum(abs.(P .- Ds[i])) == 0
This is more readable, and also faster, since your code first creates a temporary array with Ds[i] subtracted from P, and then a new one with the absolute values. Then finally your code iterates over the entire result vector to find the minimum, and compares it to zero.
So this is clearly inefficient (and the example with the dots is also inefficient, though slightly less so.)
The correct answer is to use
any(==(Ds[i]), P) # ==(y) is the same as x -> (x==y)
This does not create any temporary arrays, and will stop the moment it finds a match.
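To make the equivalence concrete, here is a minimal sketch comparing the two formulations with a hand-written loop. The data and the function names (has_match_minimum, has_match_any, has_match_loop) are made up for illustration, and the loop is only roughly what any does internally, not its actual implementation.

```julia
# Hypothetical data standing in for the P and Ds from the question.
P = [3, 8, 15, 42, 7]
Ds = [42, 100]

# Dotted formulation: builds a temporary array, then scans all of it.
has_match_minimum(P, d) = minimum(abs.(P .- d)) == 0

# Short-circuiting formulation: no temporaries, stops at the first match.
has_match_any(P, d) = any(==(d), P)

# Hand-written loop with an early return, for comparison.
function has_match_loop(P, d)
    for x in P
        x == d && return true
    end
    return false
end

for d in Ds
    println(d, " => ", has_match_minimum(P, d), " ", has_match_any(P, d), " ", has_match_loop(P, d))
end
```

All three should agree on every input; they differ only in how much work they do to get there.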
Benchmarking: Always use BenchmarkTools for serious benchmarking, and especially for code with short run-times. Here are some timing examples:
julia> using BenchmarkTools
julia> p = rand(1:1000, 1000);
julia> dsi = rand(1:1000);
julia> @btime minimum(broadcast(abs, broadcast(-, $p, $dsi))) == 0
1.467 μs (2 allocations: 15.88 KiB)
false
julia> @btime minimum(abs.($p .- $dsi)) == 0
904.152 ns (1 allocation: 7.94 KiB)
false
julia> @btime any(==($dsi), $p)
402.750 ns (0 allocations: 0 bytes)
false
The code with dots fuses the operations, so you get half the number of allocations, and also better performance. The any code allocates zero memory.
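As a rough picture of what fusion means, the dotted expression behaves like a single loop that writes into one output array. The sketch below is a simplification under that assumption, not the actual broadcast machinery, and fused_abs_diff is a hypothetical name.

```julia
# Simplified picture of what abs.(P .- d) does after fusion:
# one pass over P, one output array, no intermediate array for P .- d.
function fused_abs_diff(P, d)
    out = similar(P)              # the single allocation
    for i in eachindex(P)
        out[i] = abs(P[i] - d)    # subtraction and abs in the same pass
    end
    return out
end

P = [3, 8, 15]                    # hypothetical data
println(fused_abs_diff(P, 8) == abs.(P .- 8))
```

The unfused broadcast(abs, broadcast(-, P, d)) instead runs two such loops and allocates an array for each, which matches the 2 versus 1 allocations in the timings above.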
As you can see there is no match here, so this is the worst-case performance scenario for the any code. Let’s insert a match early in the vector to see how that affects performance:
julia> p[5] = dsi; # early in the vector
julia> @btime minimum(broadcast(abs, broadcast(-, $p, $dsi))) == 0
1.499 μs (2 allocations: 15.88 KiB)
true
julia> @btime minimum(abs.($p .- $dsi)) == 0
898.548 ns (1 allocation: 7.94 KiB)
true
julia> @btime any(==($dsi), $p)
3.386 ns (0 allocations: 0 bytes)
true
No improvement for the broadcasting versions, but the any version finds the match almost immediately and bails out.
This example showcases some general performance guidelines in Julia: Don’t create unnecessary arrays, try to iterate over the data as few times as possible, and bail out once you have the answer.
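The same guidelines carry over to closely related tasks. As a small sketch, with made-up data that is not from the original question, here are allocating and non-allocating ways to count matches and to find the first one:

```julia
# Hypothetical data, not from the original question.
P = [3, 8, 15, 42, 7, 42]
d = 42

# Allocating versions: each builds a temporary Boolean array first.
n_alloc = sum(P .== d)
i_alloc = findfirst(P .== d)

# Non-allocating versions: the predicate is applied element by element.
n_lazy = count(==(d), P)       # still scans everything, but makes no temporary
i_lazy = findfirst(==(d), P)   # stops at the first match

println((n_alloc, n_lazy, i_alloc, i_lazy))
```

Both pairs give the same answers; passing the predicate as the first argument is what lets the function skip the temporary array.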
(Actually, this is mostly true for all programming languages, though there are some, like MATLAB or NumPy, where calling into a fast library function offsets the time wasted creating redundant temporary arrays. In MATLAB, for example, I would have implemented this as any(p == dsi), even though it makes an extra temporary vector.)