Why is the latter implementation slower? These results surprise me, especially given that it has fewer allocations.
using BenchmarkTools
mat = randn(1000, 1000)
@btime sum(sum(x -> x > 1, mat, dims=1)) # 135.166 ΞΌs (2 allocations: 7.95 KiB)
@btime sum(x -> x > 1, mat) # 138.792 ΞΌs (1 allocation: 16 bytes)
I think you've got some sampling error. I get results that are the other way around, with the single summation about 3% faster.
But… this is probably an artifact of how @btime works, which is to report the minimum time. Using @benchmark instead I get these results:
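(For reference, double and single here wrap the two expressions from the question; the definitions below are my reconstruction, which the allocation estimates in the output confirm:)

double(mat) = sum(sum(x -> x > 1, mat, dims=1))  # per-column counts, then sum the row
single(mat) = sum(x -> x > 1, mat)               # one reduction over all elements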
julia> @benchmark double(mat)
BenchmarkTools.Trial: 8606 samples with 1 evaluation.
Range (min … max):  228.354 μs … 2.375 ms  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     526.527 μs             ┊ GC (median):    0.00%
Time  (mean ± σ):   572.941 μs ± 224.422 μs  ┊ GC (mean ± σ):  0.04% ± 0.85%
 228 μs          Histogram: frequency by time          1.32 ms <
Memory estimate: 7.95 KiB, allocs estimate: 2.
julia> @benchmark single(mat)
BenchmarkTools.Trial: 8556 samples with 1 evaluation.
Range (min … max):  227.849 μs … 2.756 ms  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     525.841 μs             ┊ GC (median):    0.00%
Time  (mean ± σ):   576.029 μs ± 222.980 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
 228 μs          Histogram: frequency by time          1.32 ms <
Memory estimate: 16 bytes, allocs estimate: 1.
What you can see is that the tiny difference in minimum times is dwarfed by the variation in timings, to the point that I'm not sure there's any meaningful difference.
Which is slightly interesting, in that one version makes two function calls with an intermediate allocation while the other is (seemingly) more efficient. But I suspect that the Julia code for sum is just very well optimized in both cases.
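To make that concrete, here is a rough sketch (variable names mine) of what the double version does; the intermediate 1×1000 row of per-column counts accounts for the ~8 KiB and the second allocation:

colcounts = sum(x -> x > 1, mat, dims=1)  # allocates a 1×1000 matrix of counts (~8 KiB)
total = sum(colcounts)                    # second reduction collapses the row to a scalar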
What is faster is count(x -> x > 1, mat), or equivalently count(>(1), mat).
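A quick sanity check (a hypothetical REPL session; timings will vary by machine) that count agrees with the sum-based versions while avoiding any intermediate array:

julia> using BenchmarkTools

julia> mat = randn(1000, 1000);

julia> count(>(1), mat) == sum(x -> x > 1, mat)  # summing Bools counts the trues
true

julia> @btime count(>(1), $mat);  # single pass, no intermediate array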