tk3369
August 22, 2021, 7:43pm
1
Interesting behavior - minimum
is about 4x faster than argmin
. Perhaps it knows how to use more cores? The question came up from the HoJ Discord server.
julia> y = rand(100_000);
julia> @btime argmin($y);
181.410 μs (0 allocations: 0 bytes)
julia> @btime minimum($y);
47.173 μs (0 allocations: 0 bytes)
2 Likes
Sukera
August 22, 2021, 8:02pm
2
minimum
does mapreduce(identity, min, x, :)
under the hood, argmin
for the identity ends in some mapfoldl
call, after going through findmin
(which returns both index and element). I suspect it has a bit to do with having to keep track of both the index and the element?
They also don’t return the same thing:
julia> y = rand(100_000);
julia> argmin(y) == minimum(y)
false
argmin
returns an index, whereas minimum
returns an element. See https://github.com/JuliaLang/julia/issues/39203 for reference.
tk3369
August 22, 2021, 8:09pm
3
Yes, y[argmin(y)]
should equal minimum(y)
. It seems disproportionately more expensive just to keep track of the index though, hence the question.
1 Like
jling
August 22, 2021, 8:11pm
4
conceptually minimum should be faster right? because in minimum you only need to hold & compare 1 value, in argmin, you need to do everything minimum does and also have another.
But currently the findminmax!
looks super complicated…so probably there’s a way to make argmin faster.
julia> @benchmark findfirst(==(minimum($y)), $y)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 82.196 μs … 128.444 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 83.789 μs ┊ GC (median): 0.00%
Time (mean ± σ): 84.363 μs ± 3.408 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁
105 μs Histogram: frequency by time 87 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark argmin($y)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 116.721 μs … 162.068 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 121.490 μs ┊ GC (median): 0.00%
Time (mean ± σ): 121.643 μs ± 2.368 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▂▂ ▃
121 μs Histogram: frequency by time 122 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark minimum($y)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 18.906 μs … 43.192 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 19.517 μs ┊ GC (median): 0.00%
Time (mean ± σ): 19.567 μs ± 460.606 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁
43.2 μs Histogram: frequency by time 19.5 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
LOL…
1 Like
Sukera
August 22, 2021, 8:11pm
5
The code in question can be found here:
https://github.com/JuliaLang/julia/blob/e37e2906a4b13c6ff671f52e8ef88ea3fc815515/base/reduce.jl#L862-L863
You’re welcome to profile & PR, though I can’t find anything obviously wrong with it. Might have to do with mapfoldl
requiring left folds?
Sukera
August 22, 2021, 8:12pm
6
How did you arrive at findminmax!
?
jling:
LOL…
You’re comparing hot-caches, careful! mapfoldl
only has half the useful cache space, since it has to keep track of two elements.
jling
August 22, 2021, 8:13pm
7
nvm, I mis-looked, it’s mapfoldl()
as you pointed out
jling
August 22, 2021, 8:16pm
8
depending on where the first element is actually is:
julia> @benchmark findfirst(==(minimum(x)), x) evals=1 setup=(x=rand(10^5))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 18.976 μs … 280.282 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 46.909 μs ┊ GC (median): 0.00%
Time (mean ± σ): 48.030 μs ± 19.616 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█
█▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁ ▂
50.3 μs Histogram: frequency by time 24.1 μs <
julia> @benchmark argmin(x) evals=1 setup=(x=rand(10^5))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 117.904 μs … 177.407 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 120.619 μs ┊ GC (median): 0.00%
Time (mean ± σ): 120.919 μs ± 2.371 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▅
▂▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁█▁▁▄▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁█▁▁▁▁▁▃▁▁▁▃▁▁▁▁▁ ▃
118 μs Histogram: frequency by time 121 μs <
julia> @benchmark minimum(x) evals=1 setup=(x=rand(10^5))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 18.194 μs … 38.804 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 18.736 μs ┊ GC (median): 0.00%
Time (mean ± σ): 18.929 μs ± 855.817 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁
19.1 μs Histogram: frequency by time 18.6 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
median and mean are still significantly smaller…
(btw this wall of text is really less than ideal… I don’t need histogram and GC at all, just make people wanna scroll past…)
You can use @btime
instead of @benchmark
if you’re only interested in the time.
jling
August 23, 2021, 2:23am
10
btime
shows you the minimum time, which is not too helpful in this case where the location of the minimal element matters a lot. btime
will show you the time when that element is near the beginning. Looking at the range of time and mean/median is helpful (more than they usually are) in this case.
3 Likes
Elrod
August 23, 2021, 11:37am
11
For fun
using LoopVectorization, BenchmarkTools
function findmin_turbo(x)
indmin = 0
minval = typemax(eltype(x))
@turbo for i ∈ eachindex(x)
newmin = x[i] < minval
minval = newmin ? x[i] : minval
indmin = newmin ? i : indmin
end
minval, indmin
end
y = rand(1000);
@btime findmin_turbo($y)
@btime findmin($y)
@btime minimum($y)
@btime argmin($y)
I get:
julia> y = rand(1000);
julia> @btime findmin_turbo($y)
99.998 ns (0 allocations: 0 bytes)
(0.00021165397762579197, 79)
julia> @btime findmin($y)
2.984 μs (0 allocations: 0 bytes)
(0.00021165397762579197, 79)
julia> @btime minimum($y)
744.863 ns (0 allocations: 0 bytes)
0.00021165397762579197
julia> @btime argmin($y)
2.954 μs (0 allocations: 0 bytes)
79
julia> versioninfo()
Julia Version 1.8.0-DEV.383
Commit eb83c4d25b* (2021-08-20 20:03 UTC)
Platform Info:
OS: Linux (x86_64-redhat-linux)
CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
This requires LoopVectorization >= v0.12.66
.
10 Likes
Sukera
August 23, 2021, 11:40am
12
Sadly not suitable for Base
right now
1 Like