Why is minimum so much faster than argmin?

Interesting behavior - minimum is about 4x faster than argmin. Perhaps it knows how to use more cores? The question came up from the HoJ Discord server.

julia> y = rand(100_000);

julia> @btime argmin($y);
  181.410 μs (0 allocations: 0 bytes)

julia> @btime minimum($y);
  47.173 μs (0 allocations: 0 bytes)

minimum does mapreduce(identity, min, x, :) under the hood, argmin for the identity ends in some mapfoldl call, after going through findmin (which returns both index and element). I suspect it has a bit to do with having to keep track of both the index and the element?

They also don’t return the same thing:

julia> y = rand(100_000);     
                              
julia> argmin(y) == minimum(y)
false                         

argmin returns an index, whereas minimum returns an element. See https://github.com/JuliaLang/julia/issues/39203 for reference.

Yes, y[argmin(y)] should equal minimum(y). It seems disproportionately more expensive just to keep track of the index though, hence the question.

conceptually minimum should be faster right? because in minimum you only need to hold & compare 1 value, in argmin, you need to do everything minimum does and also have another.

But currently the findminmax! looks super complicated…so probably there’s a way to make argmin faster.

julia> @benchmark findfirst(==(minimum($y)), $y)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  82.196 μs … 128.444 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     83.789 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   84.363 μs ±   3.408 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                             
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁
  105 μs          Histogram: frequency by time           87 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark argmin($y)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  116.721 μs … 162.068 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     121.490 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   121.643 μs ±   2.368 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                              
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▂▂ ▃
  121 μs           Histogram: frequency by time          122 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark minimum($y)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  18.906 μs …  43.192 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.517 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.567 μs ± 460.606 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                             
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁  ▁
  43.2 μs         Histogram: frequency by time         19.5 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

LOL…

The code in question can be found here:

https://github.com/JuliaLang/julia/blob/e37e2906a4b13c6ff671f52e8ef88ea3fc815515/base/reduce.jl#L862-L863

You’re welcome to profile & PR, though I can’t find anything obviously wrong with it. Might have to do with mapfoldl requiring left folds?

How did you arrive at findminmax!?

You’re comparing hot-caches, careful! mapfoldl only has half the useful cache space, since it has to keep track of two elements.

nvm, I mis-looked, it’s mapfoldl() as you pointed out

depending on where the first element is actually is:

julia> @benchmark findfirst(==(minimum(x)), x) evals=1 setup=(x=rand(10^5))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  18.976 μs … 280.282 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     46.909 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   48.030 μs ±  19.616 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                             
  █▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁  ▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁ ▂
  50.3 μs         Histogram: frequency by time         24.1 μs <

julia> @benchmark argmin(x) evals=1 setup=(x=rand(10^5))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  117.904 μs … 177.407 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     120.619 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   120.919 μs ±   2.371 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                      █                        ▅                 
  ▂▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁█▁▁▄▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁█▁▁▁▁▁▃▁▁▁▃▁▁▁▁▁ ▃
  118 μs           Histogram: frequency by time          121 μs <

julia> @benchmark minimum(x) evals=1 setup=(x=rand(10^5))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  18.194 μs …  38.804 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     18.736 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   18.929 μs ± 855.817 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                             
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁ ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁
  19.1 μs         Histogram: frequency by time         18.6 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

median and mean are still significantly smaller…

(btw this wall of text is really less than ideal… I don’t need histogram and GC at all, just make people wanna scroll past…)

You can use @btime instead of @benchmark if you’re only interested in the time.

btime shows you the minimum time, which is not too helpful in this case where the location of the minimal element matters a lot. btime will show you the time when that element is near the beginning. Looking at the range of time and mean/median is helpful (more than they usually are) in this case.

For fun

using LoopVectorization, BenchmarkTools
function findmin_turbo(x)
  indmin = 0
  minval = typemax(eltype(x))
  @turbo for i ∈ eachindex(x)
    newmin = x[i] < minval
    minval = newmin ? x[i] : minval
    indmin = newmin ?   i  : indmin
  end
  minval, indmin
end
y = rand(1000);
@btime findmin_turbo($y)
@btime findmin($y)
@btime minimum($y)
@btime argmin($y)

I get:

julia> y = rand(1000);

julia> @btime findmin_turbo($y)
  99.998 ns (0 allocations: 0 bytes)
(0.00021165397762579197, 79)

julia> @btime findmin($y)
  2.984 μs (0 allocations: 0 bytes)
(0.00021165397762579197, 79)

julia> @btime minimum($y)
  744.863 ns (0 allocations: 0 bytes)
0.00021165397762579197

julia> @btime argmin($y)
  2.954 μs (0 allocations: 0 bytes)
79

julia> versioninfo()
Julia Version 1.8.0-DEV.383
Commit eb83c4d25b* (2021-08-20 20:03 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz

This requires LoopVectorization >= v0.12.66.

Sadly not suitable for Base right now :confused: