@LaurentPlagne and I recently noticed the following surprising performance results:
julia> using BenchmarkTools
julia> N = 100_000
julia> x32 = rand(Float32, N);
julia> x64 = Float64.(x32);
julia> sumExpStd(x) = sum(exp, x)
sumExpStd (generic function with 1 method)
julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_PROJECT = @.
julia> @btime sumExpStd($x64)
1.141 ms (0 allocations: 0 bytes)
171535.09093746456
julia> @btime sumExpStd($x32)
983.589 µs (0 allocations: 0 bytes)
171535.1f0
We’d have expected the Float32 exponential to be significantly faster than its Float64 counterpart.
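For reference, the same comparison can also be made on a single scalar call, outside the reduction. This is only a minimal sketch (not part of the benchmarks above); the Ref interpolation is the usual trick to keep @btime from constant-folding the argument:

# Sketch: time one scalar exp call per precision, isolated from the sum.
using BenchmarkTools

y32 = Ref(rand(Float32))   # Ref avoids constant folding inside @btime
y64 = Ref(Float64(y32[]))

@btime exp($y32[])         # scalar Float32 exponential
@btime exp($y64[])         # scalar Float64 exponential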
With julia-1.3.0, the observation is mostly the same:
julia> versioninfo()
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_PROJECT = @.
julia> @btime sumExpStd($x64)
1.122 ms (0 allocations: 0 bytes)
171798.45158585
julia> @btime sumExpStd($x32)
1.010 ms (0 allocations: 0 bytes)
171798.47f0
And indeed, when changing the implementation of exp from Base.exp to SLEEF.exp, we observe significantly lower computing times for Float32 exponentials, while Float64 computing times remain in the same ballpark:
julia> import SLEEF
julia> sumExpSLEEF(x) = sum(SLEEF.exp, x)
sumExpSLEEF (generic function with 1 method)
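Before comparing timings it may be worth sanity-checking that the two implementations agree; here is a quick sketch of such a check (not part of the transcripts below), using the Float64 Base.exp results as the reference:

# Sketch: maximum relative deviation of SLEEF.exp from the Float64 reference;
# for Float32 inputs one would expect something on the order of eps(Float32).
maximum(abs.(SLEEF.exp.(x32) .- exp.(x64)) ./ exp.(x64))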
julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_PROJECT = @.
julia> @btime sumExpSLEEF($x64)
1.145 ms (0 allocations: 0 bytes)
171535.09093746456
julia> @btime sumExpSLEEF($x32)
205.272 µs (0 allocations: 0 bytes)
171535.08f0
Something seems to have changed in v1.3.0 here (the Float32 SLEEF timing is much closer to the Float64 one), but Float32 exponentials are still faster than their Float64 counterparts:
julia> versioninfo()
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA_PROJECT = @.
julia> @btime sumExpSLEEF($x64)
1.145 ms (0 allocations: 0 bytes)
171798.45158585
julia> @btime sumExpSLEEF($x32)
868.180 ÎĽs (0 allocations: 0 bytes)
171798.47f0
Does anyone know what is happening here? Would you consider this a performance “bug”?