@LaurentPlagne and I recently noticed the following surprising performance results:
julia> using BenchmarkTools
julia> N = 100_000
julia> x32 = rand(Float32, N);
julia> x64 = Float64.(x32);
julia> sumExpStd(x) = sum(exp, x)
sumExpStd (generic function with 1 method)
julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_PROJECT = @.
julia> @btime sumExpStd($x64)
1.141 ms (0 allocations: 0 bytes)
171535.09093746456
julia> @btime sumExpStd($x32)
983.589 μs (0 allocations: 0 bytes)
171535.1f0
We’d have expected the Float32 exponential to be significantly faster than its Float64 counterpart.
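As a side check, the cost of a single scalar call can be measured directly; this is just a sketch using BenchmarkTools (the Ref is there to avoid constant propagation, and the numbers will of course depend on the machine):
julia> @btime exp($(Ref(rand(Float32)))[]);   # scalar Float32 exponential
julia> @btime exp($(Ref(rand(Float64)))[]);   # scalar Float64 exponential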
With julia-1.3.0, the observation is mostly the same:
julia> versioninfo()
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_PROJECT = @.
julia> @btime sumExpStd($x64)
1.122 ms (0 allocations: 0 bytes)
171798.45158585
julia> @btime sumExpStd($x32)
1.010 ms (0 allocations: 0 bytes)
171798.47f0
And indeed, when switching the implementation of exp from Base.exp to SLEEF.exp, we observe significantly lower computing times for Float32 exponentials, while the Float64 timings remain in the same ballpark:
julia> import SLEEF
julia> sumExpSLEEF(x) = sum(SLEEF.exp, x)
sumExpSLEEF (generic function with 1 method)
julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_PROJECT = @.
julia> @btime sumExpSLEEF($x64)
1.145 ms (0 allocations: 0 bytes)
171535.09093746456
julia> @btime sumExpSLEEF($x32)
205.272 μs (0 allocations: 0 bytes)
171535.08f0
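Dividing the v1.2.0 SLEEF timings by the array length gives a rough per-call cost (plain arithmetic on the figures above):
julia> round(205.272e-6 / 100_000 * 1e9, digits=2)   # ≈ ns per Float32 exponential
2.05

julia> round(1.145e-3 / 100_000 * 1e9, digits=2)     # ≈ ns per Float64 exponential
11.45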
Something seems to have changed in v1.3.0 here (the SLEEF Float32 timing is noticeably higher than under v1.2.0), but Float32 exponentials are still faster than their Float64 counterparts:
julia> versioninfo()
Julia Version 1.3.0
Commit 46ce4d7933 (2019-11-26 06:09 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_PROJECT = @.
julia> @btime sumExpSLEEF($x64)
1.145 ms (0 allocations: 0 bytes)
171798.45158585
julia> @btime sumExpSLEEF($x32)
868.180 μs (0 allocations: 0 bytes)
171798.47f0
Does anyone know what is happening here? Would you consider this a performance “bug”?
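In case it helps anyone reproduce or dig further, the methods actually being dispatched to can be inspected with @which (standard introspection; output omitted since it differs between Julia versions):
julia> @which exp(1.0f0)        # Float32 method
julia> @which exp(1.0)          # Float64 method
julia> @which SLEEF.exp(1.0f0)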