Consider the following MWE of threaded element-wise computation:
using Base.Threads
using LinearAlgebra
using BenchmarkTools
const N = 100000
a = rand(N)
b = rand(N)
function foo(a, b)
    @threads for i ∈ eachindex(a)
        a[i] = (a[i] + b[i]) ^ (a[i] - b[i])
    end
end
@btime foo($a, $b)
I tested it with the official binaries of both Julia 1.1.1 and Julia 1.2.0:
~/codes » /home/opt/julia-1.1.1/bin/julia test.jl
75.924 μs (1 allocation: 32 bytes)
---------------------------------------------------------------------------------------------------------
~/codes » /home/opt/julia-1.2.0/bin/julia test.jl
114.931 μs (133 allocations: 13.53 KiB)
The machine is:
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 20
Julia 1.2 shows significantly more allocations (133 versus 1) and a roughly 50% longer run time (114.9 μs versus 75.9 μs) for the same code. Is there anything I should change for the new version? The changelog does not seem to suggest so. Or could someone reproduce or confirm this?
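In case it helps with reproducing, here is a minimal sketch that prints the Julia version and thread count alongside the bytes allocated by a single threaded call. It only uses `VERSION`, `Threads.nthreads()` and `@allocated` from Base. The `report` helper and its argument `n` are names I made up for illustration, and a single `@allocated` measurement is a rough check, not a replacement for `@btime`.

```julia
using Base.Threads

# Same kernel as in the MWE above.
function foo(a, b)
    @threads for i in eachindex(a)
        a[i] = (a[i] + b[i]) ^ (a[i] - b[i])
    end
end

# Hypothetical helper (not part of the original script): runs the kernel once
# to compile it, then reports the bytes allocated by a second call.
function report(n = 100_000)
    a = rand(n)
    b = rand(n)
    foo(a, b)                     # warm-up call so compilation is not measured
    bytes = @allocated foo(a, b)  # bytes allocated by one threaded call
    println("Julia ", VERSION, ", threads = ", nthreads(),
            ", bytes allocated = ", bytes)
    return bytes
end

report()
```

Running this under each binary with the same `JULIA_NUM_THREADS` should show whether the extra allocations come from the `@threads` call itself.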
Thanks!