I’m observing a change in the performance of a threaded function that depends on how a write-only argument to that function was initialised. Specifically, creating the argument with Matrix{Float64}(undef,…) gives the expected scaling with the number of threads, while creating it with zeros(…) gives no scaling at all. A working example and the output observed for different settings of JULIA_NUM_THREADS are given below. Reordering the two benchmarked calls does not affect the results.
My understanding is that dst and dst1 should have the same type (and checking with typeof seems to confirm this), the only difference being that dst1 has been zeroed while dst has not been touched after allocation.
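For reference, the typeof check mentioned above can be sketched roughly as follows. The small sizes B = 8 and M = 4 are placeholders chosen only to keep the check quick; they are not the benchmark dimensions, and the names same_type and same_size are just for illustration.

```julia
# Placeholder sizes (assumption): much smaller than the benchmark's B and M.
B, M = 8, 4

dst1 = zeros(B * 4, M)                   # zero-filled on creation
dst  = Matrix{Float64}(undef, B * 4, M)  # allocated, contents left uninitialised

# Compare the concrete type and the shape of the two arrays.
same_type = typeof(dst) == typeof(dst1)
same_size = size(dst) == size(dst1)

println(typeof(dst1))
println("same type: ", same_type, ", same size: ", same_size)
```

As far as the type system is concerned the two arrays are indistinguishable, so any difference in behaviour would have to come from somewhere other than dispatch.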
I’m new to Julia, so I might be missing something rather obvious. If this is unexpected, can anyone reproduce the behaviour?
using BenchmarkTools
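function unpack_threaded(src::AbstractMatrix{UInt8}, dst::AbstractMatrix{T}) where T
    B, M = size(src)
    # One dummy lookup table per thread (contents are left uninitialised)
    tbls = collect(Matrix{T}(undef, 4, 256) for n in 1:Threads.nthreads())
    Threads.@threads for m in 1:M
        tbl = tbls[Threads.threadid()]
        for b in 1:B
            w = src[b, m] + 1  # 1-based column index into the lookup table
            for k in 1:4
                # (b-1)*4+k stays within rows 1:4B; b*4+k would overrun the column at b = B
                @inbounds dst[(b-1)*4+k, m] = tbl[k, w]
            end
        end
    end
end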
function main()
    print("Threads=$(Threads.nthreads()):\n")
    B = 100000
    M = 512
    src = Matrix{UInt8}(undef, B, M)
    dst1 = zeros(B*4, M)
    print("zeros(...) :")
    @btime unpack_threaded($src, $dst1)
    dst = Matrix{Float64}(undef, B*4, M)
    print("Matrix{Float64}(undef,...) :")
    @btime unpack_threaded($src, $dst)
end
main()
Output:
Threads=1:
zeros(...) : 172.220 ms (3 allocations: 8.28 KiB)
Matrix{Float64}(undef,...) : 171.251 ms (3 allocations: 8.28 KiB)
Threads=2:
zeros(...) : 184.228 ms (4 allocations: 16.41 KiB)
Matrix{Float64}(undef,...) : 84.084 ms (4 allocations: 16.41 KiB)
Threads=4:
zeros(...) : 170.576 ms (6 allocations: 32.67 KiB)
Matrix{Float64}(undef,...) : 44.229 ms (6 allocations: 32.67 KiB)
Threads=8:
zeros(...) : 176.688 ms (10 allocations: 65.20 KiB)
Matrix{Float64}(undef,...) : 35.130 ms (10 allocations: 65.20 KiB)
julia> versioninfo()
Julia Version 1.1.1
Commit 55e36cc308 (2019-05-16 04:10 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
  JULIA_NUM_THREADS = 8