Here’s a function that should spend exactly half of its time doing a matrix product, the other half sleeping:
function f(A, B)
starttime = time()
C = A*B
# C = sort(A[:])
# C = svd(A)
endtime = time()
elapsed = endtime - starttime
sleep(elapsed)
return elapsed
end
I make a couple of medium-sized matrices, and profile calling f
on them, and the profiler tells me that roughly 6% of time was spent in the matrix product, the rest in sleep. If I switch the matrix product for something like a bit of sorting or an SVD, everything works as expected, and the profiler reports roughly 50/50 sleeping and working.
I’m using the OpenBLAS that comes with Julia. Here’s a bit from the output of Profile.print
:
13058 @Base/asyncevent.jl:213; sleep
.
.
.
743 ...LinearAlgebra/src/matmul.jl:160; *
724 ...LinearAlgebra/src/matmul.jl:208; mul!
╎ 724 ...inearAlgebra/src/matmul.jl:169; mul!
╎ 724 ...inearAlgebra/src/matmul.jl:597; gemm_wrapper!(::Array{Float64,2}, ::Char...
╎ 724 ...LinearAlgebra/src/blas.jl:1167; gemm!(::Char, ::Char, ::Float64, ::Arra...
╎ 724 ...ib/julia/libopenblas64_.so:?; dgemm_64_
╎ 724 ...b/julia/libopenblas64_.so:?; dgemm_thread_nn
╎ ╎ 724 ...b/julia/libopenblas64_.so:?; gemm_driver
╎ ╎ 724 ...b/julia/libopenblas64_.so:?; exec_blas
╎ ╎ 23 .../julia/libopenblas64_.so:?; exec_blas_async_wait
╎ ╎ 21 ...6_64-linux-gnu/libc.so.6:?; __sched_yield
╎ ╎ 1 ...inux-gnu/libpthread.so.0:?; pthread_mutex_lock
╎ ╎ 1 ...inux-gnu/libpthread.so.0:?; pthread_mutex_unlock
╎ ╎ 701 .../julia/libopenblas64_.so:?; inner_thread
╎ ╎ 2 .../julia/libopenblas64_.so:?;
╎ ╎ 61 .../julia/libopenblas64_.so:?; dgemm_beta_HASWELL
╎ ╎ 70 .../julia/libopenblas64_.so:?; dgemm_itcopy_HASWELL
╎ ╎ 65 .../julia/libopenblas64_.so:?; dgemm_oncopy_HASWELL
╎ ╎ 494 ...6_64-linux-gnu/libc.so.6:?; __sched_yield
Is there a reason why this would happen, or should I consider this a bug and file a report?