The CPU history graph is very interesting... Do you know what the troughs are? Were those caused by sequential runs? Or did they occur during a single run?
If it's a single run, then it might be some sort of synchronization issue: for example, one of the basic_diff calls takes significantly longer than the others, so the threads mostly sit idle waiting for that one call to complete.
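One way to check for a single slow call is to time each call on its own, single-threaded, and look for outliers. This is only a rough sketch: slow_call is a made-up stand-in for the basic_diff ccall (it just sleeps), and time_each is a hypothetical helper, not anything from SymEngine.

```julia
# Hypothetical stand-in for the basic_diff ccall: sleeps for t seconds.
slow_call(t) = (sleep(t); t)

# Time each call individually and return the elapsed seconds per element.
function time_each(f, xs)
    times = Vector{Float64}(undef, length(xs))
    for (i, x) in enumerate(xs)
        times[i] = @elapsed f(x)
    end
    return times
end

durations = [0.01, 0.01, 0.3, 0.01]   # made-up workloads; the third is the outlier
times = time_each(slow_call, durations)
slowest = argmax(times)
println("slowest call: index ", slowest, " (", round(times[slowest], digits = 3), " s)")
```

If one index clearly dominates, the troughs are probably the other threads waiting on it.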
Something I would try is this:
function pdiff(threads::Int, b1::Array{Basic,1}, b2::Basic)
    n = length(b1)
    dfdx = Array{Basic}(undef, n)
    chunk = cld(n, threads)  # elements per thread, rounded up
    Threads.@threads for i = 0:threads-1
        # Thread i gets the contiguous block i*chunk+1 : min((i+1)*chunk, n)
        for j = i*chunk+1:min((i+1)*chunk, n)
            a = Basic()
            ret = ccall((:basic_diff, libsymengine), Int, (Ref{Basic}, Ref{Basic}, Ref{Basic}), a, b1[j], b2)
            dfdx[j] = a
        end
    end
    return dfdx
end
I'm not sure if @threads breaks the array up in this way or not. But this will give each thread a "chunk" of the array to process, and any synchronization should happen at the end. The other thing to possibly try is:
function pdiff(threads::Int, b1::Array{Basic,1}, b2::Basic)
    dfdx = Array{Basic}(undef, length(b1))
    # Buffered channel holding every index of b1; it closes once all are queued
    offsets = Channel{Int}(length(b1)) do c
        for i in 1:length(b1)
            put!(c, i)
        end
    end
    Threads.@threads for i = 1:threads
        # Each thread keeps pulling the next free index until the channel is empty
        for j in offsets
            a = Basic()
            ret = ccall((:basic_diff, libsymengine), Int, (Ref{Basic}, Ref{Basic}, Ref{Basic}), a, b1[j], b2)
            dfdx[j] = a
        end
    end
    return dfdx
end
Please note: Both of these methods are trying to "out think" whatever @threads normally does to divide up the work, which generally should be avoided. Also, I haven't tested these, so I might have messed up the logic and/or fat-fingered the code.
The first one should work well if the length of b1 is evenly divisible by the number of threads, and each call to basic_diff takes on average the same amount of time.
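To see what happens when the length is not evenly divisible, here is a small sketch of splitting n indices into contiguous chunks, one per thread. chunk_ranges is a hypothetical helper I made up for illustration, and rounding the chunk size up with cld is just one reasonable choice.

```julia
# Hypothetical helper: contiguous 1-based index ranges, one per thread.
# The chunk size is rounded up, so the last range may be shorter (or empty).
function chunk_ranges(n::Int, nchunks::Int)
    chunk = cld(n, nchunks)
    return [(i * chunk + 1):min((i + 1) * chunk, n) for i in 0:nchunks-1]
end

even   = chunk_ranges(8, 4)    # every thread gets 2 elements
uneven = chunk_ranges(10, 4)   # three threads get 3 elements, the last gets 1
println(even)
println(uneven)
```

In the uneven case the last thread finishes early and then idles, which is one way to end up with a trough in the CPU graph.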
The second one should work better if the times taken by basic_diff are wildly different, and hopefully one of those long ones isn't at the end of the list. If the durations of basic_diff are wildly different, you might also try reversing the order in which the numbers are placed in the channel, or even randomizing them.
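Reversing or randomizing only changes the order in which indices go into the channel. Here is a rough sketch of that idea with the ccall swapped out for an ordinary function so it runs without SymEngine. channel_map, square, and the order argument are all hypothetical names, not part of any library.

```julia
using Random  # standard library, provides shuffle

# Hypothetical generic version of the channel approach: f stands in for the
# basic_diff ccall, and order is the sequence in which indices are handed out.
function channel_map(f, xs::Vector, order; ntasks::Int = Threads.nthreads())
    out = Vector{Any}(undef, length(xs))
    jobs = Channel{Int}(length(xs)) do c
        for i in order
            put!(c, i)
        end
    end
    Threads.@threads for _ in 1:ntasks
        # Each worker pulls the next index until the channel is drained
        for j in jobs
            out[j] = f(xs[j])
        end
    end
    return out
end

xs = collect(1:10)
square(x) = x^2   # made-up workload

forward  = channel_map(square, xs, eachindex(xs))
reversed = channel_map(square, xs, reverse(eachindex(xs)))
shuffled = channel_map(square, xs, shuffle(eachindex(xs)))
```

Results are written by index, so the output order is the same no matter what order the work is handed out in; only the scheduling changes.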