I am new to Julia and I am having problems with speed. What is the fastest way to replace rows in an array? Currently I iterate over every entry of a matrix, and when the result of the associated calculation is nonzero, I store the position (row and col) and the value (aval) in an additional array that will serve for sparse matrix allocation at a later stage of the code.
The code here is a nonsensical, simplified example just to get to the point. On my computer, the replacements in the if condition triple the computation time from around 0.04 seconds to 0.12 seconds. What am I doing wrong here?
Thanks in advance.
function testfun02(M)
    nrow, ncol = size(M);
    spidx = Array{Float64, 2}(undef, nrow, 3);
    numit = 1;
    for j02 = 1:ncol, j01 in 1:nrow
        aval = M[j01,j02]*(j01+j02);
        if aval > 0;
            spidx[numit,1] = j01;
            spidx[numit,2] = j02;
            spidx[numit,3] = aval;
            numit += 1;
        end
    end
    return(spidx);
end

B = Matrix{Float64}(I, 10000, 10000);
@time testfun02(B);
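For context, the later stage would build the sparse matrix from those triples roughly like this (a simplified sketch with made-up numbers, not the actual code):

using SparseArrays

# Hypothetical later stage: the collected (row, col, value) triples become the
# inputs of sparse(I, J, V, m, n); rows/cols/vals stand in for the filled part of spidx.
rows = [1, 2, 3]
cols = [1, 2, 3]
vals = [2.0, 4.0, 6.0]
S = sparse(rows, cols, vals, 10000, 10000)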
function testfun02(M)
    nrow, ncol = size(M);
    spidx = Array{Float64, 2}(undef, nrow, 3);
    numit = 1;
    for j02 = 1:ncol, j01 in 1:nrow
        aval = M[j01,j02]*(j01+j02)
    end
    return spidx
end
It takes just 0.04 seconds? If I were you, I'd be surprised it took that long in the first place - after all, that function isn't doing any real calculation that can be observed after the call.
Exactly. Of course, I know that it does not return anything interesting; I am more interested in the original function, but I wanted to dissect the problem and find out where it takes time. And for this purpose, the function seems reasonable. I checked the iteration order, but I think it is fine.
Sorry, but what changed there? It seemed to me that the iteration order was already correct initially.
Something else that could improve the performance is to define spidx as (3, nrow), so that the same column is accessed at each iteration when filling the array. But in that particular case, with those dimensions, the difference is small.
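A minimal sketch of that layout change, assuming the same fill logic as in the original function (the name is just for illustration):

function testfun02_cols(M)
    nrow, ncol = size(M)
    # one column per stored entry: the three writes of an iteration hit a single
    # column, which is contiguous in Julia's column-major memory layout
    spidx = Array{Float64, 2}(undef, 3, nrow)
    numit = 1
    for j02 in 1:ncol, j01 in 1:nrow
        aval = M[j01, j02] * (j01 + j02)
        if aval > 0
            spidx[1, numit] = j01
            spidx[2, numit] = j02
            spidx[3, numit] = aval
            numit += 1
        end
    end
    return spidx
end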
You're right - seems like there was some erroneous testing on my part, I apologize!
I now also get the very fast speed with the first version (without the branch, of course). Not sure where the slowdown would come from, though I'd expect the version with the conditional to be somewhat slow, as the branch and the numit variable prevent SIMD (numit introduces a loop dependency on previous iterations).
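To make the dependency point concrete, here is a hypothetical loop without it (the helper name is made up): each iteration writes to a position that depends only on the loop indices, so nothing forces the iterations to run in order and the compiler is free to vectorize.

function fill_all!(out, M)
    nrow, ncol = size(M)
    @inbounds for j02 in 1:ncol, j01 in 1:nrow
        # the target index does not depend on earlier iterations, unlike numit above
        out[j01, j02] = M[j01, j02] * (j01 + j02)
    end
    return out
end

# usage: fill_all!(similar(M), M)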
I just checked, and it looked like Julia did not compile, and therefore there was no substantial speed gain. After restarting my computer, this now works. With the stripped-down version from Sukera or mine, I get the quick results.
However, there seems to be a substantial slowdown when I run the full if condition. And this is what I still don't understand (something cannot be efficiently compiled …).
There has to be something else going on here. ~70k allocations for a function that (from what you've posted) doesn't allocate other than the output array seems very weird to me. Can you post the full code you're running? I've had to do e.g. using LinearAlgebra earlier because of the I you're using.
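For reference, a self-contained version of the setup would need something along these lines (assuming the function definition from the first post):

using LinearAlgebra   # provides the identity `I` used to build B

B = Matrix{Float64}(I, 10000, 10000)
@time testfun02(B)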
Also, which version are you running? Please post the output of versioninfo():
julia> versioninfo()
Julia Version 1.7.0-beta3
Commit e76c9dad42 (2021-07-07 08:12 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.0 (ORCJIT, skylake)
Environment:
  JULIA_PKG_SERVER =
  JULIA_NUM_THREADS = 4
(Aside, you don't need ; at the end of every line - that only suppresses automatic output in the REPL, it has no other effect on code and does nothing when the code is run from a file.)
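For example, in the REPL:

julia> 1 + 1
2

julia> 1 + 1;   # trailing ; only suppresses the value printed by the REPL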
I compared to an older setup where the number of threads was already set to 4. But this is not the issue here. I can easily adjust the number of threads.
Since I get a much faster execution time than you do and my CPU is older & has less cache than yours does, I suspect the julia or LLVM version makes a difference. Please test with one of the binaries from the 1.7 beta, since that could be a major difference in the code generated.
You apparently run a benchmark with BenchmarkTools.@benchmark, but you don't report its output, only the output of the @time macro, which is not part of BenchmarkTools.
Can you report the output from the @benchmark or @btime macro, and remember to interpolate the input argument, like this
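For example, a sketch assuming B and testfun02 as defined in the first post:

using BenchmarkTools

# $B interpolates the global B into the benchmark expression, so the measurement
# covers the call itself rather than the cost of looking up a non-constant global
@benchmark testfun02($B)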
BenchmarkTools.Trial: 53 samples with 1 evaluation.
 Range (min … max):  86.669 ms … 136.134 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     91.699 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   95.738 ms ±  12.138 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  86.7 ms          Histogram: frequency by time          133 ms <

 Memory estimate: 2.35 MiB, allocs estimate: 68980.
Why are the numbers so different from @time? Sorry, this is probably a stupid question, but I am new to Julia and this confuses me. Using @benchmark, I would suspect that there are no issues at all, right?
Sorry, the numbers are of the same order of magnitude… 100 ms is 0.1 seconds. But most of the timing problem seems to come from the substitution of the elements in the array in the if condition. And this, I find, is surprising.
I think the point is that if your function does work that is not observable from outside the function, then the compiler may decide to skip those processing steps, and finish much faster than you expect.
Therefore it's important, even in benchmarks that aren't supposed to do anything useful, that the work of the function is observable from the outside.
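As a hypothetical illustration: in the stripped-down loop, aval is never used after the loop, so the whole body can be dropped. Returning something that depends on every iteration forces the work to actually happen (the variant below is made up, only to make the point):

function testfun02_sum(M)
    nrow, ncol = size(M)
    s = 0.0
    for j02 in 1:ncol, j01 in 1:nrow
        s += M[j01, j02] * (j01 + j02)
    end
    return s   # the result depends on every iteration, so the loop cannot be elided
end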