The data set I’m processing is organized in variable-length chunks, and I have to filter over all the chunks and build a set of data for several categories. I noticed that my program was starting to take a LONG time. I was appending results into an array, and of course I suspected that was the problem (hint: it was).
I decided to write a quick microbenchmark, and I thought the results would be of interest to my fellow Julia users. The obvious question is: should you
- append (no ! lol)
- push!
- pre-allocate an array I know is big enough and then just return a count of the number of items put into that array
Quick summary of results.
For 1,000,000 elements total (order of results is append, push!, pre-allocated):
41.963682 seconds (1.23 M allocations: 37.291 GiB, 25.72% gc time)
Any[-0.334331, 0.943863, 0.111534, -2.54138, 0.572677]
0.095106 seconds (1.02 M allocations: 25.614 MiB, 39.89% gc time)
Any[-0.334331, 0.943863, 0.111534, -2.54138, 0.572677]
0.020536 seconds (24.78 k allocations: 9.025 MiB)
[-0.334331, 0.943863, 0.111534, -2.54138, 0.572677]
Now the results for 100,000 elements total:
0.575038 seconds (292.42 k allocations: 394.206 MiB, 22.88% gc time)
Any[-1.40151, 1.02218, -0.947789, 1.10541, 0.432075]
0.030933 seconds (124.86 k allocations: 4.881 MiB, 7.59% gc time)
Any[-1.40151, 1.02218, -0.947789, 1.10541, 0.432075]
0.021361 seconds (24.78 k allocations: 2.158 MiB)
[-1.40151, 1.02218, -0.947789, 1.10541, 0.432075]
Look at that append time and how much it drops! Very interesting. I’m assuming that’s a classic case of “not everything fits into cache anymore”. The other noticeable result is the (kind of) low penalty for using push!. Yeah, I know it’s not really low; after all, it’s a factor of 4 in the 10^6 case. But since the total time is 0.1 s, who cares?
Needless to say, I changed my code over to push! and it runs much faster now.
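If you want to squeeze a bit more out of push!, two tweaks should help (a sketch, not something I benchmarked here): give the vector a concrete element type instead of `[]` (which is a Vector{Any}), and call sizehint! when you know roughly how many elements are coming, so the buffer is reserved up front and push! doesn’t have to reallocate as it grows. The function name and the bound `n` below are hypothetical:

```julia
# Sketch: push! into a typed vector with a capacity hint.
# `n` is a hypothetical upper bound on the number of elements.
function collect_filtered(y, n)
    x = Float64[]       # concrete eltype, not Vector{Any}
    sizehint!(x, n)     # reserve capacity up front
    for v in y
        push!(x, v)     # amortized O(1), and now reallocation-free
    end
    x
end
```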
Edit: a quick note about PyPlot. Do NOT invoke scatter repeatedly on small chunks of x, y data. Save it all up and plot it all at once. That was the other thing that was killing the run time of my code.
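In case it helps anyone, here is the shape of that fix (a sketch; `chunks` is a hypothetical iterable of (xs, ys) pairs, and the scatter call is left as a comment so the snippet runs without a plotting backend): accumulate all the chunks into one pair of vectors, then make a single scatter call at the end.

```julia
# Sketch: batch chunked x/y data and plot once, instead of
# calling scatter once per chunk.
function gather(chunks)
    allx = Float64[]
    ally = Float64[]
    for (xs, ys) in chunks
        append!(allx, xs)   # one cheap bulk append per chunk
        append!(ally, ys)
    end
    allx, ally
end

# using PyPlot
# allx, ally = gather(chunks)
# scatter(allx, ally)       # one call for everything
```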
The benchmark code (for 10^6 elements):
function test1(y)
    x = []                  # Vector{Any}
    for i = 1:10000
        # same as x = vcat(x, y[(i-1)*100+1:i*100]):
        # copies the whole array on every iteration
        x = [x; y[(i-1)*100+1:i*100]]
    end
    x
end
function test2(y)
    x = []                  # Vector{Any}
    for i = 1:1000000
        push!(x, y[i])      # amortized O(1) growth
    end
    x
end
function test3(y)
    x = Array{Float64,1}(undef, 1000000)   # pre-allocated, typed
    for i = 1:1000000
        x[i] = y[i]
    end
    x
end
y = randn(1000000)
x = @time test1(y)
println(x[1:5])
x = @time test2(y)
println(x[1:5])
x = @time test3(y)
println(x[1:5])
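One caveat about the comparison above: test1 and test2 build a Vector{Any} (since `x = []` is untyped), while test3 builds a typed Vector{Float64}, so part of the gap isn’t push! itself. A fairer variant of test2 would look like this (a sketch I haven’t timed here, but I’d expect it to land closer to the pre-allocated version):

```julia
# test2 with a concrete element type instead of Vector{Any}
function test2_typed(y)
    x = Float64[]
    for i in eachindex(y)
        push!(x, y[i])
    end
    x
end
```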