OK, on my machine (MacBook Pro, 2.7 GHz), after making sure GC had been run and everything was compiled, I got:
julia> @time df = DataFrame(idstr = rand([@sprintf "id%03d" k for k in 1:(N/K)], N),
                            id = rand(1:K, N),
                            val = rand(1:5, N))
14.233183 seconds (15.56 M allocations: 2.704 GiB, 38.10% gc time)
and using `rand(["id"*dec(k,3) for k in 1:(N÷K)], N)` (note that it also uses integer division!):
13.178580 seconds (3.01 M allocations: 2.397 GiB, 32.94% gc time)
(The times are pretty consistent; using @sprintf makes the whole operation about 8% slower.)
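For anyone puzzled by the integer-division note: in Julia, `/` always produces a float, while `÷` is floor (integer) division, so `1:(N/K)` iterates over `Float64` values where `1:(N÷K)` iterates over `Int`s. A minimal sketch (the `N` and `K` values here are just illustrative):

```julia
N, K = 1_000_000, 100

N / K            # 10000.0 — Float64
N ÷ K            # 10000   — Int

# The range element type follows the division operator used:
eltype(1:(N/K))  # Float64
eltype(1:(N÷K))  # Int (Int64 on 64-bit platforms)
```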
Comparing just the comprehensions, the difference is much larger:
julia> f() = ["id"*dec(k,3) for k in 1:(N÷K)]
f (generic function with 1 method)
julia> g() = [@sprintf "id%03d" k for k in 1:(N/K)]
g (generic function with 1 method)
julia> gc(); gc(); @time f();
0.149172 seconds (3.00 M allocations: 129.700 MiB, 45.45% gc time)
julia> gc(); gc(); @time g();
2.519681 seconds (15.55 M allocations: 443.213 MiB, 74.81% gc time)
Almost 17x slower to use @sprintf in the comprehension!
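As an aside for anyone trying to reproduce this on Julia ≥ 1.0: `dec(k, 3)` was removed there, but `lpad` (or `string(k, pad=3)`) is a drop-in replacement, so the fast concatenation version would look roughly like this (`make_id` is just an illustrative helper name):

```julia
# Equivalent of "id" * dec(k, 3) on Julia ≥ 1.0, where dec no longer exists.
make_id(k) = "id" * lpad(k, 3, '0')

make_id(7)    # "id007"
make_id(123)  # "id123"

# Same shape as f() above, without the @sprintf overhead:
ids = [make_id(k) for k in 1:10]
```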
I got similar times to yours for the CSV.write; it looks like this should be looked into (compilation time didn't show up here, since CSV is precompiled):
julia> @time CSV.write("df.csv", df);
211.388286 seconds (1.70 G allocations: 64.953 GiB, 15.65% gc time)