Speed gains are visible in single-threaded workloads too. They do not seem to be purely due to GC pauses: the timings are similar with GC disabled. Below is an example that copies a vector and sums it. Letting numpy manage the memory gives a good speedup on both x86 and macOS, and across different (largish) vector sizes.
Timings:
julia +release --project=. -t 1 copyadd_time.jl
0.256724 seconds (64.30 k allocations: 79.529 MiB, 53.59% gc time, 17.37% compilation time) # copy_jl, first call (includes compilation)
2.745950 seconds (4.35 M allocations: 218.689 MiB, 3.69% gc time, 97.00% compilation time) # copy_np, first call (includes compilation)
0.054762 seconds (46 allocations: 1.367 KiB) # numpy/pyarray
0.074416 seconds (4 allocations: 76.294 MiB) # native julia
Code:
ENV["JULIA_CONDAPKG_BACKEND"] = "Null" # use system-wide python installation
# otherwise install numpy for this PythonCall environment:
# ] add CondaPkg; using CondaPkg; ] conda add numpy
using PythonCall
using Random
np = pyimport("numpy")
Random.seed!(42)
function copy_jl(arr)
    return sum(copy(arr))
end

function copy_np(arr)
    pymem = np.empty(length(arr))  # numpy-managed buffer (float64 by default)
    pyarr = PyArray(pymem)         # wrap it as a Julia AbstractArray, no copy
    pyarr .= arr                   # copy the data into the numpy buffer
    return sum(pyarr)
end
arr = rand(10_000_000)
@time copy_jl(arr)
@time copy_np(arr)
GC.gc()
GC.enable(false)  # disabling GC barely changes the timings
@time copy_np(arr)
@time copy_jl(arr)
GC.enable(true)   # re-enable GC
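A single `@time` per variant is noisy; taking the minimum over several runs gives steadier numbers. Below is a minimal pure-Base sketch of that idea (`BenchmarkTools.@btime` would be the usual tool; `best_time` and `sum_copy` are illustrative names, not part of the original script):

```julia
# Repeated-timing helper (Base only): run f(arr) several times and
# keep the best wall-clock time, which filters out GC and OS noise.
function best_time(f, arr; reps = 5)
    best = Inf
    for _ in 1:reps
        t0 = time_ns()
        f(arr)
        t1 = time_ns()
        best = min(best, (t1 - t0) / 1e9)  # seconds
    end
    return best
end

sum_copy(a) = sum(copy(a))  # the native-Julia variant from above

arr = rand(1_000_000)
println("best of 5: ", best_time(sum_copy, arr), " s")
```

The same helper can wrap `copy_np` to compare both variants on equal footing, after a warm-up call so compilation time is excluded.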