Man, this is a powerful demonstration of how important it is to store the dimensions in the type rather than have them determined at runtime. Since your code has a couple of different optimizations (including the mutating `rand!`, which I hadn't thought of), I decided to go from my code to yours step by step. I also added a line inside each function that fixes the seed, so I could make sure all of them produce the same results (they do).
```julia
using Interpolations, BenchmarkTools, Random

f(a, b) = a^2 + b^2
const xs = range(0, 5, length=100)
const ys = range(-3, 3, length=200)
const vals = [f(x, y) for x in xs, y in ys]
const fct = LinearInterpolation((xs, ys), vals)
const Npt = 10_000
pts = [rand(1, Npt); zeros(1, Npt)]
```
```julia
function evaluate!(pts, fct)
    Random.seed!(100)
    Niter = 1_000
    nd = size(pts, 1)
    result = Matrix{Float64}(undef, size(pts, 2), Niter)
    @inbounds for j in 1:Niter
        pts[2, :] .= rand(Npt) * 6 .- 3
        for p in axes(pts, 2)
            result[p, j] = fct(pts[:, p]...)
            # result[p, j] = fct(pts[1, p], pts[2, p])
        end
    end
    return result
end
```
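Incidentally, most of `evaluate!`'s allocations come from `pts[:,p]` materializing a copy of each column before splatting. A minimal sketch of the copy-free alternative using `view` (toy setup of my own, not the benchmark above):

```julia
using Interpolations

g(a, b) = a^2 + b^2
itp = LinearInterpolation((0:0.1:5, -3:0.1:3),
                          [g(x, y) for x in 0:0.1:5, y in -3:0.1:3])

P = [2.0 1.0; 0.5 -0.5]      # 2 x 2 matrix, one point per column
v1 = itp(P[:, 1]...)         # slicing allocates a copy of the column
v2 = itp(view(P, :, 1)...)   # a view avoids the copy...
v1 == v2                     # ...and gives the same value
```

Even with views, though, the splat's arity is only known at runtime, which is why the tuple versions below still win.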
```julia
function evaluate2!(pts::NTuple{N,<:AbstractVector}, fct) where {N}
    Random.seed!(100)
    Niter = 1_000
    Npt = length(pts[2])
    result = Matrix{Float64}(undef, Npt, Niter)
    @inbounds for j in 1:Niter
        pts[2] .= 6rand(Npt) .- 3
        for p in 1:Npt
            result[p, j] = fct(ntuple(n -> pts[n][p], Val(N))...)
        end
    end
    return result
end
```
```julia
function evaluate3!(pts::NTuple{N,<:AbstractVector}, fct) where {N}
    Random.seed!(100)
    Niter = 1_000
    pts2 = pts[2]
    result = Matrix{Float64}(undef, length(pts2), Niter)
    @inbounds for j in 1:Niter
        rand!(pts2)
        for p in axes(pts2, 1)
            pts2[p] = 6pts2[p] - 3
            result[p, j] = fct(ntuple(n -> pts[n][p], Val(N))...)
        end
    end
    return result
end
```
```julia
function evaluateElrod!(fct, pts::Vararg{<:AbstractVector,N}) where {N}
    Random.seed!(100)
    Niter = 1_000
    pts2 = pts[2]
    result = Matrix{Float64}(undef, length(pts2), Niter)
    @inbounds for j in 1:Niter
        rand!(pts2)
        # Need this loop b/c unlike MATLAB, fct here requires scalar args
        for p in axes(pts2, 1)
            pts2[p] = 6pts2[p] - 3
            result[p, j] = fct(ntuple(n -> pts[n][p], Val(N))...)
        end
    end
    return result
end
```
- `evaluate!` is my original function.
- `evaluate2!` replaces the matrix with a tuple of vectors and splats a tuple instead of a SubArray. (Btw, I also tried a vector of vectors instead of a tuple of vectors, and that was 10x slower than my original code, so I'm not even showing that result.)
- `evaluate3!` is `evaluate2!` but with the mutating `rand!`.
- `evaluateElrod!` is @Elrod's function, which is like `evaluate3!` except that it collects the dimensions using `Vararg` instead of just taking a tuple as an argument.
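The key pattern in the last three functions is `ntuple(n -> pts[n][p], Val(N))`: because `N` is a type parameter, the tuple construction unrolls at compile time and `fct` is called with a fixed, compiler-known arity. A stripped-down sketch (toy function `h` and name `sum_fixed` are mine):

```julia
h(a, b) = a + 2b

# N is in the type, so ntuple(..., Val(N)) unrolls to (pts[1][p], pts[2][p])
# at compile time and the splat has fixed arity.
function sum_fixed(pts::NTuple{N,<:AbstractVector}, f) where {N}
    s = 0.0
    for p in eachindex(pts[1])
        s += f(ntuple(n -> pts[n][p], Val(N))...)
    end
    return s
end

sum_fixed(([1.0, 2.0], [3.0, 4.0]), h)  # h(1,3) + h(2,4) = 7.0 + 10.0 = 17.0
```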
Here are the results:
```
julia> @benchmark evaluate!($pts, fct)
BenchmarkTools.Trial:
  memory estimate:  1.56 GiB
  allocs estimate:  40004004
  --------------
  minimum time:     1.961 s (6.56% GC)
  median time:      2.023 s (15.61% GC)
  mean time:        2.095 s (13.33% GC)
  maximum time:     2.303 s (17.07% GC)
  --------------
  samples:          3
  evals/sample:     1
```
```
julia> @benchmark evaluate2!(ntuple(n->$pts[n,:],2), fct)
BenchmarkTools.Trial:
  memory estimate:  229.19 MiB
  allocs estimate:  4010
  --------------
  minimum time:     391.896 ms (3.32% GC)
  median time:      396.804 ms (3.35% GC)
  mean time:        414.187 ms (7.33% GC)
  maximum time:     586.854 ms (36.01% GC)
  --------------
  samples:          13
  evals/sample:     1
```
```
julia> @benchmark evaluate3!(ntuple(n->$pts[n,:],2), fct)
BenchmarkTools.Trial:
  memory estimate:  76.45 MiB
  allocs estimate:  10
  --------------
  minimum time:     337.228 ms (0.00% GC)
  median time:      350.747 ms (1.46% GC)
  mean time:        361.871 ms (5.70% GC)
  maximum time:     460.437 ms (26.81% GC)
  --------------
  samples:          14
  evals/sample:     1
```
```
julia> @benchmark evaluateElrod!(fct, $pts[1,:], $pts[2,:])
BenchmarkTools.Trial:
  memory estimate:  76.45 MiB
  allocs estimate:  8
  --------------
  minimum time:     342.385 ms (0.18% GC)
  median time:      352.621 ms (1.47% GC)
  mean time:        366.353 ms (5.62% GC)
  maximum time:     468.002 ms (26.52% GC)
  --------------
  samples:          14
  evals/sample:     1
```
The extra allocations in my benchmark of @Elrod's code relative to his are due to (1) slicing `pts` inside the benchmark as opposed to outside (4 allocations) and (2) fixing the seed (2 allocations). These extra allocations don't affect the speed by more than a few milliseconds, i.e. they're within a 95% CI of the mean.
Upshot: by far the most important improvement is from `evaluate!` to `evaluate2!`, where the matrix is replaced with the tuple of vectors. This is all about `N` being in the type. As I mentioned, if I replace the tuple with a vector of vectors and `Val(N)` with `Val(ND)`, where `ND = length(pts)` is computed somewhere outside the loops, it takes 20 seconds! But given `N` in the type, it doesn't matter much whether you pass a variable number of arguments or a tuple. I prefer the tuple because it allows the code calling the function to also be agnostic about the number of dimensions, without splatting.
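To spell out why the vector-of-vectors variant is so much slower, here is a sketch of the two setups (my own toy data):

```julia
vv = [[1.0, 2.0], [3.0, 4.0]]   # Vector{Vector{Float64}}: length is a runtime value
tt = ([1.0, 2.0], [3.0, 4.0])   # NTuple{2,Vector{Float64}}: N = 2 lives in the type

ND = length(vv)                  # a plain Int the compiler can't constant-fold
# Val(ND) builds a Val{2}() dynamically, so ntuple(n -> vv[n][p], Val(ND))
# has to dispatch at runtime on every call. With tt, dispatching on
# ::NTuple{N,<:AbstractVector} makes N a compile-time constant instead,
# and the ntuple unrolls.
```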
What would be really nice is if there were some cousin of `StaticArray` that let you "type" all but one dimension. E.g. for an `MMatrix`, you could say that it will have 5 columns and an unspecified number of rows. Otherwise, I have to keep doing stuff like `ntuple(n->$pts[n,:],2)`, which is kinda boilerplate.
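Until something like that exists, one way to hide the boilerplate is a tiny helper that turns rows of a matrix into an `NTuple` of views; `rowtuple` below is a hypothetical name of my own, not part of any package:

```julia
# Hypothetical helper: the first N rows of a matrix as an NTuple of views,
# with N supplied as a compile-time Val so the return type is concrete.
rowtuple(A::AbstractMatrix, ::Val{N}) where {N} = ntuple(n -> view(A, n, :), Val(N))

A = [1.0 2.0 3.0;
     4.0 5.0 6.0]
rows = rowtuple(A, Val(2))   # NTuple{2} of row views, no copying
rows[2][3]                   # 6.0, i.e. A[2, 3]
```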
Btw, all of these are still marginally slower than MATLAB. I guess the `griddedInterpolant` object is just crazy fast. It's definitely not written in MATLAB itself.