Okay, this is actually quite weird: I cannot reproduce the problem with straight code, but it does show up when I redefine functions (which I was also doing in my own code). Here’s the code I’m running, though I’m no longer sure which part is essential to the problem:
using SparseArrays
const state = rand(1000)
const sparse_matrix = sprand(1000, 1000, 0.001)
function loop(func, args)
    loopidx = 1
    ti = time_ns()
    for _ in 1:10000000
        @inline choose_j(func, args)
        loopidx += 1
    end
    tf = time_ns()
    println("The normal loop took $(tf-ti) ns")
    println("Updates per sec: $(loopidx / (tf-ti) * 1e9)")
end

function loopkw(func, args)
    loopidx = 1
    ti = time_ns()
    for _ in 1:10000000
        @inline choose_j_kw(func, args)
        loopidx += 1
    end
    tf = time_ns()
    println("The kw loop took $(tf-ti) ns")
    println("Updates per sec: $(loopidx / (tf-ti) * 1e9)")
end

function collectargs(args, j)
    (;state, sparse_matrix) = args
    cumsum = zero(Float64)
    for ptr in nzrange(sparse_matrix, j)
        smij = sparse_matrix.nzval[ptr]
        i = sparse_matrix.rowval[ptr]
        cumsum += state[i] * smij
    end
    return cumsum*(2*state[j])
end

function collectargskw(args; j)
    (;state, sparse_matrix) = args
    cumsum = zero(Float64)
    for ptr in nzrange(sparse_matrix, j)
        smij = sparse_matrix.nzval[ptr]
        i = sparse_matrix.rowval[ptr]
        cumsum += state[i] * smij
    end
    return cumsum*(2*state[j])
end

function indirection(@specialize(func), args, j)
    @inline func(args, j)
end

function indirection_kw(@specialize(func), args; j)
    @inline func(args; j)
end

function choose_j(@specialize(func), args)
    j = rand(1:1000)
    @inline func(args, j)
    # @inline indirection(func, args, j)
end

function choose_j_kw(@specialize(func), args)
    j = rand(1:1000)
    @inline func(args; j)
    # @inline indirection_kw(func, args; j)
end

loop(collectargs, (;state, sparse_matrix))
loopkw(collectargskw, (;state, sparse_matrix))
Straight up running the code gives the following output:
The normal loop took 469143584 ns
Updates per sec: 2.1315438047214136e7
The kw loop took 394801708 ns
Updates per sec: 2.5329173601244908e7
On the first run, the version with keyword arguments is actually always faster (why??).
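Since a single timed run can include one-off costs such as compilation, one way to check whether this gap is real is to warm up first and then keep the fastest of several repeats. The following is only a minimal sketch using Base: `min_time_ns` and `work` are hypothetical names that are not part of the code above, and `work` is a stand-in for the loop under test.

```julia
# Hypothetical helper (not part of the original code): runs `f` once to
# trigger compilation, then returns the fastest of `repeats` timed runs in ns.
function min_time_ns(f::F; repeats = 5) where {F}
    f()  # warm-up call, excluded from the timing
    best = typemax(UInt64)
    for _ in 1:repeats
        t0 = time_ns()
        f()
        best = min(best, time_ns() - t0)
    end
    return best
end

# Illustrative workload; in practice this would wrap the loop under test.
work() = sum(abs2, rand(1000))
println("fastest run: $(min_time_ns(work)) ns")
```

Taking the minimum rather than the mean is a design choice that suppresses noise from the OS and garbage collector; it does not by itself explain why redefinition changes the timings.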
Then, commenting out the direct func(...) calls in choose_j and choose_j_kw and commenting in the indirection calls gives:
choose_j (generic function with 1 method)
choose_j_kw (generic function with 1 method)
The normal loop took 434693500 ns
Updates per sec: 2.300471711677308e7
The kw loop took 754985083 ns
Updates per sec: 1.3245296132559482e7
Then, reverting the code back to the original code:
choose_j (generic function with 1 method)
choose_j_kw (generic function with 1 method)
The normal loop took 583411750 ns
Updates per sec: 1.7140554676864155e7
The kw loop took 398809166 ns
Updates per sec: 2.5074651869962286e7
What is going on here? The relative performance differences stay virtually constant as long as I don’t redefine the functions.
This is actually not the exact problem I had in my code, but I’m guessing it has a similar origin.
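One factor that may be worth ruling out: in loop and loopkw, func is never called directly, only passed through to choose_j or choose_j_kw. Julia's documented heuristics can skip specialization on a function-typed argument in that situation, which turns the inner call into a dynamic dispatch that a call-site @inline cannot remove. Adding a type parameter forces specialization. The sketch below only illustrates the pattern; `kernel`, `inner`, `run_passthrough` and `run_specialized` are hypothetical names, and whether this accounts for the timings above is an assumption, not something verified here.

```julia
# Hypothetical stand-in for the real kernels; names are illustrative only.
kernel(x, j) = 2.0 * x[j]

# Small wrapper that actually calls `f`, playing the role of choose_j.
inner(f, x, j) = f(x, mod1(j, length(x)))

# `f` is only forwarded to `inner`, never called here, so Julia may
# compile this method without specializing on the type of `f`.
function run_passthrough(f, x, n)
    acc = 0.0
    for j in 1:n
        acc += inner(f, x, j)
    end
    return acc
end

# The `where {F}` type parameter forces specialization on `f`.
function run_specialized(f::F, x, n) where {F}
    acc = 0.0
    for j in 1:n
        acc += inner(f, x, j)
    end
    return acc
end

x = rand(1000)
run_passthrough(kernel, x, 10)   # warm up both versions before timing
run_specialized(kernel, x, 10)
t_pass = @elapsed run_passthrough(kernel, x, 10^6)
t_spec = @elapsed run_specialized(kernel, x, 10^6)
println("pass-through: $t_pass s, specialized: $t_spec s")
```

Applied to the code above, this would mean writing loop(func::F, args) where {F} and likewise for loopkw, then checking whether the timings still shift after redefining choose_j and choose_j_kw.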