So here is some code that you can run (after installing the necessary packages):
MWE
using StaticArrays, Static, BenchmarkTools, Random
import NLLSsolver
# Description of BAL image, and function to transform a landmark from world coordinates to pixel coordinates
struct BALImage{T}
pose::NLLSsolver.EffPose3D{T} # Extrinsic parameters (rotation & translation)
sensor::NLLSsolver.SimpleCamera{T} # Intrinsic sensor parameters (just a single focal length)
lens::NLLSsolver.BarrelDistortion{T} # Intrinsic lens parameters (k1 & k2 barrel distortion)
end
NLLSsolver.nvars(::BALImage) = static(9) # 9 DoF in total for the image (6 extrinsic, 3 intrinsic)
function NLLSsolver.update(var::BALImage, updatevec, start=1)
return BALImage(NLLSsolver.update(var.pose, updatevec, start),
NLLSsolver.update(var.sensor, updatevec, start+6),
NLLSsolver.update(var.lens, updatevec, start+7))
end
function BALImage(rx::T, ry::T, rz::T, tx::T, ty::T, tz::T, f::T, k1::T, k2::T) where T<:Real
return BALImage{T}(NLLSsolver.EffPose3D(NLLSsolver.Pose3D(rx, ry, rz, tx, ty, tz)),
NLLSsolver.SimpleCamera(f),
NLLSsolver.BarrelDistortion(k1, k2))
end
BALImage(v) = BALImage(v[1], v[2], v[3], v[4], v[5], v[6], v[7], v[8], v[9])
transform(im::BALImage, X::NLLSsolver.Point3D) = NLLSsolver.ideal2image(im.sensor, NLLSsolver.ideal2distorted(im.lens, -NLLSsolver.project(im.pose * X)))
# Residual that defines the reprojection error of a BAL measurement
struct BALResidual{T} <: NLLSsolver.AbstractResidual
measurement::SVector{2, T} # Landmark image coordinates (x, y)
varind::SVector{2, Int} # Variable indices for the residual (image, landmark)
end
BALResidual(m, v) = BALResidual(SVector{2}(m[1], m[2]), SVector{2, Int}(v[1], v[2]))
NLLSsolver.ndeps(::BALResidual) = static(2) # Residual depends on 2 variables
NLLSsolver.nres(::BALResidual) = static(2) # Residual vector has length 2
NLLSsolver.varindices(res::BALResidual) = res.varind
NLLSsolver.getvars(res::BALResidual{T}, vars::Vector) where T = vars[res.varind[1]]::BALImage{T}, vars[res.varind[2]]::NLLSsolver.Point3D{T}
NLLSsolver.computeresidual(res::BALResidual, im::BALImage, X::NLLSsolver.Point3D) = transform(im, X) - res.measurement
const balrobustifier = NLLSsolver.HuberKernel(2.)
NLLSsolver.robustkernel(::BALResidual) = balrobustifier
Base.eltype(::BALResidual{T}) where T = T
function makeBALproblem()
Random.seed!(1)
problem = NLLSsolver.NLLSProblem(Union{BALImage{Float64}, NLLSsolver.Point3D{Float64}}, BALResidual{Float64})
numcameras = 16
for cam in 1:numcameras
NLLSsolver.addvariable!(problem, BALImage(0., 0., 0., 0., 0., 0., 1., 0., 0.))
end
for lm in 1:22106
NLLSsolver.addvariable!(problem, NLLSsolver.Point3D(0., 0., 1.))
end
for cam in 1:numcameras
for lm in 1:22106
if rand() < 0.25
NLLSsolver.addcost!(problem, BALResidual(SVector(0., 0.), SVector(cam, lm + numcameras)))
end
end
end
return problem
end
struct MyData
vars::Vector{Union{BALImage{Float64}, NLLSsolver.Point3D{Float64}}}
costs::NLLSsolver.VectorRepo{BALResidual{Float64}}
linsystem::NLLSsolver.MultiVariateLSsparse
end
MyData(problem) = MyData(problem.variables, problem.costs, NLLSsolver.makesymmvls(problem, trues(length(problem.variables)), length(problem.variables)))
MyData() = MyData(makeBALproblem())
function evalfunc(data)
NLLSsolver.zero!(data.linsystem)
NLLSsolver.costgradhess!(data.linsystem, data.vars, data.costs)
end
# @benchmark evalfunc(data) setup=(data=MyData()) seconds=15
data = MyData()
@benchmark evalfunc($data)
I put this code in a file, then re-run all the code in the file several times, within the same Julia REPL instance (Iβm just hitting the play button in VS Code). Here is the output of the first 5 runs, directly after starting the Julia REPL:
Benchmark results
BenchmarkTools.Trial: 110 samples with 1 evaluation.
Range (min β¦ max): 44.782 ms β¦ 47.563 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 45.285 ms β GC (median): 0.00%
Time (mean Β± Ο): 45.569 ms Β± 656.759 ΞΌs β GC (mean Β± Ο): 0.00% Β± 0.00%
β ββ β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
44.8 ms Histogram: frequency by time 47.5 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
BenchmarkTools.Trial: 106 samples with 1 evaluation.
Range (min β¦ max): 46.271 ms β¦ 50.480 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 46.974 ms β GC (median): 0.00%
Time (mean Β± Ο): 47.254 ms Β± 797.376 ΞΌs β GC (mean Β± Ο): 0.00% Β± 0.00%
β β
β
ββ ββ
β
β
ββββββββββββββββββ
βββββββββ
βββββββββββββ
βββ
ββ
β
ββ
ββββββββββ β
46.3 ms Histogram: frequency by time 49.5 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
BenchmarkTools.Trial: 116 samples with 1 evaluation.
Range (min β¦ max): 42.548 ms β¦ 45.545 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 43.201 ms β GC (median): 0.00%
Time (mean Β± Ο): 43.205 ms Β± 316.532 ΞΌs β GC (mean Β± Ο): 0.00% Β± 0.00%
ββ β ββ β β βββββββββ ββ β β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
42.5 ms Histogram: frequency by time 43.6 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
BenchmarkTools.Trial: 108 samples with 1 evaluation.
Range (min β¦ max): 45.543 ms β¦ 48.077 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 46.428 ms β GC (median): 0.00%
Time (mean Β± Ο): 46.441 ms Β± 303.921 ΞΌs β GC (mean Β± Ο): 0.00% Β± 0.00%
β β ββββ βββββββββ β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
45.5 ms Histogram: frequency by time 47.2 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
BenchmarkTools.Trial: 92 samples with 1 evaluation.
Range (min β¦ max): 53.781 ms β¦ 58.077 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 54.468 ms β GC (median): 0.00%
Time (mean Β± Ο): 54.681 ms Β± 842.132 ΞΌs β GC (mean Β± Ο): 0.00% Β± 0.00%
ββ
β β β β
βββββββ
ββββββββββββ
βββββββββββββββββββββββ
ββββββββββββββββββ β
53.8 ms Histogram: frequency by time 57.4 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
Please note that the effect is a function of my CPU (Apple M1 Pro), my OS (MacOS), and the current memory layout of my entire system at the time the benchmark was run. There is nothing reliable about this MWE. Sometimes I donβt see any difference. I have zero expectation that you will see the differences I see, or that this MWE will be of any real value as a result.
What I have done is email Prof. Berger and asked if he has any simple code that pathologically demonstrates the problem. It should be possible to write such code. This code will not be normal code. The memory layout will not be a normal layout.
In fact, one MWE is just one sample of code pattern and memory layout in an infinite space of such patterns. It cannot tell you how susceptible the patterns you use are to this problem. It is of no practical use beyond showing that a benchmark tool is invariant to this issue.
Thatβs why I think this isnβt of much use. Anyway, here you go!
@mbauman I also ran the benchmark using the setup block to regenerate the data every evaluation (commented out near the bottom). Here is the first output:
BenchmarkTools.Trial: 123 samples with 1 evaluation.
Range (min β¦ max): 55.301 ms β¦ 61.543 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 56.650 ms β GC (median): 0.00%
Time (mean Β± Ο): 56.956 ms Β± 1.053 ms β GC (mean Β± Ο): 0.00% Β± 0.00%
βββ β βββ βββ β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
55.3 ms Histogram: frequency by time 59.5 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
Youβll note that it doesnβt overlap all the previous histograms, so it doesnβt solve the problem.
There is one small const global variable used, so perhaps changing the address of that is what yields the change.
UPDATE: I realised that some of the data in my MWE changes each run. I made a minor change to fix this, and confirmed that I still see the effect weβre discussing.