How to understand reasons for some unexpected heap allocations

I’m really surprised by some heap allocations being made within a package I maintain. I don’t expect them because:

  1. I’ve use Cthulhu to ensure that all variables in the code are concretely typed.
  2. The variables in question are all fairly small (192 bytes max!), including one Int64.
  3. Some of the variables are mutable structs, but none of these escape the function in which they’re created.
  4. Several of the variables are immutable return values, and annotating the return type of the relevant functions makes no difference; they’re concretely typed anyway.

As well as using Cthulhu, I’ve used the allocation profiler to see where and what the allocations are, but given what I said above, I’m unable to understand why the allocations are occurring (in Julia Version 1.12.6). It’s my belief that I should be able to avoid them, and that the code should (in principle) require no heap allocations at all.

Here is a screenshot of the allocations being made:


Does anyone have any further tips for understanding the cause of such allocations?

If anyone wants to recreate the problem, you can do so with the following code, which should (using latest NLLSsolver, v4.0.5) produce the same graph as above:

using NLLSsolver, Random, Static, StaticArrays, LinearAlgebra, Profile, PProf

# Simple affine projection transform
NLLSsolver.generatemeasurement(pose::EuclideanVector{6, T}, X::EuclideanVector{3, U}) where {T, U} = SVector(dot(@inbounds(view(pose, NLLSsolver.SR(1, 3))), X), dot(@inbounds(view(pose, NLLSsolver.SR(4, 6))), X))
MyResType = SimpleError2{2, Float64, EuclideanVector{6, Float64}, EuclideanVector{3, Float64}}

function create_ba_problem(ncameras, nlandmarks, propvisible)
    problem = NLLSProblem(Union{EuclideanVector{6, Float64}, EuclideanVector{3, Float64}}, MyResType)

    # Generate the cameras on a unit sphere, pointing to the origin
    camoffset = SVector(1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
    for i = 1:ncameras
        addvariable!(problem, randn(EuclideanVector{6, Float64}) .+ camoffset)
    end
    
    # Generate the landmarks in a unit cube centered on the origin
    lmoffset = SVector(-0.5, -0.5, 10.0)
    for i = 1:nlandmarks
        addvariable!(problem, rand(EuclideanVector{3, Float64}) .+ lmoffset)
    end

    # Generate the measurements
    visibility = abs.(repeat(vec(1:ncameras), outer=(1, nlandmarks)) .- LinRange(2, ncameras-1, nlandmarks)')
    visibility = visibility .<= sort(vec(visibility))[Int(ceil(length(visibility)*propvisible))]
    for camind = 1:ncameras
        camera = problem.variables[camind]::EuclideanVector{6, Float64}
        for (landmark, tf) in enumerate(view(visibility, camind, :)')
            if tf
                landmarkind = landmark + ncameras
                addcost!(problem, SimpleError2{EuclideanVector{6, Float64}, EuclideanVector{3, Float64}}(generatemeasurement(camera, problem.variables[landmarkind]::EuclideanVector{3, Float64}), camind, landmarkind))
            end
        end
    end

    # Return the NLLSProblem
    return problem
end

function perturb_ba_problem(problem, pointnoise, posenoise)
    for ind in 1:lastindex(problem.variables)
        if isa(problem.variables[ind], EuclideanVector{3, Float64})
            problem.variables[ind]::EuclideanVector{3, Float64} += randn(SVector{3, Float64}) * pointnoise
        else
            problem.variables[ind]::EuclideanVector{6, Float64} += randn(SVector{6, Float64}) * posenoise
        end
    end
    return problem
end

Random.seed!(1)
problem = create_ba_problem(3, 1, 1.0)
problem = perturb_ba_problem(problem, 0.003, 0.0)   
options = NLLSOptions(numthreads = static(1))
function myfun(problem, options)
    result = optimize!(problem, options, 4)
    problem = perturb_ba_problem(problem, 0.003, 0.0)
    Profile.clear_malloc_data()
    Profile.Allocs.clear()
    Profile.Allocs.@profile sample_rate=1.0 optimize!(problem, options, 4)
end
myfun(problem, options)
PProf.Allocs.pprof(from_c = false)

I’d be interested in tooling that exposes escape analysis because proving bounded lifetimes is apparently not as simple as what we intend. AllocCheck.jl is nice but it doesn’t say much.

In my experience, one also needs that all functions that are called with the mutable struct as an argument are inlined. Do you have that?

EDIT: Also, are you sure that no mutable struct escapes? For example, one MVector{3,Float64} is created in line @NLLSsolver/src/linearsystem.jl:71 as part of the struct UniVariateLSstatic_x{3} and then returned to line @NLLSsolver/src/linearsystem.jl:25. Ditto for the other one and the MMatrix{3,3,Float64,9}, both of which are created in line @NLLSsolver/src/linearsystem.jl:63 and returned to @NLLSsolver/src/linearsystem.jl:25.

JET.jl reports a possible type error exactly at the location of at least some of the allocations (@NLLSsolver/src/optimize.jl:117)

julia> using JET, NLLSsolver

julia> @report_call optimize!(problem, options, 4)
[ Info: tracking NLLSsolver
[ Info: tracking Base
═════ 1 possible error found ═════
┌ optimize!(problem::NLLSProblem{…}, options::NLLSOptions{…}, unfixed::Int64) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/optimize.jl:2
│┌ optimize!(problem::NLLSProblem{…}, options::NLLSOptions{…}, unfixed::Int64, callback::typeof(nullcallback)) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/optimize.jl:2
││┌ setupsinglevarls(func::typeof(NLLSsolver.optimizeinternal!), problem::NLLSProblem{…}, options::NLLSOptions{…}, unfixed::Int64, startstats::NLLSsolver.Stats, trailingargs::typeof(nullcallback)) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/optimize.jl:108
│││┌ setupstaticvarls(func::typeof(NLLSsolver.optimizeinternal!), problem::NLLSProblem{…}, options::NLLSOptions{…}, unfixed::Int64, startstats::NLLSsolver.Stats, varlen::StaticInt{…}, trailingargs::Tuple{…}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/optimize.jl:117
││││┌ optimizeinternal!(problem::NLLSProblem{…}, options::NLLSOptions{…}, data::NLLSsolver.NLLSInternal{…}, iteratedata::NLLSsolver.LevMarData, callback::typeof(nullcallback)) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/optimize.jl:228
│││││┌ optimizeloop!(problem::NLLSProblem{…}, options::NLLSOptions{…}, data::NLLSsolver.NLLSInternal{…}, iteratedata::NLLSsolver.LevMarData, callback::typeof(nullcallback)) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/optimize.jl:154
││││││┌ costgradhess!(linsystem::NLLSsolver.LinearSystem{…}, vars::Vector{…}, costs::NLLSsolver.VectorRepo{…}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/cost.jl:55
│││││││┌ kwcall(::@NamedTuple{…}, ::typeof(sum), fun::NLLSsolver.Bind{…}, vr::NLLSsolver.VectorRepo{…}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/VectorRepo.jl:54
││││││││┌ sum(fun::NLLSsolver.Bind{…}, vr::NLLSsolver.VectorRepo{…}; kw::@Kwargs{…}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/VectorRepo.jl:54
│││││││││┌ vrsum(fun::NLLSsolver.Bind{…}, vr::NLLSsolver.VectorRepo{…}, kw::@Kwargs{…}, T::Type{…}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/VectorRepo.jl:56
││││││││││┌ vrsum(fun::NLLSsolver.Bind{…}, kw::@Kwargs{…}, v::Vector{…}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/VectorRepo.jl:52
│││││││││││┌ kwcall(::@NamedTuple{…}, ::typeof(sum), f::NLLSsolver.Bind{…}, a::Vector{…}) @ Base ./reducedim.jl:980
││││││││││││┌ sum(f::NLLSsolver.Bind{…}, a::Vector{…}; dims::Colon, kw::@Kwargs{…}) @ Base ./reducedim.jl:980
│││││││││││││┌ kwcall(::@NamedTuple{…}, ::typeof(Base._sum), f::NLLSsolver.Bind{…}, a::Vector{…}, ::Colon) @ Base ./reducedim.jl:984
││││││││││││││┌ _sum(f::NLLSsolver.Bind{…}, a::Vector{…}, ::Colon; kw::@Kwargs{…}) @ Base ./reducedim.jl:984
│││││││││││││││┌ kwcall(::@NamedTuple{…}, ::typeof(mapreduce), f::NLLSsolver.Bind{…}, op::typeof(Base.add_sum), A::Vector{…}) @ Base ./reducedim.jl:326
││││││││││││││││┌ mapreduce(f::NLLSsolver.Bind{…}, op::typeof(Base.add_sum), A::Vector{…}; dims::Colon, init::Float64) @ Base ./reducedim.jl:326
│││││││││││││││││┌ _mapreduce_dim(f::NLLSsolver.Bind{…}, op::typeof(Base.add_sum), nt::Float64, A::Vector{…}, ::Colon) @ Base ./reducedim.jl:331
││││││││││││││││││┌ mapfoldl_impl(f::NLLSsolver.Bind{…}, op::typeof(Base.add_sum), nt::Float64, itr::Vector{…}) @ Base ./reduce.jl:36
│││││││││││││││││││┌ foldl_impl(op::Base.MappingRF{NLLSsolver.Bind{…}, Base.BottomRF{…}}, nt::Float64, itr::Vector{SimpleError2{…}}) @ Base ./reduce.jl:40
││││││││││││││││││││┌ _foldl_impl(op::Base.MappingRF{NLLSsolver.Bind{…}, Base.BottomRF{…}}, init::Float64, itr::Vector{SimpleError2{…}}) @ Base ./reduce.jl:50
│││││││││││││││││││││┌ (::Base.MappingRF{…})(acc::Float64, x::SimpleError2{…}) @ Base ./reduce.jl:92
││││││││││││││││││││││┌ (::NLLSsolver.Bind{typeof(NLLSsolver.costgradhess!), Tuple{…}})(args::SimpleError2{2, Float64, SVector{…}, SVector{…}}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/utils.jl:20
│││││││││││││││││││││││┌ costgradhess!(linsystem::NLLSsolver.LinearSystem{…}, vars::Vector{…}, cost::SimpleError2{…}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/cost.jl:52
││││││││││││││││││││││││┌ gradhesshelper!(linsystem::NLLSsolver.LinearSystem{…}, cost::SimpleError2{…}, vars::Tuple{…}, blockind::SVector{…}, varflags::Int64) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/cost.jl:37
│││││││││││││││││││││││││┌ valuedispatch(lower::StaticInt{…}, upper::StaticInt{…}, val::Int64, fun::NLLSsolver.Bind{…}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/utils.jl:9
││││││││││││││││││││││││││┌ valuedispatch(lower::StaticInt{…}, upper::StaticInt{…}, val::Int64, fun::NLLSsolver.Bind{…}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/utils.jl:9
│││││││││││││││││││││││││││┌ valuedispatch(lower::StaticInt{…}, upper::StaticInt{…}, val::Int64, fun::NLLSsolver.Bind{…}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/utils.jl:5
││││││││││││││││││││││││││││┌ (::NLLSsolver.Bind{typeof(NLLSsolver.gradhesshelper!), Tuple{…}})(args::StaticInt{0}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/utils.jl:20
│││││││││││││││││││││││││││││┌ gradhesshelper!(linsystem::NLLSsolver.LinearSystem{…}, cost::SimpleError2{…}, vars::Tuple{…}, blockind::SVector{…}, varflags::StaticInt{…}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/cost.jl:25
││││││││││││││││││││││││││││││┌ computecost(residual::SimpleError2{2, Float64, SVector{…}, SVector{…}}, vars::Tuple{SVector{…}, SVector{…}}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/residual.jl:44
│││││││││││││││││││││││││││││││┌ computerescost(residual::SimpleError2{2, Float64, SVector{…}, SVector{…}}, kernel::NoRobust, vars::Tuple{Tuple{…}}) @ NLLSsolver /mnt/data/git/NLLSsolver.jl/src/residual.jl:51
││││││││││││││││││││││││││││││││ no matching method found `computeresidual(::SimpleError2{2, Float64, SVector{6, Float64}, SVector{3, Float64}}, ::Tuple{SVector{6, Float64}, SVector{3, Float64}})`: r = computeresidual(tuple(residual::SimpleError2{2, Float64, SVector{6, Float64}, SVector{3, Float64}})::Tuple{SimpleError2{2, Float64, SVector{6, Float64}, SVector{3, Float64}}}, vars::Tuple{Tuple{SVector{6, Float64}, SVector{3, Float64}}}...)
│││││││││││││││││││││││││││││││└────────────────────