Julia callback not invoked by C library. Works manually (in script) but not in function

I’m developing a Julia wrapper for KIM-API called kim_api.jl and encountering a puzzling issue where my Julia callback function isn’t being invoked by the C library during computation.

So KIM-API is supposed to register a callback function as a void *, and call it whenever needed. Here idea was to write that function in Julia and provide pointer to it using ccall from julia. So essentially the call is supposed to look like:
Julia (ccall) → KIM-API C function (julia_fptr) → (*julia_fptr)()

Now,

  1. The callback registration succeeds
  2. C callbacks work fine
  3. The Julia callback works when called manually

but when my module functions try to do the same, nothing happens, no errors, no faults, it just never calls the julia function.

See in the script below, specially these 4 line:

    data_obj_ptr = pointer_from_objref(containers)
	fptr= kim_api.@cast_as_kim_neigh_fptr(kim_api.kim_neighbors_callback)
	set_callback_pointer!(args, kim_api.GetNeighborList, kim_api.c, fptr, data_obj_ptr)
	compute!(model, args)
using kim_api, StaticArrays

positions = [SVector(0.0, 0.0, 0.0), SVector(1.1, 1.0, 0.0)]
cell = [5.43 0.0 0.0; 0.0 5.43 0.0; 0.0 0.0 5.43]
pbc = [true, true, true]
species = ["Si", "Si"]

is_manual = false

if length(ARGS) > 0 && ARGS[1] == "manual"
    is_manual = true
end

if is_manual
	units_in = kim_api.get_lammps_style_units(:metal)
	model, accepted = create_model(kim_api.Numbering(0), units_in...,"SW_StillingerWeber_1985_Si__MO_405512056662_006")
	args = create_compute_arguments(model)
	n_cutoffs, cutoffs, will_not_request_ghost_neigh = get_neighbor_list_pointers(model)
	species_support_map = get_species_map_closure(model)
	containers, all_positions, all_species, contributing, atom_indices = 
						create_kim_neighborlists(positions, cutoffs, species;
						cell=cell, pbc=pbc, 
						will_not_request_ghost_neigh=will_not_request_ghost_neigh)
	particle_species_codes = species_support_map(all_species)
	coords = reduce(hcat, all_positions) # Vector{SVector} to Matrix{Cdouble/Float64}
	n = size(coords, 2)
	n_ref = Ref{Cint}(n)
	set_argument_pointer!(args, kim_api.particleContributing, contributing)
	set_argument_pointer!(args, kim_api.coordinates, coords)
	set_argument_pointer!(args, kim_api.numberOfParticles, n_ref)
	set_argument_pointer!(args, kim_api.particleSpeciesCodes, particle_species_codes)
	energy = Ref{Cdouble}(0.0)
	forces = zeros(3, n)
	set_argument_pointer!(args, kim_api.partialEnergy, energy)
	set_argument_pointer!(args, kim_api.partialForces, forces)
	data_obj_ptr = pointer_from_objref(containers)
	fptr= kim_api.@cast_as_kim_neigh_fptr(kim_api.kim_neighbors_callback)
	set_callback_pointer!(args, kim_api.GetNeighborList, kim_api.c, fptr, data_obj_ptr)
	compute!(model, args)
	println(energy)
else
	model = kim_api.KIMModel("SW_StillingerWeber_1985_Si__MO_405512056662_006")	
	result = model(species, positions, cell, pbc)
	println("Energy: ", result)
end

when I call it as

julia script.jl manual

Everything works fine and I get the energy, but when I run it inside my module closure, I always get energy as zero, and the function output is never printed.

What am I doing wrong? First time writing a Julia package, so please let me know if anything trivial missing. ChatGPT was of no help!

I have narrowed it down to the fact that when I call create_compute_arguments in the parent function, then use it in the closure it fails, but when I create the compute arguments inside the closure it works fine. My guess is for some reason it is getting garbage collected, but I am not sure it how it is supposed to be for julia scoping rules. Can someone comment? I am copy pasting the offending module below:

# highlevel.jl

"""
    KIMModel(model_name::String; units=UNIT_STYLES.metal, 
             neighbor_function=nothing, compute=[:energy, :forces]) -> Function
Create a KIM model function that computes properties for given species and positions.
The function returned will accept species codes and positions, and return requested computed properties.
"""
function KIMModel(model_name::String; 
                  units::Union{Symbol, Tuple{LengthUnit, EnergyUnit, ChargeUnit, TemperatureUnit, TimeUnit}} = :metal,
                  neighbor_function=nothing, #TODO: Handle neighbor function properly
                  compute=[:energy, :forces])
    # Initialize model
    if typeof(units) == Symbol
        units_in = get_lammps_style_units(units)
    elseif typeof(units) == Tuple
        # Ensure units are in correct order
        if length(units) != 5
            error("Units tuple must have exactly 5 elements: (length_unit, energy_unit, charge_unit, temperature_unit, time_unit)")
        end
        sorted_units = []
        for unit_order in [:length, :energy, :charge, :temperature, :time]
            sorted_units = push!(sorted_units, getfield(units, Symbol(unit_order)))
        end
        units_in = (sorted_units...,)
    end
    
    model, accepted = create_model(Numbering(0), #TODO use one based numbering
                                   units_in...,
                                   model_name)
    if !accepted
        error("Units not accepted")
    end
    
    # args = create_compute_arguments(model) # this does not work <----------------------
    if length(compute) == 0
        error("At least one compute argument must be requested: :energy or :forces")
    end
    if length(compute) > 2
        error("Only :energy and :forces can be requested together")
    end

    # TODO: Add support for other compute arguments if needed

    # Set neighbor callback if provided
    n_cutoffs = 0
    cutoffs = Float64[]
    will_not_request_ghost_neigh = Int32[]
    
    if neighbor_function === nothing
        # default neighbor function
        n_cutoffs, cutoffs, will_not_request_ghost_neigh = get_neighbor_list_pointers(model)
        println("Model needs $n_cutoffs neighbor lists with cutoffs: $cutoffs")
    else
        error("TODO: Handle custom neighbor function")
    end

    
    species_support_map = get_species_map_closure(model)

    # Return closure
    return function _compute_model(species, positions, cell, pbc)
        # Create neighbor lists if needed
        if neighbor_function === nothing
            containers, all_positions, all_species, contributing, atom_indices = 
                create_kim_neighborlists(positions, cutoffs, species;
                cell=cell, pbc=pbc, 
                will_not_request_ghost_neigh=will_not_request_ghost_neigh)
            println("Created neighbor lists with $(all_positions) total positions including ghosts. $(containers)")
        else
            error("TODO: Neighbor function is not provided.")
        end
        
        particle_species_codes = species_support_map(all_species)
        # Set pointers
        
        coords = reduce(hcat, all_positions) # Vector{SVector} to Matrix{Cdouble/Float64}
        n = size(coords, 2)

        args = create_compute_arguments(model)  # This works <----------------------
        n_ref = Ref{Cint}(n)
        set_argument_pointer!(args, numberOfParticles, n_ref)
        set_argument_pointer!(args, particleSpeciesCodes, particle_species_codes)
        set_argument_pointer!(args, coordinates, coords)
        set_argument_pointer!(args, particleContributing, contributing)
        energy = Ref{Cdouble}(0.0)
        forces = zeros(3, n)
        
        # Results storage
        results = Dict{Symbol,Any}()
        set_argument_pointer!(args, partialEnergy, energy)
        set_argument_pointer!(args, partialForces, forces)

        # Set neighbor list if provided
        if neighbor_function === nothing
            GC.@preserve containers coords forces energy all_positions particle_species_codes begin
                fptr = @cast_as_kim_neigh_fptr(kim_neighbors_callback)
                data_obj_ptr = pointer_from_objref(containers)                            
                set_callback_pointer!(args, GetNeighborList, c, fptr, data_obj_ptr)
                compute!(model, args)
            end
        else
            error("TODO: Handle custom neighbor function")
        end        
        results[:energy] = energy[]
        results[:forces] = forces
        return results
    end
end

I think you’re right about the garbage collection. If you create a pointer p = pointer(X) and do not reference X anymore, julia will assume it can be GC’ed. If this happens inside a function, julia knows that the X is not used anymore after the function returns (actually after the last use of X in the function). If you do it in a script, every variable is global, and julia can’t know if you’ll use it later, so it’s kept alive.

1 Like

Prior to the move, the variable args (not its object directly) would’ve been captured by the closure, which is a reference that should keep its object safe from GC. You can dump a closure instance to see what got captured and the state of their objects.

julia> dump(let a=1, b=2; b=3; () -> (a,b) end)
#28 (function of type var"#28#29"{Int64})
  a: Int64 1
  b: Core.Box
    contents: Int64 3

Far from being an expert on those subjects, so I might be very wrong, however, if I remember correctly, in case of problems related to interaction of C and Julia with regard to GC, Julia would inform you about that in a form of an error message; but really, please take it with a grin of salt, I’m not sure about that.

Thank you for the comments, I narrowed down the issue to only one line change. It was not about the closure, but issue is that if I create references in between the set_argument_pointer! function it gives the mentioned issue,

i.e.

If I did something like

energy = Ref{Cdouble}(0.0)
forces = zeros(3, n)

set_argument_pointer!(args, numberOfParticles, n_ref)
set_argument_pointer!(args, particleSpeciesCodes, particle_species_codes)
set_argument_pointer!(args, coordinates, coords)
set_argument_pointer!(args, particleContributing, contributing)

It works fine,
but

set_argument_pointer!(args, numberOfParticles, n_ref)
energy = Ref{Cdouble}(0.0)
forces = zeros(3, n)
...

Will give issues. So I am not sure if it is Julia GC. This looks suspiciously like out of bound access issues. Though KIM-API pretty defensively written so I doubt it has any such wrong pointer problems. Will try with address sanitizer again to see if it sees something.

Bizarre, you were doing that before in the script and closure, the latter of which also worked with a local args instead of a captured args. Maybe check the state of args between those calls, see what gets changed?

If you’re only doing one call and it doesn’t allocate too much, you could temporarily GC.enable(false) before a test run and GC.enable(true) right after. If that doesn’t fix a failed run, then the problem occurred without the GC running.

1 Like

So I tried using a debugger, and checked the C++ side of code extensively, it behaves exactly the same in both cases.

I also tired wrapping the entire script in GC.enable(false), and GC.enable(true); no change.

But it is related to julia scoping for sure. Specifically due to the energy reference.

There is a small example script I ran in the “faulty version”:

# using Debugger
using kim_api, StaticArrays

positions = [SVector(0.0, 0.0, 0.0), SVector(1.1, 1.0, 0.0)]
cell = [5.43 0.0 0.0; 0.0 5.43 0.0; 0.0 0.0 5.43]
pbc = [true, true, true]
species = ["Si", "Si"]
model = kim_api.KIMModel("SW_StillingerWeber_1985_Si__MO_405512056662_006")	
result = Dict{Symbol, Any}()
# @enter 
result = model(species, positions, cell, pbc)
println(result)

image

Now when I run the same in with Debugger

@enter result = model(species, positions, cell, pbc)

image

So the model is returning proper values (atleast when run in Debugger), but for some reason the dict only sees 0.0 (default initialization?, as it is not the original Ref value, I did energy = Ref{Cdouble}(999.0), and output was still zero)

Here is the real head scratcher, if I use the energy, ref anywhere after computation I get valid results. i.e. before this line:

            results[:energy] = energy[]

If I do

        println(energy) # works
        results[:energy] = energy[]

or

        results[:energy] = energy # works
        results[:energy] = energy[]

or

        results[:energy] = energy # works
        results[:energy] = results[:energy][]

or

        results[:energy] = energy # not equivalent, but works, I can see correct values now

it all works, but equivalent codes below does not

        tmp = energy # does not work
        results[:energy] = tmp[]

even more puzzling for me: how is problem with energy reference influencing another variable force?

Tested in julia version 1.10.0, 1.11.6

Does println(result) still print zeros after a Debugger @enter result = model(...) run? And please copy and paste text of full printouts instead of cropping screenshots of a fraction of them.

Also try dump(model) to see the structure of the closure. I don’t expect surprises but it might help explain why moving args fixed the issue earlier.

This is indeed strange. I would normally assume from seemingly pointless lines being necessary for valid results that results[:energy] = ... was somehow reordered to run after initialization before computation, but you did energy = Ref{Cdouble}(999.0) and it still turned out as zero instead of 999.0. You could also initialize forces = 999 .* ones(3, n) for a similar zeroing check.

GC.enable(false) didn’t fix it (or I assume you’re doing it for every script experiment) so it’s not a dangling reference caused by GC, which shouldn’t be possible across the lines directly using a variable and Ref anyway. That suggests something is intentionally zeroing out your result.

Bit of an aside for now, but I tried to add your package and failed precompilation because of the missing binary, and I haven’t tried figuring that out. Down the road, you’ll want to make a versioned JLL package containing the binary artifact, which your user-level wrapper package will depend on. That’s how automatic installs usually work, and more importantly a particular version of the wrapper package can specify compatible JLL packages holding binaries. I’ve never done it myself though.

Ah thank you for your comments. Let me try and address them one by one

Does println(result) still print zeros after a Debugger @enter result = model(...) run?

This is how I was running the debugger

...
model = kim_api.KIMModel("SW_StillingerWeber_1985_Si__MO_405512056662_006")	
result = Dict{Symbol, Any}()
@enter result = model(species, positions, cell, pbc)
println(result)

Got result:

julia> include("tmp.jl")
In ##thunk#230() at /home/amit/Projects/COLABFIT/kim_api.jl/tmp.jl:10
  6  pbc = [true, true, true]
  7  species = ["Si", "Si"]
  8  model = kim_api.KIMModel("SW_StillingerWeber_1985_Si__MO_405512056662_006")	
  9  result = Dict{Symbol, Any}()
>10  @enter result = model(species, positions, cell, pbc)
 11  println(result)

About to run: <_compute_model(["Si", "Si"], SVector{3, Float64}[[0.0, 0.0, 0.0], [1.1, 1.0, 0.0]], [5.43 0.0 0.0; 0....>
1|debug> n
In ##thunk#230() at /home/amit/Projects/COLABFIT/kim_api.jl/tmp.jl:10
  6  pbc = [true, true, true]
  7  species = ["Si", "Si"]
  8  model = kim_api.KIMModel("SW_StillingerWeber_1985_Si__MO_405512056662_006")	
  9  result = Dict{Symbol, Any}()
>10  @enter result = model(species, positions, cell, pbc)
 11  println(result)

About to run: <return Dict{Symbol, Any}(:energy => 8.404488781104776, :forces => [-31.39238995260516 31.392389952605...>
1|debug> n
Dict{Symbol, Any}()

Without the result = Dict{Symbol, Any}() I get error saying that variable result is not defined in Main. so I thought this is a debugger limitation, that it cannot properly do assignments.

...
model = kim_api.KIMModel("SW_StillingerWeber_1985_Si__MO_405512056662_006")	
# result = Dict{Symbol, Any}()
@enter result = model(species, positions, cell, pbc)
println(result)

...
About to run: <_compute_model(["Si", "Si"], SVector{3, Float64}[[0.0, 0.0, 0.0], [1.1, 1.0, 0.0]], [5.43 0.0 0.0; 0....>
1|debug> n
In ##thunk#230() at /home/amit/Projects/COLABFIT/kim_api.jl/tmp.jl:10
  6  pbc = [true, true, true]
  7  species = ["Si", "Si"]
  8  model = kim_api.KIMModel("SW_StillingerWeber_1985_Si__MO_405512056662_006")	
  9  # result = Dict{Symbol, Any}()
>10  @enter result = model(species, positions, cell, pbc)
 11  println(result)

About to run: <return Dict{Symbol, Any}(:energy => 8.404488781104776, :forces => [-31.39238995260516 31.392389952605...>
1|debug> n
ERROR: LoadError: UndefVarError: `result` not defined in `Main`
Suggestion: check for spelling errors or missing imports.
Stacktrace:
 [1] top-level scope
   @ ~/Projects/COLABFIT/kim_api.jl/tmp.jl:11
 [2] include(fname::String)
   @ Main ./sysimg.jl:38
 [3] top-level scope
   @ REPL[1]:1
in expression starting at /home/amit/Projects/COLABFIT/kim_api.jl/tmp.jl:11

You could also initialize forces = 999 .* ones(3, n) for a similar zeroing check.

This was bit surprising. I did

        energy = Ref{Cdouble}(999.0)
        forces = ones(3, n) * 999.0

And got result

julia> include("tmp.jl")
Dict{Symbol, Any}(:energy => 0.0, :forces => [999.0 999.0; 999.0 999.0; 999.0 999.0])

So I guess forces are not even being touched.

Down the road, you’ll want to make a versioned JLL package containing the binary artifact

Thank you for trying, I am still figuring out Julia distribution system, we are usually Python + Fortran/C++ group so use conda extensively. Will pack it properly in the future. Let me add better instructions for installing. If you have conda, easiest method to get started is:

# get the code
git clone https://github.com/ipcamit/kim_api.jl

# create conda env and install dependencies, will pull gcc, gxx, gfortran
conda create -n kim-api -c conda-forge kim-api=2.4
conda activate kim-api

# export the required binary for now
export KIM_API_LIB=$CONDA_PREFIX/lib/libkim-api.so

# Download and install model code from openkim
# Note, this will put compiled artifacts in a ~/.kim-api folder
kim-api-collections-management install user  SW_StillingerWeber_1985_Si__MO_405512056662_006

# Now launch julia and run the model
cd kim_api.jl
julia --project=./ 
julia> import Pkg
julia> Pkg.develop(path="./")

And then running the script:

using kim_api, StaticArrays

positions = [SVector(0.0, 0.0, 0.0), SVector(1.1, 1.0, 0.0)]
cell = [5.43 0.0 0.0; 0.0 5.43 0.0; 0.0 0.0 5.43]
pbc = [true, true, true]
species = ["Si", "Si"]

model = kim_api.KIMModel("SW_StillingerWeber_1985_Si__MO_405512056662_006")	
result = model(species, positions, cell, pbc)
println(result)