Letting a package use external user-created functions

Some comment, which might or not apply to the use case here, but which are more or less what is typical in particle simulations:

Generally when one thinks on changing the type of interaction, (lennard-jones, or gravitational), we define a new function:

lennard_jones(p1::Particle,p2::Particle) = ...
gravitational(p1::Particle,p2::Particle) = ...

Then one would pass one or the other to the function that does the simulation,

sim = run_simulation(p,pot=lennard_jones)
function run_simulation(p::Vector{Particle};pot=default_potential)
  ...
     u += pot(p1,p2)
  ...
end

Futhermore, it is common that each potential energy depends on different parameters, and then you can use this syntax to pass the function receiving the parameters in a closure:

lennard_jones(p1,p2,eps,sig) = ....
sim = run_simulation(p, pot = (p1,p2) -> lennard_jones(p1,p2,eps,sig)

That will give the user complete control on the implementation of the potential energy function independently on how the call to it is implemented in the run_simulation function.

If the user wants now to use the sum of the interactions, he/she would be able to write, for example,

totpot(p1,p2,eps,sig) = lennard_jones(p1,p2,eps,sig) + gravitational(p1,p2)

pass totpot to run_simulation, avoiding the type-instability problems in the interactions and run-time dispatch. If the “types” of interactions are many, that may be even more transparent to the user.

I think that if one goes to the direction of trying to implement a general simulation function that accepts lists of interactions of different “types” and of particles of different “types”, one inevitably will hit type instability problems and run-time dispatch and that will degrade performance. The fastest way to dispatch on multiple types of particles or interactions will be at the end manual splitting of the loops with if isa T1 -elseif..., but even that ends to be highly suboptimal relative to restructuring the data (we have recently discussed this quite extensively here). Thus if you “hard-code” the splitting inside your run_simulation function (with for interaction in interactions..., you are driving the user to implement a suboptimal solution to his/her problem.

If you want to give flexibility on the types of interactions and of particles, I think that a reasonable strategy would be to check which is less fragmented (probably the interactions). For example, if I had thousands of particles of 5 types of particles and 2 types of interactions, the data structure I would use would be to define 5 structs containing each all the data of the particles of that type, such as:

struct NeutralAtoms <: Atoms
  p :: Vector{SVector{3,Float64}} # length = number of particles of type NeutralParticle
  eps :: Vector{Float64}
  sig :: Vector{Float64}
end
struct ChargedAtoms <: Atoms
  p :: Vector{SVector{3,Float64}} # length = number of particles of type NeutralParticle
  eps :: Vector{Float64}
  sig :: Vector{Float64}
  charge :: Vector{Float64}
end

and the functions that compute the interactions would perform all the work for a specific pair of types of particles each:

lennard_jones(atoms1::Atoms,atoms2::Atoms) = compute all interactions of the sets of particles
coulomb(atoms1::ChargedAtoms,atoms2::ChargedAtoms) = compute all interactions 

You still have to loop over the types of particles and types of interactions, but the costly part will be inside the functions that compute each set of interactions over all pairs of particles of the sets with specific types, and these functions will be type-stable. Run-time dispatch can occur but only for the types, not for each particle individually, thus much less frequently.

2 Likes