What is a Julian approach to data structures for a simulation package?

I’m working on my first major Julia project, a physics simulation package. My goal is to make the package as neat, efficient, and extensible as possible by following good Julian conventions, so that others can easily add to it in the future.

The user will be able to specify the calculation parameters and physical system in an input file, then ask for various properties to be calculated. Because these properties are expensive to compute, I want the package to calculate each one only when it is requested, and only once.

As an analogy, if the physical system were a circle, we would calculate and store its area the first time that field is accessed: not when the object is created, and not every time the field is accessed. I have come up with one way to do this by overloading Base.getproperty (see MWE below).

A further complication (which may need a separate post later) is that several properties (a, b, c) can be calculated from the same mathematical object M. Computing M is the expensive part, so I’d like the package to calculate all three as soon as the user asks for any one of a, b, or c.

The current data structure looks like the following, where all capitalised fields are themselves structs with their own fields:

Output  # mutable, highest level struct
    Calculation  # immutable, defined from input file
        Parameters
        Options
        System
    Parameter1  # calculated when requested with output.parameter1 or similar
        anintegervalue
        SubParam1  # calculated when SubParam1 or SubParam2 is requested
        SubParam2  # calculated when SubParam1 or SubParam2 is requested
        ...
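One possible translation of this outline into Julia types, as a sketch only: every type and field name (Parameters, Options, System, SubParams, nsteps, timestep, and so on) is a placeholder, and splitting things into immutable inputs plus a mutable Output whose expensive results start out empty is an assumption based on the description above. It marks "not computed yet" with nothing rather than missing, since the Julia manual describes missing as a missing value in the statistical sense; mechanically, either works.

```julia
# Hypothetical types mirroring the outline above; every name is a placeholder.

# Inputs read from the input file: immutable once created.
struct Parameters
    nsteps::Int
    timestep::Float64
end

struct Options
    verbose::Bool
end

struct System
    positions::Vector{Float64}
end

struct Calculation
    parameters::Parameters
    options::Options
    system::System
end

# Expensive results: immutable values that are computed as a unit.
struct SubParams
    subparam1::Float64
    subparam2::Float64
end

struct Parameter1
    anintegervalue::Int
    subparams::SubParams   # SubParam1 and SubParam2 are always produced together
end

# Mutable top-level container; `nothing` means "not computed yet".
mutable struct Output
    calculation::Calculation
    parameter1::Union{Nothing, Parameter1}
end

# A fresh Output holds only the inputs; results are filled in on demand.
Output(calc::Calculation) = Output(calc, nothing)
```

With this split, the inputs can never change after the input file is read, and the only mutation is replacing a nothing with a finished result.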

My questions are:

  1. What is an appropriate layout for the various structs here? Putting everything in one big nested object seems clunky, but I haven’t yet found anything better.
  2. Should the user get data with circle.area (as in the MWE) or area(circle)? The latter seems more in line with what I’ve seen of Julia so far.
  3. Is there a better way to calculate properties on demand without performing the same calculation twice?

MWE:

mutable struct Circle
    radius::Number
    area::Union{Number, Missing}
    Circle(radius::Number) = new(radius, missing)
end

""" Extends Base.getproperty such that missing data is filled in when accessed """
function Base.getproperty(object::Circle, field::Symbol)
    if ismissing(getfield(object, field))
        setfield!(object, field, getfieldondemand(object, field))
    end
    return getfield(object, field)
end

function getfieldondemand(C::Circle, field::Symbol)
    if field === :area
        return areaofcircle(C)
    end
end

function areaofcircle(C::Circle)
    return pi * C.radius^2    # some long and complicated calculation
end

circle = Circle(1.0)
@info("circle.area = $(getfield(circle, :area))")
circle.area
@info("circle.area = $(getfield(circle, :area))")

Yeah, I would recommend area(circle). The most common convention in Julia is that accessing the fields of a struct (e.g. circle.area) is generally reserved for the private API, while methods (e.g. area(circle)) are used for the public API.

A method-based API might look like:

function area(c::Circle)
  if c.area === missing
    c.area = areaofcircle(c)
  end
  return c.area
end

If you want to compute multiple fields at once, you could do something like:

function area(c::Circle)
  if c.area === missing
    set_area_and_perimeter!(c)
  end
  return c.area
end

where set_area_and_perimeter! sets both c.area and c.perimeter (so Circle also needs a perimeter field). You could then implement a perimeter function that checks the c.perimeter field in the same way. set_area_and_perimeter! will then be called only once, no matter how many times you call area(c) or perimeter(c).
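A self-contained sketch of this pattern, meant for a fresh Julia session since it defines its own Circle. It assumes Circle gains a perimeter field, uses concrete Float64 fields instead of Number, and adds a hypothetical call counter NCALLS purely to show that the shared routine runs once; π * r^2 and 2π * r stand in for the expensive shared step.

```julia
# Circle with two lazily computed fields that are always filled in together.
mutable struct Circle
    radius::Float64
    area::Union{Float64, Missing}
    perimeter::Union{Float64, Missing}
end
Circle(radius::Real) = Circle(radius, missing, missing)

const NCALLS = Ref(0)  # hypothetical counter: how often the shared routine ran

function set_area_and_perimeter!(c::Circle)
    NCALLS[] += 1
    c.area = π * c.radius^2        # stand-in for the expensive shared step
    c.perimeter = 2π * c.radius
    return c
end

function area(c::Circle)
    if c.area === missing
        set_area_and_perimeter!(c)
    end
    return c.area
end

function perimeter(c::Circle)
    if c.perimeter === missing
        set_area_and_perimeter!(c)
    end
    return c.perimeter
end

c = Circle(1.0)
area(c)        # first request: runs set_area_and_perimeter!
perimeter(c)   # already filled in, so nothing is recomputed
NCALLS[]       # 1
```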

If you have multiple fields that work like this, it might get tedious to repeat this pattern of checking a field for missing and then calling a function to populate the value of that field if it’s missing. One option would be to use a higher-order function (a function taking in a function as its argument), something like:

function cached(obj, field_name::Symbol, update_function)
  if getfield(obj, field_name) === missing
    update_function(obj)
  end
  return getfield(obj, field_name)
end

You could use this function like:

julia> function set_area!(c::Circle)
         c.area = π * c.radius^2
       end
set_area! (generic function with 1 method)

julia> area(c::Circle) = cached(c, :area, set_area!)
area (generic function with 1 method)
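The cached helper is not tied to Circle. As a sketch of the a, b, c case from the question, where all three properties come from one expensive object M, several accessors can share one update function. The Properties type, set_abc! and the property_a/property_b/property_c accessors are hypothetical, and the eigenvalues of a symmetric matrix (LinearAlgebra.eigvals) merely stand in for M.

```julia
using LinearAlgebra  # standard library; provides eigvals and Symmetric

# Hypothetical container: a, b and c are all derived from one expensive object M.
mutable struct Properties
    input::Matrix{Float64}
    a::Union{Float64, Missing}
    b::Union{Float64, Missing}
    c::Union{Float64, Missing}
end
Properties(input::AbstractMatrix) = Properties(Matrix{Float64}(input), missing, missing, missing)

# Same helper as above: run update_function only if the field is still missing.
function cached(obj, field_name::Symbol, update_function)
    if getfield(obj, field_name) === missing
        update_function(obj)
    end
    return getfield(obj, field_name)
end

# Compute M once and fill in every property derived from it.
function set_abc!(p::Properties)
    M = eigvals(Symmetric(p.input))   # stand-in for the expensive object M
    p.a = minimum(M)
    p.b = maximum(M)
    p.c = sum(M)
    return p
end

property_a(p::Properties) = cached(p, :a, set_abc!)
property_b(p::Properties) = cached(p, :b, set_abc!)
property_c(p::Properties) = cached(p, :c, set_abc!)

p = Properties([2.0 1.0; 1.0 2.0])
property_a(p)   # computes M once and fills in a, b and c
```

Asking for any one property fills in the other two, so later calls to property_b or property_c are just field reads.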

Another option would be to write a macro, which would let you define functions like this in an even more compact form. For example:

julia> macro cached(T, field_name, update_function)
         quote
           function $(esc(field_name))(obj::$(esc(T)))
             if obj.$(field_name) === missing
               $(esc(update_function))(obj)
             end
             obj.$(field_name)
           end
         end
       end
@cached (macro with 1 method)

The @cached macro simply expands to a function definition like the hand-written area method above, and we can call it like this:

julia> @cached(Circle, area, set_area!)
area (generic function with 1 method)

julia> c = Circle(1.0);

julia> area(c)
3.141592653589793

You could even write a better macro that operates on a function definition, something like:

@cached area(c::Circle) = set_area!(c)

Having a macro operate on something that already looks like a function definition might make it easier for readers of your code to guess what’s going on.
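One way such a macro could be written is sketched below, under a few assumptions: it only handles the short form fname(obj::T) = update_expr, it reuses the function name as the field name, and it only looks at the first argument. It reads the field with getfield, so it bypasses any getproperty overload like the one in the original MWE. The macro and this Circle definition are illustrative, not an existing package's API.

```julia
# Turn `fname(obj::T) = update_expr` into a method that runs update_expr only
# while the field named `fname` is still missing, then returns that field.
macro cached(def)
    if !(def isa Expr && def.head === :(=) &&
         def.args[1] isa Expr && def.args[1].head === :call)
        error("@cached expects a definition like `area(c::Circle) = set_area!(c)`")
    end
    call, update = def.args
    fname = call.args[1]                  # function name, reused as the field name
    arg = call.args[2]                    # first argument, e.g. :(c::Circle)
    obj = (arg isa Expr && arg.head === :(::)) ? arg.args[1] : arg
    field = QuoteNode(fname)
    return esc(quote
        function $fname($arg)
            if getfield($obj, $field) === missing
                $update
            end
            return getfield($obj, $field)
        end
    end)
end

mutable struct Circle
    radius::Float64
    area::Union{Float64, Missing}
end
Circle(radius::Real) = Circle(radius, missing)

set_area!(c::Circle) = (c.area = π * c.radius^2)

@cached area(c::Circle) = set_area!(c)
```

Because the whole expansion is escaped, the generated method is defined in the caller's module under the caller's own names, which is what a definition-style macro needs here.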

You might also find the various memoization packages in Julia, like Memoize.jl (JuliaCollections/Memoize.jl, an @memoize macro for Julia) and Memoization.jl (marius311/Memoization.jl, for memoizing any function, closure, or callable object), useful for inspiration, since what you are doing is essentially memoization.
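For contrast with the per-object fields above, a minimal memoization sketch using only Base: results are cached in a global Dict keyed on the argument rather than in a field of the object. AREA_CACHE, expensive_area and memo_area are hypothetical names, and the sketch does not use either package.

```julia
# Hypothetical global cache: one entry per radius that has been requested.
const AREA_CACHE = Dict{Float64, Float64}()

# Stand-in for a long and complicated calculation.
expensive_area(radius::Float64) = π * radius^2

# get! returns the stored value for `radius` if present; otherwise it calls the
# zero-argument function, stores the result under `radius`, and returns it.
memo_area(radius::Float64) = get!(() -> expensive_area(radius), AREA_CACHE, radius)

memo_area(1.0)       # computed and stored
memo_area(1.0)       # looked up in AREA_CACHE
length(AREA_CACHE)   # 1
```

The design difference: a global cache like this is shared by every caller and keeps growing until it is emptied, whereas a cached field lives and dies with the object that owns it.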


Thanks for the quick and detailed response! I will implement things using a more method-based approach.

Defining macros looks like an incredibly powerful feature of the language that I haven’t explored yet, and I appreciate your clear introduction to how they could help solve this problem. Knowing that what I’m trying to do has a specific name will also be very helpful as I continue to work on this.

Did you have any thoughts about the overall data structure? Maybe it would be best to set things up as I laid them out in my original post and then profile to see where I can make things more efficient.