Structure and performance questions from a Fortran programmer: functions, global variables, array allocations, input files, and structs

Just a side note, that’s not an example of first-class functions, since no function there is being fed a function as it’s argument (unless g or h happen to return a function). Rather, g is being fed the output of h(x), and the output of g(h(x)) is being fed into f.

Things like map(sin, [1, 2, 3]) would be an example of a first class function.

I was just trying to help OP with my own interrogation about best way of reading a parameter file, so I asked to a well known « expert » (I will not cite its name, see last part of this post)

In Julia, reading a configuration file vs a parameter (input) file, recommended packages?

I got a rather good and clear (IMHO) answer (about a page of text, no code), that I summarized in 15 lines.
I now re-compress it here in one line

For Julia specifically, TOML is almost always the right call unless you’re interfacing with an external system that uses YAML.​​​​​​​​​​​​​​​​

Aside, I got my post flagged as AI

Your post was flagged as **spammy or AI** : the community feels it is either an advertisement or the verbose and relatively content-free output of a generative automated tool.

I am quite surprised to be flagged « spammy/AI », as there is no code in my post, rather a factual analysis. I agree I did not perform myself the analysis, but I give the question and the source of the answer to be reproducible. Is that forbidden by community rules? How to convey information in an acceptable way for Julia community policy ?

Yes, see the discourse guidelines:

Don’t post generative AI outputs (but direct human language translation and minor editing is ok).

See also the recent discussion here: Non-native English speakers and the use of language assistance

Not a direct answer to your questions, but I think it’s worth to mention the package DrWatson.jl
It is very useful to manage simulations and the results. Other advantage, it imposes very little on the user and allows for different types of workflows.

One useful example of a struct that you will understand is Complex{T}. (Do I vaguely recall that Fortran has complex numbers natively? So does Julia in the sense that they’re in Base, but it still has to define their behavior somewhere. If you want a “non-native” example, neither has native support for Quaternions.jl)

A Complex{T} (where T <: Real, such as Int or Float64) is, data-wise, a pair of real numbers. But, unlike any generic pair of real numbers, they are endowed with useful operations specific to them. Existing functions such as abs and * are overloaded to have appropriate behavior for Complex. New functions are defined with new behavior such as conj (which is a bad example because it’s actually defined for Real, although there it’s a no-op – in fact, virtually every operation on Complex is also defined on Real so I can’t actually provide examples for this case).

It’s much easier to write (and read and understand)

x = Complex(1,2)
y = Complex(3,4)
z = x * conj(y)

than to manually inline the whole operation using primitive types

xr = 1
xi = 2
yi = 3
yr = 4
zr = xr * yr  - xi * -yi
zi = xr * -yi + xi * yr

Another useful feature of Complex being a struct is that you don’t need to
convert a vector of complex values into vector with interspersed real and complex
parts:


julia> x=[1.0+2im, 3+4im,5+6im]
3-element Vector{ComplexF64}:
 1.0 + 2.0im
 3.0 + 4.0im
 5.0 + 6.0im

julia>x_real = reinterpret(Float64,x)
6-element reinterpret(Float64, ::Vector{ComplexF64}):
 1.0
 2.0
 3.0
 4.0
 5.0
 6.0

This does not allocate new memory, it just creates another pointer to the same block of data, but annotated “treat this as a real vector”. The original vector x remains intact.

Edit: x and x_real actually refer to the same data in memory:

julia> x[1]=conj(x[1]);
julia> x_real
6-element reinterpret(Float64, ::Vector{ComplexF64}):
  1.0
 -2.0
  3.0
  4.0
  5.0
  6.0
julia> x_real[3]=-x_real[3];
julia> x
3-element Vector{ComplexF64}:
  1.0 - 2.0im
 -3.0 + 4.0im
  5.0 + 6.0im

Just a comment about structs and parameters. I do have code with about 30 parameters, i.e. sizes, values of various constants, algorithm choices etc. Rather than pass them down through every function which may or may not need them, there’s something in julia called “scoped values”. They work as follows.

Define a struct containing the parameters, (the @kwdef makes it possible to supply defaults):

@kwdef struct Parameters
    vectorsize::Int = 128
    zilch::Float64 = sqrt(pi)
    algorithm::Int = 23
    ...
end

It’s important that every parameter has a concrete type, otherwise access will be slow.

Then some globals for frequently used parameter sets, and a ScopedValue:

const defaultparams = Parameters()
const alg17params = Parameters(algorithm=17)
const longvecparams = Parameters(vectorsize=1024, algorithm=8)
const peterstrick = Parameters(peterstrick=true)


using Base.ScopedValues
const parameters = ScopedValue(defaultparams)

Now, here’s the trick with a ScopedValue. It can be changed on the run, and picked up from anywhere:

function myweirdmul(x, y)
    par = parameters[]
    ...
    if par.algorithm == 4
        ...
    elseif par.algorithm > 17
        ...
    end
end

Now, say my main function calls deep into something, and deep down there myweirdmul is called, and many other functions accessing my global parameters. I can choose which parameter set to use for the entire run by calling main as follows:

@with parameters => alg17params main()
@with parameters => longvecparams main()

or I may setup my own parameters, either via a file as suggested above, or programmatically:

mypar = Parameters(xyzdiffusion=true, stepsize=1e-5, outputfile="FOR008.DAT")
@with parameters => mypar main()

This way of handling parameters saves you from passing them down deep into the call chain.
It’s possible to have multiple scoped values, and set them individually with the @with clause.

The point is it doesn’t matter that much how you collect your inputs, you are free to use whatever method you find most ergonomic. I most often define a main() function that prompts me to enter inputs in the terminal and select input files / output folders using NativeFileDialog.jl.

However you get your input data, as long as you pass the inputs to your main compute functions (either directly or pre-processed into struct containers), they will still be fast. “Avoid non-constant global variables” really just means make sure you pass them as arguments to your functions instead of relying on inheriting their values from the global scope.

It’s for lumping together things that belong together, and to define new operations on collections of simple types. The Complex datatype is mentioned above. Here’s another one, the Dual numbers which are used for automatic differentiation:

#=
A Dual number is a real number and an epsilon part, i.e. like a + b*eps,
where the eps is an "infinitesimal" with the property eps^2 = 0
It's used for differentiation according to the rule:

   f'(x) = (f(x + eps) - f(x))/eps

This formula can be rewritten as

   f(x + eps) = f(x) + f'(x) * eps

Thus, we need to make rules for evaluating functions at values of the form
a + b*eps. Since functions are either simple arithmetic or elementary, we
just define simple arithmetic on our new Dual numbers, and the elementary functions which
are not written in julia.

The following is the idea, a more complete implementation is in the package ForwardDiff.jl.
For gradient computations there are better solutions at https://github.com/juliadiff, and in
various packages typically associated with machine learning.
=#

struct Dual <: Number
    re::Float64  # real part
    eps::Float64 # infinitesimal part
end

# Here are the elementary operations (remember, eps^2 = 0):
Base.:(+)(a::Dual, b::Dual) = Dual(a.re + b.re, a.eps + b.eps)
Base.:(-)(a::Dual, b::Dual) = Dual(a.re - b.re, a.eps - b.eps)
Base.:(*)(a::Dual, b::Dual) = Dual(a.re * b.re, a.re*b.eps + a.eps*b.re)
Base.:(/)(a::Dual, b::Dual) = Dual(a.re / b.re, (a.eps*b.re - a.re*b.eps)/b.re^2)
Base.:(-)(a::Dual) = Dual(-a.re, -a.eps)

# If we mix a Dual and an ordinary Real, e.g. with multiplication, the Real should be promoted to Dual
Base.promote_rule(::Type{Dual}, ::Type{<:Real}) = Dual
# Conversion of a Real to a Dual is a Dual with zero eps part
Base.convert(::Type{Dual}, a::Real) = Dual(a, 0.0)

# Some elementary functions, a more complete set is in the package ChainRules.jl
Base.sin(x::Dual) = Dual(sin(x.re), cos(x.re)*x.eps)
Base.exp(x::Dual) = Dual(exp(x.re), exp(x.re)*x.eps)
Base.log(x::Dual) = Dual(log(x.re), x.eps/x.re)

# Compute function and its derivative at x by computing f(x + eps)
function valanddiff(f, x::Real)
    val = f(Dual(x, 1.0))
    return (val.re, val.eps)
end

# Here's my function of the day, we can differentiate any real function in this way
f(x) = 2x^2 * exp(sin(x)^2)

# compute its value and derivative at x=0.3:
julia> value, deriv = valanddiff(f, 0.3)
(0.1964266429789332, 1.4204217787251963)

As a flip side to the aforementioned benefits of Julia structs (or other kinds of named composite types in other languages), I want to caution against defining structs when there is no clear benefit, especially since tutorials tend to make up composite types for illustrative purposes. A concept with components does not need to be represented by a struct with similar components; a car has 4 wheels, but a vehicle registration program probably does not need a Car struct with 4 Wheel fields. Some composite data don’t warrant a struct with distinct operations; overall complex arithmetic is faster in rectangular form, so there isn’t a struct or arithmetic for polar form, just abs and angle when needed for separately treated components. In some contexts like entity-component systems, some composites (entities) aren’t represented by any type. The decision is easy in most cases, and it’s easily reversible in most of the rest. If you’re accustomed to a programming style without structs and don’t have any obvious refactors, you could bundle and unbundle input parameters with tuples, named tuples, or dictionaries instead of structs.

I’ll also caution against defining primitive types in general. The same logic applies; if your program is based on existing primitive types and their operations e.g. 32-bit floats, there’s no point in making new types to represent concepts like heights and widths. In many cases, functions provide enough context to not mix data up e.g. area(height::Float32, width::Float32). When we really do need a new type for distinct operations, a concept without multiple components is almost always better as a 1-field struct, so we can base algorithms on the primitive types and operations supported by conventional compilers and hardware. New primitive types are only justified for exotic low-level data formats.

That’s how I’d probably read parameter from a TOML file into my script. I wouldn’t put unrelated values into a struct. Note some parameter are not provided, in which case the default values are used, and some provided parameter are not used anywhere and ignored.

#####
# the functions are in separate file(s) and part of a package MyPath #

function acceleration(; g=9.81, p=1e5, m=1, s=1e-4, kwargs...)
    a = ((p * s)/m) - g
    return a
end

function path(a; v₀=0, t=10, kwargs...)
    return v₀ * t + a * t^2
end

function to_symb(d::Dict{String, Any})
    return Dict(Symbol(k) => v for (k, v) in d)
end

#####

# using MyPath

# for our example we define parameter directly inside the skript instead of reading them from a file #
# d = TOML.tryparsefile("datapath/parameter.toml")

d = Dict(
    "p" => 3.5e5,
    "m" => 1.6,
    "s" => 1.3e-4,
    "v₀" => 3.15,
    "experiment" => "yesterday"
)

d = to_symb(d)

a = acceleration(; d...)
p = path(a; d...)

@show p
;

For a physical calculation I’d in any case use Unitful values. That’s a different story, but actually I did that just to check myself:

a = acceleration(; g=9.81u"m/s^2", p=1e5u"Pa", m=1u"kg", s=1e-4u"m^2")
p = path(a; v₀=0.1u"m/s", t=10u"s")
@show a, p

# (a, p) = (0.1899999999999995 m s⁻², 19.99999999999995 m)

@ducksoverip I’m in the same boat as you, except that I started a few years earlier than you. So, my experience might be of some help to you. Instead of answering all your questions, I’ll answer the easiest one (to me) first:

First of all, the struct is less “necessary” in Julia than in most other languages including F90, C, C++, Java, etc.

In a fully object oriented programming style, the struct (or “class”) is essential to bundle multiple variables to treat them as a single object. In a not fully object oriented style, bundling multiple variables can be useful, for example, if you want to handle a point in a 3D space as a single object:

type Point
    real:: x, y, z
end type Point
type(Point):: p
! use p.x, p.y, p.z

In other languages than Julia, you may also want to bundle “a bunch of parameters to pass to a function”, as you say, but in Julia, the NamedTuple is more flexible and easier to use:

function myfunc(; x, y, z)
   # . . . use x, y, z and produce a, b, c
   a = . . . 
   b = . . .
   c = . . . 
   return (; a, b, c)
end

# call myfunc with an explicit bundle
pars = (; x = 3.14, y = 2.0, z = -999.9) # NamedTuple
ret = myfunc(; pars...) # "expand" it as arguments
# . . . use ret.a, ret.b, ret.c

# call myfunc with implicit NamedTuple
ret = myfunc(; x = 3.14, y = 2.0, z = -999.9)
# . . . use ret.a, ret.b, ret.c

# Same, except the returned NamedTuple is expanded:
(; a, b, c) = myfunc(; x = 3.14, y = 2.0, z = -999.9)
# . . . use a, b, c

NamedTuple is a light-weight mechanism to create a struct-like object. It can be used like a struct: println(pars.x), for example. Yet, unlike the real struct, it doesn’t need a formal definition such as type Point; real:: x, y, z; end type Point.

Moreover, you can choose between expanding it or not doing so each time you use the function: You don’t force the user of your function to formally create a struct.

So, I recommend using structs only where you would use structs in Fortran. In other situations, you can just use NamedTuples.

@ducksoverip I switched from F90 to Julia several years ago and has had the same problem as yours:

You can basically use the same strategy in Julia. The largest difference you may notice would be that in Julia, often, functions are compiled when it’s called for the first time. So,

# someglobal a global variable whose value is determined from an input file.
if someglobal == 1
   func1()
else
   func2()
end

may take some more run time than the Fortran version.

In some other cases, as you may already know, global variables may hit the run-time performance.

Otherwise, you don’t have to worry too much. You can just program as you do in Fortran.


Having said that, dependency like this on global variables would become quite messy in any language. As/if your program grows, managing changes tend to become more and more complicated in the presence of global variables.

If at all possible, it would be much better to make explicit the dependency on the contents of the parameter file:

pars = read_pars(parmeter_file)
func1(; x = pars.x, y = pars.y)
func2(; zz = pars.zz, . . . )
if pars.p == 1
   # . . .
else
   # . . . 
end

whether it’s Julia or Fortran or any language.

My $0.02 on parameter files: if there is no need of interoperability with other languages leading to the need to use toml, yaml, json, python (yes, python could do here as well) or the like, it IMHO can make sense to use just Julia code in input files. This would avoid another potential two language problem and allow for straightforward use of units etc.
It could go like this:

main.jl:

using DynamicQuantities

Base.@kwdef struct Frobnicator
    a::Int=9
    b::Int=10
    frob::Bool=true
end

function readparams(paramfile)
    return include(paramfile)
end

function run(params)
    println(params)
end

function @main(ARGS)
    return run(readparams(ARGS[1]))
end

input.jl

(
    x = 3u"cm",
    y = 5,
    z = 6,
    f = Frobnicator(frob = false),
)

Note, that we just use the return mechanism of input instead of trying to access variables defined in the input file directly. Here, we return a tuple containing a struct, we also could use a struct like Frobnicator for return values.

I’d like to add an additional condition that all involved Julia code is strictly written or generated in advance by yourself or by very trusted individuals. That includes the Julia module that include is evaluating into. Otherwise, include allows eval injection, which is dangerous in every dynamic language that has it, and unintended execution of buggy mid-development methods.

If the risk of arbitrary code execution should be avoided, then limited parsing of data formats is the way to go.