Structure and performance questions from a Fortran programmer: functions, global variables, array allocations, input files, and structs

I’m a physicist used to using Fortran90 for number crunching and Python for plotting and glue code. I’m learning Julia because of its ease of use, promised efficiency, and excellent math support. That said, there are some habits and methods I’ve learned from Fortran programming that I’m unsure how to translate to Julia, as well as some aspects of Julia whose practical implications I don’t quite understand. So, without further ado, here goes:

  1. Do I need a main() function? Fortran is not like C and thus has no main() function, but I don’t know how far to take the “put everything in a function” recommendation from the manual.
  2. Relatedly, for global variables, is it sufficient to avoid the performance hit if they’re inside a main() function, and/or if they’re declared const or given a fixed type? Fortran is statically typed and compiled in advance, so this was never an issue there.
  3. If I know my array sizes in advance (or can at least derive them from input parameters), do I gain any benefit from pre-allocating/initializing them, e.g., a = zeros(3,3,2)? In Fortran this is mandatory, and even if you’re using allocatable arrays, they still need to be explicitly allocated with a fixed size, type, and dimension before usage, as far as I can tell. I know that the StaticArrays package exists, but that’s recommended for small arrays (< 100 elements) and my array sizes can easily top 1 million elements.
  4. A standard method of modifying a Fortran program without changing the source code and recompiling it is to provide an input file of parameters, which it then uses to adjust behavior and control flow. Is this a good practice in Julia, or is it better to simply modify the script?
  5. Other than conveniently bundling a bunch of parameters to pass to a function, what is the point of structs? I believe they are a feature of more recent Fortran versions, but as they never appeared in any of the Fortran90 code I’ve written or used, I’m completely unfamiliar with them. They seem to get regular mention as a means of improving performance, but I don’t understand how.

I know this is a lot of questions, so don’t feel like you need to answer all of them at once! If you have insight on any of the things I’ve asked above, I’d greatly appreciate it. I want to learn good Julia programming habits from the outset, even if that means letting go of some of my old Fortran ways. Thanks in advance!

  1. No, you don’t need a main() function. It is quite common to write scripts that just call lots of functions written by you or others. The important thing is to ensure that the functions that you are calling do not reference any nonconstant global variables.
  2. Global variables, by definition, are not defined inside of functions, so the first part of your question doesn’t parse for me. But yes, if you declare global constants as const then you won’t hit any performance problems. Similarly, in Julia 1.8 and later you can put type declarations on global parameters (whose values might change) to avoid performance hits.
  3. With some exceptions provided by specialized packages, any array of dimension greater than 1 cannot change its size/extents after it is initially allocated. So this is similar to Fortran. However, the array can be declared in the body of a function (or script) using variables for the extents, and hence can be dynamically sized in that sense. One-dimensional arrays (Vectors) can be resized at will during program execution using push!, pop!, append! and friends. Allocation costs execution time, so yes, pre-allocating arrays and passing them to functions for use is cheaper than allocating them anew each time the function is called repeatedly.
  4. Usually, one writes a function to accept either a parameter list if there are only a few parameters to change, or a struct (or named tuple) that aggregates a large number of parameters, thus not requiring any changes to the source code.
  5. Structs are used for aggregating parameters, but in Julia one of their prime uses is multiple dispatch. There are many examples of the usefulness of multiple dispatch; a web search will turn up plenty.
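
To make points 2 and 3 concrete, here is a minimal sketch. The names `OFFSET`, `gain`, and `scaled_square!` are made up for illustration, and the typed-global syntax assumes Julia 1.8 or later. It shows a constant global, a typed global, and a function that writes into a preallocated array instead of allocating a new one on every call.

```julia
# Hypothetical names throughout; only the pattern matters.

const OFFSET = 1.0        # constant global: neither type nor value changes
gain::Float64 = 2.0       # typed global (Julia 1.8+): value may change, type may not

# In-place kernel: fills a preallocated output array, so repeated calls
# do not allocate a fresh array each time.
function scaled_square!(out::AbstractArray, x::AbstractArray)
    size(out) == size(x) || throw(DimensionMismatch("out and x must have the same size"))
    for i in eachindex(out, x)
        out[i] = gain * x[i]^2 + OFFSET
    end
    return out
end

nx, ny, nz = 3, 3, 2          # extents can come from input parameters
x   = rand(nx, ny, nz)
out = zeros(nx, ny, nz)       # allocate once, much like Fortran's allocate

for _ in 1:10
    scaled_square!(out, x)    # reuses `out` on every iteration
end
```

The trailing `!` in the function name is only a naming convention signalling that the function modifies its argument.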

I like using Julia in much the same way as Fortran: input files, function (@main)(args) ... end to make an entry point, explicit loops, preallocating as much as possible, etc. Once you have working low-level code, you can start refactoring toward higher-level constructs while carefully trying to keep as much performance as possible.
Note that I would give the exact opposite advice to someone coming from Python: I would tell them to use high-level functions, and even globals if they want, and then let them get addicted to performance improvements.

One thing you may have a hard time getting used to is multiple dispatch. It’s really another way of thinking about a problem, but if you take the time, I’m sure you will like it.

Not exactly about Fortran, but could be helpful:

Also should be helpful:

Structs are used to define types. Types are a fairly complex but foundational aspect of Julia; see the Julia manual chapter on types.

Generally, reading the Julia manual as a whole, though time-consuming, may save you a lot of time.

Thank you very much for answering all of my questions! The only thing I’m unclear on is your answer to number 4. For example, to take input parameters in Fortran, I would typically have a file named something like parameters.inp, then the following code:

```fortran
open(10, file="parameters.inp", status="old")
read(10, *) par1
read(10, *) par2
.
.
.
close(10)
```

What’s the recommended way to pass a list of input parameters to a Julia script at a runtime? I understand the idea of passing a struct of parameters inside the script to a function, but presumably that struct would in turn need to be defined somewhere else.

It can be a struct; you can also just manually write a script defining all the parameters; you can put all the parameters inside a Dict and pass it as a set of keyword arguments to a function; you can put the parameter set(s) into an Excel spreadsheet or a CSV file and feed it to your script; you can…

Those are all valid, reasonable choices.
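
As an illustration of the Dict-as-keyword-arguments option, here is a small sketch. The function `run_simulation` and its parameters are hypothetical; the one firm requirement is that the Dict keys are Symbols, since that is what keyword splatting expects.

```julia
# Hypothetical entry point with keyword parameters and default values.
function run_simulation(; nsteps::Int = 100, dt::Float64 = 0.01, label::String = "run")
    total_time = nsteps * dt
    return (label = label, total_time = total_time)
end

# Parameters collected in a Dict; keys must be Symbols to splat as keywords.
params = Dict(:nsteps => 500, :dt => 0.002)

# `label` is not in the Dict, so it falls back to its default value.
result = run_simulation(; params...)
```

Any keyword that is missing from the Dict keeps its default, so a parameter set only needs to list what differs from the defaults.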

It’s been a while (while, actually 60 years) since I did much in Fortran, but I think you will find Julia refreshing for the reasons you gave. I came to Julia from R and Python and was immediately delighted with its ergonomics and how well R and Python scripts port over and get running quickly in their usual procedural mode. Then, it’s a matter of school algebra y = f(x) to do the tuning. Although Julia will happily defer typing I found that it makes great sense for me to declare those as parameters.

Just so I don’t trip over my shoelaces, I’ll take parameters to mean the values a function is defined to accept, and arguments to mean the values provided to the function at runtime. So I think what you are asking is whether you can define a function to take an object composed of one or more sub-objects. And the answer is yes.

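```julia
struct MyStruct
    key::Float64
end

function process(data)
    # Use data["key"] for a Dict, data.key for a NamedTuple or struct,
    # or data[i] for a Vector
    return data
end

value = 1.5

# All of these work:
process(Dict("key" => value))         # Dict
process((key = value,))               # NamedTuple
process(MyStruct(value))              # Struct
process([value, 2value])              # Vector
```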

and you can work from YAML or TOML, which are more analogous to what you are used to

```julia
using YAML

# load_file opens, parses, and closes the file in one call
config = YAML.load_file("parameters.yml")
par1 = config["par1"]
par2 = config["par2"]
```

```julia
using TOML

config = TOML.parsefile("parameters.toml")
par1 = config["par1"]
```

or from the CLI

```julia
# script.jl
par1 = parse(Float64, ARGS[1])
par2 = parse(Int, ARGS[2])

# Call: julia script.jl 1.5 42
```

or use a struct in combination with any of these

```julia
using YAML

struct Params
    par1::Float64
    par2::Int
end

# Option A: hardcoded/interactive definition
params = Params(1.5, 42)

# Option B: from config file
config = YAML.load_file("params.yml")
params = Params(config["par1"], config["par2"])

# Then pass to a function (my_function stands in for your own code)
result = my_function(params)
```

As to multiple dispatch, just think of it for now as a function being able to detect the combination of argument types it receives and treat it appropriately.
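
A toy sketch of that idea, with made-up types and functions (`Circle`, `Rectangle`, `area`, `fits_inside`): the same function name gets several methods, and Julia picks one based on the types of all the arguments. The `fits_inside` checks assume axis-aligned shapes and ignore rotation.

```julia
abstract type Shape end

struct Circle <: Shape
    radius::Float64
end

struct Rectangle <: Shape
    width::Float64
    height::Float64
end

# One function name, one method per argument type
area(c::Circle)    = pi * c.radius^2
area(r::Rectangle) = r.width * r.height

# Dispatch on the combination of both argument types
fits_inside(inner::Circle, outer::Circle) = inner.radius <= outer.radius
fits_inside(inner::Circle, outer::Rectangle) =
    2 * inner.radius <= min(outer.width, outer.height)
fits_inside(inner::Rectangle, outer::Rectangle) =
    inner.width <= outer.width && inner.height <= outer.height

shapes = [Circle(1.0), Rectangle(2.0, 3.0)]
areas  = [area(s) for s in shapes]   # the right method is chosen per element
```

Calling `fits_inside(Rectangle(1.0, 1.0), Circle(2.0))` would raise a `MethodError`, because no method was defined for that particular combination.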

I want to emphasize that encapsulating information in data structures has been fundamental to nearly all programming languages for many decades now, and is a widely accepted principle in computer science. (What you are describing is really a Fortran 77 style, in which there is no composite data structure other than an array.) You should really learn to think about data structures regardless of what language you learn, including Fortran 90, if you want to do any significant amount of modern software development.

For example, a classic computer-science reference is Structure and Interpretation of Computer Programs (SICP) (readable online). In Chapter 2 of SICP, they write:

(Chapter 2, and indeed the whole book, is worth reading in its entirety. There are many other books on programming with similar messages, of course, such as Chapter 5: Adding Structure in How to Design Programs.)

Thank you for the added background on structs. To clarify, a common (albeit not universal) aspect of being a computational physicist is that we’re physicists first and programmers second, so we’re frequently taught the small set of things that work and then little else. Learning Julia is my first attempt at learning how to program, as opposed to simply modifying inherited code and writing my own based on that, which is to say, I know I have a lot to learn.

Thank you very much! I’ll probably work from YAML or TOML for future input files, as that should let me give some more informative structure to the files (as opposed to Fortran-style “here’s some numbers, check the source code to see what they mean”), while still keeping a familiar interface.
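
For what such a self-describing input file might look like, here is a sketch using the TOML standard library. The section and key names (`grid`, `solver`, `nx`, and so on) are invented for illustration, and the file contents are inlined as a string so the example is self-contained; with a real file one would call `TOML.parsefile` instead.

```julia
using TOML

# Contents of a hypothetical parameters.toml, inlined for this sketch.
toml_text = """
# Grid definition
[grid]
nx = 128          # number of cells in x
length = 10.0     # domain length in metres

[solver]
tolerance = 1.0e-8
max_iterations = 500
"""

config = TOML.parse(toml_text)

nx        = config["grid"]["nx"]
dx        = config["grid"]["length"] / nx
tolerance = config["solver"]["tolerance"]

# get(...) supplies a default when a key is absent from the file
verbose   = get(config["solver"], "verbose", false)
```

The comments and section headers live in the input file itself, so the meaning of each number no longer has to be looked up in the source code.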

Oh, I know where you are coming from. (My PhD is in physics.) A friend of mine in computer science liked to joke that “The definition of ‘legacy code’ is any code written by a physicist.”

Glad that works. You might want to think about a little error trap routine to do the type checking on the passed object to return a more informative error message than the default “No method … .” It could tell you, e.g., argument 2 is type Int32 when it should be Int64.
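
A rough sketch of such an error trap, assuming a hypothetical helper `check_types` and a placeholder `my_function`; it compares each argument against the type it is expected to have and reports the first mismatch by position.

```julia
# Hypothetical helper: compare each argument's type with the expected one.
function check_types(args::Tuple, expected::Tuple)
    for (i, (arg, T)) in enumerate(zip(args, expected))
        arg isa T || throw(ArgumentError(
            "argument $i is of type $(typeof(arg)) when it should be $T"))
    end
    return nothing
end

# Placeholder for your own function; it accepts anything and checks by hand.
function my_function(par1, par2)
    check_types((par1, par2), (Float64, Int64))
    return par1 * par2
end

# Capture the message instead of letting the error propagate
msg = try
    my_function(1.5, Int32(42))
catch err
    sprint(showerror, err)
end
```

The trade-off is that the function signature itself stays untyped, so the check happens at run time inside the function rather than through method dispatch.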

I’ll focus on your fifth question, since your others have been well covered in the other responses.

For me, a struct (or type in F9+) provides a high-level view of the problem. Structs do this by aggregating parameters into a single object that is closer to the problem. For example, I can describe a layered earth using a bunch of vectors to hold information about layer thickness, resistivity and other electrical parameters, as I had to in F77. Naturally, every appropriate subroutine call has to carry those vectors.

One reason not to use them was that the F90 standard disallowed allocatable vectors from membership, greatly limiting their usefulness. That changed with the F95 standard.

Structures allow me to think about the layered earth model that has components rather than components that make up a layered earth model.
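
In Julia, that layered earth model might be sketched as follows. The type name, field names, and helper functions are my own invention for illustration, and the model is simplified (every layer has a finite thickness).

```julia
# Hypothetical field names; one entry per layer.
struct LayeredEarth
    thickness::Vector{Float64}      # layer thicknesses in m
    resistivity::Vector{Float64}    # layer resistivities in ohm-m

    # Inner constructor: reject inconsistent models at creation time
    function LayeredEarth(thickness, resistivity)
        length(thickness) == length(resistivity) ||
            throw(DimensionMismatch("need one thickness and one resistivity per layer"))
        return new(thickness, resistivity)
    end
end

nlayers(model::LayeredEarth) = length(model.thickness)

# Transverse resistance: sum over layers of thickness * resistivity
transverse_resistance(model::LayeredEarth) =
    sum(h * rho for (h, rho) in zip(model.thickness, model.resistivity))

model = LayeredEarth([10.0, 25.0, 100.0], [50.0, 5.0, 500.0])
```

Functions now take a single `model` argument rather than a list of parallel vectors, and the consistency check lives in one place.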

If you haven’t been using these in F9+, then perhaps it’s time to reconsider.

This is a side-by-side comparison of simple code for particle simulation in Julia and Fortran. It might be of interest: 2021_FortranCon/benchmark_vs_fortran/README.md at main · m3g/2021_FortranCon · GitHub

Thank you! That’s actually very handy. Could you explain why the example program has a main() function, whereas @PeterSimon’s answer says the main() function is unnecessary?

I usually put runtime configuration into a JSON file. Very easy to read and write.

Switching away from Fortran is very worthwhile. Example: Replacing legacy code with Julia - Julia Community 🟣

main() is just a convenient wrapper. You can just as easily do f(g(h(x))) because functions are first class objects that can be fed to other functions.

I believe Leandro wanted to keep the structure of the two codes as similar as possible. One advantage of using the main function in Julia is that it keeps all the little (local) variables from polluting the global namespace, and from any possible unforeseen “action-at-a-distance” type bugs that might result. It also guarantees that these local variables are just that, local, so that there aren’t any performance gotchas from using global variables inappropriately. However, as I said, it isn’t unusual to see simple scripts that are not inside a function.
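
A minimal sketch of that wrapper pattern, with hypothetical function names: all working variables live inside functions, and the only statement at global scope is the call to `main()`.

```julia
# Hypothetical kernel: `total` and `i` are locals, so the compiler knows their types.
function sum_of_squares(n)
    total = 0.0
    for i in 1:n
        total += i^2
    end
    return total
end

# Thin wrapper: keeps `n` and `result` out of the global namespace.
function main()
    n = 1_000
    result = sum_of_squares(n)
    println("sum of squares up to ", n, " = ", result)
    return result
end

main()
```

The same statements written directly at the top level of a script would also run; the wrapper just keeps the variables local, which avoids the untyped-global slowdown and accidental name clashes.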