Reading input parameters from a text file

Hello,

I’m writing a simple simulation program, and I want the input parameters to be read from a text file at runtime. My strategy is to read all of the parameters into a dictionary, which works, but feels very clumsy. I’m curious to know if anyone else has a better solution.

The text file containing the inputs would look something like this:

# comments are indicated by '#'
num1 = 10
num2 = 4.6  
num3 = 1e20
bool1 = true
bool2 = F   # interpreted as false
option1 = string1

These parameters are parsed and stored in a dictionary of the form:

params = Dict(
    "num1"     =>  10,
    "num2"     =>  4.6,
    "num3"     =>  1e20,
    "bool1"    =>  true,
    "bool2"    =>  false,
    "option1"  =>  "string1"
)

The difficulty arises when I go to use these parameters, because I need to reference the dictionary every time I need one of these options. A simple calculation using the above values could look like:

params["num1"] * params["num2"] * params["num3"]

Which is messy to say the least.
In the actual case, my I/O code is contained in a module separate from the code that uses the simulation parameters.

Thanks in advance,
Patrick

PS: For anyone who may be interested, here is the full parsing function I’ve written:

"""
    parseinput(filename, dsettings)

Parses parameters from a text file and stores them as a dictionary.

Opens the file specified by `filename`, which must include the path.
Parameters must be setting-value pairs delimited by `=`.
Blank lines are ignored, as is any text following `#`.
Whitespace does not matter, nor does text case.  
Booleans can be entered as "true/false", "T/F", or "1/0".  

A dictionary `(dsettings)` containing all parameters must be provided. Another
dictionary is returned containing those same parameters, with values different
from the default only if a matching identifier was found in the input file.
"""
function parseinput(filename::AbstractString, dsettings::Dict=defaults)
    usettings = copy(dsettings)
    for line in eachline(filename)
        line = lowercase(strip(line))
        if isempty(line) || line[1] == '#'
            continue
        end
        setting, value = split(line, ['=','#'], keepempty=false)
        setting = strip(setting)
        value = strip(value)
        if setting in keys(usettings)
            dtype = typeof(dsettings[setting])
            if dtype == Bool
                # `value` is a string here, so compare against strings, not Chars or Ints
                if value == "false" || value == "f" || value == "0"
                    value = false
                elseif value == "true" || value == "t" || value == "1"
                    value = true
                else
                    error(string("Unable to parse value entered for ", setting))
                end
            elseif dtype <: Number
                value = tryparse(dtype, value)
                if isnothing(value)
                    error(string("Unable to parse value entered for ", setting))
                end
            elseif dtype <: AbstractArray
                value = split(value, [',',' '], keepempty=false)
                value = tryparse.(Float64, value)
            end
            usettings[setting] = value
        end
    end
    return usettings
end

I don’t have any advice on how to store or parse parameters from text files, but if the number of parameters you need to read isn’t too large, it may be better to store them in either a custom struct (if only the values of the parameters change, and not the set of parameters itself) or a NamedTuple. This would entail writing either:

# Struct
@kwdef struct Params
    num1::Int = 10
    num2::Float64 = 14.6
    num3::Float64 =  1e20
    bool1::Bool = true
    bool2::Bool =  false
    option1::String = "string1"
end

params = Params() # Default values
params = Params(num1 = 3) # Change one of the values 

# Named Tuple 
params = (;
    num1 = 10,
    num2 = 14.6,
    num3 = 1e20,
    bool1 = true,
    bool2 = false,
    option1 = "string1"
)

If you do this, you can use destructuring by name inside functions, which can be handy for carrying out computations with the parameters:

function my_function(x, params)
    (; num1, num2, num3) = params 
    x + num1*num2*num3
end

This may also yield some performance benefits as the Dict you have above carries a bunch of different types, which means Julia will not typically be able to infer types when using it.
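To make the type-inference point concrete, here is a minimal sketch comparing the two containers. The names `params_dict`, `params_nt`, `scale_dict`, and `scale_nt` are made up for illustration, and only a subset of the parameters is shown:

```julia
# A Dict with mixed value types falls back to the catch-all value type `Any`.
params_dict = Dict("num1" => 10, "num2" => 4.6, "option1" => "string1")

# A NamedTuple records one concrete type per field.
params_nt = (; num1 = 10, num2 = 4.6, option1 = "string1")

dict_value_type = valtype(params_dict)
nt_field_types = fieldtypes(typeof(params_nt))

# Same computation with both containers; only the second one lets the
# compiler know the types of num1 and num2 ahead of time.
scale_dict(x, p) = x * p["num1"] * p["num2"]
scale_nt(x, p) = x * p.num1 * p.num2
```

Running `@code_warntype scale_dict(2.0, params_dict)` and `@code_warntype scale_nt(2.0, params_nt)` in the REPL should show the difference in what is inferred.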

@JonasWickman, those are interesting ideas, thanks.

The number of parameters I have is ~20. That number is fixed according to the design of the simulation, but the value of each parameter may differ with each run. That is why I have chosen to read those values from file - so that I can easily change the configuration between runs.

Any given parameter may be used multiple times in different contexts. I wonder if it would be best just to pass the Dict/struct/tuple around to all my functions and destructure as you have described. I was actually unfamiliar with that destructuring syntax you described, that could be very helpful for a lot of reasons.

Although it’s not clear to me how the custom struct and the named tuple differ under the hood, or why I would choose one over the other.

This reminds me of a question that Trixi.jl, which is also a simulation package, faced several years ago. In the early days of Trixi.jl, input parameters were stored in .toml files. However, this turned out to be unwieldy, error-prone, and not flexible enough. Think, e.g., of the case where an input “parameter” is a function, such as an initial condition. It’s much easier to provide this as a Julia function. So a question you could ask yourself is: do I really need to rely on text files as input files, or could I use Julia scripts as the main entry point to my program? If your main entry point is a Julia script, you already have access to num1, num2, etc.

This does not mean that structuring the input parameters into logical entities stops being helpful. It could still make sense to bundle them into a struct or similar. Depending on your application, it might also make sense to bundle different sets of parameters together rather than putting every parameter into one data structure. For instance, some parameters could be physical parameters, which would make sense to bundle together, while others could be solver-related, and so on. This could additionally help you organize the parameters. Once you have one or more data structures holding your parameters, I agree with @JonasWickman that destructuring syntax can be helpful for cleaner computations.
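As a rough sketch of what such a grouping could look like (all struct, field, and function names here are invented for illustration and are not taken from Trixi.jl):

```julia
# Hypothetical grouping: physical parameters in one struct, solver settings in another.
Base.@kwdef struct PhysicalParams
    decay_rate::Float64 = 0.5
end

Base.@kwdef struct SolverParams
    dt::Float64 = 0.01
    nsteps::Int = 100
end

# The initial condition is passed as an ordinary Julia function,
# which a plain text file could not express directly.
function run_simulation(initial_condition, physics::PhysicalParams, solver::SolverParams)
    (; decay_rate) = physics
    (; dt, nsteps) = solver
    u = initial_condition(0.0)
    for _ in 1:nsteps
        u -= dt * decay_rate * u   # explicit Euler step for u' = -decay_rate * u
    end
    return u
end

# A "parameter file" is then just a short script calling the function.
result = run_simulation(x -> 1.0, PhysicalParams(), SolverParams(nsteps = 10))
```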

I write a lot of simulation code, and I always use yaml files as input for the simulation parameters. They are easy to use in your script (just parse them using the package YAML.jl), and easy to edit manually (and you can include comments, which is not so easy if you use toml files).

And with three lines of code you can convert your .yaml file into a nested struct:

    data = YAML.load_file(filename)
    wind_data = data["wind"]
    wind = convertdict(Wind, wind_data)

OK, this assumes that a struct Wind is defined that matches the structure of your yaml file. But you can create such a struct using AI from your yaml file.

The nice things about structs are:

  • they are much faster than dicts
  • you can use dot completion (type wind. <TAB> and you see the elements of your struct)

Perhaps you can use something like JLD2.jl's @load?

julia> using JLD2

julia> begin
           num1 = 10
           num2 = 4.6
           num3 = 1e20
           bool1 = true
           bool2 = false
           option1 = "string1"
       end
"string1"

julia> @save "jld2file.jld2" num1 num2 num3 bool1 bool2 option1  # or JLD2.jldsave("jld2file.jld2"; num1, num2, num3, bool1, bool2, option1)

In another REPL:

julia> num1
ERROR: UndefVarError: `num1` not defined in `Main`
Suggestion: check for spelling errors or missing imports.

julia> using JLD2

julia> @load "jld2file.jld2"
6-element Vector{Symbol}:
 :num1
 :num2
 :num3
 :bool1
 :bool2
 :option1

julia> num1
10

I’m not sure if you can make a jld2 file human-readable/writeable, but you could create a similar macro for your text format.

My advice:

  1. Use a common text based file format. YAML, TOML, JSON, and XML are all valid options. I usually prefer YAML, but there are pros and cons with all of them. Choose whichever works best for you.
  2. Reading into a Dict, or nested Dicts if you have a hierarchical structure, is standard routine and easy to do with at least the first three formats.
  3. Internally converting to a struct is usually a good idea if you have a sufficiently stable set of parameters, both for destructuring features and type stability. If you need to be more dynamic, wrapping the Dict in a type with dot overloading (getproperty, setproperty!) can be useful to get access to destructuring.
  4. If you don’t use an internal struct, use type assertions or function barriers to avoid type instability.
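Points 3 and 4 could be sketched roughly as follows. `ParamDict` and `growth` are hypothetical names, and this is only one possible way to wrap a Dict, not an established API:

```julia
# Hypothetical wrapper giving a Dict dot access (and therefore destructuring by name).
struct ParamDict
    data::Dict{Symbol,Any}
end

# Use getfield internally so that the overloads below do not call themselves.
Base.getproperty(p::ParamDict, name::Symbol) = getfield(p, :data)[name]
Base.setproperty!(p::ParamDict, name::Symbol, value) = (getfield(p, :data)[name] = value)
Base.propertynames(p::ParamDict) = Tuple(keys(getfield(p, :data)))

function growth(x, p::ParamDict)
    (; num1, num2) = p          # works through getproperty
    # The values come out as `Any`, so assert their types once up front.
    n1 = num1::Int
    n2 = num2::Float64
    return x + n1 * n2
end

p = ParamDict(Dict{Symbol,Any}(:num1 => 10, :num2 => 4.6))
y = growth(1.0, p)
p.num1 = 3                      # still dynamic: entries can be changed or added
```

The type assertions play the role of the function barrier here: everything after them is compiled for concrete types, at the cost of a runtime check.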

In some sense, you want a collection of objects from a different context to be used in the simulation context. Well, that’s a module. So you could use something like

module Parameters
export num1, num2, num3, bool1, bool2, option1

num1 = 10
num2 = 4.6  
num3 = 1e20
bool1 = true
bool2 = false
option1 = "string1"

end

Then in your simulation just use

include("path/to/Parameters.jl"); using .Parameters

num1 * num2 * num3
# ...

then you can use num1, num2, etc. without needing to getproperty from a struct or getindex from a dictionary. Additionally, as @JoshuaLampert pointed out, you could also include julia functions in there as well.

@GunnarFarneback is spot on - go with TOML or YAML for readability.

I’ll just add (in relation to point 3) that StructUtils.jl makes it really easy to define structs to store your parameters and to convert between Dicts and your own structs.

This enables validating your parameters before running your main code, plus it makes accessing your parameters type stable.

Just to mention one more option complementing the many great suggestions above (more choice is not always better, but I’ll say it anyways :sweat_smile: )

There is also Configurations.jl, which serves as a tool for parsing structs from text files via dictionaries. The actual dictionaries can be read from YAML, TOML, etc. (I would not recommend writing your own parser unless you have a clearly defined and limited set of possible inputs).

The advantage, similar to StructUtils.jl, is that you can define the struct as an “interface” between your code and the saved file. Within the code you can rely on the correct field types, and when reading the file you only have to do the parsing once to check that the file is actually meaningful (it’s usually better to “parse and not validate”). Additionally, there are from_dict methods you can overload to parse the raw dict entries into more complicated Julia objects. One drawback of Configurations.jl in my experience, though, is that debugging and constructing certain (more complex) structs can be unintuitive, mostly due to the lots of generated functions being used internally.

Thanks for the replies everyone, YAML seems like a great approach. After staring at the docs for a minute, I do have one remaining question - how to handle mismatch between my internal struct and the YAML file.

I originally took the dictionary approach because it was easy to iterate over the input file searching for keys, which I could compare to the dictionary of default values. This was also very convenient for the user, because only those values which differed from the default needed to be present in the input file.

With the YAML approach, I am presented with data (as a dictionary? It’s actually not clear to me from the docs). If the data from file exactly match the form of my struct, then there is no issue. But I’m unsure how to handle the case where my input file does not exactly match. I initially assumed there would be a straightforward way to iterate over structs, so I could compare fields, but it turns out that is not at all the case.

With Configurations.jl you can define the structs directly with defaults. Then the dict (and in turn the YAML file) only have to contain the struct fields that should deviate from the defaults. The same should be possible without much hassle by using Base.@kwdef.
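A minimal sketch of the Base.@kwdef route, without Configurations.jl: the TOML standard library stands in for the file reader here because it ships with Julia, and any parser that returns a dictionary with string keys should slot in the same way. `SimParams` and `from_dict` are names invented for this example:

```julia
using TOML  # standard library, used only as a stand-in for the file reader

# Defaults live in the struct definition.
Base.@kwdef struct SimParams
    num1::Int = 10
    num2::Float64 = 4.6
    bool1::Bool = true
    option1::String = "string1"
end

# Hypothetical helper: build a struct from a dictionary of overrides.
function from_dict(::Type{T}, d::AbstractDict) where {T}
    kwargs = Dict{Symbol,Any}(Symbol(k) => v for (k, v) in d)
    # Reject keys that are not fields, so typos in the input file are caught early.
    unknown = [k for k in keys(kwargs) if k ∉ fieldnames(T)]
    isempty(unknown) || error("unknown parameter(s): ", join(unknown, ", "))
    return T(; kwargs...)
end

# The input only lists the values that differ from the defaults.
user_input = TOML.parse("""
num2 = 2.5
bool1 = false
""")

params = from_dict(SimParams, user_input)

# Structs can be iterated by field name, e.g. to echo the settings actually used.
for name in fieldnames(SimParams)
    println(name, " = ", getfield(params, name))
end
```

Fields missing from the input keep their defaults, and a value of the wrong type fails in the constructor rather than somewhere deep in the simulation.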

I’d usually use an XLSX spreadsheet with headers, and read it into a DataFrame. Then you can select a row, which can be de-structured just like a NamedTuple. You can e.g. take the last row by default, and add new parameter sets as you go.

I can use the same spreadsheet for some additional bookkeeping, some fast calculations, writing the simulations summary into the same row or in a separate table etc.

To my relief I must probably say I use LibreOffice :wink:

No one has mentioned Preferences.jl yet, just tossing in my 2 cents.

No, that’s a data structure.

For example, suppose you wanted to change the parameters in a loop. Or generate and save an ensemble of thousands of different parameter sets. A data structure can be updated dynamically or stored in an array, but a module is not designed for this.
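As a small illustration of that point (with a made-up `EnsembleParams` struct and a placeholder computation instead of a real simulation):

```julia
# Hypothetical parameter struct with defaults.
Base.@kwdef struct EnsembleParams
    num1::Int = 10
    num2::Float64 = 4.6
end

# An ensemble is just an array of parameter sets, built in a loop or comprehension.
ensemble = [EnsembleParams(num2 = x) for x in range(1.0, 5.0; length = 5)]

# Placeholder for "run the simulation once per parameter set".
results = [p.num1 * p.num2 for p in ensemble]
```

Doing the same with a module would mean rewriting and re-including the module file for every parameter set.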

I second @JoshuaLampert’s comment above:

  • Structure your code to provide functions and pass data via parameters (possibly packaging related parameters into data structures for convenience), not as scripts where the parameters are global variables.
  • Create a Julia package+module to hold your simulation functions and data structure definitions (but not the parameter values from individual runs). See Best practise: organising code in Julia - #2 by stevengj
  • The prototypical way of saving parameters is probably a Julia script (that imports your module for the simulation functions, and then calls the functions with particular parameter values).
  • Once you have parameters packed into Julia data structures, you can also save them into any compatible file format, e.g. JLD files or JSON.

I have been running simulations for several years as part of my PhD. I started with text files, but I eventually moved to the following system. I don’t know if it’s particularly good, but it has worked okay for me.

I create a “param” type struct. I store each of these in their own file. So something like this.

Case123 = param_struct_1(
    param1 = ...,
    param2 = ...,
)

Within the param_struct_1 I might set various defaults or perform calculations. The param struct is then passed to other functions that formulate the solver input.

I also have a file that looks like this

include("Case123.jl")
include("Case124.jl")
... # and so on

CaseType1 = Dict(
   "Case123" => Case123,
   "Case124" => Case124,
   ...
)

I have various CaseType1 like Dicts that all get merged into a single Dict which I then use to access parameters and start a simulation.

After parsing yaml you get a dictionary, or a dict of dicts, but it can also contain arrays, just as you defined it in the yaml file.

If they do not match exactly it often just works. In your structs you can also define default values.

And iterating over dict entries manually is easy:

for (key, value) in my_dict
    println("Key: $key, Value: $value")
end

Example for a complex yaml file:

Function for parsing it:

It actually does not return one struct, but five of them and an array, but that is a matter of taste, I could also have written it such that it just returns one large struct of structs.

And then if, inside a function, you want to use parx instead of StructA.parx

"""
    @fields_to_vars(t,x)

Utility macro to convert struct fields to local variables (for readability, so that we can write `parameterx` instead of using `p.parameterx` everywhere).
"""
macro fields_to_vars(t::Symbol, x)
    type = Core.eval(__module__, t)
    if !isstructtype(type)
        throw(ArgumentError("@fields_to_vars only takes struct types, not $type."))
    end
    esc(:( (; $(fieldnames(type)...)) = $x::$type ))
end

I have a very simple set-up: I first write a function in which every argument is a keyword argument with a default value

function run_stuff(; a = 1, b = 1, c = 1)
# do something
end

Now say that, for a given simulation, I have the settings recorded as a vector of variable names var_names and a vector of values var_values (these don’t need to contain all of the arguments to run_stuff). I turn them into a named tuple

named_vars = NamedTuple(zip(Symbol.(var_names), var_values))

and then simply run

run_stuff(; named_vars...)

If you need to pass more complicated structs, you would need some additional steps to automatically generate these, or encode their information as strings, if possible.

This can be achieved with dicts if the keys are symbols:

julia> p = Dict( :a => 1, :b => 2 )
Dict{Symbol, Int64} with 2 entries:
  :a => 1
  :b => 2

julia> f(x; a, b) = a + b*x
f (generic function with 1 method)

julia> f(3; p...)
7

(ps: personally I use the solution suggested in Reading input parameters from a text file - #2 by JonasWickman)