Hand-editable serialization: JSON, Fortran's namelist, etc

I don’t know the issue well enough to be able to come up with an apt title to this thread.

My question is, what are the common, hand-editable data formats to initialize variables?

Suppose you want to supply various parameters and values to your Julia program from a hand-editable text file. What format would you use?

When I was using Fortran, the namelist was the obvious and most convenient format, because you can specify the names of the variables and their values in the text file. Instead of showing the exact format of Fortran’s namelist, here is some pseudocode:

# namelistfile.txt
arr = 1,3,5,9
s = "hello world"
c = 3.0 + 2im

# pseudo Fortran code
integer:: arr[4]
string:: s
complex:: c
namelist/myparameters/ arr, s, c
filehandle = open("namelistfile.txt")
read(filehandle, namelist=myparameters) # -> arr, s, and c are initialized

What do Julia programmers use in a case like this? You don’t want to invent an ad-hoc data format and write an ad-hoc parser. Perhaps you use JSON?

In my particular applications, I need to be able to express repetition:

a = 1, 2, 5*3, 4

instead of

a = 1,2,3,3,3,3,3,4

If there is no such convenient format, I would perhaps just write a Julia module that contains the constants, use it as if it were a data file, and load it via:

include(ARGS[1])
using .Parameters
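
For illustration, here is a minimal sketch of what such a parameter file might contain. The module is written inline so the snippet runs on its own; in practice it would live in the file passed as ARGS[1]. The variable names are taken from the namelist example above, and writing the repetition with fill is just one possible choice.

```julia
# Contents a parameter file might have (names are illustrative).
module Parameters
export arr, s, c, a
const arr = [1, 3, 5, 9]
const s = "hello world"
const c = 3.0 + 2im
const a = [1, 2, fill(3, 5)..., 4]   # "5*3" written as five copies of 3
end

using .Parameters   # brings the exported names into scope

println(a)
```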

A downside of this approach (shared by Fortran’s namelist) is that it’s very hard to use the datafile from other languages. If that’s important enough, perhaps I should use some common format like JSON. . . .

I usually prefer to just use Julia source as the serialization format, if Julia is going to be the only consumer.

Note that it’s also possible to include a file that contains a single Julia expression, which is perhaps tidier than the more obvious approach of including an entire script or module:

expr.jl:

vcat(1, 2, [3 for i ∈ 1:5], 4)

main.jl:

const a = (include("expr.jl"))

Small note: I think the using .Parameters is redundant, although it might be necessary in other cases (for example if the module hierarchy is more complex so you have to do something like using ..Parameters).

I know writing a parser could sound intimidating at first, especially coming from languages like Fortran or C++, but it is actually very easy to do in Julia.

I’m right now working on some parsing, and I found out that, for simple cases, all you need are a few string functions (like split), maybe store stuff in a Dict with variable names and values, and eventually just run eval.

I’m not aware of any existing data format that supports this kind of repetition syntax, so perhaps you do need to parse your text file with a custom function, which could be just a few lines of Julia.

For example, I would start with something like this.

So let’s say you read the first line in your file and get

str = "a = 1, 2, 5*3, 4"

You can then do:

name, rhs = strip.(split(str, "="))  # rhs rather than exp, to avoid shadowing Base.exp
L = strip.(split(rhs, ","))

and then loop over the elements of L to find whether occursin("*",element) is true, and process that element.
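
Put together, the steps above could look something like the following. This is only a sketch under some assumptions: values are integers, "n*v" means n copies of v (as in a Fortran namelist), and the helper names expand_item and parse_line are made up for this example.

```julia
# Expand one comma-separated item; "5*3" becomes five copies of 3.
function expand_item(item::AbstractString)
    if occursin("*", item)
        count_str, value_str = strip.(split(item, "*"; limit=2))
        return fill(parse(Int, value_str), parse(Int, count_str))
    end
    return [parse(Int, item)]
end

# Parse a line such as "a = 1, 2, 5*3, 4" into a name => values pair.
function parse_line(line::AbstractString)
    name, rhs = strip.(split(line, "="; limit=2))
    items = strip.(split(rhs, ","))
    vals = reduce(vcat, expand_item.(items))
    return Symbol(name) => vals
end

result = parse_line("a = 1, 2, 5*3, 4")
println(result)
```

Anything outside those assumptions (floats, strings, nested values) would need more cases, which is where the ad-hoc approach starts to grow.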

One straightforward option for config files in Julia is a TOML file, which is used by Julia itself for Projects and Manifests.
A TOML file reader is part of the standard library.
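
As a rough illustration, the parameters from the first post could be written in TOML and read with the TOML standard library like this. TOML has no complex-number type and no repetition syntax, so storing c as a two-element array is an assumption made for this sketch.

```julia
using TOML

# The text would normally come from a file, via TOML.parsefile(path).
config = TOML.parse("""
arr = [1, 3, 5, 9]
s = "hello world"
c = [3.0, 2.0]   # real and imaginary parts (a convention chosen for this sketch)
""")

arr = config["arr"]
s = config["s"]
c = complex(config["c"]...)

println(arr, " ", s, " ", c)
```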

Well, for files that are hand-editable I prefer YAML, using the YAML.jl package (JuliaData/YAML.jl on GitHub).

Update: Both TOML and YAML support comments.

For simple settings, without arrays of deeply nested structures or other complex structures, I think TOML is simpler and less error-prone (it does not depend on whitespace or tabs).
But things get really ugly with arrays or complex structures.
For more complex cases I do think YAML is a better choice.

“I think the using .Parameters is redundant,”

But I don’t know how to make names in a module available without using using:

# -- contents of samplemodule.jl ---
module Sample
export isample, csample
const isample = 3
const csample = 4 + 5im
end
# -- The main program ---
include("samplemodule.jl")
# using .Sample # -- doesn't work without this line.
println(Sample.csample) # works
println(isample) # fails without `using .Sample`

In addition, sometimes I want to import only some of the names:

using .Sample: isample

Sorry that I wasn’t clear in my initial post. I understand your point perfectly well.

The issue is not the difficulty of writing a parser for a simple grammar. An ad-hoc grammar is fragile under future changes. When you write the parser, you make a lot of implicit assumptions about the input. Then, in the future, when you extend the format of your input file, you break some of the assumptions you didn’t know you had made, and your program sometimes fails until you notice the error and fix the parser.

A friend of mine is a programmer. Her program receives information from various external machines (hardware) in various, very simple, ad-hoc grammars. Sometimes the maker of a machine changes the format of its output, causing her program to fail in a mysterious way. After some debugging she discovers the change and modifies her parser. That’s one of her frequent problems. If hardware makers used a well-known format such as YAML, changes to the data would not silently introduce strange values into her program. Depending on what changes are made to the input data, her program would likely detect the change and issue an appropriate error message.

For this reason, it’s often better to adopt one or another widely-used format, rather than writing a parser for an ad-hoc format. If you need only simple values, you may want to use INI, for example. It doesn’t seem hard to write a parser for INI files, but then there is already a package for that.

Thank you all for your ideas and help!

I have a question about using formats like TOML: how do you turn the values in the file into Julia variables? (Using a Julia module as an input file doesn’t involve that problem.)

Thinking of that, I’m struck by this comment:

I don’t know how eval works in Julia, but guessing from other languages, you first build a text string which is a snippet of Julia code, and evaluate it, thereby turning values in the text file into Julia variables.
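
For reference, a minimal sketch of that mechanism in Julia: Meta.parse turns a string into an expression, and eval evaluates it in the current module's global scope. The string here is hard-coded; with a real file, keep in mind that eval runs whatever code the file contains.

```julia
code = "c = 3.0 + 2im"     # one line as it might appear in the text file
expr = Meta.parse(code)    # an Expr representing the assignment
eval(expr)                 # defines the global variable c

println(c)
```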

Perhaps Julia packages for YAML, TOML, etc. already do that for you?

You could make a struct for the parameters which will predefine the variables instead of using eval. For example,

using TOML

struct Params
    a
    b
    c
end

# Build a Params instance from a TOML file with top-level keys a, b, and c.
function init_params(path)
    toml_data = TOML.parsefile(path)
    return Params(toml_data["a"], toml_data["b"], toml_data["c"])
end

# "params.toml" is a placeholder file name. You could also create this
# instance locally within a function and pass it around as an argument.
const p = init_params("params.toml")

Note that the above is only a sketch, but this is usually how I do it.

It depends on whether you know the names of your variables in advance or not. If I remember correctly, in Fortran namelists you do. You declare a namelist with a list of variable names.

If this is the case, instead of using eval, you can “unpack” your Dict. I’m not at a computer now so I can’t check if the same syntax for unpacking named tuples works, but it is likely.

(; a, b, c, x, y, z) = Params

Where Params is a Dict (which is what you get when parsing TOML, for example).

My bad, Dicts cannot be unpacked directly as named tuples, but a Dict can be converted to a NamedTuple with a helper function like this one:

dict2ntuple(d) = NamedTuple{Tuple(Symbol.(keys(d)))}(values(d))

and then unpacked
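
The conversion and unpacking can be sketched as follows. The helper here is written with a generator instead of the one-liner above, which is just an alternative spelling of the same idea. The TOML text and the variable names a and s are made up for this example, and the (; a, s) destructuring syntax needs Julia 1.7 or later.

```julia
using TOML

# Convert a Dict with string keys into a NamedTuple with symbol keys.
dict2ntuple(d) = (; (Symbol(k) => v for (k, v) in d)...)

params = dict2ntuple(TOML.parse("""
a = 1
s = "hello world"
"""))

(; a, s) = params   # unpack the fields by name

println(a, " ", s)
```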

Sorry, I prefer import over using in my packages, so as not to pollute the global namespace, so it simply hadn’t occurred to me that’s what you’re after. Never mind.

(; a, b, c, x, y, z) = Params

Thank you for your help! I get the idea. I think I could implement that.

But that means that there is no existing library that does all of this for you, perhaps by using eval internally.

You can always use a Julia module as a parameter file:

include("Pars.jl")
# use Pars.a, Pars.b, etc.

Here the include() function doesn’t know what variables are in the module Pars.

Likewise, writing a TOML file, you would be able to say something like:

import_vars("Pars.toml")
# use Pars.a, Pars.b, etc.

where the function import_vars() doesn’t know what variables are in the TOML file. . . .

I think you could write such a function by using eval (and perhaps Meta.parse ?) . . . but perhaps that would be overkill unless you really use this method often.
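
A rough sketch of what such a function might look like, assuming the TOML standard library and eval. import_vars is the hypothetical name used above, not an existing library function. For a self-contained example it takes the TOML text directly instead of a file name, and it returns a new module instead of defining Pars implicitly.

```julia
using TOML

# Hypothetical helper: create a module whose constants mirror the
# top-level keys of a TOML document, without knowing the keys in advance.
function import_vars(toml_text::AbstractString, modname::Symbol = :Pars)
    data = TOML.parse(toml_text)   # for a file: TOML.parsefile(path)
    mod = Module(modname)
    for (key, value) in data
        # Only the parsed value is interpolated; no code from the file is run.
        Core.eval(mod, :(const $(Symbol(key)) = $value))
    end
    return mod
end

Pars = import_vars("""
a = 1
b = [1.0, 2.5]
""")

println(Pars.a, " ", Pars.b)
```

Since only parsed TOML values are evaluated, this avoids running arbitrary code from the parameter file, unlike including a Julia source file.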