Global variables and program structure

I am not proud of my program structure.

My code is contained within a single module,
my main variables are contained in 8 mutable composite types
sometimes with as many as 30 type-annotated fields.
Some of my functions create the main variables as const globals
and these are conveniently used at various places in other functions.
Exported variables and functions are used from the REPL.

I know that this is considered bad programming practice
and type inference may fail at some places.
So I have tried hard to pass these variables as arguments
to all my functions but I have never succeeded.
With several call levels the code becomes too ugly
and quick program development becomes a pain.

I am eagerly awaiting the arrival of type-annotated globals,
and hope that any performance penalty will then be gone.

But please advise:

  • Is this program structure really that terrible?
  • How do you arrange your code in a better way?

One of the main problems with that style is that your logic is non-local and hard to debug.

Put your variables into structs, and pass those around (see Parameters.jl for some helper macros). Break your functions up into smaller ones which only do one thing and thus only need access to a few variables. Compose those smaller functions into larger ones.
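
To make the suggestion above concrete, here is a minimal sketch. The struct names, fields, and functions (Material, State, kinetic_energy, advance!, step!) are made up for illustration and are not taken from the original program; Base.@kwdef is used in place of Parameters.jl only to keep the example dependency-free.

```julia
# Hypothetical parameter groups; names and fields are illustrative only.
Base.@kwdef struct Material
    density::Float64 = 1000.0
    viscosity::Float64 = 1.0e-3
end

Base.@kwdef mutable struct State
    velocity::Vector{Float64} = zeros(3)
    time::Float64 = 0.0
end

# Each small function receives only the structs it actually needs.
kinetic_energy(m::Material, s::State) = 0.5 * m.density * sum(abs2, s.velocity)

function advance!(s::State, dt::Float64)
    s.time += dt
    return s
end

# A larger function composed from the smaller ones.
function step!(m::Material, s::State, dt::Float64)
    advance!(s, dt)
    return kinetic_energy(m, s)
end

material = Material(density = 2.0)
state = State(velocity = [1.0, 2.0, 2.0])
energy = step!(material, state, 0.1)
```

Because every function states what it needs in its signature, each one can be tested from the REPL with a freshly constructed struct instead of relying on hidden module state.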

There is definitely a trade-off between quick programming and writing maintainable code. But at least for bigger projects it does pay off as eventually there will be code maintenance and refactoring.


My variables are already packed into structs.
Passing them around is the problem.

When variables are packed logically into 8 different structs
passing around various combinations of 8 different structs
as function arguments can become really complex.

Consider the extreme case:
One may pack ALL main variables (hundreds of them) into a SINGLE struct,
then only a single extra argument must be passed at each function call.

But is this really a better programming style?

It is much easier to talk about program structure with concrete code in mind. If you post a minimal working example, you may get more specific advice. Sometimes the effort to create a MWE itself is helpful in reorganizing your program.


I think this is an unlikely venue to get a satisfactory answer to such a broad and not really Julia-specific question. I see two ways this forum could help you along: recommended books and example code. If you are into physical models, then this relatively simple code, which calculates steady-state subglacial drainage, may be of use: Bitbucket.

Maybe someone knows a good book? Or more/better example packages?

Thank you for the code example.
And yes, other examples are welcome.


turtle: What did you ultimately decide to do with your program structure? I have the exact same problem that you mentioned here, and am very interested in what you ultimately did.

mauro3: I disagree that this is not a “Julia specific” conversation. Passing parameters around, and dealing with the scope of parameter spaces, and dealing with modules vs classes (in other languages) vs methods (in other languages), and the structure of using/passing the parameters in functions, and organizing functions within classes or within modules or what not, is quintessentially specific to the language. A true “object oriented” language would execute these tasks differently than a “function-based” language, which in turn would be different than whatever Julia is emerging to become. As Julia emerges into the world, it’s important for people to discuss the best program structure and methods for passing parameters.


I was unable to find a more convenient option.
My program still defines some global structures within functions,
except that at v0.7.0 I was forced to change const global to global.
Of course I tried to minimize their occurrences, and tested that the speed penalty is tolerable.

So my use case is still this:
All my functions live in a single module.
I use the REPL for interactive calculations,
and most calculations are based on functions of the module.
A few functions of this module create some global structures,
and these are accessible to all other functions of the module.

This program structure saves an awful lot of typing in the REPL,
because some main variables are not part of all function argument lists.
Also, as a single function call is shorter, a single command line
can contain several function calls as a “mini project” – without any visual clutter.

Again, hard to say without seeing some code, but it is almost certain that there is a better way to do it. E.g., even just grouping all those global variables in a struct and passing them around could be a huge improvement with little extra typing.
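
As a rough sketch of what that grouping could look like while keeping the logically separate structs intact: the names below (Geometry, Solver, Context, area, describe) are hypothetical stand-ins, not the actual structs from this thread.

```julia
# Hypothetical groups standing in for the logically separate structs.
struct Geometry
    length::Float64
    width::Float64
end

struct Solver
    tolerance::Float64
    max_iterations::Int
end

# One container holding the groups, so top-level calls take a single argument.
struct Context
    geometry::Geometry
    solver::Solver
end

# Low-level function: needs only the geometry.
area(g::Geometry) = g.length * g.width

# Top-level methods accept the context and hand the relevant part down.
area(ctx::Context) = area(ctx.geometry)
describe(ctx::Context) = "area=$(area(ctx)), tol=$(ctx.solver.tolerance)"

ctx = Context(Geometry(2.0, 3.0), Solver(1e-8, 100))
```

At the REPL only ctx has to be typed (for example area(ctx)), while the inner functions still see just the group they work on.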

All my important variables are already grouped into 3 logically disjoint global structs.

Are you suggesting that I group all unrelated variables into a single global struct?
(Long ago I tried it, but it did not feel logical.)

You keep missing this, so for the third time: it is very difficult to suggest anything concrete without seeing the code, or an MWE.

I think I have the exact same problem as turtle, and here is my example solution: define structures (which behave exactly like const globals) and hope that I’m not taking a performance hit for doing so. See code below:

module workspace
    module stuff1
        struct teststruct
            x::Array{Float64,1} #this will be a 1000-element array that behaves like a const global array. I can mutate its elements!
            y::Array{Float64,1} #this will be a 1-element array that behaves like a const global array. I can mutate its elements!
            z::Float64          #this will be a value that I cannot mutate!
        end
        data=teststruct(rand(1000),[6.7],9.1);
        println(data.x[1:10])
    end
    module run_calculations1
        import ..stuff1.data
        #define further parameters here
        r=8.0
        function run_simulation(data,r)
            s = 0.0
            y=1.0
            for i in 1:length(data.x)
                for j in 1:length(data.x)
                    data.x[i] = rand(1)[1]*y*data.z+data.y[1]+r
                end
            end
            return data
        end
    end
    time() = @time y=run_calculations1.run_simulation(stuff1.data,run_calculations1.r)
    time()
    time()
    time()
    time()
end

That hope is unfortunately in vain. data and r are global variables and should be const if you want them to not slow down your code.
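
A small sketch of the distinction being made here, with made-up names (scale, SCALE, OFFSET, slow_affine, fast_affine). It only illustrates the declarations; it is not a benchmark.

```julia
# Non-const global: it may be rebound to a value of any type later,
# so code that reads it directly cannot assume its type.
scale = 2.0

# const globals: the type of each binding is fixed.
const SCALE = 2.0
const OFFSET = 1.0

slow_affine(x) = scale * x + OFFSET   # reads a non-const global
fast_affine(x) = SCALE * x + OFFSET   # reads only const globals

a = slow_affine(3.0)
b = fast_affine(3.0)
```

Both functions return the same value; the difference is in what the compiler can infer. Running @code_warntype slow_affine(3.0) at the REPL is one way to inspect this.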


I believe the variable “r” has local scope inside the function because I passed it in as an argument. As for “data”, as far as I’ve seen and heard, Julia structures are automatically constant globals! Constant globals run just as fast as locals, as far as I’m aware.

I’ve done the benchmarking, and this assumption appears to be true, because the benchmark times for the following code show that my global structure “data” performs about as well as local variables.

Compare this code and benchmark:

module test
    module stuff1
        struct teststruct
            x::Array{Float64,1} #this will be a 1000-element array that behaves like a const global array. I can mutate its elements!
            y::Array{Float64,1} #this will be a 1-element array that behaves like a const global array. I can mutate its elements!
            z::Float64          #this will be a value that I cannot mutate!
        end
        data=teststruct(rand(1000),[6.7],9.1);
        println(data.x[1:10])
    end
    module run_calculations1
        import ..stuff1.data
        #define further parameters here
        r=8.0
        function run_simulation(data,r)
            s = 0.0
            y=1.0
            for i in 1:length(data.x)
                for j in 1:length(data.x)
                    data.x[i] = rand(1)[1]*y*data.z+data.y[1]+r
                end
            end
            return data
        end
    end
    time() = @time y=run_calculations1.run_simulation(stuff1.data,run_calculations1.r)
    time()
    time()
    time()
    time()
end
OUTPUTS:
[0.323492, 0.235494, 0.51633, 0.784921, 0.66276, 0.312472, 0.72803, 0.40744, 0.974881, 0.852874]
  0.139204 seconds (1.04 M allocations: 93.487 MiB, 22.53% gc time)
  0.101172 seconds (1000.00 k allocations: 91.553 MiB, 31.57% gc time)
  0.097960 seconds (1000.00 k allocations: 91.553 MiB, 31.24% gc time)
  0.085701 seconds (1000.00 k allocations: 91.553 MiB, 27.62% gc time)

with this one:

module test
   module stuff1
       struct teststruct
           x::Array{Float64,1} #this will be a 1000-element array that behaves like a const global array. I can mutate its elements!
           y::Array{Float64,1} #this will be a 1-element array that behaves like a const global array. I can mutate its elements!
           z::Float64          #this will be a value that I cannot mutate!
       end
       data=teststruct(rand(1000),[6.7],9.1);
       println(data.x[1:10])
   end
   module run_calculations1
       import ..stuff1.data
       #define further parameters here
       function run_simulation()
           s = 0.0
           y=1.0
           x=[1.0]
           for i in 1:1000
               for j in 1:1000
                   x[1] = rand(1)[1]*y*9.1+6.7+8.0
               end
           end
           return x
       end
   end
   time() = @time y=run_calculations1.run_simulation()
   time()
   time()
   time()
   time()
end
OUTPUTS:
[0.598739, 0.238504, 0.570769, 0.768084, 0.30716, 0.0337915, 0.361736, 0.715133, 0.96916, 0.987557]
  0.100308 seconds (1.00 M allocations: 91.553 MiB, 26.71% gc time)
  0.091171 seconds (1.00 M allocations: 91.553 MiB, 21.15% gc time)
  0.077898 seconds (1.00 M allocations: 91.553 MiB, 22.52% gc time)
  0.084541 seconds (1.00 M allocations: 91.553 MiB, 22.64% gc time)

@kristoffer.carlsson I’d be excited to learn some way to improve this example code’s performance. It’s the best I’ve got for now.
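
One likely candidate for improvement, independent of the globals question: both versions above report about one million allocations, which matches the one million calls to rand(1), each of which builds a temporary 1-element array. A sketch of the same loop using the scalar rand() instead follows; the renamed TestStruct and run_simulation! are my own spelling, and no timings are claimed.

```julia
struct TestStruct
    x::Vector{Float64}
    y::Vector{Float64}
    z::Float64
end

function run_simulation!(data, r)
    y = 1.0
    for i in eachindex(data.x)
        for j in eachindex(data.x)
            # rand() returns a scalar Float64 directly;
            # rand(1)[1] allocates a temporary array first.
            data.x[i] = rand() * y * data.z + data.y[1] + r
        end
    end
    return data
end

data = TestStruct(rand(1000), [6.7], 9.1)
run_simulation!(data, 8.0)
```

The trailing ! follows the Julia naming convention for functions that mutate their arguments.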

I missed that you passed them into the function. If that’s the case, it is fine (this technique is called a function barrier).
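
For readers unfamiliar with the term, a minimal sketch of a function barrier, using hypothetical names (samples, shift, shifted_sum): the globals are looked up once at the call site, and the work happens in a function that receives them as arguments.

```julia
# Non-const globals (hypothetical names).
samples = rand(1000)
shift = 8.0

# Kernel: all inputs arrive as arguments, so Julia compiles a method
# specialized for the concrete argument types it is called with.
function shifted_sum(values, shift)
    total = 0.0
    for v in values
        total += v + shift
    end
    return total
end

# Only this single call looks up the globals dynamically.
total = shifted_sum(samples, shift)
```

The loop inside shifted_sum never touches a global, so its speed should not depend on whether samples and shift were declared const.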


Not sure how helpful this comment is, but I have taken to always including global variables as default keyword arguments. For example:

const GLOBAL_VALUE = 1

function somefunction(a, b; global_value=GLOBAL_VALUE)
    # do stuff
end

Of course, this only works for global variables which are basically immutable (I say “basically”, because you could, for example, initialize some global container during __init__()). An even better practice is to set the default to the call of another function that reads in a config file (I like TOML), but this is only really appropriate with memoization.
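
A rough sketch of the config-file variant with a simple cache, assuming the TOML standard library (Julia 1.6 or later). The names CONFIG_TEXT, CONFIG_CACHE, config, and converged are hypothetical, and the configuration is parsed from a string here only so the example runs without a file on disk.

```julia
using TOML

# Hypothetical configuration text; in practice this would come from a file.
const CONFIG_TEXT = """
tolerance = 1e-6
"""

# Cache for the parsed configuration, filled on first use.
const CONFIG_CACHE = Ref{Union{Nothing, Dict{String, Any}}}(nothing)

function config()
    if CONFIG_CACHE[] === nothing
        CONFIG_CACHE[] = TOML.parse(CONFIG_TEXT)
    end
    return CONFIG_CACHE[]
end

# The keyword default calls config(); callers can still override it.
function converged(residual; tolerance = config()["tolerance"])
    return abs(residual) < tolerance
end
```

Calling converged(1e-9) uses the cached value, while converged(1e-9; tolerance = 1e-12) bypasses the configuration entirely, which also makes the function easy to test.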

Another thing that I tend to do to eliminate global values is to stick things which originally I had envisioned as global into my structs. About 90% of the time this is the better coding choice anyway (i.e. because the value wasn’t really global).

The vast majority of the time, I find that global variables really are unnecessary, and I was better off without them in the first place. The one thing I have yet to really feel like I’ve cracked is memoization. I really want a nice drop-in memoization scheme that I feel comfortable with, and haven’t figured out anything I’m totally satisfied with yet. I think for me the ultimate solution is to have a @moizable macro or something like that that one calls when declaring structs which makes that struct appropriate for memoizing values into. The details, however, are messy.
