On modules and globals

Hello everybody. We have read everything we could find about globals, including the very helpful post "How to correctly define and use global variables in the module in Julia?", and are grappling with a use case where avoiding globals results in clumsy code. I was wondering if the community had any suggestions.

So in a module we have an expensive calculation that is done once and then returns an immutable tuple, and then we have a lot of reports made with that tuple.

The way we read the best practices is that one should write the code this way, and that is what we do.

using my_module
Data=ExpensiveCalculation()
ReportA(Data)
ReportB(Data)

But it feels very clumsy and the syntax in Python (like self.Data) and C++ is more elegant. It would be much better if one could just write

using my_module
ReportA()
ReportB()

The first call to a Report() function (regardless of which report function is called first) would calculate Data and make it available to all subsequent invocations. One could do that by having the first line of each report function do
Data=getData()

And then getData() would do the calculation on the first call and then just return what has been calculated on subsequent invocations.

Is there any way one can do that in a recommended manner in Julia?

Best regards, Jack

Why don’t you define Data as a constant global in your module? You could make it a Ref and set it on first Report call. See the global const machine_topology and https://github.com/JuliaParallel/Hwloc.jl/blob/549a5109da1c8ba7b2478f4dcf9855cf49438318/src/highlevel_api.jl#L60 as an example.
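A minimal sketch of that idea, not taken from Hwloc.jl: the names `LazyReports`, `ReportData`, `expensive_calculation`, `get_data` and `N_COMPUTED` are hypothetical, and the cache is assumed to hold either `nothing` or a value of a known type.

```julia
module LazyReports

export ReportA, ReportB

# Hypothetical result type; the real Data would have its own fields.
struct ReportData
    total::Int
    label::String
end

# Counts how often the expensive step runs (for illustration only).
const N_COMPUTED = Ref(0)

# Hypothetical stand-in for the expensive calculation.
function expensive_calculation()
    N_COMPUTED[] += 1
    return ReportData(sum(1:1_000), "demo")
end

# Module-level cache: holds `nothing` until the first report asks for the data.
const DATA = Ref{Union{Nothing, ReportData}}(nothing)

function get_data()
    if DATA[] === nothing
        DATA[] = expensive_calculation()
    end
    return DATA[]::ReportData
end

# The reports take no arguments; whichever runs first fills the cache.
ReportA() = get_data().total
ReportB() = get_data().label

end # module
```

After `using .LazyReports`, `ReportA()` and `ReportB()` can be called in either order and the calculation runs only once. Note that this sketch makes no attempt at thread safety.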


Thanks carstenbauer
We did think of that, but there is all this discussion (in the linked thread and elsewhere) saying that such globals should be avoided for a variety of reasons. So we got confused as to what is best practice.

(Disclaimer: I haven’t read the entire linked thread.)

Whether this caching approach with a global const is a good idea or not depends on the situation IMO. My feeling was that the author of the other thread wanted to use this strategy somewhat excessively (100 global constants) and primarily to maintain his Fortran way of thinking and programming. That’s probably not a great idea. But if you only want to cache one big thing, like in Hwloc.jl, I don’t see any real issues with this approach.

As an alternative, you could define a callable struct that internally caches the data (instead of globally). Think of a function that has an internal cache.
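One way such a callable struct might look, as a rough sketch only: the type `Cached`, its fields, and the counter `N_RUNS` are hypothetical names, and the cached value is stored as `Any` for simplicity, so callers lose type inference on the result.

```julia
# Hypothetical cache object: wraps a zero-argument function and stores its result.
mutable struct Cached{F}
    compute::F      # function that produces the data
    value::Any      # cached result (untyped in this sketch)
    ready::Bool     # has `compute` run yet?
end

# Convenience constructor: start with an empty cache.
Cached(f) = Cached(f, nothing, false)

# Making the struct callable: `c()` computes once, then returns the stored value.
function (c::Cached)()
    if !c.ready
        c.value = c.compute()
        c.ready = true
    end
    return c.value
end

# Counts how often the wrapped function runs (for illustration only).
const N_RUNS = Ref(0)

# Hypothetical usage: the do-block stands in for the expensive calculation.
get_data = Cached() do
    N_RUNS[] += 1
    (total = 42, label = "demo")
end
```

Calling `get_data()` twice returns the same named tuple, and the do-block body runs only on the first call. The cache lives inside the object rather than in a module-level global.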


Having a const x = Ref{SomeType}() and setting that value later in code via x[] = value is a “trick” to have a type stable scalar value that can still be mutated. const x = SomeType() on the other hand means the value of x can’t be changed to another SomeType later, so that can only be done if the value is calculated immediately at definition.
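A small illustration of that trick, with hypothetical names (`THRESHOLD`, `set_threshold!`, `above`): the binding is constant and its element type is fixed, but the stored value can still be replaced.

```julia
# A constant binding to a typed, mutable container.
const THRESHOLD = Ref{Float64}(0.0)

# Replace the stored value; the input is converted to Float64 on assignment.
set_threshold!(v) = (THRESHOLD[] = v)

# Functions reading THRESHOLD[] know they get a Float64.
above(x) = x > THRESHOLD[]
```

Because `THRESHOLD` is `const` and its element type is `Float64`, the compiler can treat `THRESHOLD[]` as a `Float64` inside `above`, which would not be the case for a plain untyped global.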


You could also maybe use memoization with the function that creates your desired value? Then in your code you’d just call that function but the result would only be calculated the first time. https://github.com/JuliaCollections/Memoize.jl
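To show what memoization does underneath, here is a hand-rolled sketch using only Base. It is not how Memoize.jl is implemented; that package provides an `@memoize` macro that automates this kind of bookkeeping. The names `slow_square`, `memo_square`, `CACHE` and `N_CALLS` are hypothetical.

```julia
# Cache of previously computed results, keyed by the argument.
const CACHE = Dict{Int, Int}()

# Counts how often the slow function really runs (for illustration only).
const N_CALLS = Ref(0)

# Hypothetical stand-in for an expensive function.
function slow_square(n::Int)
    N_CALLS[] += 1
    return n^2
end

# `get!` returns the cached value if present; otherwise it runs the
# closure, stores the result under key `n`, and returns it.
memo_square(n::Int) = get!(() -> slow_square(n), CACHE, n)
```

The first `memo_square(4)` call runs `slow_square`; later calls with the same argument are served from the dictionary.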


carstenbauer Thanks a lot!

That makes sense. We were made excessively worried by all the discussions.

On Ref, I understand one needs to know the type beforehand? If (like in our case) the structure of the Data variable is only determined when the calculation is made, would Ref not work? (We have only seen mentions of Ref in a couple of posts here, so we only have a fuzzy idea of what it is and how it works.)

I find your idea of a callable struct that internally caches data interesting. It feels like an object in Python/C++? How would one go about building such a beast?

all the best, Jack


Thanks jules

My understanding is that with Ref{} one needs to know the type beforehand, so if the type is determined by the calculation it will not work?

best, Jack

Wow jules, I did not know about Memoize, it's very cool. It may solve our issue, and many others besides.

thanks for pointing us towards it!

best Jack


I think you can go with:

const x = Ref(expression_returning_object)

But if your expression can return different types based on its input then it is type-unstable and this is already a red flag. Theoretically you should be able to infer the type of the result.
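As a quick illustration, with a hypothetical `build_data` standing in for the real calculation: when `Ref` is given a value, its element type is taken from that value, so the type does not need to be spelled out by hand. The trade-off in this sketch is that the calculation runs eagerly when the `const` line is evaluated, not on first use.

```julia
# Hypothetical calculation whose return type the compiler can infer.
build_data() = (values = [1.0, 2.0, 3.0], name = "demo")

# The element type of the Ref is taken from the value it is given.
const DATA_REF = Ref(build_data())

# The stored value can later be replaced by another value of the same type.
DATA_REF[] = (values = [4.0], name = "updated")
```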

Just my two cents, but I much prefer the structure from your first post, where Data is passed explicitly to each report. It is clear what the report is about, and you can change the data if needed without unpredictable outcomes.


leandromartinez98
I see where you are coming from. We may be influenced by having coded in C++ and Python, where our objects calculate Data when they are instantiated. There is always the temptation to see a module as an object, which is not right. But we equally don’t want to contaminate the global namespace with a variable that only has meaning to a module, and then have to keep track of the name. It can be safer to keep the data hidden. We get enough namespace collisions as is.

Another pattern that may be interesting is

module Mod
    export ReportA, ReportB
    const Data = Expensive()
    function ReportA(data=Data) ... end
    function ReportB(data=Data) ... end
end

With that you keep the “default” data hidden and internal to the module, but the functions still operate without global arguments.
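A runnable version of this pattern might look like the sketch below. The body of `Expensive` and the two report bodies are hypothetical placeholders, since the original leaves them elided.

```julia
module Mod

export ReportA, ReportB

# Hypothetical stand-in for the expensive calculation.
Expensive() = (values = [1, 2, 3], name = "default")

# Evaluated once, when the module itself is evaluated.
const Data = Expensive()

# Each report falls back to the module's own data unless one is passed in.
ReportA(data = Data) = sum(data.values)
ReportB(data = Data) = data.name

end # module
```

Callers can write `ReportA()` for the default data, or pass their own, for example `ReportA((values = [10, 20], name = "other"))`, which keeps the functions easy to test with small inputs.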


leandromartinez98

Indeed, I can see that. An alternative to Memoize.jl.

Any views on which of these approaches would have best/worst performance if the Report() calculations were complicated and Data a structure with several dataframes and vectors?

I don’t think you can choose between them on performance grounds; they may or may not perform the same, depending on implementation details. But, in general, you will be on safer ground if the data is passed as a parameter to the functions. That is also good practice for code modularity and maintenance.

I would guess that the name clashes you are reporting are a separate problem, which has its own solutions in Julia.
