Global variables / performance / data passing


#1

One of Julia's performance tips is to avoid global variables. I have a problem in which
I need to pass a lot of data to a function, but the function must be called with a fixed signature,
so the data has to reach it “under the hood”. In Fortran I used modules for that
purpose (or common blocks, before that), and the module's variables were visible only to the routines
that explicitly requested them (i.e. use Data). I cannot find, or understand, a satisfactory alternative in Julia. For example:

struct Data
  a :: Int64
end
data = Data(2)

function foo(x :: Int64)
  return data.a * x
end

x = 1
foo(x)

First, it seems that “data” is a global variable. I am not sure whether passing data like this causes a performance loss. If it does, how can I avoid that, given that I cannot change the syntax of the function call? Is using data as in the example above worse than using modules in Fortran, or is it the same sort of thing?

Second, it bothers me that I do not need to state explicitly inside the function that I will be using the
global variable “data”. Should I add “global data” as good practice, to make that explicit?
What confuses me is that declaring the variable global changes how the function sees it: depending on the declaration, modifying its value either works or raises an “undefined” error:

b=1
function foo()
  global b
  b = b + 1
end
foo()
function foo_error()
  b = b + 1
end
foo_error() # ERROR: b not defined
function foo2()
  return b + 1
end
foo2() # But this works

Additionally, can I change the name of the variable in the local scope? (That is, something like “using data as mydata”).


#2

I’m confused, why can’t you just do

function foo(d::Data, x::Int64) 

?


#3

Because the function will be called by some other routine (an optimization routine, for instance), which was written to call the objective as “f(x)” and the gradient as “g(x)”.

If I changed the function's signature, I would also need to change that routine, which I did not write and would rather not modify.

At the same time, I wonder whether passing the data that way, as an argument, makes any difference in performance. Does it?


#4

Overall, it sounds like you need to rethink the way you are approaching the problem. There is definitely a performance hit to accessing, let alone modifying, non-const global variables in every function call. It is also hard to reason about code that keeps calling back into some global state.

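With regard to the three scoping examples you listed:

  1. You can’t assign to a global variable inside a function without first declaring it global. An assignment makes the name local to the whole function body, so in foo_error the right-hand side b + 1 reads a local b that has no value yet.
  2. That is why foo, which declares global b, works.
  3. You can “read” a global variable’s value without modifying it, which is why foo2 works.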
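Here is a minimal sketch of the three cases. The function names are illustrative, not the ones from the question:

```julia
# A global that the functions below read or modify.
b = 1

function increment_global!()
    global b          # refer to the global b instead of creating a local
    b = b + 1
end

function read_global()
    return b + 1      # read-only use: b is looked up as a global
end

function assign_without_global()
    b = b + 1         # the assignment makes b local for the whole body,
                      # so the right-hand side reads an undefined local
end

increment_global!()   # the global b is now 2
result = read_global()

err = try
    assign_without_global()
    nothing
catch e
    e                 # capture the exception instead of aborting
end
println(b, " ", result, " ", typeof(err))
```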

With regard to “renaming” a variable, there is the @as macro in Lazy.jl, but I would encourage you to be clearer about the more general problem you are trying to solve, rather than asking how to implement a particular solution.
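Besides @as, here is a sketch of two built-in ways to refer to data under another name:

* bind a local alias inside the function that needs the data, or
* rename the binding at import time with import ... as, which is available since Julia 1.6.

DataModule, scaled and scaled_via_import are hypothetical names. DataModule plays the role of a Fortran data module.

```julia
# Hypothetical module playing the role of a Fortran data module.
module DataModule
struct Data
    a::Int
end
const data = Data(2)   # const binding, so access from functions is type-stable
end

# Option 1: rename at import time (import ... as needs Julia 1.6 or later).
import .DataModule: data as mydata

scaled_via_import(x::Int) = mydata.a * x

# Option 2: bind a local alias inside the function that uses the data.
function scaled(x::Int)
    d = DataModule.data   # local name for the module's data
    return d.a * x
end
```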


#5

Oh, thanks. To be clear, I am not modifying the variables of the data on every function call, I will only be using them.

The problem itself is pretty complex, but it is a typical situation in Fortran programming (I am not sure how typical it is in other languages). I am using a routine, let’s say a conjugate-gradient optimization routine, which itself calls a function that computes the gradient of my objective function. The CG routine calls that function with the syntax " g = gradient(x) ".

However, computing my objective function and its gradient requires a lot of data, and I do not want to modify every call “gradient(x)” inside the CG routine. Therefore, the gradient routine must receive only “x” as an argument, and the other data some other way. In old Fortran this is solved with “commons” (which were a mess), and in modern Fortran with modules, which are quite elegant.

The data in a Fortran module are visible to all routines that “use” the module. So the data is not “global” in the sense that every routine sees it, but any routine that needs it can access it.

The example above does the same in Julia, but the data becomes truly “global”, that is, visible to every function. That is not necessarily a problem, although it seems less elegant. On the other hand, I am not sure whether the performance is the same as passing variables through Fortran modules.

Admittedly, I may need to think about the problem in a way that is more natural in Julia. For the moment, my brain is still trying to translate Fortran code, which is probably not the best approach.

Thank you.


#6

If the data does not change type, you can declare it const to improve performance, but the scoping rules are just as you see.
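Here is a sketch of how to check the effect of const. It asks inference what a function returns when it reads a non-const global versus a const one. Base.return_types is an internal reflection helper, used here only to print the result; @code_warntype is the user-facing tool for the same check. The names are illustrative.

```julia
struct Data
    a::Int
end

data_nonconst = Data(2)      # untyped global: may be rebound to any type later
const data_const = Data(2)   # const global: its type is fixed

scale_nonconst(x::Int) = data_nonconst.a * x
scale_const(x::Int) = data_const.a * x

# Ask inference for the return type of each method with an Int argument.
rt_nonconst = only(Base.return_types(scale_nonconst, (Int,)))
rt_const = only(Base.return_types(scale_const, (Int,)))
println("non-const: ", rt_nonconst, ", const: ", rt_const)
```

Both functions compute the same value. Only the const version lets the compiler know the result type ahead of time.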


#7

The data must be initialized for every problem, so I think the structs are the way to go. Thanks.


#8

Check out my blog for a good technique: you can write a function that accepts the data and returns a function. See https://www.codementor.io/zhuojiadai/julia-vs-r-vs-python-simple-optimization-gnqi4njro
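Here is a minimal sketch of the “function that returns a function” idea. It is not the blog's own code.

* make_objective takes the data and returns a one-argument closure.
* best_of is a hypothetical stand-in for a library routine that only accepts f(x).

```julia
struct Data
    a::Int
end

# Takes the data, returns a function of x alone that captures `data`.
function make_objective(data::Data)
    return x -> data.a * x^2
end

# Hypothetical library routine: it only knows how to call f(x).
function best_of(f, candidates)
    best = first(candidates)
    for c in candidates
        if f(c) < f(best)
            best = c
        end
    end
    return best
end

f = make_objective(Data(3))
xbest = best_of(f, -2:2)
```

The library routine never sees Data. Creating a new objective for a different problem is just another call to make_objective.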


#9

It can still be a struct:

struct Data
  a :: Int64
end
const data = Data(2)

function foo(x :: Int64)
  return data.a * x
end

x = 1
foo(x)

Your simple usage also maps onto a const global holding a Ref: the binding stays constant while the value inside it can change.

const data = Ref(2)

function foo(x :: Int64)
  data[]=data[]+x
end

x = 1
foo(x)
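The same Ref trick also fits the case where the data must be re-initialized for every problem. The binding stays const, so access is type-stable, while the contents are replaced; the stored value must remain a Data. Here is a sketch; Data, current_data, objective and solve_problem! are illustrative names.

```julia
struct Data
    a::Int
end

# Const binding to a Ref: the Ref's contents may be replaced,
# but only with another value of type Data.
const current_data = Ref(Data(0))

objective(x::Int) = current_data[].a * x   # fixed one-argument interface

function solve_problem!(a::Int, x::Int)
    current_data[] = Data(a)   # re-initialize the data for a new problem
    return objective(x)
end

r1 = solve_problem!(2, 10)
r2 = solve_problem!(5, 10)
```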

#10

Read up on “closures”: optim(x -> f(x, data)) is the standard way to pass functions that have parameters. Beware of a nasty performance bug, https://github.com/JuliaLang/julia/issues/15276, and use @code_warntype to diagnose it.
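Here is a minimal sketch of both points, assuming a toy grid search (grid_min, hypothetical) in place of a real optimizer.

* The first part shows a closure supplying the extra parameter.
* The second part shows the kind of captured-variable reassignment that issue 15276 is about, together with the usual let workaround.

Running @code_warntype make_closure_boxed(true)(2) typically reveals a Core.Box, although exactly which patterns get boxed can vary between Julia versions.

```julia
objective(x, data) = (x - data)^2

# Hypothetical stand-in for an optimizer that only calls obj(x).
grid_min(obj, xs) = xs[argmin(map(obj, xs))]

# Typical use: build the closure inside a function, passing the data through.
solve(data) = grid_min(x -> objective(x, data), 0:5)

# Pitfall: `data` is captured and assigned more than once, so Julia may
# store it in a Core.Box, and calls to the closure lose type information.
function make_closure_boxed(flag::Bool)
    data = 1
    if flag
        data = 2
    end
    return x -> objective(x, data)
end

# Workaround: capture a fresh variable that is assigned exactly once.
function make_closure_let(flag::Bool)
    data = 1
    if flag
        data = 2
    end
    return let data = data
        x -> objective(x, data)
    end
end
```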


#11

(While I was writing this, a lot of responses appeared, so there is no need anymore XD, but
anyway…)

Because I’m also from a Fortran background, here are my 2 cents on how to do something similar to Fortran module variables.

# data_m.jl
module data_m

mutable struct data_t
    n :: Int64
end

const data1 = data_t( 1 )  # this is fast

data2 = data_t( 1 )  # this is slow

end
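# prog1.jl
include("data_m.jl")  # defines module data_m inside Main
import .data_m        # the leading dot refers to that module, not to a package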

function test1()
    dat = data_m.data1
    dat.n = 100
end

function test2()
    dat = data_m.data2
    dat.n = 200
end

function output()
    @show data_m.data1.n
    @show data_m.data2.n
end

test1()
test2()
output()

I think the above approach works much like module variables in Fortran. I usually put such data into a mutable struct bound to a const name, because otherwise the data's type is inferred as “Any” (which is slow…). This can be checked with:

using InteractiveUtils
@code_warntype test1()  # shows no Any
@code_warntype test2()  # shows Any

and a small test

import .data_m  # assumes data_m.jl has already been included

function calc1( niter )
    n = data_m.data1.n
    return sum( n for i = 1:niter )
end

function calc2( niter )
    n = data_m.data2.n
    return sum( n for i = 1:niter )
end

using BenchmarkTools
niter = 10^7
@btime calc1( niter )  # 37 ns
@btime calc2( niter )  # 367 ns

But recently I’ve learned how to use lambdas (anonymous functions) and local functions, as also mentioned in the other replies, and they may be more convenient. For example, I write potential-energy and gradient routines that capture various parameters from the enclosing scope and have a fixed interface, so they can be passed to other library routines. I think this latter approach is similar to passing internal procedures to library routines in Fortran.
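Here is a sketch of that pattern, assuming a harmonic potential E(x) = k/2 |x - center|^2. gradient_descent is a hypothetical library routine that, like the CG routine described earlier in the thread, only ever calls f(x) and g(x). PotentialParams and make_energy_and_gradient are illustrative names.

```julia
# Hypothetical library routine: it only ever calls f(x) and g(x).
function gradient_descent(f, g, x0; rate = 0.1, iters = 200)
    x = copy(x0)
    for _ in 1:iters
        x .-= rate .* g(x)
    end
    return x, f(x)
end

# Parameters of the harmonic potential E(x) = k/2 * |x - center|^2.
struct PotentialParams
    k::Float64
    center::Vector{Float64}
end

# Returns energy and gradient as local functions that capture `p`.
function make_energy_and_gradient(p::PotentialParams)
    energy(x) = 0.5 * p.k * sum(abs2, x .- p.center)
    gradient(x) = p.k .* (x .- p.center)
    return energy, gradient
end

p = PotentialParams(2.0, [1.0, -1.0])
E, G = make_energy_and_gradient(p)
xmin, Emin = gradient_descent(E, G, [0.0, 0.0])
```

The parameters travel inside the closures, so gradient_descent keeps its one-argument interface and nothing depends on global state.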


#12

You almost certainly want to use a closure for this. A closure will (a) completely avoid the performance penalty of global variables and (b) avoid the potential confusion of using global variables.

For example, let’s say you have an optimization routine (perhaps one you didn’t write yourself) that expects a function of one variable:

function optimize(f)
  # not a very good optimizer, but it's trying its best...
  if f(1) < f(2)
    return 1
  else
    return 2
  end
end

And now let’s say you have your own function whose behavior depends on your data struct as well as the input to be optimized:

function foo(data::Data, x::Integer)
  data.a + x
end

To optimize your function foo given a specific data struct, you can do the following:

function choose_the_best_x(data::Data)
  function_to_optimize = (x) -> foo(data, x) # this creates a new function of one variable (x) which "closes over" the current value of `data`
  optimize(function_to_optimize)
end
julia> choose_the_best_x(Data(2))
1

This pattern is extremely common in Julia and it works very well. Closures in Julia are exactly as fast as any other function, so there is no penalty at all for creating the new function (x) -> foo(data, x). In this way, you can pass the optimize function exactly what it’s expecting without relying on any sort of global variables.


#13

Thank you very much for all the answers. This was a true lecture. I got the idea of const globals and closures, and both are excellent.


#14

Another example using closure.

In my case I use different Evolutionary Algorithms to optimize a function. I am going to participate in a competition, http://cec2019.org/programs/competitions.html#cec-06, and I have to save to a file the number of evaluations needed to reach different degrees of accuracy. The algorithms are already written, and I do not want to change them just to store the information for the table, so I did the following:

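function getEvalFun(fitness, optim::Float64, io::IO, prefix="")
    counter = 0      # number of fitness evaluations so far
    digit = 1.0      # next accuracy threshold (1, 0.1, 0.01, ...)
    numdigit = 0     # number of thresholds reached so far

    if (prefix != "")
        prefix = "$prefix, "
    end

    # Wraps `fitness` and writes a line to `io` for every accuracy
    # threshold that fit - optim has dropped below (at most 10 thresholds).
    evalsol(solution) = begin
        if numdigit > 100
            return 0
        end

        fit = fitness(solution)
        counter += 1
        dif = fit-optim

        while dif < digit && numdigit < 10
            numdigit += 1
            println(io, "$prefix$numdigit, $counter, $fit")
            digit /= 10.0
            dif = fit-optim
        end

        return fit
    end
    return evalsol
end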

So, I can do

new_fitness_fun = Comp100digit.getEvalFun(original_fitness_fun, optim, io)

When the optimization function uses new_fitness_fun to evaluate new vectors, everything required is stored in the output file transparently, without changing the original optimization algorithm.
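Here is a sketch of the same idea reduced to its core: a wrapper that counts evaluations of any fitness function without modifying the optimizer. counting and sphere are hypothetical names. Unlike getEvalFun, the counter lives in a Ref that is mutated in place, so the captured variable itself is never reassigned.

```julia
# Wraps f so that every call increments a shared counter.
function counting(f)
    calls = Ref(0)
    wrapped(x) = (calls[] += 1; f(x))
    return wrapped, calls
end

sphere(x) = sum(abs2, x)   # hypothetical fitness function

fcount, ncalls = counting(sphere)
vals = [fcount([i, 0.0]) for i in 1:5]
```

After the optimizer has run with fcount in place of sphere, ncalls[] holds the number of evaluations.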


#15

Nice: while writing out the source code here, I detected and fixed an error in it 😀.


#16

A question on closures: I just noticed that a function with one parameter fewer
can be defined simply, as in this example:

const a = 1
foo(x,a) = a*x
fuu = (x) -> foo(x,a)
fii(x) = foo(x,a)

function test(x)
  println(fii(x))
  println(fuu(x))
end

test(2)

Are fii(x) and fuu(x) completely equivalent? Both seem to work identically in simple tests.


#17

Answering my own question: clearly not. If I redefine the constant “a” and evaluate test(2), the result of fii changes, but the result of fuu does not. If a is not a constant, both change.

Therefore, when using closures, care must be taken if the data changes anywhere after its first definition, right?

(I now understand the possible performance gain from using const a and closures.)


#18

I think you may have a hard time avoiding const.

Closures are wonderful, but if you define closures as global variables:

const a = 1
foo(x,a) = a*x
fuu = (x) -> foo(x,a)
fii(x) = foo(x,a)

They are just that, variables, whose types cannot be inferred.

julia> @code_warntype fuu(2) # good
Body::Int64
1 ─ %1 = Main.a::Core.Compiler.Const(1, false)
│   %2 = (Base.mul_int)(%1, x)::Int64
└──      return %2

julia> @code_warntype fii(2) # good
Body::Int64
1 ─ %1 = Main.a::Core.Compiler.Const(1, false)
│   %2 = (Base.mul_int)(%1, x)::Int64
└──      return %2

julia> bar(x) = fuu(x) # trying to use fuu from another module
bar (generic function with 1 method)

julia> @code_warntype bar(2) # yikes
Body::Any
1 ─ %1 = (Main.fuu)(x)::Any
└──      return %1

(Note that you can just copy and paste the above code into the REPL; Julia will automatically delete the julia>s.)

You could do

julia> mutable struct Foo{T} <: Function
           a::T
       end

julia> (f::Foo)(x) = f.a * x

julia> const foo_instance = Foo(1)
(::Foo{Int64}) (generic function with 1 method)

julia> foo_instance(2)
2

julia> @code_warntype foo_instance(2)
Body::Int64
1 ─ %1 = (Base.getfield)(f, :a)::Int64
│   %2 = (Base.mul_int)(%1, x)::Int64
└──      return %2

julia> bar2(x) = foo_instance(x)
bar2 (generic function with 1 method)

julia> @code_warntype bar2(2)
Body::Int64
1 ─ %1 = Main.foo_instance::Core.Compiler.Const(Foo{Int64}(1), false)
│   %2 = (Base.getfield)(%1, :a)::Int64
│   %3 = (Base.mul_int)(%2, x)::Int64
└──      return %3

julia> foo_instance.a = 4
4

julia> bar2(2)
8

But then your program still depends on global state. Common blocks are bad:
they make managing things more complicated, especially multithreading, if you ever plan on using @threads.


#19

No. The syntax fii(x) = ... makes fii a const variable bound to a new function object (assuming the name fii is not currently bound). If you later do, e.g.,

fii(x::Number) = x

this adds a method to the same function.
fuu = x->foo(x,a) binds the variable fuu to the function defined on the right-hand side. Since you did not declare fuu to be const, it’s a non-const global, so you will have the same kind of performance problems in functions that use it. Also, if you later write

fuu = x->foo(x,0)

this rebinds the name fuu to a different function rather than adding a method.
So, if you want to define a closure in global scope, you should probably declare it either as fii(x) = ... or const fii = x->....
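Here is a sketch of the difference. It reuses the names from the question and adds an illustrative const fuu_c. Base.return_types is an internal reflection helper, used here only to show what inference concludes when each version is called from another function; @code_warntype gives the user-facing view.

```julia
const a = 1
foo(x, a) = a * x

fuu = x -> foo(x, a)          # non-const global binding
const fuu_c = x -> foo(x, a)  # const global binding to a closure
fii(x) = foo(x, a)            # named function: its name is already const

# Callers, as in the @code_warntype experiments above.
bar(x) = fuu(x)
bar_c(x) = fuu_c(x)
bar_i(x) = fii(x)

# Inferred return type of a one-argument method called with an Int.
rt(f) = only(Base.return_types(f, (Int,)))
println(rt(bar), " ", rt(bar_c), " ", rt(bar_i))
```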


#20

Ideally, you should create the closure inside a function and call the routine that uses it there. This avoids the issue with globals. E.g., mock code:

function find_parametric_optimum(g, a)
    optimize(x -> g(x, a), ...)
end

The contents of a can change (e.g. it can be used for caching), but its type shouldn’t; with the above design, its type cannot change, so you are safe 😉
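Here is a runnable version of the mock code, as a sketch.

* grid_optimize is a hypothetical one-argument optimizer standing in for optimize(...), whose remaining arguments the post leaves elided.
* SearchParams is an illustrative mutable container. Its evaluation counter changes on every call while its type stays fixed.

```julia
# Hypothetical one-argument optimizer: a simple grid search.
grid_optimize(f, xs) = xs[argmin(map(f, xs))]

# Mutable parameters: contents may change, the type does not.
mutable struct SearchParams
    target::Float64
    evaluations::Int
end

g(x, a) = (x - a.target)^2

function find_parametric_optimum(g, a)
    # The closure mutates a field of `a` but never rebinds `a` itself.
    return grid_optimize(x -> (a.evaluations += 1; g(x, a)), 0.0:0.5:5.0)
end

params = SearchParams(2.0, 0)
xopt = find_parametric_optimum(g, params)
```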