In-place assignment of scalars vs vectors

Hi all,

I’m writing a performance-critical function that will be called many times within an optimization loop.
I have several questions about how to make it as efficient as possible, but I’ll break them down so that each has its own reply and may be more useful for everyone learning how to write efficient Julia code (like me).

I read that in-place assignment of vectors is very important to avoid spending time allocating the same variable every time a function is called, and I verified this with simple functions.
In C or Fortran I would do this through static variables, but apparently those are not a thing in Julia (yet?), so we have to pass the variable to be modified as an input to the function as well, and do the in-place assignment with lines like

b[:] = a
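
A minimal sketch of the kind of check I ran; the function names here are made up for illustration, and only Base is used (`@allocated` instead of BenchmarkTools):

```julia
# Rebinding/allocating vs writing into preallocated storage.
function copy_alloc(a)
    return a .+ 0.0          # allocates a fresh result vector on every call
end

function copy_inplace!(b, a)
    b .= a                   # writes into b's existing storage instead
    return nothing
end

function measure()
    a = rand(1000)
    b = similar(a)
    copy_inplace!(b, a)      # warm-up call, so compilation is not measured
    return a, b, @allocated copy_inplace!(b, a)
end

a, b, allocs = measure()     # allocs should be 0 bytes once compiled
```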

Now, what if I have to update a scalar as part of the same function?
I tried benchmarking the following function

function f1!(a, b)
    b = a
    return nothing
end

And I got

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     0.001 ns (0.00% GC)
  median time:      0.001 ns (0.00% GC)
  mean time:        0.027 ns (0.00% GC)
  maximum time:     0.100 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

So the function does not allocate as expected, but b is modified only inside the function: outside, where b was created and initialized, b still has the same value as before.

If I create the vector version of this function, I have to explicitly write the in place assignment

function f2!(a, b)
    b[:] = a
    return nothing
end

which works as expected and does not allocate anything.

I cannot use f2! for scalars, because the indexing operation is not defined on them, so I have a function that works correctly(?) on vectors but not on scalars.

Can someone help me understand what’s wrong in anything I wrote?

Thanks

Numbers in Julia are immutable (I assume by “scalar” you specifically mean “number”). So you can’t pass one to a function and have it modified. Doing b = a just assigns to the local variable b (x = y is always assignment, never mutation).
If your function just needs to compute a scalar, there’s probably no point in worrying about the allocation of a single number. Just return it.
If you want to use the same function for modifying both arrays and “mutable numbers”, you can use Ref, which is essentially a 0-dimensional array: f2!(1,Ref(2)). Or (less idiomatically) use an actual 0-dimensional array: f2!(a,fill(2)).
In the body of your function, b[:] = a won’t work for 0-dimensional b. You probably want b .= a, depending on what you’re trying to do. (A Ref can be modified as b[] = a, but that won’t work for vectors).
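
A minimal sketch of the two options above (a 0-dimensional array from `fill` for the broadcasting version, and a `Ref` with empty-bracket indexing), assuming the values are `Float64`:

```julia
function f2!(a, b)
    b .= a          # broadcasting assignment: works for vectors and 0-dim arrays
    return nothing
end

s = fill(0.0)       # 0-dimensional array holding a single Float64
f2!(1.5, s)
s[]                 # read the "scalar" back out with empty-bracket indexing

v = zeros(3)
f2!([1.0, 2.0, 3.0], v)   # the same function mutates a vector in place

r = Ref(0.0)        # the Ref variant is modified with empty-bracket assignment
r[] = 2.5
```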


To “update” a scalar, do:

function f(a, b)
    b = a
    return b
end

and call it with:

b = f(a,b)

This will “update” the scalar from the user perspective. It will not really update the variable in the same place in memory, but that is not a problem at all. Actually, that variable may be a temporary variable in the processor register, or the stack, etc, as in any other language.

For vectors, you can update in place, or use static vectors, which are just like scalars in that sense:

using StaticArrays
a = SVector{3,Float64}(0,0,0)
b = SVector{3,Float64}(1,1,1)
b = f(a,b) # use the same f as above

that “updated” b. (If you come from Fortran, you may be interested in these notes: Immutable variables · JuliaNotes.jl)


Hi yha,

so, I thought that by passing b to the function I was passing a variable, and my code was then in-place assigning a new value to it. I didn’t think of that operation as mutating a number (which is immutable).

I just tried with this function

function f1!(x, t0, state, control, tf)
    t0[:] = view(x, 1)
    state[:] = view(x, 2:8)
    control[:] = view(x, 9:12)
    tf[:] = view(x, 13)
    return nothing
end

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     79.938 ns (0.00% GC)
  median time:      81.584 ns (0.00% GC)
  mean time:        101.097 ns (0.00% GC)
  maximum time:     412.346 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     972

which works with 0 allocations and does what I need it to do, except that I have to create t0 and tf as single-element vectors or they won’t be assigned in place.
This is probably not the most idiomatic solution, though.
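
For reference, a sketch of the Ref-based variant yha suggested, which would avoid the single-element vectors (same hypothetical layout of x as above, made-up function name):

```julia
function unpack!(x, t0, state, control, tf)
    t0[] = x[1]                 # Refs use empty-bracket assignment
    state .= view(x, 2:8)       # in-place copy into the preallocated vector
    control .= view(x, 9:12)
    tf[] = x[13]
    return nothing
end

x = collect(1.0:13.0)
t0 = Ref(0.0); tf = Ref(0.0)
state = zeros(7); control = zeros(4)
unpack!(x, t0, state, control, tf)
```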

No. Just don’t worry about modifying scalars in-place. Trust the compiler to re-use storage locations for scalars as needed.

The equivalent of a static variable, i.e. a persistent local variable, is to wrap the function in a let block that declares your static variables.

let static_var = 0
    global function foo(x)
        return static_var += x
    end
end

julia> foo(3)
3

julia> foo(3)
6

However, it is typically better to avoid these if possible (in any language, including C or Fortran), because this kind of hidden state can make it more difficult to understand the program, and makes the function non-reentrant and non-thread-safe.


Hi leandromartinez98,

your suggestion was the first thing I tried, but it was showing some memory allocations. It is obviously not a major point in terms of performance loss; I was just wondering if (or rather, how) it was possible to write a function that literally did 0 allocations, and to better understand the syntax.

A couple of weeks ago I read the link you kindly provided, but re-reading it now makes much more sense.

Thanks a lot!


If you are seeing memory allocations from scalars then you have some other problem, e.g. you are using global variables or have a type instability. Reading the performance tips is essential.


It is possible that it is only a benchmarking artifact, because the variable is being returned to the REPL. Try enclosing the same test in a wrapping function and the allocations should go away (if there aren’t other problems).
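
For example (a sketch with made-up names, using Base’s `@allocated` instead of BenchmarkTools):

```julia
function f(a, b)
    b = a                    # the scalar "update" pattern from earlier
    return b
end

# Timing calls on untyped globals at the REPL can report spurious
# allocations; wrapping the whole test in a function avoids that.
function run_test()
    a, b = 1.0, 0.0
    b = f(a, b)              # warm-up call, so compilation is not measured
    return b, @allocated f(a, b)
end

b, allocs = run_test()       # allocs should be 0 for a scalar like this
```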


@stevengj I don’t do either. I’m wondering if it is coming from using logical indexing to extract the scalar I need.

function f3!(a, idx)
    b = view(a, idx)
    return nothing
end

@benchmark f3!(a,[false;false;true])
BenchmarkTools.Trial: 
  memory estimate:  192 bytes
  allocs estimate:  2
  --------------
  minimum time:     124.363 ns (0.00% GC)
  median time:      132.780 ns (0.00% GC)
  mean time:        151.210 ns (2.50% GC)
  maximum time:     1.958 μs (88.03% GC)
  --------------
  samples:          10000
  evals/sample:     903
@benchmark f3!(a,[3])
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  1
  --------------
  minimum time:     53.191 ns (0.00% GC)
  median time:      54.407 ns (0.00% GC)
  mean time:        62.398 ns (2.53% GC)
  maximum time:     1.134 μs (91.62% GC)
  --------------
  samples:          10000
  evals/sample:     987

You are allocating the [false;false;true] array on every benchmark iteration. Also, you should interpolate global variables when benchmarking to avoid dynamic dispatch:

julia> @btime f3!($a,$([false;false;true]));
  86.430 ns (1 allocation: 96 bytes)

Also, the view object allocates in this case, but it doesn’t allocate in all cases:

julia> @btime f3!($a,$([3]));
  12.524 ns (0 allocations: 0 bytes)

julia> @btime f3!($a,2);
  1.617 ns (0 allocations: 0 bytes)

What are you actually trying to do?


As I said above, I have a vector that encodes the solution of a nonlinear optimization problem (an optimal control problem, actually). I need to extract the values of initial time, final time, states, controls, etc., in order to perform the necessary computations. Some of those values are scalars (initial time, final time), others are vectors. While trying to write an efficient function, I stumbled upon this different behavior for vectors vs numbers: coming from other languages, I thought a scalar was always allocated, but this is not necessarily true in Julia.

Yes, I think that also may be a possibility: as @stevengj noted I was not interpolating the global variables when benchmarking.

How about something like:

julia> using Parameters # for @unpack

julia> struct Solution
         t0::Float64
         tf::Float64
         vector::Vector{Float64}
       end

julia> function update!(s)
         @unpack t0, tf, vector = s
         t0 = 5
         tf = 10
         vector[1] = 10.
         return Solution(t0,tf,vector)
       end
update! (generic function with 1 method)

julia> s = Solution(0,0,[0.,0.])
Solution(0.0, 0.0, [0.0, 0.0])

julia> s = update!(s)
Solution(5.0, 10.0, [10.0, 0.0])

julia> using BenchmarkTools

julia> @btime update!(v) setup=(v=Solution(0,0,[0.,0.])) evals=1
  26.000 ns (0 allocations: 0 bytes)
Solution(5.0, 10.0, [10.0, 0.0])


This is an excellent suggestion! I was not aware of the Parameters/unpack package.


It is just syntactic sugar for

t0 = s.t0
tf = s.tf
vector = s.vector

It is not necessarily true in other languages either, but in languages where one declares every variable (such as Fortran), we get the feeling that everything is allocated. Whether it is or not depends on what the compiler decides.

Every variable in every language is stored somewhere (edit: unless it is optimized away entirely). But the only storage you need to worry about in performance optimization (to a first approximation) is allocation on the heap. “Scalar” variables (as opposed to variable-sized containers) whose types are known to the compiler can be stored on the stack or even in registers, and you should typically think of their “allocation” as essentially free.

As Knuth wrote:

People who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise the programs they write will be pretty weird.

I would say the same thing about having some idea of what compilers (and language runtimes) actually do. Otherwise you end up writing weird programs because they are “optimized” for a mental performance model that is completely disconnected from reality.


Except those that the compiler can eliminate, like (of course you know that):

julia> function f(x)
         y = 2*x
         z = y
         return z
       end
f (generic function with 1 method)

julia> @code_llvm f(1)
;  @ REPL[2]:1 within `f'
define i64 @julia_f_298(i64 signext %0) {
top:
;  @ REPL[2]:2 within `f'
; ┌ @ int.jl:88 within `*'
   %1 = shl i64 %0, 1
; └
;  @ REPL[2]:4 within `f'
  ret i64 %1
}

julia> function g(x)
         y = 2*x
         return y
       end
g (generic function with 1 method)

julia> @code_llvm g(1)
;  @ REPL[5]:1 within `g'
define i64 @julia_g_319(i64 signext %0) {
top:
;  @ REPL[5]:2 within `g'
; ┌ @ int.jl:88 within `*'
   %1 = shl i64 %0, 1
; └
;  @ REPL[5]:3 within `g'
  ret i64 %1
}

In Fortran one would have done something like

function f(x)
  double precision :: f, x, y, z
  y = 2*x
  z = y
  f = z
end function f

and what happens is that we get the impression that z (and even y) is allocated somewhere, and that is not necessarily true either; the compiler will probably produce the same code as above.

In this example the variable y is indeed computed and stored (in a register).

(Of course, the compiler can completely eliminate variables whose values are not used.)


I was referring to z in the example.

The value of z is stored too. It just refers to the same register as y.

Of course, the compiler may in some cases be able to combine a series of calculations in your code so that intermediate values are never computed or stored directly:

function foo(x)
    y = 2x
    return 2y
end

julia> @code_llvm foo(1)
define i64 @julia_foo_195(i64 signext %0) {
top:
   %1 = shl i64 %0, 2
  ret i64 %1
}

i.e. it elides the intermediate computation of 2x (which is indeed never stored anywhere) and just computes 4x directly.
