Julia in-place modification and memory usage

I’m struggling to understand the in-place modification operator .= in Julia 1.0+ in the following examples:

julia> b = rand(1000, 1000)

julia> @time begin
       Ham = zeros(1000, 1000)
       for i in 1:100
           global Ham .= Ham .+ b
       end
       println(Ham[1, 1])
       end
96.71386850591365
  0.092689 seconds (215 allocations: 7.634 MiB, 1.67% gc time)

julia> @time begin
       Ham = zeros(1000, 1000)
       for i in 1:100
           global Ham .= Ham + b
       end
       println(Ham[1, 1])
       end
96.71386850591365
  0.199746 seconds (416 allocations: 770.580 MiB, 2.23% gc time)

julia> @time begin
       Ham = zeros(1000, 1000)
       for i in 1:100
           global Ham = Ham + b
       end
       println(Ham[1, 1])
       end
96.71386850591365
  0.264974 seconds (215 allocations: 770.577 MiB, 31.54% gc time)

It seems that the second option Ham .= Ham + b still creates a temporary copy of Ham + b, and only by pairing .= with .+ can we avoid allocating the temporary. Why were the implementation/semantics designed this way? What would be wrong with implementing Ham .= Ham + b to have the same effect as Ham .= Ham .+ b?

Another problem I had is

julia> a = 1;
julia> a .= 1 .+ 1
ERROR: MethodError: no method matching copyto!(::Int64, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Tuple{},typeof(+),Tuple{Int64,Int64}})
Closest candidates are:
  copyto!(::AbstractArray, ::Base.Broadcast.Broadcasted{#s617,Axes,F,Args} where Args<:Tuple where F where Axes where #s617<:Base.Broadcast.AbstractArrayStyle{0}) at broadcast.jl:848
  copyto!(::AbstractArray, ::Base.Broadcast.Broadcasted) at broadcast.jl:842
  copyto!(::AbstractArray, ::Any) at abstractarray.jl:720
Stacktrace:
 [1] materialize!(::Int64, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(+),Tuple{Int64,Int64}}) at ./broadcast.jl:801
 [2] top-level scope at REPL[22]:1

It seems to me that .= in-place modification, by definition, needs a to be mutable, for example an array instead of a primitive type like Int here. However, what surprises me is the error message. What is the relationship between .= and copyto!?

Someone else may be better suited to answer you, but . is not “in-place modification”. It’s the mechanism for broadcasting (more familiar to R/MATLAB users as vectorization).

1 Like

The way to think about .= is that it element-wise copies the right-hand side to the LHS. In the case of Ham .= Ham + b, the RHS Ham + b is an array that needs to be allocated somewhere.
Julia has something called “dot fusion”, which means that an expression like f.(a) .+ g.(a) gets turned (roughly; this is a simplification) into map(x -> f(x) + g(x), a), which is why Ham .= Ham .+ b does what you want.
One simple way to do what you want is .+=
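To make the difference concrete, here is a sketch (allocation numbers are illustrative and depend on the Julia version) comparing the fused .+= form against the unfused Ham .= Ham + b form:

```julia
b = rand(1000, 1000)
Ham = zeros(1000, 1000)

add_fused!(H, b)   = (H .+= b; H)     # expands to H .= H .+ b: one fused loop, no temporary
add_unfused!(H, b) = (H .= H + b; H)  # H + b first allocates a full temporary array

add_fused!(Ham, b); add_unfused!(Ham, b)   # warm up (compile) before measuring

fused   = @allocated add_fused!(Ham, b)    # close to zero
unfused = @allocated add_unfused!(Ham, b)  # roughly the size of one 1000×1000 array
```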

4 Likes

I guess the reason for not implementing Ham .= Ham + b to have the same effect as Ham .= Ham .+ b is consistency.

Moreover, nested f.(args...) calls are fused into a single broadcast loop. … Technically, the fusion stops as soon as a “non-dot” function call is encountered; for example, in sin.(sort(cos.(X))) the sin and cos loops cannot be merged because of the intervening sort function.

The presence of + instead of .+ in Ham .= Ham + b stops the dot fusion, and a temporary copy of Ham + b is created for semantic consistency.
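You can actually see where the fusion stops in the lowered code (a sketch; the exact printed form varies between Julia versions):

```julia
# The + in `Ham .= Ham + b` is lowered *outside* the broadcast call,
# so its result must be allocated before the element-wise copy happens:
unfused = Meta.lower(Main, :(Ham .= Ham + b))
# roughly: %1 = Ham + b; materialize!(Ham, broadcasted(identity, %1))

# With .+ the addition is lowered *into* the broadcast call and fuses:
fused = Meta.lower(Main, :(Ham .= Ham .+ b))
# roughly: materialize!(Ham, broadcasted(+, Ham, b))
```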

Even if the first part of the problem can be understood in this way, it remains to understand the relationship between .= and copyto!. The dot operation is equivalent to broadcast; when does copyto! enter the picture?

copyto! is part of how broadcasting is implemented in Julia, see here.

However, the error message for a = 1; a .= 1 .+ 1 is mostly just a really bad one, and I would recommend you do not invest too much time trying to figure out how this particular error message came about if you are only interested in figuring out how to use broadcasting in practice.
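For the curious, here is roughly (a sketch, not the exact internals) where copyto! shows up: a .= x .+ 1 lowers to a materialize! call, and materialize! in turn calls copyto! on an instantiated Broadcasted object:

```julia
using Base.Broadcast: broadcasted, instantiate

a = zeros(3)
x = [1.0, 2.0, 3.0]

# a .= x .+ 1 lowers to materialize!(a, broadcasted(+, x, 1));
# materialize! then calls copyto!(a, instantiate(bc)), roughly like this:
bc = broadcasted(+, x, 1)      # lazy representation of the fused loop
copyto!(a, instantiate(bc))    # executes the loop, writing into a
a == [2.0, 3.0, 4.0]           # true
```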

2 Likes

I guess you are right that c .= a + b is probably almost always a typo and c .= a .+ b is what the programmer actually meant to write, but this phenomenon is specific to +. For other operations, whether or not you add the . on the right hand side can make a difference as to what result you obtain. Example:

julia> a = zeros(2,2)
       b = rand(2,2)

       display(a .= exp(b)) # Compute matrix exponential
       display(a .= exp.(b)) # Apply exp to each entry

2×2 Array{Float64,2}:
 1.46926   1.38004
 0.722834  1.96309
2×2 Array{Float64,2}:
 1.16331  2.53122
 1.6265   1.62189

The point then is that yes, Julia could try to be helpful and automatically replace a .= b + c with a .= b .+ c, but that would severely complicate the rules for when you have to add the dot and in the long run that would almost surely make things worse.

2 Likes

Two questions regarding your reply.

  1. Is .= playing a role in a .= exp(b) that eliminates the temporary array exp(b)? If not, a = exp(b) will simply do, and using .= only adds confusion.

  2. By complicating the rules, do you mean breaking the simple rule of dot fusion I quoted, and my interpretation of + in my case in the reply to @Oscar_Smith?

  1. See below.
  2. Yes.

I did a test and found that a = exp(b) and a .= exp(b) are no different in terms of memory and performance: in both cases the temporary array exp(b) is allocated. Is there a way to overcome this performance bottleneck of allocating the temporary array?

You can always figure out what code is lowered to by

julia> Meta.@lower a .= exp(b)
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope'
1 ─ %1 = exp(b)
│   %2 = Base.broadcasted(Base.identity, %1)
│   %3 = Base.materialize!(a, %2)
└──      return %3
))))

From this, we can tell that the matrix exponential is computed before it enters the broadcast machinery, and the allocation is thus not saved.

It’s important to keep in mind that at the lowering stage the types of a and b could be anything, i.e., they do not have to be your standard arrays. In particular, they could have methods implemented for both exp and Base.broadcasted(Base.identity, ::CustomTypeOfB) that behave differently. As a motivating example, consider a “LazyArray” la for which exp(la) returns Applied(exp, la), and which overloads Base.materialize!(a, ::Applied) to compute the matrix exponential in-place.
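As a toy version of that idea (my own sketch, not the actual LazyArrays API; LazyMatExp is a made-up type), you can hook a custom type into the broadcast machinery so that dest .= LazyMatExp(A) dispatches to your own materialize! method:

```julia
using LinearAlgebra  # provides the matrix exponential exp(::Matrix)

# Hypothetical lazy wrapper: holds the matrix, defers the computation.
struct LazyMatExp{T<:AbstractMatrix}
    A::T
end

# dest .= x lowers to materialize!(dest, broadcasted(identity, x)),
# so intercepting these two calls routes the assignment to our code:
Base.Broadcast.broadcasted(::typeof(identity), l::LazyMatExp) = l
Base.Broadcast.materialize!(dest, l::LazyMatExp) = copyto!(dest, exp(l.A))

dest = ones(2, 2)
dest .= LazyMatExp(zeros(2, 2))  # exp of the zero matrix is the identity
dest ≈ [1.0 0.0; 0.0 1.0]        # true
```

This toy version still allocates inside exp; its point is only to show how a custom type can take control of what .= does, which is the hook an in-place implementation would use.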

1 Like

As pointed out by @baggepinnen, a .= exp(b) is internally translated to the two statements

  1. evaluate tmp = exp(b), and
  2. copy tmp into a.

To avoid the temporary, you would thus have to use a version of exp which takes the output array as an argument to the function call, i.e. something like exp!(a,b) (the ! indicates that the function modifies some of its arguments, in this case a). As far as I know, Julia does not provide such a function.

Nevertheless, writing a .= exp(b) can make sense if your application requires that exp(b) appears in the memory location currently pointed to by a. This could be the case e.g. if a is shared among different threads.
For illustration, consider

julia> a = rand(2,2)
       b = rand(2,2)

       @show pointer(a)
       a .= exp(b)
       @show pointer(a)
       a = exp(b)
       @show pointer(a)
       ;
pointer(a) = Ptr{Float64} @0x00007f5a9909cd00
pointer(a) = Ptr{Float64} @0x00007f5a9909cd00
pointer(a) = Ptr{Float64} @0x00007f5a9909f1c0

and note how the first two @shows print the same memory address but the third prints a different address.
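For completeness, here is a sketch of what such a hand-rolled exp! could look like for symmetric matrices (exp! is a hypothetical name; this version still allocates the eigendecomposition internally, but the result lands in the memory that the output array already points to):

```julia
using LinearAlgebra

# Hypothetical exp!(out, S): matrix exponential of a symmetric matrix S,
# written into the preallocated output array `out`.
function exp!(out, S::Symmetric)
    λ, V = eigen(S)                        # still allocates the factorization
    mul!(out, V * Diagonal(exp.(λ)), V')   # but the result is written into `out`
    return out
end

out = ones(2, 2)
exp!(out, Symmetric(zeros(2, 2)))
out ≈ [1.0 0.0; 0.0 1.0]  # exp of the zero matrix is the identity
```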

1 Like

That sounds awesome! However, I’m in my first week of Julia and have probably misunderstood you:

using LazyArrays, LinearAlgebra
@time begin
Ham = zeros(100, 100)
for i in 1:100
    global Ham .= applied(exp, a)
end
println(Ham[1, 1])
end

8.367715492755122e19
  0.243315 seconds (7.47 k allocations: 145.544 MiB, 1.64% gc time)

still allocates memory for me. Could you give me a complete code example to get me started? Thanks a lot!

This is probably due to the LazyArrays package not having implemented the special method required for exp. They do mention in the README (GitHub - JuliaArrays/LazyArrays.jl: Lazy arrays and linear algebra in Julia) that there are some missing optimizations.

That clarifies, thanks a lot!

I see. Thanks for pointing out LazyArrays; though not yet mature, it is already quite useful!

Another point: in your benchmark you are using global variables, which is very slow, see Performance Tips · The Julia Language

4 Likes

They actually are different, and a .= exp(b) is more expensive performance-wise. That is because it first calculates exp(b) into a temporary array and then copies the values into a. In contrast, a = exp(b) skips this last step: it just calculates exp(b) into a temporary array and then names that array a.

Here’s the difference:

With dot:

temp = exp(b)
a .= temp  # this copies values and takes time

Without dot:

temp = exp(b)
a = temp  # this just binds a name, and has no performance cost

The reason you’re not seeing it in your tests is that matrix exp is an expensive operation that dominates the runtime. If you do something simpler instead, like b^2, you can tell the difference.

Also, as @lungben says, your benchmarking is not very good for the purpose of finding correct memory use and performance. Instead

  1. Wrap your code in functions
  2. Don’t use non-const global variables
  3. The @time macro is poorly suited for microbenchmarks, use BenchmarkTools instead.
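Putting points 1 and 2 together, a benchmark along these lines (a sketch using Base’s @elapsed; @btime from BenchmarkTools gives much more reliable numbers) separates the copy cost from the computation:

```julia
square_into!(a, b) = (a .= b^2; a)  # b^2 goes into a temporary, then is copied into a
square(b)          = b^2            # temporary only; the caller just rebinds a name

b = rand(200, 200)
a = similar(b)
square_into!(a, b); square(b)       # warm up (compile) before timing

t_into   = @elapsed square_into!(a, b)
t_rebind = @elapsed square(b)
# expect t_into ≥ t_rebind: the .= version pays for an extra element-wise copy
```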
4 Likes

Thanks a lot! Your feedback and suggestions are very helpful for beginners like me.

Often I find myself having to use a dozen global variables across several functions. Passing them all in the argument list would make the functions ugly. Is packing them into a module or something and then importing it the recommended way?

The simplest is just to make a NamedTuple out of them. You may also like https://github.com/mauro3/UnPack.jl :

julia> tup = (x=1, y=2);

julia> @unpack x = tup;

julia> x
1

julia> y
ERROR: UndefVarError: y not defined
1 Like