Memory allocation inconsistency (again...)

When I run the following

struct Test
    x :: Array{Float64, 1}
    y :: Array{Float64, 1}
end

function test()
    var = Test([1, 2], [1, 2])
    @time @. var.x = var.y - var.x/var.y^2
    @time @. var.x = var.y - var.x/(var.y*var.y)
    return nothing
end

test()

I get

  0.000001 seconds (2 allocations: 16 bytes)
  0.000000 seconds

What’s the difference? Why does the first calculation (using var.y^2) allocate memory while the second (using (var.y*var.y)) does not? I’m using Julia 1.6.1, and I don't think I saw this difference in previous Julia versions (maybe 1.5). The code above is fairly useless on its own, but I’m seeing something similar in an important piece of code, and it's frustrating because I don’t understand what’s happening.

Please help.


Not sure if this helps, but it might be even weirder:

julia> function test2()
           var = Test([1, 2], [1, 2])
           @time @. var.x = -var.x/var.y^2 + var.y
           @time @. var.x = var.y - var.x/(var.y*var.y)
           return nothing
       end
test2 (generic function with 1 method)

julia> test2()
  0.000000 seconds
  0.000000 seconds

Changing the order of the terms removes the allocation.


It just confirms the oddity. I’ve tried version 1.7.0-beta3 and the result is the same. However, with version 1.5.3 I get no memory allocations:


  0.000000 seconds
  0.000000 seconds

I think this is a bug related to broadcasting or the dot macro, introduced in the 1.6 update… It’s a pity, because this kind of behavior makes the language feel unreliable. :frowning:

These are most certainly benchmarking artifacts. I would suggest putting each broadcasting in a different function, returning a meaningful result, and using @btime.
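A minimal sketch of that suggestion using only Base, assuming the `Test` struct from the first post: each broadcast gets its own function (the names `f1!` and `f2!` are made up for this example), each returns the updated array, and a warm-up call ensures compilation is not counted. BenchmarkTools' `@btime` would give more robust numbers but is a separate package.

```julia
struct Test
    x::Vector{Float64}
    y::Vector{Float64}
end

# Hypothetical helpers: one broadcast per function, returning the result
# so the work is observable rather than discarded.
function f1!(var::Test)
    @. var.x = var.y - var.x / var.y^2
    return var.x
end

function f2!(var::Test)
    @. var.x = var.y - var.x / (var.y * var.y)
    return var.x
end

# Warm-up calls compile both methods, so compilation is not counted below.
f1!(Test([1.0, 2.0], [1.0, 2.0]))
f2!(Test([1.0, 2.0], [1.0, 2.0]))

# Measure fresh inputs so both functions do identical work.
v1 = Test([1.0, 2.0], [1.0, 2.0])
v2 = Test([1.0, 2.0], [1.0, 2.0])
a1 = @allocated f1!(v1)
a2 = @allocated f2!(v2)

println("f1!: ", a1, " bytes, f2!: ", a2, " bytes")
println(v1.x, " ", v2.x)   # both updates compute the same values
```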

No, this is definitely some issue with broadcasting. Since 1.5 was not affected, I would recommend git bisect to find the culprit.


The allocation is caused by the RefValues in the lowered code.
I’m using Julia 1.6.1.
You can try the following code:

struct Test
    x :: Array{Float64, 1}
    y :: Array{Float64, 1}
end

function test1(var)
    @. var.x = var.y - var.x/var.y^2
    return nothing
end
function test2(var)
    @. var.x = var.y - var.x/(var.y*var.y)
    return nothing
end

var = Test([1, 2], [1, 2])
# Run the following lines twice to exclude allocations caused by compilation.
@allocated test1(var)
@allocated test2(var)

test1 allocates 16 bytes and test2 doesn’t allocate at all.
Check the lowered code:

@code_typed test1(var)
│     %240 = Base.getfield(%239, 1, false)::Base.RefValue{typeof(^)}
│            Base.getfield(%240, :x)::typeof(^)
│     %242 = Core.getfield(%239, 2)::Base.Broadcast.Extruded{Vector{Float64}, Tuple{Bool}, Tuple{Int64}}
│     %243 = Core.getfield(%239, 3)::Base.RefValue{Val{2}}

You can see the RefValues here; they cause exactly two allocations of 8 bytes each, while:

@code_typed test2(var)

has no RefValue and only creates immutable values.
So what is happening here?
Edit: the LLVM IR of test1 has two additional jl_gc_pool_alloc calls, while test2 has none (except on the error branch, which is never taken here).
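Where the two RefValues come from: as far as I can tell, `Base.Broadcast.broadcastable` wraps arguments such as functions and `Val` instances in a `Ref` so they behave as scalars, and the lowered `var.y.^2` passes both `^` and `Val(2)` as broadcast arguments. The sketch below uses a plain vector instead of the `Test` struct and only inspects the `Broadcasted` objects; whether those Refs end up heap-allocated depends on inlining, which this does not show.

```julia
using Base.Broadcast: broadcastable, broadcasted, materialize

y = [1.0, 2.0]

# broadcastable wraps Functions and Val instances in a Ref (treated as scalars),
# while arrays pass through unchanged.
@assert broadcastable(^) isa Base.RefValue{typeof(^)}
@assert broadcastable(Val(2)) isa Base.RefValue{Val{2}}
@assert broadcastable(y) === y

# Lowered form of y .^ 2: ^ and Val(2) become Ref-wrapped broadcast arguments.
bc_pow = broadcasted(Base.literal_pow, ^, y, Val(2))
println(typeof(bc_pow.args))

# y .* y: only array arguments, nothing wrapped in a Ref.
bc_mul = broadcasted(*, y, y)
println(typeof(bc_mul.args))

# Both compute the same values.
println(materialize(bc_pow), " ", materialize(bc_mul))
```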


I am just a Julia user and I like the way Julia handles broadcasting. I’ve posted an issue at https://github.com/JuliaLang/julia/issues/. I hope it gets fixed. In the meantime, I’ve gone back to 1.5.3…

Some even more interesting observations:
If I expand the @. manually, then the allocation is gone:

using Base.Broadcast: Broadcasted

function test3(var)
    copyto!(var.x, Broadcasted(-, (var.y, Broadcasted(/, (var.x, Broadcasted(^, (var.y, 2)))))))
    return nothing
end

@allocated test3(var) is zero…
Edit: the lowering above is not exactly correct; see the following post for the correct one.

So, the problem must be with the macro.

Nope, I got the lowered form of the code wrong. It should actually be:

Meta.@lower var.x .= var.y .- var.x./var.y.^2
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope'
1 ─ %1  = Base.getproperty(var, :x)
│   %2  = Base.getproperty(var, :y)
│   %3  = Base.getproperty(var, :x)
│   %4  = Base.getproperty(var, :y)
│   %5  = Core.apply_type(Base.Val, 2)
│   %6  = (%5)()
│   %7  = Base.broadcasted(Base.literal_pow, ^, %4, %6)
│   %8  = Base.broadcasted(/, %3, %7)
│   %9  = Base.broadcasted(-, %2, %8)
│   %10 = Base.materialize!(%1, %9)
└──       return %10
))))
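The same comparison can be made programmatically with `Meta.lower`, which lowers a quoted expression without evaluating it, so `x` and `y` here are just placeholder names that need not be defined. This sketch only checks whether `literal_pow` appears in each lowered form.

```julia
# Lower both versions of the update without evaluating them.
lowered_pow = Meta.lower(@__MODULE__, :(x .= y .- x ./ y .^ 2))
lowered_mul = Meta.lower(@__MODULE__, :(x .= y .- x ./ (y .* y)))

# A literal integer exponent is rewritten to Base.literal_pow(^, y, Val(2)),
# so ^ and Val(2) become extra broadcast arguments; the product form has no such call.
has_pow = occursin("literal_pow", sprint(show, lowered_pow))
has_mul = occursin("literal_pow", sprint(show, lowered_mul))
println(has_pow, " ", has_mul)
```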

Copying the lowered code into a function gives:

function test3(var)
    v7 = Base.Broadcast.broadcasted(Base.literal_pow,^,var.y,Base.Val(2))
    v8 = Base.Broadcast.broadcasted(/,var.x,v7)
    v9 = Base.Broadcast.broadcasted(-,var.y,v8)
    Base.Broadcast.materialize!(var.x,v9)
    return nothing
end

It still allocates 16 bytes…


OK, I think I found the reason: the call to preprocess_args is not inlined.
Redefining these methods with @inline fixes the bug:

import Base.Broadcast.preprocess_args
import Base.Broadcast.preprocess
@inline preprocess_args(dest, args::Tuple) = (Base.Broadcast.preprocess(dest, args[1]), Base.Broadcast.preprocess_args(dest, Base.tail(args))...)
@inline preprocess_args(dest, args::Tuple{Any}) = (Base.Broadcast.preprocess(dest, args[1]),)
@inline preprocess_args(dest, args::Tuple{}) = ()
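If redefining `Base.Broadcast` methods is not an option, a possible workaround (my own suggestion, not one tested in this thread) is to avoid the literal exponent, e.g. with `abs2`, which for real numbers equals `y*y`. The update then lowers without `literal_pow`, so neither `^` nor `Val(2)` has to be wrapped in a `Ref`. I have not checked the allocation count on 1.6.1, so treat the printed number as something to verify; `test1_abs2` is a hypothetical name.

```julia
struct Test
    x::Vector{Float64}
    y::Vector{Float64}
end

# Same update as test1, but abs2(y) replaces y^2: abs2 is the broadcast
# function itself, not an argument, so nothing is wrapped in a Ref.
function test1_abs2(var)
    @. var.x = var.y - var.x / abs2(var.y)
    return nothing
end

var = Test([1, 2], [1, 2])
test1_abs2(var)                        # first call compiles

var = Test([1, 2], [1, 2])
println(@allocated test1_abs2(var))    # allocation count to check on your version
println(var.x)
```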

@aaraujo71 Can you try the following code?
On my computer with Julia 1.6.1:

struct Test
    x :: Array{Float64, 1}
    y :: Array{Float64, 1}
end

function test1(var)
    @. var.x = var.y - var.x/var.y^2
    return nothing
end

var = Test([1, 2], [1, 2])
#compile
@allocated test1(var)
@assert var.x == [0,1.5]

var = Test([1, 2], [1, 2])
@allocated test1(var)
@assert var.x == [0,1.5]

import Base.Broadcast.preprocess_args
import Base.Broadcast.preprocess
@inline preprocess_args(dest, args::Tuple) = (Base.Broadcast.preprocess(dest, args[1]), Base.Broadcast.preprocess_args(dest, Base.tail(args))...)
@inline preprocess_args(dest, args::Tuple{Any}) = (Base.Broadcast.preprocess(dest, args[1]),)
@inline preprocess_args(dest, args::Tuple{}) = ()
var = Test([1, 2], [1, 2])
#compile
@allocated test1(var)
@assert var.x == [0,1.5]

var = Test([1, 2], [1, 2])
@allocated test1(var)
@assert var.x == [0,1.5]

You should get four allocation numbers: the first and third are large because they include compilation, while the second and fourth are 16 (with the uninlined function) and 0 (after the fix).


GitHub issue for those wanting to follow it: "Memory allocation inconsistency in broadcasting" (JuliaLang/julia issue #41565).

I am using 1.5.3 now. However, when I run your code I get nothing. I must be doing something wrong.

My mistake… You need to wrap each @allocated in a println when running the code as a script. I use the REPL, where println is not needed. The code would be:

struct Test
    x :: Array{Float64, 1}
    y :: Array{Float64, 1}
end

function test1(var)
    @. var.x = var.y - var.x/var.y^2
    return nothing
end

# compile
var = Test([1, 2], [1, 2])
println(@allocated test1(var))
@assert var.x == [0,1.5]

# before fix
var = Test([1, 2], [1, 2])
println(@allocated test1(var))
@assert var.x == [0,1.5]

import Base.Broadcast.preprocess_args
import Base.Broadcast.preprocess
@inline preprocess_args(dest, args::Tuple) = (Base.Broadcast.preprocess(dest, args[1]), Base.Broadcast.preprocess_args(dest, Base.tail(args))...)
@inline preprocess_args(dest, args::Tuple{Any}) = (Base.Broadcast.preprocess(dest, args[1]),)
@inline preprocess_args(dest, args::Tuple{}) = ()

# compile
var = Test([1, 2], [1, 2])
println(@allocated test1(var))
@assert var.x == [0,1.5]

# after fix
var = Test([1, 2], [1, 2])
println(@allocated test1(var))
@assert var.x == [0,1.5]

Save the code in a file and run it with Julia. You should get:

$ julia alloc.jl 
23433466
16
10964697
0

@aaraujo71

I get the following:


23345722
16
10964665
0

(version 1.6.1, first run)


1057944 
0
10964057
0

(version 1.6.1, second run)


20864725
0
9192799
0

(version 1.5.3, first run)


1108545 
0
9192191
0

(version 1.5.3, second run)