Common allocation mistakes


Question: for functions that create arrays inside of them, is it better to pass those arrays as arguments instead? For example:

function f1(x)
    sto = similar(x)
    sto = x .+ 1
    return sto
end

function f2(sto, x)
    sto = x .+ 1
    return sto
end

So that in a loop I can reuse sto without Julia having to allocate memory for every iteration:

function g1(x)
    for i in 1:100
        sto = f1(x)
    end
end

function g2(x)
    sto = similar(x)
    for i in 1:100
        f2(sto, x)
    end
    return sto
end

Is that the point?


Yes. Usually what I do is leave this as an option for the user, like this:

function f1(x; aux=similar(x))
    aux .= x .+ 1 # note the broadcasting here!
    return aux
end

Then the function can be called with only x, in which case it allocates aux, or you can pass aux as a keyword argument, in which case it reuses what is given.
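For example, the two call styles would look something like this (buf is just an illustrative name):

x = rand(10)
f1(x)              # allocates aux internally via similar(x)
buf = similar(x)
f1(x; aux = buf)   # reuses buf, so nothing is allocated inside f1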

@amrods note that you need .= to broadcast in place; otherwise the preallocated vector is not used for anything (this is incorrect in the code you posted there).
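For reference, a corrected version of the earlier f2/g2 pattern would look something like this (the ! in the name is just the usual convention for mutating functions):

function f2!(sto, x)
    sto .= x .+ 1   # .= writes into the existing array instead of rebinding the name
    return sto
end

function g2(x)
    sto = similar(x)
    for i in 1:100
        f2!(sto, x)   # sto is reused every iteration; no new array is allocated
    end
    return sto
end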


Oh nice, I didn't know you could reference other arguments in a function definition.

There is also Random.rand!, which will fill the array with random values and also make sure that they have the correct eltype. For example:

using Random

A = rand(Int8, 3, 4);
rand!(A)

rand! will generate random Int8 values, etc.
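A sketch of how this can be used to reuse a buffer inside a loop (buf, the size, and the loop body are just placeholders):

using Random
buf = Vector{Float64}(undef, 1000)
for i in 1:100
    rand!(buf)   # refills buf in place with the correct eltype; no new array per iteration
    # ... use buf ...
end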


These are great tips - thank you for writing this.

One thing I recently learned related to views (and if I get any of this wrong, please someone correct me) is that a view still allocates. For example, if you have a hot inner loop where you use small blocks of a larger vector, you might think (or at least I naively did) that you can avoid heap allocations by using something like the following. I thought I could use the view to do allocation-free access and operations on elements of A.

A = rand(100)
B = rand(3)
function view_test(A, B)
    tmp = @view A[2:4]
    tmp .= tmp .* B
end

This is a silly example, but in my own hot inner loop I was reusing that view in many places and re-creating it each iteration of the loop. I ended up with tons of heap allocations everywhere that view was used (sorry if this is obvious to others!). I eventually found some posts indicating that if you have something like this you should just loop over the small block, which is what I ended up doing. In other places with larger arrays I got around the problem by using copyto! or just wrote loops there as well.

That specifically should not allocate. But you can have allocations with views if the slice is of a non-contiguous memory block (not sure if this is true; what certainly happens is that views of non-contiguous slices are not very performant).
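To check the non-contiguous case, one could compare something like this (no claims about the exact timings):

using BenchmarkTools
A = rand(100)
contig  = @view A[1:3]     # contiguous slice
strided = @view A[1:2:5]   # strided (non-contiguous) slice of the same length
@btime sum($contig)
@btime sum($strided)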

I get this when using @btime: 20.160 ns (1 allocation: 48 bytes)

What I obtain:

julia> @btime view_test($A, $B);
  169.941 ns (0 allocations: 0 bytes)

Perhaps you didn't interpolate the variables?

julia> @btime view_test(A, B);
  195.690 ns (1 allocation: 48 bytes)

I'm amazed at how fast your system is, btw

2 Likes

Which version of Julia are you using?

In older versions (1.0 for sure, then I don't remember), structs with mutable members were heap-allocated. In the current stable and LTS versions (1.7 and 1.6, respectively), views should be stack-allocated, but of course the stack-allocated data includes a reference to a heap-allocated backing array (except, of course, the cases when the backing array itself is of isbits type, e.g. a UnitRange or a StaticArray).
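If I got this right, the backing-array part can be seen directly; the second example is my assumption about the isbits case:

A = rand(100)
v = @view A[2:4]
parent(v) === A   # the view only stores a reference to the heap-allocated parent

r = 1:100
w = @view r[2:4]
isbits(w)         # should be true: the UnitRange parent is itself isbits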

I'm using 1.6.2

Even with variable interpolation?

1.6.3:

julia> @btime view_test($A,$B)
  80.945 ns (0 allocations: 0 bytes)

Yes, with interpolation the allocation is zero.

What does that mean, then? Where is the allocation coming from when you don't interpolate?

It is as if the input variables were non-constant globals. See: julia - When to interpolate in benchmarking expressions - Stack Overflow

(I would also welcome a more in-depth explanation of this interpolation in BenchmarkTools; as a rule of thumb, I understand that it should always be used.)
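A minimal sketch of that rule of thumb (no specific timings implied):

using BenchmarkTools
x = rand(100)     # non-constant global
@btime sum(x)     # x is treated as a global inside the benchmark
@btime sum($x)    # $x interpolates the value, as if it were a local argument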


Ah okay - so the point then is that the view did not create an allocation, it was just a mistake in how I was using @btime.


Just one more follow-up that I just remembered. I think allocations may still have been happening in my particular case. I say this because I would use julia --track-allocation=user and Profile (making sure to clear the profiler before running the function) and I would see memory allocations next to the lines with views. Perhaps it was because they were views of non-contiguous sections of memory, as you said in an earlier post.

I tried to get allocations in those cases here, without success. Thus, I'm not sure if that still applies (or has at any time after 1.5). But it may.

At the same time, I have always found the output of --track-allocation=user confusing, so I usually check the code manually after profiling with it to be sure of the source of the allocations.
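For completeness, the usual workflow with it is roughly the following (my_function and args are placeholders):

# start Julia as: julia --track-allocation=user
using Profile
my_function(args)             # warm-up run, so compilation is not counted
Profile.clear_malloc_data()   # reset the allocation counters
my_function(args)             # only this run's allocations are recorded
# exit Julia and inspect the *.mem files written next to the source files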

One more approach for case 2, to avoid allocations when broadcasting: the package LazyArrays allows you to do this and keep the notation simple. It only requires adding @~ to the broadcast, and you don't need to pre-allocate the output.

Example:

using BenchmarkTools, LazyArrays
x,y = rand(100), rand(100)
tmp = similar(x)

# pre-allocating tmp
function f(x, y, tmp)
    z = 0.0
    for i in 1:10
        @. tmp = x * y
        z += sum(tmp)
    end
    z
end
@btime f($x,$y,$tmp) # 144.617 ns (0 allocations: 0 bytes)

#using LazyArrays
function g(x,y)
  z = 0.
  for i in 1:10
    tmp = @~ x .* y
    z += sum(tmp) 
  end
  z
end

f(x,y,tmp) == g(x,y) # true
@btime g($x,$y)      # 107.120 ns (0 allocations: 0 bytes)

Why does using mapreduce for case 2 (i.e. z += mapreduce(*, norm, x, y)) still allocate? The docs explicitly say that it skips the intermediate array:

    mapreduce(f, op, A::AbstractArray...; dims=:, [init])

Evaluates to the same as `reduce(op, map(f, A...); dims=dims, init=init)`, but is generally faster because the intermediate array is avoided.

I think you have a tricky case there. You are first multiplying the elements of x and y, which would in principle create an intermediate array z .= x .* y and then computing the norm of z.

Computing the norm of an array can be done in-place, but I doubt that the compiler can handle that automatically for an array that is constructed lazily.

But, to be sincere, I don't completely understand what mapreduce is doing there (I think the docs are not clear on what it does when more than one collection is provided). For instance, if that call to mapreduce made sense, I would think that these two would give the same result:

julia> using LinearAlgebra

julia> x = rand(10); y = rand(10);

julia> mapreduce(*, norm, x, y)
0.015890825295371242

julia> norm(xi*yi for (xi,yi) in zip(x,y))
1.0128277332826312

Or, if (that is what I understand from the docs) the function f should be applied to each element of the collections, then I think that should error, as *(x) is not a defined operation. But it works even with a single collection, so I'm lost here:

julia> mapreduce(*, norm, [1,2])
1.0
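For what it's worth, my reading of the docs is that with several collections f is applied elementwise across them and op then reduces those results, so with + as the reducer you get a dot-product-like value:

julia> mapreduce(*, +, [1,2,3], [4,5,6])   # reduce(+, map(*, [1,2,3], [4,5,6]))
32

With norm as op, it is used as a binary reducing operator on the elementwise products, which would explain why the result differs from norm(x .* y).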