Minimizing Allocations - How to? Resources?

loki · August 18, 2020, 10:12pm

Hello, inspired by the conclusion of JuliaCon, I wanted to push myself to write programs that consistently make 0 allocations. One talk gave recommendations on how to achieve this, in addition to referencing the performance guide, but I’m still having some difficulty identifying extraneous allocations. What are the best resources for hunting down allocations besides the usual @time, @btime and @allocated macros? With these tools it’s easy to see that a function allocates excessively, but not necessarily what is causing it.

For example, in the code below Afwd! makes 2 allocations, whereas Aadj_Afwd makes 10. @code_warntype looks good, so it is not clear to me where these extra allocations come from. My next thought would be to use either @views or StaticArrays.jl. However, I don’t believe a view would speed up Small_Matrix[:], and StaticArrays.jl is not advisable given the size of Big_Matrix.

Any suggestions would be greatly appreciated!

using LinearAlgebra, BenchmarkTools   # BenchmarkTools provides @btime

function Afwd!(out1,Small_Matrix,Big_Matrix)
    mul!(out1,Big_Matrix,Small_Matrix[:])
end

function Aadj_Afwd(Small_Matrix, Big_Matrix, m)
    out1 = similar(Small_Matrix[:]);
    mul!(out1,Big_Matrix,Small_Matrix[:])

    out2 = similar(out1)
    mul!(out2,transpose(Big_Matrix),out1)

    output = reshape(out2,m)
    return output
end

m = (64,64);
Small_Matrix= randn(m) + 1im.*randn(m);
Big_Matrix= randn(m.^2) + 1im.*randn(m.^2);

out1 = similar(Small_Matrix[:]);
@btime Afwd!(out1,Small_Matrix,Big_Matrix);
@btime Aadj_Afwd(Small_Matrix,Big_Matrix,m);

10.987 ms (2 allocations: 64.08 KiB)
22.268 ms (10 allocations: 256.41 KiB)

rdeits · August 18, 2020, 10:27pm

The most obvious source of allocations is Small_Matrix[:], which makes a copy. That copy has to be allocated, thus you get allocations.
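A quick way to confirm that `A[:]` is a copy is to mutate the result and check whether the original changes. In this small sketch the matrix `A` is a hypothetical stand-in for `Small_Matrix`; it compares the copy with a view, which shares memory with its parent:

```julia
A = [1.0 3.0; 2.0 4.0]    # small stand-in for Small_Matrix

c = A[:]                   # indexing with a colon allocates a new Vector (a copy)
c[1] = 100.0               # modifying the copy...
println(A[1, 1])           # ...leaves A unchanged: prints 1.0

v = @view A[:]             # a view reuses A's memory instead of copying it
v[1] = 100.0               # writing through the view...
println(A[1, 1])           # ...modifies A itself: prints 100.0
```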

You said you don’t believe a view would speed up Small_Matrix[:]. Why not? This seems like a pretty clear use case for one.

First of all, let’s check your Afwd! function, making sure to use $ to interpolate the arguments, as the BenchmarkTools manual recommends:

julia> @btime Afwd!($out1, $Small_Matrix, $Big_Matrix);
  9.525 ms (2 allocations: 64.08 KiB)

And now let’s replace the copy with a view:

julia> function Afwd_view!(out1, Small_Matrix, Big_Matrix)
           mul!(out1, Big_Matrix, @view Small_Matrix[:])
       end
Afwd_view! (generic function with 1 method)

julia> @btime Afwd_view!($out1, $Small_Matrix, $Big_Matrix);
  9.323 ms (2 allocations: 80 bytes)

It’s (a bit) faster, and the memory allocated drops from 64.08 KiB to 80 bytes, roughly 800x less. That’s a pretty nice improvement for such an easy change.

I’m actually still pretty surprised that there is any allocation at all. I had assumed that Julia 1.5 would eliminate the view allocation here, but it still seems to be present.

Does anyone know why the @view in Afwd_view! still allocates?

As for resources beyond @time and @btime: have you tried the --track-allocation command-line option? See the Profiling chapter of the Julia manual for more.

In your particular case, though, the obvious places to start are the copies created by foo[:] and the new arrays created with similar. The copies can be replaced by views to reduce or eliminate allocations. For new arrays like out1 and out2, you will have to decide where their memory lives.

I like the pattern of creating a Workspace struct to hold those buffers and letting the user pass an instance of that workspace into the function. You can then also write a version of the function that allocates its own workspace, which will be slower but convenient for development and testing.

For example:

struct Workspace
  x::Vector{Float64}
end

function do_work(inputs)
  workspace = Workspace(similar(inputs))
  do_work!(workspace, inputs)
end

function do_work!(workspace::Workspace, inputs)
  # Do something with `inputs`, using `workspace.x` as your local workspace
end

Now you can create a single Workspace and re-use it over and over by calling do_work!, avoiding any new memory allocation. Or you can just use do_work() to test your algorithm without having to worry about managing the workspace.
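Going back to the track-allocation suggestion, here is a minimal sketch of that workflow applied to the question’s Afwd!. The file name `track.jl` and the small matrix sizes are assumptions for illustration. Run it as `julia --track-allocation=user track.jl`. When Julia exits, it should write a `.mem` file next to the script listing the bytes allocated by each source line, and the line containing `Small_Matrix[:]` should stand out. The first call is made only to trigger compilation, whose allocations `Profile.clear_malloc_data()` then discards.

```julia
# track.jl: run with `julia --track-allocation=user track.jl`
using LinearAlgebra, Profile

function Afwd!(out1, Small_Matrix, Big_Matrix)
    mul!(out1, Big_Matrix, Small_Matrix[:])   # the copy made by `[:]` shows up in the .mem file
end

m = (8, 8)                                    # small sizes, just for illustration
Small_Matrix = randn(ComplexF64, m)
Big_Matrix = randn(ComplexF64, prod(m), prod(m))
out1 = similar(vec(Small_Matrix))

Afwd!(out1, Small_Matrix, Big_Matrix)         # first call: compilation also allocates
Profile.clear_malloc_data()                   # discard everything recorded so far
Afwd!(out1, Small_Matrix, Big_Matrix)         # only this call's allocations are kept
```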

Azamat · August 18, 2020, 11:06pm

You can also do reshape(Small_Matrix, :) instead of @view Small_Matrix[:], which I find more idiomatic.

loki · August 18, 2020, 11:44pm

@rdeits Thank you for all of those suggestions and pointing out my missing $.

It seems I need to spend more time working on my understanding of views. Could you explain the different use cases for @view compared to @views?

Regarding your workspace buffer example, I am a little confused by the Workspace struct pattern you described. If I understand correctly, the struct holds the memory that each call to similar would otherwise allocate, to avoid extraneous allocations. Then, in the do_work function, the workspace is created according to the inputs I want to perform calculations on. Lastly, do_work! performs the calculations and updates the values in the workspace. Is this the correct way of looking at it? With this understanding, I tried to apply it to the second example, since that one used similar. However, since I included the output in the struct, it needed to be mutable. I can’t help but feel that I missed something in your explanation.

mutable struct Workspace
    out1::Array{Complex{Float64},1}
    out2::Array{Complex{Float64},1}
    out3::Array{Complex{Float64},2}
end

function do_work(Mat1,Mat2,m)
    workspace = Workspace(similar(Mat1[:]),similar(Mat1[:]),similar(Mat1))
    do_work!(workspace, Mat1, Mat2, m)
end

function do_work!(workspace::Workspace, Mat1, Mat2, m)
    mul!(workspace.out1, Mat2, @view Mat1[:])
    mul!(workspace.out2,transpose(Mat2),workspace.out1)
    workspace.out3 = reshape(workspace.out2,m)
end

@btime do_work($Small_Matrix, $Big_Matrix, $m)
22.281 ms (16 allocations: 320.61 KiB)

Also, how do you change a struct after it has been defined? I’m using VS Code on Windows with Julia 1.5.

DNF · August 19, 2020, 12:02am

For flattening a matrix, this is what the vec function is for: vec(Mat1) is preferable to both @view Mat1[:] and reshape(Mat1, :).
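For illustration, here is a small sketch (the names `A`, `B` and `Afwd_vec!` are hypothetical) showing that `vec` and `reshape(A, :)` both return a plain `Vector` that shares memory with the matrix. This lets the question’s Afwd! drop the copy without wrapping anything in a view:

```julia
using LinearAlgebra

A = randn(ComplexF64, 4, 4)
B = randn(ComplexF64, 16, 16)

v = vec(A)                 # a Vector{ComplexF64} backed by A's data, no element copy
r = reshape(A, :)          # the same thing, spelled with reshape
v[1] = 0                   # writing through v...
@show A[1, 1] == 0         # ...is visible in A

# The question's Afwd!, with the copy `Small_Matrix[:]` replaced by `vec`
function Afwd_vec!(out1, Small_Matrix, Big_Matrix)
    mul!(out1, Big_Matrix, vec(Small_Matrix))
end

out1 = Vector{ComplexF64}(undef, 16)
Afwd_vec!(out1, A, B)
```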

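As for similar(Mat1[:]): each call first copies Mat1 and then allocates the new array, so your two similar(Mat1[:]) calls make four allocations when you only need two.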
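Putting the suggestions in this thread together, a workspace version of Aadj_Afwd might look like the sketch below. The struct name `AadjWorkspace` is hypothetical, and the design is an assumption rather than anyone’s posted code. The struct can stay immutable because `mul!` overwrites the buffers in place (only the field bindings are fixed, not the array contents). `vec` replaces `Mat1[:]`, and no `out3` field is needed because `reshape` of `out2` returns an array sharing `out2`’s memory. Like the question, it assumes Big_Matrix is square.

```julia
using LinearAlgebra

# Immutable is enough: mul! overwrites the arrays in place and never rebinds the fields.
struct AadjWorkspace
    out1::Vector{ComplexF64}
    out2::Vector{ComplexF64}
end

# vec(Mat1) avoids copying Mat1's elements before allocating each buffer.
AadjWorkspace(Mat1::AbstractMatrix) = AadjWorkspace(similar(vec(Mat1)), similar(vec(Mat1)))

function do_work!(ws::AadjWorkspace, Mat1, Mat2, m)
    mul!(ws.out1, Mat2, vec(Mat1))              # out1 = Mat2 * vec(Mat1)
    mul!(ws.out2, transpose(Mat2), ws.out1)     # out2 = transpose(Mat2) * out1
    return reshape(ws.out2, m)                  # shares memory with ws.out2
end

# Convenience wrapper for testing: allocates a fresh workspace on every call.
do_work(Mat1, Mat2, m) = do_work!(AadjWorkspace(Mat1), Mat1, Mat2, m)
```

Create one `AadjWorkspace` up front and call `do_work!` repeatedly. Because the returned matrix shares memory with `ws.out2`, the next `do_work!` call on the same workspace overwrites it, so copy it if you need to keep it.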

rdeits · August 19, 2020, 2:35am

@views is just a shortcut. It replaces every instance of the foo[bar] syntax with @view foo[bar] within a given block of code. The following are all equivalent:

@view x[1:10]
view(x, 1:10)
@views begin
   x[1:10]
end

@views is useful if you have a bunch of code (like the body of a function) and you want to replace all of its slices with views.
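As a small illustration (the function names are hypothetical), `@views` can also be placed in front of a whole function definition, so every slice in the body becomes a view. Note the semantic change this brings: writing into a slice now writes into the parent array.

```julia
# Every x[...] slice inside this function body is a view, not a copy.
@views function half_sums(x)
    n = length(x) ÷ 2
    return sum(x[1:n]), sum(x[n+1:end])
end

@views function zero_first_half!(x)
    n = length(x) ÷ 2
    h = x[1:n]        # a view, because of @views
    h .= 0            # so this writes into x itself
    return x
end

x = collect(1.0:6.0)
half_sums(x)          # (6.0, 15.0)
zero_first_half!(x)   # x is now [0.0, 0.0, 0.0, 4.0, 5.0, 6.0]
```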

As for the workspace concept, I should have been more clear: I’m assuming that you are doing some kind of work repeatedly. In that case, you can re-use the same workspace over and over, allocating it only once instead of every iteration of your loop.

For example, you might do:

function main()
  work = Workspace(...)
  for i in 1:num_iterations
    do_work!(work, ...)
  end
end

In this way, you can do num_iterations calls to do_work!() with only a single workspace allocation.
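A runnable version of that loop might look like the following. `LoopWorkspace`, its single buffer and the repeated matrix-vector step are hypothetical stand-ins for your real work. The state and the workspace are each allocated once before the loop, and every iteration overwrites them in place:

```julia
using LinearAlgebra

# Hypothetical workspace holding one scratch buffer.
struct LoopWorkspace
    y::Vector{Float64}
end

# One step: x = A * x, using work.y as temporary storage
# (mul!'s output must not alias its inputs, so it cannot write into x directly).
function do_work!(work::LoopWorkspace, A, x)
    mul!(work.y, A, x)      # y = A * x, written into the existing buffer
    copyto!(x, work.y)      # x = y, in place
    return x
end

function main(A, x0, num_iterations)
    x = copy(x0)                      # state vector, allocated once
    work = LoopWorkspace(similar(x))  # workspace, allocated once
    for _ in 1:num_iterations
        do_work!(work, A, x)          # no new arrays are created per iteration
    end
    return x
end

main([0.5 0.0; 0.0 2.0], [1.0, 1.0], 3)   # returns [0.125, 8.0]
```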

The helper function do_work that constructs a new Workspace is just for testing or interactive use, since it doesn’t actually save you any memory allocation.

As for changing a struct after it has been defined: unfortunately, you can’t. You’ll need to restart your Julia session (there should be an option for that in the VS Code Julia menu). It’s annoying, and there’s some discussion about how to improve the situation in Revise.jl issue #18 (“redefining struct”) on GitHub.
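One common workaround, not mentioned in the reply above and sketched here as an assumption about your workflow, is to keep the struct inside a module while you are still iterating on it. Re-evaluating a module definition replaces the whole module (Julia prints a warning that it is replacing it), so the struct can be redefined without a restart. The module name `Scratch` is hypothetical. Methods defined outside the module against the old type will not pick up the new one, so it is best to keep the functions that use the struct in the same module.

```julia
module Scratch
struct Workspace
    x::Vector{Float64}
end
end

# Later, after editing the struct, evaluate the whole module block again:
module Scratch
struct Workspace
    x::Vector{Float64}
    y::Vector{Float64}          # new field
end
end

w = Scratch.Workspace(zeros(2), ones(2))   # uses the new definition
```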

Related topics

Topic | Category | Replies | Views | Activity
--- | --- | --- | --- | ---
Help to reduce number of allocations | General Usage | 23 | 5459 | January 6, 2019
Unnecessary allocations happening at big StaticArray expression | Performance | 3 | 554 | December 27, 2018
Optimizing Linear Algebra Code? | Performance (linearalgebra) | 12 | 1384 | April 9, 2021
Allocations when constructing a matrix from columns | Performance (memory-allocation) | 35 | 1736 | April 1, 2020
Best practices to reduce allocations in Julia | Performance | 6 | 1533 | September 23, 2023