Can I decrease allocations here?

fredrikpaues · March 17, 2022, 10:54am

I’m trying to further optimize performance-critical function that locates values on a grid. Here I have tried using that the values will be sorted. And the bulk of the code is fast and has only 3 allocations, but then the final statement hits me with 2131 allocations Could you help me minimize them?

function locate_vector(
    values::AbstractVector{<:Real},
    gridpoints;
    locate_below::Bool=false,
    locate_above::Bool=false,
)
    function incrementalsearch(low, value, gridpoints)
        while value >= gridpoints[low + 1]
            low += 1
        end

        return low
    end

    indices = ones(Integer, size(values))
    weights = Vector{Tuple}(undef, size(values))

    low = 1
    high = length(gridpoints) - 1

    for ix in eachindex(values)
        if values[ix] <= gridpoints[1]
            # indices[ix] = low = 1  # No need as indices and low are initialized to 1

            if !locate_below
                weights[ix] = (1.0, 0.0)
            end
        elseif values[ix] >= gridpoints[end]
            indices[ix:end] .= length(gridpoints) - 1

            if !locate_above
                weights[ix:end] .= repeat((0.0, 1.0), length(weights[ix:end]))
            end

            break
        elseif low == high
            indices[ix] = high
        else
            indices[ix] = low = incrementalsearch(low, values[ix], gridpoints)
        end
    end

    no_weights = filter(ix -> !isassigned(weights, ix), 1:length(weights))
    weights[no_weights] .= map(  # Crazy number of allocations
        ix -> (
            gridpoints[indices[ix] .+ 1] .- values[ix],
            values[ix] .- gridpoints[indices[ix]]
        ) ./ (
            gridpoints[indices[ix] .+ 1]
            .- gridpoints[indices[ix]]
        ),
        no_weights,
    )

    return indices, weights
end

using BenchmarkTools
gridpoints = collect(range(0.1, 0.9, 100));
values = sort(rand(101));
@btime locate_vector(values, gridpoints; locate_above=true);
# 256.400 μs (2134 allocations: 63.00 KiB)

goerch · March 17, 2022, 11:04am

You could use a comprehension (lazy) instead of map (eager constructing a vector) like so

    weights[no_weights] .= ((
            gridpoints[indices[ix] .+ 1] .- values[ix],
            values[ix] .- gridpoints[indices[ix]]
        ) ./ (
            gridpoints[indices[ix] .+ 1]
            .- gridpoints[indices[ix]]
        ) for ix in no_weights)

fredrikpaues · March 17, 2022, 11:12am

That helped, but only by very, very little—allocations dropped by 6

goerch · March 17, 2022, 11:25am

Confirmed (checked that with profiler). Go back to ‘map’ and try

    weights = Vector{Tuple{Float64, Float64}}(undef, size(values))

This looks slightly better here. (Vector{Tuple} means implicitly Vector{Tuple{Any...}} and this is no good).

goerch · March 17, 2022, 11:31am

Afterwards this

    for ix in no_weights
        weights[ix] =
            (
                gridpoints[indices[ix] .+ 1] .- values[ix],
                values[ix] .- gridpoints[indices[ix]]
            ) ./ (
                gridpoints[indices[ix] .+ 1]
                .- gridpoints[indices[ix]]
            )
    end

eliminates the last dynamic dispatch yielding

  770.635 ns (5 allocations: 3.61 KiB)

fredrikpaues · March 17, 2022, 11:53am

Great! Thank you!

fredrikpaues · March 17, 2022, 11:54am

Noob question but how did you check that with the profiler?

goerch · March 17, 2022, 12:01pm

I’m still using Atom/Juno for this with a script like

using Profile
Profile.clear()
gridpoints = collect(range(0.1, 0.9, 100));
values = sort(rand(101));
locate_vector(values, gridpoints; locate_above=true);
@profile for i in 1:1000; locate_vector(values, gridpoints; locate_above=true); end
# Profile.print()
Juno.profiler()

Here is before

and here is after

Color coding: yellow is dynamic dispatch, red is allocation…

DNF · March 17, 2022, 1:29pm

I understand that you fixed Vector{Tuple} to get a concrete element type, right? But did you also fix indices? Integer is an abstract type, you should use

indices = ones(Int, size(values))

instead.

DNF · March 17, 2022, 2:01pm

Wait a minute! Did you check the correctness of the results?

Changing from this:

weights = Vector{Tuple}(undef, size(values))

to

weights = Vector{Tuple{Float64, Float64}}(undef, size(values))

may have some unintended consequences. Later in the code @fredrikpaues uses !isassigned(weights, ix), but that makes no sense for the latter, because of this:

1.7.2> v1 = Vector{Tuple}(undef, 3)
3-element Vector{Tuple}:
 #undef
 #undef
 #undef

1.7.2> isassigned(v1, 1)
false

1.7.2> v2 = Vector{NTuple{2,Float64}}(undef, 3)
3-element Vector{Tuple{Float64, Float64}}:
 (2.5e-323, 3.5e-323)
 (4.0e-323, 4.4e-323)
 (5.0e-323, 5.4e-323)

1.7.2> isassigned(v2, 1)
true

There is no #undef value for bitsvalues like these, and isassigned is always true.

I’m not 100% sure if the last code version still has isassigned because the image was very small and blurry, but if so, the algorithm should be redesigned, without relying on isassigned, which is not a great idea.

One way is to initialize like this:

weights = fill((NaN, NaN), size(values))

and check for isnan instead of isassigned. Also, I wouldn’t create no_weights first, and then iterate over them. Just loop over weights and check for NaN in each iteration.

After doing this, there are only 2 allocations, one for indices and one for weights, as expected.

(Also, what’s up with all the broadcasting dots in that loop, are they needed?)

DNF · March 17, 2022, 2:45pm

Oh, and this one:

This needlessly creates, not just one, but two arrays on the right side. Firstly, length(weights[ix:end]) allocates a whole array, just to measure its length. And then repeat creates another one.

But there’s no reason to do this. Just assign the same value to all elements on the left. You’ll have to wrap the right hand side in a Ref though, to avoid broadcasting:

weights[ix:end] .= Ref((0.0, 1.0))

goerch · March 17, 2022, 2:56pm

@DNF: thanks for the code review. You are right on all counts, especially the unintended change in results! @tbeason : thanks!

For future reference and @fredrikpaues : here are the missing pieces

function locate_vector1(
    values::AbstractVector{<:Real},
    gridpoints;
    locate_below::Bool=false,
    locate_above::Bool=false,
)
    function incrementalsearch(low, value, gridpoints)
        while value >= gridpoints[low + 1]
            low += 1
        end

        return low
    end

    indices = ones(Integer, size(values))
    weights = Vector{Tuple}(undef, size(values))

    low = 1
    high = length(gridpoints) - 1

    for ix in eachindex(values)
        if values[ix] <= gridpoints[1]
            # indices[ix] = low = 1  # No need as indices and low are initialized to 1

            if !locate_below
                weights[ix] = (1.0, 0.0)
            end
        elseif values[ix] >= gridpoints[end]
            indices[ix:end] .= length(gridpoints) - 1

            if !locate_above
                weights[ix:end] .= Ref((0.0, 1.0))
            end

            break
        elseif low == high
            indices[ix] = high
        else
            indices[ix] = low = incrementalsearch(low, values[ix], gridpoints)
        end
    end

    no_weights = filter(ix -> !isassigned(weights, ix), 1:length(weights))
    weights[no_weights] .= map(  # Crazy number of allocations
        ix -> (
            gridpoints[indices[ix] .+ 1] .- values[ix],
            values[ix] .- gridpoints[indices[ix]]
        ) ./ (
            gridpoints[indices[ix] .+ 1]
            .- gridpoints[indices[ix]]
        ),
        no_weights,
    )

    return indices, weights
end

function locate_vector2(
    values::AbstractVector{<:Real},
    gridpoints;
    locate_below::Bool=false,
    locate_above::Bool=false,
)
    function incrementalsearch(low, value, gridpoints)
        while value >= gridpoints[low + 1]
            low += 1
        end

        return low
    end

    indices = ones(Int, size(values))
    weights = Vector{Union{Nothing, Tuple{Float64, Float64}}}(nothing, size(values))

    low = 1
    high = length(gridpoints) - 1

    for ix in eachindex(values)
        if values[ix] <= gridpoints[1]
            # indices[ix] = low = 1  # No need as indices and low are initialized to 1

            if !locate_below
                weights[ix] = (1.0, 0.0)
            end
        elseif values[ix] >= gridpoints[end]
            indices[ix:end] .= length(gridpoints) - 1

            if !locate_above
                weights[ix:end] .= Ref((0.0, 1.0))
            end

            break
        elseif low == high
            indices[ix] = high
        else
            indices[ix] = low = incrementalsearch(low, values[ix], gridpoints)
        end
    end

    no_weights = filter(ix -> weights[ix] === nothing, 1:length(weights))
    for ix in no_weights
        weights[ix] =
            (
                gridpoints[indices[ix] .+ 1] .- values[ix],
                values[ix] .- gridpoints[indices[ix]]
            ) ./ (
                gridpoints[indices[ix] .+ 1]
                .- gridpoints[indices[ix]]
            )
    end

    return indices, weights
end

gridpoints = collect(range(0.1, 0.9, 100));
values = sort(rand(101));
result1 = locate_vector1(values, gridpoints; locate_above=false);
result2 = locate_vector2(values, gridpoints; locate_above=false);
@assert(result1 == result2)
result1 = locate_vector1(values, gridpoints; locate_above=true);
result2 = locate_vector2(values, gridpoints; locate_above=true);
@assert(result1 == result2)

using BenchmarkTools
gridpoints = collect(range(0.1, 0.9, 100));
values = sort(rand(101));
@btime locate_vector1($values, $gridpoints; locate_above=false); 
@btime locate_vector1($values, $gridpoints; locate_above=true); 
@btime locate_vector2($values, $gridpoints; locate_above=false); 
@btime locate_vector2($values, $gridpoints; locate_above=true);

yielding

  110.000 μs (1429 allocations: 44.39 KiB)
  144.400 μs (1627 allocations: 50.19 KiB)
  873.077 ns (4 allocations: 4.17 KiB)
  892.000 ns (4 allocations: 4.23 KiB)

tbeason · March 17, 2022, 3:05pm

Don’t forget to interpolate values in @btime

julia> @btime locate_vector2(values, gridpoints; locate_above=true);
  677.914 ns (5 allocations: 4.34 KiB)

julia> @btime locate_vector2($values, $gridpoints; locate_above=$true);
  633.136 ns (4 allocations: 4.31 KiB)

DNF · March 17, 2022, 3:10pm

Actually, here’s a version with just 2 allocations:

function locate_vector3(
    values::AbstractVector{<:Real},
    gridpoints;
    locate_below::Bool=false,
    locate_above::Bool=false,
)
    function incrementalsearch(low, value, gridpoints)
        while value >= gridpoints[low + 1]
            low += 1
        end
        return low
    end

    indices = fill(1, size(values))
    weights = fill((NaN, NaN), size(values))

    low = 1
    high = length(gridpoints) - 1

    for ix in eachindex(values)
        if values[ix] <= gridpoints[1]
            if !locate_below
                weights[ix] = (1.0, 0.0)
            end
        elseif values[ix] >= gridpoints[end]
            indices[ix:end] .= length(gridpoints) - 1

            if !locate_above
                weights[ix:end] .= Ref((0.0, 1.0))
            end

            break
        elseif low == high
            indices[ix] = high
        else
            indices[ix] = low = incrementalsearch(low, values[ix], gridpoints)
        end
    end

    for i in eachindex(weights)
        w = weights[i]
        if isnan(w[1])
            grid1 = gridpoints[indices[i]]
            grid2 = gridpoints[indices[i]+1]
            val = values[i]
            weights[i] = ((grid2 - val) / (grid2 - grid1),
                          (val - grid1) / (grid2 - grid1))
        end
    end
    return indices, weights
end

1.7.2> @btime locate_vector3($values, $gridpoints; locate_above=true);
  575.275 ns (2 allocations: 2.64 KiB)

DNF · March 17, 2022, 3:18pm

By the way:

Are both gridpoints and values expected to be sorted, and even possibly regularly spaced? You don’t seem to exploit that. Your search function is woefully suboptimal.

fredrikpaues · March 17, 2022, 3:27pm

Thank you! Both are sorted. None regularly spaced. How is the search function suboptimal? (I have another search function which uses binary search, but I just included one of them here.)

DNF · March 17, 2022, 3:38pm

I would have thought that binary search was better, but looking closer, it seems the current version might not be too bad.

goerch · March 17, 2022, 4:08pm

I imagine some kind of merge variation could help?

Along the lines of

    iv = 1
    ig = 1
    while iv <= length(values) && ig <= length(gridpoints) 
        if values[iv] < gridpoints[ig]
            # do something
            iv += 1
        else if values[iv] == gridpoints[ig]
            # do something
            iv += 1
            ig += 1
        else
            # do something
            ig += 1
        end
    end
    while iv <= length(values)
        # do something
        iv += 1
    end

fredrikpaues · March 25, 2022, 2:40pm

By the way, I just learned that eachindex can take multiple inputs. Is there a good reason (not) to write èachindex(values, indices, weights) here? I’m thinking that it might be a good idea, just in case values isn’t 1-based.

Topic		Replies	Views
Weights and indices for linear interpolation/extrapolation Performance question , interpolations	15	1845	March 12, 2022
Undesired memory allocation during interpolation Performance memory-allocation	33	1382	December 8, 2021
Vector notation, intermediate value and allocations New to Julia	8	463	June 16, 2022
Too many allocations? Performance	23	1676	February 18, 2021
Understanding assignment without allocation New to Julia question	14	1707	November 12, 2019

Can I decrease allocations here?

Related topics