Reducing the allocations of my function

Hi, I have a function that can help me find all the qualified numbers that are within certain range, but it just ocupies a lot of allocations. Since I need to repeat this process thousands of time. I hope I can reduce the allocation to almost zero. Any suggestions are welcome, here is my program

function RandV(v, vstep, Num)
    vcount = 0
    while vcount < Num
        vrand = rand(-round(v / vstep):round(v / vstep), 100000, 3)
        index = findall(abs.(sqrt.(vrand[:, 1] .^ 2 .+ vrand[:, 2] .^ 2 .+ vrand[:, 3] .^ 2) .* vstep .- v) .< 0.01 * v)
        vcount = length(index)
        if vcount > Num
            return vrand[index, :] .* vstep
            break
        end
    end
end

@time RandV(1.0, 0.01, 10)  # 23 allocations in my computer

When you do rand[:, 1] it allocates. You need to use a view instead. This can easily be done using the @views macro.

Also, you are allocating memory on each part of the loop when you use the rand function. One way to remove the allocations is to manually write out the loop you intend to use, so that it does not need all of the memory.

I can try that, but it’s not the dominant part that occupy allocations, the main allocations are from

index = findall(abs.(sqrt.(vrand[:, 1] .^ 2 .+ vrand[:, 2] .^ 2 .+ vrand[:, 3] .^ 2) .* vstep .- v) .< 0.01 * v)

which I don’t have better way to solve it

function RandV(v, vstep, Num)
    vcount = 0
    while vcount < Num
        vrand = rand(-round(v / vstep):round(v / vstep), 100000, 3)
        
        vrand01 = @view vrand[:, 1]
        vrand02 = @view vrand[:, 2]
        vrand03 = @view vrand[:, 3]

        index = findall(abs.(sqrt.(vrand01 .^ 2 .+ vrand02 .^ 2 .+ vrand03 .^ 2) .* vstep .- v) .< 0.01 * v)
        vcount = length(index)
        if vcount > Num
            return vrand[index, :] .* vstep
            break
        end
    end
end

@time rel = RandV(1.0, 0.01, 10)   # 11 allocations

use this can reduce the allocations to 11, any other suggestions, thanks~

I’ve tried to recreate this, removing as many allocations as possible:

using Random
function RandV!(vrand, v, vstep, Num, init_vrand::Val{T}=Val(true)) where {T}
    (N, d) = size(vrand)
    if T
        rand!(vrand, -round(v / vstep):round(v / vstep))
    end
    while true
        vcount = 0
        for i in 1:N
            mag = vrand[i, 1] * vrand[i, 1] + vrand[i, 2] * vrand[i, 2] + vrand[i, 3] * vrand[i, 3]
            if abs(sqrt(mag) * vstep - v) < 0.01 * v
                vcount += 1
                vrand[vcount, :] .= view(vrand, i, :) 
            end
        end
        if vcount >= Num # Changed to >= as this made a bit more sense to me
            selection_vrand = view(vrand, 1:vcount)
            selection_vrand .*= vstep
            return selection_vrand
        end
        # Recreate the random numbers in the preallocated array
        rand!(vrand, -round(v / vstep):round(v / vstep))
    end
end

This will take in a preallocated array, and you can wrap this with your normal function:

function RandV(v, vstep, Num)
    N = 100000
    d = 3
    vrand = rand(-round(v / vstep):round(v / vstep), N, d)
    return RandV!(vrand, v, vstep, Num, Val(false))
end

EDIT: On my machine:

julia> @time RandV(v, vstep, Num);
  0.003289 seconds (5 allocations: 2.289 MiB)

julia> @time RandVold(v, vstep, Num);
  0.003476 seconds (23 allocations: 4.758 MiB)

You can probably remove some of the additional allocations, there should only really be 1 for the vrand array.

Maybe you can say a bit more, about what you are doing with this function? Seems something along the lines of producing 3d-rand-vectors, with their norm not exceeding some value (depending on v, vstep)…

…but also, your call with param Num=10 seems odd, to me - I interpret, that you actually only want 10 3d-rands, satisfying some condition, but yet, you produce 100.000, of which 100s, if not 1000s satisfy that condition (with the parameters you have given, in the example). :thinking:

Maybe I’ve completely embarrassed myself, trying to understand, what you’re doing, but you could save others, at least? :sweat_smile:

1 Like

Yeah, here I just use Num=10, but in real program, typically, Num=100-1000, expecially when the v is only 10s of vstep, it will need 100000 random 3d vectors. I choose this value to just make sure it can produce enough random 3d vectors that satisfiy the condition (that is within 99% close to v in amplitude), since we want to average and delete the influence of direction. and vrand needs to have common factor vstep

ok, got it.
I think, in this case - and especially, when you’re using this function with several sets of params, I would “just” produce 1x 3d-rand-vector at a time, test and save or discard and continue with the next one, until you have produced as many, as you actually need (not 1 less and not 1 more) - and then return.

This is also orthogonal to @jmair’s suggestion, of operating on one preallocated buffer for the vectors, which I would also do. In your original code, if you produce 100.000 vectors, want “just” 1000, but then happen to produce only 999, that actually satisfy your condition, you throw the whole 1000 away and produce another 100.000, etc…

I also like broadcasting, but it won’t save your cpu, to have to cycle, eventually. And I would suggest a simple loop, if you don’t find a more elegant way to save your function a lot of the work, it is currently doing. :wink:

1 Like

I will try it, thanks~

Played around with it, a little. On my laptop…

julia> @btime RandV(1,0.01,10)
  3.587 ms (23 allocations: 4.75 MiB)

julia> @btime RandV(1,0.01,1000)
  3.771 ms (23 allocations: 4.75 MiB)

first version was this…

function RandV2!(ret::Array{Float64}, v, vstep, Num)
    # initialization of ret should really be checked, here!!!
    vcount = 1
    while vcount < Num + 1
        limit = round(v / vstep)
        candidate = rand(-limit:limit,1,3)
        test = candidate[1]^2 + candidate[2]^2 + candidate[3]^2
        if abs(sqrt(test) * vstep - v) < 0.01 * v
            ret[vcount,:] = candidate .* vstep
            vcount += 1
        end
    end
    return ret
end
julia> res = zeros(Float64, 1000, 3) # need some preallocated mem, for all versions, here...

julia> @btime RandV2!(res, 1, 0.01, 10)
  17.400 ÎĽs (76 allocations: 5.94 KiB)

julia> @btime RandV2!(res, 1, 0.01, 1_000)
  7.022 ms (30054 allocations: 2.29 MiB)

…nice, for small Num, but worse, for larger ones, and also, I didn’t manage to eliminate the allocations.
So then, I fell back to my (much more familiar) procedural - low-level - style and came up with something, that doesn’t look particularly nice, but at least does the job, without allocations…

function RandV3!(ret::Array{Float64}, v, vstep, Num)
    # initialization of ret should really be checked, here!!!
    vcount = 1
    limit = round(v / vstep)
    while vcount <= Num
        ret[vcount,1] = rand(-limit:limit)
        ret[vcount,2] = rand(-limit:limit)
        ret[vcount,3] = rand(-limit:limit)
        test = ret[vcount,1]^2 + ret[vcount,2]^2 + ret[vcount,3]^2
        if abs(sqrt(test) * vstep - v) < 0.01 * v
            ret[vcount,1] *= vstep
            ret[vcount,2] *= vstep
            ret[vcount,3] *= vstep
            vcount += 1
        end
    end
    return ret
end
julia> @btime RandV3!(res, 1, 0.01, 10)
  38.600 ÎĽs (0 allocations: 0 bytes)

julia> @btime RandV3!(res, 1, 0.01, 1000)
  14.684 ms (0 allocations: 0 bytes)

…sadly, despite 0 allocations, performance went down, by ~50%, also.

So now, let’s hope for someone achieving 0 allocations, with great performance, doing all the nice broadcasting and non-low-level - stuff. (Anytime, I used some of that, I ran into allocations, in this example). :thinking:

Here’s a version that fills a preallocated array:

using StaticArrays
using LinearAlgebra: norm

function randv!(vreturn::AbstractMatrix, v, vstep)
    Base.require_one_based_indexing(vreturn)
    (num, n) = size(vreturn)
    n == 3 || throw(ArgumentError("vreturn must have 3 columns"))
    vlim = round(v/vstep)
    vrange = -vlim:vlim
    vcount = 0
    vtest = 0.01 * v
    while vcount < num
        vrand = SVector{3,Float64}(rand(vrange) for _ in 1:3)
        if abs(norm(vrand) * vstep - v) < vtest
            vcount += 1
            vreturn[vcount, :] .= vrand * vstep
        end
    end
    return vreturn
end

It doesn’t allocate and is reasonably fast:

julia> using BenchmarkTools

julia> vreturn = zeros(10, 3);

julia> @btime randv!(vreturn, 1.0, 0.01)
  4.980 ÎĽs (0 allocations: 0 bytes)
10Ă—3 Matrix{Float64}:
  0.85   0.51   0.02
 -0.73  -0.18  -0.67
  0.57   0.12   0.81
  0.95   0.12  -0.3
  0.44  -0.87   0.25
  0.71   0.67  -0.24
 -0.26  -0.56   0.78
 -0.68  -0.15  -0.72
 -0.83  -0.55  -0.15
  0.85   0.29   0.43

Edit: Modified my original post to add a bang to the function name and put the modified argument first, plus a couple other minor edits.

4 Likes