# Reducing the allocations of my function

Hi, I have a function that can help me find all the qualified numbers that are within certain range, but it just ocupies a lot of allocations. Since I need to repeat this process thousands of time. I hope I can reduce the allocation to almost zero. Any suggestions are welcome, here is my program

``````function RandV(v, vstep, Num)
vcount = 0
while vcount < Num
vrand = rand(-round(v / vstep):round(v / vstep), 100000, 3)
index = findall(abs.(sqrt.(vrand[:, 1] .^ 2 .+ vrand[:, 2] .^ 2 .+ vrand[:, 3] .^ 2) .* vstep .- v) .< 0.01 * v)
vcount = length(index)
if vcount > Num
return vrand[index, :] .* vstep
break
end
end
end

@time RandV(1.0, 0.01, 10)  # 23 allocations in my computer
``````

When you do `rand[:, 1]` it allocates. You need to use a `view` instead. This can easily be done using the `@views` macro.

Also, you are allocating memory on each part of the loop when you use the `rand` function. One way to remove the allocations is to manually write out the loop you intend to use, so that it does not need all of the memory.

I can try that, but itâ€™s not the dominant part that occupy allocations, the main allocations are from

``````index = findall(abs.(sqrt.(vrand[:, 1] .^ 2 .+ vrand[:, 2] .^ 2 .+ vrand[:, 3] .^ 2) .* vstep .- v) .< 0.01 * v)

``````

which I donâ€™t have better way to solve it

``````function RandV(v, vstep, Num)
vcount = 0
while vcount < Num
vrand = rand(-round(v / vstep):round(v / vstep), 100000, 3)

vrand01 = @view vrand[:, 1]
vrand02 = @view vrand[:, 2]
vrand03 = @view vrand[:, 3]

index = findall(abs.(sqrt.(vrand01 .^ 2 .+ vrand02 .^ 2 .+ vrand03 .^ 2) .* vstep .- v) .< 0.01 * v)
vcount = length(index)
if vcount > Num
return vrand[index, :] .* vstep
break
end
end
end

@time rel = RandV(1.0, 0.01, 10)   # 11 allocations
``````

use this can reduce the allocations to 11, any other suggestions, thanks~

Iâ€™ve tried to recreate this, removing as many allocations as possible:

``````using Random
function RandV!(vrand, v, vstep, Num, init_vrand::Val{T}=Val(true)) where {T}
(N, d) = size(vrand)
if T
rand!(vrand, -round(v / vstep):round(v / vstep))
end
while true
vcount = 0
for i in 1:N
mag = vrand[i, 1] * vrand[i, 1] + vrand[i, 2] * vrand[i, 2] + vrand[i, 3] * vrand[i, 3]
if abs(sqrt(mag) * vstep - v) < 0.01 * v
vcount += 1
vrand[vcount, :] .= view(vrand, i, :)
end
end
if vcount >= Num # Changed to >= as this made a bit more sense to me
selection_vrand = view(vrand, 1:vcount)
selection_vrand .*= vstep
return selection_vrand
end
# Recreate the random numbers in the preallocated array
rand!(vrand, -round(v / vstep):round(v / vstep))
end
end
``````

This will take in a preallocated array, and you can wrap this with your normal function:

``````function RandV(v, vstep, Num)
N = 100000
d = 3
vrand = rand(-round(v / vstep):round(v / vstep), N, d)
return RandV!(vrand, v, vstep, Num, Val(false))
end
``````

EDIT: On my machine:

``````julia> @time RandV(v, vstep, Num);
0.003289 seconds (5 allocations: 2.289 MiB)

julia> @time RandVold(v, vstep, Num);
0.003476 seconds (23 allocations: 4.758 MiB)
``````

You can probably remove some of the additional allocations, there should only really be 1 for the `vrand` array.

Maybe you can say a bit more, about what you are doing with this function? Seems something along the lines of producing 3d-rand-vectors, with their norm not exceeding some value (depending on v, vstep)â€¦

â€¦but also, your call with param `Num=10` seems odd, to me - I interpret, that you actually only want 10 3d-rands, satisfying some condition, but yet, you produce 100.000, of which 100s, if not 1000s satisfy that condition (with the parameters you have given, in the example).

Maybe Iâ€™ve completely embarrassed myself, trying to understand, what youâ€™re doing, but you could save others, at least?

1 Like

Yeah, here I just use `Num=10`, but in real program, typically, `Num=100-1000`, expecially when the `v` is only 10s of `vstep`, it will need 100000 random 3d vectors. I choose this value to just make sure it can produce enough random 3d vectors that satisfiy the condition (that is within 99% close to v in amplitude), since we want to average and delete the influence of direction. and `vrand` needs to have common factor `vstep`

ok, got it.
I think, in this case - and especially, when youâ€™re using this function with several sets of params, I would â€śjustâ€ť produce 1x 3d-rand-vector at a time, test and save or discard and continue with the next one, until you have produced as many, as you actually need (not 1 less and not 1 more) - and then return.

This is also orthogonal to @jmairâ€™s suggestion, of operating on one preallocated buffer for the vectors, which I would also do. In your original code, if you produce 100.000 vectors, want â€śjustâ€ť 1000, but then happen to produce only 999, that actually satisfy your condition, you throw the whole 1000 away and produce another 100.000, etcâ€¦

I also like broadcasting, but it wonâ€™t save your cpu, to have to cycle, eventually. And I would suggest a simple loop, if you donâ€™t find a more elegant way to save your function a lot of the work, it is currently doing.

1 Like

I will try it, thanks~

Played around with it, a little. On my laptopâ€¦

``````julia> @btime RandV(1,0.01,10)
3.587 ms (23 allocations: 4.75 MiB)

julia> @btime RandV(1,0.01,1000)
3.771 ms (23 allocations: 4.75 MiB)
``````

``````function RandV2!(ret::Array{Float64}, v, vstep, Num)
# initialization of ret should really be checked, here!!!
vcount = 1
while vcount < Num + 1
limit = round(v / vstep)
candidate = rand(-limit:limit,1,3)
test = candidate[1]^2 + candidate[2]^2 + candidate[3]^2
if abs(sqrt(test) * vstep - v) < 0.01 * v
ret[vcount,:] = candidate .* vstep
vcount += 1
end
end
return ret
end
``````
``````julia> res = zeros(Float64, 1000, 3) # need some preallocated mem, for all versions, here...

julia> @btime RandV2!(res, 1, 0.01, 10)
17.400 ÎĽs (76 allocations: 5.94 KiB)

julia> @btime RandV2!(res, 1, 0.01, 1_000)
7.022 ms (30054 allocations: 2.29 MiB)
``````

â€¦nice, for small Num, but worse, for larger ones, and also, I didnâ€™t manage to eliminate the allocations.
So then, I fell back to my (much more familiar) procedural - low-level - style and came up with something, that doesnâ€™t look particularly nice, but at least does the job, without allocationsâ€¦

``````function RandV3!(ret::Array{Float64}, v, vstep, Num)
# initialization of ret should really be checked, here!!!
vcount = 1
limit = round(v / vstep)
while vcount <= Num
ret[vcount,1] = rand(-limit:limit)
ret[vcount,2] = rand(-limit:limit)
ret[vcount,3] = rand(-limit:limit)
test = ret[vcount,1]^2 + ret[vcount,2]^2 + ret[vcount,3]^2
if abs(sqrt(test) * vstep - v) < 0.01 * v
ret[vcount,1] *= vstep
ret[vcount,2] *= vstep
ret[vcount,3] *= vstep
vcount += 1
end
end
return ret
end
``````
``````julia> @btime RandV3!(res, 1, 0.01, 10)
38.600 ÎĽs (0 allocations: 0 bytes)

julia> @btime RandV3!(res, 1, 0.01, 1000)
14.684 ms (0 allocations: 0 bytes)
``````

â€¦sadly, despite 0 allocations, performance went down, by ~50%, also.

So now, letâ€™s hope for someone achieving 0 allocations, with great performance, doing all the nice broadcasting and non-low-level - stuff. (Anytime, I used some of that, I ran into allocations, in this example).

Hereâ€™s a version that fills a preallocated array:

``````using StaticArrays
using LinearAlgebra: norm

function randv!(vreturn::AbstractMatrix, v, vstep)
Base.require_one_based_indexing(vreturn)
(num, n) = size(vreturn)
n == 3 || throw(ArgumentError("vreturn must have 3 columns"))
vlim = round(v/vstep)
vrange = -vlim:vlim
vcount = 0
vtest = 0.01 * v
while vcount < num
vrand = SVector{3,Float64}(rand(vrange) for _ in 1:3)
if abs(norm(vrand) * vstep - v) < vtest
vcount += 1
vreturn[vcount, :] .= vrand * vstep
end
end
return vreturn
end
``````

It doesnâ€™t allocate and is reasonably fast:

``````julia> using BenchmarkTools

julia> vreturn = zeros(10, 3);

julia> @btime randv!(vreturn, 1.0, 0.01)
4.980 ÎĽs (0 allocations: 0 bytes)
10Ă—3 Matrix{Float64}:
0.85   0.51   0.02
-0.73  -0.18  -0.67
0.57   0.12   0.81
0.95   0.12  -0.3
0.44  -0.87   0.25
0.71   0.67  -0.24
-0.26  -0.56   0.78
-0.68  -0.15  -0.72
-0.83  -0.55  -0.15
0.85   0.29   0.43
``````

Edit: Modified my original post to add a bang to the function name and put the modified argument first, plus a couple other minor edits.

4 Likes