Question: for functions that create arrays inside of them, is it better to pass those arrays as arguments? For example:
function f1(x)
    sto = similar(x)
    sto = x .+ 1
    return sto
end
function f2(sto, x)
    sto = x .+ 1
    return sto
end
So that in a loop I can reuse sto, without Julia having to allocate memory for every iteration:
function g1(x)
    for i in 1:100
        sto = f1(x)
    end
end
function g2(x)
    sto = similar(x)
    for i in 1:100
        f2(sto, x)
    end
    return sto
end
is that the point?
Yes. Usually what I do is leave this as an option for the user, like this:
function f1(x; aux=similar(x))
    aux .= x .+ 1 # note the broadcasting here!
    return aux
end
Then the function can be called with only x, in which case it allocates aux, or you can pass aux as a keyword argument, in which case it reuses what is given.
@amrods note that you need .=
to broadcast, otherwise the preallocated vector is not used for anything (this is incorrect in the code you posted there).
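To make the calling pattern concrete, here is a small sketch (buf is just an illustrative name), assuming the keyword version of f1 above with the corrected aux .= x .+ 1:

x = rand(100)
buf = similar(x)       # allocated once, outside the hot loop

for i in 1:100
    sto = f1(x)        # allocates a fresh aux on every call
end

for i in 1:100
    f1(x; aux=buf)     # reuses buf, so no per-iteration allocation
end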
Oh nice, I didn't know you could reference other arguments in a function definition.
There is also Random.rand!, which will fill the array with random values, and also make sure that they have the correct eltype. For example:
using Random
A = rand(Int8, 3, 4);
rand!(A)
rand! will generate random Int8 values, etc.
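And as a hedged sketch of the preallocation pattern with rand! (the loop and names are only illustrative):

using Random

A = zeros(Int8, 3, 4)   # preallocated buffer with the desired eltype
for i in 1:100
    rand!(A)            # refills A in place; no new array per iteration
    # ... use A ...
end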
These are great tips - thank you for writing this.
One thing I recently learned, related to views (and if I get any of this wrong please someone correct me), is that a view still allocates. For example, if you have a hot inner loop where you use small blocks of a larger vector, you might think (or at least I naively did) that you can avoid heap allocations by using something like the following. I thought I could use the view to do allocation-free access and operations on elements of A.
A = rand(100)
B = rand(3)
function view_test(A, B)
    tmp = @view A[2:4]
    tmp .= tmp .* B
end
This is a silly example, but in my own hot inner loop I was reusing that view many times and re-creating it in each iteration of the loop. I ended up with tons of heap allocations everywhere that view was used (sorry if this is obvious to others!). I eventually found some posts indicating that if you have something like this you should just loop over the small block, which is what I ended up doing. In other locations with larger arrays I got around the problem by using copyto! or by just writing loops there as well.
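For completeness, a hedged sketch of the "loop over the small block" alternative (indices and names are only illustrative):

A = rand(100)
B = rand(3)

function loop_test(A, B)
    # explicit loop over the block A[2:4]; no view object is created
    for (j, i) in enumerate(2:4)
        A[i] *= B[j]
    end
    return A
end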
That specifically should not allocate. But you can have allocations with views if the slice is of a non-contiguous memory block (not sure if this is true; what certainly happens is that views of non-contiguous slices are not very performant).
I get this when using @btime: 20.160 ns (1 allocation: 48 bytes)
What I obtain:
julia> @btime view_test($A, $B);
169.941 ns (0 allocations: 0 bytes)
Perhaps you didn't interpolate the variables?
julia> @btime view_test(A, B);
195.690 ns (1 allocation: 48 bytes)
I'm amazed at how fast your system is, btw.
Which version of Julia are you using?
In older versions (1.0 for sure, I don't remember after that), structs with mutable members were heap-allocated. In the current stable and LTS versions (1.7 and 1.6, respectively), views should be stack-allocated, but of course the stack-allocated data includes a reference to a heap-allocated backing array (except, of course, in the cases when the backing array itself has an isbits type, e.g. a UnitRange or a StaticArray).
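A quick illustrative check of that last point (sketch only):

# a view into an ordinary Array keeps a reference to heap-allocated memory,
# so the view value itself is not isbits
v1 = view(rand(100), 2:4)
isbits(v1)   # false

# a view whose parent is itself isbits (e.g. a UnitRange) is fully isbits
v2 = view(1:100, 2:4)
isbits(v2)   # true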
I'm using 1.6.2
Even with variable interpolation?
1.6.3:
julia> @btime view_test($A,$B)
80.945 ns (0 allocations: 0 bytes)
Yes, interpolating then the allocation is zero.
What does that mean then? Where is the allocation coming from when you don't interpolate?
It is as if the input variables were non-constant globals. See: julia - When to interpolate in benchmarking expressions - Stack Overflow
(I would also like a more in-depth explanation of these interpolations in BenchmarkTools; as a rule of thumb, I understand that they should always be used.)
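A minimal illustration of the non-constant-global point (sketch only; timings will vary):

using BenchmarkTools

x = rand(1000)    # a non-constant global

@btime sum(x);    # x is resolved as a global inside the benchmark
@btime sum($x);   # $x splices the value in, as if it were a local variable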
Ah okay - so the point then is that the view did not create an allocation, it was just a mistake in how I was using @btime.
One more follow-up that I just remembered. I think allocation may still have been happening in my particular case. I say this because I would use julia --track-allocation=user and Profile (making sure to clear the profiler before running the function), and I would see memory allocations next to the lines with views. Perhaps it was because they were views of non-contiguous sections of memory, as you said in an earlier post.
I tried to get allocations in those cases here, without success. Thus, I'm not sure if that still applies (or has at any point after 1.5). But it may.
At the same time, I have always found the output of --track-allocation=user confusing; I usually check the code manually after profiling with it, to be sure of the source of the allocations.
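For reference, the workflow I mean is roughly this (a sketch; myscript.jl and run_my_function are just placeholders):

# start Julia with per-line allocation tracking for user code
$ julia --track-allocation=user

julia> using Profile

julia> include("myscript.jl")

julia> run_my_function()            # first call compiles; its allocations are noise

julia> Profile.clear_malloc_data()  # reset the counters after compilation

julia> run_my_function()            # this run is the one that gets measured

julia> exit()  # on exit, *.mem files next to the sources list allocations per line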
One more approach for case 2, to avoid allocations when broadcasting: the package LazyArrays allows you to do this and keep the notation simple. It only requires adding @~ to the broadcast, and you don't need to pre-allocate the output.
Example:
using BenchmarkTools, LazyArrays
x,y = rand(100), rand(100)
tmp = similar(x)
#pre-allocating tmp
function f(x, y, tmp)
    z = 0.0
    for i in 1:10
        @. tmp = x * y
        z += sum(tmp)
    end
    z
end
@btime f($x,$y,$tmp) # 144.617 ns (0 allocations: 0 bytes)
#using LazyArrays
function g(x, y)
    z = 0.0
    for i in 1:10
        tmp = @~ x .* y
        z += sum(tmp)
    end
    z
end
f(x,y,tmp) == g(x,y) # true
@btime g($x,$y) # 107.120 ns (0 allocations: 0 bytes)
Why does using mapreduce for case 2 (i.e. z += mapreduce(*, norm, x, y)) still allocate? The docs explicitly say that it skips the intermediate array:
mapreduce(f, op, A::AbstractArray...; dims=:, [init])
Evaluates to the same as `reduce(op, map(f, A...); dims=dims, init=init)`, but is generally faster because the intermediate array is avoided.
I think you have a tricky case there. You are first multiplying the elements of x and y, which would in principle create an intermediate array z .= x .* y, and then computing the norm of z.
Computing the norm of an array can be done in-place, but I doubt that the compiler can handle that automatically for an array that is constructed lazily.
But, to be honest, I don't completely understand what mapreduce is doing there (I think the docs are not clear on what it does when more than one collection is provided). For instance, if that call to mapreduce makes sense, I would think that these two would give the same result:
julia> x = rand(10); y = rand(10);
julia> mapreduce(*, norm, x, y)
0.015890825295371242
julia> norm(xi*yi for (xi,yi) in zip(x,y))
1.0128277332826312
Or, if (as I understand from the docs) the function f should be applied to each element of the collections, then I think that call should error, as *(x) is not a defined operation. But it works even with a single collection, so I'm lost here:
julia> mapreduce(*, norm, [1,2])
1.0
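As a side note, and only going by my reading of the docs: with more than one collection, f seems to be applied elementwise across the collections and op reduces those results, e.g. (sketch):

julia> x = [1.0, 2.0, 3.0]; y = [4.0, 5.0, 6.0];

julia> mapreduce(*, +, x, y)   # like reduce(+, map(*, x, y))
32.0

julia> sum(x .* y)             # same value, but materializes x .* y first
32.0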