# Scalar multiplication causes array reallocation

Hi everyone,

While testing the fastest approach to multiply a matrix by a vector, I discovered a nasty fact about allocation.
Multiplying a matrix by a constant value (namely a scalar) makes Julia allocate a new array, even within the `@views` scope. If the multiplication does not happen, no extra memory is allocated.

Could you please explain to me why?

The code below shows what I mean

``````
function network_test()
    n = 1000
    w = rand(3, n, n) .- 0.5
    r = 1:n
    state = falses(n)
    out = ones(Float64, n)
    for tt in 1:2000
        state = rand([false, true], n)
        @views out = w[3, :, :] * state
    end
    return w
end

function network_scalar()
    n = 1000
    w = rand(3, n, n) .- 0.5
    r = 1:n
    state = falses(n)
    out = ones(Float64, n)
    for tt in 1:2000
        state = rand([false, true], n)
        @views out = 0.1 * w[3, :, :] * state
    end
    return w
end

using BenchmarkTools
@btime network_test()
#  2.618 s (12006 allocations: 79.20 MiB)
@btime network_scalar()
#  7.480 s (14006 allocations: 14.98 GiB)
``````

I think what you want is to create the view, then broadcast the update in place:

``````
@views out = w[3,:,:]
out .*= 0.1 .* state
``````

That shouldn’t allocate. But are you intending to modify the original array?

Maybe use `.` to broadcast

``````
julia> w = rand(1_000_000);

julia> @time w = w + w;
  0.004330 seconds (2 allocations: 7.629 MiB)

julia> @time w = w + w;
  0.004495 seconds (2 allocations: 7.629 MiB)

julia> @time w .= w .+ w;
  0.000981 seconds (2 allocations: 48 bytes)

julia> @time w .= w .+ w;
  0.001092 seconds (2 allocations: 48 bytes)
``````

It would be interesting to understand what is going on behind the scenes, but one way to solve it is to use parentheses:

``````
@views out .= 0.1 .* (w[3,:,:] * state)
``````

On a side note, you can also reduce the number of allocations by changing the `state` generation to

``````
state = rand((false,true),n)
``````
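To see why this helps, here is a minimal sketch (hypothetical variable names): a `Vector` source like `[false, true]` is heap-allocated on every call, while a `Tuple` source costs nothing extra.

``````
state_vec = rand([false, true], 1000)  # allocates the 2-element source Vector on every call
state_tup = rand((false, true), 1000)  # Tuple source: one allocation fewer per call
``````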

Maybe the first thing to know is that `*` proceeds left to right:

``````
julia> @which 0.1*w[3,:,:]*state  # @less will show the code
*(a, b, c, xs...) in Base at operators.jl:538
``````

and so this takes the view, multiplies it by a scalar to make a matrix, and then multiplies that by the vector. It would be more efficient from this perspective to write `0.1 * (w[3,:,:]*state)` which is the same as `0.1 .* w[3,:,:]*state`, or better `lmul!(0.1, w[3,:,:]*state)`. (Soon, perhaps, https://github.com/JuliaLang/julia/pull/37898 may automate such things.) If `out` was pre-allocated, you could also write `mul!(out, w[3,:,:], state, 0.1, 0)`.

``````
julia> using LinearAlgebra

julia> out = rand(1000); w = rand(3,1000,1000) .- 0.5; state = rand(Bool, 1000);

julia> @btime (0.1 * (@view $w[3,:,:])) * $state; # allocates a new matrix, then a vector
  1.853 ms (4 allocations: 7.64 MiB)

julia> @btime 0.1 * ((@view $w[3,:,:]) * $state); # allocates two vectors
  1.193 ms (5 allocations: 23.89 KiB)

julia> @btime mul!($out, $(@view w[3,:,:]), $state, 0.1, 0.0); # no allocation
  1.189 ms (0 allocations: 0 bytes)
``````

But the other thing to know is that you aren’t hitting the fast `*` here at all, because the element types don’t match, and because the view of `w` isn’t a happy one. You might do better to re-arrange the dimensions of `w`, and to take `state = rand((1.0, 0.0),n)` instead.

``````
julia> @btime mul!($out, $(@view w[3,:,:]), $(state .+ 0.0)); # both Float64
  1.186 ms (2 allocations: 80 bytes)

julia> @btime mul!($out, $(w[3,:,:]), $(state .+ 0.0)); # BLAS
  151.750 μs (0 allocations: 0 bytes)

julia> w2 = permutedims(w,(2,3,1)); size(w2)
(1000, 1000, 3)

julia> @btime mul!($out, $(@view w2[:,:,3]), $(state .+ 0.0)); # safe view
  151.850 μs (0 allocations: 0 bytes)

julia> strides(@view w[3,:,:]) # problem
(3, 3000)

julia> strides(@view w2[:,:,3]) # no problem
(1, 1000)
``````

`@views` only affects indexing expressions. So the above is equivalent to

``````
out = 0.1 * view(w, 3, :, :) * state
``````

so you don’t allocate anything in the slicing operation. But `view(w, 3, :, :) * state` will allocate a completely new array, and the multiplication by 0.1 will allocate another one. There is also no in-place copying into `out` going on here.

You have to make your operations in-place, the `view` only helps with the slicing.
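The distinction between rebinding a name and mutating an array can be sketched like this (hypothetical variable names):

``````
a = zeros(3)
b = a                 # b and a label the same array
a = [1.0, 2.0, 3.0]   # rebinding: `a` now labels a brand-new array
# b is unchanged, still all zeros

c = zeros(3)
d = c
c .= [1.0, 2.0, 3.0]  # in-place: writes into the existing array
# d sees the update, because c and d are still the same array
``````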

A couple of other remarks:

These pre-allocations have no effect:

``````
state = falses(n)
out = ones(Float64,n)
``````

because you over-write them later (not in-place, just new assignment), so they are wasted. In other words, you re-use the labels `state` and `out`, but you don’t re-use the arrays that those labels were applied to.

Also, `falses(n)` is very different from `rand([false, true], n)`. The former creates a `BitVector`, while the latter makes a `Vector{Bool}`.

For a vector of random `true/false` use either `state = rand(Bool, n)` or `state = Random.bitrand(n)`. Then, in order to update this vector in-place, write `rand!(state)`.
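A quick sketch of the two types and the in-place refill (assuming `Random` is loaded for `rand!`):

``````
using Random

s1 = falses(4)       # BitVector: bits packed 8 per byte
s2 = rand(Bool, 4)   # Vector{Bool}: one byte per element
# s1 isa BitVector, s2 isa Vector{Bool}

rand!(s2)            # refill the existing vector in place, no new allocation
``````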


Hi DNF,

How do I make the operations in-place?
I thought that pre-defining the variable outside the loop was enough!
Also, this loop is much less efficient (3x slower), although it allocates less:

``````
function network_scalar()
    n = 1000
    w = rand(3, n, n) .- 0.5
    r = 1:n
    state = falses(n)
    out = ones(Float64, n)
    for tt in 1:2000
        for cc in r
            acc = 0.0
            for dd in r
                acc += w[3, cc, dd] * state[dd]
            end
            out[cc] = acc  # accumulate the dot product rather than overwriting
        end
    end
    return w
end

## with matrix multiplication
2.357 s (12006 allocations: 79.20 MiB)

## with indexing
12.710 s (7 allocations: 45.78 MiB)
``````

I was writing an answer, but realized I don’t understand what you are trying to do. Why do you return `w` from your function? Nothing has happened to that.

The answer from @mcabbott shows how you can do this using `mul!`. Here’s an example without the benchmarking code:

``````
using LinearAlgebra: mul!

function network_scalar()
    n = 1000
    w = rand(n, n, 3) .- 0.5  # this is better than rand(3, n, n)
    state = rand((0.0, 1.0), n)
    out = ones(Float64, n)
    for tt in 1:2000
        state .= (rand.() .< 0.5)
        wv = @view w[:, :, 3]
        mul!(out, wv, state, 0.1, 0.0)
    end
    return out
end
``````

I changed `w` from `rand(3, n, n)` to `rand(n, n, 3)`, that makes a huge difference in performance.
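One way to check the layout difference, on smaller hypothetical sizes, is to compare the strides of the two views:

``````
w_slow = rand(3, 10, 10)
w_fast = rand(10, 10, 3)

strides(@view w_slow[3, :, :])  # (3, 30): elements 3 apart in memory, not BLAS-friendly
strides(@view w_fast[:, :, 3])  # (1, 10): contiguous columns, hits the fast path
``````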


Yes, that’s right.
It is because this is the minimal working example of a 600-line code… which also has other trade-offs (like having a boolean state array).

So a meaningful example would be

``````
function network_test()
    n = 1000
    w = rand(3, n, n) .- 0.5
    r = 1:n
    state = falses(n)
    out = zeros(Float64, n)
    for tt in 1:2000
        for cc in r   # note: `for cc in n` would iterate only once
            state[cc] = rand((0.0, 1.0))
        end
        @views out = w[3, :, :] * state
    end
    return out, w
end
``````

Ok, that’s fine.

Can you explain to me why `(n, n, 3)` is better than `(3, n, n)`?
I thought that, Julia being column-major, you always want the accessed dimensions to be the outermost dimensions of the array.

Thanks

It’s because Julia arrays are column-major that `(n,n,3)` is better than `(3,n,n)`. You want to slice along the innermost dimensions of the array, not the outermost.

It’s the same reason that it is faster to loop like this

``````
for j in 1:n
    for i in 1:m
        x[i, j] = foo()
    end
end
``````

than like this

``````
for i in 1:m
    for j in 1:n
        x[i, j] = foo()
    end
end
``````
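The memory order behind this can be sketched in one line: a matrix's linear (memory) order walks down each column first.

``````
A = [1 3 5;
     2 4 6]   # 2×3 matrix

A[:]          # column-major memory order: [1, 2, 3, 4, 5, 6]
``````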

Ok, I misunderstood it. Thanks!

Here you can see the difference between row- and column-major layouts, and why it is better to move down along the columns for column-major:
