# How do I use `@view` correctly?

I have a function that looked like the following. The explanation follows (which is fairly simple and probably easier to digest than the code).

``````    function main(mydat)
for t = 1:141
daily_sum = 0
for st = 1:50
daily_sum += rand(mydat[st][:, t])
end
national_vec[t] = daily_sum
end
return national_vec
end
``````

Let me explain. The argument `mydat` is an array of matrices (i.e. `Array{Array{T,2} where T,1}`. Each element of `mydat` is a (20000, 141) matrix. So `mydat[1]`, `mydat[2]`, β¦ are all matrices and theres 50 of them. Each matrix as 141 columns and 20000 rows. You can think of this as βdayβ from 1 to 141 where each day has 20000 observations.

So `mydat[1][:, 1]` gives me the 20000 observations on day 1 for matrix 1.

Similarly, `mydat[2][:, 1]` gives me the 20000 observations from day 1 for matrix 2.

What I am trying to do is simply build up `national_vec` which is just 141 points (corresponding to each day). For each day `t`, I want `national_vec[t]` to simply be the sum of a randomly chosen observation on that day for all states.

So the line `rand(mydat[st][:, t])` picks a random number from the `t`'th day for matrix `st.

But this is really slow. It takes about 2.5 seconds, and takes about 10gb in total memory allocations. Not to mention, as soon as I try to parallelize this or run this function a few times in `map` or something, it basically halts.

I think the problem is `rand(mydat[st][:, t])`. From my reading the codeβ¦ this makes a copy of the very large matrix. And given the double for loop I have 141 * 50 copies everytime I run the function.

So how can I avoid all of these allocations? I tried using a `@view`, but it didnβt do anything:

``````        for st = 1:50
sl_df = @view mydat[st]
for t = 1:141
sl_df_t = sl_df[1][:, t]. ## us sl_df[1] to "materialize" the view
r_val = rand(sl_df_t)
national_vec[t] += r_val
end
end
``````

(notice the for loop is switched now)β¦ but I think materializing the view itself makes a copy. Is there a way this could be done with less allocations?

That would mean the element type of the inner matrices are abstract. Itβs be better to have it be `Array{Array{T,2},1} where T`.

When you

``````mydat[1][:, 1]
``````

The `mydat[1]` does not allocate, but the `[:,1]` does. That is, you want this

``````        for st = 1:50
sl_df = mydat[st]
for t = 1:141
sl_df_t = @view sl_df[:, t]. ## us sl_df[1] to "materialize" the view
r_val = rand(sl_df_t)
national_vec[t] += r_val
end
end
``````
2 Likes

Wow. You are correct.

`````` Section   ncalls     time   %tot     avg     alloc   %tot      avg
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
the map        1    2.46s   100%   2.46s   10.5GiB  100%   10.5GiB
``````

to

`````` Section   ncalls     time   %tot     avg     alloc   %tot      avg
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
the map        1   35.7ms   100%  35.7ms   7.54MiB  100%   7.54MiB
``````

So how did you know that `mydat[st]` does not allocation? but the slice does?

And a follow up question.

``````    df_of_states = Array{DataFrame, 1}(undef, 50)
for (i, vs) in enumerate(validstates)
fn = "/data/Seyed/UST_\$(vs)_posterior_y.dat"
df_of_states[i] = CSV.File(fn, header=false) |> DataFrame
end
return convert(Array{Matrix, 1}, df_of_states)
``````

This is how I am making an array of matrix. I am reading files as a dataframe and then converting them to a Matrix. I suppose its probably faster to use `dlmread`

Are the elements homogeneous?
E.g., are they all `Float64`? If so, convert it to `Vector{Matrix{Float64}}` instead.
This should get rid of more of the memory allocations:

``````julia> mydat = [rand(20_000,141) for _ β 1:50];

julia> function main(mydat)
x = first(mydat)
national_vec = similar(x, size(x,2))
for st = eachindex(mydat)
sl_df = mydat[st]
for t = axes(sl_df,2)
sl_df_t = @view sl_df[:, t] ## us sl_df[1] to "materialize" the view
r_val = rand(sl_df_t)
national_vec[t] += r_val
end
end
national_vec
end
main (generic function with 1 method)

julia> @time main(mydat)' # compiles the first time
0.021275 seconds (31.09 k allocations: 1.690 MiB)
1Γ141 LinearAlgebra.Adjoint{Float64,Array{Float64,1}}:
24.7744  25.1805  24.2569  26.9104  22.9285  28.679  24.1086  24.8078  26.9158  25.7687  23.9178  22.8186  28.3685  23.6111  22.9678  19.9831  26.3506  23.1235  23.1952  β¦  24.9806  27.7651  23.4628  24.1254  29.1692  23.8009  23.5058  25.9814  23.6158  25.0811  22.2461  24.6074  25.5613  27.114  22.1449  28.2436  25.8054  27.8247

julia> @time main(mydat)' # <1ms, hardly any memory allocations
0.000402 seconds (2 allocations: 1.234 KiB)
1Γ141 LinearAlgebra.Adjoint{Float64,Array{Float64,1}}:
20.7249  23.2148  23.2165  24.4314  23.5861  23.9752  24.4532  27.1991  21.289  23.4638  22.1045  26.9317  24.4251  25.6509  27.1677  25.6708  23.3995  24.7414  23.7216  β¦  24.4308  26.4543  26.9126  23.4689  23.9522  23.1952  23.0667  26.0824  26.2722  25.1874  23.0188  25.8316  23.726  23.1942  21.0519  27.7204  24.5987  26.3101
``````

Try to make them concrete.

An array of arrays (and also an array of `mutable struct`s) will not store them inline.
In general, accessing elements of `Array`s of concrete types will not allocate in general.
Thatβs because it wont create new objects that need allocation.
What this means is that when you have something like a vector of vectors:

``````julia> vec_of_vecs = [rand(8) for i β 1:5]
5-element Array{Array{Float64,1},1}:
[0.9170226170394657, 0.24361746965027598, 0.8301577134517231, 0.5374783489713422, 0.07460097188909143, 0.7868810900488112, 0.7650236967046415, 0.4136822753710694]
[0.9567577494615755, 0.18267899561125578, 0.19100167151042924, 0.984123525629351, 0.5128171583998884, 0.32186583064833485, 0.9715644235222682, 0.29006124045439363]
[0.5699382085517086, 0.927792460722215, 0.2583960273596917, 0.9675822582985902, 0.39329179938669423, 0.4119966241377919, 0.4513878209263098, 0.39815631590925693]
[0.9431395813740855, 0.4572581366779722, 0.5524381532157661, 0.6334366382018066, 0.27460150482608103, 0.9625791801615522, 0.5275377189656854, 0.7237251842303987]
[0.33712726927204417, 0.5967460062016541, 0.6314347597686307, 0.9227711499841353, 0.08778516934849834, 0.03663703579139499, 0.37827512792773477, 0.8189945274507124]
``````

If get one of them:

``````x = vec_of_vecs[1]
``````

This did not make a copy; `x` is the same array as is located in `vec_of_vecs[1]`. If we change the first element of `x`:

``````julia> x[1] = -100000
-100000
``````

That also changes `vec_of_vecs`:

``````julia> vec_of_vecs
5-element Array{Array{Float64,1},1}:
[-100000.0, 0.24361746965027598, 0.8301577134517231, 0.5374783489713422, 0.07460097188909143, 0.7868810900488112, 0.7650236967046415, 0.4136822753710694]
[0.9567577494615755, 0.18267899561125578, 0.19100167151042924, 0.984123525629351, 0.5128171583998884, 0.32186583064833485, 0.9715644235222682, 0.29006124045439363]
[0.5699382085517086, 0.927792460722215, 0.2583960273596917, 0.9675822582985902, 0.39329179938669423, 0.4119966241377919, 0.4513878209263098, 0.39815631590925693]
[0.9431395813740855, 0.4572581366779722, 0.5524381532157661, 0.6334366382018066, 0.27460150482608103, 0.9625791801615522, 0.5275377189656854, 0.7237251842303987]
[0.33712726927204417, 0.5967460062016541, 0.6314347597686307, 0.9227711499841353, 0.08778516934849834, 0.03663703579139499, 0.37827512792773477, 0.8189945274507124]
``````

If you think of arrays as buckets, that means we bound the bucket in `vec_of_vecs` to `x`, and now we swapped one of the items in that bucket for another.
If we bind `x` to a different bucket, that wont change the first element of `vec_of_vecs`:

``````julia> x = rand(3)
3-element Array{Float64,1}:
0.17031155470520654
0.3232604424593408
0.45040474685377463
``````

That does not change

``````julia> vec_of_vecs
5-element Array{Array{Float64,1},1}:
[-100000.0, 0.24361746965027598, 0.8301577134517231, 0.5374783489713422, 0.07460097188909143, 0.7868810900488112, 0.7650236967046415, 0.4136822753710694]
[0.9567577494615755, 0.18267899561125578, 0.19100167151042924, 0.984123525629351, 0.5128171583998884, 0.32186583064833485, 0.9715644235222682, 0.29006124045439363]
[0.5699382085517086, 0.927792460722215, 0.2583960273596917, 0.9675822582985902, 0.39329179938669423, 0.4119966241377919, 0.4513878209263098, 0.39815631590925693]
[0.9431395813740855, 0.4572581366779722, 0.5524381532157661, 0.6334366382018066, 0.27460150482608103, 0.9625791801615522, 0.5275377189656854, 0.7237251842303987]
[0.33712726927204417, 0.5967460062016541, 0.6314347597686307, 0.9227711499841353, 0.08778516934849834, 0.03663703579139499, 0.37827512792773477, 0.8189945274507124]
``````

Changing a binding like this wont allocate memory, but creating a new array/bucket will.
So when reading and writing code, pay attention to the difference in `x = y` and `x .= y`. The former assigns a bucket, the second copies elements from one to another.

2 Likes

In addition to what @Elrod said above, indexing an `Array` with an integer will never make a copy; indexing an array with a range will always make a copy (unless you use `@view`).

So no matter what the element type or number of dimensions of `A` is, `A[1]`, `A[1, 1]`, etc. will never make a copy, but `A[1:10]` and `A[:]` and `A[:, 1]` will all make a copy (unless you use `@view`).

8 Likes