# Non-allocating views

I recently installed Julia Version 1.5.0-beta1.0 and I was impressed with the non-allocating views. I just tried an example of LU factorisation written in vectorised style to get a feel for it, and the performance compared very well to the inbuilt `generic_lufact!` which uses straight loops.

``````swap!(x,y) = for i in eachindex(x, y) @inbounds x[i], y[i] = y[i], x[i] end

function lufact!(A, p)

m, n = size(A)
p .= 1:m;

@views @inbounds begin
for k = 1:m-1
_, kp = findmax(i->abs(A[i,k]), k:m)  #35316
if k != kp
swap!(A[k,:], A[kp,:])
swap!(view(p,k), view(p,kp))
end
A[k+1:m,k] ./= A[k,k]
for j = k+1:n
A[k+1:m,j] .-= A[k,j] .* A[k+1:m,k]
end
end
end
A, p
end
``````

Note: I used the two argument findmax from https://github.com/JuliaLang/julia/pull/35316#issuecomment-622629895 to find the maximum absolute value for the pivot. I hope this will make its official appearance in Julia shortly, because it seems very useful. But maybe there’s a better way to do this that I overlooked.

Quick timing test:

``````using LinearAlgebra:generic_lufact!
function lutest()
N = 1000; A = rand(N,N); p = zeros(Int, N);
B = copy(A); @time lufact!(B,p);
C = copy(A); @time generic_lufact!(C);
@show B ≈ C
end
``````

I found the time was very competitive (faster even) compared to `generic_lufact!` which surprised me.

``````julia> lutest()
0.115506 seconds
0.206406 seconds (1 allocation: 7.938 KiB)
B ≈ C = true
``````

By comparison, in Julia Version 1.4.2 the results were

``````julia> lutest()
0.236483 seconds (1.51 M allocations: 91.735 MiB, 1.80% gc time)
0.210450 seconds (2 allocations: 7.969 KiB)
B ≈ C = true
``````

with lots of allocations.

I’m new to Julia, so any feedback on style or substance is appreciated. I also wondered if I overlooked an inbuilt function to swap the contents of two arrays (like BLAS swap).

Just to be sure, these are second timings, right? If you just executed `lutest()` a single time in each terminal, you are measuring compilation time. Also, if my memory does not fail me, from one minor version to the other the values of `rand` can change, so this can introduce additional noise if the computation time depends on the specific values.