Hello everyone! I’ve been trying to convert some simple functions that were originally implemented in Matlab into Julia. They revolve around vector/matrix and matrix/matrix multiplication.

My question stems from trying to understand how best to optimize some code for both speed and efficiency. Following the performance tips, it's recommended to pre-allocate the outputs, which cuts down on the amount of memory allocated each time the function is called, possibly by using mul!() or matmul!(). In this example, however, I saw no real difference in the run time of each function or even in the number of allocations. Ideally, I should be pre-allocating to avoid hitting garbage collection, but even in this simple example it does not appear to help.

Is my only real option for a speed-up to switch over to GPUs, or is there a subtler, more efficient way to perform these calculations? I've included a small example of the code with the benchmarks from my system.

```
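using LinearAlgebra, BenchmarkTools  # provides BLAS.gemv and @btime

n = 1000
pts = 256
m = [pts pts]
Test_mat = randn(ComplexF64, n, prod(m))
Z = randn(eltype(Test_mat), m[1], m[2])

# Forward operator: multiply Enc by the vectorized input x
function Afwd(x, Enc)
    return Enc * vec(x)
end

# Adjoint operator: multiply Enc' by b and reshape to m[1] × m[2]
function Aadj(b, Enc, m)
    return reshape(BLAS.gemv('C', Enc, b), (m[1], m[2]))
end

# Adjoint applied to the result of the forward operator
function Aadj_Afwd(x, Enc, m)
    return Aadj(Afwd(x, Enc), Enc, m)
end

test = Afwd(vec(Z), Test_mat);
@btime Afwd($Z, $Test_mat)
@btime Aadj($test, $Test_mat, $m)
@btime Aadj_Afwd($Z, $Test_mat, $m)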
56.917 ms (3 allocations: 15.83 KiB)
52.040 ms (4 allocations: 1.00 MiB)
104.370 ms (7 allocations: 1.02 MiB)
```
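
For comparison, here is a minimal sketch of what a pre-allocated variant could look like, using `mul!` from `LinearAlgebra`. The names `Afwd!` and `Aadj!` are hypothetical in-place counterparts of the functions above, and the sizes are shrunk so the sketch runs quickly; it is an illustration under those assumptions, not necessarily the fastest formulation.

```julia
using LinearAlgebra

# Hypothetical in-place counterparts of Afwd and Aadj (illustrative names).
# y and out are buffers allocated once by the caller and reused on every call.
function Afwd!(y, x, Enc)
    mul!(y, Enc, vec(x))       # writes Enc * vec(x) into y
    return y
end

function Aadj!(out, b, Enc)
    mul!(vec(out), Enc', b)    # writes Enc' * b into out (vec shares out's memory)
    return out
end

n, m = 100, (16, 16)                     # small sizes, assumed for the sketch
Enc = randn(ComplexF64, n, prod(m))
Z   = randn(ComplexF64, m...)
y   = Vector{ComplexF64}(undef, n)       # buffer for the forward result
out = Matrix{ComplexF64}(undef, m...)    # buffer for the adjoint result

Afwd!(y, Z, Enc)
Aadj!(out, y, Enc)
```

This only avoids allocating the output arrays. Each product still has to read all of `Enc`, so with a matrix this large it may not change the run time much, which would be consistent with the timings above.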