I am trying to write a very fast QR decomposition with zero allocations, if possible. At this point, https://github.com/mohamed82008/InplaceQR.jl is down to 2 allocations (64 bytes)
and is within a factor of 8 of LAPACK's multi-threaded qrfact!
when run with julia.exe --check-bounds=no -O3.
I am happy with that, but I am curious: where are these 2 allocations coming from? Is it just the REPL?
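One way I could imagine checking whether the REPL is responsible is to take the measurement inside a function, after a warm-up call, so that compilation and global-scope overhead are not counted. The sketch below is only an illustration under that assumption: `scale_columns!` and `measure` are hypothetical stand-ins and are not part of InplaceQR.jl.

```julia
# Hypothetical in-place kernel standing in for the real factorization;
# it normalizes each column of A without creating temporary arrays.
function scale_columns!(A::Matrix{Float64})
    for j in 1:size(A, 2)
        s = 0.0
        for i in 1:size(A, 1)
            s += A[i, j]^2
        end
        s = sqrt(s)
        s == 0 && continue          # skip all-zero columns
        for i in 1:size(A, 1)
            A[i, j] /= s
        end
    end
    return A
end

# Hypothetical helper: measure from inside a function so that
# global-scope (REPL) overhead is not attributed to the kernel.
function measure(A)
    scale_columns!(A)               # warm-up call to exclude compilation
    return @allocated scale_columns!(A)
end

A = rand(100, 100)
println(measure(A))                 # bytes allocated by the second call
```

If the number reported this way is lower than what the REPL shows for the same call, that would suggest the extra bytes come from how the measurement is taken rather than from the function itself.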
Thanks in advance.