Would anyone be able to offer some insight into getting better performance for the use case below?
Say I have a struct that holds a `Matrix`, and my primary goal is to fill it with values, possibly transposed. The matrix could be very large:
```julia
struct mytest
    mat::Matrix
end
```
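A side note on the struct: `Matrix` by itself is not a concrete type (it is `Array{T,2} where T`), so `mat::Matrix` is an abstractly typed field, which the Julia performance tips advise against. Below is a minimal sketch of a parametric alternative. The name `MyTestTyped` is hypothetical. Its automatically generated constructor `MyTestTyped(mat::Matrix{T})` accepts only a plain `Matrix{T}`, so an `Adjoint` has to be materialized explicitly (here with `copy`). That does not remove the copy, but it moves it from a hidden `convert` to the call site.

```julia
using LinearAlgebra  # provides the Adjoint wrapper returned by A'

# Hypothetical parametric variant of `mytest`: with the element type as a type
# parameter, the field type Matrix{T} is concrete for each T.
struct MyTestTyped{T}
    mat::Matrix{T}
end

A = rand(Float32, 3, 4)
s = MyTestTyped(A)         # A is already a Matrix{Float32}: stored as-is, no copy
t = MyTestTyped(copy(A'))  # copy(A') materializes the 4×3 transpose as a Matrix{Float32}
```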
Next, define the dimensions and write some random binary data to a file, as an MWE for testing performance:
```julia
m = 5000
n = 50000
dtype = Float32
nbytes = sizeof(dtype) * m * n  # 4 * 5000 * 50000 = 1_000_000_000 bytes (about 1 GB)
bytes = rand(UInt8, nbytes)
tempfile = "tmp"
open(tempfile, "w") do fid
    write(fid, bytes)
end
```
Next, a few different functions for reading the sample data:
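```julia
# Read raw bytes, reinterpret them as `dtype`, reshape to m×n, and return a lazy adjoint
function testread1(fid::IOStream, dtype::Type, m::Int, n::Int)
    seek(fid, 0)
    nbytes = sizeof(dtype) * m * n
    temp = zeros(UInt8, nbytes)
    readbytes!(fid, temp, nbytes)
    temp2 = reinterpret(dtype, temp)
    return reshape(temp2, (m, n))'
end

# Read directly into an m×n Array{dtype} and return a lazy adjoint
function testread2(fid::IOStream, dtype::Type, m::Int, n::Int)
    seek(fid, 0)
    read!(fid, Array{dtype}(undef, m, n))'
end

# Same as testread2, but the ::Matrix return annotation converts the adjoint to a Matrix (a full copy)
function testread3(fid::IOStream, dtype::Type, m::Int, n::Int)::Matrix
    seek(fid, 0)
    read!(fid, Array{dtype}(undef, m, n))'
end
```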
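Another option is to materialize the transpose explicitly inside the reader, choosing the copy routine yourself rather than relying on the implicit `convert` that the `::Matrix` annotation triggers in `testread3`. Here is a sketch, assuming the file holds `m*n` values of type `dtype` in column-major order. `read_transposed` is a hypothetical name, and it takes any `IO` so it can be tried on an `IOBuffer`. Whether `permutedims` beats the `convert` path likely depends on the Julia version and hardware, so it would need to be benchmarked against `testread3`.

```julia
# Hypothetical reader: reads an m×n column-major block of `dtype` values from
# the start of `io` and returns its transpose as a plain n×m Matrix{dtype}.
function read_transposed(io::IO, dtype::Type{T}, m::Int, n::Int) where {T}
    seek(io, 0)
    A = read!(io, Matrix{T}(undef, m, n))  # fill A directly from the stream
    return permutedims(A)                  # allocate and fill a new n×m Matrix{T}
end
```

The return type is always `Matrix{T}`, so the result can go straight into a struct field typed as `Matrix` without a further conversion.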
Finally, benchmarking both reading the data and constructing a `mytest` struct from the result:
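```julia
using BenchmarkTools
fid = open(tempfile, "r")
# Interpolate globals with `$` so BenchmarkTools does not time global-variable lookups
@benchmark testread1($fid, $dtype, $m, $n) # median 0.159 s
@benchmark testread2($fid, $dtype, $m, $n) # median 0.129 s
@benchmark testread3($fid, $dtype, $m, $n) # median 1.182 s
# Assignments inside @benchmark are local to the benchmark, so create the inputs separately
test1 = testread1(fid, dtype, m, n)
test2 = testread2(fid, dtype, m, n)
test3 = testread3(fid, dtype, m, n)
@benchmark mytest($test1) # median 1.239 s
@benchmark mytest($test2) # median 1.241 s
@benchmark mytest($test3) # median 0.00074 s
```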
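As you can see, `testread1()` and `testread2()` are fast, but they return a lazy wrapper rather than a plain `Matrix` (`testread2` returns an `Adjoint{Float32, Matrix{Float32}}`, and `testread1` returns an `Adjoint` wrapping a reshaped, reinterpreted array). That wrapper is then costly to convert into a plain `Matrix` when I construct a `mytest` struct from it. `testread3` is slow for the same reason: its `::Matrix` return annotation performs that conversion inside the function.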
Any advice?
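If many matrices of the same shape are filled one after another, one option is to allocate the destination once and write the transpose into it with `LinearAlgebra.transpose!`. The strided copy still happens, but the repeated allocation of a large array does not. Here is a minimal sketch, assuming every block has the same `m×n` shape. `fill_transposed!` is a hypothetical helper name, and it reads from the current stream position.

```julia
using LinearAlgebra

# Hypothetical helper: reads an m×n block from `io` into the preallocated
# buffer `src`, then writes its transpose into the preallocated n×m `dest`.
function fill_transposed!(dest::Matrix{T}, src::Matrix{T}, io::IO) where {T}
    size(dest) == reverse(size(src)) ||
        throw(DimensionMismatch("dest must have size reverse(size(src))"))
    read!(io, src)         # fill src in place from the current stream position
    transpose!(dest, src)  # write transpose(src) into dest without allocating
    return dest
end
```

Each call still performs a full transpose copy, so this addresses allocation overhead, not the cost of the transpose itself.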
Update: I realize it’s basically the transpose operation that’s hurting performance, and I guess that’s the heart of my question. It often comes up in my work that I need to transpose a matrix, reshape it, etc., and then use it in a context that strictly requires a `Matrix`, not, for example, an `Adjoint`.
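On the reshape half of this: for a plain `Array`, `reshape` already returns an `Array` that shares memory with the original, so no copy is needed. A transpose, however, changes the memory order. It therefore has to either stay a lazy wrapper such as `Adjoint` or be copied into a new `Matrix`. Here is a small sketch of a helper that copies only when the input is not already a `Matrix`. The name `as_matrix` is hypothetical.

```julia
using LinearAlgebra  # for the Adjoint type

# Hypothetical helper: return a plain Matrix, copying only when necessary.
as_matrix(A::Matrix) = A                  # already a Matrix: returned as-is, no copy
as_matrix(A::AbstractMatrix) = Matrix(A)  # lazy wrappers (Adjoint, ReshapedArray, ...) are copied

v = collect(1.0:6.0)
R = reshape(v, 2, 3)  # a Matrix{Float64} sharing memory with v
Rt = R'               # a lazy Adjoint wrapper around R, no copy yet
M = as_matrix(Rt)     # copies into a new 3×2 Matrix{Float64}
```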