How do you get a Matrix back from a file?

When you pass a Matrix to write like so

A = rand(UInt, 20,20)
write("mat.bin", A) 

It works just fine. No errors.
But how do you get the matrix back out?
The docs show the string example which leads one to try

Matrix(read("mat.bin"))

or

Matrix{UInt}(read("mat.bin"))

But both throw an error.

How do you get a Matrix{UInt} from a Vector{UInt8}?

The String example is straightforward because UTF-8 already specifies how a bunch of bytes (UInt8) should be interpreted to become a string. For a Matrix, such a direct conversion cannot work: the dimension information is necessary to create the matrix, and it isn't stored anywhere in the file.
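To check that only the raw element bytes end up in the file, one can compare the file size with the size of the array in memory. This is a minimal sketch; `path` is just a scratch file chosen for illustration.

```julia
A = rand(UInt, 20, 20)
path = tempname()                 # scratch file, name chosen for illustration
write(path, A)

bytes = read(path)                # Vector{UInt8} holding the raw file contents
nbytes_expected = length(A) * sizeof(UInt)

# The file holds exactly the element bytes, so there is no room for a header
println(length(bytes) == nbytes_expected)
println(filesize(path) == sizeof(A))
```

Both comparisons should print `true`, which is why nothing in the file can tell a 20×20 matrix apart from a 400-element vector.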

You can get the data as a vector of UInts first with reinterpret:

julia> reinterpret(UInt, read("uints.bin"))
8-element reinterpret(UInt64, ::Vector{UInt8}):
 0x0000000000000001
 0x0000000000002710
 0x000000000000000a
 0x0000000000002694
 0x0000000000000064
 0x000000000000d431
 0x00000000000003e8
 0x0000000000008b7f

This is from a dummy 2×4 UInt matrix that I created and wrote to "uints.bin" with write. We can then use reshape to turn it back into a matrix, specifying the dimension sizes:

julia> reshape(reinterpret(UInt, read("uints.bin")), 2, 4)
2×4 reshape(reinterpret(UInt64, ::Vector{UInt8}), 2, 4) with eltype UInt64:
 0x0000000000000001  0x000000000000000a  0x0000000000000064  0x00000000000003e8
 0x0000000000002710  0x0000000000002694  0x000000000000d431  0x0000000000008b7f

julia> A #the original matrix
2×4 Matrix{UInt64}:
 0x0000000000000001  0x000000000000000a  0x0000000000000064  0x00000000000003e8
 0x0000000000002710  0x0000000000002694  0x000000000000d431  0x0000000000008b7f
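Note that the reshape/reinterpret result is a lazy wrapper around the byte vector rather than a plain Matrix{UInt}. If a real Matrix is needed, two options are sketched below, assuming the dimensions (20, 20 here) are known in advance; `path` is a scratch file used only for illustration.

```julia
A = rand(UInt, 20, 20)
path = tempname()                  # scratch file, name chosen for illustration
write(path, A)

# Option 1: preallocate with the known dimensions and fill in place
B = Matrix{UInt}(undef, 20, 20)
read!(path, B)

# Option 2: copy the lazy reshape/reinterpret wrapper into a plain Matrix
C = Matrix(reshape(reinterpret(UInt, read(path)), 20, 20))

println(B == A)
println(C isa Matrix{UInt})
```

read! avoids the intermediate byte vector, so it is probably the more direct choice when the size is fixed.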

This is likely the wrong approach: write stores only the values, not the dimensions (here 20, 20), so the file might as well hold a vector of 400. Only do this in the few cases where you know the dimensions of the data to be static (in case you want to write more to the file). What you likely want is some kind of serialization system. There are many, but this one works, and I think it may be the best one:
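If you do stay with plain write, one way around the missing dimensions is to store them yourself in a small header. This is a hand-rolled sketch, not a serialization package: `write_with_dims` and `read_with_dims` are hypothetical helpers (not part of Base), and the layout (number of dimensions, then each size as Int64, then the data) is an assumption made for this example. The element type is not stored, so the reader must supply it, and it is assumed to be a plain bits type such as UInt.

```julia
# Hypothetical helper: write ndims, each dimension, then the raw data
function write_with_dims(path::AbstractString, A::Array)
    open(path, "w") do io
        write(io, Int64(ndims(A)))
        for d in size(A)
            write(io, Int64(d))
        end
        write(io, A)
    end
    return nothing
end

# Hypothetical helper: read the header back, allocate, then fill the array
function read_with_dims(path::AbstractString, ::Type{T}) where {T}
    open(path, "r") do io
        n = read(io, Int64)
        dims = [Int(read(io, Int64)) for _ in 1:n]
        A = Array{T}(undef, dims...)
        read!(io, A)
        return A
    end
end

A = rand(UInt, 20, 20)
path = tempname()                  # scratch file, name chosen for illustration
write_with_dims(path, A)
B = read_with_dims(path, UInt)
println(size(B))
```

Files written this way are only readable by code that knows this layout and runs on a machine with the same byte order, which is part of why a proper serialization format is usually the better choice.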

Well, the String case isn't straightforward either (if you want to store more than one string in a file). I checked, and write stores the String exactly: in my test, “Palli” (my name) took 5 bytes, with no trailing \0 or length. So write and its docs are rather useless and misleading. I feel the docs are missing a note stating that most people should rather use JLD2.jl (and some software, likely it, might use write indirectly for you).
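The 5-byte observation is easy to reproduce, and it shows why several strings written back to back cannot be separated again without extra information. The sketch below adds a length prefix to each string; `write_strings` and `read_strings` are hypothetical helpers, and the Int64 byte-count prefix is an assumption made for this example, not anything write does on its own.

```julia
path = tempname()                  # scratch file, name chosen for illustration
write(path, "Palli")
println(filesize(path))            # only the string's bytes are stored

# Hypothetical helper: prefix each string with its byte count
function write_strings(path::AbstractString, strs)
    open(path, "w") do io
        for s in strs
            write(io, Int64(ncodeunits(s)))
            write(io, s)
        end
    end
    return nothing
end

# Hypothetical helper: read count-prefixed strings until end of file
function read_strings(path::AbstractString)
    out = String[]
    open(path, "r") do io
        while !eof(io)
            n = read(io, Int64)
            push!(out, String(read(io, n)))
        end
    end
    return out
end

path2 = tempname()
write_strings(path2, ["Palli", "Matrix", "héllo"])
names = read_strings(path2)
println(names)
```

The prefix counts bytes (ncodeunits) rather than characters, so non-ASCII strings round-trip as well.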


Can confirm, the docs are not very helpful in this area.

For the most simple applications, you might also consider Serialization, which is in the standard library:

using Serialization
A = rand(UInt, 20, 20)
serialize("mat.jls", A)
B = deserialize("mat.jls")

This method is not stable across minor versions (1.7 vs 1.8, say), but for shorter term solutions I think it is pretty nice.


Wouldn't that be a good reason to avoid it, then? The situation is a bit more complex, though. From the docs:

The data format can change in minor (1.x) Julia releases, but files written by prior 1.x versions will remain readable. The main exception to this is when the definition of a type in an external package changes. If that occurs, it may be necessary to specify an explicit compatible version of the affected package in your environment. Renaming functions, even private functions, inside packages can also put existing files out of sync. Anonymous functions require special care: because their names are automatically generated, minor code changes can cause them to be renamed. Serializing anonymous functions should be avoided.

I don’t recall the pros and cons of all the options. Should JLD2.jl be promoted over this?