Binary-read fixed-length String

I was wondering how to read a fixed-length character string from a binary file. My file is formatted like this:

Int32|4-byte string|array of Float32's . . . 

The string consists of just four plain ASCII characters without null termination.

So far, I’ve been successful in reading the first integer:

open("datafile.dat", "r") do io
  header = read(io, Int32)
  println(header)
  str = read(io, 4) # four-byte string
  println(str)
end

For the string, I get a Vector{UInt8}. So, how do you convert it to a String?

Example:

julia> v=UInt8[0x54,0x68,0x69,0x73]
4-element Vector{UInt8}:
 0x54
 0x68
 0x69
 0x73

julia> join(Array{Char}(v))
"This"

Even simpler:

julia> v=UInt8[0x54,0x68,0x69,0x73]
4-element Vector{UInt8}:
 0x54
 0x68
 0x69
 0x73

julia> String(v)  # just return this
"This"

julia> v
UInt8[]

This does not copy: String(v) takes over the vector's buffer, which is why v is empty afterwards.
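If the byte vector is still needed after the conversion, one option is to convert a copy instead. A minimal sketch, reusing the example vector from above:

```julia
v = UInt8[0x54, 0x68, 0x69, 0x73]

# String(copy(v)) consumes the copy, so v itself keeps its contents
s = String(copy(v))

println(s)          # the four bytes interpreted as text
println(length(v))  # v still holds its four bytes
```

For the file-reading case this is not needed, since the result of read(io, 4) is a temporary that nothing else refers to.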


Oh, wow… hmm, my example is from older code of mine. Has something changed since an earlier version? Or did I just miss how easy it can be? Probably the latter :wink:

Thank you both for the solutions!


I’m finding that Julia’s binary reading isn’t as simple as Fortran’s. The datafile I’m reading was actually created by a Fortran program. To read it in Fortran, it’s just

integer:: iflag
character(4):: str
real(4):: arr(100,200)
open(newunit=uni, access="STREAM",  . . .)
read(uni) iflag, str, arr

while Julia uses at least three different methods:

iflag = read(io, Int32)
str = String(read(io, 4)) # four-byte string
arr = Array{Float32}(undef, (100,200))
read!(io,arr)

So, I was wondering if it’s possible to write a generic binary-read function, read_bin!(v...), which can read objects of simple types:

iflag::Int32 = 0
str::String = ""
arr = Array{Float32}(undef, (100,200))
read_bin!(io, iflag, (str, 4), arr)

or

iflag, str, arr = read_bin(io, Int32, (String, 4), (Array{Float32, 2}, (100, 200)) )
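To make the second form concrete, here is a rough sketch of how such a function could be layered on top of the existing read and read! calls. The names read_bin and read_item are hypothetical (they are not part of Base), and the sketch only handles the three kinds of items from the example: a plain bits type, a fixed-length string, and a dense array of known shape.

```julia
# Hypothetical helpers, not part of Base.

# A bare type such as Int32: defer to Base.read
read_item(io::IO, ::Type{T}) where {T} = read(io, T)

# A tuple spec such as (String, 4): unpack it into positional arguments
read_item(io::IO, spec::Tuple) = read_item(io, spec...)

# Fixed-length string: read nbytes bytes and wrap them as a String
read_item(io::IO, ::Type{String}, nbytes::Integer) = String(read(io, nbytes))

# Dense array of known shape: allocate it, then fill it in place
read_item(io::IO, ::Type{A}, dims::Dims) where {A<:Array} = read!(io, A(undef, dims))

# Read every spec in order and return the results as a tuple
read_bin(io::IO, specs...) = Tuple(read_item(io, s) for s in specs)

# Usage with an in-memory buffer standing in for the data file
io = IOBuffer()
write(io, Int32(7))
write(io, "This")
write(io, Float32[1 2 3; 4 5 6])
seekstart(io)

iflag, str, arr = read_bin(io, Int32, (String, 4), (Array{Float32,2}, (2, 3)))
```

The in-place variant read_bin!(io, iflag, (str, 4), arr) is harder to mimic, because Int32 and String values are immutable in Julia and cannot be overwritten through a function argument.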

Those two seem to involve about the same amount of typing, and Julia doesn’t have arbitrary-looking things like newunit=uni, access="STREAM", where uni somehow becomes a variable and the configuration is done through strings you have to remember. (I speak eps() Fortran, sorry.)

Btw in case you like this better:

iflag = read(io, Int32)
str = String(read(io, 4)) # four-byte string
arr = reshape(reinterpret(Float32, read(io)), 100, 200)

Like @jling says, reading strings is pretty OK:

str = String(read(io, 4)) # four-byte string

Reading Arrays seems too hard, on the other hand.
I have opened https://github.com/JuliaLang/julia/issues/41865


those two seems to contain same amount of typing while Julia doesn’t have random things like newunit=uni, access=“STREAM”

When I said “at least three different methods”, I didn’t mean the amount of typing. I meant inconsistency. In addition to the three ways, now reshape and reinterpret have been thrown in.

You have to learn several different things to do “the same” thing.

In Fortran, newunit and access are regular parts of the open statement. You consistently use them.

So, I appreciate the efforts of @oxinabox to bring some consistency.

As he/she discusses in the other thread, this user interface

iflag = read(io, Int32)

is desirable for Arrays, too.

I mean, you’re just moving 4 and String to different places; the language needs that information either way. In Julia, a String is not a vector of Chars (think Unicode), so you need to read 4 bytes first and then say: it’s a String.

Similarly for Array: the data in the file is not 2D, it is just a flat run of bytes, so you need to specify an ordering. Here the in-file order happens to match Julia’s in-memory order (column-major), but that’s not guaranteed. Imagine the file was written by NumPy.
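As an illustration of the ordering issue, here is a small sketch that assumes the file holds the matrix in row-major (C) order, which is NumPy's default layout. The buffer contents and the 2×3 shape are made up for the example. One way to handle it is to read with the dimensions swapped and then transpose:

```julia
# Stand-in for a file holding the 2×3 matrix [1 2 3; 4 5 6] in row-major order
io = IOBuffer()
write(io, Float32[1, 2, 3, 4, 5, 6])
seekstart(io)

nrows, ncols = 2, 3

# Julia fills arrays column by column, so read into an ncols × nrows buffer...
buf = Array{Float32}(undef, ncols, nrows)
read!(io, buf)

# ...and swap the two dimensions to get the intended nrows × ncols matrix
arr = permutedims(buf)
```

Reading straight into a 2×3 array would not fail, but the elements would end up in the wrong positions.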

I agree it will be easier if we have a:

read(io, Vector{T}, n)

but the shape info, you will have to deal with it.
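Until something like that exists in Base, a one-line helper gets close. This is only a sketch: read_vector is a hypothetical name, and the six-element buffer is made up for the example. The shape is then applied separately with reshape:

```julia
# Hypothetical helper: allocate a Vector{T} of length n and fill it from io
read_vector(io::IO, ::Type{T}, n::Integer) where {T} = read!(io, Vector{T}(undef, n))

# Stand-in for a file containing six Float32 values
io = IOBuffer()
write(io, Float32[1, 2, 3, 4, 5, 6])
seekstart(io)

flat = read_vector(io, Float32, 6)

# The shape is the caller's responsibility; reshape fills column by column
arr = reshape(flat, 2, 3)
```

Note that reshape shares memory with flat rather than copying it.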
