Reading binary values from a file

I saw the following code (http://www.samuelbosch.com/2014/07/benchmarking-reading-binary-values-from.html) and was intrigued by how closely its structure resembles Scheme or Lisp code (I wrote something similar in Bigloo 20 years ago):

 function readvalue(stream, position)
     seek(stream, position)
     return read(stream, Int32)
 end
 
 function readvalues(filename::String, indices)
     stream = open(filename, "r")
     try
         return Int32[readvalue(stream, index*4) for index in indices]
     finally
         close(stream)
     end
 end

In the benchmark, the Julia code performed worse than every other language tested (Python, Go, F#, OCaml) except R. I am not concerned with the Julia performance, but I wonder what needs to be changed to make the above code idiomatic Julia.

By the way: I was also intrigued by the F# version:

open System
open System.IO
 
 let readValue (reader:BinaryReader) cellIndex = 
     reader.BaseStream.Seek(int64 (cellIndex*4), SeekOrigin.Begin) |> ignore
     match reader.ReadInt32() with
     | Int32.MinValue -> None
     | v -> Some(v)
         
 let readValues indices fileName = 
     use reader = new BinaryReader(File.Open(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
     let values = Array.map (readValue reader) indices
     values

It is more idiomatic to use a do block with the open function:

function readvalues(filename::AbstractString, indices)
    stream = open(filename, "r") do stream
        return [readvalue(stream, 4*index - 3) for index in indices]
    end
end

Also note that indices in Julia are 1-based by default, so you need to account for this when calling the seek function.
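As a minimal sketch of what the do-block form provides: `open` passes the stream to the block, closes it when the block exits, and returns the block's value, so no explicit `close` is needed. The temporary file and the names `captured` and `first_value` are only for illustration.

```julia
# Create a small temporary file holding three Int32 values.
path, io = mktemp()
write(io, Int32[7, 8, 9])
close(io)

# Keep a reference to the stream so we can inspect it after the block.
captured = Ref{IOStream}()

# `open` returns whatever the do block returns.
first_value = open(path, "r") do stream
    captured[] = stream
    read(stream, Int32)
end

println(first_value)        # value read inside the block
println(isopen(captured[])) # the stream has been closed by `open`
```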

Edit: See correction of this code in the follow-on post below.


Hi Peter, why do we need stream = ..., in:

stream = open(filename, "r") do stream ... ?

Also, shouldn’t it be: 4*index - 4, in:

readvalue(stream, 4*index - 3) ?

See a working example here:

function readvalue(stream, position)
    seek(stream, position)
    return read(stream, Int32)
end

function readvalues(filename::AbstractString, indices)
    open(filename, "r") do stream
        return [readvalue(stream, 4*index-4) for index in indices]
    end
end

# Writing test file:
x = Int32.(1:1000)
open("array_Int32.bin", "w") do io
    write(io, x)
end

# Reading test file:
y = readvalues("array_Int32.bin", 1:10:1000)

Thanks.


You’re right on both points. The stream = was an oversight left in from the OP’s code, but the 4*index-3 is a serious mistake. I wasn’t careful enough checking the semantics of seek, where I assumed that it was also 1-based! Thanks for the catch.
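To make the offset arithmetic concrete, here is a small sketch, assuming a file of consecutive Int32 values. `seek` takes a zero-based byte offset, so the 1-based element `index` starts at byte `4*(index - 1)`. The helper name `byteoffset` and the temporary file are hypothetical, introduced only for this example.

```julia
# Hypothetical helper: byte offset of the 1-based element `index`
# in a file of consecutive Int32 values.
byteoffset(index) = sizeof(Int32) * (index - 1)

# Write the values 1..10 as Int32 to a temporary file.
path, io = mktemp()
write(io, Int32.(1:10))
close(io)

# Read back elements 1, 5 and 10 by seeking to their byte offsets.
result = open(path, "r") do stream
    [begin
         seek(stream, byteoffset(index))  # offset 0 is the first byte
         read(stream, Int32)
     end for index in [1, 5, 10]]
end

println(result)
```

With `4*index - 3` the first seek would land on byte 1 instead of byte 0, so every read would straddle two neighbouring values.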
