Is seeking a binary sequence on `IOStream` built in?

binaryio

#1

I needed a function that I’m calling “seek_binary_sequence”. I’m writing binary data files and need to find a “magic sequence” of bytes in them. I think a reasonable implementation is as follows:

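function seek_binary_sequence(io::IOStream, seq::AbstractVector{<:Number})
    atype = eltype(seq)
    ix = 1
    start = position(io)  # offset where the current candidate match begins
    while !eof(io) && ix <= length(seq)
        if read(io, atype) == seq[ix]
            ix += 1
        else  # mismatch: restart one element past the failed candidate's start
            start += sizeof(atype)
            seek(io, start)
            ix = 1
        end
    end
    return ix > length(seq)  # true if positioned just past the sequence
end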

My hunch is that others will want to do this as well, which prompts three questions:

  1. Have I missed this implementation elsewhere in the core parts of Julia?
  2. Is this something people should implement on their own whenever they need it?
  3. Is this a reasonable candidate for adding to Base?

Thanks in advance!


#2

This is not exactly what you want, but you can mimic the behavior with readuntil as follows:

julia> buf = IOBuffer("foobarbaz")
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=9, maxsize=Inf, ptr=1, mark=-1)

julia> readuntil(buf, b"ob") # find magic bytes "ob"
2-element Array{UInt8,1}:
 0x66
 0x6f

julia> read(buf, String)
"arbaz"

I don’t know of any other simple way. So adding such a function, say seekuntil, to Base may be useful.

Also, you must be careful when reading multi-byte values from an I/O stream, because the data in the stream may use a different byte order than the one you expect.
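To make the byte-order caveat concrete, here is a minimal sketch. It assumes Julia 0.7 or later, and the four example bytes are made up for illustration. `read(io, UInt32)` interprets the bytes in the host's native order, while `ltoh` and `ntoh` convert from a known file byte order to the host's order.

```julia
# Four bytes as they might appear in a file: 0x01 0x02 0x03 0x04
io = IOBuffer(UInt8[0x01, 0x02, 0x03, 0x04])

# The raw value depends on the byte order of the machine running the code.
raw = read(io, UInt32)

# Converting from a declared file byte order gives a host-independent result.
as_little = ltoh(raw)   # treat the file bytes as little-endian
as_big    = ntoh(raw)   # treat the file bytes as big-endian (network order)
```

Comparing individual `UInt8` values sidesteps the problem entirely, because a single byte has no byte order.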


[EDIT] The code above is for Julia 0.7-dev. On Julia 0.6, you may write:

julia> buf = IOBuffer("foobarbaz")
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=9, maxsize=Inf, ptr=1, mark=-1)

julia> readuntil(buf, "ob")
"foob"

julia> String(read(buf))
"arbaz"
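For comparison, on Julia 0.7 and later the `keep` keyword of readuntil controls whether the delimiter is included in the result, so the 0.6 behavior shown above can be reproduced explicitly. A small sketch, assuming a recent Julia 1.x:

```julia
buf = IOBuffer("foobarbaz")

# keep=true returns the delimiter as part of the result, as on Julia 0.6
with_delim = readuntil(buf, "ob"; keep=true)

# The stream is left positioned just after the delimiter in either case
remainder = read(buf, String)
```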

#3

@bicycle1885 Thank you for letting me know about readuntil. Also, that’s a good point about byte-order.

That seems like very similar functionality, except that it returns the contents up to (and including) the “magic sequence”. Wanting seekuntil rather than readuntil may be somewhat rare, but its main advantage would be lower memory usage, since the skipped bytes do not need to be collected.

That said, the issue of byte-order seems to be the best reason to restrict the delim sequence to something like Vector{UInt8}.
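As a rough illustration of what such a function could look like, here is a sketch of a hypothetical `seekuntil` restricted to byte delimiters. It is not part of Base, and the name and signature are assumptions taken from this thread. It uses Knuth-Morris-Pratt matching rather than the simple reset loop from the first post, so it also handles delimiters whose prefix repeats (for example `aab` inside `aaab`) and never needs to seek backwards. It is written for Julia 0.7 or later.

```julia
# Hypothetical helper (not in Base): advance `io` to just past the first
# occurrence of `delim` without storing the skipped bytes.
# Returns true if `delim` was found, false if the end of the stream was reached.
function seekuntil(io::IO, delim::AbstractVector{UInt8})
    n = length(delim)
    n == 0 && return true
    # KMP failure table: fail[i] is the length of the longest proper prefix
    # of delim[1:i] that is also a suffix of delim[1:i].
    fail = zeros(Int, n)
    k = 0
    for i in 2:n
        while k > 0 && delim[i] != delim[k + 1]
            k = fail[k]
        end
        delim[i] == delim[k + 1] && (k += 1)
        fail[i] = k
    end
    matched = 0  # number of delimiter bytes matched so far
    while !eof(io)
        b = read(io, UInt8)
        while matched > 0 && b != delim[matched + 1]
            matched = fail[matched]
        end
        b == delim[matched + 1] && (matched += 1)
        matched == n && return true
    end
    return false
end

buf = IOBuffer("foobarbaz")
found = seekuntil(buf, b"ob")   # skip to just past the magic bytes "ob"
rest = read(buf, String)        # whatever follows the delimiter
```

The only extra memory is the failure table, whose size depends on the delimiter length and not on how much of the stream is skipped.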