Readdlm with IOBuffer

I have an ASCII data file that contains a number of different vectors and matrices. Once I have the file read, I can figure out what is a vector and what is a matrix. I thought I would use readdlm to read the matrices, but ran into a problem. I expect CSV would work, but I first want to explore readdlm. Any help appreciated. An MWE is:

  julia> using DelimitedFiles

  julia> x = [1; 2; 3; 4];

  julia> y = [5; 6; 7; 8];

  julia> open("delim_file.txt", "w") do io
             writedlm(io, [x y])
         end;

julia> a = readlines("delim_file.txt")

julia> readdlm(IOBuffer(a), '\t', Int, '\n')
ERROR: MethodError: no method matching IOBuffer(::Vector{String})
The type `IOBuffer` exists, but no method is defined for this combination of argument types when trying to construct it.

Closest candidates are:
  IOBuffer(::SubString{String})
   @ Base strings\io.jl:309
  IOBuffer(::String)
   @ Base strings\io.jl:308
  IOBuffer(; read, write, append, truncate, maxsize, sizehint)
   @ Base iobuffer.jl:245
  ...

It’s unclear how you want to generalize this but the basic use of IOBuffer is

julia> a=read("delim_file.txt")
16-element Vector{UInt8}:
 0x31
 0x09
 0x35
 0x0a
 0x32
 0x09
 0x36
 0x0a
 0x33
 0x09
 0x37
 0x0a
 0x34
 0x09
 0x38
 0x0a

julia> readdlm(IOBuffer(a), '\t', Int, '\n')
4×2 Matrix{Int64}:
 1  5
 2  6
 3  7
 4  8

You can also do things like

julia> b = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> for line in readlines("delim_file.txt")
           write(b, line, "\n")
       end

julia> seekstart(b)
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=16, maxsize=Inf, ptr=1, mark=-1)

julia> readdlm(b, '\t', Int, '\n')
4×2 Matrix{Int64}:
 1  5
 2  6
 3  7
 4  8

which maybe is more easily adapted to what you want to do.

It is a lot easier and more efficient to do b = IOBuffer(read("delim_file.txt", String)), which reads the whole file at once.
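For reference, here is a minimal sketch of that one-step approach. The temporary file is only a stand-in for the real data file, and the matrix values are the ones from the MWE:

```julia
using DelimitedFiles

# Write a small tab-delimited file to a temporary path
# (a stand-in for delim_file.txt from the MWE).
path = tempname()
writedlm(path, [1 5; 2 6; 3 7; 4 8])

# Read the whole file into a String, wrap it in an IOBuffer,
# and let readdlm parse the buffer as tab-delimited integers.
b = IOBuffer(read(path, String))
m = readdlm(b, '\t', Int, '\n')
```

This only covers the case where the whole file is a single delimited block; files with headers or several blocks still need to be split up first.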

I believe that’s what I did in my first reply but it might not generalize so well to the requirement of “Once I have the file read, I can figure out what is vector and what is a matrix”.

In my use case, I have file header material, a blank line, then block header material, another blank line, matrix data, blank line, repeat 2x, then final block header material, blank line and a final vector with a blank line at the end.

I read the file into a vector of lines with readlines, as shown in the MWE, and then search for the blank lines that act as my block delimiters using

data1 = readlines(filename1)
blanklines = Int[]
for (i, data) in enumerate(data1)
    if isempty(data)
        push!(blanklines, i)  # push! mutates in place, so no reassignment is needed
    end
end

And then @GunnarFarneback's method works perfectly for extracting the data for each block. The MWE was too minimal to show this detail.

Easier to write:

blanklines = findall(isempty, data1)

Note also that if you have an array of lines, data, as returned by readlines, you can just do

io = IOBuffer(join(data[i:j], '\n'))

to join lines i:j into an IOBuffer rather than a loop of write statements.
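Putting the blank-line search and the join together, block extraction might look roughly like the sketch below. The file contents and the helper name parseblock are made up for illustration, and it assumes every data block is tab-delimited integers:

```julia
using DelimitedFiles

# Hypothetical file contents, as readlines would return them:
# a header, a blank line, a 2×2 matrix block, a blank line,
# a vector block, and a trailing blank line.
lines = ["file header", "", "1\t5", "2\t6", "", "9", "10", "11", ""]

# Indices of the blank lines that separate the blocks.
blanklines = findall(isempty, lines)

# Hypothetical helper: parse lines i:j as one tab-delimited Int block.
parseblock(lines, i, j) =
    readdlm(IOBuffer(join(lines[i:j], '\n')), '\t', Int, '\n')

# Each block sits between two consecutive blank lines.
matrix_block = parseblock(lines, blanklines[1] + 1, blanklines[2] - 1)

# readdlm always returns a matrix, so flatten the single-column block with vec.
vector_block = vec(parseblock(lines, blanklines[2] + 1, blanklines[3] - 1))
```

In a real file the block headers would also sit between blank lines, so you would pick out which blank-line pairs bracket numeric data rather than hard-coding the indices as done here.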

(There are also various ways to make this more efficient and avoid copying data, but I’m guessing simplicity is the main goal here, since if you cared about efficiency then CSV.jl is usually a lot faster than readdlm.)