Loading first few lines from data file

#1

Hi, I have a data file with 100 rows of data of the form
1 2
3 4
5 6.1

I want to load the first 10 lines as a 10X2 array. However, I do not see any option in readdlm() for doing that, although it should be an easy task.

#2

If the file is only 100 rows, surely it won’t take more than a fraction of a second to read all of it, so wouldn’t you just do:

using DelimitedFiles
a = readdlm("my_100_row_file.txt")[1:10, :]
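A small sketch of that read-then-slice approach, assuming Julia 1.x, where readdlm lives in the DelimitedFiles standard library. The temporary file and its contents are made up for illustration. Note the two-dimensional index `[1:n, :]`: a bare `[1:n]` indexes linearly down the first column and returns a vector, not rows.

```julia
using DelimitedFiles

# Write a small throwaway file in the same two-column format (illustrative data).
path = tempname()
open(path, "w") do io
    for i in 1:20
        println(io, 2i - 1, " ", 2i)
    end
end

n = 10
full = readdlm(path)       # whole file as a 20×2 matrix
first_n = full[1:n, :]     # rows 1..n, all columns
```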

readdlm doesn’t seem to have an option to specify the number of rows to read, but you could easily do something yourself along the lines of:

function read_first_n(filename, n = 10)
    a = Array{Any}(undef, n)
    open(filename) do file
        for (i, ln) in enumerate(eachline(file))
            if i <= n
                a[i] = ln
            else
                break
            end
        end
    end
    return a
end

Or, if you want to make your life easier, just use the CSV package, which is a lot more fully featured than the DelimitedFiles stdlib:

using CSV
CSV.File("my_100_row_file.txt"; delim = ' ', header = false, limit = 10)

#3

I also don’t see how to do it with readdlm, but a combination of Iterators.take and eachline makes this quite easy to do yourself. In the example below, I read the first 3 lines only, and specify the type to be Float64:

julia> map.(s -> parse(Float64, s), split.(Iterators.take(eachline("test.txt"), 3), ' '))
3-element Array{Array{Float64,1},1}:
 [1.0, 2.0]
 [3.0, 4.0]
 [5.0, 6.1]

#4

Since you can pass any IO handle to readdlm, you can do something like this:

open(`head -n10 $file`) do io
    readdlm(io)
end

It might be possible to just pass the command object directly to readdlm. I’m not at a computer, so I haven’t tried any of this. As a philosophical matter, we try to avoid APIs with lots of options (head/tail/etc. options on every function that reads a file) in favor of composable constructs (passing IO objects that do some form of truncation to functions).

#5

Thank you for the reply. However, after having installed and loaded “Iterators”, I get the following error message. Perhaps there is an error in your code.

x = map.(s -> parse(Float64, s), split.(Iterators.take(eachline("Data.txt"), 3), ' '))

ERROR: MethodError: no method matching size(::##16#18)
Closest candidates are:
size{N}(::Any, ::Integer, ::Integer, ::Integer...) at abstractarray.jl:48
size(::BitArray{1}) at bitarray.jl:39
size(::BitArray{1}, ::Any) at bitarray.jl:43

in broadcast_shape(::Function, ::Base.Take{EachLine}, ::Char, ::Vararg{Char,N}) at ./broadcast.jl:31
in broadcast_t(::Function, ::Type{Any}, ::Function, ::Vararg{Any,N}) at ./broadcast.jl:213
in broadcast(::Function, ::Function, ::Base.Take{EachLine}, ::Char) at ./broadcast.jl:230

#6

Thank you, this one actually worked. But when I said “10” in my question, I meant some arbitrary number. Is it possible to pass an argument N into the open() call?

#7

Thank you for the reply. But there seems to be an error with the word undef in the second line of your function. It produces the following error message:

ERROR: UndefVarError: undef not defined

#8

Of course you can. Try it this way:

n = 10
file = "somefile.txt"
open(readlines, `head -n $(n) $(file)`)

#9

No. I think what’s happening here is that you’re using Julia 0.6? If that’s the case, I’d strongly recommend pausing whatever development you’re doing and focusing on upgrading to Julia 1+.

I like the readdlm version because it avoids having to write the parsing logic yourself. I’m just a bit concerned about going through an external command that way. It doesn’t seem platform independent?
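One way to keep readdlm's parsing without shelling out, sketched under the assumption of Julia 1.x: take the first n lines with eachline and Iterators.take, join them into an in-memory IOBuffer, and hand that to readdlm, which accepts any IO. The helper name `read_first_rows` and the sample file are hypothetical.

```julia
using DelimitedFiles

# Hypothetical helper: let readdlm parse only the first n lines of a text file.
function read_first_rows(path, n)
    open(path) do io
        lines = Iterators.take(eachline(io), n)    # lazy; stops after n lines
        readdlm(IOBuffer(join(lines, '\n')))       # readdlm accepts any IO
    end
end

# Throwaway sample file in the thread's format (illustrative data).
path = tempname()
write(path, "1 2\n3 4\n5 6.1\n7 8\n")

data = read_first_rows(path, 3)
```

Since n is an ordinary function argument here, any number of rows can be requested, and nothing depends on `head` being installed.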

#10

My version is reported as v"0.5.1-pre+31", so I think that means 0.5.1. Is that too outdated?
Thank you for the tip!

#11

Yes, that’s an old unmaintained version. We are on Julia 1.1 / 1.2 now. Lots of things have changed, which is why the examples given here don’t work for you. As I said, I think upgrading should be your number one priority.

#12

I will do that right away.

#13

Actually, the program gets into some kind of infinite loop when I do that:

julia> n = 10;

julia> file = "Data.txt";

julia> x = open(readlines, `head -n$(n) $(file)`)

Here, the execution keeps running.

#14

Thanks, it worked after I updated my Julia version.

#15

Thanks, it worked after I updated my Julia version. But the return type is Array{String,1}, not Array{Float64,2} as I had hoped.

#16

It’s because you’re using readlines which returns strings. As I stated above, personally I’m not a fan of mixing bash and Julia for things that can easily be done in just Julia, so I would do something like this:

julia> rows = 3;

julia> first_rows = Iterators.take(eachline("test.txt"), rows);

julia> data = reduce(vcat, map.(s -> parse(Float64, s), split.(first_rows, ' '))')
3×2 Array{Float64,2}:
 1.0  2.0
 3.0  4.0
 5.0  6.1
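The same idea wrapped in a function, as a sketch for Julia 1.x. Here `split(line)` with no delimiter splits on any run of whitespace, so tabs or double spaces do not produce empty fields, and `permutedims(reduce(hcat, rows))` stacks the parsed rows into a matrix. The name `first_rows_matrix` is hypothetical, and the code assumes every line holds the same number of numeric fields.

```julia
# Hypothetical helper: first n lines of a whitespace-separated file as a Float64 matrix.
function first_rows_matrix(path, n)
    rows = open(path) do io
        [parse.(Float64, split(line)) for line in Iterators.take(eachline(io), n)]
    end
    # hcat puts each parsed line in a column; permutedims turns columns back into rows.
    return permutedims(reduce(hcat, rows))
end

# Throwaway sample file with mixed spacing (illustrative data).
path = tempname()
write(path, "1 2\n3  4\n5\t6.1\n7 8\n")

data = first_rows_matrix(path, 3)
```

If n is larger than the number of lines, all rows are returned. An empty file or n = 0 is not handled in this sketch.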