Read vector from data file

Matt_jl · January 18, 2024, 9:46am

Hi
I’m trying to read a vector from a .dat file. The structure of the file is not complicated, and I have managed to read the rows and columns I need with the following code:

using DelimitedFiles

function get_data(path_to_file::String)
    data = readdlm(path_to_file)  # parse the whole file into a matrix
    z = data[1145:1183, 1]        # first column of the wanted block
    pdz = data[1145:1183, 2]      # second column of the wanted block
    return z, pdz
end

As you can see, I only need some specific rows (the ones between two text separators).

The file is structured as follows:

#... data percentile .....
0 1 2 3 4 5 6
(I don't need them, 1 row multiple columns)

# ... data 1...
    0.1000  1.825E-029
    0.3000  6.247E-016
    0.5000  3.227E-007
    0.7000  4.726E-008
    0.9000  3.678E-008
... (data that I need, multiple rows, 2 col)
...
#.... data 2 ....
(I don't need them)

... and so on

My get_data function works, but I find it quite inelegant.
For reference, in Python with numpy this takes just one line of code:

z_p, pdz_p = np.genfromtxt(path_to_file, unpack=True, skip_header=1144, max_rows=40)

Is there a Julia solution similar to the Python/numpy one?
I haven’t benchmarked either version, but I bet my current implementation is slower than numpy.
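
For comparison, here is a minimal sketch of a closer analogue to the genfromtxt call, using only Base and DelimitedFiles. The helper name read_rows and its keyword names are hypothetical, chosen to mirror numpy's skip_header and max_rows. It assumes the selected lines contain only whitespace-separated numbers in at least two columns.

```julia
using DelimitedFiles

# Hypothetical helper (not part of any library): skip `skip_header` lines,
# then parse at most `max_rows` lines as a numeric matrix.
function read_rows(path::AbstractString; skip_header::Integer, max_rows::Integer)
    # Lazily drop the leading lines and keep only the wanted ones.
    lines = Iterators.take(Iterators.drop(eachline(path), skip_header), max_rows)
    # Hand just those lines to readdlm, which splits on whitespace by default.
    block = readdlm(IOBuffer(join(lines, '\n')), Float64)
    return block[:, 1], block[:, 2]
end
```

With the numbers from the numpy call this would be used as z, pdz = read_rows(path_to_file; skip_header=1144, max_rows=40). Only the requested lines are parsed, so the rest of the file never has to fit a common column layout.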

TheLateKronos · January 18, 2024, 10:22am

Have you tried CSV.jl? I believe it should be able to do what you want.

As a sidenote, it might be easier to clean up your data file and then read a nicely formatted file, than directly reading a file with messy formatting. It might require a temp-file, but given that you know the start row it seems like you only want to read a single file, so then that is no problem.

Also, do not worry about performance unless you have to. Premature optimization can take a lot of time and make the code less readable and/or less general. If the difference is 1 vs 5 seconds, as a one-time cost (or 1 vs 5 milliseconds more realistically), then there is little to actually be gained.
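
The clean-up idea above can also be done in memory, without a temp file or hard-coded row numbers. This is a minimal sketch, assuming the wanted block starts after a comment line containing some known text and ends at the next comment line. The names read_block and marker are hypothetical, and the exact header text of the real file is not shown in this thread.

```julia
# Hypothetical helper: collect the two-column block that follows the first
# comment line containing `marker`, stopping at the next comment line.
function read_block(path::AbstractString, marker::AbstractString)
    z = Float64[]
    pdz = Float64[]
    inside = false
    for line in eachline(path)
        s = strip(line)
        if startswith(s, "#")
            inside && break               # the next separator closes the block
            inside = occursin(marker, s)  # does this separator open the block?
        elseif inside && !isempty(s)
            fields = split(s)             # split on runs of whitespace
            push!(z, parse(Float64, fields[1]))
            push!(pdz, parse(Float64, fields[2]))
        end
    end
    return z, pdz
end
```

The marker argument has to be whatever text the real separator line contains. Because the match uses occursin, the marker should be specific enough not to match an earlier separator.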

rafael.guerra · January 18, 2024, 10:57am

The easiest solution is perhaps given here.

Matt_jl · January 18, 2024, 10:57am

I’m trying CSV.File, but it doesn’t work as I expect.
I don’t know if I can unpack the results, and for some reason it tries to generate the columns from the first uncommented line of the file, not from the lines I’m reading.

data = CSV.File(path_to_file, skipto=1144, limit=40, header=1144, ignorerepeated=true, comment="#")

rafael.guerra · January 18, 2024, 11:26am

Your MWE doesn’t seem to show a header. Try header=false

Matt_jl · January 18, 2024, 1:19pm

Nope.
For some reason it reads 19 columns, 2 of which are the ones I want, but I don’t get where the remaining columns come from.

rafael.guerra · January 18, 2024, 1:22pm

Have you tried: delim=' ' ?

Matt_jl · January 18, 2024, 1:26pm

OK, it works!
It was my mistake: I thought that the keyword ‘ignorerepeated = true’ was enough, but I was wrong.
Thank you!
The final code:

data = CSV.File(data_file; skipto=1144, limit=40, comment="#", header=false, ignorerepeated=true, delim=' ')

Now I only wonder whether I can directly write something like:

col1, col2 = CSV.File(.....)

rafael.guerra · January 18, 2024, 2:39pm

Using Tables.jl:

c1, c2 = CSV.File(...) |> Tables.columntable
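
The destructuring works because Tables.columntable returns a NamedTuple of column vectors, and a NamedTuple unpacks positionally like a plain tuple. Below is a small sketch with a hand-built NamedTuple standing in for the CSV.File result. The values are the first rows of the example file, and Column1 and Column2 are the names CSV.jl assigns when header=false.

```julia
# Stand-in for `CSV.File(...) |> Tables.columntable`: a NamedTuple of vectors.
cols = (Column1 = [0.1, 0.3, 0.5], Column2 = [1.825e-29, 6.247e-16, 3.227e-7])

# Destructuring iterates the NamedTuple's values in order, ignoring the names.
z, pdz = cols
```

The columns are also available by name, e.g. cols.Column1, if positional unpacking is not wanted.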