Read a text file directly as vector of SVectors

Datseris · December 16, 2017, 9:52am

Here is my try:

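using StaticArrays

# Parse each line of `filename` into one SVector{D, T}; `sep` separates the fields.
function read_dataset(filename, V::Type{SVector{D, T}}, sep::Char = ',') where {D, T}
    data = SVector{D, T}[]
    open(filename) do io
        for ss in eachline(io)
            s = split(ss, sep)
            # Val(D) makes the tuple length a compile-time constant
            push!(data, V(ntuple(k -> parse(T, s[k]), Val(D))))
        end
    end
    return data
end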

I feel like this is completely “wrong” though, because when I checked the source code for readdlm it seemed super complicated, and to understand it I would have to go very deep.

So my simple question is: is there already an existing method that loads a text file directly as a Vector{SVector} rather than as a Matrix?

Note: reinterpret cannot work directly here, because for 99.9% of the data files I will need to load, the row axis is the “SVector” axis and the column axis gives the discrete data points. That is, each line of the file holds the components of one SVector, e.g.:

-0.3999999999999999,0.3
1.076,-0.11999999999999997
-0.7408864000000001,0.32280000000000003
0.554322279213056,-0.22226592

etc. This means that I would first need to transpose the matrix, which does not really seem efficient?
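For reference, the readdlm-then-transpose route described above could look roughly like the sketch below. It assumes Julia 1.x, that the StaticArrays package is installed, and that the element type is a plain bits type such as Float64; the name matrix_to_svectors is made up for illustration. The permutedims call is the extra copy the question is worried about.

```julia
using DelimitedFiles   # standard library, provides readdlm
using StaticArrays     # assumed to be installed

# Hypothetical helper: turn an N×D matrix (one data point per row)
# into a length-N vector-like view of SVector{D, T}.
function matrix_to_svectors(M::AbstractMatrix{T}, ::Val{D}) where {T, D}
    size(M, 2) == D || throw(DimensionMismatch("expected $D columns"))
    Mt = permutedims(M)   # D×N copy, so each data point is contiguous in memory
    return reinterpret(SVector{D, T}, vec(Mt))
end

sample = "-0.3999999999999999,0.3\n1.076,-0.11999999999999997\n"
M = readdlm(IOBuffer(sample), ',', Float64)
points = matrix_to_svectors(M, Val(2))
```

The result is a reinterpreted view over the transposed copy rather than a plain Vector; calling collect on it would give an ordinary Vector{SVector{2, Float64}} at the cost of one more copy.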

mohamed82008 · December 16, 2017, 11:21am

Not sure if there is a faster way, but why not just read it into a flat vector, line by line, then reinterpret at the end? If you know the size of the vector, you can also use sizehint! to allocate once.
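A minimal sketch of this suggestion, assuming Julia 1.x and the StaticArrays package; the function name read_flat_then_reinterpret and the Val(D) argument are illustrative choices, not an existing API. Because the values are pushed in file order, each data point ends up contiguous in the flat vector and no transpose is needed.

```julia
using StaticArrays   # assumed to be installed

# Hypothetical helper: parse every field into one flat Vector{T},
# then view it as SVector{D, T} without copying.
function read_flat_then_reinterpret(io::IO, ::Val{D}, ::Type{T} = Float64;
                                    sep::Char = ',') where {D, T}
    flat = T[]
    for line in eachline(io)
        isempty(strip(line)) && continue          # skip blank lines
        fields = split(line, sep)
        length(fields) == D || error("expected $D fields, got $(length(fields))")
        for f in fields
            push!(flat, parse(T, f))
        end
    end
    return reinterpret(SVector{D, T}, flat)
end

pts = read_flat_then_reinterpret(IOBuffer("1.0,2.0\n3.0,4.0\n"), Val(2))
```

The same function should also accept an open file handle, for example inside an open(filename) do io ... end block.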

Datseris · December 16, 2017, 12:42pm

How can I tell the size of the vector I will need? Is there a function that tells me how many lines the file will have?
Hasn’t the way reinterpret works changed in 0.7?

Tamas_Papp · December 16, 2017, 12:53pm

You have to read the file, e.g. with readline and just count, or read byte by byte and just count the \n characters, adjusting for whether the last line has a terminating newline or not.
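The byte-counting variant might be sketched as follows. The function name count_lines_by_bytes is made up for illustration, and reading one byte at a time is chosen for clarity rather than speed.

```julia
# Hypothetical helper: count lines by counting '\n' bytes, adding one
# if the stream is non-empty and does not end with a newline.
function count_lines_by_bytes(io::IO)
    newlines = 0
    nbytes = 0
    last_byte = 0x0a                      # byte value of '\n'
    while !eof(io)
        last_byte = read(io, UInt8)
        nbytes += 1
        last_byte == 0x0a && (newlines += 1)
    end
    # A final line without a terminating newline still counts as a line.
    return (nbytes > 0 && last_byte != 0x0a) ? newlines + 1 : newlines
end

n_terminated   = count_lines_by_bytes(IOBuffer("a\nb\n"))
n_unterminated = count_lines_by_bytes(IOBuffer("a\nb"))
```

Both calls should report two lines, which is the adjustment for the missing final newline mentioned above.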

But a single pass with push! should be faster; you can always do a sizehint! at the end.

Datseris · December 16, 2017, 12:56pm

Okay I clearly do not get it, so please explain a bit more…

After I have finished reading each line, I will already have pushed all the data in the vector. Thus I will also have counted everything and already filled my vector.

What is the meaning of sizehint! then? It is completely meaningless, right? I truly don’t get how sizehint! can be used at “the end”.

mohamed82008 · December 16, 2017, 1:03pm

There is countlines, but it will read the file anyway. If memory is not a problem, you can use readlines to read all lines at once into a vector of strings and then call length. Benchmarking both would be interesting. Also, if you have full control over the format, then encoding this information in the first line would probably be the fastest.
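A rough sketch of the two-pass idea with countlines and sizehint!, assuming Julia 1.x and the StaticArrays package; read_dataset_presized is a hypothetical name, and whether the extra pass pays off would need benchmarking on real files.

```julia
using StaticArrays   # assumed to be installed

# Hypothetical variant of read_dataset: count the lines first,
# reserve capacity once, then parse in a second pass.
function read_dataset_presized(filename::AbstractString, ::Type{SVector{D, T}},
                               sep::Char = ',') where {D, T}
    n = countlines(filename)              # first pass over the file
    data = SVector{D, T}[]
    sizehint!(data, n)                    # reserve room for n elements up front
    open(filename) do io
        for line in eachline(io)          # second pass: parse each line
            s = split(line, sep)
            push!(data, SVector{D, T}(ntuple(k -> parse(T, s[k]), Val(D))))
        end
    end
    return data
end

# Usage with a temporary file holding two data points.
path, tmpio = mktemp()
write(tmpio, "-0.3999999999999999,0.3\n1.076,-0.11999999999999997\n")
close(tmpio)
dataset = read_dataset_presized(path, SVector{2, Float64})
```

Here sizehint! is called before the loop, so the pushes should not need to grow the vector repeatedly.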

Not sure about the second question.

Tamas_Papp · December 16, 2017, 1:11pm

Sorry, I meant resize!. I am under the impression that it will free the extra storage, but I may be misreading the code.
