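Here is my attempt:

```julia
using StaticArrays

# Read a delimited text file in which each line holds one D-component point.
function read_dataset(filename, V::Type{SVector{D, T}}, sep::Char = ',') where {D, T}
    data = SVector{D, T}[]
    open(filename) do io
        for ss in eachline(io)
            s = split(ss, sep)
            # parse exactly D fields into a tuple, then wrap it as an SVector
            push!(data, V(ntuple(k -> parse(T, s[k]), Val(D))))
        end
    end
    return data
end
```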
I feel like this is completely “wrong”, though, because when I checked the source code for `readdlm`, it seemed very complicated, and understanding it would mean going very deep.
So my simple question is: is there already an existing method that loads a text file directly as a `Vector{SVector}` rather than as a `Matrix`?
Note: `reinterpret` cannot work here because, for 99.9% of the data files I need to load, the row axis is the “SVector” axis and the column axis gives the discrete data points, e.g.:

```
-0.3999999999999999,0.3
1.076,-0.11999999999999997
-0.7408864000000001,0.32280000000000003
0.554322279213056,-0.22226592
```

etc. This means that I would first need to transpose the matrix, which does not seem very efficient.
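For comparison, a minimal sketch of the transpose route: load with `readdlm` (from the `DelimitedFiles` stdlib), copy the N×D matrix into row-major order with `permutedims`, and `reinterpret` the flat copy. `rows_as_tuples` is a hypothetical name, and `NTuple{D,T}` stands in for `SVector{D,T}` so the sketch runs without StaticArrays; since an `SVector{D,T}` wraps an `NTuple{D,T}`, `reinterpret(SVector{D,T}, ...)` should work the same way once StaticArrays is loaded. The transpose costs one extra copy of the numbers, which is usually small next to the cost of parsing the text.

```julia
using DelimitedFiles  # stdlib: provides readdlm

# Hypothetical helper: NTuple{D,T} stands in for SVector{D,T}.
function rows_as_tuples(io::IO, ::Val{D}, ::Type{T} = Float64; sep::Char = ',') where {D, T}
    M = readdlm(io, sep, T)                  # N×D Matrix{T}, stored column-major
    @assert size(M, 2) == D "expected $D columns"
    flat = vec(permutedims(M))               # one copy: each row's entries become contiguous
    return reinterpret(NTuple{D, T}, flat)   # length-N view over `flat`, no further copy
end

pts = rows_as_tuples(IOBuffer("-0.4,0.3\n1.076,-0.12\n"), Val(2))
```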
Not sure if there is a faster way, but why not just read it into a vector, line by line, then `reinterpret` at the end? If you know the size of the vector, you can also use `sizehint!` to allocate once.
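The suggestion above, reading line by line into one flat vector and reinterpreting at the end, might look like this. Assumptions: `read_flat` is a hypothetical name, blank lines are skipped, and `NTuple{D,T}` again stands in for `SVector{D,T}` to keep the sketch dependency-free.

```julia
# Hypothetical helper: parse every field into one flat Vector{T},
# then view it as D-tuples (swap in SVector{D,T} with StaticArrays loaded).
function read_flat(io::IO, ::Val{D}, ::Type{T} = Float64; sep::Char = ',') where {D, T}
    flat = T[]
    for line in eachline(io)
        isempty(strip(line)) && continue            # skip blank lines
        fields = split(line, sep)
        length(fields) == D || error("expected $D fields, got $(length(fields))")
        for f in fields
            push!(flat, parse(T, f))                # a row's entries end up contiguous
        end
    end
    return reinterpret(NTuple{D, T}, flat)          # no copy: every D values form one point
end

pts = read_flat(IOBuffer("1,2\n3,4\n"), Val(2))
```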
You have to read the file, e.g. with `readline`, and just count, or read it byte by byte and count the `\n`s, adjusting for whether the last line has a terminating newline or not.
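Counting the `\n` bytes, including the adjustment for a last line without a terminating newline, could be sketched as follows. `count_data_lines` is a hypothetical helper, and reading one byte at a time is for clarity rather than speed; the resulting count is what would be passed to `sizehint!` before a second pass over the file.

```julia
# Hypothetical helper: count lines by scanning for '\n' bytes.
# A final line with no terminating newline still counts as a line.
function count_data_lines(io::IO)
    n = 0
    prev = UInt8('\n')              # an empty stream behaves like "ended with a newline"
    while !eof(io)
        b = read(io, UInt8)
        b == UInt8('\n') && (n += 1)
        prev = b
    end
    return prev == UInt8('\n') ? n : n + 1
end

count_data_lines(IOBuffer("1,2\n3,4"))
```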
But a single pass with `push!` should be faster; you can always do a `sizehint!` at the end.
Okay, I clearly don't get it, so please explain a bit more…
Once I have finished reading every line, I will already have pushed all the data into the vector. So I will also have counted everything and already filled my vector.
What is the point of `sizehint!` then? It is completely meaningless, right? I truly don't understand how `sizehint!` can be used “at the end”.
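To make the usual role of `sizehint!` concrete: it pays off *before* the pushes, where it reserves capacity so that the following `push!` calls need not grow the buffer; after the vector is already filled, it no longer helps with that. A two-pass sketch using `countlines` and `seekstart` (so the input must be seekable), with the hypothetical name `read_two_pass`, `NTuple` in place of `SVector`, and no blank lines assumed:

```julia
# Hypothetical two-pass reader: pass 1 counts lines, pass 2 parses.
# NTuple{D,T} stands in for SVector{D,T}.
function read_two_pass(io::IO, ::Val{D}, ::Type{T} = Float64; sep::Char = ',') where {D, T}
    n = countlines(io)              # pass 1: reads the whole stream
    seekstart(io)                   # rewind for pass 2
    data = NTuple{D, T}[]
    sizehint!(data, n)              # capacity for n points; length(data) is still 0
    for line in eachline(io)        # pass 2: each push! fits in the reserved buffer
        fields = split(line, sep)
        push!(data, ntuple(k -> parse(T, fields[k]), Val(D)))
    end
    return data
end

read_two_pass(IOBuffer("1,2\n3,4"), Val(2))
```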
There is `countlines`, but it reads the file anyway. If memory is not a problem, you can use `readlines` to read all lines at once into a string vector, then call `length`. Benchmarking both would be interesting. Also, if you have full control over the format, encoding this information in the first line would probably be the fastest.
Not sure about the second question.
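If you do control the format, the “count in the first line” idea could look like this. The format is an assumption for illustration (line 1 holds the number of rows, followed by one comma-separated point per line), and `read_with_header` is a hypothetical name. Knowing `n` up front lets the result be allocated exactly once with `Vector{...}(undef, n)`, with no counting pass.

```julia
# Hypothetical format: first line = number of rows, then D fields per line.
# NTuple{D,T} stands in for SVector{D,T}.
function read_with_header(io::IO, ::Val{D}, ::Type{T} = Float64; sep::Char = ',') where {D, T}
    n = parse(Int, readline(io))
    data = Vector{NTuple{D, T}}(undef, n)   # single allocation of the final size
    for i in 1:n
        fields = split(readline(io), sep)
        data[i] = ntuple(k -> parse(T, fields[k]), Val(D))
    end
    return data
end

read_with_header(IOBuffer("2\n1,2\n3,4\n"), Val(2))
```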
Sorry, I meant `resize!`. I am under the impression that it will free the extra storage, but I may be misreading the code.
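One way `resize!` can fit a single-pass read: over-allocate with a guessed capacity, grow geometrically if the guess turns out too small, and trim to the number of rows actually read. This is a sketch, not necessarily what the post had in mind; `read_overalloc` and its `cap` keyword are hypothetical, and whether the final `resize!` actually returns memory is an implementation detail of Julia's arrays, which the post itself is unsure about.

```julia
# Hypothetical single-pass reader: guessed capacity, geometric growth,
# then trim with resize!. NTuple{D,T} stands in for SVector{D,T}.
function read_overalloc(io::IO, ::Val{D}, ::Type{T} = Float64;
                        sep::Char = ',', cap::Int = 1024) where {D, T}
    data = Vector{NTuple{D, T}}(undef, cap)
    n = 0
    for line in eachline(io)
        n += 1
        if n > length(data)
            resize!(data, max(1, 2 * length(data)))   # guess too small: double it
        end
        fields = split(line, sep)
        data[n] = ntuple(k -> parse(T, fields[k]), Val(D))
    end
    return resize!(data, n)   # drop the unused tail; resize! returns the array
end

read_overalloc(IOBuffer("1,2\n3,4\n5,6\n"), Val(2); cap = 2)
```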