Hi,
I have a small CSV file with 8 columns and 48 rows. The first two columns are strings and the remaining columns are floats.
My goal: for each row, extract the first two strings, parse the remaining floats into a vector, and create an instance of a custom type:
MyType(s1::String,s2::String,v::Vector{Float64})
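To make the target concrete, here is a minimal sketch of the type I have in mind. The field names come from the signature above; the sample values are made up for illustration and are not from my file.

```julia
# Sketch of the target type: two strings followed by a vector of floats.
struct MyType
    s1::String
    s2::String
    v::Vector{Float64}
end

# Hypothetical row: 2 string columns plus the 6 remaining float columns.
row = MyType("a", "b", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

So each of the 48 rows should end up as one such instance.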
At first, it seemed like a natural candidate for CSV and Query. I know my coding ability sucks, but I can almost write it by hand faster than my code is reading that small file. It is taking 6 seconds to read and parse using CSV.read and Query. Digging around, I found a comment that DataFrames are slow if you are manipulating rows, which is what I was doing.
I then used DataFrames.stack to turn my rows into columns, but that was even slower (~8 seconds).
My latest attempt is to use Base.readcsv. This reads my CSV into a 2D array, which seems like something I can work with, but when I access an element of the first two columns, the type is SubString{String}. I can’t find any documentation for SubString, and I need a String. How can I get a String from a SubString{String}?
Any ideas would be appreciated. Thanks
PS: I know the first call to any function is slow, but I will only ever run this function once, since I only need to load the data into memory once.