Creating DataFrame from vector of rows, and fixing type

I have some data in a Vector{Vector{SubString{String}}} that I want to turn into a DataFrame. While the DataFrame constructor accepts a vector of vectors, it seems to assume that the inner vectors are columns (which is sensible). To adapt it to my case, where the inner vectors are rows, I landed on

DataFrame(stack(d; dims=1), columnnames)

(There doesn’t seem to be an Iterators version of stack afaict.)

  1. Is this a reasonable way to go about this, or is there a better/more ergonomic way to handle this?

After I called the above, I ended up with a DataFrame where each column is of type SubString. Some of my columns are actually integers, and some are floats (and only one is an actual String).

  2. What’s the best way to bring these columns to the appropriate type automatically? I vaguely remember a trick using the identity function, but not exactly how to use it, nor do I know if that’s the recommended path here. Actually, I believe that trick was for when missings are removed or the type otherwise needs narrowing, not for when it needs to be actually changed. So I guess the path here is to just parse the columns into the right type and replace them individually?
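
For concreteness, the "parse and replace individually" route could look something like the sketch below. The sample rows and the column names (label, count, value) are made up for illustration, and `stack` needs Julia 1.9 or later.

```julia
using DataFrames

# Hypothetical sample data: each inner vector is one row of SubStrings.
d = [split("a,1,2.", ','), split("b,3,4.", ','), split("c,5,6.", ',')]

# Rows become matrix rows, so every column starts out as SubString{String}.
df = DataFrame(stack(d; dims=1), [:label, :count, :value])

# Replace each column with a parsed copy of the appropriate type.
df.label = String.(df.label)
df.count = parse.(Int, df.count)
df.value = parse.(Float64, df.value)
```

`parse` throws on malformed input; `tryparse` returns `nothing` instead, which may or may not be preferable depending on how clean the data is.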

While waiting for a more specific and idiomatic Julia solution:
I imagine that the vector of vectors was obtained by splitting a text.
If so, you can go in the reverse direction, joining the pieces back into text and letting CSV.jl detect the types:

using DataFrames,CSV
vov=[["a","1","2."],["b","3","4."],["c","5","6."]]
str=join(join.(vov,','),'\n')
write("df.txt",str)
CSV.read("df.txt", DataFrame,header=false)
julia> CSV.read("df.txt", DataFrame,header=false)
3×3 DataFrame
 Row │ Column1  Column2  Column3 
     │ String1  Int64    Float64
─────┼───────────────────────────
   1 │ a              1      2.0
   2 │ b              3      4.0
   3 │ c              5      6.0

Or, to save disk space :), the same thing can be done with an in-memory IOBuffer.
But I don’t think it’s a good habit:

julia> io=IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> write(io,str)
20


julia> CSV.read(io.data[1:io.size], DataFrame,header=false)
3×3 DataFrame
 Row │ Column1  Column2  Column3 
     │ String1  Int64    Float64
─────┼───────────────────────────
   1 │ a              1      2.0
   2 │ b              3      4.0
   3 │ c              5      6.0
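
A slightly shorter variant of the same idea, as a sketch: CSV.read also accepts an IO object, so the joined string can be wrapped in an IOBuffer directly, without touching the buffer's internal fields. The column names passed to `header` are placeholders I made up. Note that the plain `join` only yields valid CSV if no field contains a comma, a quote, or a newline.

```julia
using CSV, DataFrames

vov = [["a", "1", "2."], ["b", "3", "4."], ["c", "5", "6."]]

# Rebuild a CSV-like text: fields joined by commas, rows joined by newlines.
str = join(join.(vov, ','), '\n')

# Read straight from memory; when header is a vector of names,
# the first line of the input is treated as data.
df = CSV.read(IOBuffer(str), DataFrame; header=["label", "count", "value"])
```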

If you are starting from strings, you will need to parse the data somehow. Here is my take:

julia> stuff = [["1", view("one", 1:3)], ["2", "two"], ["3", "three"]]
3-element Vector{Vector{AbstractString}}:
 ["1", "one"]
 ["2", "two"]
 ["3", "three"]

julia> spec = (x = Base.Fix1(tryparse, Int64), y = identity);

julia> Iterators.map(x -> NamedTuple{keys(spec)}(x .|> values(spec)), stuff) |> DataFrame
3×2 DataFrame
 Row │ x      y         
     │ Int64  Abstract… 
─────┼──────────────────
   1 │     1  one
   2 │     2  two
   3 │     3  three

Not claiming that this is efficient.
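
A column-wise variant of the same spec idea, as a rough sketch: apply one parser per column position and hand the resulting vectors to the DataFrame constructor. The names `rows`, `colnames`, and `parsers` are mine, and I use `String` rather than `identity` for the text column so that it comes out as a concrete String column.

```julia
using DataFrames

rows = [["1", "one"], ["2", "two"], ["3", "three"]]

colnames = [:x, :y]
# One parser per column position; parse throws if a value is malformed.
parsers = [s -> parse(Int, s), String]

# Build each column by applying its parser to the j-th entry of every row.
columns = [[f(row[j]) for row in rows] for (j, f) in enumerate(parsers)]

df = DataFrame(columns, colnames)
```

Since each column is built by its own comprehension, its element type follows from the parsed values, so no narrowing step should be needed afterwards.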
