I have some data in a Vector{Vector{SubString{String}}} that I want to turn into a DataFrame. While the DataFrame constructor accepts a vector of vectors, it seems to assume that the inner vectors are columns (which is sensible). In my case the inner vectors are rows, so to adapt it I landed on
DataFrame(stack(d; dims=1), columnnames)
(There doesn't seem to be an Iterators version of stack afaict.)
- Is this a reasonable way to go about this, or is there a better/more ergonomic way to handle this?
 
After I called the above, I ended up with a DataFrame where each column is of type SubString. Some of my columns are actually integers, and some are floats (and only one is an actual String).
- What's the best way to bring these columns to the appropriate type automatically?
I vaguely remember a trick using the identity function, but not exactly how to use it, nor do I know if that's the recommended path here. Actually, I believe that trick was for when missings are removed or the type otherwise needs narrowing, not for when it actually needs to be changed. So I guess the path here is to just parse the columns into the right type and replace them individually?
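For reference, here is a minimal sketch of the "parse each column" idea, automated with `mapcols`. The helper `guess_column` is hypothetical (not part of DataFrames.jl), the column names and sample rows are made up for illustration, and `stack` assumes Julia 1.9 or later. It tries `Int` first, then `Float64`, and otherwise falls back to `String`.

```julia
using DataFrames

# Hypothetical helper: try Int, then Float64, otherwise fall back to String.
function guess_column(col)
    ints = tryparse.(Int, col)
    all(!isnothing, ints) && return Vector{Int}(ints)
    floats = tryparse.(Float64, col)
    all(!isnothing, floats) && return Vector{Float64}(floats)
    return String.(col)
end

# Made-up sample data: each inner vector is one row of SubStrings.
d = [split("a,1,2.5", ','), split("b,3,4.0", ',')]

# Rows become matrix rows; every column starts out as SubString{String}.
df = DataFrame(stack(d; dims=1), [:name, :n, :x])

# Apply the helper to every column and build a new, typed DataFrame.
df_typed = mapcols(guess_column, df)
```

The helper is not type stable, which probably does not matter for a one-off conversion, and it does not handle missing values.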
            
While waiting for a more specific and idiomatic Julia solution:
I'll assume that the vector of vectors was obtained by splitting some text.
Then, operating in reverse, you can join it back into text and let CSV.jl detect the column types:
using DataFrames,CSV
vov=[["a","1","2."],["b","3","4."],["c","5","6."]]
str=join(join.(vov,','),'\n')
write("df.txt",str)
CSV.read("df.txt", DataFrame,header=false)
julia> CSV.read("df.txt", DataFrame,header=false)
3×3 DataFrame
 Row │ Column1  Column2  Column3
     │ String1  Int64    Float64
─────┼───────────────────────────
   1 │ a              1      2.0
   2 │ b              3      4.0
   3 │ c              5      6.0
To save disk space :) you can write to an IOBuffer instead of a file, but I don't think it's a good habit:
julia> io=IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)
julia> write(io,str)
20
julia> CSV.read(io.data[1:io.size], DataFrame,header=false)
3×3 DataFrame
 Row │ Column1  Column2  Column3
     │ String1  Int64    Float64
─────┼───────────────────────────
   1 │ a              1      2.0
   2 │ b              3      4.0
   3 │ c              5      6.0
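A possible variant that avoids reaching into the buffer's internal fields (`io.data`, `io.size`): CSV.jl can read from an `IO` source, so the joined string can be wrapped in an `IOBuffer` and passed in directly. This is a sketch using the same sample data as above.

```julia
using CSV, DataFrames

vov = [["a", "1", "2."], ["b", "3", "4."], ["c", "5", "6."]]

# Join fields with commas and rows with newlines to get CSV text.
str = join(join.(vov, ','), '\n')

# Wrap the text in an in-memory IO and let CSV.jl detect the column types.
df = CSV.read(IOBuffer(str), DataFrame; header=false)
```

Note that this round trip only works cleanly if the fields themselves contain no commas, quotes, or newlines.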
            
If you are starting from strings, you will need to parse the data somehow. Here is my take:
julia> stuff = [["1", view("one", 1:3)], ["2", "two"], ["3", "three"]]
3-element Vector{Vector{AbstractString}}:
 ["1", "one"]
 ["2", "two"]
 ["3", "three"]
julia> spec = (x = Base.Fix1(tryparse, Int64), y = identity);
julia> Iterators.map(x -> NamedTuple{keys(spec)}(x .|> values(spec)), stuff) |> DataFrame
3×2 DataFrame
 Row │ x      y
     │ Int64  Abstract…
─────┼──────────────────
   1 │     1  one
   2 │     2  two
   3 │     3  three
Not claiming that this is efficient.
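The `y` column above ends up with an abstract element type because the rows mix `SubString` and `String` values and `identity` leaves them untouched. If a concretely typed column is preferred, one option is to convert with `String` and build the columns directly. This is a plain alternative sketch, not the `spec` approach itself; it uses `parse`, which throws on bad input, where `tryparse` would return `nothing`.

```julia
using DataFrames

# Same sample rows as above: first field is an integer, second is text.
stuff = [["1", view("one", 1:3)], ["2", "two"], ["3", "three"]]

# Build each column with an explicit conversion per field.
df = DataFrame(
    x = [parse(Int, row[1]) for row in stuff],   # throws if a value is not an integer
    y = [String(row[2]) for row in stuff],       # SubString and String both become String
)
```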