I have some data in a Vector{Vector{SubString{String}}} that I want to turn into a DataFrame. While the DataFrame constructor accepts a vector of vectors, it seems to assume that the inner vectors are columns (which is sensible). To adapt it for my case, where the inner vectors are rows, I landed on
DataFrame(stack(d; dims=1), columnnames)
(There doesn't seem to be an Iterators version of stack, afaict.)
- Is this a reasonable way to go about this, or is there a better/more ergonomic way to handle this?
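For reference, here is a minimal sketch of what stack(d; dims=1) does with row-shaped inner vectors. The data in d is a hypothetical stand-in for the original, and stack assumes Julia 1.9 or later; the permutedims/reduce(hcat, ...) form is one possible equivalent for older versions.

```julia
# Hypothetical rows, as produced by splitting lines of text.
d = [split("a,1,2.5", ','), split("b,3,4.0", ',')]

# Each inner vector becomes one row of the resulting matrix.
m = stack(d; dims=1)

# The second row of the matrix is the second inner vector.
second_row_matches = m[2, :] == d[2]

# An equivalent without `stack`: glue the vectors as columns, then transpose.
m_old = permutedims(reduce(hcat, d))
```

Both forms produce a 2×3 Matrix{SubString{String}}, so either can be passed to DataFrame together with the column names.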
After I called the above, I ended up with a DataFrame where each column is of type SubString. Some of my columns are actually integers, and some are floats (and only one is an actual String).
- What's the best way to bring these columns to the appropriate type automatically?
I vaguely remember a trick using the identity function, but not exactly how to use it, nor do I know if that's the recommended path here. Actually, I believe that trick was for when missings are removed or the type otherwise needs narrowing, not for when it needs to be actually changed. So I guess the path here is to just parse the columns into the right type and replace them individually?
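A minimal sketch of the "parse and replace individually" route, assuming hypothetical column names (:name, :n, :x) and made-up data. It also shows what the identity trick does: broadcasting identity only narrows an over-wide element type and never converts values.

```julia
using DataFrames

# Hypothetical stand-in for the original data: every cell starts out as a SubString.
d = [split("a,1,2.5", ','), split("b,3,4.0", ',')]
df = DataFrame(stack(d; dims=1), [:name, :n, :x])

# Replace each column with a parsed copy; `df[!, col] = ...` swaps the whole column.
df[!, :name] = String.(df[!, :name])
df[!, :n] = parse.(Int, df[!, :n])
df[!, :x] = parse.(Float64, df[!, :x])

# `identity.` narrows the element type to what the values actually are.
wide = Union{Missing, Int}[1, 2, 3]
narrowed = identity.(wide)
```

Here narrowed has element type Int, while a column of SubStrings passed through identity. would stay a column of SubStrings, which is why parsing is needed in this case.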
While waiting for a more specific and Julian solution:
I imagine that the vector of vectors was obtained by splitting some text.
Then, working in reverse …
using DataFrames,CSV
vov=[["a","1","2."],["b","3","4."],["c","5","6."]]
str=join(join.(vov,','),'\n')
write("df.txt",str)
CSV.read("df.txt", DataFrame,header=false)
julia> CSV.read("df.txt", DataFrame,header=false)
3×3 DataFrame
 Row │ Column1  Column2  Column3
     │ String1  Int64    Float64
─────┼───────────────────────────
   1 │ a              1      2.0
   2 │ b              3      4.0
   3 │ c              5      6.0
To save disk space :), the same thing can be done through an in-memory IOBuffer instead of a file.
But I don't think it's a good habit.
julia> io=IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)
julia> write(io,str)
20
julia> CSV.read(io.data[1:io.size], DataFrame,header=false)
3×3 DataFrame
 Row │ Column1  Column2  Column3
     │ String1  Int64    Float64
─────┼───────────────────────────
   1 │ a              1      2.0
   2 │ b              3      4.0
   3 │ c              5      6.0
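A slightly shorter variant of the same idea, as a sketch: CSV.read also accepts an IO object, so the string can be wrapped in a read-only IOBuffer directly, without reaching into the buffer's data and size fields.

```julia
using CSV, DataFrames

vov = [["a", "1", "2."], ["b", "3", "4."], ["c", "5", "6."]]
str = join(join.(vov, ','), '\n')

# Wrap the string in an IOBuffer; no file on disk and no access to io.data needed.
df = CSV.read(IOBuffer(str), DataFrame; header=false)
```

The column types are inferred by CSV.jl in the same way as when reading from a file.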
If you are starting from strings, you will need to parse the data somehow. Here is my take:
julia> stuff = [["1", view("one", 1:3)], ["2", "two"], ["3", "three"]]
3-element Vector{Vector{AbstractString}}:
["1", "one"]
["2", "two"]
["3", "three"]
julia> spec = (x = Base.Fix1(tryparse, Int64), y = identity);
julia> Iterators.map(x -> NamedTuple{keys(spec)}(x .|> values(spec)), stuff) |> DataFrame
3×2 DataFrame
 Row │ x      y
     │ Int64  Abstract…
─────┼──────────────────
   1 │     1  one
   2 │     2  two
   3 │     3  three
Not claiming that this is efficient.
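For readers unfamiliar with the two pieces used in spec, here is a small Base-only sketch of how they behave on a single row. The names to_int, row and parsed are illustrative only.

```julia
# Fix1(tryparse, Int64) is the one-argument function s -> tryparse(Int64, s).
to_int = Base.Fix1(tryparse, Int64)

ok = to_int("12")          # parses to the integer 12
bad = to_int("twelve")     # returns nothing instead of throwing, unlike parse

# `.|>` pairs each element of the row with the function at the same position.
row = ["3", "three"]
parsed = row .|> (to_int, identity)
```

Because tryparse returns nothing on failure, a malformed cell would end up as nothing in the resulting column rather than raising an error, which may or may not be what you want.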