Creating DataFrame from vector of rows, and fixing type

digital_carver · March 5, 2025, 6:01pm

I have some data in a Vector{Vector{SubString{String}}} that I want to turn into a DataFrame. While the DataFrame constructor accepts a vector of vectors, it seems to assume the inner vectors are columns (which is sensible). To adapt it for my case, where each inner vector is a row, I landed on:

DataFrame(stack(d; dims=1), columnnames)

(There doesn’t seem to be an Iterators version of stack afaict.)

Is this a reasonable way to go about this, or is there a better/more ergonomic way to handle this?
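One possible alternative, sketched here as a suggestion rather than the recommended way: skip the intermediate matrix and gather each column with a comprehension, then pass the vector of columns plus the names to the `DataFrame` constructor. The sample `d` and `columnnames` below are made-up stand-ins for the data in the question.

```julia
using DataFrames

# Hypothetical sample data: each inner vector is one row of SubStrings
d = [split("a,1,2.0", ','), split("b,3,4.0", ','), split("c,5,6.0", ',')]
columnnames = ["name", "count", "value"]

# Column j collects the j-th field of every row
cols = [[row[j] for row in d] for j in eachindex(columnnames)]
df = DataFrame(cols, columnnames)
```

`stack(d; dims=1)` from the question (available since Julia 1.9) yields the same columns via a Matrix; the comprehension only avoids building that matrix. Either way the columns still hold `SubString{String}` values.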

After calling the above, I ended up with a DataFrame where every column has element type SubString{String}. Some of my columns are actually integers, some are floats, and only one is an actual String.

What’s the best way to bring these columns to the appropriate type automatically? ~~I vaguely remember a trick using the identity function, but not exactly how to use it, nor do I know if that’s the recommended path here.~~ Actually, I believe that trick was for when missings are removed or the type otherwise needs narrowing, not for when it needs to be actually changed. So I guess the path here is to just parse the columns into the right type and replace them individually?
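A minimal sketch of the "parse and replace each column" route, done automatically: `autoparse` is a hypothetical helper (not part of DataFrames.jl) that tries `Int`, then `Float64`, and falls back to `String` only if every entry fails to parse.

```julia
using DataFrames

# Hypothetical helper: parse a column of strings as Int if every entry parses,
# otherwise as Float64 if every entry parses, otherwise keep it as String.
function autoparse(col::AbstractVector{<:AbstractString})
    for T in (Int, Float64)
        parsed = tryparse.(T, col)          # `nothing` wherever parsing fails
        all(!isnothing, parsed) && return convert(Vector{T}, parsed)
    end
    return String.(col)                     # materialize SubStrings as Strings
end

df = DataFrame(name = split("a b c"), count = split("1 3 5"),
               value = split("2.0 4.0 6.0"))

# Replace every column with its parsed version
for name in names(df)
    df[!, name] = autoparse(df[!, name])
end
```

If you already know the types, replacing columns one by one is simpler, e.g. `df.count = parse.(Int, df.count)`.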

rocco_sprmnt21 · March 6, 2025, 6:25pm

While waiting for a more specific and Julian solution:
I imagine the vector of vectors was obtained by splitting some text.
So we can do the inverse operation: join it back into CSV text and let CSV.jl parse it, column types included:

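using DataFrames, CSV
vov = [["a","1","2."], ["b","3","4."], ["c","5","6."]]
str = join(join.(vov, ','), '\n')            # one comma-separated line per row
write("df.txt", str)                         # round-trip through a file on disk
CSV.read("df.txt", DataFrame, header=false)  # CSV.jl infers each column's type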

julia> CSV.read("df.txt", DataFrame,header=false)
3×3 DataFrame
 Row │ Column1  Column2  Column3 
     │ String1  Int64    Float64
─────┼───────────────────────────
   1 │ a              1      2.0
   2 │ b              3      4.0
   3 │ c              5      6.0

To save disk space :), the same thing works with an in-memory buffer,
though I don’t think it’s a good habit:

julia> io=IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> write(io,str)
20


julia> CSV.read(io.data[1:io.size], DataFrame,header=false)
3×3 DataFrame
 Row │ Column1  Column2  Column3 
     │ String1  Int64    Float64
─────┼───────────────────────────
   1 │ a              1      2.0
   2 │ b              3      4.0
   3 │ c              5      6.0
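A slightly shorter variant of the in-memory route, relying on `CSV.read` accepting any `IO` source: wrap the string in a fresh `IOBuffer` instead of reaching into the buffer's internal `data` and `size` fields.

```julia
using CSV, DataFrames

vov = [["a", "1", "2."], ["b", "3", "4."], ["c", "5", "6."]]
str = join(join.(vov, ','), '\n')

# IOBuffer(str) gives a readable buffer positioned at the start of str
df = CSV.read(IOBuffer(str), DataFrame; header=false)
```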

bertschi · March 6, 2025, 7:38pm

If you are starting from strings, you will need to parse the data somehow. Here is my take:

julia> stuff = [["1", view("one", 1:3)], ["2", "two"], ["3", "three"]]
3-element Vector{Vector{AbstractString}}:
 ["1", "one"]
 ["2", "two"]
 ["3", "three"]

julia> spec = (x = Base.Fix1(tryparse, Int64), y = identity);

julia> Iterators.map(x -> NamedTuple{keys(spec)}(x .|> values(spec)), stuff) |> DataFrame
3×2 DataFrame
 Row │ x      y         
     │ Int64  Abstract… 
─────┼──────────────────
   1 │     1  one
   2 │     2  two
   3 │     3  three

Not claiming that this is efficient.
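A column-wise variant of the same `spec` idea, sketched under the assumption that the rows are first turned into a DataFrame of strings: each parser is then applied to a whole column at once. Using `parse` instead of `tryparse` is a swap on my part, so a bad entry throws an error rather than silently becoming `nothing`.

```julia
using DataFrames

stuff = [["1", "one"], ["2", "two"], ["3", "three"]]
spec = (x = Base.Fix1(parse, Int), y = identity)

# Build string columns first: column j holds the j-th field of every row
df = DataFrame([[row[j] for row in stuff] for j in 1:length(spec)],
               collect(keys(spec)))

# Then apply each column's parser to the whole column
for (name, f) in pairs(spec)
    df[!, name] = f.(df[!, name])
end
```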
