Can I have vectors in DataFrame cells?

I would like to transfer information from a Matlab table to Julia. Julia does not directly import MAT tables, so I exported to CSV. The final column of my Matlab table is a column of vectors of 100000 data points each. When exported to CSV, this last column becomes one column for each value in the vector, so when I import it into Julia, it still has a column for each data point in the vector. Can I have a column of vectors in a DataFrame? If so, how do I initialize it, and what is the best way to bring the data from 100000 columns into one column?
If a DataFrame cannot have vectors, is there a better way to bring the 100000 columns into vectors for each row than nested for loops?

Have you checked out MAT.jl?

Yes. It does not yet support Matlab tables.

Yes! This works just like you might hope:

julia> using DataFrames

julia> df = DataFrame(A = [1, 2, 3], B = [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
3×2 DataFrame
 Row │ A      B
     │ Int64  Array…
   1 │     1  [1, 2, 3]
   2 │     2  [4, 5, 6]
   3 │     3  [7, 8, 9]

You can also create an empty DataFrame with a column whose elements are themselves vectors, and you can push new rows to that data frame:

julia> df = DataFrame(A = Vector{Int}(), B = Vector{Vector{Int}}())
0×2 DataFrame

julia> push!(df, (1, [1, 2, 3]))
1×2 DataFrame
 Row │ A      B
     │ Int64  Array…
   1 │     1  [1, 2, 3]

As for handling your CSV import, I don’t know of a clever way (hopefully someone else here does), but bear in mind that loops in Julia are fast, so if you can solve your problem with a loop that’s often the fastest way to do it anyway.
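For the wide-to-one-column step, here is one possible sketch (not necessarily the fastest): copy the data columns into a matrix and take it one row at a time. The table `wide`, its column names, and the `data` column name are made up to stand in for the imported CSV, and I am assuming the vector data starts at column 2:

```julia
using DataFrames

# Hypothetical wide table standing in for the imported CSV:
# one id column followed by one column per data point.
wide = DataFrame(id = [1, 2], x1 = [0.1, 0.4], x2 = [0.2, 0.5], x3 = [0.3, 0.6])

first_data_col = 2                                 # assumed position of the first data column
M = Matrix{Float64}(wide[:, first_data_col:end])   # dense copy of the data columns

tall = wide[:, 1:first_data_col-1]                 # keep the leading scalar columns
tall.data = [M[i, :] for i in 1:size(M, 1)]        # one Vector{Float64} per row
```

Here `tall` ends up with two columns, `id` and `data`, and each `data` cell holds the three values from that row of `wide`.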


In case it is of interest, the code below reads a CSV file (CSV_arrays.csv) whose first two columns, Name and Year, are followed by the elements of a data vector spread across the remaining columns:

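using DelimitedFiles, DataFrames

# Read the whole file as a matrix; the first row holds the column names
f = readdlm("CSV_arrays.csv", ',')
N = 2  # number of columns before the data vector
Nr, Nc = size(f)
# Scalar columns, named after the header row
df1 = DataFrame(view(f,2:Nr,1:N), Symbol.(f[1,1:N]))
# Collapse the remaining columns of each row into one Float64 vector
df2 = DataFrame(DataVector = [Float64.(view(f,i,N+1:Nc)) for i in 2:Nr])
df = hcat(df1, df2, makeunique=true)
# The columns built from the mixed-type matrix have element type Any; make them concrete
df[!, :Name] = convert.(String, df[:, :Name])
df[!, :Year] = convert.(Int, df[:, :Year])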

to produce:

julia> df
3×3 DataFrame
 Row │ Name      Year   DataVector
     │ String    Int64  Array…
   1 │ Baseline   1999  [-3.1, 0.0, 1.5]
   2 │ Monitor1   2000  [-1.0, -2.0, -3.0]
   3 │ Monitor2   2001  [0.0, 1.2, 2.0]
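
Once the vectors sit in a single column, each cell is an ordinary Julia vector. A minimal sketch of working with it, assuming a small hand-built table with the same column names and values as the output above (the `Mean` column name is made up for illustration):

```julia
using DataFrames, Statistics

# Hand-built stand-in for the table printed above
df = DataFrame(Name = ["Baseline", "Monitor1", "Monitor2"],
               Year = [1999, 2000, 2001],
               DataVector = [[-3.1, 0.0, 1.5], [-1.0, -2.0, -3.0], [0.0, 1.2, 2.0]])

v = df.DataVector[2]       # the whole vector stored in row 2
x = df.DataVector[2][3]    # a single element of that vector

# Apply a function to each cell's vector; ByRow calls `mean` once per row
transform!(df, :DataVector => ByRow(mean) => :Mean)
```

Here `ByRow` is used so that `mean` receives one cell's vector at a time rather than the whole column.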