# Can I have vectors in DataFrame cells?

I would like to transfer information from a Matlab table to Julia. Julia does not directly import MAT tables, so I exported to CSV. The final column of my Matlab table is a column of vectors of 100000 data points each. When exporting to CSV, this last column becomes one column for each value in the vector, so when I import it into Julia, it still has a column for each data point in the vector. Can I have a column of vectors in a DataFrame? If so, how do I initialize it, and what is the best way to bring the data from 100000 columns into one column?
If a DataFrame cannot hold vectors, is there a better way to bring the 100000 columns into vectors for each row than nested for loops?
Thanks!


Have you checked out MAT.jl?

Yes. It does not yet support Matlab tables.


Yes! This works just like you might hope:

``````
julia> using DataFrames

julia> df = DataFrame(A = [1, 2, 3], B = [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
3×2 DataFrame
 Row │ A      B
     │ Int64  Array…
─────┼──────────────────
   1 │     1  [1, 2, 3]
   2 │     2  [4, 5, 6]
   3 │     3  [7, 8, 9]
``````

You can also create an empty `DataFrame` with a column whose elements are themselves vectors, and you can push new rows to that data frame:

``````
julia> df = DataFrame(A = Vector{Int}(), B = Vector{Vector{Int}}())
0×2 DataFrame

julia> push!(df, (1, [1, 2, 3]))
1×2 DataFrame
 Row │ A      B
     │ Int64  Array…
─────┼──────────────────
   1 │     1  [1, 2, 3]
``````

As for handling your CSV import, I don't know of a clever way (hopefully someone else here does), but bear in mind that loops in Julia are fast, so if you can solve your problem with a loop that's often the fastest way to do it anyway.
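For what it's worth, here is a minimal sketch of the loop-style approach, assuming the CSV has already been read into a wide `DataFrame`. The table `wide`, its column names, and the count `nmeta` of leading non-data columns are made up for illustration; the real import would have 100000 data columns instead of three.

```julia
using DataFrames

# Hypothetical stand-in for the imported CSV:
# two metadata columns followed by one column per data point.
wide = DataFrame(Name = ["Baseline", "Monitor1"],
                 Year = [1999, 2000],
                 x1 = [-3.1, -1.0],
                 x2 = [0.0, -2.0],
                 x3 = [1.5, -3.0])

nmeta = 2  # number of leading non-data columns (assumption)

# Copy the data columns into a plain matrix once ...
data = Matrix{Float64}(wide[:, (nmeta + 1):end])

# ... then keep the metadata columns and add one vector per row.
tidy = wide[:, 1:nmeta]
tidy.DataVector = [data[i, :] for i in axes(data, 1)]
```

Going through a plain `Matrix` first means the per-row work is ordinary array slicing rather than repeated lookups across many DataFrame columns.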


In case it is of interest, the code below takes the following example CSV input with a data vector:

``````Name,Year,DataVector
Baseline,1999,-3.1,0,1.5
Monitor1,2000,-1,-2,-3
Monitor2,2001,0,1.2,2
``````
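``````
using DelimitedFiles, DataFrames

# "data.csv" is a placeholder name for the file holding the CSV shown above
f = readdlm("data.csv", ',')

N = 2  # number of columns before the data vector
Nr, Nc = size(f)
# metadata columns, named from the header row
df1 = DataFrame(view(f, 2:Nr, 1:N), Symbol.(f[1, 1:N]))
# one Float64 vector per row, built from the remaining columns
df2 = DataFrame(DataVector = [Float64.(view(f, i, N+1:Nc)) for i in 2:Nr])
df = hcat(df1, df2, makeunique=true)
df[!, :Name] = convert.(String, df[:, :Name])
df[!, :Year] = convert.(Int, df[:, :Year])
``````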

to produce:

``````
julia> df
3×3 DataFrame
 Row │ Name      Year   DataVector
     │ String    Int64  Array…
─────┼─────────────────────────────────────
   1 │ Baseline   1999  [-3.1, 0.0, 1.5]
   2 │ Monitor1   2000  [-1.0, -2.0, -3.0]
   3 │ Monitor2   2001  [0.0, 1.2, 2.0]
``````