I would like to transfer information from a Matlab table to Julia. Julia does not directly import MAT tables, so I exported to CSV. The final column of my Matlab table is a column of vectors of 100000 data points each. When exported to CSV, this last column becomes one column per value in the vector, so when I import it into Julia the DataFrame still has one column for each data point. Can I have a column of vectors in a DataFrame? If so, how do I initialize it, and what is the best way to bring the data from 100000 columns into one column?
If a DataFrame cannot hold vectors, is there a better way than nested for loops to collect the 100000 columns into one vector per row?
Thanks!
Have you checked out MAT.jl?
Yes. It does not yet support Matlab tables.
Yes! This works just like you might hope:
julia> using DataFrames
julia> df = DataFrame(A = [1, 2, 3], B = [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
3×2 DataFrame
 Row │ A      B
     │ Int64  Array…
─────┼──────────────────
   1 │     1  [1, 2, 3]
   2 │     2  [4, 5, 6]
   3 │     3  [7, 8, 9]
You can also create an empty DataFrame with a column whose elements are themselves vectors, and you can push new rows to that data frame:
julia> df = DataFrame(A = Vector{Int}(), B = Vector{Vector{Int}}())
0×2 DataFrame
julia> push!(df, (1, [1, 2, 3]))
1×2 DataFrame
 Row │ A      B
     │ Int64  Array…
─────┼──────────────────
   1 │     1  [1, 2, 3]
As for handling your CSV import, I don't know of a clever way (hopefully someone else here does), but bear in mind that loops in Julia are fast, so if you can solve your problem with a loop, that's often the fastest way to do it anyway.
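For the loop-based route, here is a minimal sketch of collapsing the per-data-point columns into a single column of vectors. It assumes the CSV has already been loaded into a DataFrame; the `wide` table, its column names, and `nmeta` are made up for illustration, and the real file would have 100000 data columns instead of three.

```julia
using DataFrames

# Hypothetical stand-in for the imported CSV: two metadata columns followed by
# one column per data point (x1, x2, x3 here).
wide = DataFrame(Name = ["a", "b"], Year = [1999, 2000],
                 x1 = [1.0, 4.0], x2 = [2.0, 5.0], x3 = [3.0, 6.0])

nmeta = 2  # number of columns that come before the data points (assumed)

# Copy the data-point columns into a plain matrix, then take one vector per row.
data = Matrix{Float64}(wide[:, nmeta+1:end])
vectors = [data[i, :] for i in 1:size(data, 1)]

# Keep the metadata columns and attach the vectors as a single new column.
df = wide[:, 1:nmeta]
df.DataVector = vectors
```

Going through a `Matrix` first keeps the row extraction to a single comprehension rather than nested loops over rows and columns.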
In case it is of interest, the code below takes the following example CSV input, whose data vector is spread across the trailing columns:
Name,Year,DataVector
Baseline,1999,-3.1,0,1.5
Monitor1,2000,-1,-2,-3
Monitor2,2001,0,1.2,2
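using DelimitedFiles, DataFrames
# Read the whole file into one matrix of mixed values (numbers and strings)
f = readdlm("CSV_arrays.csv", ',')
N = 2  # number of columns before the data vector
Nr, Nc = size(f)
# Scalar columns, named after the first N header fields
df1 = DataFrame(view(f, 2:Nr, 1:N), Symbol.(f[1, 1:N]))
# One Float64 vector per row, built from the remaining columns
df2 = DataFrame(DataVector = [Float64.(view(f, i, N+1:Nc)) for i in 2:Nr])
df = hcat(df1, df2, makeunique=true)
# Narrow the scalar columns to concrete element types
df[!, :Name] = convert.(String, df[:, :Name])
df[!, :Year] = convert.(Int, df[:, :Year])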
to produce:
julia> df
3×3 DataFrame
 Row │ Name      Year   DataVector
     │ String    Int64  Array…
─────┼─────────────────────────────────────
   1 │ Baseline   1999  [-3.1, 0.0, 1.5]
   2 │ Monitor1   2000  [-1.0, -2.0, -3.0]
   3 │ Monitor2   2001  [0.0, 1.2, 2.0]
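As a possible alternative that avoids holding the whole file in one mixed-type matrix, the same layout can be parsed line by line and pushed into a DataFrame whose last column holds vectors. This is only a sketch: `read_rows` is a hypothetical helper, and it assumes exactly two scalar fields (a name and an integer year) before the data points, with no quoted or missing fields.

```julia
using DataFrames

# Hypothetical helper: parse lines of the form "Name,Year,v1,v2,..." from any IO.
function read_rows(io::IO)
    readline(io)  # skip the header line
    df = DataFrame(Name = String[], Year = Int[], DataVector = Vector{Float64}[])
    for line in eachline(io)
        isempty(strip(line)) && continue  # ignore blank lines
        fields = split(line, ',')
        # First two fields are scalars; everything after them is the data vector.
        push!(df, (String(fields[1]), parse(Int, fields[2]), parse.(Float64, fields[3:end])))
    end
    return df
end

# Same sample data as above, held in memory so the sketch is self-contained.
sample = """
Name,Year,DataVector
Baseline,1999,-3.1,0,1.5
Monitor1,2000,-1,-2,-3
Monitor2,2001,0,1.2,2
"""
df = read_rows(IOBuffer(sample))
```

For a file on disk, something like `open(read_rows, "CSV_arrays.csv")` should work the same way, since `open` passes the opened stream to the function.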