I would like to transfer data from a Matlab table to Julia. Julia does not directly import MAT tables, so I exported to CSV. The final column of my Matlab table is a column of vectors of 100000 data points each. When exported to CSV, this last column becomes one column per value in the vector, so when I import it into Julia, it still has one column for each data point. Can I have a column of vectors in a DataFrame? If so, how do I initialize it, and what is the best way to bring the data from 100000 columns into one column?
If a DataFrame cannot hold vectors, is there a better way to bring the 100000 columns into vectors for each row than nested for loops?
Thanks!
Have you checked out MAT.jl?
Yes. It does not yet support Matlab tables.
Yes! This works just like you might hope:
julia> using DataFrames
julia> df = DataFrame(A = [1, 2, 3], B = [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
3×2 DataFrame
 Row │ A      B
     │ Int64  Array…
─────┼──────────────────
   1 │     1  [1, 2, 3]
   2 │     2  [4, 5, 6]
   3 │     3  [7, 8, 9]
You can also create an empty DataFrame with a column whose elements are themselves vectors, and you can push new rows to that data frame:
julia> df = DataFrame(A = Vector{Int}(), B = Vector{Vector{Int}}())
0×2 DataFrame
julia> push!(df, (1, [1, 2, 3]))
1×2 DataFrame
 Row │ A      B
     │ Int64  Array…
─────┼──────────────────
   1 │     1  [1, 2, 3]
As for handling your CSV import, I don't know of a clever way (hopefully someone else here does), but bear in mind that loops in Julia are fast, so if you can solve your problem with a loop that's often the fastest way to do it anyway.
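For what it's worth, here is a minimal sketch of the loop-style approach, assuming the CSV has already been read into a wide DataFrame. The table `wide` and the column names `id`, `x1`, `x2`, `x3` are made up for illustration; the real table would have 100000 data columns with whatever names the import produced.

```julia
using DataFrames

# Hypothetical wide table standing in for the imported CSV
wide = DataFrame(id = [1, 2],
                 x1 = [0.1, 0.4], x2 = [0.2, 0.5], x3 = [0.3, 0.6])

datacols = names(wide, r"^x")             # the columns that hold vector elements
M = Matrix{Float64}(wide[:, datacols])    # rows × data points

long = select(wide, Not(datacols))        # keep only the scalar columns
long.data = [M[i, :] for i in axes(M, 1)] # one Vector{Float64} per row
```

Going through a `Matrix` first means each row is sliced from one contiguous array rather than being assembled column by column from the DataFrame. With 100000 columns it may be cheaper still to skip the wide DataFrame and read the file straight into a matrix.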
In case it is of interest, the code below takes the following example CSV input, where each row ends with a data vector:
Name,Year,DataVector
Baseline,1999,-3.1,0,1.5
Monitor1,2000,-1,-2,-3
Monitor2,2001,0,1.2,2
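using DelimitedFiles, DataFrames

f = readdlm("CSV_arrays.csv", ',')   # mixed-type matrix; row 1 is the header
N = 2                                # number of columns before the data vector
Nr, Nc = size(f)
# Scalar columns, named from the header row
df1 = DataFrame(view(f, 2:Nr, 1:N), Symbol.(f[1, 1:N]))
# Collapse the remaining columns of each row into one Vector{Float64}
df2 = DataFrame(DataVector = [Float64.(view(f, i, N+1:Nc)) for i in 2:Nr])
df = hcat(df1, df2, makeunique=true)
# Narrow the loosely typed columns to concrete element types
df[!, :Name] = convert.(String, df[:, :Name])
df[!, :Year] = convert.(Int, df[:, :Year])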
to produce:
julia> df
3×3 DataFrame
 Row │ Name      Year   DataVector
     │ String    Int64  Array…
─────┼─────────────────────────────────────
   1 │ Baseline   1999  [-3.1, 0.0, 1.5]
   2 │ Monitor1   2000  [-1.0, -2.0, -3.0]
   3 │ Monitor2   2001  [0.0, 1.2, 2.0]
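Once the data is in this shape, the vector column behaves like any other column whose elements happen to be arrays. A short sketch of typical follow-up operations, rebuilding the same table by hand so it runs on its own (the column name `Peak` and the variable names are just for illustration):

```julia
using DataFrames

df = DataFrame(Name = ["Baseline", "Monitor1", "Monitor2"],
               Year = [1999, 2000, 2001],
               DataVector = [[-3.1, 0.0, 1.5], [-1.0, -2.0, -3.0], [0.0, 1.2, 2.0]])

first_vec = df.DataVector[1]        # a plain Vector{Float64} for row 1
lens = length.(df.DataVector)       # length of each row's vector
df.Peak = maximum.(df.DataVector)   # scalar column derived from each vector
M = reduce(hcat, df.DataVector)     # matrix whose column j is row j's vector
```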
