IndexedTable to DataFrame

I am wondering how to convert an IndexedTable to a DataFrame. There are two posts that offer (partial) solutions to this, but I am not sure whether either of them is (still) the best way.


For now, it is not clear to me when it is preferable to work with JuliaDB/IndexedTable and when it is preferable to work with a DataFrame, and I am not sure whether there is any guidance on this. Should I simply stay with a DataFrame as long as the data fits in memory? In my case, I certainly have more options, tools, and experience with DataFrames.

This is actually incredibly simple now thanks to Tables.jl:

julia> t = table((x=1:5, y=randn(5)))
Table with 5 rows, 2 columns:
x  y
────────────
1  1.22615
2  0.688459
3  -0.999314
4  0.279394
5  0.192156

julia> DataFrame(t)
5×2 DataFrame
│ Row │ x     │ y         │
│     │ Int64 │ Float64   │
├─────┼───────┼───────────┤
│ 1   │ 1     │ 1.22615   │
│ 2   │ 2     │ 0.688459  │
│ 3   │ 3     │ -0.999314 │
│ 4   │ 4     │ 0.279394  │
│ 5   │ 5     │ 0.192156  │

Likewise, you can do table(df).
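
For reference, here is a minimal self-contained sketch of the round trip above. It assumes that DataFrames, IndexedTables, and Tables are installed and that the installed versions of the first two support the Tables.jl interface; the variable names are only illustrative.

```julia
using DataFrames, IndexedTables, Tables

# Build a small in-memory IndexedTable from a NamedTuple of columns.
t = table((x = 1:5, y = randn(5)))

# The DataFrame constructor accepts any Tables.jl-compatible source.
@assert Tables.istable(t)
df = DataFrame(t)

# And back again: table() accepts the DataFrame as a source.
t2 = table(df)
```

Both conversions copy the data into the target container, so for large tables it is probably worth converting once rather than going back and forth.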


If you are working with a Distributed Table, you can convert it to a DataFrame like so:

df = DataFrame(collect(my_distributed_table))

What if I am working with a DNDSparse?

I have omitted some column names and types in this snippet:

dta = loadndsparse(fileNamesToBeLoaded,
    header_exists=true,
    chunks=1,
    colparsers=thisTypeDictMod);

typeof(dta)

JuliaDB.DNDSparse{Tuple{Int64},NamedTuple{(:tip,:name3,:number1  ........),Tuple{UInt32,String, ..........}}}

I tried these three approaches, and all of them fail:

DataFrame(dta);
DataFrame(table(dta));
DataFrame(collect(dta));
julia> DataFrame(collect(dta));
ERROR: MethodError: no method matching Tables.Schema(::Type{Tuple{Int64,UInt32,String,String,String,Int64,Int64,Int64,Int64,Int64,Int64,Int64,String,Int64,Int64,String,Int64,Int64,String,Int64,Int64,String,String,String,String,String,String,String,Union{Missing, UInt16},Union{Missing, UInt16},String,String}})

julia> DataFrame(dta);
ERROR: ArgumentError: no default `Tables.columns` implementation for type: JuliaDB.DNDSparse{Tuple{Int64},NamedTuple{(:tipNr,.....