Export csv - CSV.jl and CSVFiles do not help

Hi,

I have a 90000 x 414 DataFrame that I need to export to Stata. I tried saving it as CSV using CSV.jl and CSVFiles.jl, but Julia gets stuck and I have to stop it. This happens even when I try with far fewer rows.

Any advice? Other formats that Stata can read would also work.

Thanks

Is there any way you could share the file? I’m happy to help take a look for CSV.jl (I’m the primary author). Also, are you running the latest versions?

Ditto with respect to CSVFiles.jl, happy to look into this if we can find some way to replicate it. Maybe you could save the data as a feather file and post that somewhere?

I just tried to save a generic 90000x414 DataFrame of random Float64s with CSVFiles.jl, and that worked. It took 10 s on my machine (which has a fast processor and a fast SSD).
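For anyone who wants to repeat that check, here is a minimal sketch of such a timing test. It assumes a recent DataFrames.jl (the `:auto` column-name argument did not exist in the versions current when this thread was written) and uses the `save` function that CSVFiles.jl exports. The helper names `random_frame` and `time_csv_save` are made up for illustration, and the row count is kept small so it runs quickly.

```julia
using CSVFiles, DataFrames

# Build a DataFrame of random Float64 values; :auto generates names x1, x2, ...
function random_frame(nrows::Integer, ncols::Integer)
    return DataFrame(rand(nrows, ncols), :auto)
end

# Write the frame to a CSV file in a fresh temporary directory.
# Returns the elapsed time in seconds and the size of the file in bytes.
function time_csv_save(df::DataFrame)
    path = joinpath(mktempdir(), "random.csv")
    elapsed = @elapsed save(path, df)
    return elapsed, filesize(path)
end

# Raise the row count to 90_000 to match the size discussed in this thread.
df = random_frame(1_000, 414)
elapsed, nbytes = time_csv_save(df)
println("wrote $(nbytes) bytes in $(round(elapsed; digits = 2)) s")
```

The first call includes compilation time, so a second call to `time_csv_save(df)` gives a fairer number. If this finishes quickly while the real table hangs, the difference is more likely in the column types than in the table size.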

Unfortunately, this is data I cannot share. Any idea what I can do? The package works for very small files, and I am on the latest version. When I used it on Julia 0.6 a couple of months ago, it worked with a slightly smaller DataFrame. Now even that DataFrame does not work.

Here is what it shows when I do head(df). It looks weird. Does it help?
[Screenshot of the head(df) output]

Could you show us the output of running typeof.(DataFrames.columns(df))? The screenshot you are showing there looks weird and makes me wonder whether some unusual type is used for the column elements.

DataType[240]
Vector{Union{Missing, Float32}}
Vector{Union{Missing, Float32}}
Vector{Union{Missing, Float32}}
Vector{Union{Missing, Float32}}
Vector{Union{Missing, Float32}}
Vector{Union{Missing, Float32}}
Vector{Union{Missing, Float32}}
Vector{Union{Missing, Float32}}
Vector{Union{Missing, Float32}}
Vector{Union{Missing, Float32}}

Vector{Union{Missing, Float32}}
Vector{Union{Missing, Float32}}
Vector{Union{Missing, Float32}}
Vector{Union{Missing, Float32}}
Vector{Int64}
Vector{Int64}
Vector{Int64}
Vector{Int64}
Vector{Int64}
Vector{Int64}

I don’t know what those weird vectors are. I will look into them.

It might be better to list the column types without duplicates. If, say, column 146 is the problem, the truncated output above will miss it.
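One way to get such a list is sketched below. It assumes a recent DataFrames.jl, where `eachcol(df)` plays the role of the older `DataFrames.columns(df)` used earlier in the thread. The small `df` here is a hypothetical stand-in, since the real table was not shared.

```julia
using DataFrames

# Hypothetical stand-in for the real table.
df = DataFrame(
    a = Union{Missing, Float32}[1.0f0, missing, 3.0f0],
    b = Union{Missing, Float32}[missing, 2.0f0, 4.0f0],
    c = [1, 2, 3],
    d = ["x", "y", "z"],
)

# Distinct column types; repeated types collapse into one entry.
col_types = unique(typeof.(eachcol(df)))

# Map each distinct type to the names of the columns that have it,
# so an unusual type can be traced back to specific columns.
types_to_columns = Dict{Type, Vector{String}}()
for name in names(df)
    col_type = typeof(df[!, name])
    push!(get!(types_to_columns, col_type, String[]), name)
end

for (col_type, cols) in types_to_columns
    println(col_type, " => ", length(cols), " column(s), first: ", first(cols))
end
```

With 414 columns, a type that appears only once or twice in this summary would be the first thing to look at.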

I just got confused by the [90m parts in the screenshot. Those seem to appear in the first columns already, so this is probably the wrong lead for figuring out what is going on…

The [90m parts are due to incorrect handling of colors by the terminal. The data looks OK.
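As a small illustration of that point: `[90m` is what remains of an ANSI colour sequence (escape character, `[`, a numeric code, `m`) when a terminal prints it instead of interpreting it. Code 90 selects the "bright black" (gray) foreground. The sketch below uses only Base Julia; the `strip_ansi` helper and the sample string are made up for illustration and are not part of DataFrames.jl.

```julia
# ESC [ <numbers separated by ;> m is an SGR (colour/style) sequence.
const ANSI_SGR = r"\e\[[0-9;]*m"

# Remove all SGR sequences from a string, leaving only the visible text.
strip_ansi(s::AbstractString) = replace(s, ANSI_SGR => "")

# A type name wrapped in "gray foreground" (90) and "default foreground" (39).
raw = "\e[90mFloat32\e[39m"

println(repr(raw))        # shows the escape sequences explicitly
println(strip_ansi(raw))  # shows only the text a colour-aware terminal would display
```

So the underlying values are unaffected; only the display of the table header is garbled.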