Converting CSV to Parquet in Julia

It seems there may be some confusion between on-disk storage formats and Julia DataFrames, which live in memory. You can save an in-memory DataFrame in a number of different formats, such as Parquet or CSV, but to display the data you need to load it back into RAM, most likely as a DataFrame.
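Here is a minimal sketch of that distinction. It uses made-up example data and temporary file names, both hypothetical. One in-memory DataFrame is saved in two on-disk formats, CSV and Arrow (packages already used in this thread). Each file then has to be read back into a DataFrame before it can be displayed.

```julia
using CSV, DataFrames, Arrow

# An in-memory table: it lives in RAM and has no file format yet
df = DataFrame(name = ["a", "b", "c"], value = [1.5, 2.5, 3.5])

# Hypothetical output files in a fresh temporary directory
dir = mktempdir()
csv_path = joinpath(dir, "example.csv")
arrow_path = joinpath(dir, "example.arrow")

# Save the same in-memory table in two different on-disk formats
CSV.write(csv_path, df)
Arrow.write(arrow_path, df)

# To display either file, load it back into RAM as a DataFrame
from_csv = CSV.read(csv_path, DataFrame)
from_arrow = DataFrame(Arrow.Table(arrow_path))

println(from_csv)
println(from_arrow)
```

The on-disk format only affects how the bytes are stored. Once loaded, both files give back the same table in memory.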

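Another approach to converting a CSV file to Parquet is below. Note that Arrow.write writes the Arrow IPC format, not Parquet, even if the file name ends in .parquet. This version therefore uses a Parquet writer, Parquet2.jl:

using CSV, DataFrames, Parquet2

# Load the CSV into an in-memory DataFrame
df = CSV.read("FILE_PATH", DataFrame)

# Write the DataFrame to disk in Parquet format
Parquet2.writefile("newfile.parquet", df)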
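Arrow and Parquet are different on-disk formats, so a file can carry a .parquet extension without containing Parquet data. A quick way to check what was actually written is to look at the file's magic bytes. A Parquet file begins and ends with the four bytes PAR1. An Arrow IPC file, which is what Arrow.write should produce when given a path, begins and ends with ARROW1.

The sketch below uses only Base Julia. The helper names has_magic, looks_like_parquet and looks_like_arrow_file are hypothetical.

```julia
# Hypothetical helper: true if the file both begins and ends with `magic`
function has_magic(path::AbstractString, magic::AbstractVector{UInt8})
    n = length(magic)
    # Too small to hold the magic bytes at both ends
    filesize(path) >= 2n || return false
    open(path, "r") do io
        head = read(io, n)               # first n bytes
        seek(io, filesize(path) - n)     # jump to the last n bytes
        tail = read(io, n)
        return head == magic && tail == magic
    end
end

# Parquet files start and end with "PAR1"
looks_like_parquet(path) = has_magic(path, b"PAR1")

# Arrow IPC files start (before padding) and end with "ARROW1"
looks_like_arrow_file(path) = has_magic(path, b"ARROW1")
```

For example, looks_like_parquet("newfile.parquet") reports whether the file from the snippet above really holds Parquet data. The check only inspects the header and footer bytes, so it is a sanity check rather than full validation.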

With TidierFiles.jl there are a couple of ways to do it:

using TidierFiles
mtcars_path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
destination_path = "/path/to/output/test_car2.parquet"

# keeps the intermediate DataFrame in the variable `df`
df = read_csv(mtcars_path)
write_parquet(df, destination_path)

# no intermediate variable
write_parquet(read_csv(mtcars_path), destination_path)

Tidier re-exports @chain, so you can also do the following, again without binding an intermediate variable.

using Tidier
@chain begin
    read_csv(mtcars_path)
    write_parquet(_, destination_path)
end

Test your results:

read_csv(destination_path)      # errors: the file is Parquet, not CSV
read_parquet(destination_path)  # succeeds
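The read_csv/read_parquet check above only shows which reader accepts the file. A slightly stricter check writes with write_parquet, reads back with read_parquet, and compares the contents.

This sketch uses a small hypothetical table and a temporary path instead of the mtcars download, so it runs offline. It assumes read_parquet returns a DataFrame.

```julia
using DataFrames, TidierFiles

# Small stand-in table (hypothetical data, not the mtcars file)
df = DataFrame(model = ["car_a", "car_b", "car_c"],
               mpg = [21.0, 22.8, 18.7],
               cyl = [6, 4, 8])

# Hypothetical output location in a temporary directory
path = joinpath(mktempdir(), "roundtrip.parquet")

write_parquet(df, path)
back = read_parquet(path)

# Compare shape, column names, and values column by column
same_shape = size(back) == size(df)
same_names = names(back) == names(df)
same_values = same_names && all(back[!, c] == df[!, c] for c in names(df))

println((; same_shape, same_names, same_values))
```

Comparing values column by column tolerates differences in the concrete vector types a reader returns, as long as the elements are equal.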