Converting CSV to Parquet in Julia

xiaodai · March 17, 2021, 5:01am

You seem to be confusing the on-disk storage format with Julia DataFrames, which live in memory? You can save an in-memory DataFrame in a number of different formats, such as Parquet or CSV. But to display the data you need to load it into RAM, most likely as a DataFrame.
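
To make the disk-versus-memory distinction concrete, here is a minimal sketch using CSV.jl and DataFrames.jl. The column names, values, and the file name cars.csv are made up for illustration; the same pattern applies to any on-disk format.

```julia
using CSV, DataFrames

# An in-memory table (illustrative data, not from the original post)
df = DataFrame(name = ["Mazda RX4", "Datsun 710"], mpg = [21.0, 22.8])

# Persist it to disk; the file is just CSV text, not a DataFrame
path = joinpath(mktempdir(), "cars.csv")
CSV.write(path, df)

# To display or work with the data again, load it back into RAM
loaded = CSV.read(path, DataFrame)
show(first(loaded, 2))
```

The file on disk and the DataFrame in memory are separate things: the format only matters when writing or reading, and display always happens on the in-memory copy.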

Andre_G · August 10, 2024, 6:17pm

Another approach to converting a CSV file to Parquet would be like this.

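using CSV, DataFrames, Parquet2

# Read the CSV into an in-memory DataFrame
df = CSV.read("FILE_PATH", DataFrame)

# Arrow.write would produce an Arrow file regardless of the extension, so use a Parquet writer
Parquet2.writefile("newfile.parquet", df)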
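
Whichever writer you use, it is worth checking that the output really is Parquet, since the file extension alone does not determine the format. Parquet files start and end with the 4-byte marker PAR1, whereas Arrow IPC files start with ARROW1. A minimal sketch of such a check, using only Base functions; `looks_like_parquet` is a hypothetical helper name, not part of any package:

```julia
# Return true if the file begins and ends with the Parquet magic bytes "PAR1".
# This only inspects the markers; it does not validate the file contents.
function looks_like_parquet(path::AbstractString)
    n = filesize(path)
    n < 8 && return false              # too small to hold both markers
    open(path, "r") do io
        head = String(read(io, 4))     # first 4 bytes
        seek(io, n - 4)                # jump to the last 4 bytes
        tail = String(read(io, 4))
        return head == "PAR1" && tail == "PAR1"
    end
end
```

Calling `looks_like_parquet("newfile.parquet")` after writing gives a quick sanity check before handing the file to another tool.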

drizk1 · August 11, 2024, 1:08pm

With TidierFiles.jl there are a couple of ways you can do it:

using TidierFiles
mtcars_path = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
destination_path = "/path/to/output/test_car2.parquet"

# keeps the DataFrame in an intermediate variable
df = read_csv(mtcars_path)
write_parquet(df, destination_path)

# no intermediate variable
write_parquet(read_csv(mtcars_path), destination_path)

Tidier, for example, re-exports @chain, which lets you do the following as well, once again without keeping an intermediate variable.

using Tidier
@chain begin
    read_csv(mtcars_path)
    write_parquet(_, destination_path)
end

Test your results:

read_csv(destination_path)      # errors: the file is Parquet, not CSV
read_parquet(destination_path)  # succeeds
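
Beyond checking that the file opens, you can compare the data read back with what was written. The following is a rough sketch that uses only the TidierFiles functions shown above (`read_csv`, `write_parquet`, `read_parquet`) and assumes `read_csv` accepts a local path the same way it accepts a URL. The temporary file names and the tiny inline CSV are made up for illustration.

```julia
using TidierFiles, DataFrames

# Work in a temporary directory so nothing is left behind
dir = mktempdir()
csv_path = joinpath(dir, "cars.csv")          # hypothetical file names
parquet_path = joinpath(dir, "cars.parquet")

# A tiny CSV standing in for the real input
write(csv_path, "model,mpg,cyl\nMazda RX4,21.0,6\nDatsun 710,22.8,4\n")

# Convert, then read the Parquet file back
df = read_csv(csv_path)
write_parquet(df, parquet_path)
df_back = read_parquet(parquet_path)

# Compare shape and column names of the original and the round-tripped table
same_shape = size(df_back) == size(df)
same_names = names(df_back) == names(df)
println((same_shape, same_names))
```

Comparing shape and column names is a cheap first check; element types may differ slightly after a round trip (for example, string column types), so a strict equality test on the full tables can be more fragile.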
