oo92
March 16, 2021, 7:13pm
1
Hi.
I have a simple dataframe that I want to convert to parquet. This is my attempt:
begin
df = CSV.read("/home/onur/julia-assignment/temp.csv", DataFrame)
prq = Parquet.File(df)
end
But this is the error I’m getting:
MethodError: no method matching Parquet.File(::DataFrames.DataFrame)
Closest candidates are:
Parquet.File(::Any, !Matched::Any, !Matched::Any, !Matched::Any, !Matched::Any) at /home/onur/.julia/packages/Parquet/h8mm5/src/reader.jl:54
Parquet.File(!Matched::String, !Matched::IOStream, !Matched::Parquet.PAR2.FileMetaData, !Matched::Parquet.Schema, !Matched::Parquet.PageLRU) at /home/onur/.julia/packages/Parquet/h8mm5/src/reader.jl:54
Parquet.File(!Matched::AbstractString; map_logical_types) at /home/onur/.julia/packages/Parquet/h8mm5/src/reader.jl:61
How can I open a CSV file as Parquet?
Judging by the Parquet.jl README it should be
using CSV, Parquet
df = CSV.read("/home/onur/julia-assignment/temp.csv", DataFrame)
file = tempname() * ".parquet"
write_parquet(file, df)
or any other filename of your choice.
1 Like
oo92
March 16, 2021, 7:35pm
3
And how can I view the file in Pluto like I would with Pandas in Jupyter?
I am not sure I understand the question, sorry. Parquet, CSV, Arrow and so on are just storage formats. It is probably possible to work with the stored representation directly, but that is usually low-level work, and it is not how people normally work with data. Roughly speaking, the common way is to store data in one format or another, load it into memory, and transform it into a representation that is more suitable for data manipulation. After everything is done, you store it again in whatever format you need.
The representations that are convenient for this kind of manipulation are pandas in Python, data frames in R, and DataFrame in Julia. But you already did that in the first step, when you loaded the data from the CSV.
1 Like
oo92
March 16, 2021, 7:40pm
5
Can I view the file in Pluto? Is there a way to do that? That’s what I am curious about.
What is “viewing the file”? You can get its binary representation with read(file). Or you can load it with DataFrame(read_parquet(path)), but that should give you more or less the same DataFrame that you got from the CSV.read step.
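To make the round trip concrete, here is a minimal sketch. It assumes a Parquet.jl version that provides both write_parquet and read_parquet, and it uses a small made-up table in place of the CSV from the thread.

```julia
using DataFrames, Parquet

# Small stand-in table; in the thread this would come from CSV.read.
df = DataFrame(id = [1, 2, 3], name = ["a", "b", "c"])

# Write to a temporary path, then load the file back into a DataFrame.
file = tempname() * ".parquet"
write_parquet(file, df)
df2 = DataFrame(read_parquet(file))

# The file exists on disk and the reloaded table has the same shape.
println(isfile(file))
println(size(df2) == size(df))
```

In Pluto, putting df2 alone in a cell should display it the same way as df.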
oo92
March 16, 2021, 7:44pm
7
Can I recreate the CSV file as Parquet in my working directory?
oo92
March 16, 2021, 8:02pm
8
This is the error I got
MethodError: no method matching read_parquet(::String, ::DataFrames.DataFrame)
Just put the name of the DataFrame into a Pluto cell to view it in Pluto:
df
1 Like
oo92
March 16, 2021, 8:16pm
10
Yea but how can I confirm if the output of df is now Parquet and not CSV, as it used to be?
Viewing the DataFrame and writing to disk are completely separate topics.
CSV and Parquet are disk formats, DataFrames are in memory.
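If the goal is to confirm that a file on disk really is Parquet, one rough check is to look at its first four bytes: ordinary (unencrypted) Parquet files begin with the magic string "PAR1". The sketch below uses only Base Julia. is_parquet is a hypothetical helper name, not part of Parquet.jl, and the two demo files are hand-written stand-ins, not real data.

```julia
# Returns true if the file starts with the Parquet magic bytes "PAR1".
# `is_parquet` is a hypothetical helper, not a Parquet.jl function.
function is_parquet(path::AbstractString)
    isfile(path) || return false
    filesize(path) >= 4 || return false
    open(path, "r") do io
        String(read(io, 4)) == "PAR1"
    end
end

# Stand-in files: the first only carries the magic bytes and is NOT a
# valid Parquet file; the second is a tiny CSV.
dir = mktempdir()
fake_parquet = joinpath(dir, "a.parquet")
plain_csv = joinpath(dir, "a.csv")
write(fake_parquet, "PAR1....PAR1")
write(plain_csv, "x,y\n1,2\n")

println(is_parquet(fake_parquet))
println(is_parquet(plain_csv))
```

This only inspects the header, so it says nothing about whether the rest of the file is readable. Loading it with read_parquet is the real test.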
oo92
March 16, 2021, 8:22pm
12
Can I write this CSV file also as a Parquet file to my working directory? If so, how can I do that?
Have you tried the method described by @Skoffer ?
oo92
March 16, 2021, 8:25pm
14
Yea. I don’t see a parquet file in my current directory.
Just change the definition of file to
file = "/home/onur/julia-assignment/temp.parquet"
oo92
March 16, 2021, 8:33pm
16
Wait. Just changing the file extension automatically converts to parquet?
Obviously not. Changing the directory from /tmp (where tempname puts the file) to /home/onur/julia-assignment changes the location of the resulting file.
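A quick way to see the difference, using only Base Julia. The file name temp.parquet is just the one from this thread, and in Pluto pwd() may not be the directory you expect, so print it to check.

```julia
# tempname() returns a fresh path inside the system temp directory,
# which is why the file did not show up next to the CSV.
tmp_file = tempname() * ".parquet"
println(dirname(tmp_file))

# Building the path explicitly puts the file where you expect it.
# pwd() is the working directory of the current Julia session.
local_file = joinpath(pwd(), "temp.parquet")
println(local_file)
```

Passing local_file (or any other full path) to write_parquet decides where the file ends up. The extension is only a naming convention.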
oo92
March 16, 2021, 8:36pm
18
I get this
ArgumentError: "/home/onur/julia-assignment/temp.parquet" is not a valid file
using CSV, Parquet
df = CSV.read("/home/onur/julia-assignment/temp.csv", DataFrame)
file = "/home/onur/julia-assignment/temp.parquet"
write_parquet(file, df)
Which line exactly is giving you this error? Can you show the complete output?
1 Like
oo92
March 16, 2021, 8:39pm
20
Nvm. I messed up on this line. It was my mistake. Thank you very much.
1 Like