oo92  
                
               
                 
              
                  
                    March 16, 2021,  7:13pm
                   
                   
              1 
               
             
            
              Hi.
I have a simple dataframe that I want to convert to parquet. This is my attempt:
begin
	df = CSV.read("/home/onur/julia-assignment/temp.csv", DataFrame)
	prq = Parquet.File(df)
end
 
But this is the error I’m getting:
MethodError: no method matching Parquet.File(::DataFrames.DataFrame)
Closest candidates are:
Parquet.File(::Any, !Matched::Any, !Matched::Any, !Matched::Any, !Matched::Any) at /home/onur/.julia/packages/Parquet/h8mm5/src/reader.jl:54
Parquet.File(!Matched::String, !Matched::IOStream, !Matched::Parquet.PAR2.FileMetaData, !Matched::Parquet.Schema, !Matched::Parquet.PageLRU) at /home/onur/.julia/packages/Parquet/h8mm5/src/reader.jl:54
Parquet.File(!Matched::AbstractString; map_logical_types) at /home/onur/.julia/packages/Parquet/h8mm5/src/reader.jl:61
 
How can I open a CSV file as Parquet?
             
            
               
               
               
            
            
           
          
            
            
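              Judging by the Parquet.jl README, it should be:
using CSV, DataFrames, Parquet  # DataFrames provides the DataFrame sink type
df = CSV.read("/home/onur/julia-assignment/temp.csv", DataFrame)
file = tempname() * ".parquet"  # temporary path, not in the working directory
write_parquet(file, df)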
 
or any other filename of your choice.
             
            
               
               
            
            
           
          
            
              
                oo92  
                
               
              
                  
                    March 16, 2021,  7:35pm
                   
                   
              3 
               
             
            
              And how can I view the file in Pluto like I would with Pandas in Jupyter?
             
            
               
               
               
            
            
           
          
            
            
              I am not sure I understand the question, sorry. Parquet, CSV, Arrow, and so on are just storage formats. It is possible to work with the on-disk representation directly, but that is usually low-level work and not how people normally handle data. Roughly speaking, the common workflow is to store data in one format or another, load it into memory, and convert it to a representation better suited to manipulation. When you are done, you store it again in whatever format you need.
The representations that are convenient for manipulation are pandas DataFrames in Python, data frames in R, and DataFrame in Julia. You already did this in the first step, when you loaded the data from the CSV.
             
            
               
               
            
            
           
          
            
              
                oo92  
                
               
              
                  
                    March 16, 2021,  7:40pm
                   
                   
              5 
               
             
            
              Can I view the file in Pluto? Is there a way to do that? That’s what I am curious about.
             
            
               
               
               
            
            
           
          
            
            
              What is “viewing the file”? You can get its binary representation with read(file). Or you can load it with DataFrame(read_parquet(path)), but that should give you more or less the same DataFrame that you got from the CSV.read step.
             
            
               
               
               
            
            
           
          
            
              
                oo92  
                
               
              
                  
                    March 16, 2021,  7:44pm
                   
                   
              7 
               
             
            
              Can I recreate the CSV file as Parquet in my working directory?
             
            
               
               
               
            
            
           
          
            
              
                oo92  
                
               
              
                  
                    March 16, 2021,  8:02pm
                   
                   
              8 
               
             
            
              This is the error I got
MethodError: no method matching read_parquet(::String, ::DataFrames.DataFrame)
 
             
            
               
               
               
            
            
           
          
            
            
              Just put the name of the DataFrame into a Pluto cell to view it in Pluto:
df
 
             
            
               
               
            
            
           
          
            
              
                oo92  
                
               
              
                  
                    March 16, 2021,  8:16pm
                   
                   
              10 
               
             
            
              Yea but how can I confirm if the output of df is now Parquet and not CSV, as it used to be?
             
            
               
               
               
            
            
           
          
            
            
              Viewing the DataFrame and writing to disk are completely separate topics. 
CSV and Parquet are disk formats, DataFrames are in memory.
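
To make the distinction concrete, here is a minimal sketch of how one might check that a Parquet file really was written. It assumes a Parquet.jl version that exports write_parquet and read_parquet (the functions used in this thread); the small DataFrame, its values, and the temporary output path are made up for illustration and stand in for the result of CSV.read.

```julia
using DataFrames, Parquet

# Small in-memory table standing in for the result of CSV.read (made-up values).
df = DataFrame(id = [1, 2, 3], score = [0.5, 1.5, 2.5], name = ["a", "b", "c"])

# Hypothetical output location; use joinpath(pwd(), "temp.parquet")
# to write into the working directory instead.
path = joinpath(mktempdir(), "temp.parquet")
write_parquet(path, df)

# 1. The file exists on disk.
file_exists = isfile(path)

# 2. Parquet files begin with the 4-byte magic number "PAR1".
magic = open(io -> read(io, 4), path)
is_parquet = magic == b"PAR1"

# 3. Reading it back gives an ordinary in-memory DataFrame again.
df2 = DataFrame(read_parquet(path))

println((file_exists, is_parquet, size(df2)))
```

The df shown in a Pluto cell is neither CSV nor Parquet; only the file at path has a format, which is why the check looks at the file and not at df.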
             
            
               
               
               
            
            
           
          
            
              
                oo92  
                
               
              
                  
                    March 16, 2021,  8:22pm
                   
                   
              12 
               
             
            
              Can I write this CSV file also as a Parquet file to my working directory? If so, how can I do that?
             
            
               
               
               
            
            
           
          
            
            
              Have you tried the method described by @Skoffer?
             
            
               
               
               
            
            
           
          
            
              
                oo92  
                
               
              
                  
                    March 16, 2021,  8:25pm
                   
                   
              14 
               
             
            
              Yea. I don’t see a parquet file in my current directory.
             
            
               
               
               
            
            
           
          
            
            
              Just change the file definition to:
file = "/home/onur/julia-assignment/temp.parquet"
 
             
            
               
               
               
            
            
           
          
            
              
                oo92  
                
               
              
                  
                    March 16, 2021,  8:33pm
                   
                   
              16 
               
             
            
              Wait. Just changing the file extension automatically converts to parquet?
             
            
               
               
               
            
            
           
          
            
            
              Obviously not. Changing the directory from /tmp (where tempname puts the file) to /home/onur/julia-assignment only changes the location of the resulting file. The conversion to Parquet is done by write_parquet.
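
As a rough illustration of the difference between the two paths, using only Base functions (the file name temp.parquet is just an example, and neither line creates a file):

```julia
# tempname() only generates a path; by default it lies under the system temp directory.
tmp_file = tempname() * ".parquet"

# An explicit path in the current working directory instead.
local_file = joinpath(pwd(), "temp.parquet")

println(dirname(tmp_file))    # the temp directory
println(dirname(local_file))  # the working directory
```

Passing either path to write_parquet decides where the file ends up; the extension is only a naming convention.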
             
            
               
               
               
            
            
           
          
            
              
                oo92  
                
               
              
                  
                    March 16, 2021,  8:36pm
                   
                   
              18 
               
             
            
              
I get this
ArgumentError: "/home/onur/julia-assignment/temp.parquet" is not a valid file
 
             
            
               
               
               
            
            
           
          
            
            
              using CSV, DataFrames, Parquet  # DataFrames provides the DataFrame sink type
df = CSV.read("/home/onur/julia-assignment/temp.csv", DataFrame)
file = "/home/onur/julia-assignment/temp.parquet"  # output path for the Parquet file
write_parquet(file, df)
 
Which line exactly is giving you this error? Can you show the complete output?
             
            
               
               
            
            
           
          
            
              
                oo92  
                
               
              
                  
                    March 16, 2021,  8:39pm
                   
                   
              20 
               
             
            
              
Nvm. I messed up on this line. It was my mistake. Thank you very much.
             
            
               
               