I am using Julia (v0.6.4) to load CSV files. These files can be found at this link:
Julia is unable to load them (train.csv and test.csv). I tried the CSV package's CSV.read method.
In Python, pandas loads them in no time, and everything works perfectly.
There appears to be a serious issue with parsing/loading these files in Julia.
Has anyone tried this successfully, or does anyone have an answer or tip to make it work?
Thanks.
favba
August 2, 2018, 4:36pm
#2
I just googled, and it seems there is a Pandas wrapper for Julia: https://github.com/JuliaPy/Pandas.jl
Maybe that will work for you?
Works for me on some simple test data.
julia> using CSV
julia> d = CSV.read("benchmarkdata.csv", header=false)
10×3 DataFrames.DataFrame
│ Row │ Column1 │ Column2             │ Column3  │
├─────┼─────────┼─────────────────────┼──────────┤
│ 1   │ c       │ iteration_pi_sum    │ 27.369   │
│ 2   │ c       │ matrix_multiply     │ 72.068   │
│ 3   │ c       │ matrix_statistics   │ 4.52399  │
│ 4   │ c       │ parse_integers      │ 0.099092 │
│ 5   │ c       │ print_to_file       │ 9.93013  │
│ 6   │ c       │ recursion_fibonacci │ 0.022726 │
│ 7   │ c       │ recursion_quicksort │ 0.258923 │
│ 8   │ c       │ userfunc_mandelbrot │ 0.07669  │
│ 9   │ fortran │ iteration_pi_sum    │ 27.3692  │
│ 10  │ fortran │ matrix_multiply     │ 83.5437  │
julia> versioninfo()
Julia Version 0.6.4
Commit 9d11f62bcb (2018-07-09 19:09 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge MAX_THREADS=16)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, sandybridge)
gibson@sophist$ cat benchmarkdata.csv
c,iteration_pi_sum,27.369022
c,matrix_multiply,72.067976
c,matrix_statistics,4.523993
c,parse_integers,0.099092
c,print_to_file,9.930134
c,recursion_fibonacci,0.022726
c,recursion_quicksort,0.258923
c,userfunc_mandelbrot,0.07669
fortran,iteration_pi_sum,27.369179
fortran,matrix_multiply,83.543703
You should post exactly what you tried in Julia and the resulting error message or incorrect output. Use triple backticks to quote the code blocks.
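Along with the code and error message, it can help to report the rough shape of the file. Here is a minimal sketch using only Base Julia. Note that csv_shape is a hypothetical helper name (not part of CSV.jl), and it splits the header naively on commas, so quoted fields that contain commas would be miscounted.

```julia
# Hypothetical helper (not part of CSV.jl): summarize a CSV file's shape with Base only.
# Assumption: fields are comma-separated and the header has no quoted commas.
function csv_shape(path)
    header = open(readline, path)        # read only the first line
    ncols = length(split(header, ','))   # naive column count from the header
    nlines = countlines(path)            # total lines, header included
    return (nlines, ncols)
end
```

Calling csv_shape("train.csv") and posting the result next to versioninfo() gives readers a quick idea of whether the file is unusually long or wide.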
You can also try CSVFiles.jl. It uses a different parser under the hood, so if you are lucky, it might be able to deal with those files. The syntax would be:
using CSVFiles, DataFrames
df = load("foo.csv") |> DataFrame
Update 2018-Feb-19: added R feather and Pandas; thanks to @zhangliye for the pandas code
For Julia, JLD.jl has the fastest write-solution and I have used it via the ultra-convenient FileIO.jl. However, for interop with other packages, the slightly slower Feather.jl is also a good choice. It is also arguable that you read data more often than you write it, so Feather.jl's superior read-speed will be essential. However, R's feather is faster than Julia's.
The read and write speed seem to scale …
Check out the post for "inspiration". I often find that using R's data.table's fread is the fastest.
Thanks. This works. However, when I try the same file with CSV.jl or CSVFiles.jl, it fails to load. The files can be downloaded from the Kaggle competition website (https://www.kaggle.com/c/santander-value-prediction-challenge/data).
Most CSV files load fine with CSV.jl or CSVFiles.jl. In this case I am referring to specific files from the Kaggle competition site, which neither CSV.jl nor CSVFiles.jl can load. The files can be downloaded from the Kaggle competition website (https://www.kaggle.com/c/santander-value-prediction-challenge/data).
This clearly means there is a bug in CSV.jl or CSVFiles.jl.
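One way to back this up in a bug report is to attach a smaller file that still fails. The following is only a sketch using Base Julia. Here head_lines is a hypothetical helper name, and it trims rows only, so it will not help if the problem comes from the number of columns.

```julia
# Hypothetical helper: copy the first `n` lines of `src` into `dst`
# to build a smaller test file for a bug report.
function head_lines(src, dst, n)
    open(dst, "w") do out
        for (i, line) in enumerate(eachline(src))
            i > n && break        # stop once n lines have been copied
            println(out, line)    # eachline strips the newline, so add it back
        end
    end
    return dst
end
```

For example, head_lines("train.csv", "train_small.csv", 100) writes the header plus 99 data rows. If the small file loads and the full one does not, that narrows down where the parser gives up.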