I am using Julia (v0.6.4) to load CSV files. These files can be found at this link:
Julia is unable to load them (train.csv and test.csv). I tried the CSV package, using CSV.read.
In Python, pandas loads them in no time and everything works perfectly.
There appears to be a serious issue with parsing/loading these files in Julia.
Has anyone tried and succeeded, or does anyone have an answer or tip to make this work?
Thanks.
favba
August 2, 2018, 4:36pm
2
I just googled, and it seems there is a Pandas wrapper for Julia: https://github.com/JuliaPy/Pandas.jl
Maybe that will work for you?
Works for me on some simple test data.
julia> using CSV
julia> d = CSV.read("benchmarkdata.csv", header=false)
10×3 DataFrames.DataFrame
│ Row │ Column1 │ Column2             │ Column3  │
├─────┼─────────┼─────────────────────┼──────────┤
│ 1   │ c       │ iteration_pi_sum    │ 27.369   │
│ 2   │ c       │ matrix_multiply     │ 72.068   │
│ 3   │ c       │ matrix_statistics   │ 4.52399  │
│ 4   │ c       │ parse_integers      │ 0.099092 │
│ 5   │ c       │ print_to_file       │ 9.93013  │
│ 6   │ c       │ recursion_fibonacci │ 0.022726 │
│ 7   │ c       │ recursion_quicksort │ 0.258923 │
│ 8   │ c       │ userfunc_mandelbrot │ 0.07669  │
│ 9   │ fortran │ iteration_pi_sum    │ 27.3692  │
│ 10  │ fortran │ matrix_multiply     │ 83.5437  │
julia> versioninfo()
Julia Version 0.6.4
Commit 9d11f62bcb (2018-07-09 19:09 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge MAX_THREADS=16)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, sandybridge)
gibson@sophist$ cat benchmarkdata.csv
c,iteration_pi_sum,27.369022
c,matrix_multiply,72.067976
c,matrix_statistics,4.523993
c,parse_integers,0.099092
c,print_to_file,9.930134
c,recursion_fibonacci,0.022726
c,recursion_quicksort,0.258923
c,userfunc_mandelbrot,0.07669
fortran,iteration_pi_sum,27.369179
fortran,matrix_multiply,83.543703
You should post exactly what you tried in Julia and the resulting error message or incorrect output. Use triple backticks to quote the code blocks.
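To make such a report more useful, it can help to include the basic shape of the file alongside the error. Below is a minimal sketch that uses only Base functions; `csv_shape` is a hypothetical helper name, not part of CSV.jl or CSVFiles.jl, and it splits naively on the delimiter, so it assumes no quoted fields contain commas.

```julia
# Hypothetical helper (not part of any package): report the number of
# non-blank lines and the smallest/largest field count per line.
# Assumption: fields are not quoted, so a plain split on `delim` is enough.
function csv_shape(path; delim=',')
    nrows = 0
    ncols_min = typemax(Int)
    ncols_max = 0
    for line in eachline(path)
        isempty(line) && continue        # skip blank lines
        n = length(split(line, delim))   # naive field count for this line
        ncols_min = min(ncols_min, n)
        ncols_max = max(ncols_max, n)
        nrows += 1
    end
    # An empty file has no meaningful column counts.
    nrows == 0 && return (0, 0, 0)
    return (nrows, ncols_min, ncols_max)
end
```

For the benchmarkdata.csv shown above, `csv_shape("benchmarkdata.csv")` would return `(10, 3, 3)`. If the minimum and maximum differ, the file has ragged rows; if the column count is very large, that is worth mentioning in the report too.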
You can also try CSVFiles.jl. It uses a different parser under the hood, so if you are lucky, it might be able to deal with those files. The syntax would be:
using CSVFiles, DataFrames
df = load("foo.csv") |> DataFrame
Update 2018-Feb-19: added R feather and Pandas; thanks to @zhangliye for the pandas code
For Julia, JLD.jl has the fastest write-solution and I have used it via the ultra-convenient FileIO.jl. However for interop with other packages, the slightly slower Feather.jl is also a good choice, also it may be arguable that you read data more often than you write, so Feather.jl's superior read-speed will be essential. However, R's feather is faster than Julia's.
The read and write speed seem to scale …
Check out the post for "inspiration", I often find that using R's data.table's fread is the fastest.
Thanks. This works. But if I try the same file with CSV.jl or CSVFiles.jl, it fails to load. These files can be downloaded from the Kaggle competition website (Santander Value Prediction Challenge | Kaggle).
Normally, most CSV files load fine with CSV.jl or CSVFiles.jl. In this case I am referring to specific files from the Kaggle competition site that CSV.jl and CSVFiles.jl fail to load. These files can be downloaded from the Kaggle competition website (Santander Value Prediction Challenge | Kaggle).
This clearly means there is a bug in CSV.jl or CSVFiles.jl.