I am using Julia (v0.6.4) to load csv files. These files can be found at this link:
Julia is unable to load them (train.csv and test.csv). I tried to use the CSV package with method CSV.read.
In Python, pandas loads them in no time and everything works perfectly.
It appears there is a serious issue in parsing/loading of these files in Julia.
Has anyone tried this and succeeded, or does anyone have an answer or tip to make this work?
I just googled, and it seems there is a Pandas wrapper for Julia:
Maybe that will work for you?
Works for me on some simple test data.
```julia
julia> using CSV
julia> d = CSV.read("benchmarkdata.csv", header=false)
│ Row │ Column1 │ Column2 │ Column3 │
│ 1 │ c │ iteration_pi_sum │ 27.369 │
│ 2 │ c │ matrix_multiply │ 72.068 │
│ 3 │ c │ matrix_statistics │ 4.52399 │
│ 4 │ c │ parse_integers │ 0.099092 │
│ 5 │ c │ print_to_file │ 9.93013 │
│ 6 │ c │ recursion_fibonacci │ 0.022726 │
│ 7 │ c │ recursion_quicksort │ 0.258923 │
│ 8 │ c │ userfunc_mandelbrot │ 0.07669 │
│ 9 │ fortran │ iteration_pi_sum │ 27.3692 │
│ 10 │ fortran │ matrix_multiply │ 83.5437 │
```
Julia Version 0.6.4
Commit 9d11f62bcb (2018-07-09 19:09 UTC)
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge MAX_THREADS=16)
LLVM: libLLVM-3.9.1 (ORCJIT, sandybridge)
gibson@sophist$ cat benchmarkdata.csv
You should post exactly what you tried in Julia and the resulting error message or incorrect output. Use triple backticks to quote the code blocks.
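To make that concrete, here is a minimal sketch of one way to capture the exact error text so it can be pasted into a post. The helper name `try_load` is hypothetical (it is not part of CSV.jl or any package), and the example uses `readlines` from Base as a stand-in loader so that it runs without extra packages. Swapping in `CSV.read` and the real `train.csv` path is the assumed use.

```julia
# Hypothetical helper (not from any package): run a loader on a path and
# capture any error as text, so the exact message can be quoted in a post.
function try_load(loader, path)
    try
        return (:ok, loader(path))
    catch err
        # sprint(showerror, err) renders the error message as a String
        return (:error, sprint(showerror, err))
    end
end

# Stand-in loader from Base on a file that does not exist;
# replace `readlines` with `CSV.read` and the path with "train.csv" for the real test.
status, result = try_load(readlines, "no_such_file.csv")
println(status)
println(result)
```

Pasting the printed message together with the exact call that produced it is usually enough for others to reproduce the problem.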
You can also try CSVFiles.jl. It uses a different parser under the hood, so if you are lucky, it might be able to deal with those files. The syntax would be:
```julia
using CSVFiles, DataFrames
df = load("foo.csv") |> DataFrame
```
Update 2018-Feb-19: added R feather and Pandas; thanks to
@zhangliye for the pandas code
For Julia, JLD.jl has the fastest write-solution and I have used it via the ultra-convenient FileIO.jl. However for interop with other packages, the slightly slower Feather.jl is also a good choice, also it may be arguable that you read data more often than you write, so Feather.jl’s superior read-speed will be essential. However, R’s feather is faster than Julia’s.
The read and write speed seem to scale …
Check out the post for “inspiration”. I often find that using `fread` from R’s data.table is the fastest.
Thanks. This works. But when I try to load the same file with CSV.jl or CSVFiles.jl, it fails. These files can be downloaded from the Kaggle competition website (
Normally, most CSV files load fine with CSV.jl or CSVFiles.jl. In this case I am referring to specific files from the Kaggle competition site which CSV.jl and CSVFiles.jl fail to load. These files can be downloaded from the Kaggle competition website (
This clearly means there is a bug in CSV.jl or CSVFiles.jl.
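To help narrow down where parsing breaks, a quick check of the raw file can be useful. The sketch below is only a rough diagnostic under stated assumptions: the function name `irregular_lines` is made up for illustration, it uses nothing beyond Base, and it splits naively on commas, so it will misreport lines that contain quoted commas or embedded newlines. The actual cause of the failure with the Kaggle files is not known from this thread.

```julia
# Hypothetical diagnostic (Base only): report lines whose field count differs
# from the first line's. ASSUMPTION: fields never contain quoted delimiters.
function irregular_lines(path; delim=',', maxreport=10)
    bad = Tuple{Int,Int}[]   # (line number, field count)
    expected = 0             # field count of the first line
    open(path) do io
        for (n, line) in enumerate(eachline(io))
            nf = length(split(line, delim))
            if n == 1
                expected = nf
            elseif nf != expected
                push!(bad, (n, nf))
                length(bad) >= maxreport && break
            end
        end
    end
    return expected, bad
end

# Small made-up file to show the idea; use the path to train.csv instead.
path = tempname()
open(path, "w") do io
    write(io, "a,b,c\n1,2,3\n4,5\n6,7,8\n")
end
expected, bad = irregular_lines(path)
println("expected fields: ", expected)
println("irregular lines: ", bad)
```

If this reports irregular lines, posting a few of them along with the error message makes it much easier to tell whether the file or the parser is at fault.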