Julia is unable to load CSV files from the Kaggle competition

I am using Julia (v0.6.4) to load CSV files. These files can be found at this link:

Julia is unable to load them (train.csv and test.csv). I tried to use the CSV package with method CSV.read.

In Python, pandas loads them in no time, and everything works perfectly.

It appears there is a serious issue in parsing/loading of these files in Julia.

Has anyone tried and succeeded, or does anyone have an answer or tip to make this work?

Thanks.

I just googled, and it seems there is a Pandas wrapper for Julia: https://github.com/JuliaPy/Pandas.jl
Maybe that will work for you?

Works for me on some simple test data.

julia> using CSV

julia> d = CSV.read("benchmarkdata.csv", header=false)
10×3 DataFrames.DataFrame
│ Row │ Column1 │ Column2             │ Column3  │
├─────┼─────────┼─────────────────────┼──────────┤
│ 1   │ c       │ iteration_pi_sum    │ 27.369   │
│ 2   │ c       │ matrix_multiply     │ 72.068   │
│ 3   │ c       │ matrix_statistics   │ 4.52399  │
│ 4   │ c       │ parse_integers      │ 0.099092 │
│ 5   │ c       │ print_to_file       │ 9.93013  │
│ 6   │ c       │ recursion_fibonacci │ 0.022726 │
│ 7   │ c       │ recursion_quicksort │ 0.258923 │
│ 8   │ c       │ userfunc_mandelbrot │ 0.07669  │
│ 9   │ fortran │ iteration_pi_sum    │ 27.3692  │
│ 10  │ fortran │ matrix_multiply     │ 83.5437  │

julia> versioninfo()
Julia Version 0.6.4
Commit 9d11f62bcb (2018-07-09 19:09 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge MAX_THREADS=16)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, sandybridge)

gibson@sophist$ cat benchmarkdata.csv 
c,iteration_pi_sum,27.369022
c,matrix_multiply,72.067976
c,matrix_statistics,4.523993
c,parse_integers,0.099092
c,print_to_file,9.930134
c,recursion_fibonacci,0.022726
c,recursion_quicksort,0.258923
c,userfunc_mandelbrot,0.07669
fortran,iteration_pi_sum,27.369179
fortran,matrix_multiply,83.543703

You should post exactly what you tried in Julia and the resulting error message or incorrect output. Use triple backticks to quote the code blocks.
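For example, a minimal sketch of how the failure could be captured for posting, assuming the file is saved as `train.csv` in the working directory (the path is a placeholder) and using the one-argument `CSV.read` form shown earlier in this thread. Including the output of `versioninfo()` alongside it also helps.

```julia
using CSV

# Hypothetical path: point this at the downloaded Kaggle file.
path = "train.csv"

try
    # One-argument form, as used earlier in this thread (Julia 0.6-era CSV.jl).
    df = CSV.read(path)
    println(size(df))
catch err
    # Print the exception message so it can be pasted into a reply.
    println(sprint(showerror, err))
end
```

Whatever this prints, either the table size or the error text, is the part worth quoting in the post.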

You can also try CSVFiles.jl. It uses a different parser under the hood, so if you are lucky, it might be able to deal with those files. The syntax would be:

using CSVFiles, DataFrames

df = load("foo.csv") |> DataFrame

Check out the post for “inspiration”; I often find that using R’s data.table’s fread is the fastest.

Thanks. This works. But if I try to load the same file using CSV.jl or CSVFiles.jl, it fails. These files can be downloaded from the Kaggle competition website (Santander Value Prediction Challenge | Kaggle).

Normally, most CSV files load fine with CSV.jl or CSVFiles.jl. In this case I am referring to specific files from the Kaggle competition site, which neither CSV.jl nor CSVFiles.jl manages to load.
This clearly means there is a bug in CSV.jl or CSVFiles.jl.
