Julia is unable to load CSV files from the Kaggle competition

Altaf_Sultanji · August 2, 2018, 4:34pm

I am using Julia (v0.6.4) to load csv files. These files can be found at this link:

Julia is unable to load them (train.csv and test.csv). I tried to use the CSV package with method CSV.read.

In Python, pandas loads them in no time and everything works perfectly.

It appears there is a serious issue in parsing/loading of these files in Julia.

Has anyone tried and succeeded, or does anyone have an answer or tip to make this work?

Thanks.

favba · August 2, 2018, 4:36pm

I just googled, and it seems there is a pandas wrapper for Julia: https://github.com/JuliaPy/Pandas.jl
Maybe that will work for you?

John_Gibson · August 2, 2018, 5:45pm

Works for me on some simple test data.

julia> using CSV

julia> d = CSV.read("benchmarkdata.csv", header=false)
10×3 DataFrames.DataFrame
│ Row │ Column1 │ Column2             │ Column3  │
├─────┼─────────┼─────────────────────┼──────────┤
│ 1   │ c       │ iteration_pi_sum    │ 27.369   │
│ 2   │ c       │ matrix_multiply     │ 72.068   │
│ 3   │ c       │ matrix_statistics   │ 4.52399  │
│ 4   │ c       │ parse_integers      │ 0.099092 │
│ 5   │ c       │ print_to_file       │ 9.93013  │
│ 6   │ c       │ recursion_fibonacci │ 0.022726 │
│ 7   │ c       │ recursion_quicksort │ 0.258923 │
│ 8   │ c       │ userfunc_mandelbrot │ 0.07669  │
│ 9   │ fortran │ iteration_pi_sum    │ 27.3692  │
│ 10  │ fortran │ matrix_multiply     │ 83.5437  │

julia> versioninfo()
Julia Version 0.6.4
Commit 9d11f62bcb (2018-07-09 19:09 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge MAX_THREADS=16)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, sandybridge)

gibson@sophist$ cat benchmarkdata.csv 
c,iteration_pi_sum,27.369022
c,matrix_multiply,72.067976
c,matrix_statistics,4.523993
c,parse_integers,0.099092
c,print_to_file,9.930134
c,recursion_fibonacci,0.022726
c,recursion_quicksort,0.258923
c,userfunc_mandelbrot,0.07669
fortran,iteration_pi_sum,27.369179
fortran,matrix_multiply,83.543703

You should post exactly what you tried in Julia and the resulting error message or incorrect output. Use triple backticks to quote the code blocks.
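As a rough sketch of the kind of detail that helps in such a report, the snippet below collects the Julia version, file size, line count, and column count using only Base functions. The helper name `csv_shape` and the small temporary file standing in for train.csv are assumptions for illustration; the column count is a naive split on the delimiter and does not handle quoted fields.

```julia
# Hypothetical helper (not part of CSV.jl): summarize a CSV file's shape
# using only Base functions, so the numbers can be pasted into a report.
function csv_shape(path; delim=',')
    header = open(readline, path)          # read the first line only
    ncols = length(split(header, delim))   # naive split: ignores quoted delimiters
    nlines = countlines(path)              # includes the header line
    return (filesize(path), nlines, ncols)
end

# Demo on a small temporary file standing in for the real data.
path, io = mktemp()
write(io, "ID,target,f1\na,1.0,0\nb,2.0,3\n")
close(io)

bytes, nlines, ncols = csv_shape(path)
println("Julia ", VERSION, ": ", bytes, " bytes, ", nlines, " lines, ", ncols, " columns")
```

Pointing `csv_shape` at the actual file would show how large and how wide it is, which is often relevant when a parser seems to hang rather than fail outright.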

davidanthoff · August 2, 2018, 5:55pm

You can also try to use CSVFiles.jl, it uses a different parser under the hood, so if you are lucky, it might be able to deal with those files. Syntax would be:

using CSVFiles, DataFrames

df = load("foo.csv") |> DataFrame
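To compare several parsers on the same file and keep the error text from each, something along these lines may help. This is only a sketch: `try_loaders` is a hypothetical helper, and the two loaders in the demo are stand-ins; in practice they would wrap calls such as `CSV.read` or the `load` call above.

```julia
# Hypothetical helper: try each named loader on `path` and record the outcome.
function try_loaders(path, loaders)
    results = Dict{String,String}()
    for (name, loader) in loaders
        try
            loader(path)
            results[name] = "ok"
        catch err
            # Keep the error message as text so it can be pasted into a post.
            results[name] = sprint(showerror, err)
        end
    end
    return results
end

# Stand-in loaders for illustration only.
loaders = [
    ("counts_lines", p -> countlines(p)),
    ("always_fails", p -> error("parser gave up")),
]

# Small temporary file standing in for the real data.
path, io = mktemp()
write(io, "a,b\n1,2\n")
close(io)

results = try_loaders(path, loaders)
for (name, outcome) in results
    println(name, " => ", outcome)
end
```

The recorded messages are exactly what a maintainer would need to see in order to tell a parser bug from a slow load or a malformed file.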

xiaodai · August 3, 2018, 5:25am

Check out the post for “inspiration”. I often find that fread from R’s data.table is the fastest.

Altaf_Sultanji · August 3, 2018, 3:47pm

Thanks, this works. But if I try to load the same file with CSV.jl or CSVFiles.jl, it fails. These files can be downloaded from the Kaggle competition website (Santander Value Prediction Challenge | Kaggle).

Altaf_Sultanji · August 3, 2018, 3:47pm

Normally, most CSV files load fine with CSV.jl or CSVFiles.jl. In this case I am referring to specific files from the Kaggle competition site that neither CSV.jl nor CSVFiles.jl can load. These files can be downloaded from the Kaggle competition website (Santander Value Prediction Challenge | Kaggle).
To me, this clearly means there is a bug in CSV.jl or CSVFiles.jl.
