Simple benchmark for CSV.jl v0.9.11 in Julia 1.6.4

KZiemian · November 26, 2021, 11:01pm

I am trying to learn some data science in Julia from Julia Academy’s Julia for Data Science course, and a simple benchmark from it surprised me. I made its code below more compact, and it probably loads more packages than it needs, simply because I don’t know which ones are needed.

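```julia
using BenchmarkTools   # @btime
using DataFrames       # DataFrame sink for CSV.read
using DelimitedFiles   # readdlm
using CSV              # CSV.read
using XLSX             # not used below
using Downloads        # Downloads.download

# Download the data; `download` returns the local path of the saved file.
P = Downloads.download("https://raw.githubusercontent.com/nassarhuda/easy_data/master/programming_languages.csv",
    "programming_languages.csv")

# Time parsing the same file with each reader.
@btime P, H = readdlm("programming_languages.csv", ','; header=true);
@btime C = CSV.read("programming_languages.csv", DataFrame);
```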

I get results like these:

125.375 μs (325 allocations: 51.19 KiB)
229.006 μs (428 allocations: 40.95 KiB)

The benchmarks shown in the Jupyter notebook are below.

87.708 μs (325 allocations: 51.19 KiB)
35.417 μs (227 allocations: 22.02 KiB)
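The two calls do not produce the same kind of result, which may account for part of the gap: `readdlm` returns an untyped matrix plus a header row, while `CSV.read` infers an element type for each column and builds a `DataFrame`. A minimal sketch, assuming a small in-memory sample whose column names and rows (`SAMPLE`) are made up for illustration rather than taken from the downloaded file:

```julia
using DelimitedFiles, CSV, DataFrames

# Hypothetical two-column sample, kept in memory so the example runs offline.
const SAMPLE = """
year,language
1951,Regional Assembly Language
1952,Autocode
1954,IPL
"""

# readdlm: one heterogeneous matrix (numbers and strings mixed) plus the header row.
data, header = readdlm(IOBuffer(SAMPLE), ','; header=true)
println(typeof(data))   # an Any matrix, since the columns mix numbers and strings
println(header)         # the 1×2 header row

# CSV.read: detects one element type per column and builds a DataFrame.
df = CSV.read(IOBuffer(SAMPLE), DataFrame)
println(eltype.(eachcol(df)))   # one concrete element type per column
```

The exact string column type depends on the CSV.jl version (recent versions may use inline string types), so only the general shape of the output is described here.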

I have CSV.jl v0.9.11 and here is my machine info.

Julia Version 1.6.4
Commit 35f0c911f4 (2021-11-19 03:54 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, ivybridge)
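For anyone comparing numbers across machines, the same information can be collected from Julia itself. A sketch, assuming the package of interest is in the active environment; `installed_version` is a hypothetical helper built on `Pkg.dependencies()` (`Pkg.status("CSV")` prints similar information interactively):

```julia
using InteractiveUtils   # versioninfo (already loaded in the REPL)
import Pkg

# Julia version, OS, CPU and LLVM details, as in the listing above.
versioninfo()

# Hypothetical helper: version of an installed package, or `nothing` if absent.
function installed_version(name::AbstractString)
    for (_, info) in Pkg.dependencies()
        info.name == name && return info.version
    end
    return nothing
end

println("CSV.jl: ", something(installed_version("CSV"), "not installed"))
```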

goerch · November 26, 2021, 11:24pm

Hm, I’d assume CSV.jl is optimized for use cases slightly different from this one (and of course it seems to be able to handle quite a few degenerate inputs).
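One way to probe this guess is to time both readers on files of different sizes, so that any fixed per-call setup cost is spread over more rows. A sketch under stated assumptions: `synthetic_csv` is a hypothetical helper, the row counts are arbitrary, and no particular outcome is implied; only the measurements would tell which reader wins at each size.

```julia
using BenchmarkTools, CSV, DataFrames, DelimitedFiles

# Hypothetical helper: write `n` rows of (Int, String) pairs plus a header
# to a temporary file and return its path.
function synthetic_csv(n::Integer)
    path, io = mktemp()
    println(io, "year,language")
    for i in 1:n
        println(io, 1950 + i % 70, ",lang", i)
    end
    close(io)
    return path
end

for n in (100, 100_000)
    path = synthetic_csv(n)
    println("rows = ", n)
    # `$path` interpolates the loop variable so its lookup is not timed.
    @btime readdlm($path, ','; header=true)
    @btime CSV.read($path, DataFrame)
end
```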
