Failing to import (relatively) large CSV file with Julia and VSC

My final suggestion:

First step: Convert the .csv file to .arrow format using the convert.jl script:

using CSV, Arrow

FILENAME_FULL = "20240110_120secMother_AllCountries_002_T-Results_2022_059_Markup001(full).csv"
OUT_FILE = "20240110_120secMother_AllCountries_002_T-Results_2022_059_Markup001(full).arrow"

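# The file has no header row; parse every column as Float32 and write it out as Arrow
Arrow.write(OUT_FILE, CSV.File(FILENAME_FULL; header=false, types=Float32))
nothing  # suppress the return value in the REPL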

Second step: Read the .arrow file and convert it to an array (if that is what you need):

using Arrow, Tables

IN_FILE = "20240110_120secMother_AllCountries_002_T-Results_2022_059_Markup001(full).arrow"

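m = nothing   # drop any previously loaded matrix
GC.gc(true)   # full garbage collection, so the old matrix is freed before loading again
m = Tables.matrix(Arrow.Table(IN_FILE))   # copy the Arrow columns into a dense Matrix{Float32}
println("Size of matrix variable: $(Base.summarysize(m)/1e9) GB")
nothing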
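Before running this on the full file, it may be worth checking the round trip on a tiny file. This is only a sketch: the temporary file names and the 3×4 test data are made up for illustration, and it assumes the CSV, Arrow and Tables packages are installed.

```julia
using CSV, Arrow, Tables

# Hypothetical small test files in a temporary directory (not the real data)
dir = mktempdir()
csv_path = joinpath(dir, "small.csv")
arrow_path = joinpath(dir, "small.arrow")

# Write a headerless 3×4 numeric CSV, one matrix row per line
data = Float32[1 2 3 4; 5 6 7 8; 9 10 11 12]
open(csv_path, "w") do io
    for row in eachrow(data)
        println(io, join(row, ","))
    end
end

# Same two steps as above: CSV -> Arrow, then Arrow -> Matrix
Arrow.write(arrow_path, CSV.File(csv_path; header=false, types=Float32))
m = Tables.matrix(Arrow.Table(arrow_path))

println(size(m), " ", eltype(m))
```

If the matrix that comes back equals `data` and has element type Float32, the same options should behave the same way on the large file.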

The first script takes 45 seconds on my PC (Ryzen 7950X); the second takes 4.5 seconds.

@rocco_sprmnt21 used a matrix of 40000x4000 elements, whereas we have:

julia> m
39360×39360 Matrix{Float32}

which is about ten times as many elements.
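As a rough check of those numbers, the element count and the memory needed for a dense Float32 matrix can be computed directly. The variable names here are just for illustration.

```julia
rows, cols = 39360, 39360

n_elements = rows * cols                   # number of matrix entries
bytes = n_elements * sizeof(Float32)       # 4 bytes per Float32
gb = bytes / 1e9                           # size in GB (10^9 bytes)

# Compare with the 40000x4000 matrix mentioned above
ratio = n_elements / (40000 * 4000)

println("elements: $n_elements, size: $(round(gb; digits=2)) GB, ratio: $(round(ratio; digits=2))")
```

That is roughly 6.2 GB for the matrix alone, so the machine needs comfortably more RAM than that to load it.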
