I’ve done a fair amount of work to get my CSV cleaned up (pipe-delimited, no quotes) and to get the data types and column names right and loaded separately. I also consulted the out-of-core section of the JuliaDB docs, and I found the TrueFX Jupyter notebook, which was a lifesaver.
Nonetheless, I’m getting numerous deep errors when trying to call loadndsparse:
From worker 9: unknown function (ip: 0x7ff61615e300)
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: unknown function (ip: 0x7ff622af5d88)
From worker 9: unknown function (ip: 0x7ff622af6344)
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #table#71 at /home/user/.julia/packages/IndexedTables/5U0Ap/src/indexedtable.jl:137
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #table at ./none:0
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #table#72 at /home/user/.julia/packages/IndexedTables/5U0Ap/src/indexedtable.jl:140
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #table at ./none:0
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #convert#86 at /home/user/.julia/packages/IndexedTables/5U0Ap/src/indexedtable.jl:388
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #convert at ./none:0
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #ndsparse#103 at /home/user/.julia/packages/IndexedTables/5U0Ap/src/ndsparse.jl:99
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #ndsparse at ./none:0
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #ndsparse#102 at /home/user/.julia/packages/IndexedTables/5U0Ap/src/ndsparse.jl:65
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #ndsparse at ./none:0
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #ndsparse#106 at /home/user/.julia/packages/IndexedTables/5U0Ap/src/ndsparse.jl:112
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #ndsparse at ./none:0
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #ndsparse#107 at /home/user/.julia/packages/IndexedTables/5U0Ap/src/ndsparse.jl:116
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #ndsparse at ./none:0
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #convert#120 at /home/user/.julia/packages/IndexedTables/5U0Ap/src/ndsparse.jl:314
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #convert at ./none:0
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #_loadtable_serial#3 at /home/user/.julia/packages/JuliaDB/jDAlJ/src/util.jl:183
From worker 9: unknown function (ip: 0x7ff6100605d8)
From worker 9: #_loadtable_serial at ./none:0 [inlined]
From worker 9: #190 at /home/user/.julia/packages/JuliaDB/jDAlJ/src/io.jl:131
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: do_task at /home/user/.julia/packages/Dagger/sdZXi/src/scheduler.jl:259
From worker 9: unknown function (ip: 0x7ff610056175)
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: jl_f__apply at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: #112 at /builddir/build/BUILD/julia/build/usr/share/julia/stdlib/v1.2/Distributed/src/process_messages.jl:292
From worker 9: run_work_thunk at /builddir/build/BUILD/julia/build/usr/share/julia/stdlib/v1.2/Distributed/src/process_messages.jl:79
From worker 9: macro expansion at /builddir/build/BUILD/julia/build/usr/share/julia/stdlib/v1.2/Distributed/src/process_messages.jl:292 [inlined]
From worker 9: #111 at ./task.jl:268
From worker 9: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 9: unknown function (ip: 0x7ff622b10c12)
From worker 9: unknown function (ip: 0xffffffffffffffff)
Here are some errors I managed to grab while attempting to use loadtable instead of loadndsparse:
From worker 10: unknown function (ip: 0x7f52aba86c61)
From worker 10: unknown function (ip: 0x7f52aba871d2)
From worker 10: unknown function (ip: 0x7f52aba87300)
From worker 10: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 10: unknown function (ip: 0x7f52b841ed88)
From worker 10: unknown function (ip: 0x7f52b841f344)
From worker 10: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 10: do_task at /home/user/.julia/packages/Dagger/sdZXi/src/scheduler.jl:260
From worker 10: unknown function (ip: 0x7f52a597f0c5)
From worker 10: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 10: jl_f__apply at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 10: #112 at /builddir/build/BUILD/julia/build/usr/share/julia/stdlib/v1.2/Distributed/src/process_messages.jl:292
From worker 10: run_work_thunk at /builddir/build/BUILD/julia/build/usr/share/julia/stdlib/v1.2/Distributed/src/process_messages.jl:79
From worker 10: macro expansion at /builddir/build/BUILD/julia/build/usr/share/julia/stdlib/v1.2/Distributed/src/process_messages.jl:292 [inlined]
From worker 10: #111 at ./task.jl:268
From worker 10: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 10: unknown function (ip: 0x7f52b8439c12)
From worker 10: unknown function (ip: 0xffffffffffffffff)
I tried running this without loading anything from the CSV package and after running pkg update. Here are some of the errors:
From worker 2: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 2: unknown function (ip: 0x7f60e83d299f)
From worker 2: unknown function (ip: 0x7f60e83d3e3f)
From worker 2: unknown function (ip: 0x7f60e83d74a7)
From worker 2: unknown function (ip: 0x7f60e83d775d)
From worker 2: unknown function (ip: 0x7f60e83d9169)
From worker 2: unknown function (ip: 0x7f60e83dc1a0)
From worker 2: unknown function (ip: 0x7f60e83e16ae)
From worker 2: unknown function (ip: 0x7f60e845068a)
From worker 2: unknown function (ip: 0x7f60e8451c61)
From worker 2: unknown function (ip: 0x7f60e84521d2)
From worker 2: unknown function (ip: 0x7f60e8452300)
From worker 2: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 2: unknown function (ip: 0x7f60f4de9d88)
From worker 2: unknown function (ip: 0x7f60f4dea344)
From worker 2: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 2: #_loadtable_serial#3 at /home/user/.julia/packages/JuliaDB/jDAlJ/src/util.jl:178
From worker 2: unknown function (ip: 0x7f60e237f108)
From worker 2: #_loadtable_serial at ./none:0 [inlined]
From worker 2: #190 at /home/user/.julia/packages/JuliaDB/jDAlJ/src/io.jl:131
From worker 2: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 2: do_task at /home/user/.julia/packages/Dagger/sdZXi/src/scheduler.jl:259
From worker 2: unknown function (ip: 0x7f60e2322505)
From worker 2: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 2: jl_f__apply at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 2: #112 at /builddir/build/BUILD/julia/build/usr/share/julia/stdlib/v1.2/Distributed/src/process_messages.jl:292
From worker 2: run_work_thunk at /builddir/build/BUILD/julia/build/usr/share/julia/stdlib/v1.2/Distributed/src/process_messages.jl:79
From worker 2: macro expansion at /builddir/build/BUILD/julia/build/usr/share/julia/stdlib/v1.2/Distributed/src/process_messages.jl:292 [inlined]
From worker 2: #111 at ./task.jl:268
From worker 2: jl_apply_generic at /usr/bin/../lib64/libjulia.so.1 (unknown line)
From worker 2: unknown function (ip: 0x7f60f4e04c12)
From worker 2: unknown function (ip: 0xffffffffffffffff)
I also struggled to make JuliaDB work. AFAIK, further development is not prioritised at this stage. Also, JuliaDB uses TextParse.jl as the CSV loading engine, not CSV.jl, but I don’t think that knowledge helps in this case.
There is no ability to read chunk by chunk, but Feather files are just like CSVs in that they live on disk, and loading is lazy. So it might have some advantages, but I’m not sure…
If you need to process column by column, you might want to check out JDF.jl
You can load only a few columns at a time for analysis like this
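A minimal sketch of column-selective loading with JDF.jl, assuming a recent JDF.jl release that exposes JDF.save and JDF.load with a cols keyword (older releases used differently named functions). The table, column names and path below are made-up placeholders, not taken from the data discussed above.

```julia
using DataFrames, JDF

# Hypothetical example data; the column names are placeholders.
df = DataFrame(id = 1:5,
               price = [1.1, 1.2, 1.3, 1.4, 1.5],
               symbol = ["A", "B", "C", "D", "E"])

# Write the table to disk once. JDF stores each column separately,
# which is what makes reading a subset of columns cheap.
dir = joinpath(mktempdir(), "example.jdf")
JDF.save(dir, df)

# Later, read back only the columns needed for the analysis.
# Assumption: the `cols` keyword accepts a vector of column names.
subset = DataFrame(JDF.load(dir; cols = ["id", "price"]))
```

Because only the requested columns are read from disk, this can keep memory use down when the full table does not fit in RAM, although each selected column still has to fit on its own.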