JuliaDB: tutorials for large datasets, and other questions

Dear fellow Julians,

I’ve been learning to use JuliaDB to wrangle large datasets (a couple of million entries), and I don’t think I quite understand the difference between in-memory and out-of-core processing.

My first question: let’s say I load a table from a .csv with loadtable(), providing a file path for output. This should load the table even if it’s larger than my laptop’s memory. But what if I then merge/select/transform/join? Is the output of those operations still on disk, or in memory?
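
For concreteness, here’s roughly the workflow I mean (the file paths, column names, and chunk count are just placeholders; I’m not sure `chunks` is even needed here):

```julia
using JuliaDB

# Load a CSV that may not fit in RAM; as I understand it, `output`
# writes binary chunks to disk instead of keeping the whole table in memory.
t = loadtable("data/big.csv"; output = "data/big_bin", chunks = 8)

# Are the results of operations like these still disk-backed, or in memory?
s = select(t, (:id, :value))
j = join(t, s; lkey = :id, rkey = :id)
```
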
And what if I save the table in binary format and load it in another session: does it go into memory, or does it stay out of core?
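
That is, a round trip like this (paths made up):

```julia
using JuliaDB

t = loadtable("data/big.csv"; output = "data/big_bin")
save(t, "data/big_saved")        # write in JuliaDB's binary format

# --- later, in a fresh Julia session ---
using JuliaDB
t2 = load("data/big_saved")      # does this pull everything into memory?
```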

Second question: do you have to convert an IndexedTable to a DIndexedTable to be able to distribute the wrangling? Is there any way to load a .csv directly into a DIndexedTable? I’m trying to join two IndexedTables and it either takes forever or my PC simply runs out of memory, even though my original loading of the data specified an output file path.
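
Here’s a sketch of what I think the distributed setup should look like; the worker count, paths, and key column are placeholders, and part of my question is whether this is even the right approach:

```julia
using Distributed
addprocs(4)                       # worker count chosen arbitrarily
@everywhere using JuliaDB

# Does loading with workers available (and `chunks`) give a DIndexedTable directly?
dt = loadtable("data/big.csv"; output = "data/big_bin", chunks = 4)

# Or is an explicit conversion from an in-memory table the intended route?
t   = loadtable("data/small.csv")
dt2 = distribute(t, 4)            # split into 4 chunks across the workers

# This is the kind of join that takes forever / runs out of memory for me:
j = join(dt, dt2; lkey = :id, rkey = :id)
```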

Third question: is there any tutorial that uses very large datasets? The tutorial with the flights dataset doesn’t help me a lot, because I’m using the same functions it explains and the performance on my large dataset is somewhat disappointing.

I hope my questions aren’t too ambiguous. Thanks in advance for your help!
