SAS is big in the corporate world. I work in the finance industry, where SAS is still widely used. To be honest, using SAS is a pain: it can’t even syntax-highlight its own language properly. But it’s still there because it has one trick – disk-based data manipulation and associated algorithms. Ten yea…

Looking forward to your blog post. I represent the proportion of people who have never heard of mmap before.

Thanks!! I thought JuliaDB was for connection to databases. Didn’t realise it had persistent data storage capability. Looks very close to what I need. Will do the research.

Also, there will soon be OnlineStats integration in JuliaDB (https://github.com/JuliaComputing/JuliaDB.jl/pull/75), which would help with building algorithms on top of it. Take a look at SparseRegression, for example.

It’s a bit tricky, as there’s a “JuliaDB” organisation for connecting to databases, and then there’s the unrelated “JuliaDB.jl” package…

@xiaodai, Julia has some amazing tools for big data. One example is the ability to do lazy transformations of large arrays. For example, let’s imagine you have a 10 TB 4D array stored as an NRRD file, and you want to take the square root of each element and swap dimensions 3 and 4. This could easily ta…

I hope to learn more about these and to introduce them to the masses. It’s not something that I’ve seen before, and the syntax looks a bit different from the type of programming I am used to, e.g. R’s data.frame and data.table.

It’s also worth mentioning packages wrapping SQL engines, like SQLite. I know SAS users often rely on proc sql because it’s faster than the standard data step, so that should make sense to them. Of course that requires writing SQL instructions. I think @davidanthoff has also been working on a SQL b…
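To make the SQL-engine idea concrete, here is a minimal sketch of letting SQLite do a grouped aggregation on a file-backed table. It is written in Python with the standard-library `sqlite3` module purely for illustration; the thread itself is about Julia wrappers, and the table name `txn`, its columns, and the helper `total_by_group` are hypothetical.

```python
import os
import sqlite3
import tempfile


def total_by_group(db_path, rows):
    """Load (account, amount) rows into an on-disk table and sum per account.

    The table `txn` and its columns are made up for this example.
    """
    conn = sqlite3.connect(db_path)  # file-backed database, not ":memory:"
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS txn (account TEXT, amount REAL)")
        # executemany accepts any iterable, so rows can be streamed in
        conn.executemany("INSERT INTO txn VALUES (?, ?)", rows)
        conn.commit()
        cur = conn.execute(
            "SELECT account, SUM(amount) FROM txn GROUP BY account ORDER BY account"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite")
        print(total_by_group(path, [("a", 1.0), ("b", 2.5), ("a", 3.0)]))
```

The aggregation runs inside the SQL engine, so the calling process only ever holds the result rows rather than the full table.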

I don’t think it’s been mentioned in the thread, but the term you’re looking for is out-of-core. External memory algorithm: In computing, external memory algorithms or out-of-core algorithms are algorithms that are designed to process data that are too large to fit into a computer's main …
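As a rough illustration of what an out-of-core computation looks like, the sketch below computes a mean and variance over a binary file while holding only one chunk in memory at a time. It is in Python rather than Julia, and it is not taken from any package mentioned in the thread; the file layout (raw native-endian float64 values) and the helper names `write_doubles` and `streaming_mean_var` are assumptions made for the example.

```python
import array
import os
import tempfile


def write_doubles(path, values):
    """Write an iterable of numbers to `path` as raw native-endian float64."""
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)


def streaming_mean_var(path, chunk_items=4096):
    """Return (count, mean, population variance) of the float64 values in `path`.

    At most `chunk_items` values are held in memory at once.
    """
    itemsize = array.array("d").itemsize
    count, mean, m2 = 0, 0.0, 0.0
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_items * itemsize)
            if not raw:
                break
            chunk = array.array("d")
            chunk.frombytes(raw)
            for x in chunk:
                # Welford's online update: one pass, no second read of the data
                count += 1
                delta = x - mean
                mean += delta / count
                m2 += delta * (x - mean)
    variance = m2 / count if count else float("nan")
    return count, mean, variance


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "values.bin")
        write_doubles(path, range(10_000))
        print(streaming_mean_var(path, chunk_items=512))
```

The chunk size trades memory for the number of read calls; the result does not depend on it, because the running statistics carry over between chunks.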

I did a short writeup here: https://tpapp.github.io/post/large-ragged-dataset-julia/ (it does not go into much detail, but the libraries I made public are much better documented). Hope you find this useful. FWIW, once data is ingested into a binary format and mmapped, I find that I can process a 100 G…
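For readers who have not met mmap before, here is a minimal sketch of the "binary format, then mmap" pattern. It uses Python's standard `mmap` module for illustration only and is not the code from the linked writeup; the flat float64 file layout and the helper names `write_doubles` and `mmap_sum` are assumptions.

```python
import array
import mmap
import os
import tempfile


def write_doubles(path, values):
    """Ingest step: store numbers as raw native-endian float64."""
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)


def mmap_sum(path):
    """Sum all float64 values in `path` through a read-only memory map.

    The operating system pages data in on demand, so the file can be
    larger than RAM. Assumes a non-empty file whose size is a multiple of 8.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Views must be released before the map is closed,
            # hence the nested context managers.
            with memoryview(mm) as raw, raw.cast("d") as view:
                return sum(view)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "values.bin")
        write_doubles(path, range(1000))
        print(mmap_sum(path))
```

The mapped view behaves like an ordinary array of floats (it supports indexing and slicing), which is what makes this approach convenient once the data is in a fixed binary layout.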

Disk based data manipulation framework needed

