Julia import large out-of-memory csv data



I am a new Julia user and a PhD in Finance, and I would like to use Julia in my future financial research. I have been using R for most of my research projects.

I was wondering if Julia has a package similar to sparklyr in R that can handle large out-of-memory data. My data is 20 GB in CSV format, and my RAM is 16 GB.

I installed the HPAT package in Julia, but I am not sure whether it helps with big data. I also noticed that there is a Spark package in Julia. Does it have any function that lets me import local data, like the spark_read_csv function in sparklyr?

By the way, I want to know if I can turn on the bracket and quote autocomplete option in JuliaPro.

Also, I am not sure if I can use LaTeX in Julia, since it is an important tool for academic publication.

Finally, I noticed that Wharton Research Data Services (the main data vendor for financial scholars) does not have any information on accessing its data via Julia. I have asked them to write a tutorial on this, and hopefully it will be done before the end of July.


You might want to check out JuliaDB (https://github.com/JuliaComputing/JuliaDB.jl). It’s very new, but it’s being actively worked on by some of the core Julia devs, so it has a pretty promising future.

Also, can you explain what you mean by using LaTeX in Julia? You can type various mathematical symbols using their LaTeX abbreviations: \alpha<tab> gives α, \in<tab> gives ∈, and so on. This works in the Julia REPL, inside JuliaPro, in Juno, in Sublime with the Julia language package, and probably many other environments. Is that what you mean?


I second @rdeits’s suggestion for JuliaDB. If that does not work though, you can try reading and parsing line by line, storing in a compact format in memory. This is what I am currently doing for a 500GB dataset.
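To make the line-by-line idea concrete, here is a minimal sketch using only Base functions. It is not my actual code: `column_mean` is a hypothetical helper, and it assumes a recent Julia (1.x), a header row, comma separators, and no quoted fields that contain commas. Only one line and a running sum are held in memory at a time, so the file size does not matter.

```julia
# Stream a CSV file line by line and compute the mean of one numeric column.
# `column_mean` is a hypothetical helper name used for illustration only.
# Assumptions: first line is a header, fields are comma separated,
# and no field contains a quoted comma.
function column_mean(path::AbstractString, col::Integer)
    total = 0.0
    n = 0
    # eachline reads lazily; Iterators.drop skips the header row.
    for line in Iterators.drop(eachline(path), 1)
        isempty(line) && continue          # ignore blank lines
        fields = split(line, ',')
        total += parse(Float64, fields[col])
        n += 1
    end
    return n == 0 ? NaN : total / n
end
```

You would call it as `column_mean("data.csv", 2)`, where the file name and column index are placeholders. For real data with quoted fields or missing values, a proper CSV parser is the safer choice; the point here is only that the loop never loads the whole file.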

Regarding LaTeX, if you mean exporting result tables (e.g. regressions) like in R, I am not aware of any package. It could be a nice beginner project.
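As a rough idea of what such a project might start from, here is a small sketch that turns a header and a numeric matrix into a LaTeX `tabular`. The function name `latex_table` and its `digits` keyword are made up for illustration, it assumes Julia 1.x, and it does not escape special characters such as `_` or `%` in the header.

```julia
# Render a header and a numeric matrix as a LaTeX tabular environment.
# `latex_table` is a hypothetical helper, not part of any existing package.
function latex_table(header::Vector{String}, rows::Matrix{Float64}; digits::Int = 3)
    ncols = length(header)
    size(rows, 2) == ncols ||
        throw(ArgumentError("header and rows disagree on column count"))
    lines = String[]
    push!(lines, "\\begin{tabular}{" * repeat("r", ncols) * "}")
    push!(lines, "\\hline")
    push!(lines, join(header, " & ") * " \\\\")
    push!(lines, "\\hline")
    for i in 1:size(rows, 1)
        # Round each cell and join the row with LaTeX column separators.
        cells = [string(round(rows[i, j]; digits = digits)) for j in 1:ncols]
        push!(lines, join(cells, " & ") * " \\\\")
    end
    push!(lines, "\\hline")
    push!(lines, "\\end{tabular}")
    return join(lines, "\n")
end
```

Calling `print(latex_table(["beta", "se"], [1.2346 0.1; 2.0 0.25]))` prints text that can be pasted into a .tex file. A real package would also need to handle significance stars, standard errors in parentheses, and escaping, which is where most of the work would be.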


If it’s plotting with LaTeX annotation you’re asking about then that is also possible.


I’m also not entirely sure what you meant by this but perhaps Weave.jl is what you’re looking for.


You can use LaTeX in your plots via [LaTeXStrings](https://github.com/stevengj/LaTeXStrings.jl). See here for an example (about a quarter of the way down the page).


I mean that I need to convert tables and regression results in Julia to LaTeX code and copy that code into TeX files, like what xtable and stargazer do in R.




Thanks. This is exactly what I am looking for.


Just out of curiosity, what do the Spark and HPAT packages do? They seem to be designed for handling big data, so do they have any function that can import large out-of-memory data?


One more question, do you know how to turn on the bracket and quote autocomplete option in Julia Juno Atom?


I am using the most recent version of JuliaPro, but I could not install the JuliaDB package. I got the error message below:

JuliaDB can’t be installed because it has no versions that support 0.5.2 of julia. You may need to update METADATA by running Pkg.update()

How do I resolve this issue?


The REQUIRE file lists Julia 0.6-, so I think you’d have to update to JuliaPro 0.6, if that’s possible.

The documentation also states that “JuliaDB works on Julia 0.6 or higher.”