I am a new Julia user and a PhD in Finance, and I would like to use Julia in my future financial research. I have been using R for most of my research projects.
I was wondering if Julia has a package similar to sparklyr in R that can handle large, out-of-memory data. My data is 20 GB in CSV format, and my RAM is 16 GB.
I installed the HPAT package in Julia, but I am not sure whether it helps with big data. In addition, I noticed that there is a Spark package in Julia. Does it have a function that lets me import local data, like the spark_read_csv function in sparklyr?
By the way, I want to know if I can turn on the bracket and quote autocomplete option in JuliaPro.
Also, I am not sure if I can use LaTeX in Julia, since it is an important tool for academic publication.
Finally, I noticed that Wharton Research Data Services (the main data vendor for financial scholars) does not have any information on accessing its data via Julia. I have asked them to write a tutorial on this, and hopefully it will be done before the end of July.
You might want to check out JuliaDB: https://github.com/JuliaComputing/JuliaDB.jl. It’s very new, but it’s being actively worked on by some of the core Julia devs, so it has a pretty promising future.
Also, can you explain what you mean by using LaTeX in Julia? You can type various mathematical symbols using their LaTeX abbreviations with \alpha<tab> to get α and \in<tab> for ∈ and so on. This works in the Julia REPL, inside JuliaPro, in Juno, in Sublime with the Julia language package, and probably many other environments. Is that what you mean?
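To illustrate the tab completion described above, here is a tiny sketch. The variable names are made up for the example; the symbols are entered by typing the LaTeX abbreviation and pressing tab.

```julia
α = 0.05              # typed as \alpha<tab>
xs = [1, 2, 3]
is_member = 2 ∈ xs    # typed as \in<tab>; same as in(2, xs)
```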
I second @rdeits’s suggestion for JuliaDB. If that does not work though, you can try reading and parsing line by line, storing in a compact format in memory. This is what I am currently doing for a 500GB dataset.
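A minimal sketch of the line-by-line approach, using only Base Julia. It assumes a header row, comma separators, and purely numeric, unquoted fields; the function name `column_means` and the file layout are hypothetical, and a real dataset will likely need more careful parsing.

```julia
# Stream a CSV file one line at a time and keep only running sums in memory.
# Assumptions (hypothetical layout): first line is a header, fields are
# comma-separated, unquoted, and all numeric.
function column_means(path::AbstractString)
    open(path) do io
        header = split(readline(io), ',')
        sums = zeros(Float64, length(header))
        n = 0
        for line in eachline(io)
            isempty(line) && continue          # skip blank lines
            fields = split(line, ',')
            for (j, f) in enumerate(fields)
                sums[j] += parse(Float64, f)
            end
            n += 1
        end
        return Dict(String(h) => s / n for (h, s) in zip(header, sums))
    end
end

# Small demonstration on a temporary file.
path, io = mktemp()
write(io, "ret,size\n0.01,100\n0.03,300\n")
close(io)
means = column_means(path)
```

Only the running sums and the current line are held in memory, so the file size does not matter much. The same loop could instead write each parsed row into a more compact binary format.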
Regarding LaTeX, if you mean exporting result tables (e.g., regressions) like in R, I am not aware of any package. It could be a nice beginner project.
I mean that I need to transform tables and regression results in Julia into LaTeX code and copy the code into TeX files, like what xtable and stargazer do in R.
Just out of curiosity, what do the Spark and HPAT packages do? They seem to be designed for handling big data, so do they have any function that can import large out-of-memory data?
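For a sense of what such an export involves, here is a rough sketch in Base Julia (1.x syntax) that turns column names and a numeric matrix into a LaTeX `tabular`. The function `latex_table` is hypothetical and not part of any package; it does no escaping of special characters and nothing like the significance stars or standard-error layout that stargazer produces.

```julia
# Build a LaTeX tabular from column names and a numeric matrix.
# Hypothetical helper: no escaping of special characters such as _ or %.
function latex_table(colnames::Vector{String}, rows::Matrix{Float64}; digits::Int=3)
    lines = String[]
    push!(lines, "\\begin{tabular}{" * "r"^length(colnames) * "}")
    push!(lines, "\\hline")
    push!(lines, join(colnames, " & ") * " \\\\")
    push!(lines, "\\hline")
    for i in 1:size(rows, 1)
        # Round each cell and join the row with column separators.
        cells = [string(round(rows[i, j]; digits=digits)) for j in 1:size(rows, 2)]
        push!(lines, join(cells, " & ") * " \\\\")
    end
    push!(lines, "\\hline")
    push!(lines, "\\end{tabular}")
    return join(lines, "\n")
end

# Made-up numbers, purely for illustration.
tex = latex_table(["beta", "se"], [0.5123 0.1; 1.25 0.2042])
println(tex)
```

The returned string can be pasted into a TeX file or written out with `write("table.tex", tex)`.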