Julia import large out-of-memory csv data

Yifan_Liu · June 29, 2017, 4:42am

I am a new Julia user and a PhD in Finance, and I would like to use Julia in my future financial research. I have been using R for most of my research projects.

I was wondering if Julia has a package similar to sparklyr in R that can handle large out-of-memory data. My data is a 20 GB CSV file, and I have 16 GB of RAM.

I installed the HPAT package in Julia, but I am not sure whether it helps with handling big data. In addition, I noticed that there is a Spark package for Julia. Does it have a function that lets me import local data, like the spark_read_csv function in sparklyr?

By the way, is there a way to turn on bracket and quote autocompletion in JuliaPro?

Also, I am not sure whether I can use LaTeX with Julia; LaTeX is an important tool for academic publication.

Finally, I noticed that Wharton Research Data Services (the main data vendor for financial scholars) does not provide any information on accessing its data via Julia. I have asked them to make a tutorial on this, and hopefully it will be done before the end of July.

rdeits · June 29, 2017, 4:55am

You might want to check out JuliaDB: https://github.com/JuliaComputing/JuliaDB.jl It’s very new, but it’s being actively worked on by some of the core Julia devs, so it has a pretty promising future.

Also, can you explain what you mean by using LaTeX in Julia? You can type various mathematical symbols using their LaTeX abbreviations with \alpha<tab> to get α and \in<tab> for ∈ and so on. This works in the Julia REPL, inside JuliaPro, in Juno, in Sublime with the Julia language package, and probably many other environments. Is that what you mean?
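Once tab completion has inserted a character such as α or ∈, it is ordinary Julia syntax. A minimal sketch, assuming Julia 1.x; the variable names and values are illustrative only:

```julia
# After typing \alpha<TAB> and \in<TAB>, the editor inserts the Unicode
# characters themselves; Julia treats them as normal identifiers/operators.
α = 0.05                       # significance level (illustrative value)
pvalues = [0.01, 0.20, 0.04]   # hypothetical p-values
significant = [p for p in pvalues if p < α]
has_020 = 0.20 ∈ pvalues       # `∈` is the same function as `in`
println(significant, " ", has_020)
```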

Tamas_Papp · June 29, 2017, 6:20am

I second @rdeits’s suggestion of JuliaDB. If that does not work, though, you can try reading and parsing the file line by line, storing the data in a compact format in memory. This is what I am currently doing for a 500 GB dataset.
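The line-by-line approach can be sketched in plain Julia. This is a minimal sketch, assuming Julia 1.x and a simple comma-separated file with a header row and no quoted or missing fields; the three-column layout (integer date, integer firm id, return) and the function name `read_compact` are hypothetical. Only the parsed values are kept, in narrow typed vectors (`Int32`, `Float32`), so the raw text of each line can be discarded as soon as it has been parsed.

```julia
# Minimal sketch (assumes Julia 1.x; simple CSV: header row, no quotes, no
# missing values). Column layout and names are hypothetical.

"""
    read_compact(io::IO)

Read a CSV stream one line at a time, keeping only the parsed values in
narrow typed vectors instead of the raw text.
"""
function read_compact(io::IO)
    dates = Int32[]    # date stored as yyyymmdd, e.g. 20170629
    ids   = Int32[]    # hypothetical firm identifier
    rets  = Float32[]  # hypothetical return; Float32 uses half the bytes of Float64
    readline(io)       # skip the header row
    for line in eachline(io)
        isempty(line) && continue          # tolerate blank lines
        fields = split(line, ',')
        push!(dates, parse(Int32, fields[1]))
        push!(ids,   parse(Int32, fields[2]))
        push!(rets,  parse(Float32, fields[3]))
    end
    return (date = dates, id = ids, ret = rets)
end

# Convenience method: open the file, stream it, and close it afterwards.
read_compact(path::AbstractString) = open(read_compact, path)
```

For example, `data = read_compact("returns.csv")` (a hypothetical file name) would return a named tuple of vectors. Whether the result fits in RAM depends on how many rows and columns you keep; narrower element types and dropping unneeded columns are the main levers.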

Regarding LaTeX, if you mean exporting result tables (e.g., regressions) like in R, I am not aware of any package for that. It could be a nice beginner project.

Daneel · June 29, 2017, 7:10am

If it’s plotting with LaTeX annotations you’re asking about, then that is also possible.

ValdarT · June 29, 2017, 8:30am

I’m also not entirely sure what you meant by this but perhaps Weave.jl is what you’re looking for.

stillyslalom · June 29, 2017, 8:54am

You can use LaTeX in your plots via [LaTeXStrings](https://github.com/stevengj/LaTeXStrings.jl). See here for an example (about a quarter of the way down the page).

Yifan_Liu · June 29, 2017, 10:36am

I mean that I need to convert tables and regression results in Julia to LaTeX code and copy that code into .tex files, like what xtable and stargazer do in R.
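The basic idea of turning a table of numbers into LaTeX can be sketched with Base Julia alone. This is a minimal sketch, assuming Julia 1.x and the Printf standard library; `latex_table` is a hypothetical helper, not an existing package API, and it does none of the model handling that stargazer does. It only formats a matrix of already-computed numbers (for instance coefficients and standard errors) as a `tabular` environment.

```julia
# Minimal sketch (assumes Julia 1.x and only the Printf standard library).
# `latex_table` is a hypothetical helper, not an existing package API.
using Printf

"""
    latex_table(rownames, colnames, values)

Return a LaTeX `tabular` environment (as a String) with one row per
entry of `rownames` and numbers printed with three decimals.
"""
function latex_table(rownames, colnames, values::AbstractMatrix)
    nr, nc = size(values)
    length(rownames) == nr || throw(ArgumentError("need one name per row"))
    length(colnames) == nc || throw(ArgumentError("need one name per column"))
    io = IOBuffer()
    println(io, "\\begin{tabular}{l", "r"^nc, "}")   # label column + numeric columns
    println(io, "\\hline")
    println(io, " & ", join(colnames, " & "), " \\\\")
    println(io, "\\hline")
    for i in 1:nr
        cells = [@sprintf("%.3f", values[i, j]) for j in 1:nc]
        println(io, rownames[i], " & ", join(cells, " & "), " \\\\")
    end
    println(io, "\\hline")
    print(io, "\\end{tabular}")
    return String(take!(io))
end

# Illustrative numbers only, not real regression output.
tex = latex_table(["(Intercept)", "beta"], ["Coef.", "Std. Error"],
                  [0.5 0.1; 1.25 0.05])
println(tex)
```

The returned string can be pasted into a document or saved with `write("table.tex", tex)` (hypothetical file name) and pulled in with `\input{table}`.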

Yifan_Liu · June 29, 2017, 1:36pm

Thanks. This is exactly what I am looking for.

Yifan_Liu · June 29, 2017, 1:50pm

Just out of curiosity, what do the Spark and HPAT packages do, then? They seem to be designed for handling big data, so do they have any functions that can import large out-of-memory data?

Yifan_Liu · June 29, 2017, 2:00pm

One more question: do you know how to turn on bracket and quote autocompletion in Juno (Atom)?

Yifan_Liu · June 29, 2017, 2:20pm

I am using the most recent version of JuliaPro, but I could not install the JuliaDB package. I got the following error message:

JuliaDB can’t be installed because it has no versions that support 0.5.2 of julia. You may need to update METADATA by running Pkg.update()

How do I resolve this issue?

Daneel · June 29, 2017, 2:43pm

The REQUIRE file lists Julia 0.6-, so I think you’d have to update to JuliaPro 0.6, if that’s possible.

https://github.com/JuliaComputing/JuliaDB.jl/blob/master/REQUIRE

The documentation also states that “JuliaDB works on Julia 0.6 or higher.”
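You can check which Julia version is actually running from inside Julia via the built-in `VERSION` constant. A minimal sketch, assuming Julia 1.x syntax; `meets_minimum` is a hypothetical helper, and its default bound `v"0.6-"` mirrors the `0.6-` in JuliaDB's REQUIRE file (the trailing `-` also admits 0.6 pre-releases).

```julia
# Minimal sketch (assumes Julia 1.x syntax). `meets_minimum` is a
# hypothetical helper; `VERSION` is the version of the running Julia.
"""
    meets_minimum(v = VERSION; required = v"0.6-")

Return `true` if `v` is at least `required`.
"""
meets_minimum(v::VersionNumber = VERSION; required::VersionNumber = v"0.6-") =
    v >= required

println("Running Julia ", VERSION, "; JuliaDB minimum met: ", meets_minimum())
```

On the JuliaPro release from the error message above, `meets_minimum(v"0.5.2")` is `false`, which matches the installer's complaint.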

Related topics

| Topic | Category | Replies | Views | Last activity |
|---|---|---|---|---|
| Importing big data | General Usage (question) | 21 | 5443 | November 14, 2017 |
| Package for reading/writing ~100GB data files | General Usage | 10 | 2877 | November 17, 2018 |
| Using JuliaDB to create larger than memory datasets and work with them? | General Usage | 3 | 1050 | October 15, 2019 |
| Julia run using terminal for 1GB dataset showing out of memory error | General Usage (question) | 18 | 5003 | August 31, 2017 |
| JuliaDB out-of-memory computations | New to Julia | 2 | 515 | December 6, 2018 |