I don’t have PostgreSQL at hand to test it, but it should look something like this (assuming you have already installed Spark.jl):
- Edit `jvm/sparkjl/pom.xml` and add the following to the `dependencies` section:
```xml
<!-- https://mvnrepository.com/artifact/org.postgresql/postgresql -->
<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <version>42.1.4</version>
</dependency>
```
In the Java world, `pom.xml` is the single place where you declare all your dependencies. Since the basic Spark installation doesn’t ship with the PostgreSQL driver, we need to add it to the Java CLASSPATH ourselves. There are other ways to do it, but I find this one pretty simple for basic use cases.
- Run `Pkg.build("Spark")` for changes to take effect.
- Create a `SparkSession`:
```julia
using Spark
Spark.init()
sess = SparkSession()   # uses "local" master
```
- Read a Spark `Dataset` using the JDBC format:
```julia
options = Dict(
    "url" => "jdbc:postgresql:dbserver",
    "dbtable" => "schema.tablename",
    "user" => "username",
    "password" => "password")
df = read_df(sess, ""; format="jdbc", options=options)
```
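The `url` and `dbtable` values above are placeholders. For a real connection the JDBC URL usually spells out host, port and database name; here is a hedged example with made-up host, database and table names:

```julia
options = Dict(
    "url"      => "jdbc:postgresql://localhost:5432/mydb",   # host:port/database (hypothetical)
    "dbtable"  => "public.users",                            # schema.table (hypothetical)
    "user"     => "username",
    "password" => "password")
df = read_df(sess, ""; format="jdbc", options=options)
```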
Converting a Spark Dataset / DataFrame to a Julia DataFrame isn’t supported out of the box yet, but you can:
- export the Spark dataset to CSV and read it back with DataFrames.jl
- call `collect(spark_df)` to get a list of rows and then build a Julia `DataFrame` from them (see the sketch below)
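Here is a rough, untested sketch of the second option. It assumes that `collect` on the Spark dataset returns an iterable of rows whose values can be indexed by column position, and that you know the column names up front (the `id`/`name` columns below are made up):

```julia
using DataFrames

# `df` is the Spark dataset obtained from `read_df` above
rows = collect(df)   # materializes all rows on the driver, so only do this for small tables

# Build a Julia DataFrame column by column (column names are hypothetical)
julia_df = DataFrame(id   = [row[1] for row in rows],
                     name = [row[2] for row in rows])
```

For the CSV route, write the dataset out from Spark and read the file back with CSV.jl / DataFrames.jl (the exact reading call depends on your CSV.jl version).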
Issues on GitHub are also welcome. The Spark API is really huge, so instead of randomly implementing parts of it, I expect users of Spark.jl to create issues so I can prioritize and plan them.