The SparkSQL.jl package lets Julia programs work with Spark data using SQL. SparkSQL.jl returns the results of Apache Spark queries as Julia DataFrames, and you can also send Julia data to Spark for use in your queries. A common use case is machine learning: pull data from Spark with SQL, do the machine learning in Julia, and send the results back to Apache Spark. Example syntax:
JuliaDataFrame = DataFrame(tickers = ["CRM", "IBM"])
onSpark = toSparkDS(sprk, JuliaDataFrame)
query = sql(sprk, "SELECT * FROM spark_data WHERE TICKER IN (SELECT * FROM julia_data)")
results = toJuliaDF(query)
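If you don't have a Spark instance yet, you can preview what the `WHERE TICKER IN (SELECT * FROM julia_data)` filter does with DataFrames.jl alone. This is a minimal local sketch, not SparkSQL.jl code: the `spark_data` table and its `PRICE` column here are hypothetical stand-ins for the Spark table.

```julia
using DataFrames

# Hypothetical stand-in for the Spark table `spark_data`;
# the rows and the PRICE column are invented for illustration.
spark_data = DataFrame(TICKER = ["CRM", "IBM", "AAPL", "CRM"],
                       PRICE  = [210.5, 130.2, 150.1, 212.0])

# The Julia-side data, as in the example above.
JuliaDataFrame = DataFrame(tickers = ["CRM", "IBM"])

# Local equivalent of:
#   SELECT * FROM spark_data WHERE TICKER IN (SELECT * FROM julia_data)
# Keep only the rows whose TICKER appears in the Julia tickers column.
wanted = Set(JuliaDataFrame.tickers)
results = filter(:TICKER => in(wanted), spark_data)
```

With real Spark data, the same filtering runs on the cluster through `sql(sprk, ...)`, and only the matching rows come back to Julia via `toJuliaDF`.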
To learn more, visit the tutorial page and project pages:
Thanks for the answer. Finding libraries for Julia has been a challenge! I don't have a Spark instance set up at the moment, but this looks doable. It would be great to keep everything within Julia, the way R and Python users can.