Hi Billpete002,
The SparkSQL.jl package lets Julia programs work with Apache Spark data using SQL. Query results come back from Spark as Julia DataFrames, and you can also send Julia data to Spark for use in a query. A common use case is machine learning: get data from Spark using SQL, do the machine learning in Julia, and return the results to Apache Spark. Example syntax:
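# Assumes `sprk` is an existing Spark session and `spark_data` is already available on Spark
JuliaDataFrame = DataFrame(tickers = ["CRM", "IBM"])
# Copy the Julia DataFrame to Spark and register it as a temporary view
onSpark = toSparkDS(sprk, JuliaDataFrame)
createOrReplaceTempView(onSpark, "julia_data")
# Filter the Spark data against the Julia data in a single SQL query
query = sql(sprk, "SELECT * FROM spark_data WHERE TICKER IN (SELECT * FROM julia_data)")
# Bring the query result back as a Julia DataFrame and summarize it
results = toJuliaDF(query)
describe(results)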
To learn more, visit the tutorial and project pages:
Tutorial page:
Project page: