For my job we are exploring the data lakehouse possibilities in Databricks. We will probably also need to rewrite existing code (currently in PL/SQL) in a language that is accepted in the Databricks environment.
Question:
I am new to Databricks. I have seen that Spark and Julia should work together, but I don't see any explicit integration between Julia and Databricks. Does that mean I can't use Julia in Databricks, or, because Spark is the underlying technology used by Databricks, can I use Julia there? If yes, what are your experiences with Databricks and Julia?
As far as I know, Databricks uses its own Spark cluster manager, which is not available in the open source version, so Spark.jl is unlikely to work out of the box there. However, if you manage to link Spark.jl to the Databricks libraries instead of building the open source version, I'd expect the APIs to be compatible.
@AlexanderChen I know it's a few years later, but TidierDB.jl now supports Databricks as a backend for querying. It works through the REST API. Further documentation can be found here.
Connecting and querying is as simple as:
using TidierDB  # load the package before connecting

instance_id = "string_id"
token = "string_token"
warehouse_id = "e673cd4f387f964a"
con = connect(:databricks, instance_id, token, "DEMODB", "PUBLIC", warehouse_id)
# After the connection is established, you may begin querying.
@chain db_table(con, "mtcars") begin
    @select(wt)
    @mutate(test = wt * 2)
    @aside @show_query _
    @collect
end
WITH cte_2 AS (
SELECT wt, wt * 2 AS test
FROM tidierdb.default.mtcars)
SELECT *
FROM cte_2
32×2 DataFrame
 Row │ wt       test
     │ Float64  Float64
─────┼──────────────────
   1 │   2.62      5.24
   2 │   2.875     5.75
   3 │   2.32      4.64
   4 │   3.215     6.43
  ⋮  │    ⋮        ⋮
  29 │   3.17      6.34
  30 │   2.77      5.54
  31 │   3.57      7.14
  32 │   2.78      5.56
         24 rows omitted
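For anyone wondering what "works through the REST API" means in practice, here is a rough sketch of the kind of request involved, using the Databricks SQL Statement Execution API endpoint (`/api/2.0/sql/statements`). I have not checked what TidierDB.jl does internally, so treat this as an illustration under my own assumptions, not its actual implementation. The helper name `statement_request` and the workspace host are hypothetical; the catalog and schema defaults are simply taken from the generated query shown above. The sketch only builds the request and sends nothing.

```julia
# Hypothetical helper (not part of TidierDB.jl): builds, but does not send,
# a request for the Databricks SQL Statement Execution API.
function statement_request(host::AbstractString, token::AbstractString,
                           warehouse_id::AbstractString, sql::AbstractString;
                           catalog::AbstractString = "tidierdb",
                           schema::AbstractString = "default")
    # Endpoint that accepts SQL statements for a SQL warehouse.
    url = "https://$host/api/2.0/sql/statements"
    # A personal access token is passed as a bearer token.
    headers = [
        "Authorization" => "Bearer $token",
        "Content-Type" => "application/json",
    ]
    # Request body; it would be serialized to JSON before sending.
    payload = Dict(
        "statement" => sql,
        "warehouse_id" => warehouse_id,
        "catalog" => catalog,
        "schema" => schema,
        "wait_timeout" => "30s",  # how long the call waits for a result
    )
    return (url = url, headers = headers, payload = payload)
end

req = statement_request(
    "example.cloud.databricks.com",  # placeholder workspace host (assumption)
    "string_token",
    "e673cd4f387f964a",
    "SELECT wt, wt * 2 AS test FROM mtcars",
)
println(req.url)
```

To actually send it you would need an HTTP client and a JSON encoder. With HTTP.jl and JSON3.jl installed, something like `HTTP.post(req.url, req.headers, JSON3.write(req.payload))` should do it, but I have left that out so the sketch runs without credentials or extra packages.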