Databricks and Julia

AlexanderChen · July 1, 2022, 12:48pm

hi,

for my job we are exploring the data lakehouse possibilities in Databricks. We will also probably need to rewrite existing code (currently in PL/SQL) in a language that is supported in the Databricks environment.

Question:
I am new to Databricks. I have seen that Spark and Julia should work together, but I don’t see any explicit integration between Julia and Databricks. Does that mean I can’t use Julia in Databricks, or, because Spark is the underlying technology used by Databricks, CAN I use Julia in Databricks? If yes, what are your experiences with Databricks and Julia?

best,

dfdx · July 1, 2022, 9:35pm

As far as I know, Databricks uses its own Spark cluster manager, which is not available in the open source version, so Spark.jl is unlikely to work out of the box there. However, if you manage to link Spark.jl to the Databricks libraries instead of building against the open source version, I’d expect the APIs to be compatible.

drizk1 · June 29, 2024, 9:34am

@AlexanderChen I know it's a few years later, but TidierDB.jl now supports Databricks as a backend for querying. It works through the REST API. Further documentation can be found in the TidierDB.jl docs.

Connecting and querying is as simple as:

using TidierDB

instance_id = "string_id"
token = "string_token"
warehouse_id = "e673cd4f387f964a"
con = connect(:databricks, instance_id, token, "DEMODB", "PUBLIC", warehouse_id)
# After the connection is established, you may begin querying.

@chain db_table(con, "mtcars") begin
    @select(wt)
    @mutate(test = wt * 2)
    @aside @show_query _
    @collect
end


WITH cte_2 AS (
SELECT  wt, wt * 2 AS test
        FROM tidierdb.default.mtcars)  
SELECT *
        FROM cte_2
32×2 DataFrame
 Row │ wt       test    
     │ Float64  Float64 
─────┼──────────────────
   1 │   2.62     5.24
   2 │   2.875    5.75
   3 │   2.32     4.64
   4 │   3.215    6.43
  ⋮  │    ⋮        ⋮
  29 │   3.17     6.34
  30 │   2.77     5.54
  31 │   3.57     7.14
  32 │   2.78     5.56
         24 rows omitted
