JavaCallError when using Spark.jl

When running using Spark, I get:

julia> using Spark
ERROR: InitError: JavaCall.JavaCallError("JULIA_COPY_STACKS should not be set on Windows.")
Stacktrace:
  [1] assertroottask_or_goodenv
    @ C:\Users\joel\.julia\packages\JavaCall\MlduK\src\jvm.jl:233 [inlined]
  [2] _init(opts::Vector{String})
    @ JavaCall C:\Users\joel\.julia\packages\JavaCall\MlduK\src\jvm.jl:285
  [3] init()
    @ JavaCall C:\Users\joel\.julia\packages\JavaCall\MlduK\src\jvm.jl:277
  [4] init(; log_level::String)
    @ Spark C:\Users\joel\.julia\packages\Spark\89BUd\src\init.jl:56
  [5] init
    @ C:\Users\joel\.julia\packages\Spark\89BUd\src\init.jl:16 [inlined]
...
during initialization of module Spark

But then if I run using Spark again, it goes through. When trying to use it:

julia> spark = SparkSession.builder.appName("Main").master("local").getOrCreate()
ERROR: JavaCall.JavaCallError("Class Not Found org/apache/spark/sql/SparkSession")
Stacktrace:
 [1] _metaclass
   @ C:\Users\joel\.julia\packages\JavaCall\MlduK\src\core.jl:383 [inlined]
 [2] metaclass(class::Symbol)
   @ JavaCall C:\Users\joel\.julia\packages\JavaCall\MlduK\src\core.jl:389
 [3] jcall(::Type{JavaCall.JavaObject{Symbol("org.apache.spark.sql.SparkSession")}}, ::String, ::Type, ::Tuple{})
   @ JavaCall C:\Users\joel\.julia\packages\JavaCall\MlduK\src\core.jl:225
 [4] getproperty(#unused#::Type{SparkSession}, prop::Symbol)
   @ Spark C:\Users\joel\.julia\packages\Spark\89BUd\src\session.jl:48
 [5] top-level scope
   @ REPL[4]:1

I have JDK 11 and mvn on my PATH, and JAVA_HOME is set to JDK 11.

Side question… Will Spark.jl allow me to read a parquet file from hdfs running in kubernetes (and with the spark master running in kubernetes if that’s needed)?

Usually, this is a JavaCall.jl issue; see the Major TODOs and Caveats section of its README. Spark.jl doesn’t set this variable, so it is probably coming from your environment. Try checking it in a fresh Julia session:

ENV["JULIA_COPY_STACKS"]

If you are running Windows, it should not be set.
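
If it does turn out to be set, a minimal sketch of clearing it for the current session before loading Spark (assuming it was picked up from a system environment variable or startup.jl, and that nothing else in your setup relies on it):

# Check without throwing (indexing ENV errors if the variable is absent)
haskey(ENV, "JULIA_COPY_STACKS")

# If it is set, remove it from this session's environment and retry
delete!(ENV, "JULIA_COPY_STACKS")
using Spark

If it keeps coming back, look for where it is set, e.g. your Windows environment variables or ~/.julia/config/startup.jl.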

Will Spark.jl allow me to read a parquet file from hdfs running in kubernetes (and with the spark master running in kubernetes if that’s needed)?

Parquet files are certainly supported, and HDFS should not be a problem either, since we only proxy Java methods to Julia. I’m not so sure about Kubernetes. Spark.jl does not support cluster deploy modes (e.g. yarn-cluster), but client modes should work fine. You can always try and report issues if any :slight_smile:
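
For reference, a rough sketch of what reading Parquet off HDFS might look like once the session starts. This is untested against HDFS or Kubernetes; the namenode address and path are placeholders, and you should double-check the reader method names against your Spark.jl version, since they just mirror the Java-side DataFrameReader API:

using Spark
Spark.init()
# Local master for illustration; in client mode point this at your cluster's master URL instead
spark = SparkSession.builder.appName("Main").master("local").getOrCreate()
# Read a Parquet file directly from HDFS (hypothetical namenode host/port and path)
df = spark.read.parquet("hdfs://namenode:8020/path/to/data.parquet")
df.show()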