JavaCallError when using Spark.jl

When running using Spark, I get:

julia> using Spark
ERROR: InitError: JavaCall.JavaCallError("JULIA_COPY_STACKS should not be set on Windows.")
Stacktrace:
  [1] assertroottask_or_goodenv
    @ C:\Users\joel\.julia\packages\JavaCall\MlduK\src\jvm.jl:233 [inlined]
  [2] _init(opts::Vector{String})
    @ JavaCall C:\Users\joel\.julia\packages\JavaCall\MlduK\src\jvm.jl:285
  [3] init()
    @ JavaCall C:\Users\joel\.julia\packages\JavaCall\MlduK\src\jvm.jl:277
  [4] init(; log_level::String)
    @ Spark C:\Users\joel\.julia\packages\Spark\89BUd\src\init.jl:56
  [5] init
    @ C:\Users\joel\.julia\packages\Spark\89BUd\src\init.jl:16 [inlined]
...
during initialization of module Spark

But then if I run using Spark again, it goes through. When trying to use it:

julia> spark = SparkSession.builder.appName("Main").master("local").getOrCreate()
ERROR: JavaCall.JavaCallError("Class Not Found org/apache/spark/sql/SparkSession")
Stacktrace:
 [1] _metaclass
   @ C:\Users\joel\.julia\packages\JavaCall\MlduK\src\core.jl:383 [inlined]
 [2] metaclass(class::Symbol)
   @ JavaCall C:\Users\joel\.julia\packages\JavaCall\MlduK\src\core.jl:389
 [3] jcall(::Type{JavaCall.JavaObject{Symbol("org.apache.spark.sql.SparkSession")}}, ::String, ::Type, ::Tuple{})
   @ JavaCall C:\Users\joel\.julia\packages\JavaCall\MlduK\src\core.jl:225
 [4] getproperty(#unused#::Type{SparkSession}, prop::Symbol)
   @ Spark C:\Users\joel\.julia\packages\Spark\89BUd\src\session.jl:48
 [5] top-level scope
   @ REPL[4]:1

I have JDK 11 and mvn on my PATH, and JAVA_HOME is set to JDK 11.

Side question… Will Spark.jl allow me to read a parquet file from hdfs running in kubernetes (and with the spark master running in kubernetes if that’s needed)?

Usually, this is a JavaCall.jl issue; see the Major TODOs and Caveats section of its README. Spark.jl doesn’t set this variable, so it is probably coming from your environment. Try checking it in a fresh Julia session:

ENV["JULIA_COPY_STACKS"]

If you are running Windows, it should not be set.
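
If it does turn out to be set, a minimal sketch of clearing it for the current session before loading Spark (assuming it was picked up from a system environment variable or startup.jl, and that nothing else in your setup relies on it):

# Check without throwing (indexing ENV errors if the variable is absent)
haskey(ENV, "JULIA_COPY_STACKS")

# If it is set, remove it from this session's environment and retry
delete!(ENV, "JULIA_COPY_STACKS")
using Spark

If it keeps coming back, look for where it is set, e.g. your Windows environment variables or ~/.julia/config/startup.jl.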

Will Spark.jl allow me to read a parquet file from hdfs running in kubernetes (and with the spark master running in kubernetes if that’s needed)?

Parquet files are certainly supported, and HDFS should not be a problem either, since we only proxy Java methods to Julia. I’m not so sure about Kubernetes. Spark.jl does not support cluster deploy modes (e.g. yarn-cluster), but client modes should work fine. You can always try and report issues if any :slight_smile:
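
For reference, a rough sketch of what reading Parquet off HDFS might look like once the session starts. This is untested against HDFS or Kubernetes; the namenode address and path are placeholders, and you should double-check the reader method names against your Spark.jl version, since they just mirror the Java-side DataFrameReader API:

using Spark
Spark.init()
# Local master for illustration; in client mode point this at your cluster's master URL instead
spark = SparkSession.builder.appName("Main").master("local").getOrCreate()
# Read a Parquet file directly from HDFS (hypothetical namenode host/port and path)
df = spark.read.parquet("hdfs://namenode:8020/path/to/data.parquet")
df.show()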