Setting up Julia on Spark on AWS EMR

The log you posted shows that the following command, launched by the build script, fails:

mvn clean package -Dspark.version=2.4.7 -Dscala.version=2.11.12 -Dscala.binary.version=2.11

Can you run this command directly from the <Spark.jl Dir>/jvm/sparkjl directory and post the result?

Yeah, based on this error it looks like there's some issue with outbound access to the Maven repo?

[WARNING] Could not transfer metadata org.apache.maven.plugins:maven-source-plugin/maven-metadata.xml from/to central (https://repo.maven.apache.org/maven2): transfer failed for https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-source-plugin/maven-metadata.xml

Not sure why that would be, though, since I'm guessing the other downloads worked fine?

Hi @dacort , @dfdx ,

I had to create a settings.xml file in the .m2 folder in my home directory to allow Maven access through my company's Artifactory instance. That worked fine. Now I am facing another issue while executing the command below:
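For reference, a minimal ~/.m2/settings.xml that routes all Maven traffic through an internal mirror might look like the sketch below; the repository URL is a placeholder, not the actual Artifactory address:

```xml
<!-- ~/.m2/settings.xml: route all Maven downloads through an internal mirror.
     The URL is a placeholder; substitute your company's Artifactory address. -->
<settings>
  <mirrors>
    <mirror>
      <id>corp-artifactory</id>
      <mirrorOf>*</mirrorOf>
      <url>https://artifactory.example.com/artifactory/maven-remote</url>
    </mirror>
  </mirrors>
</settings>
```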

julia -e 'using Pkg;Pkg.add(Pkg.PackageSpec(;name="Spark", version="0.5.1"));using Spark;Spark.init();sc = SparkContext(master="yarn");sc.parallelize([1,2,3,4])'

Please find the error below:

ERROR spark.SparkContext: Failed to add /home/hadoop/.julia/packages/Spark/9bsuG/src/…/jvm/sparkjl/target/sparkjl-0.1.jar to Spark environment
java.io.FileNotFoundException: Jar /home/hadoop/.julia/packages/Spark/9bsuG/src/…/jvm/sparkjl/target/sparkjl-0.1.jar not found
at org.apache.spark.SparkContext.addJarFile$1(SparkContext.scala:1874)
at org.apache.spark.SparkContext.addJar(SparkContext.scala:1902)
at org.apache.spark.api.java.JavaSparkContext.addJar(JavaSparkContext.scala:701)
ERROR: type SparkContext has no field parallelize

Could you please let me know what can be done about this?
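As a side note on the second error above ("type SparkContext has no field parallelize"): Julia does not support `object.method(...)` calls, and Spark.jl appears to expose `parallelize` as a plain function that takes the context as its first argument (which matches the usage later in this thread). A minimal sketch, assuming a working sparkjl JAR and Spark.jl 0.5.x:

```julia
using Spark

Spark.init()
sc = SparkContext(master = "local")   # "yarn" on the EMR cluster

# Call parallelize as a function with sc as the first argument,
# not as a field/method of the SparkContext object.
rdd = parallelize(sc, [1, 2, 3, 4])
```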

Thanks and Regards,
Sumit Malbari

It looks like the JAR file hasn't been created. Please repeat the build and verify that the file is actually there.
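One way to check from the Julia side, sketched under the assumption that the package lays out the JAR as in the error message above:

```julia
# Sketch: check whether the sparkjl JAR from the error message exists,
# assuming the Spark.jl 0.5.x source layout reported in the stack trace.
using Spark

jar = normpath(joinpath(dirname(pathof(Spark)), "..", "jvm", "sparkjl",
                        "target", "sparkjl-0.1.jar"))
if !isfile(jar)
    @warn "sparkjl JAR not found; try rebuilding the package" jar
    # Re-running the package build should invoke the Maven build again:
    # using Pkg; Pkg.build("Spark")
end
```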

Hi @dfdx , @dacort ,

I built Spark and the build reports success, but when I try to run the command below, I get the following error:

julia> text = parallelize(sc, ["hello world", "the world is one", "we are the world"])
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.internal.Logging.$init$(Lorg/apache/spark/internal/Logging;)V
at org.apache.spark.api.julia.JuliaRDD$.<init>(JuliaRDD.scala:67)
at org.apache.spark.api.julia.JuliaRDD$.<clinit>(JuliaRDD.scala)
at org.apache.spark.api.julia.JuliaRDD.readRDDFromFile(JuliaRDD.scala)
ERROR: JavaCall.JavaCallError("Error calling Java: java.lang.NoSuchMethodError: org.apache.spark.internal.Logging.$init$(Lorg/apache/spark/internal/Logging;)V")
Stacktrace:

Any idea on this issue?

Thanks and Regards,
Sumit Malbari