Setting up Julia on Spark on AWS EMR

The log you posted shows that the following command, launched by the build script, fails:

mvn clean package -Dspark.version=2.4.7 -Dscala.version=2.11.12 -Dscala.binary.version=2.11

Can you run this command directly from the <Spark.jl Dir>/jvm/sparkjl directory and post the result?
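Here is a minimal sketch of that manual run. It wraps the command in a shell function so the full Maven output lands in a log file you can attach. The function name run_sparkjl_build and the default log name sparkjl-build.log are hypothetical. The mvn flags are the ones from the failing command, and the argument is assumed to be the Spark.jl package root, meaning the directory that contains jvm/sparkjl.

```shell
# Hypothetical helper: re-run the Spark.jl Maven build by hand.
# $1 = Spark.jl package root (the directory containing jvm/sparkjl)
# $2 = log file (default: sparkjl-build.log in the current directory)
run_sparkjl_build() {
    pkg_dir=$1
    log_file=${2:-sparkjl-build.log}
    build_dir=$pkg_dir/jvm/sparkjl
    if [ ! -d "$build_dir" ]; then
        echo "no such directory: $build_dir" >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged; the log
    # path is opened relative to the caller's directory, and the exit status
    # of mvn becomes the function's return value.
    (
        cd "$build_dir" &&
        mvn clean package -Dspark.version=2.4.7 -Dscala.version=2.11.12 \
            -Dscala.binary.version=2.11
    ) >"$log_file" 2>&1
}
```

For example, run run_sparkjl_build ~/.julia/packages/Spark/9bsuG, check the exit status with echo $?, and post the contents of sparkjl-build.log.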

Yeah, based on this error, it looks like there's an issue with outbound access to the Maven repo?

[WARNING] Could not transfer metadata org.apache.maven.plugins:maven-source-plugin/maven-metadata.xml from/to central ( transfer failed for

Not sure why that would be, though, since I’m guessing the other downloads worked fine?
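One quick way to test the outbound-access theory is to request the same metadata file that Maven could not fetch. This is a sketch that assumes curl is installed. check_repo_reachable is a hypothetical helper name, and the default URL is the Maven Central location of the maven-source-plugin metadata named in the warning.

```shell
# Hypothetical helper: check whether this host can reach a Maven repository.
# $1 = URL to probe (default: the metadata file from the Maven warning)
check_repo_reachable() {
    url=${1:-https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-source-plugin/maven-metadata.xml}
    # --fail turns HTTP errors into a non-zero exit; --head skips the body.
    if curl --silent --show-error --fail --head --max-time 10 "$url" >/dev/null; then
        echo "reachable: $url"
    else
        echo "NOT reachable: $url" >&2
        return 1
    fi
}
```

Keep in mind that a successful curl does not prove Maven can connect. Maven does not read the http_proxy/https_proxy environment variables. It takes proxy and mirror settings from ~/.m2/settings.xml instead.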

Hi @dacort, @dfdx,

I had to create a settings.xml file in the .m2 folder in my home directory so that Maven could reach the repository through my company's Artifactory. That worked. Now I am facing another issue when running the command below:

julia -e 'using Pkg;Pkg.add(Pkg.PackageSpec(;name="Spark", version="0.5.1"));using Spark;Spark.init();sc = SparkContext(master="yarn");sc.parallelize([1,2,3,4])'

Here is the error:

ERROR spark.SparkContext: Failed to add /home/hadoop/.julia/packages/Spark/9bsuG/src/…/jvm/sparkjl/target/sparkjl-0.1.jar to Spark environment
Jar /home/hadoop/.julia/packages/Spark/9bsuG/src/…/jvm/sparkjl/target/sparkjl-0.1.jar not found
at org.apache.spark.SparkContext.addJarFile$1(SparkContext.scala:1874)
at org.apache.spark.SparkContext.addJar(SparkContext.scala:1902)
ERROR: type SparkContext has no field parallelize

Could you please let me know what can be done about this?
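The settings.xml workaround described in this post can be sketched as a shell function that writes a minimal mirror configuration. Everything specific here is a placeholder: write_maven_mirror_settings, the mirror id company-artifactory, and the example URL are hypothetical, so substitute your company's Artifactory values. Setting mirrorOf to central redirects only Maven Central, while * would route every repository through the mirror. A mirror that requires a login also needs a matching server entry with credentials.

```shell
# Hypothetical helper: write a minimal Maven settings.xml with one mirror.
# $1 = mirror URL (required), $2 = settings file (default: ~/.m2/settings.xml)
write_maven_mirror_settings() {
    mirror_url=$1
    settings_file=${2:-$HOME/.m2/settings.xml}
    if [ -z "$mirror_url" ]; then
        echo "usage: write_maven_mirror_settings MIRROR_URL [SETTINGS_FILE]" >&2
        return 2
    fi
    # Never clobber an existing settings file.
    if [ -e "$settings_file" ]; then
        echo "refusing to overwrite $settings_file" >&2
        return 1
    fi
    mkdir -p "$(dirname "$settings_file")" || return 1
    cat >"$settings_file" <<EOF
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 https://maven.apache.org/xsd/settings-1.0.0.xsd">
  <mirrors>
    <mirror>
      <!-- placeholder id; must match a <server> id if credentials are needed -->
      <id>company-artifactory</id>
      <mirrorOf>central</mirrorOf>
      <url>$mirror_url</url>
    </mirror>
  </mirrors>
</settings>
EOF
}
```

For example: write_maven_mirror_settings https://artifactory.example.com/artifactory/maven-remote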

It looks like the JAR file hasn't been created. Please repeat the build and verify that the file actually exists.
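Here is a sketch of that check. It assumes the JAR lives at jvm/sparkjl/target/sparkjl-0.1.jar under the Spark.jl package root, which is the jvm/sparkjl directory where the Maven build runs. check_sparkjl_jar is a hypothetical helper. When the file is missing, it prints the Pkg.build("Spark") command, which re-runs the package's build script.

```shell
# Hypothetical helper: confirm the JAR that SparkContext tries to load exists.
# $1 = Spark.jl package root (the directory containing jvm/sparkjl)
check_sparkjl_jar() {
    pkg_dir=$1
    jar=$pkg_dir/jvm/sparkjl/target/sparkjl-0.1.jar
    if [ -f "$jar" ]; then
        echo "found: $jar"
    else
        echo "missing: $jar" >&2
        echo "rebuild with: julia -e 'using Pkg; Pkg.build(\"Spark\")'" >&2
        return 1
    fi
}
```

To check every installed copy of the package: for d in ~/.julia/packages/Spark/*/; do check_sparkjl_jar "${d%/}"; done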