Here are some more details on the errors you’ve mentioned:
File “/usr/bin/hdp-select”, line 205
print "ERROR: Invalid package - " + name
This is not part of Spark.jl or even Spark itself, but of the Hadoop installation you use - Hortonworks / Apache Ambari. I don’t know much about their stack, but it seems like they still use Python 2 for their scripts, while the default python executable in your environment is Python 3. Although Spark.jl is not related to PySpark, the following environment variables may help:
PYSPARK_DRIVER_PYTHON=python2 PYSPARK_PYTHON=python2
Another option is to create a virtualenv / conda env with Python 2 as the default.
It’s also very possible that this error doesn’t actually prevent you from running the code, but only pollutes the log, so I’d start by checking other things first.
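For reference, the environment variables mentioned above could be set in the shell before starting Julia, roughly like this. This is only a sketch: it assumes a Python 2 interpreter is installed and reachable as python2 on your PATH, and I haven’t verified that the Hortonworks scripts actually honor these variables.

```shell
# Assumption: a Python 2 interpreter is available on PATH under the name `python2`.
export PYSPARK_DRIVER_PYTHON=python2
export PYSPARK_PYTHON=python2

# Check that the name resolves to a real executable before launching Julia.
if command -v "$PYSPARK_PYTHON" >/dev/null 2>&1; then
    echo "using $(command -v "$PYSPARK_PYTHON")"
else
    echo "python2 not found on PATH" >&2
fi
```

Exported variables are inherited by any process started from the same shell, so Julia has to be launched from that shell session for this to have an effect.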
org.apache.spark.SparkException: Detected yarn cluster mode, but isn’t running on a cluster. Deployment to YARN is not supported directly by SparkContext. Please use spark-submit.
Mea culpa, I totally forgot we don’t support cluster mode. Maybe one day, when Spark.jl is integrated into the main Spark distribution, we will be able to launch it on the server using spark-submit, but it’s not something which is going to happen anytime soon.
On the bright side, in most cases you shouldn’t notice any difference between cluster and client mode.
Container exited with a non-zero exit code 1
This is always the final error, and it just tells you that something went wrong. The actual cause is described somewhere higher in the log.
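To find the actual cause, it may help to save the log to a file and look for the earliest error lines instead of the last one. A minimal sketch, assuming the log uses the usual ERROR level tag and Java exception class names; first_errors and app.log are just names I made up for illustration:

```shell
# Print the first few lines of a saved log that look like an error or exception,
# prefixed with their line numbers.
# $1: path to the log file
first_errors() {
    grep -n -E 'Exception|ERROR' "$1" | head -n 5
}

# Possible usage (the application ID is a placeholder, take it from your own output):
#   yarn logs -applicationId <application ID> > app.log
#   first_errors app.log
```

The line numbers make it easy to open the log at the first hit and read the surrounding context.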
This actually makes a connection, launches the sparkui, but doesn’t register as a connection on the platform.
Yep, this is exactly the expected result - the local executor helped to distinguish between Spark.jl issues and issues connecting to YARN.
As a solution to all previous issues, please try:
sess = SparkSession(master="yarn-client", enable_hive_support=true)
Spark.jl will try to connect to YARN in client mode (it should automatically read the YARN address from the system configuration) and to the Hive metastore (using the provided hive-site.xml). Please let us know if this is enough to read data from Hive tables.
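If the connection still fails, one quick thing to rule out is that the configuration files are simply not where they are expected. Spark on YARN normally locates the cluster through the directory named by HADOOP_CONF_DIR or YARN_CONF_DIR; I’m assuming here that the same applies when Spark is started from Spark.jl. The fallback path below is only a common default, and hive-site.xml may well live in a different directory on your installation, so treat this as a sketch (check_conf is a helper name I made up):

```shell
# Report whether a config file exists in a directory.
# $1: directory to look in, $2: file name
check_conf() {
    if [ -f "$1/$2" ]; then
        echo "found $1/$2"
    else
        echo "missing $1/$2"
    fi
}

# Assumption: /etc/hadoop/conf is only a common fallback, not necessarily your path.
conf_dir="${HADOOP_CONF_DIR:-${YARN_CONF_DIR:-/etc/hadoop/conf}}"
check_conf "$conf_dir" yarn-site.xml
check_conf "$conf_dir" hive-site.xml
```

A "missing" line doesn’t prove anything is broken, it only tells you which directory to double-check.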