Deploying Julia packages once in the cloud (Google Dataflow) (pipeline)

Hi everyone!

I’m trying to run my Julia model in Google Dataflow. The issue is that I have to install Julia (>= v0.5.2) from a Python script (Python being a language supported in GCP) via subprocess calls, because the VMs have to install Julia on every run, given that everything is deleted once the cloud process finishes. The installation goes well, so I have Julia (>= v0.5.2) installed in the cloud system. Nevertheless, I need certain Julia packages (such as ODBC, DataFrames, CSV, Distributions, …) to run my model.
BUT, as I have to install Julia every time I run the program in the cloud, I’m not able to use Pkg.add("package_name"), and as a result I cannot run my model because the packages are not available.

I have looked for some solutions and found that there is the option of modifying the juliarc.jl file and running it before my model runs. However, I haven’t managed to make it work yet.

Is there any other possibility? Would it be possible to write a piece of code at the beginning of my model (.jl file) so that the packages can be used in the following lines? Maybe by adding run(Pkg.add("pkg_name")) to the script (not working)?

Thank you,
Guillem

What version of Julia are you using? Can you post the errors you get?

I’m using v0.5.2, although we are working on adapting the code to v0.6.2.

The errors I’m obtaining are:
ERROR: LoadError: ArgumentError: Module ODBC not found in current path.
Run Pkg.add("ODBC") to install the ODBC package.
 in require(::Symbol) at ./loading.jl:365
 in include_from_node1(::String) at ./loading.jl:488
 in process_options(::Base.JLOptions) at ./client.jl:265
 in _start() at ./client.jl:321
while loading /usr/local/lib/python2.7/dist-packages/config/julia.jl, in expression starting on line 21

To contextualize a little bit more:
I’m working with Google Cloud Dataflow, more precisely with the Python SDK. I have created a pipeline which takes data as input; a ParDo (coded in Python) transforms the data, and this data is used in model.jl.
To be able to use my Julia scripts in the cloud:

  1. I have installed Julia via a ParDo function of my pipeline (using subprocess with curl, tar, cp,…)
    import logging
    from subprocess import Popen, PIPE

    # Download the Julia 0.5.2 tarball into /tmp and log curl's output.
    s = Popen(['curl', 'https://julialang-s3.julialang.org/bin/linux/x64/0.5/julia-0.5.2-linux-x86_64.tar.gz',
               '-o', '/tmp/julia.tar.gz'], stdin=PIPE, stdout=PIPE, bufsize=1)
    logging.info("***********************")
    logging.info(s.stdout.read())
    logging.info("***********************")

    # Unpack the tarball into /tmp and log tar's output.
    s = Popen(['tar', 'xzf', '/tmp/julia.tar.gz', '-C', '/tmp'],
              stdin=PIPE, stdout=PIPE, bufsize=1)
    logging.info("***********************")
    logging.info(s.stdout.read())
    logging.info("***********************")
  2. I have uploaded all my Julia files and libraries by creating a Python package and using setup.py to set up the Python environment of my VM.
  3. I need to be able to use Julia packages in the cloud (where each time I run my model a new VM starts and installs Julia, and each time the model finishes the whole installation disappears). However, I haven’t managed to install the Julia packages as part of the Julia installation process, so I can’t use the model.

I hope it’s now clearer.
Guillem

OK. The 0.5 release isn’t supported anymore, so you’ll probably get better results with 0.6. However, the error you get tells you to run Pkg.add("ODBC"), which is the obvious next step.
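
In case it helps, here is a rough sketch of how that step might be driven from the same Python code that installs Julia. It only assembles a `julia -e` command line and hands it to subprocess. The function names and the path to the Julia binary are made up for illustration, and the sketch assumes Julia 0.5/0.6, where Pkg can be used without `using Pkg`. It also assumes the worker VM has network access to the package registry.

```python
import subprocess


def build_pkg_add_command(julia_bin, packages):
    """Return the argv list that asks Julia to Pkg.add each package.

    julia_bin and this helper are hypothetical; adapt them to wherever
    the tarball was unpacked on the worker.
    """
    # Julia 0.5/0.6: Pkg is available without `using Pkg`.
    statements = ['Pkg.add("{0}")'.format(name) for name in packages]
    return [julia_bin, '-e', '; '.join(statements)]


def install_packages(julia_bin, packages):
    """Run the command; raises CalledProcessError if Julia exits non-zero."""
    subprocess.check_call(build_pkg_add_command(julia_bin, packages))
```

Calling `install_packages` with the path of the unpacked Julia binary and `['ODBC', 'DataFrames', 'CSV', 'Distributions']` right after the tar step would then run once per worker, before the model script is started.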

Yes, I know that the thing to do is Pkg.add("package_name"), but that is not the issue. The issue is being able to do that on the cloud platform and from a Python environment. That is what I’m trying to resolve, but I’m not getting there.
I have read about the juliarc.jl file and the LOAD_PATH variable, that is, everything related to creating a startup file and then being able to run:
Popen(['julia', '--startup-file=yes', '--', 'startupfile.jl', 'arg1', 'arg2', ...]) but I don’t know how to put the package addition in the startupfile.jl script so that it works when a VM starts in the cloud and installs Julia.
Thanks

So what you’re saying is that Pkg.add(...) won’t work if you put it inside startupfile.jl? You can also clone the needed packages manually using git, but that’s going to be tedious (need to handle all dependencies).
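
To make the manual route a bit more concrete, a minimal sketch in Python could look like the one below. It assumes that Julia 0.5 looks for packages in a versioned directory such as ~/.julia/v0.5, with one folder per package named without the .jl suffix. The helper names and the repository URLs passed in are placeholders, and every dependency has to be listed by hand.

```python
import os
import subprocess


def build_clone_commands(pkg_dir, repos):
    """Return one `git clone` argv per package, cloning into pkg_dir/<name>.

    repos maps a package name (e.g. 'ODBC') to its Git URL; both the
    mapping and this helper are hypothetical.
    """
    return [['git', 'clone', url, os.path.join(pkg_dir, name)]
            for name, url in sorted(repos.items())]


def clone_packages(pkg_dir, repos):
    """Create pkg_dir if needed and clone every listed repository."""
    if not os.path.isdir(pkg_dir):
        os.makedirs(pkg_dir)
    for cmd in build_clone_commands(pkg_dir, repos):
        subprocess.check_call(cmd)
```

This does not resolve dependencies or pick compatible versions, which is exactly the tedious part.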

Yes, I had also thought about the tedious manual way of doing it; it’s just not very efficient.
Concerning Pkg.add(...), with version 0.5.2 I’m getting the error:
ERROR: LoadError: GitError(Code:ERROR, Class:Net, SSL error: error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version)
So it is a TLS error: I’ve read that TLS 1.0 is no longer supported when accessing Git (but I’m working on solving that as well, just in case).
I know there are articles like:

and others, but I was hoping to avoid that route if possible.