Compatibility across Julia 1+ versions

Hm, I tried it on a few different machines and I did not get segfaults.

What do the -Xmx128M options do? Just for the hell of it, can you see if it crashes without those? (That is the only thing I did differently.)

On Ubuntu, it still crashes without the “-Xmx128M”.

On OSX 10.14.5, I get the segmentation fault but Julia doesn’t crash. JavaCall still doesn’t work.

Factoid: on OSX, it’s Oracle’s JDK, not OpenJDK.

I tried switching to OpenJDK 13 on OSX. I got the same result. As I stated earlier, the error is thrown at jvm.jl line 177, which is a ccall.

Let me say this: I am setting the environment variable in start.jl like so
ENV["JULIA_COPY_STACKS"] = 1
I can’t imagine that it would be a problem, but just in case.

Actually, I think that is a problem. The environment variable gets parsed in C before Julia initializes. Can you try it by actually setting the variable in the shell?

Can confirm here!

Here is how I tested it using the OpenJDK Docker image; $PWD contains the Julia binary.

 docker run -it --rm -v"$PWD":/home openjdk:11.0.2 /bin/bash
  1. I set ENV["JULIA_COPY_STACKS"]=1 inside Julia: it segfaults.

  2. I set the environment variable in the shell this way: it segfaults.

    JULIA_COPY_STACKS=1
    
  3. I set the environment variable in the shell this way: the JVM initialized!

    export JULIA_COPY_STACKS=1
    

Thanks for the insight, ExpandingMan. I did not realize the bit about “parsed in C…”. I can now call Java successfully. I still get the segmentation fault, but it is no longer fatal.
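
For anyone wondering why the three attempts above behave differently, here is a minimal shell sketch. It is an illustration only: `sh -c` stands in for the `julia` process (an assumption, so it runs without Julia installed), and it simply reports what a child process sees. A plain assignment creates a shell-local variable that is not passed on to children, while `export` (or a one-off prefix on the command line) puts it in the child's environment before that process starts.

```shell
#!/bin/sh
# Stand-in for `julia`: a child process that prints what it finds in its
# own environment, or "unset" if the variable never reached it.
report='echo "${JULIA_COPY_STACKS:-unset}"'

# Start from a clean slate so an earlier export does not hide the effect.
unset JULIA_COPY_STACKS

# Plain assignment: a shell-local variable, NOT inherited by child processes.
JULIA_COPY_STACKS=1
plain_result=$(sh -c "$report")

# One-off prefix: set in the environment of this single command only.
prefix_result=$(JULIA_COPY_STACKS=1 sh -c "$report")

# export: inherited by every child process started from this shell afterwards.
export JULIA_COPY_STACKS=1
exported_result=$(sh -c "$report")

echo "plain assignment: child sees $plain_result"     # unset
echo "one-off prefix:   child sees $prefix_result"    # 1
echo "after export:     child sees $exported_result"  # 1
```

With a real Julia install, the same idea would presumably look like `export JULIA_COPY_STACKS=1` followed by `julia start.jl`, or `JULIA_COPY_STACKS=1 julia start.jl` for a single run. Setting `ENV["JULIA_COPY_STACKS"]` inside start.jl is too late, since, as noted above, the variable is read before Julia initializes.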

Where are you still getting a segfault? That’s still potentially concerning. I know some of the JavaCall.jl tests fail, but I didn’t think any were segfaults. Can you confirm that most tests pass for you when you do ]test JavaCall? You should only see a few failures.

I ran the test as suggested.
Let me tell you the tale of two operating systems.

Ubuntu 19.04
OpenJDK = 11.0.4
Julia = 1.3.0-rc2.0

and

JULIA_COPY_STACKS=1

We get a clean go.

OSX = 10.14.5
OpenJDK = 13
Julia = 1.3.0-rc2.0

and

JULIA_COPY_STACKS=1

We get the seg fault. We should also recall that this same seg fault would occur with…

OSX = 10.14.5
Oracle JDK = 8
Julia = 1.0

…and was routinely ignored.

Thanks for the help. I can now at least make progress.

For anyone interested, I just made a PR for JDBC.jl that gets it working in 1.3 with Tables.jl (thanks to @quinnj for that part).

We should be ready to tag this quite soon, and happily it seems that JDBC functionality will be fully restored in 1.3 (with JULIA_COPY_STACKS=1).

@ExpandingMan Thx for championing the cause and doing the PR for JDBC.

@StefanKarpinski Note exactly how important Java support, and therefore JDBC support, is. It’s VERY important in major businesses; in fact it’s #1 per the TIOBE Index.

… also consider the following …
- Python talks to Spark AND Hadoop!!
- Python ALSO has APIs for NEWER NoSQL databases

When making arguments like this, please note that in most open source communities, it is implicitly assumed that if A is very important to B, especially if they use it to make money, B will fix A, or pay someone to do it.

The corollary is that if they don’t, it is not that important.

The farther we (as human beings) can get away from the JVM with regard to data science, the happier I will be. Spark is a hot mess. Anyone saying anything about Python supporting Spark via PySpark or whatever makes me wonder about their mental wellbeing. I’d love to see a graph of the frequency of curse words uttered per day after switching to PySpark at a previous job. I couldn’t even begin to describe how finicky and awful writing PySpark statements is compared to something like distributed JuliaDB or similar technologies in Python or R or almost anything else. The main advantage I’ve seen is that it has been widely adopted and is pretty stable (when it works and the functionality is properly documented).

That being said, I do appreciate this effort because having JVM interop is crucial for lots of projects. Sorry to make this post. I just have post-traumatic Spark disorder and needed to vent.

I think it has a use case, but the fixed cost of getting it to work reliably is quite large. People tend to underestimate the scale of projects where investing in the whole stack required for Spark starts to make sense.

It’s basically a difficult economics problem. A may be important to many businesses, but no single business wants to pay for it on its own. The solution is a company like Julia Computing, which can charge each business that wants something a small price. But businesses have alternatives and are risk averse, so they don’t pay and instead use something else.