Embedding Julia into Java

Hello,

I decided to check whether Julia code can be called from Java and prepared a draft binding library: https://github.com/rssdev10/julia4j

It includes some simple but working examples.

First question: how can I assign and read variable values directly, without jl_eval_string()? Otherwise every access pays the cost of parsing and compiling code.

Second question: is it possible to have isolated Julia engines in one process address space? I’m asking because JSR 223 allows more than one script container, and they can be isolated from each other. The choice is up to the programmer.

I have never called Julia from C or Java, but I feel that the topic is important enough to try to answer.

From the Embedding Julia part of the documentation, it seems you can call any Julia function directly using jl_get_function() and jl_call() (or the fixed-arity variants jl_call0() to jl_call3()), see an example.

As for several isolated Julia engines in a single process, it seems that this is not possible as of today, but maybe what you want can be achieved by other means, e.g. multiple Julia threads or additional processes?

Ok, thanks. I will try to explain one of the Java embedding use cases.

There is a product for visual analytics, KNIME: https://www.knime.com/. For an example workflow, see https://github.com/rssdev10/ruby4knime (I use this example because it is also a case of embedding a scripting language).

Each node of the workflow is a data processor. There are two types of custom code nodes: script and snippet. For a script node the input and output are tables. For a snippet node the input and output are table rows (at least from the user’s point of view). You can see Ruby code examples at the link above.

Regarding embedding. When we call something in an embedded language, we have to transfer some initial data. The typical way for a Java JSR 223 container is to pass data through a map of var_name : value pairs. Internally it has to be converted into data structures of the embedded language. For now I used a simple approach:

        final Bindings bindings = context.getBindings(ScriptContext.ENGINE_SCOPE);
        if (bindings != null) {
            StringBuilder builder = new StringBuilder();
            bindings.entrySet().forEach(entry -> {
                String formatStr;
                // TODO: add direct write of values without text transformations
                if (entry.getValue() instanceof Number) {
                    formatStr = "%s = %s;\n";
                } else {
                    formatStr = "%s = \"%s\";\n";
                }
                builder.append(String.format(formatStr, entry.getKey(), entry.getValue()));
            });
            if (builder.length() > 0) {
                Julia4J.jl_eval_string(builder.toString());
            }
        }

That is, take Java values, transform them into a string expression, and send it to Julia for compilation… It is too slow…
I’m also creating global variables here, so with concurrent engines the names collide. That is why I asked about container isolation. I don’t see a simple solution for the general case.
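Apart from speed, the textual route has a correctness gap: a string value containing a double quote, a backslash or `$` (Julia’s interpolation character) would change the generated code. The following is a minimal sketch of escaping values before they are formatted, not part of julia4j; the class name `JuliaLiterals` and both method names are hypothetical, and numbers are written unchanged, as in the snippet above.

```java
import java.util.Map;

// Hypothetical helper, not part of julia4j.
final class JuliaLiterals {
    private JuliaLiterals() {}

    /** Renders a Java string as a double-quoted Julia string literal. */
    static String quote(String value) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break; // backslash
                case '"':  out.append("\\\""); break; // closing quote
                case '$':  out.append("\\$");  break; // blocks string interpolation
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:   out.append(c);
            }
        }
        return out.append('"').toString();
    }

    /** Builds one "name = value;" line per binding, in map iteration order. */
    static String assignments(Map<String, ?> bindings) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, ?> e : bindings.entrySet()) {
            Object v = e.getValue();
            // Numbers are passed through as text; everything else is quoted.
            String rendered = (v instanceof Number) ? v.toString() : quote(String.valueOf(v));
            sb.append(e.getKey()).append(" = ").append(rendered).append(";\n");
        }
        return sb.toString();
    }
}
```

The result of `assignments(bindings)` could be handed to `Julia4J.jl_eval_string` in place of the hand-built string. It does nothing for the performance problem, and binding keys are assumed to be valid Julia identifiers.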

Going back to KNIME. If we implement per-node execution, we can isolate custom code by wrapping it in a Julia function. In that case the variables mentioned above can be passed as arguments to a wrapper function whose name is unique for each node. But even if we can call that function directly, we still have the problem of passing arguments quickly, without textual processing. Imagine an input matrix with dozens of columns and millions of records. Text is not a viable way to transfer such data.

Looking at

jl_value_t *jl_call(jl_function_t *f, jl_value_t **args, int32_t nargs)

If we are wrapping a call from Java into a Julia function like (hypothetically for now):

function engine_12345(a::Int64, b::Float64, c::Float64, d::Float64)
  # put custom code here which is using the arguments 
  (a + b) / (c + d)
end

we can call jl_call(engine_12345, …) directly.
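On the Java side, the wrapper definition itself can be generated once per node from the binding names. This is only a sketch under assumptions: the class `WrapperSource`, its methods and the `engine_<id>` naming scheme are hypothetical, and the arguments are left untyped so that Julia specializes on whatever concrete types are passed.

```java
import java.util.List;

// Hypothetical generator for per-node wrapper functions, not part of julia4j.
final class WrapperSource {
    private WrapperSource() {}

    /** Assumed naming scheme: one wrapper function per node id. */
    static String functionName(long nodeId) {
        if (nodeId < 0) {
            throw new IllegalArgumentException("node id must be non-negative");
        }
        return "engine_" + nodeId;
    }

    /** Wraps the user's code in a Julia function taking the given arguments. */
    static String build(long nodeId, List<String> argNames, String body) {
        StringBuilder sb = new StringBuilder();
        sb.append("function ").append(functionName(nodeId))
          .append('(').append(String.join(", ", argNames)).append(")\n");
        // Indent each line of the user's code inside the function body.
        for (String line : body.split("\n")) {
            sb.append("  ").append(line).append('\n');
        }
        return sb.append("end\n").toString();
    }
}
```

For example, `WrapperSource.build(12345, Arrays.asList("a", "b", "c", "d"), "(a + b) / (c + d)")` produces the definition of `engine_12345`, which would be evaluated once. Later calls would go through a function lookup and jl_call rather than through text.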

The question is how to pack arrays and matrices properly… I see some undocumented functions but am not sure about their usage:

JL_DLLEXPORT jl_svec_t *jl_svec(size_t n, ...) JL_MAYBE_UNROOTED;
JL_DLLEXPORT jl_svec_t *jl_svec1(void *a);
JL_DLLEXPORT jl_svec_t *jl_svec2(void *a, void *b);
JL_DLLEXPORT jl_svec_t *jl_alloc_svec(size_t n);
JL_DLLEXPORT jl_svec_t *jl_alloc_svec_uninit(size_t n);
JL_DLLEXPORT jl_svec_t *jl_svec_copy(jl_svec_t *a);
JL_DLLEXPORT jl_svec_t *jl_svec_fill(size_t n, jl_value_t *x);

And a very big question is how efficient this will be on real data with Java->Julia->Java conversions… For Ruby4KNIME, mentioned above, the answer is simple: JRuby has direct access to KNIME’s Java objects, so in most cases no conversions are needed.
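One possible Java-side layout for the packing question, sketched under assumptions: Julia stores arrays in column-major order, and its embedding documentation describes `jl_ptr_to_array_1d` for wrapping existing memory without copying. A direct `ByteBuffer` in native byte order has a stable address that JNI code can obtain with `GetDirectBufferAddress`. Whether julia4j exposes either call is not known here, and the class `ColumnMajor` is hypothetical. The sketch shows only the Java half, and it still copies the data once into the buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

// Hypothetical helper, not part of julia4j.
final class ColumnMajor {
    private ColumnMajor() {}

    /** Copies a row-oriented Java matrix into a direct, column-major buffer. */
    static DoubleBuffer pack(double[][] rows) {
        int nRows = rows.length;
        int nCols = nRows == 0 ? 0 : rows[0].length;
        for (int r = 0; r < nRows; r++) {
            if (rows[r].length != nCols) {
                throw new IllegalArgumentException("ragged row " + r);
            }
        }
        // multiplyExact fails loudly instead of overflowing on huge tables.
        int bytes = Math.multiplyExact(Math.multiplyExact(nRows, nCols), Double.BYTES);
        DoubleBuffer buf = ByteBuffer.allocateDirect(bytes)
                .order(ByteOrder.nativeOrder())
                .asDoubleBuffer();
        for (int c = 0; c < nCols; c++) {
            for (int r = 0; r < nRows; r++) {
                // Element (r, c) lands at offset c * nRows + r.
                buf.put(c * nRows + r, rows[r][c]);
            }
        }
        return buf;
    }
}
```

If results were written by Julia into the same kind of buffer, the Java side could read them back without another textual round trip.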

There is another issue here: processing of stack traces. If we embed Julia into an IDE and provide a way to run custom code, we must report failures correctly, including function names and line numbers. JSR 223 says nothing about this. In most cases it is left to external Java code, which reads the stderr stream and processes the output. So it is doable but not so easy…
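A rough sketch of the stderr-processing idea, with a loud assumption about the input: it expects single-line frames of the form ` [n] name(args) at file:line`. The frame layout depends on the Julia version (some versions put the location on a separate line), so the pattern would need adjusting against real output. The class `JuliaFrames` and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser, not part of julia4j; the frame format is an assumption.
final class JuliaFrames {
    private JuliaFrames() {}

    static final class Frame {
        final String function;
        final String file;
        final int line;

        Frame(String function, String file, int line) {
            this.function = function;
            this.file = file;
            this.line = line;
        }
    }

    // Assumed shape: " [2] engine_12345(::Int64) at none:3"
    private static final Pattern FRAME =
            Pattern.compile("^\\s*\\[(\\d+)\\]\\s+(.+?)\\s+at\\s+(.+):(\\d{1,9})\\s*$");

    /** Extracts function name, file and line from every matching stderr line. */
    static List<Frame> parse(String stderr) {
        List<Frame> frames = new ArrayList<>();
        for (String line : stderr.split("\\R")) {
            Matcher m = FRAME.matcher(line);
            if (!m.matches()) {
                continue; // error message, header, or an unknown frame layout
            }
            String call = m.group(2);
            int paren = call.indexOf('(');
            String name = paren < 0 ? call : call.substring(0, paren);
            frames.add(new Frame(name, m.group(3), Integer.parseInt(m.group(4))));
        }
        return frames;
    }
}
```

When user code is wrapped in a generated function, the reported line numbers would also have to be shifted by the number of wrapper lines before being shown to the user.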

In general, speaking about KNIME, a node runs in two phases: configuration and execution. The configuration phase runs once after any configuration change. This phase can be used for precompiling any wrapper code. The execution phase does the real data processing. When a KNIME workflow contains a loop, execution happens multiple times, so it has to be fast.
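The split between the two phases could look roughly like this on the Java side. It is a sketch under assumptions: `NodeWrapper` is a hypothetical class, and the `Consumer<String>` stands in for whatever evaluates Julia source (for example `Julia4J.jl_eval_string`), so the wrapper is only redefined when its source actually changes.

```java
import java.util.function.Consumer;

// Hypothetical per-node holder, not part of julia4j.
final class NodeWrapper {
    private final String functionName;
    private final Consumer<String> evaluator; // stand-in for the Julia eval call
    private String definedSource;             // last source sent to Julia, or null

    NodeWrapper(String functionName, Consumer<String> evaluator) {
        this.functionName = functionName;
        this.evaluator = evaluator;
    }

    /**
     * Configuration phase: defines the wrapper function.
     * Returns false and does nothing if the same source is already defined.
     */
    boolean configure(String args, String body) {
        String source = "function " + functionName + "(" + args + ")\n" + body + "\nend";
        if (source.equals(definedSource)) {
            return false;
        }
        evaluator.accept(source);
        definedSource = source;
        return true;
    }

    /** Execution phase: name of the already defined function to call. */
    String functionForExecution() {
        if (definedSource == null) {
            throw new IllegalStateException("node is not configured");
        }
        return functionName;
    }
}
```

Each execution would then only look up `functionForExecution()` and call it, leaving all parsing to the configuration phase.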

At the same time, for the general case of Java embedding, wrapping custom code inside a hidden Julia function with a unique name is not a good approach. We can’t say how many times a Java programmer will call engine.eval(…). But they can expect that any global (in their understanding) variables and any types/functions defined in previous eval calls will still be visible. I don’t see how to achieve that without Julia engine isolation.

Regarding running Julia engines in separate Java processes: I’m really not ready to write a new Mesos, Hadoop YARN, Flink or something similar :) . I’m not sure it is a good solution.
As for running a cluster by Julia’s own means, I don’t have enough experience to speak about it. Maybe it is doable, but it should be hidden from the user of Julia4J.
