CoreNLP - Base Name Conflict

Andrei,

I’ve considered responding on this thread occasionally, but your explanations have been much more detailed and comprehensive than I could ever achieve. So thank you for the entire thread. I’ll try and incorporate some of this into JavaCall’s documentation.

For everyone reading, the one thing from Andrei’s response I would highlight is this:

This is exactly what makes using JavaCall mind-bending sometimes, so it helps to have a clear idea of types/classes when working with it. I’ve tried to make it as transparent as possible, but I’m obviously open to suggestions to improve this.

Regards

Avik


Seems like we will be able to improve not only the documentation but also the code itself: I’ve just tried to call Class.forName() myself and it failed, although @jimport works fine. I believe it’s a classloader issue, and it’s definitely not easy to figure out for occasional users. Once I sort it out, I will add something like a class_for_name("...") function to ease instantiation of Class<T> objects.


The help has been fantastic! Thank you!

You are welcome! Actually, I pursue selfish goals: years of open source work have shown me that very often the projects you help with pay back later. Either you start using them directly, or you get useful byproducts from them, or people involved in those projects start contributing to your own projects, and all of these are incredibly valuable!

Back to the story. I’ve just submitted a PR for a class_for_name function. You can either wait until it’s accepted or just include the following code in your project for now:

const JThread = JavaObject{Symbol("java.lang.Thread")}
const JClassLoader = JavaObject{Symbol("java.lang.ClassLoader")}

function class_for_name(name::String)
    thread = jcall(JThread, "currentThread", JThread, ())
    loader = jcall(thread, "getContextClassLoader", JClassLoader, ())
    return jcall(JClass, "forName", JClass, (JString, jboolean, JClassLoader),
                 name, true, loader)
end
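For readers coming from the Java side, the Julia helper above performs three steps: get the current thread, ask it for its context class loader, and call the three-argument Class.forName with that loader. The plain-Java sketch below is only an analogy to what that jcall chain does; the class and method names are illustrative and not part of JavaCall.

```java
// Plain-Java analogue of the Julia class_for_name helper (illustrative names).
class ForNameDemo {
    // Resolve a class by its binary name via the current thread's context class loader.
    static Class<?> classForName(String name) throws ClassNotFoundException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // initialize = true runs static initializers, like the `true` passed in the Julia code
        return Class.forName(name, true, loader);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> cls = classForName("java.util.ArrayList");
        System.out.println(cls.getName()); // prints java.util.ArrayList
    }
}
```

An unknown name surfaces as ClassNotFoundException, which is what you would see from the Julia side as a Java error if the class is not on the classpath.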

With it, we can extend our previous example like this:

pipeline = StanfordCoreNLP(Dict("annotations" =>
                                "tokenize, ssplit, pos, lemma, ner, parse, dcoref"))
doc = Annotation("The Beatles were an English rock band formed in Liverpool in 1960.")
annotate!(pipeline, doc)
# create the Class object for SentencesAnnotation
JSentencesAnnotationClass =
     class_for_name("edu.stanford.nlp.ling.CoreAnnotations\$SentencesAnnotation")
# finally, extract the list of sentences
jsentences = jcall(doc.jann, "get", JObject, (JClass,), JSentencesAnnotationClass)
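A note on the \$ in that class name: SentencesAnnotation is a nested class of CoreAnnotations, and Class.forName expects the binary name, in which $ separates the outer class from the nested one. The backslash is only there because $ starts string interpolation in Julia. The Java sketch below shows the same naming rule with a JDK nested class, AbstractMap.SimpleEntry, chosen as a stand-in because CoreNLP is not assumed to be on the classpath.

```java
import java.util.AbstractMap;

class NestedNameDemo {
    // Binary name: '$' separates the outer class from the nested class.
    static boolean binaryNameResolves() {
        try {
            return Class.forName("java.util.AbstractMap$SimpleEntry") == AbstractMap.SimpleEntry.class;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // The dotted source-code spelling is not a valid binary name for a nested class.
    static boolean dottedNameResolves() {
        try {
            Class.forName("java.util.AbstractMap.SimpleEntry");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(binaryNameResolves()); // true
        System.out.println(dottedNameResolves()); // false
    }
}
```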

The next step would be to iterate over the returned list or get its individual elements. In Java the loop is written as for (Foo foo : fooCollection), but under the hood it boils down to calls on the Iterator interface (iterator(), hasNext() and next(); see this question for some details). Unfortunately, I’m running out of time and can’t explain it in more detail right now, but feel free to ask any questions.
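To make the Iterator remark concrete, here is a small Java sketch (a generic illustration, not CoreNLP code) of an enhanced for loop next to the explicit iterator(), hasNext() and next() calls it roughly desugars into. The explicit form is the one that maps most directly onto a sequence of individual method calls, which is what you end up issuing through jcall.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorDemo {
    // Enhanced for loop over a collection.
    static int totalLengthForEach(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    // Roughly what the compiler generates for the loop above.
    static int totalLengthIterator(List<String> words) {
        int total = 0;
        Iterator<String> it = words.iterator();
        while (it.hasNext()) {
            String w = it.next();
            total += w.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("The", "Beatles", "were"));
        System.out.println(totalLengthForEach(words));  // 14
        System.out.println(totalLengthIterator(words)); // 14
    }
}
```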

Also note that sometimes JNI throws NoSuchMethodError even though the call is correct according to both the Java docs and JavaCall.listmethods. I encountered several such methods while working on Spark.jl and still don’t know the exact reason. If you see one, better post it here for validation first, and if it’s really not callable, I’ll explain how to work around it using custom Java code.


So, as a next step I tried to extract a single sentence from jsentences using get. First, I double-checked the type of jsentences:

julia> typeof(jsentences)
JavaCall.JavaObject{Symbol("java.lang.Object")}

It’s a JObject. Then, I checked that it has the get method:

julia> JavaCall.listmethods(jsentences, "get")
1-element Array{JavaCall.JavaObject{Symbol("java.lang.reflect.Method")},1}:
 java.lang.Object get(int)

get takes a single int argument (I guess Int in Julia) and returns a JObject. I assume the int argument indexes the sentences in jsentences. With that setup, I tried the following jcall:

julia> jcall(jsentences, "get", JObject, (Int,), 1)
Exception in thread "main" java.lang.NoSuchMethodError: get
ERROR: Error calling Java: java.lang.NoSuchMethodError: get
Stacktrace:
 [1] geterror(::Bool) at /home/john/.julia/v0.6/JavaCall/src/core.jl:265
 [2] jcall(::JavaCall.JavaObject{Symbol("java.lang.Object")}, ::String, ::Type{T} where T, ::Tuple{DataType}, ::Int64, ::Vararg{Int64,N} where N) at /home/john/.julia/v0.6/JavaCall/src/core.jl:138

There’s lots of information online about Exception in thread "main" java.lang.NoSuchMethodError, but none that I could understand or use in the present context. So…

  • Is the jcall correct?

  • How to get past the NoSuchMethodError?

Thanks again!

Ah, I expected you to meet one of these weird errors, but not so quickly… The easiest way to fix it is to write a bit of Java code that wraps the methods that don’t work, but I wonder if accessing elements in a simpler collection would work. Could you please try the following?

JArrayList = @jimport java.util.ArrayList
ja = JArrayList(())
jcall(ja, "get", JObject, (jint,), 0)
julia> JArrayList = @jimport java.util.ArrayList
JavaCall.JavaObject{Symbol("java.util.ArrayList")}

julia> ja = JArrayList(())
JavaCall.JavaObject{Symbol("java.util.ArrayList")}(Ptr{Void} @0x0000000004e46798)

julia> jcall(ja, "get", JObject, (jint,), 0)
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
	at java.util.ArrayList.rangeCheck(ArrayList.java:653)
	at java.util.ArrayList.get(ArrayList.java:429)
ERROR: Error calling Java: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
Stacktrace:
 [1] geterror(::Bool) at /home/john/.julia/v0.6/JavaCall/src/core.jl:265
 [2] _jcall(::JavaCall.JavaObject{Symbol("java.util.ArrayList")}, ::Ptr{Void}, ::Ptr{Void}, ::Type{T} where T, ::Tuple{DataType}, ::Int64, ::Vararg{Int64,N} where N) at /home/john/.julia/v0.6/JavaCall/src/core.jl:223
 [3] jcall(::JavaCall.JavaObject{Symbol("java.util.ArrayList")}, ::String, ::Type{T} where T, ::Tuple{DataType}, ::Int64, ::Vararg{Int64,N} where N) at /home/john/.julia/v0.6/JavaCall/src/core.jl:139

This one is good: it tells us that the method get was actually called and only failed because we tried to access the 0th element of an empty list.
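Note also that this call passes jint rather than Julia's Int. On a 64-bit machine Int is Int64, which corresponds to Java's long, while jint is Int32 and matches Java's int. Since JNI looks methods up by their exact parameter types, a call with (Int,) most likely asks for a get(long) method that ArrayList does not have. The Java reflection sketch below illustrates the same exact-match rule; the helper name is illustrative.

```java
import java.util.ArrayList;

class SignatureDemo {
    // True if cls has a public method with exactly these parameter types.
    static boolean hasMethod(Class<?> cls, String name, Class<?>... params) {
        try {
            cls.getMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasMethod(ArrayList.class, "get", int.class));  // true
        System.out.println(hasMethod(ArrayList.class, "get", long.class)); // false
    }
}
```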

And now some JVM magic:

...
jsentences = jcall(doc.jann, "get", JObject, (JClass,), JSentencesAnnotationClass)

# what is the actual class of returned object? 
# note: this syntax is equivalent to JavaCall.getname(JavaCall.getclass(jsentences)) 
jsentences |> JavaCall.getclass |> JavaCall.getname  
# ==> "java.util.ArrayList" 

# call .get()
jcall(jsentences, "get", JObject, (jint,), 0)
# ==> java.lang.NoSuchMethodError: get

# convert ArrayList to... well... ArrayList, and try again
JArrayList = @jimport java.util.ArrayList
jsentences2 = convert(JArrayList, jsentences)
sent0 = jcall(jsentences2, "get", JObject, (jint,), 0)
# ==> works fine!

The reason is that document.get() has a signature that returns java.lang.Object, so jcall returns a JavaObject{:java.lang.Object}. However, the method actually returns an object of type java.util.ArrayList (which inherits from java.lang.Object). I believe (I still need to check the implementation) that the subsequent jcall looks for Object.get(int), because Object is the type recorded in JavaObject{...}, whereas it should call ArrayList.get(int).
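The suspected mechanism can be mimicked with Java reflection, under the assumption (not yet confirmed above) that JavaCall resolves the method against the class named in the JavaObject type parameter. Looking up get(int) on Object fails, while looking it up on the object's runtime class succeeds. This would also be consistent with listmethods finding get earlier, if listmethods inspects the runtime class. The helper name below is illustrative.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;

class LookupDemo {
    // Resolve get(int) against the given class; null if that class has no such public method.
    static Method findGet(Class<?> cls) {
        try {
            return cls.getMethod("get", int.class);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // Declared type Object, runtime class ArrayList (like the result of document.get())
        Object sentences = new ArrayList<>(Arrays.asList("first", "second"));

        System.out.println(findGet(Object.class));     // null: Object has no get(int)
        Method get = findGet(sentences.getClass());    // resolved on ArrayList
        System.out.println(get.invoke(sentences, 0));  // first
    }
}
```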

I’ll check the implementation and maybe submit another PR to JavaCall, but even with the current master you can run:

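using JavaCall
using JavaCoreNLP   # Julia interface to Stanford CoreNLP: StanfordCoreNLP, Annotation, annotate!

JProperties = @jimport java.util.Properties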
JStanfordCoreNLP = @jimport edu.stanford.nlp.pipeline.StanfordCoreNLP
JAnnotation = @jimport edu.stanford.nlp.pipeline.Annotation
JArrayList = @jimport java.util.ArrayList
JTree = @jimport edu.stanford.nlp.trees.Tree
JSemanticGraph = @jimport edu.stanford.nlp.semgraph.SemanticGraph


JSentencesAnnotationClass = classforname("edu.stanford.nlp.ling.CoreAnnotations\$SentencesAnnotation")
JTreeClass = classforname("edu.stanford.nlp.trees.Tree")
JCollapsedCCProcessedDependenciesAnnotationClass =
    classforname("edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations\$CollapsedCCProcessedDependenciesAnnotation")

# example from https://stanfordnlp.github.io/CoreNLP/api.html
pipeline = StanfordCoreNLP(Dict("annotations" =>
                                "tokenize, ssplit, pos, lemma, ner, parse, dcoref"))
doc = Annotation("The Beatles were an English rock band formed in Liverpool in 1960.")
annotate!(pipeline, doc)


jsentences = jcall(doc.jann, "get", JObject, (JClass,), JSentencesAnnotationClass)
jsentences = convert(JArrayList, jsentences)

sent0 = jcall(jsentences, "get", JObject, (jint,), 0)
sent0 = convert(JAnnotation, sent0)

tree = jcall(sent0, "get", JObject, (JClass,), JTreeClass)
# tree = convert(JTree, tree)   # tree is null, IIUC, corresponding annotator didn't run on that sentence

semgraph = jcall(sent0, "get", JObject, (JClass,), JCollapsedCCProcessedDependenciesAnnotationClass)
semgraph = convert(JSemanticGraph, semgraph)

Actually, it makes sense to keep the current behavior since it plays well with what Java does:

Map<String, Object> map = new HashMap<>();
map.put("foo", new Foo());
map.put("bar", new Bar());

Object fooAsObject = map.get("foo");
Foo foo = (Foo) fooAsObject;
// which is usually shortened in Java as: Foo foo = (Foo) map.get("foo");

fooAsObject.fooMethod();   // compile error: Object has no fooMethod()
foo.fooMethod();           // OK

and corresponding code in Julia:

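JHashMap = @jimport java.util.HashMap
# JFoo and JBar stand for the (hypothetical) classes Foo and Bar, imported with @jimport.
# HashMap.put/get are declared with Object parameters (generics are erased), so the
# jcall signatures use JObject; Julia strings are wrapped in JString first.
jmap = JHashMap(())
jcall(jmap, "put", JObject, (JObject, JObject), JString("foo"), JFoo(()))
jcall(jmap, "put", JObject, (JObject, JObject), JString("bar"), JBar(()))

foo_as_obj = jcall(jmap, "get", JObject, (JObject,), JString("foo"))
foo = convert(JFoo, foo_as_obj)

jcall(foo_as_obj, "fooMethod", Void, ())    # error! `foo_as_obj` is a JavaObject{Symbol("java.lang.Object")}, not JavaObject{Symbol("Foo")}
jcall(foo, "fooMethod", Void, ())           # OK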

The magic worked, and that workflow runs fine for me up to and including the sent0 = ... line. Funny that you have to cast jsentences to ArrayList even though it already is one, nominally.

Next problem: classforname is not working for me. I ran Pkg.update() and restarted Julia. Then–

using JavaCall

[usual warnings about deprecated syntax]
Loaded /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so

julia> classforname
ERROR: UndefVarError: classforname not defined

Other JavaCall types and methods run ok–

julia> jint
Int32

julia> getname
getname (generic function with 2 methods)

I believe you committed some new code for classforname, @dfdx? Do I need to do something beyond Pkg.update()?

Funny that you have to cast jsentences to ArrayList even though it already is one, nominally.

The Java object is indeed an ArrayList, but in Julia it’s wrapped into JavaObject{:java.lang.Object}, not JavaObject{:java.util.ArrayList}. Julia in this case behaves exactly like the Java compiler: both only know that document.get() returns a java.lang.Object (or something inherited from it), but not the exact type. A dynamic cast ((MyClass) obj in Java, convert(MyClass, obj) in Julia) makes the real type of the object explicit.

Note that casting may fail if your assumption about the object's type is wrong. E.g. if you ask for one kind of annotation and then try to convert it to another kind, both Java and Julia will fail on the dynamic cast.
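On the Java side, a failed dynamic cast looks like this: the cast compiles, but the JVM checks it at run time and throws ClassCastException. A minimal sketch in generic Java (not CoreNLP code); the method name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class CastDemo {
    // Try to treat an arbitrary (non-null) Object as a List; the cast is checked at run time.
    static String describe(Object value) {
        try {
            List<?> list = (List<?>) value;
            return "list of size " + list.size();
        } catch (ClassCastException e) {
            return "not a list: " + value.getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new ArrayList<String>()));   // list of size 0
        System.out.println(describe("The Beatles were a band.")); // not a list: java.lang.String
    }
}
```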

Next problem: classforname is not working for me. I ran Pkg.update() and restarted Julia. Then–

Try (from Julia REPL)

Pkg.checkout("JavaCall")

or (from command line)

cd ~/.julia/v0.6/JavaCall
git checkout master
git pull

Pkg.update() only updates packages to their latest registered version. Very recent changes may still be unpublished and live only in the master branch. Both methods above check out the latest master of the specified package.


Yes, that did it. Thanks!

Summarizing what I’ve learned here for critique, feedback and for the benefit of a future reader. I’ve added an example of extracting data from tokens using the JCoreLabel type and jcall.

using JavaCoreNLP  #Julia interface to Stanford CoreNLP
using JavaCall     #Julia interface to Java

#Import types from Java
#The @jimport macro returns a type, not an object. 
JProperties = @jimport java.util.Properties  #Seems not to be necessary
JArrayList = @jimport java.util.ArrayList

#Import types from Stanford NLP.
JStanfordCoreNLP = @jimport edu.stanford.nlp.pipeline.StanfordCoreNLP
JAnnotation = @jimport edu.stanford.nlp.pipeline.Annotation
JTree = @jimport edu.stanford.nlp.trees.Tree
JSemanticGraph = @jimport edu.stanford.nlp.semgraph.SemanticGraph
JCoreLabel = @jimport edu.stanford.nlp.ling.CoreLabel

#Use JavaCall's 'classforname' to get the Class objects we will need.
JSentencesAnnotationClass = classforname("edu.stanford.nlp.ling.CoreAnnotations\$SentencesAnnotation")
JTokensAnnotationClass = classforname("edu.stanford.nlp.ling.CoreAnnotations\$TokensAnnotation")
JTreeClass = classforname("edu.stanford.nlp.trees.Tree")
JCollapsedCCProcessedDependenciesAnnotationClass = classforname("edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations\$CollapsedCCProcessedDependenciesAnnotation")

#Construct a StanfordCoreNLP object using the desired annotators.  
#List of annotators is here: https://stanfordnlp.github.io/CoreNLP/annotators.html
#This throws some warnings and can take about a minute the first time it runs.
pipeline = StanfordCoreNLP(Dict("annotations" =>
              "tokenize, ssplit, pos, lemma, ner, parse, dcoref"))

#Construct an Annotation object with the text to be analyzed.
doc = Annotation("I'm a lumberjack and I'm ok.  
                      I work all night and I sleep all day.")

#Perform the annotation and place the result in doc.jann.
annotate!(pipeline, doc)

#Use 'get' method to get the text broken into annotated sentences.
jsentences = jcall(doc.jann, "get", JObject, (JClass,), JSentencesAnnotationClass)
#Cast this from JavaObject{:java.lang.Object} to JavaObject{:java.util.ArrayList}
jsentences = convert(JArrayList, jsentences)
#All JObjects thus 'gotten' using jcall(..., "get", ...) will need to be similarly converted.

#Get a first sentence.
#Indexing starts at 0 for Java, not at 1 as in Julia.
sent0 = jcall(jsentences, "get", JObject, (jint,), 0)
sent0 = convert(JAnnotation, sent0)

#Can now explore the data structures corresponding to the annotators 
#that were run. Start with tokens.
jtokens = jcall(sent0, "get", JObject, (JClass,), JTokensAnnotationClass)
jtokens = convert(JArrayList, jtokens)
jtokens_size = jcall(jtokens, "size", jint, ())

#Can loop thru each token in jtokens and use its methods to extract info. 
#Use listmethods(token) to find possible methods or look here:
#https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreLabel.html
#Here, e.g., is a table of lemmas and parts of speech. 
for i = 0:jtokens_size-1
   token = jcall(jtokens, "get", JObject, (jint,), i)
   token = convert(JCoreLabel, token)
   lemma = jcall(token, "lemma", JString, ())
   tag = jcall(token, "tag", JString, ())
   println(i, "   ", lemma, "   ", tag)
end

#The tree data structure is similar but uses its own JClass in the jcall.
tree = jcall(sent0, "get", JObject, (JClass,), JTreeClass)
# tree = convert(JTree, tree)   
# Per @dfdx, tree is null, IIUC, corresponding annotator didn't run on that sentence.

#Similarly, the semantic graph.
semgraph = jcall(sent0, "get", JObject, (JClass,), 
                          JCollapsedCCProcessedDependenciesAnnotationClass)
semgraph = convert(JSemanticGraph, semgraph)

Looks good! A couple of notes.

  1. Casting a returned object to an appropriate subclass turns out to be such a common pattern that I added a special function narrow to do it automatically. Now instead of:
sent0 = jcall(jsentences, "get", JObject, (jint,), 0)
sent0 = convert(JAnnotation, sent0)

you can write:

sent0 = narrow(jcall(jsentences, "get", JObject, (jint,), 0))  # sent0 is automatically converted to JAnnotation

or, if you prefer chaining syntax:

sent0 = jcall(jsentences, "get", JObject, (jint,), 0) |> narrow
  2. Iterating over a collection is very common in both Java and Julia, so JavaCall wraps the Java Iterator interface to make looping more natural. Instead of:
jtokens = convert(JArrayList, jtokens)
jtokens_size = jcall(jtokens, "size", jint, ())

for i = 0:jtokens_size-1
   token = jcall(jtokens, "get", JObject, (jint,), i)
   ...
end

you can now write:

jtokens = native(convert(JArrayList, jtokens))
for token in JavaCall.iterator(jtokens)
   token = narrow(token)   # convert the JObject to its runtime class (JCoreLabel)
   ...
end
  3. In the code
tree = jcall(sent0, "get", JObject, (JClass,), JTreeClass)

you can check whether tree (or, more generally, any JavaObject returned from any function) is null by looking at its pointer address. null means the pointer is all zeros, so the object looks something like this:

JavaCall.JavaObject{Symbol("java.lang.Object")}(Ptr{Void} @0x0000000000000000)

If you are going to improve this further, it makes sense to fork my repository (or create a new one) and put your code there to make it easier to reuse.
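On the Java side, that all-zero pointer is simply a null reference. Annotation.get in CoreNLP behaves like a map lookup keyed by annotation classes, so a null result most likely means nothing was stored under that key for that sentence. The sketch below mimics this with a plain HashMap; the key classes are hypothetical stand-ins for CoreNLP's annotation classes.

```java
import java.util.HashMap;
import java.util.Map;

class NullLookupDemo {
    // Hypothetical keys standing in for CoreNLP annotation classes.
    static final class TokensKey {}
    static final class TreeKey {}

    // A "sentence" in which only the tokens were stored.
    static Map<Class<?>, Object> sampleSentence() {
        Map<Class<?>, Object> sentence = new HashMap<>();
        sentence.put(TokensKey.class, "I'm a lumberjack and I'm ok.");
        return sentence;
    }

    public static void main(String[] args) {
        Map<Class<?>, Object> sentence = sampleSentence();
        Object tree = sentence.get(TreeKey.class);
        System.out.println(tree == null);                        // true: nothing stored under TreeKey
        System.out.println(sentence.containsKey(TreeKey.class)); // false
    }
}
```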

Still haven’t been able to figure out how to get the parse tree or dependencies. As we saw above,

julia> jtrees = jcall(sent0, "get", JObject, (JClass,), JTreeClass)
JavaCall.JavaObject{Symbol("java.lang.Object")}(Ptr{Void} @0x0000000000000000)

gives a null result (pointer is zero). I also tried

julia> JTreeAnnotationClass = classforname("edu.stanford.nlp.trees.TreeCoreAnnotations\$TreeAnnotation")
JavaCall.JavaObject{Symbol("java.lang.Class")}(Ptr{Void} @0x0000000004ae0038)
julia> jtrees = jcall(sent0, "get", JObject, (JClass,), JTreeAnnotationClass)
JavaCall.JavaObject{Symbol("java.lang.Object")}(Ptr{Void} @0x0000000000000000)

The pipeline includes parse:

pipeline = StanfordCoreNLP(Dict("annotations" =>
              "tokenize, ssplit, pos, lemma, ner, parse, dcoref"))

Can anyone advise me how to obtain the parse and dependency trees? Or point me to some example code? Thanks in advance.

According to this demo, the dependency parser requires the MaxentTagger to be run first. Can you try to translate that snippet to Julia? At first glance, there’s only one language feature that hasn’t been covered in previous examples: access to the constant field DependencyParser.DEFAULT_MODEL, but you can simply copy its value from the source.
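If copying the value of DependencyParser.DEFAULT_MODEL by hand is inconvenient, a public static field can also be read at run time. The Java reflection sketch below shows the idea with Integer.MAX_VALUE as a stand-in, since CoreNLP is not assumed to be on the classpath; how JavaCall itself exposes static fields is not covered here, and the helper name is illustrative.

```java
import java.lang.reflect.Field;

class StaticFieldDemo {
    // Read a public static field by name; the receiver is null because the field is static.
    static Object readStaticField(Class<?> cls, String name) throws ReflectiveOperationException {
        Field field = cls.getField(name);
        return field.get(null);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(readStaticField(Integer.class, "MAX_VALUE")); // 2147483647
    }
}
```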


I will give it a try, thanks.

Just noticed that the demo is for DependencyParser, while earlier we were talking about SemanticGraph, so my previous comment might not be relevant. Not sure why the example with SemanticGraph doesn’t work, but the first thing I would do is take the complete code in Java and try to run it. If it works in Java but not in Julia, we can dig deeper to find out why. And if it doesn’t work in Java either, it’s worth asking the CoreNLP developers (they seem to be fairly active on StackOverflow).

The workflow that was working ok above is now choking on the JavaCall.classforname() call. Specifically, ERROR: UndefVarError: penv not defined. I updated via Pkg.update("JavaCall").

penv is all over the place inside of JavaCall but I don’t see where it is defined. Maybe I’m doing something dumb or maybe something has changed? Any suggestions welcome. Thanks.

julia> using JavaCall

julia> using JavaCoreNLP

julia> JProperties = @jimport java.util.Properties 
JavaCall.JavaObject{Symbol("java.util.Properties")}

julia> JArrayList = @jimport java.util.ArrayList
JavaCall.JavaObject{Symbol("java.util.ArrayList")}

julia> JStanfordCoreNLP = @jimport edu.stanford.nlp.pipeline.StanfordCoreNLP
JavaCall.JavaObject{Symbol("edu.stanford.nlp.pipeline.StanfordCoreNLP")}

julia> JAnnotation = @jimport edu.stanford.nlp.pipeline.Annotation
JavaCall.JavaObject{Symbol("edu.stanford.nlp.pipeline.Annotation")}

julia> JTree = @jimport edu.stanford.nlp.trees.Tree
JavaCall.JavaObject{Symbol("edu.stanford.nlp.trees.Tree")}

julia> JSemanticGraph = @jimport edu.stanford.nlp.semgraph.SemanticGraph
JavaCall.JavaObject{Symbol("edu.stanford.nlp.semgraph.SemanticGraph")}

julia> JCoreLabel = @jimport edu.stanford.nlp.ling.CoreLabel
JavaCall.JavaObject{Symbol("edu.stanford.nlp.ling.CoreLabel")}

julia> JSentencesAnnotationClass = classforname("edu.stanford.nlp.ling.CoreAnnotations\$SentencesAnnotation")
ERROR: UndefVarError: penv not defined
Stacktrace:
 [1] jcall(::Type{JavaCall.JavaObject{Symbol("java.lang.Thread")}}, ::String, ::Type{T} where T, ::Tuple{}) at /home/john/.julia/v0.6/JavaCall/src/core.jl:134
 [2] classforname(::String) at /home/john/.julia/v0.6/JavaCall/src/reflect.jl:164

You have to call JavaCall.init() (optionally setting your classpath) before you can use any other JavaCall functions.