CoreNLP - Base Name Conflict


#1

Hi,
Trying to use the CoreNLP.jl package at the Julia 0.6.0 command line on Ubuntu 16.04. I’m following the instructions shown here. The session looks like this–

julia> using CoreNLP
WARNING: deprecated syntax "_ as an rvalue".
julia> corenlp_init("/home/john/CoreNLP/stanford-corenlp-full-2014-01-04")
PyObject <corenlp.corenlp.StanfordCoreNLP instance at 0x7f00038dd710>
julia> my_parse = parse("John ate a baked, green apple.")
WARNING: both CoreNLP and Base export "parse"; uses of it in module Main must be qualified
ERROR: UndefVarError: parse not defined

I tried to qualify parse in two ways–

julia> my_parse = CoreNLP::parse("John ate a baked, green apple.")
ERROR: UndefVarError: parse not defined

julia> my_parse = CoreNLP.parse("John ate a baked, green apple.")
ERROR: UndefVarError: int not defined
Stacktrace:
[1] extract_index at /home/john/.julia/v0.6/CoreNLP/src/CoreNLP.jl:131 [inlined]
[2] parse_raw(::Dict{String,Any}, ::Symbol) at /home/john/.julia/v0.6/CoreNLP/src/CoreNLP.jl:166
[3] parse(::String) at /home/john/.julia/v0.6/CoreNLP/src/CoreNLP.jl:219

Apparently both the Base and CoreNLP modules create ‘parse’ and they conflict. The discussion here says to use the :: notation but, as shown above, this did not work. The discussion here seems sort of relevant but not actually useful to me.

I’d appreciate any help in getting CoreNLP working in Julia, viz., in solving this problem of the name conflict of parse() in CoreNLP vs. Julia/Base. Thanks!


#2

CoreNLP.jl has not been updated for three years, so I assume it does not support Julia versions more recent than 0.3. If you really want to use it, you’ll have to port it to Julia 0.6, which will likely require quite some work (though it should be doable).


#3

And in the post that follows in that discussion, the author admitted that the :: notation is C++ syntax, not Julia’s or Python’s. The syntax:

CoreNLP.parse("bla bla bla")

is correct. Your call fails because CoreNLP.jl uses the name int, which was deprecated a while ago and no longer exists in Julia 0.6.
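For reference, here is a minimal sketch of how module qualification resolves this kind of clash, written for current Julia rather than 0.6. ToyNLP is a hypothetical stand-in for a package that exports parse; it is not CoreNLP.jl. The last call shows parse(Int, ...), one common replacement for the old int("...") spelling.

```julia
# ToyNLP is a hypothetical stand-in for a package that exports `parse`.
module ToyNLP
export parse
# This `parse` is a new function owned by ToyNLP, unrelated to Base.parse.
parse(text::AbstractString) = split(text)
end

using .ToyNLP

# Both ToyNLP and Base export `parse`, so an unqualified call in Main would
# be ambiguous. Qualifying with the module name picks one explicitly.
tokens = ToyNLP.parse("John ate a baked, green apple.")
number = Base.parse(Int, "42")   # modern spelling of the old int("42")

println(tokens)
println(number)
```

The dot is the only qualification syntax Julia has; :: is reserved for type annotations and assertions.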


#4

I tried to quickly fix it, but there are so many issues (outdated syntax, no tests, required Python packages not installed automatically) that I’d just rewrite it altogether, removing Python as a dependency (CoreNLP itself is a Java library). It’s quite easy actually and shouldn’t take more than a couple of days. If you are brave enough to start your Julia life with creating a new package, I will help you to get started.


#5

Thanks for the replies. I’m brave enough to try but will need all the help you can give. Available Sunday or Monday and I will post again then. Thanks!


#6

At the time I wrote it, there was no reliable way to call directly into the Java library. Would love to know if that’s changed in the intervening years.


#7

#8

I made a starter kit for you:

Using it you can translate the first code snippet from the CoreNLP docs as follows:

Pkg.clone("https://github.com/dfdx/JavaCoreNLP.jl")
Pkg.build("JavaCoreNLP")  # to download and assemble Java dependencies, requires Maven

using JavaCoreNLP

pipeline = StanfordCoreNLP(Dict("annotators" =>
                    "tokenize, ssplit, pos, lemma, ner, parse, dcoref"))
doc = Annotation("The Beatles were an English rock band formed in Liverpool in 1960.")
annotate!(pipeline, doc)

What’s included:

  • maven project that downloads CoreNLP and English model JARs and assembles them into a single “uberjar”; see pom.xml if you want to add more models or customize build in any way
  • initialization script that loads JVM with assembled dependencies whenever you call using JavaCoreNLP
  • a few wrappers for Java classes and examples of calling their methods
  • a very short intro into JavaCall so you could start writing code quickly

Feel free to clone / rename / refactor or ask any questions on the topic.


#9

THANKS. I followed the above and could make it work with some warnings from both the REPL and from Atom (Julia 0.6). The warnings were:

WARNING: deprecated syntax "inner constructor JavaObject(...) around /home/john/.julia/v0.6/JavaCall/src/core.jl:30".
Use "JavaObject{T}(...) where T" instead.

WARNING: deprecated syntax "inner constructor JavaObject(...) around /home/john/.julia/v0.6/JavaCall/src/core.jl:35".
Use "JavaObject{T}(...) where T" instead.

Loaded /usr/lib/jvm/default-java/jre/lib/amd64/server/libjvm.so

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

Ignoring those for now, my bigger problem is, how to access the result in ‘doc’?

For comparison, from bash, I can use curl to post a query to Stanford’s online parser and send the response to an output file, e.g.,

curl --data query='John ate a baked green apple.' http://nlp.stanford.edu:8080/parser/index.jsp > output.xml

The result is some xml which can be parsed to extract the desired info. Presumably, this same info exists within ‘doc’ but how do I get it?

(Usual disclaimers that I’m really dumb and ought to know this myself.)

Thanks.


#10

The deprecation warnings come from JavaCall, which still has some deprecated syntax. You can indeed ignore them.

The SLF4J lines say that a logger isn’t properly configured on the Java side. Adding an appropriate logger implementation to pom.xml should fix it, but people normally just ignore this for years :smiley:

That demo URL shows the usage of a dependency parser, one specific feature included in CoreNLP. I believe in the Java API the result is represented as a Tree class (or maybe SemanticGraph, I’d check both). Here is what seems to be the relevant section of the Getting Started guide:

...
List<CoreMap> sentences = document.get(SentencesAnnotation.class);

for(CoreMap sentence: sentences) {
  ...

  // this is the parse tree of the current sentence
  Tree tree = sentence.get(TreeAnnotation.class);

  // this is the Stanford dependency graph of the current sentence
  SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
}

So basically you need to call Java methods (using jcall) to iterate over the list of the sentences and extract parsed objects. Hope this is enough to get you started and feel free to follow up with any questions.


#11

Hi again. In–

function Annotation(text::AbstractString)
    jann = JAnnotation((JString,), text)
    return Annotation(jann)
end

and also in–

function StanfordCoreNLP(props::Dict{String, String})
    jprops = to_jprops(props)
    jpipeline = JStanfordCoreNLP((JProperties,), jprops)
    return StanfordCoreNLP(jpipeline)
end

it looks like the returns are calling the functions themselves. Either this is a simple return and I’ve misunderstood Julia’s syntax, or it’s a recursion and I’ve misunderstood the function. Could you let me know, pls?


#12

Julia’s functions dispatch on the types of their arguments. To take the Annotation example, the code you have posted is a method that is called when its argument is some kind of string. In that function, the jann variable is created, which holds an object of type JAnnotation. Then, when you call Annotation(jann) in the last line of the function, it dispatches to the definition of the Annotation function that takes a single argument of type JAnnotation, so there is no recursion.

That second definition is generated automatically by the compiler. If you look at the definition of the type Annotation in https://github.com/dfdx/JavaCoreNLP.jl/blob/master/src/pipeline.jl#L4 , you will notice that it is a composite struct containing one member (or field) of type JAnnotation. The compiler then automatically creates a function with the name Annotation that takes a JAnnotation object as a parameter. This function creates the Annotation object in its body. It is usually called the default constructor, but you can think of it simply as a function that creates the object.

Hope that helps. See here for more: https://docs.julialang.org/en/stable/manual/types/#Composite-Types-1
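The same pattern can be tried without any Java involved. This is only a sketch: FakeHandle and Note are hypothetical stand-ins for JAnnotation and Annotation, and the struct keyword assumes Julia 0.6 or later.

```julia
# Hypothetical stand-in for the Java handle type (JAnnotation in the thread).
struct FakeHandle
    text::String
end

# Hypothetical stand-in for the Julia wrapper (Annotation in the thread).
# Declaring the struct makes Julia generate a default constructor
# Note(handle::FakeHandle) automatically.
struct Note
    handle::FakeHandle
end

# An extra constructor method for strings. The call on the last line is NOT
# recursive: its argument is a FakeHandle, so dispatch selects the default
# constructor rather than this method.
function Note(text::AbstractString)
    handle = FakeHandle(text)
    return Note(handle)
end

note = Note("The Beatles were an English rock band.")
println(note.handle.text)
```

Calling methods(Note) in the REPL lists both the generated constructor and the string method side by side.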


#13

Hi. Struggling with this, maybe b/c no Java experience. I understand how to construct the pipeline and call the annotator. In order to interrogate the result, I understand I need to–

  1. Import CoreMap into Julia. I tried–
    JCoreMap = @jimport edu.stanford.nlp.util.CoreMap
    and it worked error-free in Julia.

  2. Act on doc::Annotation using CoreMap’s “get” method to obtain a List called, e.g., sentences. I tried to build a jcall like this–
    jcall(JCoreMap, "get", RETURN TYPE?, (Annotation,), doc::Annotation)
    but didn’t succeed. I tried various return types but always got MethodErrors such as–

jcall(JCoreMap, "get", JCoreMap, (Annotation,), doc)
ERROR: MethodError: no method matching write(::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Void)
Closest candidates are:
  write(::IO, ::Any) at io.jl:284
  write(::IO, ::Any...) at io.jl:286
  write(::IO, ::Complex) at complex.jl:175
  ...
Stacktrace:
 [1] write(::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Void) at ./io.jl:284
 [2] method_signature(::Type{T} where T, ::Type{T} where T, ::Vararg{Type{T} where T,N} where N) at /home/john/.julia/v0.6/JavaCall/src/core.jl:281
 [3] jcall(::Type{JavaCall.JavaObject{Symbol("edu.stanford.nlp.util.CoreMap")}}, ::String, ::Type{T} where T, ::Tuple{DataType}, ::JavaCoreNLP.Annotation, ::Vararg{JavaCoreNLP.Annotation,N} where N) at /home/john/.julia/v0.6/JavaCall/src/core.jl:128

Some concrete questions–

  1. How to construct the jcall for “get”.
  2. What type does the jcall for “get” return? It should be whatever is stored as the value in the object.
  3. How to distinguish between document.get, sentence.get, etc in the jcall?
  4. How to construct the jcall so that it accepts a List<CoreMap> in return?

Hope that’s clear enough. Thanks again for the help.


#14

Correct.

Act on doc::Annotation using CoreMap’s “get” method to obtain a List called, e.g., sentences.

Not quite. Let’s break it down.

// Java
document.get(...)

Here we call the method get of the object document, which has type Annotation in Java. In Julia we have a handle to such an object in doc.jann, and its type, again in our Julia application, is JAnnotation (while Julia’s Annotation is a wrapper around this handle). In other words:

struct Annotation       #  <-- this is a Julia wrapper
    jann::JAnnotation   #  <-- this is the handle to a Java object of type edu.stanford.nlp.pipeline.Annotation
                        #      which we aliased in Julia as JAnnotation
end

So the basic call should be:

jcall(
    doc.jann,  # <-- handle to Java object
    "get",     # <-- method name
    ...,       # return type, ignore for now
    (...),     # argument types
    ...        # actual arguments
)

Note that get is a method inherited from ArrayCoreMap and is defined here. It has the signature:

<VALUE> VALUE get(Class<? extends Key<VALUE>> key)

Whoa! Looks unreadable for non-Java people, right? Let’s further break it down:

  • <VALUE> is a type parameter, similar to <typename VALUE> in C++ templates or ... where VALUE in Julia
  • VALUE means that we return this exact type
  • Class<? extends Key<VALUE>> means that get takes a single argument: the Class object of some class that implements Key<VALUE>

We pass SentencesAnnotation.class to this get, which is a value of type Class<SentencesAnnotation>. SentencesAnnotation implements CoreAnnotation<List<CoreMap>>, which is a Key<List<CoreMap>>, so from it we can infer that VALUE in this case is List<CoreMap>! The method signature thus is effectively:

List<CoreMap> get(Class<SentencesAnnotation> key)

This matches the return type we see in the Java code:

List<CoreMap> sentences = document.get(SentencesAnnotation.class)

Ok, we understand what it all means in Java, but what should it look like in Julia? That’s a trickier question that boils down to:

  • what signature this method has in the JVM itself (because type parameters don’t actually exist after compilation)
  • how to specify the type of Class<SentencesAnnotation>, since JavaCall doesn’t support generics either

To answer these questions I need to inspect the JAnnotation object, which I can’t do right now. Would you mind running the following for me?

JavaCall.listmethods(doc.jann, "get")

And while I wait for the result of the previous code I will address your questions (3) and (4) (I hope previous explanation shed some light on questions (1) and (2)).

How to distinguish between document.get, sentence.get, etc in the jcall?

As simple as

jcall(jann, ...)
jcall(jsentence, ...)
# etc.

The first argument is always the object you call the method on. Literally, obj.method(...) is translated into jcall(jobj, "method", ...).

How to construct the jcall so that it accepts a List< CoreMap > in return?

It depends on the actual method signature in the JVM. The tricky part is that what you see in Java code doesn’t always correspond one-to-one to what is really created in the JVM. As I mentioned earlier, the JVM knows nothing about type parameters, so there is definitely no information about CoreMap in there. Moreover, there may be nothing about List in the JVM’s signature for that method, but instead any interface that extends List or any class that implements it.

It’s not the easiest part of working with JVM. Fortunately, it’s actually rarely needed to dive so deep - as you will see soon, most of the time you just construct objects and call jcall on clear method signature from Java docs.
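As an illustration of the "clear method signature" case, here is a sketch that follows the pattern in the JavaCall README and uses only classes that ship with the JDK. It assumes JavaCall is installed and can find a JVM; depending on the Julia and JavaCall versions, extra JVM setup described in the JavaCall README may be needed.

```julia
using JavaCall

# Start the JVM. JDK classes need no extra classpath entries.
JavaCall.init(["-Xmx128M"])

# Static method: Math.sin(Math.PI / 2) in Java.
# The first argument is the imported class itself.
JMath = @jimport java.lang.Math
s = jcall(JMath, "sin", jdouble, (jdouble,), pi / 2)

# Constructor plus instance method:
#   URL url = new URL("http://www.example.com"); url.getHost();
# The first argument of jcall is the object the method is called on.
JURL = @jimport java.net.URL
jurl = JURL((JString,), "http://www.example.com")
host = jcall(jurl, "getHost", JString, ())

println(s)
println(host)
```

In both calls the third argument is the return type and the fourth is a tuple of argument types, exactly as in the skeleton for doc.jann.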


#15

julia> JavaCall.listmethods(doc.jann, "get")
1-element Array{JavaCall.JavaObject{Symbol("java.lang.reflect.Method")},1}:
 java.lang.Object get(java.lang.Class)

#16

Read and understood, thanks. I still don’t see how to write down the last three arguments of the jcall call for document.get, sentences.get, etc. But I understand there may be more of an answer coming based on the JavaCall.listmethods(doc.jann, "get") call that you asked me to run. So I’m waiting for now.


#17

Looks good! From it we see that:

  • return type is just a plain java.lang.Object aliased as JObject in JavaCall
  • method accepts an instance of java.lang.Class or JClass in JavaCall

Our jcall now looks like:

jcall(doc.jann, "get", JObject, (JClass,), ...)

The only thing that we don’t have yet is an instance of java.lang.Class<SentencesAnnotation>. We can get one by calling the following in Java:

Class.forName("edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation")

or equivalently in Julia:

jcall(JClass, "forName", JClass, (JString,), "edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation")

Finally, the Java code:

List<CoreMap> sentences = document.get(SentencesAnnotation.class);

is translated into:

jsentences_annotation_class = jcall(JClass, "forName", JClass, (JString,), "edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation")
sentences = jcall(doc.jann, "get", JObject, (JClass,), jsentences_annotation_class)

Please run it and report if there are any issues. If everything is fine, I’ll move to iterating over the list of sentences (which is much simpler, but requires knowledge of Java collections).


#18

The forName call failed as follows–

julia> jcall(JClass, "forName", JClass, (JString,), "edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation")
Exception in thread "main" java.lang.ClassNotFoundException: edu/stanford/nlp/ling/CoreAnnotations/SentencesAnnotation
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
ERROR: Error calling Java: java.lang.ClassNotFoundException: edu/stanford/nlp/ling/CoreAnnotations/SentencesAnnotation
Stacktrace:
 [1] geterror(::Bool) at /home/john/.julia/v0.6/JavaCall/src/core.jl:265
 [2] _jcall(::JavaCall.JavaMetaClass{Symbol("java.lang.Class")}, ::Ptr{Void}, ::Ptr{Void}, ::Type{T} where T, ::Tuple{DataType}, ::String, ::Vararg{String,N} where N) at /home/john/.julia/v0.6/JavaCall/src/core.jl:223
 [3] jcall(::Type{JavaCall.JavaObject{Symbol("java.lang.Class")}}, ::String, ::Type{T} where T, ::Tuple{DataType}, ::String, ::Vararg{String,N} where N) at /home/john/.julia/v0.6/JavaCall/src/core.jl:131

Per google, maybe that’s a problem with the classpath. Not sure how that works for Julia.

I’m also wondering why this is a jcall and not a @jimport like–

jsentences_annotation_class = @jimport edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation

since the purpose is to create a Jobject that matches the SentencesAnnotation class. (I thought that’s what @jimport does.) Also, the earlier examples (pipeline and annotator) first called @jimport and then used the resulting object in the jcall call, as you explain above.


#19

Ah, it’s a bit more interesting. It seems like SentencesAnnotation is a so-called “inner class”, i.e. a class defined inside another class, CoreAnnotations. So I guess the correct syntax is (note the $ in the name, escaped as \$ because an unescaped $ starts interpolation in a Julia string):

jcall(JClass, "forName", JClass, (JString,), "edu.stanford.nlp.ling.CoreAnnotations\$SentencesAnnotation")
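One Julia-specific detail about that class name: inside a Julia string literal, $ begins interpolation, so a bare $SentencesAnnotation would be read as a variable reference. A small sketch of two ways to write the literal, which needs no JVM (the raw"..." form assumes a reasonably recent Julia):

```julia
# Escape the dollar sign so Julia does not try to interpolate a variable
# named SentencesAnnotation into the string.
escaped = "edu.stanford.nlp.ling.CoreAnnotations\$SentencesAnnotation"

# A raw string literal performs no interpolation at all.
raw_form = raw"edu.stanford.nlp.ling.CoreAnnotations$SentencesAnnotation"

println(escaped)
println(escaped == raw_form)
```

Either value is an ordinary String and can be passed as the JString argument of the forName call.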

Although these are not issues with your call, your concerns still deserve explanation:

Per google, maybe that’s a problem with the classpath. Not sure how that works for Julia.

You specify the classpath (e.g. a list of JAR files to include) in JavaCall.init(). In this project I used a trick: I built a single large JAR that encapsulates all the libraries you need. If you ever want to add more libraries, edit JavaCoreNLP/jvm/corenlp-wrapper/pom.xml, add these libraries as dependencies and rebuild using either mvn clean package from a shell or Pkg.build("JavaCoreNLP") from a Julia REPL.

I’m also wondering why this is a jcall and not a @jimport like–

This is one thing that sometimes blows my mind too. The reason is that you have to work with 4 different kinds of objects - Java types, Java objects, Julia types and Julia objects. Let’s start with the Java side.

In Java you normally work with objects, i.e. instances of classes. E.g. in (Java):

Foo foo = new Foo();

Foo is the name of the class and foo is an instance of that class. Just like int is the name of a type and 42 is an object of that type.

In Julia the syntax is different, but the general idea is the same:

bar = Bar()

Bar - name of a type, bar - an object of that type.

JavaCall lets you treat Java classes (types of Java objects) just like you would normally do with other types:

JFoo = @jimport Foo  # import Java class Foo
jfoo = JFoo(())      # create an object of Java class Foo

This is how you normally use JavaCall and it should be pretty straightforward, right?


But in both Julia and Java, types/classes are themselves objects! In Java each class is an instance of java.lang.Class (e.g. java.lang.Class<Foo>) and in Julia each type is an object of type DataType. So when you have the name of a class/type at hand, you can treat it either as a type or as an object. For example in Java:

void myFunction(Foo foo) { ... }       // <-- Foo acts as a type, i.e. a qualifier of objects that can be passed to the function

Class<Foo> fooClass = Foo.class;              // <-- Foo acts as an object. You can create an instance of that class
Class<?> sameClass = Class.forName("Foo");    //     or call class methods (as opposed to instance methods)

In Java we want to pass SentencesAnnotation to a method, so we need this class as an object, not as a type. We can get it as an object using either of these 2 syntaxes:

SentencesAnnotation.class
Class.forName("Foo")

Unfortunately, the first one relies on an object field which JavaCall can’t read (or it already can?), so we are left with the second option:

jcall(JClass, "forName", JClass, (JString,), "Foo")

To summarize:

  • @jimport makes Java class available as a type; this is how you normally work with Java classes
  • jcall(JClass, "forName", JClass, (JString,), ...) looks up the java.lang.Class object that represents a Java class and returns a handle to it

#20

By the way, try (insert any existing class instead of Foo):

JFoo = @jimport Foo
typeof(JFoo)    # DataType

foo = JFoo(())
typeof(foo)      # JavaObject{:Foo}

foo_class = jcall(JClass, "forName", JClass, (JString,), "Foo")
typeof(foo_class)    # JavaObject{Symbol("java.lang.Class")}
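
The Julia half of the "types are objects too" idea can be tried without a JVM. A minimal sketch, where Bar is a hypothetical empty type like the one in the Bar example earlier and make is a hypothetical helper introduced only for illustration:

```julia
# Hypothetical empty type, used both as a type and as a value below.
struct Bar end

bar = Bar()                 # Bar used as a type: construct an instance
println(typeof(bar))        # the type of the instance
println(typeof(Bar))        # the type of the type itself, a DataType

# Bar used as an object: pass the type to a function like any other value.
make(T::Type) = T()         # hypothetical helper that constructs an instance of T
bar2 = make(Bar)
println(bar2 isa Bar)

# Base.parse does the same thing with its first argument.
println(parse(Int, "42"))
```

Passing Bar to make plays the same role as passing SentencesAnnotation.class to get in Java: the callee receives the type itself rather than an instance of it.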