CoreNLP - Base Name Conflict

JavaCoreNLP.init() also takes care of the classpath. It should get called automatically, and that is what actually happens on my machine.

@aether could you try to run Pkg.checkout("JavaCoreNLP") (note Pkg.update() only fetches the latest released version, while Pkg.checkout() gets the latest master)? In case it fails, please try to initialize it explicitly and check if it works.

Thanks @avik and @dfdx. I knew I was missing something simple. Meanwhile, I have configured Atom and Git/GitHub so, hopefully, I will avoid the dumb mistakes and go back to the plain ignorant ones :wink: .

I see that JavaCoreNLP calls init() automatically. I don’t think there is a problem with that, and I will confirm. But at the moment, Pkg.checkout("JavaCoreNLP") gives ERROR: JavaCoreNLP is not a git repo. I think it’s looking at a local directory of mine instead of a git host.

Andrei, in post 34 of this thread you wrote,

jtokens = native(convert(JArrayList, jtokens))
for token in JavaCall.iterator(jtokens)
   ...
end

What is native? I can’t find it as a Julia function and searching is not useful b/c it brings up so many results. Thx.

Ah, I think I meant narrow(), sorry for confusion.

ERROR: JavaCoreNLP is not a git repo.

Is JavaCoreNLP in your ~/.julia/v0.6/ directory? If not, you can clone it first using:

Pkg.clone("git@github.com:dfdx/JavaCoreNLP.jl.git")  # or your own repo address

Thanks, I thought that might be it.

It is now, thanks.

Another question–

classforname calls such as–

JSentencesAnnotationClass = classforname("edu.stanford.nlp.ling.CoreAnnotations\$SentencesAnnotation")

are throwing errors such as–

Exception in thread "main" java.lang.ClassNotFoundException: edu.stanford.nlp.ling.CoreAnnotations$SentencesAnnotation
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)

CLASSPATH is not set globally, i.e.,

john@latitude-E7440:~$ echo ${CLASSPATH}

john@latitude-E7440:~$

I see that the classpath is set by JavaCall.init(). I added a line to print it out and got–

/home/john/.julia/v0.6/JavaCoreNLP/src/../jvm/corenlp-wrapper/target/corenlp-wrapper-0.1-assembly.jar

The first part of that path just points to the JavaCoreNLP source; the rest points to directories that don’t exist on my system–

john@latitude-E7440:~$ whereis corenlp-wrapper
corenlp-wrapper:
john@latitude-E7440:~$

The closest I get is–

john@latitude-E7440:~$ whereis jvm
jvm: /usr/lib/jvm

I googled corenlp-wrapper-0.1-assembly.jar thinking I might need to download it but didn’t find any useful information. On the other hand, the jars I think it wants are here–

john@latitude-E7440:~/CoreNLP/stanford-corenlp-full-2014-01-04$ ls -l | grep stanford
-rw-rw-r-- 1 john john   5139881 Jan 10  2014 stanford-corenlp-3.3.1.jar
-rw-rw-r-- 1 john john   6564610 Jan 10  2014 stanford-corenlp-3.3.1-javadoc.jar
-rw-rw-r-- 1 john john 206757847 Jan 10  2014 stanford-corenlp-3.3.1-models.jar
-rw-rw-r-- 1 john john   3209801 Jan 10  2014 stanford-corenlp-3.3.1-sources.jar

In summary, the following questions–

  1. Are these the correct jars?
  2. Is setting classpath the right solution?
  3. Why is JavaCall.init() setting classpath the way it is?
  4. Am I missing corenlp-wrapper-0.1-assembly.jar?
  5. What is corenlp-wrapper-0.1-assembly.jar? Maybe a manifest listing other jars?

Thanks.

Pkg.build("JavaCoreNLP")

This should fix it. If not, try from console:

cd ~/.julia/v0.6/JavaCoreNLP/jvm
mvn clean package

Both methods assume you have Java 8 and Maven installed and available from the console.

What does all this mean?

First of all, to work with Java libraries we need to have those libraries available to our project. We could, for example, download all the needed JARs by hand, put them in some directory on the system, and point JavaCall to it. But libraries get upgraded, their dependencies change, and so on. A much easier way is to use Java’s standard dependency-management tools, e.g. Maven.

A minimal Maven project consists of a single pom.xml file. pom files may be quite verbose and tangled, but the most important sections are those defining the dependencies (which in our case include 2 Stanford CoreNLP jars) and the artifacts to produce (nope, I don’t fully understand that config either, I just copied and pasted :)). We want to create a single JAR artifact that includes all the dependencies. In Java parlance this is called “an assembly”, and since the Maven project name is “corenlp-wrapper” and the version is 0.1, Maven creates a JAR named “corenlp-wrapper-0.1-assembly.jar” in the target/ subdirectory.
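For illustration, the dependencies section of such a pom.xml might look roughly like this (these are the standard Maven coordinates for Stanford CoreNLP 3.8; the exact content of JavaCoreNLP’s pom may differ):

```xml
<!-- Illustrative fragment only; check JavaCoreNLP's actual pom.xml -->
<dependencies>
  <dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.8.0</version>
  </dependency>
  <dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.8.0</version>
    <classifier>models</classifier>  <!-- the large models jar -->
  </dependency>
</dependencies>
```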

Calling Maven directly isn’t very convenient. Julia provides a standard interface for building package dependencies (e.g. JARs) through Pkg.build(). Pkg.build() simply runs <project dir>/deps/build.jl, and in our build.jl we simply run mvn clean package to create the artifacts.
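A deps/build.jl of this kind can be tiny; here is a sketch of what it might boil down to (the directory layout is an assumption based on this thread, not the package’s actual file):

```julia
# Hypothetical sketch of deps/build.jl -- Pkg.build("JavaCoreNLP") just runs this file.
# Assumes the Maven project lives in <package dir>/jvm, as in this thread.
jvm_dir = joinpath(@__DIR__, "..", "jvm")
cd(jvm_dir) do
    run(`mvn clean package`)  # writes target/corenlp-wrapper-0.1-assembly.jar
end
```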

Once the artifact (e.g. JavaCoreNLP/jvm/corenlp-wrapper/target/corenlp-wrapper-0.1-assembly.jar) has been built, we can add it to the classpath using JavaCall.init(...). Of course, if the artifact hasn’t been built, the JVM doesn’t know where to look for CoreNLP’s classes and throws a ClassNotFoundException.


That indeed fixed it and thanks for the explanation.

More problems with: java.lang.ClassNotFoundException.

I realized I had an old version of CoreNLP (2014-01-04) which did not include certain classes, such as MaxentTagger, which is required for the DependencyParserDemo you cited above in Post 35. I downloaded the latest from here, following the instructions at the bottom under Steps to setup from the GitHub HEAD version. Specifically, I did as follows–

  1. git clone git@github.com:stanfordnlp/CoreNLP.git
  2. cd CoreNLP
  3. ant jar
  4. wget http://nlp.stanford.edu/software/stanford-corenlp-models-current.jar
As a result, I have–
john@latitude-E7440:~/CoreNLP$ ls -la | grep .jar
-rw-rw-r--  1 john john  10123623 Aug 27 14:25 javanlp-core.jar
-rw-rw-r--  1 john john 362594065 Jul  4 03:55 stanford-corenlp-models-current.jar

and the rest of the package is at ~/CoreNLP/src/edu/stanford/nlp/...

The JavaCoreNLP build files are still here–

john@latitude-E7440:~/.julia/v0.6/JavaCoreNLP/jvm/corenlp-wrapper$ ls -l
total 16
-rw-rw-r-- 1 john john 8347 Aug 20 15:13 pom.xml
drwxrwxr-x 3 john john 4096 Aug 27 13:35 target

I’ve been reading up on Maven but at this point, I’m beginning to go cross-eyed. So the following questions–

  1. How to integrate CoreNLP into the project?
  2. Should I move CoreNLP under .../edu/stanford/nlp?
  3. Or do I leave the CoreNLP jars where they are and tell JavaCoreNLP about them?
  4. Or something else?
  5. Was I right to build CoreNLP separately? Or should that be built together with JavaCoreNLP?

In the weeds now so any help appreciated.

Where did you get it? Our pom.xml includes CoreNLP 3.8, which is the latest official release as far as I understand. Also, corenlp-wrapper-0.1-assembly.jar (the artifact Maven produces from our pom.xml) does include edu/stanford/nlp/tagger/maxent/MaxentTagger.class, so you shouldn’t be getting a ClassNotFoundException.


Regarding other questions:

How to integrate CoreNLP into the project?

The easiest way is to add it as a dependency in pom.xml and run mvn clean package - exactly as we do. If the version of the jar you need is not registered in a Maven repository, you can build it manually (e.g. using ant), but that’s a very rare case nowadays.

Should I move CoreNLP under …/edu/stanford/nlp?

No, that’s a path inside the jar. In fact, a jar file is nothing more than a ZIP archive. Try unpacking it with your favorite archive manager and see what’s inside.

Or do I leave the CoreNLP jars where they are and tell JavaCoreNLP about them?

From the package perspective, it’s better to keep all dependencies inside.

Was I right to build CoreNLP separately? Or should that be built together with JavaCoreNLP?

Again, from the package perspective it’s much better to build it during package installation. Think about usability: do you want a package to come with long, complicated setup instructions, or with a single command Pkg.add("MyPackage") that downloads and builds the dependencies itself?

All in all, working via Maven should solve most of these issues.

It came from downloading CoreNLP.jl here which downloads a file stanford-corenlp-full-2014-01-04.zip.

That’s an old Python-based version and it hasn’t been updated for 3 years. It’s no surprise that it lacks some of the newer CoreNLP’s features.

As discussed earlier, JavaCoreNLP is a different project that calls Java directly rather than through Python. It uses Maven to download CoreNLP 3.8 (the latest at the moment). All you need to do is clone/checkout the latest master of JavaCoreNLP and run:

Pkg.build("JavaCoreNLP")

If this doesn’t solve ClassNotFound exceptions, please report.

I installed a clean system, forked dfdx/JavaCoreNLP to Aether/JavaCoreNLP, cloned that, and built it. It still threw the same ClassNotFound errors for MaxentTagger and two other classes.

By looking at the contents of corenlp-wrapper-0.1-assembly.jar, I figured out that the problematic classes were actually outer classes, not inner classes as I had assumed. Thus, the correct call is
JMaxentTaggerClass = classforname("edu.stanford.nlp.tagger.maxent.MaxentTagger")
and not
JMaxentTaggerClass = classforname("edu.stanford.nlp.tagger.maxent\$MaxentTagger")
as I had assumed. (As we saw above, most of the other required classes are inner classes.)

This fixed the problem.
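In other words, classforname takes the JVM “binary name”: packages are joined with dots, but an inner class is appended to its outer class with a $ (which needs escaping inside a Julia string literal). A toy helper, purely illustrative and not part of JavaCall, makes the convention explicit:

```julia
# Hypothetical helper: build the binary name of an inner class --
# the outer class and the inner class are joined with '$'.
inner_class_name(outer::AbstractString, inner::AbstractString) = string(outer, '$', inner)

inner_class_name("edu.stanford.nlp.ling.CoreAnnotations", "SentencesAnnotation")
# gives the string edu.stanford.nlp.ling.CoreAnnotations$SentencesAnnotation
```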

What’s the right way to construct the call–
DependencyParser parser = DependencyParser.loadFromModelFile(modelPath);
from the DependencyParserDemo?

I created an instance of JDependencyParser and then called its loadFromModelFile method as follows–

JDependencyParser = @jimport edu.stanford.nlp.parser.nndep.DependencyParser

type DependencyParser
    jdp::JDependencyParser
end

function DependencyParser()
    jdp = JDependencyParser(()) 
    return DependencyParser(jdp)
end

function DependencyParser(parser::DependencyParser, modelPath::AbstractString)
    return narrow(jcall(parser.jdp, "loadFromModelFile", JObject, (JString,), modelPath))
end

parser = DependencyParser()
parser = DependencyParser(parser, modelPath)

Only the last line fails with:
java.lang.NoSuchMethodError: loadFromModelFile

I also called the class directly, i.e.,

function DependencyParser(modelPath::AbstractString)
    return narrow(jcall(JDependencyParser, "loadFromModelFile", JObject, (JString,), modelPath))
end
parser = DependencyParser(modelPath)

This also fails with:
java.lang.NoSuchMethodError: loadFromModelFile

I also tried the jcall call with–
JDependencyParserClass = classforname("edu.stanford.nlp.parser.nndep.DependencyParser")
Same fail.

Questions–

  1. How to make this call?
  2. Why the NoSuchMethodError since loadFromModelFile is definitely a method of DependencyParser?
  3. In jcall, can the first argument always be either an instance or the class itself?

Thanks!

Looks good! I’m glad you’ve got the general idea. As for the questions, let’s start with the last one:

In jcall, can the first argument always be either an instance or the class itself?

In Java, there are two kinds of methods: instance methods and class (or static) methods. For example, if you have an instance of Integer (e.g. the number 58) and want to convert that specific instance to the string “58”, you call the instance method toString(). Clearly, toString() depends on the instance value, not just its class.

Static methods are bound to classes instead. For example, if you want to parse the string “58” into an integer, you call the static method Integer.parseInt(). parseInt() doesn’t depend on any specific instance of Integer; rather, it lives in the Integer “namespace”.
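A rough Julia analogy (illustrative only, nothing to do with JavaCall itself):

```julia
# string(x) acts on a specific value, like Java's instance method toString();
# parse(Int, s) is tied to the type Int, like the static Integer.parseInt().
s = string(58)        # the string "58"
n = parse(Int, "58")  # the integer 58
```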

So if you have the following definition in Java:

class MyClass {
    void foo() {...}
    static void bar() {...}
}

you can call foo() and bar() only like this:

mc = JMyClass(())
jcall(mc, "foo", Void, ())
jcall(JMyClass, "bar", Void, ())

You cannot call an instance method on a class (the method simply wouldn’t have enough information to run!), and you cannot call a static method on an instance (although Java the language allows it, the JVM itself doesn’t).

loadFromModelFile is a static method, so the first argument to jcall should definitely be JDependencyParser, not its instance jdp.

Why the NoSuchMethodError since loadFromModelFile is definitely a method of DependencyParser?

However, if we call loadFromModelFile like this:

# WRONG!
jcall(JDependencyParser, "loadFromModelFile", JObject, (JString,), modelPath)

we will still get NoSuchMethodError. Let’s see what methods are actually available:

julia> listmethods(JDependencyParser, "loadFromModelFile") 
 2-element Array{JavaCall.JavaObject{Symbol("java.lang.reflect.Method")},1}:  
 edu.stanford.nlp.parser.nndep.DependencyParser loadFromModelFile(java.lang.String)    
 edu.stanford.nlp.parser.nndep.DependencyParser loadFromModelFile(java.lang.String, java.util.Properties)

We are calling the first of these 2 methods, but we pass JObject as the return type while the method signature requires (J)DependencyParser! Fixing this gives us the correct call:

jcall(JDependencyParser, "loadFromModelFile", JDependencyParser, (JString,), modelPath)

Understood re: static vs. instance methods. I also see there are two separate jcall methods in the source. That helps a lot. Thanks!

I’m translating this line from the demo
DocumentPreprocessor tokenizer = new DocumentPreprocessor(new StringReader(text));
and I get this error–

JStringReader = @jimport java.io.StringReader
JDocumentPreprocessor = @jimport edu.stanford.nlp.process.DocumentPreprocessor

type StringReader
    jsr::JStringReader
end

function StringReader(text::AbstractString)
    jsr = JStringReader((JString,), text)
    return StringReader(jsr)
end

stringreader  = StringReader(text)

type DocumentPreprocessor
    jdp::JDocumentPreprocessor
end

function DocumentPreprocessor(stringreader::StringReader)
    jdp = JDocumentPreprocessor((JStringReader,), stringreader.jsr)
    return DocumentPreprocessor(jdp)
end

tokenizer = DocumentPreprocessor(stringreader)
LoadError: No constructor for edu.stanford.nlp.process.DocumentPreprocessor with signature (Ljava/io/StringReader;)V

I understand (Ljava/io/StringReader;)V means something like
void foo(stringreader::StringReader), i.e. the V indicates void return.
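As far as I can tell, these descriptors are mechanical to construct: each object argument is encoded as L<class name with dots replaced by slashes>;, and constructors get a V (“void”) return. A toy helper, purely illustrative and not JavaCall API:

```julia
# Hypothetical helper: build a JNI-style constructor descriptor for
# object arguments of the given fully qualified class names.
function jni_ctor_sig(classes...)
    args = join("L" * replace(c, "." => "/") * ";" for c in classes)
    return "(" * args * ")V"
end

jni_ctor_sig("java.io.StringReader")  # gives "(Ljava/io/StringReader;)V"
```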

The DocumentPreprocessor documentation shows there is indeed no constructor that returns a void. But how could there be since it is constructing a DocumentPreprocessor object?

The argument of DocumentPreprocessor should be a Reader. I am calling it with a StringReader. I presume this is ok since StringReader is a subclass of Reader, and since that’s exactly what the demo does as well.

Just one question: why is this failing? TIA.

The DocumentPreprocessor documentation shows there is indeed no constructor that returns a void. But how could there be since it is constructing a DocumentPreprocessor object?

Unlike in Julia, constructors in Java aren’t just functions; they’re a special language feature of sorts. You define them like:

class Point {

    int x;
    int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y; 
    }

}

Note that there’s no return type because you already know what it returns. In JavaDocs constructors are also listed separately from methods.

The argument of DocumentPreprocessor should be a Reader. I am calling it with a StringReader. I presume this is ok since StringReader is a subclass of Reader

Yes in Java, no in the JVM. Java the language takes care of inheritance, but the JVM requires you to follow the exact signature it expects. Change this:

jdp = JDocumentPreprocessor((JStringReader,), stringreader.jsr)

to this:

JReader = @jimport java.io.Reader
jdp = JDocumentPreprocessor((JReader,), stringreader.jsr)

I can now translate the entire DependencyParserDemo to Julia.

To get access to the content of GrammaticalStructure gs, I have to apply some methods of edu.stanford.nlp.trees.GrammaticalStructure and, in particular, the instance method

Collection <TypedDependency> typedDependencies()

The call should be

gs = jcall(parser, "predict", JGrammaticalStructure, (JList,), tagged)
tdcollection = jcall(gs, "typedDependencies", ???, ())

What should the return type be in the call? Ordinarily, it would be something like

JCollection = @jimport java.util.Collection

but Collection is a Java “interface”, not a type or class.

Therefore, the following general question: What is the proper return type for a jcall call when the underlying java method returns a Collection?

I’d test it with JCollection. In general, it makes sense to inspect an object with listmethods(obj, "method_name") and follow the signature it outputs.

Let me know if it works.

I did. It didn’t. Am going to try some more.

Can you make a snippet for creating an instance of GrammaticalStructure so I could test it myself?

I think JCollection works but haven’t confirmed 100% yet. Will get back to this Sunday morning. Meanwhile, the following should be enough to get to GrammaticalStructure. Not cleaned up yet…

using JavaCall
JavaCall.init()

#MaxentTagger
JArrayList = @jimport java.util.ArrayList
JList = @jimport java.util.List
JHasWord = @jimport edu.stanford.nlp.ling.HasWord
JMaxentTagger = @jimport edu.stanford.nlp.tagger.maxent.MaxentTagger

type MaxentTagger
    jmet::JMaxentTagger
end

##Path is the location of parameter files for a trained tagger.
function MaxentTagger(path::AbstractString)
    jmet = JMaxentTagger((JString,), path)
    return MaxentTagger(jmet)
end

#List<TaggedWord> tagged = tagger.tagSentence(sentence);
#(needed in the loop below; tagSentence is an instance method of MaxentTagger)
function tagSentence(tagger::MaxentTagger, sentence::JList)
    return jcall(tagger.jmet, "tagSentence", JList, (JList,), sentence)
end

#DependencyParser
JDependencyParser = @jimport edu.stanford.nlp.parser.nndep.DependencyParser
JGrammaticalStructure = @jimport edu.stanford.nlp.trees.GrammaticalStructure
JCollection = @jimport java.util.Collection

#DependencyParser parser = DependencyParser.loadFromModelFile(modelPath);
function DependencyParser(modelPath::AbstractString)
    return narrow(jcall(JDependencyParser, "loadFromModelFile", JDependencyParser, (JString,), modelPath))
end

#GrammaticalStructure gs = parser.predict(tagged);
function predict(parser::JDependencyParser, tagged::JList)
    jcall(parser, "predict", JGrammaticalStructure, (JList,), tagged)
end

#StringReader
JStringReader = @jimport java.io.StringReader

type StringReader
    jsr::JStringReader
end

function StringReader(text::AbstractString)
    jsr = JStringReader((JString,), text)
    return StringReader(jsr)
end

#DocumentPreProcessor
JDocumentPreprocessor = @jimport edu.stanford.nlp.process.DocumentPreprocessor
JReader = @jimport java.io.Reader

type DocumentPreprocessor
    jdp::JDocumentPreprocessor
end

function DocumentPreprocessor(stringreader::StringReader)
    jdp = JDocumentPreprocessor((JReader,), stringreader.jsr)
    return DocumentPreprocessor(jdp)
end

#Test code
using JavaCoreNLP

modelPath = "/home/john/.julia/v0.5/JavaCoreNLP/jvm/corenlp-wrapper/target/edu/stanford/nlp/models/parser/nndep/english_UD.gz"
taggerPath = "/home/john/.julia/v0.5/JavaCoreNLP/jvm/corenlp-wrapper/target/edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger";

text = "I can almost always tell when movies use fake dinosaurs. This is a second sentence.  And this is a third and final sentence."

tagger = MaxentTagger(taggerPath)
parser = DependencyParser(modelPath)
stringreader  = StringReader(text)
tokenizer = DocumentPreprocessor(stringreader)

for sentence in JavaCall.iterator(tokenizer.jdp)
    s = convert(JList, sentence)
    tagged = tagSentence(tagger, s)
    gs = predict(parser, tagged)
    #read out gs
end