I’m afraid your snippet is missing the implementation of tagSentence and customizes some other details, like model paths. Could you push your code, e.g. to GitHub, so we can dive right into a working example?
Will get back to this Sunday morning.
My mistake. I am on vacation this week and will have a smartphone for communication but no laptop for coding. I have pushed the project as-is to Aether59/JavaCoreNLP.jl. It is rough. Feel free to chew on it or leave it for me to do next week.
Hmm, I tried the code, but the tokenizer returns an empty sentence iterator. Did you manage to use it successfully in the first place?
I managed to run it without errors, but I did not have time to check the contents. That’s next.
The version here contains the DependencyParserDemo (DependencyParserDemo.jl) and the basic pipeline demo (runtests.jl). They both run and give sensible results.
I replaced the custom paths for modelPath and taggerPath with paths built from Pkg.Dir.path(). If there is a better solution, please let me know and I will implement it.
I will clean this up but for now, I believe it is a starting point.
Looks good!
You can get a subpath of the package directory using:
Pkg.dir("JavaCoreNLP", "jvm", "corenlp-wrapper", "target", "edu", "stanford", "nlp", "models", "parser", "nndep", "english_UD.gz")
This will also work with the Windows path separator. An even better solution would be to make paths relative to @__DIR__, which additionally covers cases of non-standard package installation.
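A minimal sketch of the @__DIR__ approach, assuming the file that defines the paths sits in the package's src directory (that layout, and the names PKG_ROOT, pkg_path and PARSER_MODEL, are illustrative assumptions, not taken from the package):

```julia
# Assumption: this file lives in <package root>/src, so the root is one level up.
const PKG_ROOT = normpath(joinpath(@__DIR__, ".."))

# Build a path below the package root; joinpath uses the right separator per OS.
pkg_path(parts...) = joinpath(PKG_ROOT, parts...)

# Same path components as in the Pkg.dir call above, minus the package name.
const PARSER_MODEL = pkg_path("jvm", "corenlp-wrapper", "target",
                              "edu", "stanford", "nlp", "models",
                              "parser", "nndep", "english_UD.gz")

println(PARSER_MODEL)
```

Because the path is resolved from the location of the source file itself, it does not matter where the package was installed.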
Also, consider adjusting names according to Julia naming conventions. Most developers follow them (with some differences in using underscores / squashing names) which makes it much easier to read someone else’s code.
I have:
- Implemented the Pkg.dir solution above. (@__DIR__ is not supported in v0.5.)
- Implemented Julia naming conventions.
- Added README.md text.
@dfdx would you rather leave this here or pull it into your repo?
It’s time to introduce the Compat package to you. Compat makes features from different Julia versions available on other versions, thus providing a good level of compatibility.
# Julia 0.5
using Compat
@__DIR__ # works fine
would you rather leave this here or pull it into your repo?
Given that you are now the main driving force behind the package, I suggest you keep the changes in your repo. Later we may want to move it to the JuliaText organization for easier discovery (if its maintainers don’t mind, of course).
The package is now expanded to include nearly all annotators. pipelines.jl lets you choose one pipeline and one text, then process the text to confirm that the pipeline works and to see the form of the output. scaling.jl lets you run one pipeline on small, medium, and large texts to see how it scales. A typical output is shown below.
It took me a little more than a “few days” but I needed to learn a few things.
Any critique/suggestions welcome.
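For readers who want the gist of the scaling check without the JVM setup, here is a rough sketch of the idea. It is not the code in scaling.jl: toy_pipeline is a stand-in that only splits on whitespace, and scaling_report is a hypothetical helper name.

```julia
# Stand-in for a real pipeline call; NOT the JavaCoreNLP.jl API.
# It only splits on whitespace and returns the number of tokens.
toy_pipeline(text::AbstractString) = length(split(text))

# Run `pipeline` on each named text and record token count and wall-clock time.
function scaling_report(pipeline, texts)
    report = []
    for (name, text) in texts
        t0 = time()
        ntokens = pipeline(text)
        seconds = time() - t0
        push!(report, (name = name, tokens = ntokens, seconds = seconds))
    end
    return report
end

# Small, medium, and large inputs built by repeating one phrase.
phrase = "The success of our economy "
texts = [("small", phrase), ("medium", phrase^10), ("large", phrase^100)]

for row in scaling_report(toy_pipeline, texts)
    println(row.name, ": ", row.tokens, " tokens in ", row.seconds, " sec.")
end
```

Swapping toy_pipeline for a function that runs a real annotator chain gives the same kind of small/medium/large comparison.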
Tokens=========
index= 1 word= The lemma= the pos= DT ner= O
index= 2 word= success lemma= success pos= NN ner= O
index= 3 word= of lemma= of pos= IN ner= O
index= 4 word= our lemma= we pos= PRP$ ner= O
index= 5 word= economy lemma= economy pos= NN ner= O
index= 6 word= has lemma= have pos= VBZ ner= O
index= 7 word= always lemma= always pos= RB ner= O
index= 8 word= depended lemma= depend pos= VBD ner= O
index= 9 word= not lemma= not pos= RB ner= O
index= 10 word= just lemma= just pos= RB ner= O
index= 11 word= on lemma= on pos= IN ner= O
index= 12 word= the lemma= the pos= DT ner= O
index= 13 word= size lemma= size pos= NN ner= O
index= 14 word= of lemma= of pos= IN ner= O
index= 15 word= our lemma= we pos= PRP$ ner= O
index= 16 word= Gross lemma= Gross pos= NNP ner= O
index= 17 word= Domestic lemma= Domestic pos= NNP ner= O
index= 18 word= Product lemma= Product pos= NNP ner= O
index= 19 word= , lemma= , pos= , ner= O
index= 20 word= but lemma= but pos= CC ner= O
index= 21 word= on lemma= on pos= IN ner= O
index= 22 word= the lemma= the pos= DT ner= O
index= 23 word= reach lemma= reach pos= NN ner= O
index= 24 word= of lemma= of pos= IN ner= O
index= 25 word= our lemma= we pos= PRP$ ner= O
index= 26 word=prosperity lemma=prosperity pos= NN ner= O
index= 27 word= ; lemma= ; pos= : ner= O
index= 28 word= on lemma= on pos= IN ner= O
index= 29 word= the lemma= the pos= DT ner= O
index= 30 word= ability lemma= ability pos= NN ner= O
index= 31 word= to lemma= to pos= TO ner= O
index= 32 word= extend lemma= extend pos= VB ner= O
index= 33 word=opportunity lemma=opportunity pos= NN ner= O
index= 34 word= to lemma= to pos= TO ner= O
index= 35 word= every lemma= every pos= DT ner= O
index= 36 word= willing lemma= willing pos= JJ ner= O
index= 37 word= heart lemma= heart pos= NN ner= O
index= 38 word= - lemma= - pos= : ner= O
index= 39 word= not lemma= not pos= RB ner= O
index= 40 word= out lemma= out pos= IN ner= O
index= 41 word= of lemma= of pos= IN ner= O
index= 42 word= charity lemma= charity pos= NN ner= O
index= 43 word= , lemma= , pos= , ner= O
index= 44 word= but lemma= but pos= CC ner= O
index= 45 word= because lemma= because pos= IN ner= O
index= 46 word= it lemma= it pos= PRP ner= O
index= 47 word= is lemma= be pos= VBZ ner= O
index= 48 word= the lemma= the pos= DT ner= O
index= 49 word= surest lemma= surest pos= JJS ner= O
index= 50 word= route lemma= route pos= NN ner= O
index= 51 word= to lemma= to pos= TO ner= O
index= 52 word= our lemma= we pos= PRP$ ner= O
index= 53 word= common lemma= common pos= JJ ner= O
index= 54 word= good lemma= good pos= NN ner= O
index= 55 word= . lemma= . pos= . ner= O
Parse tree=========
(ROOT
  (S
    (NP
      (NP (DT The) (NN success))
      (PP (IN of)
        (NP (PRP$ our) (NN economy))))
    (VP (VBZ has)
      (ADVP (RB always))
      (VP (VBD depended)
        (PP
          (CONJP (RB not) (RB just))
          (PP (IN on)
            (NP
              (NP (DT the) (NN size))
              (PP (IN of)
                (NP (PRP$ our) (NNP Gross) (NNP Domestic) (NNP Product)))))
          (, ,)
          (CC but)
          (PP (IN on)
            (NP
              (NP (DT the) (NN reach))
              (PP (IN of)
                (NP (PRP$ our) (NN prosperity))))))
        (: ;)
        (S
          (PP (IN on)
            (NP (DT the) (NN ability)))
          (VP (TO to)
            (VP (VB extend)
              (NP
                (NP
                  (NP (NN opportunity))
                  (PP (TO to)
                    (NP (DT every) (JJ willing) (NN heart))))
                (: -)
                (UCP (RB not)
                  (PP (IN out)
                    (PP (IN of)
                      (NP (NN charity))))
                  (, ,)
                  (CC but)
                  (SBAR (IN because)
                    (S
                      (NP (PRP it))
                      (VP (VBZ is)
                        (NP
                          (NP (DT the) (JJS surest) (NN route))
                          (PP (TO to)
                            (NP (PRP$ our) (JJ common) (NN good))))))))))))))
    (. .)))
Dependencies=========
-> depended/VBD (root)
-> success/NN (nsubj)
-> The/DT (det)
-> economy/NN (nmod)
-> of/IN (case)
-> our/PRP$ (nmod:poss)
-> has/VBZ (aux)
-> always/RB (advmod)
-> size/NN (dobj)
-> not/RB (neg)
-> just/RB (advmod)
-> on/IN (case)
-> the/DT (det)
-> Product/NNP (nmod)
-> of/IN (case)
-> our/PRP$ (nmod:poss)
-> Gross/NNP (compound)
-> Domestic/NNP (compound)
-> ,/, (punct)
-> but/CC (cc)
-> reach/NN (conj)
-> on/IN (case)
-> the/DT (det)
-> prosperity/NN (nmod)
-> of/IN (case)
-> our/PRP$ (nmod:poss)
-> ;/: (punct)
-> ability/NN (nmod)
-> on/IN (case)
-> the/DT (det)
-> extend/VB (acl)
-> to/TO (mark)
-> opportunity/NN (dobj)
-> heart/NN (nmod)
-> to/TO (case)
-> every/DT (det)
-> willing/JJ (amod)
-> -/: (punct)
-> charity/NN (nmod)
-> not/RB (neg)
-> out/IN (case)
-> of/IN (case)
-> ,/, (punct)
-> but/CC (cc)
-> route/NN (conj)
-> because/IN (mark)
-> it/PRP (nsubj)
-> is/VBZ (cop)
-> the/DT (det)
-> surest/JJS (amod)
-> good/NN (nmod)
-> to/TO (case)
-> our/PRP$ (nmod:poss)
-> common/JJ (amod)
-> ./. (punct)
-> depended/VBD (root)
-> success/NN (nsubj)
-> The/DT (det)
-> economy/NN (nmod:of)
-> of/IN (case)
-> our/PRP$ (nmod:poss)
-> has/VBZ (aux)
-> always/RB (advmod)
-> size/NN (dobj)
-> not/RB (neg)
-> just/RB (advmod)
-> on/IN (case)
-> the/DT (det)
-> Product/NNP (nmod:of)
-> of/IN (case)
-> our/PRP$ (nmod:poss)
-> Gross/NNP (compound)
-> Domestic/NNP (compound)
-> ,/, (punct)
-> but/CC (cc)
-> reach/NN (conj:but)
-> on/IN (case)
-> the/DT (det)
-> prosperity/NN (nmod:of)
-> of/IN (case)
-> our/PRP$ (nmod:poss)
-> ;/: (punct)
-> ability/NN (nmod:on)
-> on/IN (case)
-> the/DT (det)
-> extend/VB (acl)
-> to/TO (mark)
-> opportunity/NN (dobj)
-> heart/NN (nmod:to)
-> to/TO (case)
-> every/DT (det)
-> willing/JJ (amod)
-> -/: (punct)
-> charity/NN (nmod:out_of)
-> not/RB (neg)
-> out/IN (case)
-> of/IN (mwe)
-> ,/, (punct)
-> but/CC (cc)
-> route/NN (conj:but)
-> because/IN (mark)
-> it/PRP (nsubj)
-> is/VBZ (cop)
-> the/DT (det)
-> surest/JJS (amod)
-> good/NN (nmod:to)
-> to/TO (case)
-> our/PRP$ (nmod:poss)
-> common/JJ (amod)
-> ./. (punct)
-> depended/VBD (root)
-> success/NN (nsubj)
-> The/DT (det)
-> economy/NN (nmod:of)
-> of/IN (case)
-> our/PRP$ (nmod:poss)
-> has/VBZ (aux)
-> always/RB (advmod)
-> size/NN (dobj)
-> not/RB (neg)
-> just/RB (advmod)
-> on/IN (case)
-> the/DT (det)
-> Product/NNP (nmod:of)
-> of/IN (case)
-> our/PRP$ (nmod:poss)
-> Gross/NNP (compound)
-> Domestic/NNP (compound)
-> ,/, (punct)
-> but/CC (cc)
-> reach/NN (conj:but)
-> on/IN (case)
-> the/DT (det)
-> prosperity/NN (nmod:of)
-> of/IN (case)
-> our/PRP$ (nmod:poss)
-> ;/: (punct)
-> ability/NN (nmod:on)
-> on/IN (case)
-> the/DT (det)
-> extend/VB (acl)
-> to/TO (mark)
-> opportunity/NN (dobj)
-> heart/NN (nmod:to)
-> to/TO (case)
-> every/DT (det)
-> willing/JJ (amod)
-> -/: (punct)
-> charity/NN (nmod:out_of)
-> not/RB (neg)
-> out/IN (case)
-> of/IN (mwe)
-> ,/, (punct)
-> but/CC (cc)
-> route/NN (conj:but)
-> because/IN (mark)
-> it/PRP (nsubj)
-> is/VBZ (cop)
-> the/DT (det)
-> surest/JJS (amod)
-> good/NN (nmod:to)
-> to/TO (case)
-> our/PRP$ (nmod:poss)
-> common/JJ (amod)
-> route/NN (nmod:on)
-> ./. (punct)
root(ROOT-0, depended-8)
det(success-2, The-1)
nsubj(depended-8, success-2)
case(economy-5, of-3)
nmod:poss(economy-5, our-4)
nmod(success-2, economy-5)
aux(depended-8, has-6)
advmod(depended-8, always-7)
neg(size-13, not-9)
advmod(size-13, just-10)
case(size-13, on-11)
det(size-13, the-12)
dobj(depended-8, size-13)
case(Product-18, of-14)
nmod:poss(Product-18, our-15)
compound(Product-18, Gross-16)
compound(Product-18, Domestic-17)
nmod(size-13, Product-18)
punct(depended-8, ,-19)
cc(depended-8, but-20)
case(reach-23, on-21)
det(reach-23, the-22)
conj(depended-8, reach-23)
case(prosperity-26, of-24)
nmod:poss(prosperity-26, our-25)
nmod(reach-23, prosperity-26)
punct(reach-23, ;-27)
case(ability-30, on-28)
det(ability-30, the-29)
nmod(reach-23, ability-30)
mark(extend-32, to-31)
acl(ability-30, extend-32)
dobj(extend-32, opportunity-33)
case(heart-37, to-34)
det(heart-37, every-35)
amod(heart-37, willing-36)
nmod(extend-32, heart-37)
punct(ability-30, --38)
neg(charity-42, not-39)
case(charity-42, out-40)
case(charity-42, of-41)
nmod(ability-30, charity-42)
punct(ability-30, ,-43)
cc(ability-30, but-44)
mark(route-50, because-45)
nsubj(route-50, it-46)
cop(route-50, is-47)
det(route-50, the-48)
amod(route-50, surest-49)
conj(ability-30, route-50)
case(good-54, to-51)
nmod:poss(good-54, our-52)
amod(good-54, common-53)
nmod(route-50, good-54)
punct(depended-8, .-55)
root(ROOT-0, depended-8)
det(success-2, The-1)
nsubj(depended-8, success-2)
case(economy-5, of-3)
nmod:poss(economy-5, our-4)
nmod:of(success-2, economy-5)
aux(depended-8, has-6)
advmod(depended-8, always-7)
neg(size-13, not-9)
advmod(size-13, just-10)
case(size-13, on-11)
det(size-13, the-12)
dobj(depended-8, size-13)
case(Product-18, of-14)
nmod:poss(Product-18, our-15)
compound(Product-18, Gross-16)
compound(Product-18, Domestic-17)
nmod:of(size-13, Product-18)
punct(depended-8, ,-19)
cc(depended-8, but-20)
case(reach-23, on-21)
det(reach-23, the-22)
conj:but(depended-8, reach-23)
case(prosperity-26, of-24)
nmod:poss(prosperity-26, our-25)
nmod:of(reach-23, prosperity-26)
punct(reach-23, ;-27)
case(ability-30, on-28)
det(ability-30, the-29)
nmod:on(reach-23, ability-30)
mark(extend-32, to-31)
acl(ability-30, extend-32)
dobj(extend-32, opportunity-33)
case(heart-37, to-34)
det(heart-37, every-35)
amod(heart-37, willing-36)
nmod:to(extend-32, heart-37)
punct(ability-30, --38)
neg(charity-42, not-39)
case(charity-42, out-40)
mwe(out-40, of-41)
nmod:out_of(ability-30, charity-42)
punct(ability-30, ,-43)
cc(ability-30, but-44)
mark(route-50, because-45)
nsubj(route-50, it-46)
cop(route-50, is-47)
det(route-50, the-48)
amod(route-50, surest-49)
conj:but(ability-30, route-50)
case(good-54, to-51)
nmod:poss(good-54, our-52)
amod(good-54, common-53)
nmod:to(route-50, good-54)
punct(depended-8, .-55)
root(ROOT-0, depended-8)
det(success-2, The-1)
nsubj(depended-8, success-2)
case(economy-5, of-3)
nmod:poss(economy-5, our-4)
nmod:of(success-2, economy-5)
aux(depended-8, has-6)
advmod(depended-8, always-7)
neg(size-13, not-9)
advmod(size-13, just-10)
case(size-13, on-11)
det(size-13, the-12)
dobj(depended-8, size-13)
case(Product-18, of-14)
nmod:poss(Product-18, our-15)
compound(Product-18, Gross-16)
compound(Product-18, Domestic-17)
nmod:of(size-13, Product-18)
punct(depended-8, ,-19)
cc(depended-8, but-20)
case(reach-23, on-21)
det(reach-23, the-22)
conj:but(depended-8, reach-23)
case(prosperity-26, of-24)
nmod:poss(prosperity-26, our-25)
nmod:of(reach-23, prosperity-26)
punct(reach-23, ;-27)
case(ability-30, on-28)
det(ability-30, the-29)
nmod:on(reach-23, ability-30)
mark(extend-32, to-31)
acl(ability-30, extend-32)
dobj(extend-32, opportunity-33)
case(heart-37, to-34)
det(heart-37, every-35)
amod(heart-37, willing-36)
nmod:to(extend-32, heart-37)
punct(ability-30, --38)
neg(charity-42, not-39)
case(charity-42, out-40)
mwe(out-40, of-41)
nmod:out_of(ability-30, charity-42)
punct(ability-30, ,-43)
cc(ability-30, but-44)
mark(route-50, because-45)
nsubj(route-50, it-46)
cop(route-50, is-47)
det(route-50, the-48)
amod(route-50, surest-49)
nmod:on(reach-23, route-50)
conj:but(ability-30, route-50)
case(good-54, to-51)
nmod:poss(good-54, our-52)
amod(good-54, common-53)
nmod:to(route-50, good-54)
punct(depended-8, .-55)
Relations Triples=========
Mentions=========
to our common good
The success of our economy
our
our economy
not just on the size of our Gross Domestic Product
our
our Gross Domestic Product
the reach of our prosperity ; on the ability to extend opportunity to every willing heart - not out of charity , but because it is the surest route to our common good
our
our prosperity
the ability to extend opportunity to every willing heart - not out of charity , but because it is the surest route to our common good
opportunity
to every willing heart
not out of charity
it
the surest route to our common good
our
Coref Chains=========
CHAIN32-["our government" in sentence 2, "it" in sentence 2, "it" in sentence 2, "their government" in sentence 5]
CHAIN2-["the cynics" in sentence 1, "them" in sentence 1]
CHAIN50-["the market" in sentence 6, "Its" in sentence 7, "this crisis" in sentence 7, "the market" in sentence 7]
CHAIN66-["our Gross Domestic Product" in sentence 8, "it" in sentence 8]
CHAIN20-["the answer" in sentence 3, "the answer" in sentence 4]
CHAIN6-["families" in sentence 2, "they" in sentence 2]
CHAIN28-["us" in sentence 1, "we" in sentence 2, "our" in sentence 2, "we" in sentence 3, "us" in sentence 5, "our" in sentence 5, "we" in sentence 5, "us" in sentence 6, "us" in sentence 7, "our" in sentence 8, "our" in sentence 8, "our" in sentence 8, "our" in sentence 8]
CHAIN45-["a nation" in sentence 7, "it" in sentence 7]
CHAIN31-["a people" in sentence 5, "their" in sentence 5]
Annotation pipeline timing information:
TokenizerAnnotator: 0.1 sec.
WordsToSentencesAnnotator: 0.0 sec.
POSTaggerAnnotator: 0.1 sec.
MorphaAnnotator: 0.1 sec.
ParserAnnotator: 8.7 sec.
DependencyParseAnnotator: 1.3 sec.
NERCombinerAnnotator: 2.3 sec.
NaturalLogicAnnotator: 0.3 sec.
MentionAnnotator: 0.0 sec.
CorefAnnotator: 3.0 sec.
OpenIE: 2.7 sec.
TOTAL: 18.6 sec. for 265 tokens at 14.2 tokens/sec.
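To see where the time goes, the timing lines above can be turned into shares of the total. This is only arithmetic on the numbers already printed; the variable names are illustrative.

```julia
# Per-annotator times in seconds, copied from the timing output above.
timings = [
    "TokenizerAnnotator"        => 0.1,
    "WordsToSentencesAnnotator" => 0.0,
    "POSTaggerAnnotator"        => 0.1,
    "MorphaAnnotator"           => 0.1,
    "ParserAnnotator"           => 8.7,
    "DependencyParseAnnotator"  => 1.3,
    "NERCombinerAnnotator"      => 2.3,
    "NaturalLogicAnnotator"     => 0.3,
    "MentionAnnotator"          => 0.0,
    "CorefAnnotator"            => 3.0,
    "OpenIE"                    => 2.7,
]

total  = sum(last, timings)   # should match the reported TOTAL
tokens = 265                  # token count from the TOTAL line

# Print annotators from slowest to fastest with their share of the total time.
for (name, secs) in sort(timings, by = last, rev = true)
    println(rpad(name, 28), round(100 * secs / total, digits = 1), " %")
end
println("throughput: ", round(tokens / total, digits = 1), " tokens/sec")
```

The constituency parser alone accounts for 8.7 of the 18.6 sec., a bit under half of the run, so it is the first place to look when a pipeline needs to be faster.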