Guidance on a POS tagging and lemmatization approach

Hello, I’d like to do some basic NLP tasks (POS tagging, lemmatization) in languages other than English, or, more specifically, in German.

Currently, I am using spaCy through PyCall with the German model. This works, but I need to tag a lot of items, and it is pretty slow. I have not been able to parallelize it because of Python's GIL, although I think there might be ways around this. As I see it, I have the following options:

  • Keep using Python and try to use Distributed and pmap as shown here: Run multiple python instances with pycall in different threads - #2 by cjdoris. This might require some heavy restructuring of my code: I have very long chains of functions with the Python calls intertwined, and I don’t know if I can just slap @everywhere in front of every single function.

  • Try to use Transformers.jl with some state-of-the-art model. However, I am fairly clueless about how to do that: there is an example that encodes a sentence, but I am not sure how to get from there to something practical.

  • Since I don’t really need state-of-the-art accuracy, just something workable, I could port a simple Python package that handles these tasks to Julia. This is tedious, but at least it is a known quantity.
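To make the first option concrete, here is a rough, untested sketch of what I have in mind. It assumes that spaCy and the `de_core_news_sm` model are installed in the Python environment PyCall uses; the worker count and the function name `tag_and_lemmatize` are placeholders of my own. Each worker started with `addprocs` is a separate process with its own Python interpreter, so the workers do not share one GIL. The trade-offs are that every worker loads its own copy of the model, and every function the mapped function calls must also be defined with `@everywhere`.

```julia
using Distributed

# Assumption: 4 workers. Each is a separate OS process with its own
# Python interpreter, so they do not contend for a single GIL.
# The workers are started in the same project environment as the main process.
addprocs(4; exeflags="--project=$(Base.active_project())")

@everywhere begin
    using PyCall
    # Assumption: spaCy and the German model "de_core_news_sm" are installed
    # in the Python environment PyCall is configured to use.
    const spacy = pyimport("spacy")
    const nlp = spacy.load("de_core_news_sm")

    # Hypothetical helper: returns (token, POS tag, lemma) for each token.
    # Anything this function calls must also be defined on every worker.
    function tag_and_lemmatize(text::AbstractString)
        doc = nlp(text)
        return [(tok.text, tok.pos_, tok.lemma_) for tok in doc]
    end
end

texts = ["Die Katzen liefen schnell nach Hause.",
         "Ich habe gestern zwei Bücher gekauft."]

# pmap distributes the texts over the workers and returns the results
# in the same order as the input.
results = pmap(tag_and_lemmatize, texts)

for r in results
    println(r)
end
```

With many short texts, passing `batch_size` to `pmap` (e.g. `pmap(tag_and_lemmatize, texts; batch_size=100)`) should reduce the per-item communication overhead.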

Does anyone have insights or recommendations? Is there a package I overlooked, or a simple solution?

Thanks in advance, and I hope the post is not too unfocused.