Julia motivation for machine learning

Out of curiosity, in the domain of machine learning and NLP -

  1. What would be the reasons to choose Julia over Python?

  2. What would be the reasons to choose Python over Julia?

  3. What are some of the gaps in Julia as compared to Python?

1.) When coding in Julia, your machine learning library is likely written in Julia too. You can inspect the code, understand it, modify it, debug it. You do not have to write your code in a DSL using a subset of the host language; you can write pretty much any valid Julia code and use it together with AD. This holds to such an extent that learning works even if it’s an afterthought, using a model that was originally built for simulation only.

You can write fast code easily.

If we consider a traditional NLP pipeline:

  • Sentence Segment / Tokenize (WordTokenizers.jl; though useless if you need to do Chinese etc)
  • POS Tag (WIP https://github.com/JuliaText/TextAnalysis.jl/pull/131)
  • Parse (We got nothing, pycalling NLTK at least works)
  • Named Entity Recognize (We got nothing)
  • Word Sense Disambiguate (We got nothing, but nor does Python. WSD remains an open problem.)
  • Coreference Resolution: linking multiple references to the same named entity, e.g. pronouns. (We got nothing. Python has a few things.)

Then you can see we have a number of gaps.
This GSoC, @avik, I, and others will be mentoring a few students to try and close up some of those gaps.

Of course, in a deep learning, “throw out the last 90 years of linguistics” approach,
you actually need very little of the standard pipeline.
You can be happy enough with your tokenization,
plus some pretrained embeddings (Embeddings.jl),
a kickass deep learning library (strong vote for Flux.jl),
and a useful set of data tools (MLDataUtils.jl).
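
To make the shape of that minimal pipeline concrete, here is a rough sketch of the first two steps (tokenize, then look up pretrained embeddings and average them into one vector per text). It is written in plain Python only so it runs with no packages installed; it is not the API of WordTokenizers.jl or Embeddings.jl. The `TOY_EMBEDDINGS` table, its made-up 3-dimensional vectors, and the `tokenize`/`embed` helpers are all hypothetical stand-ins.

```python
import re

# Toy 3-dimensional "pretrained" vectors, made up purely for illustration.
# A real setup would load a large pretrained table (e.g. word2vec or GloVe).
TOY_EMBEDDINGS = {
    "julia": [1.0, 0.0, 0.0],
    "is": [0.0, 1.0, 0.0],
    "fast": [0.0, 0.0, 1.0],
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def embed(text, table=TOY_EMBEDDINGS):
    """Average the vectors of the known tokens; unknown tokens are skipped.

    Returns None when no token of the text is in the table.
    """
    vectors = [table[tok] for tok in tokenize(text) if tok in table]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

The averaged vector is what you would then hand to the deep learning library as input features; a real model would more likely keep the per-token vectors and feed the whole sequence in.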

Just to throw out a few more packages:

  • TextAnalysis.jl: has a number of things, including LSI, various text cleaning and Sentiment analysis
  • WordNet.jl which is a fairly reasonable WordNet front end
  • CorpusLoaders.jl has some loaders and predefined datadeps for some data (so does MLDatasets.jl)
  • MultiResolutionIterators.jl is how I think text data should be represented.

Look at all these opportunities for someone to write useful libraries! If I weren’t up to my armpits in my package/work/research I know I’d be looking at some of these gaps.

I think that most Julia packages were written because the author had a problem to solve (in work/research), not because they wanted to make a contribution to fill a gap.

So I imagine these gaps will be filled when someone needs the functionality badly enough. :wink:

You make a good point. My package was half born out of the fact that I needed random forests, regression, etc. IN JULIA. And yeah, Julia is currently my research base for good reason. I do ML research in general though, so pretty much any gap in ML capabilities is of interest to me.

Making a package to make a package doesn’t have the same heart and soul in it. Unless someone’s genuinely interested in the problems contained therein, there’s a good chance they won’t maintain/improve it as time rolls on.

To be fair, most of the things available in Python could readily be improved in Julia; that’s motivation enough in my opinion to get cracking :D.

Many of those packages I listed are me building things after I needed them.
Like I will complete a project, and some time later be like:
“That was a hell of a hack, let’s make the stuff I wish I had when I started.”
