Hey all, I’m a statistician and I’m pretty proficient in R and Python. I want to hear all of the reasons I should look into Julia.
If you need to be convinced to switch to a programming language, it’s probably not a good idea to do so. I would start by evaluating what you don’t like about your current workflow in R and Python. If you’re happy with it, that’s awesome; if not, you could see whether Julia performs better in those aspects.
It’s less about having to convince me and more just I want to hear what the community loves about Julia. I just didn’t know a better way to ask haha.
I think you’ll find many, many threads here on Discourse where people express their love for the language.
But to answer your implicit question: I personally love the new ways multiple dispatch provides to structure code and logic and to make them shareable, unlike any other language.
Watch this: JuliaCon 2019 | The Unreasonable Effectiveness of Multiple Dispatch | Stefan Karpinski - YouTube
I use R for my research on a daily basis, and since I started using Julia I haven’t needed Rcpp at all. Writing R and C++ at the same time had been a really big distraction in my workflow.
Also, there is a lot of legacy statistics code written in Matlab, which can be easily translated into Julia.
We get “convince me to switch to Julia” posts every week.
My buddy sent me this link this morning
Would you be posting here if you didn’t already deep down know that Julia was a better option?
This is actually the article that got me more interested in Julia!
The article gives a summary of the key features of Julia.
Having read that, the only way for you to decide if it’s right for you is to download Julia and your favorite editor (I like Juno), and test it out yourself.
I learned a bit from: https://people.smp.uq.edu.au/YoniNazarathy/julia-stats/StatisticsWithJulia.pdf
You don’t need Numpy.
For Julia, it’s built-in.
The syntax of Julia is a lot cleaner than Python or R. Python has a proliferation of dots (.), and you get a proliferation of .dot conventions depending on who wrote it. “Hmm, is it .min() or .minimum() for this one?”
Julia has a missing datatype. Python, depending on the author, can represent missing data with np.nan, Python’s None, or pandas’ pd.NA. It’s annoying.
For a dynamic language, Python makes you stumble over datatypes all the time. You call a function and get an error back, “No, no, I didn’t want that datatype. You need to send me this other datatype.” For example, I send a function:
- Integers, and it wanted floats
- booleans, and it wanted 1/0
- 1/0, and it wanted booleans
Julia eliminates a lot of that due to multiple dispatch. Say a function natively works with 1’s and 0’s. The user sends it a column of booleans. No problem, a short method can be written to convert those booleans to 1/0 and submit to the function. Now, that working depends on the function authors writing those methods to do those conversions – but this is the norm, not the exception.
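To make the "short method that converts booleans to 1/0" idea concrete, here is a minimal sketch. It is written in Python with functools.singledispatch, which dispatches on the first argument only, so it is a stand-in for Julia's multiple dispatch and not the real thing. The names as_indicator and count_events are hypothetical, made up for illustration.

```python
from functools import singledispatch


@singledispatch
def as_indicator(x):
    # Fallback: reached only for types with no registered conversion.
    raise TypeError(f"unsupported type: {type(x).__name__}")


@as_indicator.register
def _(x: int):
    # The "native" case: integers that are already 1 or 0.
    if x not in (0, 1):
        raise ValueError(f"expected 0 or 1, got {x}")
    return x


@as_indicator.register
def _(x: bool):
    # bool is a subclass of int in Python; the more specific
    # registration wins, so booleans are converted here.
    return 1 if x else 0


@as_indicator.register
def _(x: float):
    # Accept 0.0 and 1.0 and convert them to integers.
    if x not in (0.0, 1.0):
        raise ValueError(f"expected 0.0 or 1.0, got {x}")
    return int(x)


def count_events(column):
    # Hypothetical function that natively works with 1s and 0s.
    return sum(as_indicator(v) for v in column)
```

With these conversions registered, count_events accepts a column of booleans, integers, or floats without the caller converting anything first.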
Not sure Julia should be recommended over Python if one is concerned about a proliferation of dots
I wonder whether some aspects of Julia, compared to other languages, are just down to it being younger. The older Julia gets, the more functionality there will be. The same holds true for any other language. As soon as people start to write their own functions, it will grow, and possibly in a direction other than the one the authors of the language intended. @blackeneth points out that there are several types of missings in Python. Aren’t there already multiple types in Julia? Each one serves its own purpose. Some follow a new path. It’s just like evolution: time will tell which one supersedes the others, and whether there is a real need for multiple types. Heterogeneity of some sort is good. If they’re equivalent, I guess they’ll converge over time or get lost in space. Just consider how many different data.frame approaches there are in R. It always starts with someone who’s missing something, implements the missing part, renames it, and the future will tell what wins. Look at Linux and what can happen if the community does not pull in the same direction: you end up with hundreds of different Linuxes.
Yes, a lot of things are much nicer in Julia than in other languages, and I don’t want to miss Julia. I love what it has become. It was promising in 2014 when I got to know the language, and it has exceeded my expectations by a lot. Nowadays I ask myself whether R or Python still has a raison d’être. Up until now they do, and they probably will in the future. Julia lacks a lot of things. And I guess we all like to share the things we hate about Julia the same way we love to talk about the things we love. What I miss in either language is that there aren’t many people sharing their workflow. You’ll develop your own workflow over time in each language, but coming from R, Matlab or Python, Julia is different. You’ll need to adapt to new workflows in either language. It is going to be hard, but you’ll love some things.
Once you go back to other languages, you will really realize what you’re missing. Only time will tell whether writing this language was worth the effort, worth all the sweat and blood. There were many languages of which people thought, wow, this is going to be the future, but it wasn’t. But there are languages which survived because they had a future: people believed in them, and the language showed them a new horizon. Be part of a future which might be there, or don’t. As long as you accomplish your task, there’s no reason to switch. Your work is not about the language but about the work itself. No one will appreciate your work because it’s done in Python and Fortran when it could have been in Julia. If it saves you time, if it gives you joy, if you get the results you want, then switch. The main reason, imo, not to switch to Julia is package availability, but Julia is not the first language, and you can always go back to another language just for that specific task. Julia needs time to grow and to mature, and only your help can improve the environment everybody is programming in.
Julia has a really good solution to the expression problem. Very few languages do.
If your programs grow to a complexity where you start banging your head against the wall because you spend way too much time refactoring and rewriting your code over and over again to do something you think should be simple (simple things like adding a new type that works smoothly with all your current operators/methods, or adding a new operator and having to edit n files, where n is the number of types you have defined), you may be ready for Julia.
Bored of writing the same boilerplate over and over? You may be ready for Julia.
If you stress yourself out trying to figure out whether a circle isa ellipse or an ellipse isa circle while keeping your code optimally performant and flexible, you may be ready for Julia.
Shape A collides with Shape B where A is a Box and B is a Sphere or some other Shapes. If you think solving this problem by writing 1 method and no new data should be normal… You may be ready for Julia.
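For readers coming from Python, here is a rough sketch of what "one method per pair of shapes, no new data" looks like. Python has no built-in multiple dispatch, so this emulates it with a registry keyed on the pair of argument types. In Julia, each registry entry would simply be a method of one collides function. Everything here (Box, Sphere, collides_for, the one-dimensional geometry) is a hypothetical toy chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Box:
    half_width: float


@dataclass
class Sphere:
    radius: float


# Registry mapping (type of a, type of b) to the function handling that pair.
_COLLIDERS = {}


def collides_for(type_a, type_b):
    # Decorator that registers a handler for one pair of shape types.
    def register(fn):
        _COLLIDERS[(type_a, type_b)] = fn
        return fn
    return register


def collides(a, b, distance):
    # Select the handler from the runtime types of BOTH arguments.
    key = (type(a), type(b))
    if key in _COLLIDERS:
        return _COLLIDERS[key](a, b, distance)
    swapped = (key[1], key[0])
    if swapped in _COLLIDERS:
        # Collision is symmetric, so reuse the handler with arguments swapped.
        return _COLLIDERS[swapped](b, a, distance)
    raise TypeError(f"no collision rule for {key[0].__name__} and {key[1].__name__}")


@collides_for(Sphere, Sphere)
def _(a, b, distance):
    return distance <= a.radius + b.radius


@collides_for(Box, Sphere)
def _(a, b, distance):
    # Toy 1-D rule: centers lie on one axis, `distance` apart.
    return distance <= a.half_width + b.radius
```

Supporting a new pair of shapes means registering one more function. Neither the existing classes nor collides itself has to change.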
Well, there are two variants, missing and nothing, and they serve different purposes, as you say. It would be more of a problem if they served overlapping purposes; I don’t know if that is the case in Python. But it seems problematic that numpy needs its own NaN, and Pandas its own NA, instead of sharing common types across the language. Can you mix these two, NA inside a numpy array, for example?
The Python None corresponds to Julia’s nothing.
NaN follows the IEEE 754 definition for floats, in both Python and Julia. As Numpy and Python core have different float implementations, both have their own NaN.
There is no real equivalent of Julia’s missing in Python to my knowledge.
The main issue in Python is that NaN is often used to signal missing or nonexistent entries, i.e. used in a different way than intended in IEEE 754. Furthermore, NaN is only defined for floats, so e.g. integer fields with undefined values may need to be cast to floats.
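A small standard-library sketch of the two points above: NaN has IEEE 754 comparison semantics, and using it as a missing marker pushes an integer field into floats. The sample data and the helper mean_skipping_missing are made up for illustration.

```python
import math

nan = float("nan")

# IEEE 754 semantics: NaN compares unequal to every value, including itself.
nan_equals_itself = (nan == nan)

# A hypothetical integer field with one missing entry, marked with None.
counts = [3, None, 7]

# Using NaN as the missing marker forces every entry to become a float,
# because NaN only exists for floating-point numbers.
as_floats = [float("nan") if c is None else float(c) for c in counts]


def mean_skipping_missing(values):
    # NaN propagates through arithmetic, so it has to be filtered explicitly.
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present)
```

A plain sum(as_floats) comes out as NaN here, which is why every reduction over such a field needs its own skip-the-missing logic.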
Pandas 1.0 recently introduced pd.NA to address this issue, but it is still experimental.
In Julia, you can use Union{T, Missing} or Union{T, Nothing} for all types T.
This is running the risk of derailing the thread (if that’s possible given the broad question that started it!), but Missing isn’t the only approach to missing data in Julia; there’s also the Query.jl approach:
The fact that you can define your own “missing data” type with the desired semantics and easily make it as fast as the “built-in” one is one of the most appealing features of Julia.
Generally, in the long run it should not matter what a language has available out of the box, but how easy it is to add new things seamlessly. Very few languages even compare to Julia for its combination of expressiveness and speed.
If you like R and statistics, I have some examples where Julia is better (in my opinion) here: