Stability of Julia

PetrKryslUCSD · October 10, 2022, 2:52am

I think you missed an important point: if a certain constellation of packages worked for you in 2022, recording the project configuration in the Project and Manifest files means that it will also work at any time in the future.

jar1 · October 10, 2022, 2:55am

That means pinning everything to an old version, which isn’t what people mean by stability.

austin-putz · October 10, 2022, 3:00am

Mostly the tidyverse, since that’s 90%+ of what I do and what most people in our company use. But of course we rely on a lot more, like mixed-model software and many other statistical packages. It just seemed like there were issues with packages being maintained, so I still warn people not to move to Julia, because I never knew whether packages would be maintained long term. From the discussion, it seems more stable than it was thanks to 1.0, but the question still hasn’t really been answered directly. I guess time will tell as we move to 2.0, 3.0, etc. Until then, it’s difficult for many of us to move over, even if we can keep old software running.

Thanks for the tip. I’ve simply given up; I guess I’ll just ask each question on Discourse or another platform like Reddit. I just have no idea why devs get so sensitive… Maybe it happens in all languages, but if I even hint that they could include more documentation, I get blown away with hate. If someone asks me a question, I just add the answer to my documentation, since it must not have been good enough; I guess that’s a problem for most devs. When I started, I couldn’t even figure out how to change column types in CSV.read(), and the dev got mad that I emailed him… Just not a great community for documentation help.

PetrKryslUCSD · October 10, 2022, 3:02am

On the contrary, that is what stability means. You can always reproduce the calculation that was done way back.

Fortran is usually seen as committed to backward compatibility. Unfortunately, that only applies to the source code, which indeed can mostly be compiled even when it is very old. It certainly does not apply to libraries, and certainly not to data. There is no way I can now reproduce calculations I did with my own Fortran codes, because the libraries I used either have changed or are no longer accessible.

I will take the Julia compatibility model any time.

PetrKryslUCSD · October 10, 2022, 3:10am

YMMV, but that is not the experience of everyone here. I hope you realize that your statement is an unwarranted generalization.

jar1 · October 10, 2022, 3:12am

The tidyverse is mostly covered by Base and DataFrames.jl, especially after the next version, DataFrames 1.5 (based on the roadmap). Last I looked, the missing pieces I noticed were mostly functions from forcats, which could be added to CategoricalArrays.jl, and tidyr’s rectangling functions. AlgebraOfGraphics is less mature than ggplot2, but still quite usable.

The linear models ecosystem is much stronger in R imho and I expect it’ll be years before Julia has the complete toolkit at a comparable level of maturity.

dlakelan · October 10, 2022, 3:37am

DataFrames.jl and DataFramesMeta.jl have been very consistent and stable since 2019, when I finally made the leap you are contemplating. CSV.jl is excellent for reading delimited (separated-value) data files. I’m planning to look into Arrow.jl for handling big binary column-oriented files; I hear good things.

For plots, I’ve tried a number of packages and I keep coming back to StatsPlots.jl / Plots.jl (StatsPlots is just Plots.jl plus some extra plot types). It works, it’s very Julian, and I don’t miss the serious problems that ggplot2 has for programmatically constructing plots. (Did you know, for example, that aes_string() and aes_() are deprecated? Do you know the difference between aes(), aes_string() and aes_(), and all the different variants? And if you want to loop over the names of a dataset and plot pairs of variables, it can be a huge pain, particularly if the variables have weird names.)

When it comes to statistical models, I’m a hard-core Bayesian and I tend to go straight to Turing.jl. But sometimes you don’t need that; you just need to fit a simple OLS-type linear model. GLM.jl and MixedModels.jl are both good for that, though probably not as comprehensive as what’s in R. But, as I said, I truly believe that if you are going to do anything more complicated than a simple regression, like the y ~ a + b + c type stuff you’d do in R, you should move to Turing.jl, even if you don’t end up sampling and just use optimization to get a point estimate.
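For readers coming from R’s lm(), the simple y ~ a + b + c case boils down to an ordinary least-squares solve. The sketch below uses only Base Julia and the LinearAlgebra standard library; `ols` is a hypothetical helper and the data are made up, not part of GLM.jl or this thread. GLM.jl’s `lm(@formula(y ~ a + b + c), df)` fits the same model and also reports standard errors and other inference, which this sketch does not.

```julia
using LinearAlgebra

# Hypothetical helper: OLS fit of y on an intercept plus the given predictors,
# i.e. the model y ~ a + b + c for three predictor vectors.
function ols(y::AbstractVector, predictors::AbstractVector...)
    X = hcat(ones(length(y)), predictors...)  # design matrix: intercept column first
    return X \ y                              # least-squares solve (pivoted QR for a tall X)
end

# Made-up example data with an exact relationship y = 1 + 2a - b + 0.5c.
a = [1.0, 2, 3, 4, 5]
b = [2.0, 1, 4, 3, 6]
c = [0.0, 1, 0, 1, 1]
y = 1 .+ 2 .* a .- b .+ 0.5 .* c

beta = ols(y, a, b, c)   # ≈ [1.0, 2.0, -1.0, 0.5] (intercept, a, b, c)
```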

I experienced some of the stuff you are worried about back in 2017 and 2018 or so when I first played with Julia. In 2019 I jumped in with two feet, and haven’t looked back. I now don’t run R at all, and I’d been using it as a daily driver since 1998.

DNF · October 10, 2022, 7:16am

Julia adheres to ‘semantic versioning’, which is a guarantee that there should be no breaking changes without updating the major version number. To be precise:

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible API changes

MINOR version when you add functionality in a backwards compatible manner

PATCH version when you make backwards compatible bug fixes

There’s always a risk of bugs being introduced, or that you may have relied on something which is in fact a bug, but this is a pretty good guarantee.
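To make the MAJOR.MINOR.PATCH rule above concrete, here is a small sketch using Base’s `VersionNumber` type (the `v"1.2.3"` literal syntax). `bump_kind` and `is_nonbreaking` are hypothetical helpers for illustration, not Pkg functions. Note that Julia’s Pkg additionally treats a bump of the leftmost nonzero component of a 0.x version (e.g. 0.2 → 0.3) as breaking; this sketch only encodes the plain SemVer rule quoted above.

```julia
# Hypothetical helpers: classify a version bump under the SemVer rules
# (MAJOR.MINOR.PATCH) using Base's VersionNumber type.
function bump_kind(old::VersionNumber, new::VersionNumber)
    new > old || throw(ArgumentError("new version must be greater than old"))
    if new.major != old.major
        return :major   # incompatible API changes are allowed
    elseif new.minor != old.minor
        return :minor   # backwards-compatible new functionality
    else
        return :patch   # same MAJOR.MINOR: backwards-compatible bug fixes
    end
end

# Under SemVer, only a MAJOR bump may break code written against `old`.
is_nonbreaking(old::VersionNumber, new::VersionNumber) = bump_kind(old, new) != :major

bump_kind(v"1.4.2", v"1.5.0")        # :minor
is_nonbreaking(v"1.6.7", v"1.8.5")   # true
is_nonbreaking(v"1.9.0", v"2.0.0")   # false
```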

This is probably many years away.

Are you sure you’re not exaggerating the reaction? And have you carefully considered the tone of your requests?

Also, you should know that e-mailing devs is not considered the right way of going about this in modern open source development. If there is missing documentation, open an issue at the github/gitlab/etc. repository. If you want support, ask questions at discourse/stackoverflow/slack/zulip. E-mailing devs directly is often seen as ‘poor form’, which may prompt a negative reaction, especially if you are persistent and come across as demanding.

oheil · October 10, 2022, 9:29am

What you describe here is exactly the type of problem I have with R. Whenever I need to use a new package because of some recent publication, I need a recent R version, and I am 100% sure that the old packages I’m using for some pipelines will not work on this new R version (admittedly, this is not true for ggplot2).

In general this is somewhat expected, but with Julia I have full control over versions using environments, pinning, and completely self-contained Julia installations (installed Julia versions do not interfere with each other).

Maturity is great, but the type of maturity we are talking about here has not yet been achieved for Julia v1, and I am pretty sure Julia v2 will not be the solution for this. I also doubt that R is in that state, at least not for recent publications, and it never will be, because the development process doesn’t enforce it (PhD students don’t think about this until it’s too late). For me, it’s better to adapt to the fast changes and freeze something when I need it. This can be done with ease in Julia, not so in R and not so in Python. Matlab probably has more of this type of maturity and stability, but I don’t know much about it.

sylvaticus · October 10, 2022, 11:05am

A very quick answer: yes, Julia and many of the most used packages, having reached version 1.x, are now stable. Of course, it is a “promise”, but it is a promise that wasn’t made before…

austin-putz · October 10, 2022, 1:27pm

I’ve made every plot I can think of for almost 10 years (I started with ggplot2 in 2013), so your ggplot2 attack is confusing to me. I don’t care about all of Hadley’s issues with his syntax. He has changed functions, but often by moving them into a new package (reshape2 → tidyr). I will admit there have been some breaking changes in dplyr and tidyr from time to time, but they are very well documented, unlike almost every package in Julia.

Thank you for the rest of the information; I will consider these packages if we move some items to Julia. I especially need a function for quadratic programming (I think we use quadprog in R, for a constrained linear regression problem that constrains the output percentages to sum to 1), as this takes many hours on HPC with R and I’m sure it would be very quick with Julia. Thanks for your comments, I’ll consider all of them.
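When the only constraint is that the weights sum to 1, this kind of constrained regression has a closed-form solution: solve the KKT (Lagrange-multiplier) linear system directly. The sketch assumes exactly that; if the weights must also be nonnegative, as quadprog-based setups often require, you need an actual QP solver (for example a modeling package such as JuMP.jl with a QP-capable solver). `sum_to_one_lsq` is a hypothetical name, not a library function, and the toy data are made up.

```julia
using LinearAlgebra

# Hypothetical helper: least squares with the weights constrained to sum to 1,
#   minimize ||X*w - y||^2  subject to  sum(w) == 1,
# solved via the KKT (Lagrange-multiplier) system
#   [X'X  1; 1'  0] * [w; mu] = [X'y; 1].
# Only the equality constraint is handled; there are no nonnegativity bounds.
function sum_to_one_lsq(X::AbstractMatrix, y::AbstractVector)
    p = size(X, 2)
    o = ones(p)
    XtX = X' * X
    Xty = X' * y
    K = [XtX o; o' 0]        # (p+1) x (p+1) KKT matrix
    rhs = [Xty; 1]
    sol = K \ rhs
    return sol[1:p]          # drop the multiplier mu, keep the weights
end

# Toy example: unconstrained LS would give [1, 1]; the constraint pulls it to [0.5, 0.5].
w = sum_to_one_lsq([1.0 0.0; 0.0 1.0], [1.0, 1.0])
```

Because this is one small dense linear solve per fit, it should be cheap even for many repeated fits; the cost only grows once inequality constraints force an iterative QP method.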

austin-putz · October 10, 2022, 1:31pm

Thanks for your comments. I don’t have that experience with R. Of course, Hadley and the tidyverse have changed things over time in ways that are breaking, I guess (even recently I noticed that top_n() has been replaced, I think by some slice functions). This may be what you are referring to, but all in all, I’ve made the same plots in ggplot2 for 10 years and it’s well maintained. My worry is that UnicodePlots will be gone the next time we need to code something, leaving me yet again having to learn a new plotting package for simple plots. Maybe I chose the wrong packages before, and I did learn a lot before v1, so perhaps I just need to dive back in, but I wanted to ask real users. I’m an outsider today wanting to take another look, but it’s hard to see what has changed and whether devs are staying around for the long haul. We’ve already had many in my field give up on Julia due to changes, so it’s hard for us to go back to it. JWAS is one we use heavily, and it’s still maintained, but others have abandoned it, which is frustrating for users.

austin-putz · October 10, 2022, 1:36pm

Thanks for the info. I appreciate it.

Yeah, I don’t understand the tone part… I guess that’s up for interpretation. Tone is something you can tell when you talk to someone, but I just feel devs get very defensive. If the documentation doesn’t exist, why do devs get so defensive? For me, it’s easier to write documentation the first time than to respond to emails and go back and forth that way. I didn’t know we can’t email devs; it just seems odd that I can’t email to ask a question or suggest a documentation addition. R is poorly documented in many areas, yet it runs circles around Julia documentation (at least the many times I tried learning Julia). Hadley really improved things with all of his documentation (cheat sheets, online intros to each package, etc.). I’m just always so curious why devs spend years developing packages and yet there is hardly any documentation for much of it online. Maybe that has changed in the last 3-4 years; I’ve been in industry and haven’t had time to go back and learn, which is why I asked this question.

austin-putz · October 10, 2022, 1:37pm

Your reaction is exactly my point.

pdeffebach · October 10, 2022, 1:40pm

Here is the “Examples” section of the CSV.jl documentation. It seems very robust to me.

austin-putz · October 10, 2022, 1:54pm

I was talking about something like 3 years ago. Whether most packages have gotten a lot better is what I was looking to find out with my question.

pdeffebach · October 10, 2022, 2:00pm

Then things have gotten better, yes.

I personally put a lot of work into a tutorial for dplyr users looking to get started with DataFramesMeta. See here.

dlakelan · October 10, 2022, 2:17pm

ggplot2 and all of the Hadleyverse make heavy use of nonstandard evaluation in R. This is also known as an fexpr in the LISP tradition, where it was first described (see Fexpr - Wikipedia).

Fexprs have been widely regarded as a bad idea for a number of reasons, both for performance and for the programmer reading and analyzing the code. A huge reason for me to leave R was that it’s become polluted with fexprs everywhere. Many people will find them convenient, particularly if they don’t go deeper than surface-level use of the Hadleyverse. But there is a good reason that they were tried in LISP in the 1970s and 1980s and rejected in favor of macros.

Julia’s speed absolutely depends on the fact that it doesn’t have fexprs and therefore can be optimized and compiled effectively. But beyond speed, a Julia programmer can always look at a piece of code and know what will be evaluated and what kind of thing it evaluates to. This is a huge boon for anyone reading code or trying to use Julia code as scaffolding for new functionality. IMHO, if you want to extend the functionality of ggplot2 etc., good luck to you. If you want to extend Julia code, you have a much better chance.
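As a small illustration of the macro alternative mentioned above: a Julia macro also receives its argument as unevaluated syntax, but it runs once, when the code is expanded, and hands back ordinary code that the compiler can then optimize. `@show_and_eval` is a hypothetical macro written only for this example; `@macroexpand` is part of Base.

```julia
# Hypothetical macro: takes the *unevaluated* expression, turns it into text,
# and returns ordinary code that builds a (source text, value) tuple.
macro show_and_eval(ex)
    text = string(ex)                  # runs at expansion time, not at run time
    return :(($text, $(esc(ex))))      # esc: evaluate `ex` in the caller's scope
end

m, n = 2, 3
result = @show_and_eval m + n          # ("m + n", 5)

# @macroexpand shows the plain code the compiler actually receives:
# a tuple expression with the source string already baked in.
expanded = @macroexpand @show_and_eval m + n
```

Unlike an fexpr, nothing here inspects or re-interprets its arguments while the program runs; all the syntax manipulation is finished before compilation.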

pdeffebach · October 10, 2022, 2:19pm

dplyr has its known problems, but that seems like a digression from OP’s point about documentation.

jules · October 10, 2022, 2:20pm

I personally tend to get a little defensive as well when I read these stability discussions. I mean, ggplot2 was first created in 2005 if Google is right, so in 2013 it was already 8 years old. That’s more time than most Julia packages have had to become stable, so it seems unfair to expect the same kind of stability from this ecosystem. How are packages in the Julia space supposed to innovate if users get frustrated by breakage? It can’t really be done. I’m often surprised that Julia itself has fared so well since 1.0, and that all these features could be added since then without breaking the API.

So if I read “why is the Julia ecosystem not stable/mature” and the answer “time” is not enough, I’m not sure where to go from there. Nobody blames you for using R if stability is paramount to you. I guess people reflexively bring up points where Julia excels aside from stability because they think there’s more to the discussion.

Unless we talk about specific instances of breakage that could have been avoided if we had been more careful, I don’t think this topic is all that fruitful.
