Stability of Julia

Since you referred to tidyr, let me comment on DataFrames.jl, which I co-maintain (@jar1 and @pdeffebach have already touched on this).

The status is:

  • it has been actively maintained, and we plan to keep it that way in the future; all bugs are fixed within several days of being reported at most;
  • it is backward compatible (barring some “corner cases” which are on the border of being a bug - such things happen in every ecosystem and devs need to decide what to do in such cases; these decisions are never taken lightly);
  • it is almost feature complete (and AFAICT all the missing features are doable, but maybe not always very convenient);
  • I have written over 100 blog posts on how to work with it (you can find them here) and there are, I think, 10 curated tutorials (i.e. kept up to date along with releases of the package; you can find them here). Additionally, the Julia for Data Analysis book is already available and covers the package plus the knowledge of Base Julia required to work with it efficiently (you can now browse it online for free; soon it will be possible to buy a hard copy).

Given this, my question could be (and I really mean it - I currently have a grant that can finance hiring a developer who would add things that are missing): what more would you want in DataFrames.jl so that it would meet your needs?

I do not think it helps anyway, and it is actually off-topic here, but that kind of sweeping generalization could only be warranted if you knew the current state of the documentation for at least a sufficiently large and diverse sample of Julia packages. As you simultaneously admit you are not acquainted with the current state of the Julia ecosystem, it comes across as arrogant, and may get the kind of reaction you are getting.

Maybe first look into documentation for some major packages like CSV, DataFrames, Plots?

Unfortunately there is a simple explanation for it: they are not paid for documentation, nor is writing docs something most of them like to do.

Interesting, I didn’t know this. I’ve just never had any issues with it. It’s made R so much more user friendly. I can pipe and manipulate all sorts of data (filter, pivot, summarize, etc.) and make plots that would have taken me much longer before, in many more lines of code. I guess programmers can debate the implementation, but as a user, it’s phenomenal, user friendly and well documented. For my part, I’ve never understood why we need macros; they were much more annoying to figure out than pipes when I was learning Julia. And when I ask, I get all sorts of odd answers as to why we need macros. Does Python have macros? Does every language have them? They seem unnecessary to users, but devs love them.

I can get on board with this. I just feel like if I developed a package I would spend many hours developing documentation. I have a whole website, and much of my GitHub contains documentation for OTHER people’s packages (these are not Julia related). It seems so odd that we have to email devs dozens of times (this is a current R package we are working on) instead of them just writing documentation in maybe less than a week’s work. They’d rather spend hundreds of hours emailing back and forth with us to figure out simple tasks? Very odd… I’ve written full documentation for software in less than a day, and people come to my website to learn because this software has little to no documentation on critical elements… I’ll always be confused by this.

Yes, you are correct, I’m referring to the Julia of 3-4 years ago, i.e. the 2018-2019 versions. What I came to learn is whether this has improved at all or not. I didn’t have time to look through 100 packages; I just wanted to know if you thought they were all 100% better than they were. Please let me know if you think they now have every option documented, along with how to change them.

What package are you referring to, in particular? We’ve linked you to many well-documented packages and tutorials.

Perhaps this is better handled by filing issues (not emailing the devs) on the packages you find problematic.

Great information, thank you very much. I will look at your references. I didn’t mean to single out any one package. Just in general, I remember learning a plotting package, and then when I went back to Julia and updated (to use a new package, I think JWAS) I realized that package no longer existed (I thought it was Gadfly, but I just downloaded that in Julia 1.7… so it must have been a different one). This is the sort of thing that drives us base dummy users away from Julia, and I’ve had many conversations about it. Often 99% of my responses are from devs who love the language, and I can see why, while many users may just want to code something for production and may go back to other languages (like we do). I need Julia for some tasks, but overall R is much easier and I don’t have to deal with some of the issues that plagued Julia. I was just wondering if this was stabilizing (I guess I should have used “maturing”…) in current versions.

This documentation is so key and was lacking when I tried to learn Julia. Thank you. Do you have any “Julia from R” books or documentation that may make it easier? Even something as simple as saying stringr → ??? package in Julia, or things like that. Most of my coworkers and grad students at the university will be coming from R.

Thanks for your answer. I completely understand it takes time. But it would also have been nice if devs and the Julia community had warned us at that time… Some people got excited and we all moved over, and then maintaining our code took 99% of our time vs actually doing analyses. That is easier for devs who do this full time, but it is very difficult or impossible for industry in biological sciences to spend that much time updating code. I understand it has likely become more stable since v1 (packages mostly, not core). I was just wondering how mature and stable these were, and then a side conversation broke out about how much better documentation has gotten. The language first needed to stabilize before permanent documentation could be written, and maybe this was 99% of the problem. I really like the language, but it was so difficult to learn 3-7 years ago when I started. I gave up many times and came back, hit the same problems, and stopped again. And R seemed to keep getting better (tidymodels and many more packages). I don’t mind the updates, but at some point I also wanted it to slow down so I could come back in 6 months and not have to relearn packages.

That’s a needless worry, since it’s impossible for UnicodePlots or any other registered package to go away (that’s something that happened in the JavaScript ecosystem, and we learned from it, disallowing it).

UnicodePlots is up to 3.1.3, and that doesn’t mean e.g. 2.0 or 1.0 went away. I’m not up to speed on its major version upgrades. In general they ARE meant for breaking changes, though maybe some software uses them only to signal new features? [E.g. Genie is up to 5.7.0, and such high version numbers are rare.] “Technically breaking” changes are a thing in Julia, so maybe a very minor thing “broke”, or something more major did with a major version jump, but as I said, you can always use old versions if that were a problem. I think I would have heard if there were any actual problems.

While you can’t revoke a package (there have been some rare allowed exceptions for new packages that shouldn’t have been registered), a malicious developer could in theory change code in a bad way (or do so unintentionally) in a minor (or major) version. That’s a problem for all software, open source and closed/proprietary alike, but you guard against it by not updating.

Julia’s package manager is awesome, but it’s good to know that adding a package can downgrade other packages (or, more likely, upgrade them, which is usually good but in theory can be bad). The “tiered” preserve level is the default, but if you worry you can use:

pkg> add --preserve=all Example
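The same can be done from the functional Pkg API. This is only a minimal sketch, assuming network access to the General registry; the throwaway environment and the extra pin step are illustrative additions, not part of the original suggestion:

```julia
using Pkg

# Work in a temporary environment so the default one is left untouched.
Pkg.activate(; temp=true)

# Functional form of `pkg> add --preserve=all Example`: do not change the
# versions of anything that is already installed while adding the package.
Pkg.add("Example"; preserve=Pkg.PRESERVE_ALL)

# Pin the installed version so that a later `Pkg.update()` leaves it alone.
Pkg.pin("Example")

Pkg.status()
```

The Manifest.toml written next to the project file records the exact versions, so committing it alongside an analysis keeps the environment reproducible.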

This thread is looking more and more like a provocation… of course “100%”, “all options” documented is not something to expect from any library or language.
You have been given the answer: compared to 3 years ago, Julia and most of its main packages are much more stable. Now it is up to you to try it and see if it suits your needs.

Yes, I’d say fortunately fexprs are not required for this level of convenience.

When it comes to “piping” data through a pipeline, please see DataFramesMeta and its @chain macro. It does the same exact type of thing, but without the gotchas that occur in fexpr-based systems. For example, suppose one of your variable names is “sum” but “sum” is also the name of a function you want to call on this variable… @transform(:thesum = sum(:sum)) is perfectly fine, because sum() is a function call and :sum is a quoted symbol, which in the context of a DataFramesMeta macro means the column of the data frame named “sum”, and there is well-defined syntax separating the two. There is a difference between the symbol :sum and the function sum named by that symbol, and there is a context in which it’s clear that the meaning is altered (i.e. inside @ macros).

Why do we need macros? Because macros enable the creation of domain-specific languages within Julia itself. This makes expressing complex calculations convenient. In fact it’s what makes R so convenient, except that all Hadley has to work with is nonstandard evaluation, which is slightly different from macros.
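To make the sum(:sum) case concrete, here is a minimal sketch, assuming the DataFrames and DataFramesMeta packages are installed; the data frame and the column values are made up for illustration:

```julia
using DataFrames, DataFramesMeta

# A data frame whose only column is deliberately named `sum`.
df = DataFrame(sum = [1, 2, 3])

# `sum(...)` is the ordinary Base function; `:sum` is the column.
# The scalar result is repeated for every row of the new column.
out = @transform(df, :thesum = sum(:sum))

# The same idea as a pipeline: each step receives the previous result
# as its first argument.
total = @chain df begin
    @rsubset(:sum > 1)              # keep rows where the column exceeds 1
    @combine(:total = sum(:sum))    # add up what is left
end
```

Nothing here depends on which names happen to be in scope: the leading colon is what marks a name as a column.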

Since this is meant as an off-the-cuff example, please be generous when interpreting the following… Suppose that in R you say sum(sum). In base R the function sum is a well-defined function, and functions can take other functions as their arguments. So sum(sum) should mean “call the function sum, passing it the function sum as its first argument”. This is of course meaningless, but it is very clearly what you asked for… Of course, within the Hadleyverse he may capture that expression and convert it to mean something like “add up all the values in the column named sum” under the hood somewhere, but it’s not clear exactly where or in what context such a thing will occur… It is a tremendous source of bugs for anyone doing anything other than being a pure user of the Hadleyverse.

Seriously, I understand the frustration from earlier but I think already by mid 2019 onward the ecosystem had gelled dramatically. By moving it’s not just “I get most of what R can do, but some of it is faster”. You actually get dramatic improvements in the precision of the semantics of the language and also things are faster.

The point of me saying all this is to change your view on what it is that moving to Julia gets you. What it buys you is a number of advantages which are hard to explain to an average bioinformatics user, but after you’ve been using them for a while you will come to say “I never want to go back”.

stringr - AFAICT Base Julia essentially covers what stringr does. There might be some advanced features that are missing, but I have not done a detailed comparison to list them with confidence.
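As a rough illustration, here are a few Base Julia calls next to the stringr functions they approximately correspond to. The pairing in the comments is my own loose mapping rather than an official table, and the sample strings are made up:

```julia
s = "  Julia for R users  "

trimmed = strip(s)                           # ~ str_trim
n       = length(trimmed)                    # ~ str_length
upper   = uppercase(trimmed)                 # ~ str_to_upper
has_r   = occursin("R", trimmed)             # ~ str_detect
swapped = replace(trimmed, "R" => "Python")  # ~ str_replace_all (every match)
words   = split(trimmed)                     # ~ str_split on whitespace
head5   = first(trimmed, 5)                  # ~ str_sub(x, 1, 5)
joined  = join(words, "_")                   # ~ str_c(x, collapse = "_")
padded  = lpad("7", 3, '0')                  # ~ str_pad(x, 3, pad = "0")

# Regular expressions are built in via the r"..." string literal.
nums = [m.match for m in eachmatch(r"\d+", "a1b22c333")]  # ~ str_extract_all
```

All of these live in Base, so no package needs to be installed or loaded.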

As for R → DataFrames.jl you have:

I think it can be summarized briefly: the design is geared towards reproducible research.

This means that the data ecosystem is designed in a way that you should not accidentally get a wrong result. If something is wrong you will get an error. For example in DataFrames.jl we never emit warnings. Either something is correct - and we produce the result - or is incorrect and we error.
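Two small cases show the “error, never warn” behaviour. This is a sketch assuming DataFrames.jl is installed; the column names and values are invented, and the exact error messages may differ between versions:

```julia
using DataFrames

df = DataFrame(id = [1, 2, 3], x = [0.5, 1.5, 2.5])

# Asking for a column that does not exist throws instead of quietly
# returning a placeholder value.
missing_col_failed = try
    df.y
    false
catch err
    println("error: ", sprint(showerror, err))
    true
end

# Building a data frame from vectors of unequal length also throws;
# the shorter column is not recycled to fit.
bad_lengths_failed = try
    DataFrame(a = [1, 2], b = [1, 2, 3])
    false
catch err
    println("error: ", sprint(showerror, err))
    true
end
```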

Similarly, there is no “magic happening under the hood”. Everything is explicit. Sometimes it adds a bit of typing (not much), but the benefit is that well-written Julia code can be parsed in your head without having to run it.

The example that @dlakelan has given is one such case. A column name is prefixed with : in DataFramesMeta.jl. Does it add typing? Yes, you need the extra :. But what is the benefit? If you see sum(:sum) you are 100% sure it means that you want the sum of a column named :sum (you do not need to run the code to be sure that this is what the expression means). In R the result of sum(sum) would depend on how sum gets resolved in nested variable scopes.

I don’t use R, but out of curiosity I checked this table: https://github.com/rstudio/cheatsheets/blob/main/strings.pdf

What I find nice in Julia is that I know how to do all of those operations using base julia functions, which apply for any other type of data as well.

Thus, I don’t actually have to learn any package to do all that, and from my point of view it is quite annoying to think that I would have to discover which specific package and function name has to be used for each of such operations.

Once you get familiar with Julia, something about the mindset changes, because you actually don’t need to find a package for everything you want to do, even if you need that to be done super fast. Many (many) times just combining a few functions one knows already or a simple loop solves the problem.
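A tiny illustration of the “simple loop” point, using only Base Julia. The function name count_chars and the sample sequence are made up for this sketch:

```julia
# Count how often each character occurs in a sequence with a plain loop.
function count_chars(seq::AbstractString)
    counts = Dict{Char,Int}()
    for c in seq
        counts[c] = get(counts, c, 0) + 1
    end
    return counts
end

base_counts = count_chars("GATTACA")

# Combining two functions one already knows answers a related question:
# the fraction of G and C characters in the sequence.
gc_fraction = count(c -> c in ('G', 'C'), "GATTACA") / length("GATTACA")
```

Because loops are compiled, the explicit version does not have to be rewritten in vectorized form to run fast.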

Thanks for your comments. That helps me.

Thanks for your comments. I agree, probably much of that can be done with base Julia. R just has really good additional packages, the tidyverse functions are so much better than base R in many ways. With Julia coming later, I think base functions are much better.

Thanks, I didn’t know this. I will explore.

This is an excellent answer (toward the bottom) and exactly what I was looking for in my question. Thank you. I do think it’s gotten much better and I may be able to convince others to move with me, but in a large company we do have to consider that everyone else is using R, which causes compatibility issues, especially if I were ever to leave and someone had to maintain my pipelines in Julia vs R.

On the top part, I still don’t think this type of thing is an issue for the average user. Unless they are really dumb and name something sum over the base function, it really should never come up. I’ve been programming for 10+ years in R and never once had this type of thing come up… The biggest issue in R is functions from different packages overlapping, so that you have to use :: to indicate which function you mean. That has happened from time to time; I now always load tidyverse last and no longer have these issues. This seems to be more of a theoretical programming design thing that no users care about or that really impacts them. Maybe if there were a performance difference or whatever, I could see it, but dplyr has really good performance in general. I think it’s written in C++ with Rcpp, from what I understand, as many of his packages are.

On macros, okay, I see your point, but this also makes learning Julia a nightmare. I tell people it’s like trying to learn 15 languages in one. It took all the best parts of all the languages and smashed them into Julia. Which is awesome, because you get all those 15 best parts in one language… Trouble is, us dummies from R only know R and now have to learn 15 different styles of coding (including things like macros), vs R where most things are a function (I think Python went from stuff like Julia to ‘everything is a function’, and I appreciated Python 3 much more than 2). I guess it does make some things better, but having to learn it all is cumbersome to say the least. I can’t imagine teaching my students (in animal sciences) Julia. R is difficult enough for them as they had 0 programming background. Tidyverse makes it even easier.

I mean, not really. When you submit to CRAN, all options have to be documented for the most part. I at least get something when I go to help in R. 3 years ago I went to some really big packages in Julia and not much beyond the first 3 options was documented. I then emailed the dev to get answers on how to read in strings vs numbers (my IDs were numeric but should be treated as strings). It was CSV.read(), and I googled for a few days and had to email the dev, who got mad at me for emailing him. All I wanted was for him to document the other options in his function, a simple request I thought. colClasses within read.table() is well documented in R and not a problem for new users.
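For readers with the same question today: in current CSV.jl releases the types keyword plays roughly the role of colClasses. A minimal sketch, assuming CSV.jl and DataFrames.jl are installed; the data and column names are hypothetical, and this says nothing about how the versions from three years ago behaved:

```julia
using CSV, DataFrames

# Hypothetical data: the id column looks numeric but has leading zeros.
data = """
id,weight
007,12.5
042,13.1
"""

# `types` accepts a Dict mapping column names to the desired element type,
# so the ids are kept as strings while weight is still parsed as a number.
df = CSV.read(IOBuffer(data), DataFrame; types = Dict(:id => String))
```

The same keyword should work with a file path in place of the IOBuffer, and typing ?CSV.File at the REPL lists the other keyword arguments.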