Discussion on "Why I no longer recommend Julia" by Yuri Vishnevsky

It’s tricky… There is a price we pay for having interfaces evolve according to consensus and need, and this post points right at it. But I think what we are doing is also quite powerful in combination with multiple dispatch and interoperability. That positive side is more difficult to explain and grasp, so it’s important to be very conscious of it.

3 Likes

Standardized interface tests sound great, and the easier they are to pass around, the better. But adding tests isn’t the whole solution. Take the example where the author ran into StatsBase.jl still assuming 1-based indexed arrays in @inbounds code. That assumption isn’t confined to StatsBase.jl; for example, you can also find it in matrix multiplication. The functions require_one_based_indexing and has_offset_axes exist so that code can throw an error when it is called with non-1-based arrays, as an alternative to generalizing that code. My guess is that it’s going to take a LOT of developer-hours to make and test all those changes, and developers are likely spread thin on more pressing priorities; each cited bug, or OffsetArrays support in general, may seem critical in isolation, but they seem small next to many of the other issues.
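To make the guard pattern concrete, here is a minimal sketch. ShiftedVector and first_element are hypothetical names made up for illustration (ShiftedVector stands in for OffsetArrays.jl so that nothing outside Base is needed); Base.require_one_based_indexing and Base.has_offset_axes are the unexported Base functions mentioned above.

```julia
# Hypothetical minimal vector whose valid indices start at offset + 1.
# It stands in for OffsetArrays.jl so the sketch needs no extra packages.
struct ShiftedVector{T} <: AbstractVector{T}
    data::Vector{T}
    offset::Int
end

Base.size(v::ShiftedVector) = size(v.data)
Base.axes(v::ShiftedVector) = (Base.IdentityUnitRange((1:length(v.data)) .+ v.offset),)
Base.IndexStyle(::Type{<:ShiftedVector}) = IndexLinear()
Base.getindex(v::ShiftedVector, i::Int) = v.data[i - v.offset]

# Hypothetical function written with 1-based indexing in mind.
function first_element(A::AbstractVector)
    Base.require_one_based_indexing(A)  # refuse offset axes rather than misbehave
    return A[1]
end

v = ShiftedVector([10, 20, 30], 2)      # valid indices are 3:5

Base.has_offset_axes(v)                 # true
Base.has_offset_axes([10, 20, 30])      # false
first_element([10, 20, 30])             # 10
# first_element(v) throws an ArgumentError instead of returning a wrong element
```

The guard turns a silent wrong answer into a loud error, but it does nothing to make the function work on offset arrays; that still requires rewriting the indexing logic.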

For now, it might be better just to document limitations more clearly: for example, “The AbstractArray interface intends to allow non-1-based indexing and you could fully leverage this in your own code. However, some AbstractArray code in Base or other packages has not been updated to match, so it may not be possible to fully compose that code with yours.”

4 Likes

@mcabbott I broadly agree with this, but I also believe it would be extremely useful to try and figure out what the quirks of these packages are, and isolate them in a standard set of tests in the stdlibs / Base, or some other minimal package (preferably with no deps outside the stdlibs) called AbstractArraysTesting.jl or something.

To my mind the limitation of asking people to test on an ad-hoc collection of arrays found in the wild is that it complicates their test deps, and it requires everyone to go hunting for interesting arrays for their package, which is a pain and will presumably lead to different people testing on different things and getting different levels of robustness / coverage. Of course, you could fix the latter limitation by collecting and publishing a set of recommended arrays in a package like AbstractArrayTesting.jl or something, but that wouldn’t get you around having loads of additional deps.

6 Likes

One way of solving the (2) part is to define the interface by tests. E.g. create a package “AbstractArrayInterface.jl” that defines the functions and the correct behaviour of abstract arrays and gives you canonical examples of how to use them correctly (and maybe even implements a trivial example). Then you have a versioned and testable interface which you can use to test the compliance of your arrays, or pull into your own test suite to test your functions.
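A rough sketch of what “interface by tests” could look like, using only the Test stdlib. The function name test_vector_interface and the particular checks are assumptions chosen for illustration; they are not taken from any existing package, and a real suite would cover far more of the AbstractArray contract.

```julia
using Test

# Hypothetical compliance suite: the name and the chosen checks are
# illustrative only, not an existing package API.
function test_vector_interface(A::AbstractVector)
    @testset "AbstractVector interface: $(typeof(A))" begin
        @test length(A) == prod(size(A))
        @test map(length, axes(A)) == size(A)
        @test firstindex(A) == first(eachindex(A))
        @test lastindex(A) == last(eachindex(A))
        # Iteration and indexing must agree, wherever the indices start.
        @test collect(A) == [A[i] for i in eachindex(A)]
    end
end

# Any array type can be passed in, including ones from other packages.
test_vector_interface([1.0, 2.0, 3.0])
test_vector_interface(view([1, 2, 3, 4], 2:3))
```

A package author would call such a function from their own runtests.jl with instances of their array type, so the interface definition and its version live in one place.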

As a proof of concept I created this package: kalmarek/GroupsCore.jl (an interface for abstract groups), which standardises/sets in place something called “GroupInterface”. If I implement a structure representing a group I can immediately test it against the interface, e.g. Groups.jl/group_constructions.jl at 2e544d623f490da9997a751c7ad02f1a5c313903 in kalmarek/Groups.jl.

6 Likes

Hi new Chris, it wasn’t very dangerous (no more so than having no checking at all, as in e.g. C/C++) until OffsetArrays, and then only in that scenario, so let’s not throw the baby out with the bathwater. But you made me think: I think we should disable it in that case.

Note that any user can disable it globally:

--check-bounds={yes|no}   Emit bounds checks always or never (ignoring declarations)

So you can already test your OffsetArray-using code against a package that wasn’t written with it in mind. I would just want the “yes” behaviour applied to any non-1-indexed array, and “no” otherwise. Is that possible? I think it is: the macro has access to the array and knows it’s of the OffsetArray type.

What would be ideal is that if the OffsetArray is actually indexed by an index given by eachindex(A) then the current behavior is retained. But it returns an Int64 (i.e. Int), the same as e.g. 1:length(A), so it seems impossible. But could eachindex be made to return a new type Int64_safe_array_index just to signal that to the macro, and indexing (i.e. getindex) be made to accept it?

Testing (and bounds checking is testing) can show the presence of bugs, but not their absence. In a lot of cases it would find them, and we could fix the package ecosystem. The problem is that once we do, OffsetArrays would be slower and second-class in that respect. We might need to make up a new macro @oinbounds (might not be needed if the Int64_safe_array_index idea works) that keeps the current semantics of @inbounds, and/or add a global user setting to similar effect.

1 Like

I get this completely. Honestly, I find contributing to Julia packages very unrewarding in most cases except for a few. Due to this, I think that few people actually have the chance to learn and grow into becoming a great software developer/maintainer.

Contributing is often unrewarding because if the PR is not good enough, it is ignored or feedback is given in one sentence. You could have spent multiple hours trying to do your best on something and all you get back is two thumbs down and a “Doesn’t work. Will fail when X”. At the same time, many repositories now have CI disabled by default for new contributors, and reviewers don’t often spend a few minutes on actually modifying the PR if it is at 80%. This disabled CI and having to fix minor styling issues is a tremendous waste of time for the person who opens a PR. Nobody likes to spend 30 minutes on a style change, then wait 24 hours to get CI approved, spend 30 minutes fixing a Windows 32-bit problem, wait 24 hours for another review comment and spend 30 minutes on fixing a minor detail. I don’t get this. Re-reading a PR again and again is also a waste of effort for the reviewer. Allowing newcomers to run CI automatically takes 1 minute once if you know where to find the setting.

Why not take the following as the guideline for merging: If the PR is an improvement over the current state or can be easily reverted, then merge or finish it and merge. This would motivate newcomers much more. Chris Rackauckas already does this and I don’t understand why this isn’t the main philosophy among reviewers.

And, yeah, sure. I admit that I have submitted some terrible PRs, created terrible packages and terrible issues. I should have spent much more time on some PRs to think about things and what not. In my defense, let’s bring in the quote from Michael Jordan:

I’ve missed more than 9,000 shots in my career. I’ve lost almost 300 games. Twenty-six times I’ve been trusted to take the game-winning shot and missed. I’ve failed over and over and over again in my life. And that is why I succeed.

So, let’s all try to be more supportive of people who open terrible PRs or issues. Let’s support them to try again, so that they can succeed.

EDIT: Tim Holy posted some insight on why reviewing PRs is not as easy as it may sound: Improving the Julia issue tracker - #4 by tim.holy.

46 Likes

Fully agree. A mechanism to solve the issue should consider some reward system.

I don’t know if I fully support this guideline. The real problem is not that we have too many terrible PRs. The problem as I see it is that the Julia community is comprised of two extreme profiles:

  1. Extremely qualified researchers with excellent software skills that can follow good software development practices by heart. They usually come from advanced HPC, linear algebra, stats communities and have programmed in C, C++, Fortran, … in the past. They know how to cope with the complexity of major projects.

  2. Very beginner users who never programmed in low-level languages before. They were not exposed to these software development issues in their past projects, which are usually short scripts or tiny packages that combine existing packages.

We lack the profile in the middle of the scale: professionals in the industry who are capable of contributing PRs of good quality. They have experience working in large projects in teams and value the mechanisms adopted to reduce the noise in peer review (e.g. code style, file tree structure).

My conclusion is that we need to target the middle profile more in our social events.

How to convert an intermediate user into a maintainer?

That is the main question in my opinion.

26 Likes

It’s almost as if you get users from both ends of the two-language spectrum :wink:

18 Likes

Do you mean like this?

for i in eachindex(A) 
    B[i] = 0
end

:wink:

2 Likes

Hi everyone,

As a new user of Julia, I have some questions about this article. (I apologize that I didn’t understand most of it.) Personally I want to use Julia to do some calculations in physics and math. In this article, the author mentions some issues with StatsBase; does that mean the results can be inaccurate when I use functions from StatsBase? Also, is it fair to say that most of the problems come from the package OffsetArrays? If I define my abstract array carefully, I can avoid these problems, right? Thanks.

4 Likes

Well, normally there is no need to define your own abstract array…

If you use one based arrays there should be no problem at all.

7 Likes

I think this is the most unfortunate thing about Yuri’s article – you are now globally worried about StatsBase, but the examples that Yuri gave work fine with standard arrays – they are just failing if you use non-standard indexing like OffsetArrays allows.

30 Likes

“Professionals in the industry”, which “industry” do you have in mind ?

One other issue is that many professionals cannot contribute to Julia without entering a conflict of interest: Julia Computing and other businesses around Julia are considered competitors by their employers.

Am I the only one in that situation?

1 Like

No particular industry.

Can you give an example of conflict you experienced?

My trade is in Electrical Design Automation, and JuliaSpice aims to compete with tools from my company.

This is a great point about the “sticks” or barriers to contributing. But the “carrots” are just as big a problem. Maybe bigger, when you consider what obstacles people endure to publish papers.

In academia there are no incentives or rewards for making minor or mundane contributions to a software project, at least outside of CS. It fails on many counts: it’s not a paper, it’s not original, it’s not intellectually dazzling, you’re not the leader, it’s not X (where for me, X = math). Software contributions are more akin to referee work than research, but it doesn’t even get proper service recognition.

Even though I’m at a career stage where I can mostly ignore the consequences of that attitude for myself, such work makes me undesirable to work with, especially for students and postdocs who think they want a research career. Never mind that I have been credited with 49,000 downloads on the MathWorks file exchange, while most math papers are lucky to get 2 non-self-citations: the system, and the culture that thrives on it, are bent on perpetuating themselves.

44 Likes

If you can separate the commercial components from the open source components, then you can contribute to the open source components maybe? I am assuming that you still rely on many open source dependencies that are not only used for Electrical Design Automation?

Previously when I was working for IBM I had to submit forms to the company to contribute to specific open source projects. You may request a similar process at your company so that everyone is on the same page. I believe you will be able to contribute to open source components without compromising your competitive advantage.

Thank you for your advice.

From what I was told, it seems that when Julia Computing was in the early rounds of funding, many companies were approached for some kind of partnership.

Unfortunately (for me) mine decided not to go along, having their own solution(s) to promote.
I do use Julia at my company, but it’s clear I can’t use it for things that get shipped outside.

I had no issues contributing device drivers to Linux long ago without even checking with my employer, understanding that I wasn’t hurting their business. But in my case, anything I know related to modeling, simulation and system identification is considered strategic and sensitive.

1 Like

I have been fortunate to have recently moved to a Software Engineer role from being a research academic while being employed at an institution willing to invest in open source infrastructure. We recently launched an Open Science Software Initiative:

We have a ways to go to raise the standards for academic open source software, but I do think we are starting to see some funding for this purpose. That said, Julia’s package and GitHub architecture make this a lot easier. We just need to expand the tooling as we gain collective experience with Julia.

15 Likes

As such, when iterating over an entire array, it’s much better to iterate over eachindex(A) instead of 1:length(A). Not only will the former be much faster in cases where A is IndexCartesian, but it will also support OffsetArrays, too.

Ok, I guess if that was a part of the official documentation, then that’s the solution – we should consider 1:length(A) as bad practice, or at least highlight eachindex(A) as the best practice, especially in conjunction with @inbounds.
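For anyone who wants to see the difference rather than take it on faith, here is a small sketch. ZeroBasedVector, sum_eachindex and sum_one_to_length are hypothetical names invented for this example; ZeroBasedVector is only a stand-in for an OffsetArrays.jl array so that the code runs with Base alone.

```julia
# Hypothetical zero-based vector, standing in for an OffsetArrays.jl array.
struct ZeroBasedVector{T} <: AbstractVector{T}
    data::Vector{T}
end

Base.size(v::ZeroBasedVector) = size(v.data)
Base.axes(v::ZeroBasedVector) = (Base.IdentityUnitRange(0:length(v.data)-1),)
Base.IndexStyle(::Type{<:ZeroBasedVector}) = IndexLinear()
Base.getindex(v::ZeroBasedVector, i::Int) = v.data[i + 1]

# Generic: asks the array for its own indices.
function sum_eachindex(A::AbstractVector)
    s = zero(eltype(A))
    for i in eachindex(A)
        s += A[i]
    end
    return s
end

# Assumes the indices are 1:length(A); wrong for any other axes.
function sum_one_to_length(A::AbstractVector)
    s = zero(eltype(A))
    for i in 1:length(A)
        s += A[i]
    end
    return s
end

v = ZeroBasedVector([1, 2, 3])   # valid indices are 0:2

sum_eachindex(v)                 # 6
# sum_one_to_length(v) skips index 0 and then throws a BoundsError at i = 3.
# With @inbounds on that loop, the check would be skipped and no error raised.
```

Both functions agree on an ordinary Vector, which is exactly why the 1:length(A) version can sit unnoticed in a package until someone passes in an array with different axes.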

I’ve just submitted PRs to change this to “notes” and “warnings” in the docs (this info was not visible enough imho). Maybe the docstring of @inbounds could be edited as well…

7 Likes