Quality vs quantity in OSS development - a contributor's perspective

As someone who has been doing open source development in the Julia community for a while, both paid and unpaid, I often find myself having to choose between developing or maintaining more packages and focusing my time and effort on improving the packages I already have. Often my time and attention are stretched so thin that I can’t sit down and get a good, high-quality PR done in most of the repos I develop or maintain. If, in addition, other people are actively contributing to the repos I maintain and I am not the BDFL (Benevolent Dictator For Life) of the project, then the longer I stay away, the more alienated I become from the code base and its design decisions. I know some people here seem to be able to maintain an unimaginable number of packages, even alongside their day job, and make high-quality contributions in many different places. But in my experience, this is not always feasible or sustainable, and there is sometimes a definite trade-off between the number of packages one maintains, the quality of PRs contributed to those packages, and the quality of life of the OSS developer! I am not exactly complaining here, by the way. I enjoy developing OSS in Julia, and I know I owe people very little (unless I work for them!), but I still want to help whenever I can. I am just curious how people here handle this dilemma.

9 Likes

Just work on what you want to, and ignore everything else.

17 Likes

For sure it’s an interesting dilemma. Sometimes it feels like an open source project exists in one of two states: nobody uses it, or it requires non-trivial maintenance. Having users isn’t always beneficial for the developer’s work-life balance!

I don’t think open source contributors should have any demands made of them, or feel guilt about letting projects slip. That said, I have a lot of respect for maintainers who do the legwork to keep projects active that they themselves no longer use or develop regularly. It may not excite them, but they recognise the value to the community. That makes them want to do it even if they want to work on their cool new project more.

3 Likes

My perspective is pretty limited; I am not a big contributor to packages with lots of people working on them. But one thing I found is to pick packages that I enjoy working on, and also to make them stable from the get-go. Don’t get me wrong, integrating things with the latest and greatest is super cool and a good learning experience. But it also means things will be rapidly changing and breaking.

I agree that you don’t owe anyone anything (except whoever is paying your salary), but at the same time, it does stink to make something and have it not be usable for someone else.

So I recommend having a few orthogonal interests, and making design decisions for the long haul. For example, I haven’t had to update my main package in roughly a year, and it still works with no problems. Could I have done some cutting-edge optimizations using the latest packages? Yes. But it still works fine, and the maintenance is minimal due to Julia being awesome.

In business this is pretty common: yes, you can make the best thing ever for a brief moment, but can you sustain that, and what’s the cost? Is that really your interest? Or do you just want it to be 80% amazing and have time to cook a nice dinner and therefore still have made something great? Do you need to stick out in the community, or are you just happy making cool things :)?

Again your mileage may vary - assuredly it will - but, picking projects based on how interwoven they are, how you can design them so you can still keep a hold on them, and how fun they are has helped me maintain my job/hobby balance.

Sure. But there will be people who will make demands anyway. People who write FOSS should learn to shrug these off, otherwise they will inevitably burn out.

I also have a lot of respect for these people, but at the same time wonder why no one steps up to maintain a project if it really has “value to the community”.

IMO the best strategy is to lower the barriers to contributions: write well-documented, maintainable code. Then people can make nontrivial PRs without a hassle and eventually take over if they want to.

3 Likes

There are two phases to an open source project: the personal phase and the distributed phase. In the personal phase, write whatever you want, make good-quality software, it’s all yours. This is the part everyone knows well. When a project starts to get into the distributed phase, the roles start to split.

When things get to that phase, “build systems not software”. What I mean by that is, it’s harder to start a completely new section or add a completely new idea to a project than it is to “fill in features”, so I leave the feature fill-in as training exercises to grow the project and mainly focus on building the systems to fill in. As my PhD advisor used to say, only work on things others can’t do. With something like OrdinaryDiffEq, I’m normally building the pieces to create the first split-ODE solver, and then once there’s one, it’s a GSoC project to add the higher-order ones. Once things are there, it’s more important to grow the maintainer base than anything else, so a lot of the work becomes very personal: strategically helping people in ways that grow “the project” more so than the software. Sometimes this means spending a lot more time on tutorials than on making every little feature.

Part of growing a system like that is creating a flow, and constant progress through small PRs really helps maintain that flow. I don’t feel like large PRs are a sign of quality; rather, I feel like they’re a sign that the individual didn’t figure out how to properly split up the work into reasonable chunks and instead slapped everyone in the face with a PR of doom which we now all need to keep rebasing. Some changes need this, but to maintain the flow of the project I would try to see how to turn that into small chunks: get the easy thing in, then the next step, then the next step. The point is that when it’s distributed it’s not one person’s work, but if you make a giant PR to a domain you are informally claiming it as your work. If you don’t finish that PR, you tend to drag down that area of the project, so then it’s time to “break it up”, mostly to let others in. I think this would count as quantity of PRs, but to me this shows that the quality of PRs is highly tied to quantity: moving in a way that has few large PRs is lower quality because it hinders collaborative development.

One last thing: time is expensive and compute is cheap, so really use and abuse CI. If you want to see if something works, let Travis tell you. Such PRs can look low quality because sometimes there’s an obvious typo, but this means you’re free to use the GitHub website interface to make a change, throw it into a PR, see if CI passes, and come back later. You might not even be the one to come back later: you might just be sharing the information that’s needed to unblock someone else in the form of a WIP PR. Is that low quality? In a sense you know it could be a “bad commit”, but the point is to always keep things moving. GitHub issues and PRs don’t have a maximum, so use the systems to your advantage. Hell, I’ve even looked at a PR and thought “I don’t need this right now, I have other things to do right now, and it would be a great training exercise, so let’s just push what we have now and use it as a teaching example later”. Such a PR isn’t even supposed to be in a working state at that point, but it still has a point :wink:.

One final thing I’d like to mention is that some of the top notch contributions are just answering questions on Discourse, SO, and Slack. Again, growing the project is always the main goal (once it’s past the personal phase), since the growth itself makes it easier to fix a lot of the issues, and the way to grow is to show users exactly what to do. The moment there’s a second, third, or fourth person answering questions of that type the true amount of time-sensitive work drops dramatically, and that’s just good for everyone’s sanity.

25 Likes

As a recent Julia user who wants to contribute to the community, I feel that even though well-documented and maintainable code is needed for people to contribute, the design of the Julia language, along with its community, helps a lot in comparison with other languages which are less accessible or less attractive. I guess the only thing that could discourage me from contributing to some Julia packages is the need to cope with an overcomplicated or obscure set of new types.

3 Likes

I maintain a lot of packages (ApproxFun, BandedMatrices, BlockArrays, FillArrays, LazyArrays, SingularIntegralEquations, HypergeometricFunctions, …) but only because I use them in my own research. I do think it’s useful to do “high quality PRs” in the sense that test coverage (codecov) is high, as it avoids headaches down the line: if tests are robust, one can be confident in making massive changes as long as tests pass.

Documentation is a weak point in this but I think “well-documented” is overrated: just look at Base. Debugging+tests is more useful because the code doesn’t lie: I know exactly what Base.Broadcast._bc1 does because the code is available. Note that Base (and my own code) is very functional-programming heavy, which takes some getting used to, and is very hard to document.

So my recommendation echoes Chris’s: if you find maintaining packages burdensome, then don’t. Do what you find useful, fun, and what you feel others wouldn’t be able to do.

5 Likes

Many great answers. Thanks everyone for your input.

1 Like