About a year ago, @TheCedarPrince and I created https://github.com/Wikunia/Javis.jl as a small project to create some neat little GIFs.
Well, it grew a “bit” more than expected. We are now at the stage where we are thinking about splitting parts up, moving into an organization, and facing some other problems.
We would like to get some feedback from you about this process, as you might have experienced it in the past. Some specific questions that came to mind:
GitHub Organization
When we transfer the repository from my GitHub account to an organization:
Will all links from outside of GitHub break (like from blog posts/Twitter/YouTube comments)?
What are the steps we need to take regarding the registry such that Javis can be installed from the new repo?
Issues and general management
Currently we see more and more issues piling up as more people use Javis. We are mostly two developers at the moment, plus GSoC student @arsh. We have partly switched from being students to employment and need to spend our time more wisely. Some questions that come to mind are below, but feel free to add in the comments anything else we should be aware of:
When we create PRs, we normally have the other co-developer review them.
We feel this takes quite some time, due to long waiting periods when one of us is busy with work.
Are there any tips on improving this? For which kind of PRs is it better to have something merged fast? Those can be improved by another PR. For which should one wait for reviews?
What kind of tools do you use to prioritize issues over others?
Any specific tools other than the GitHub issue tracker?
Labels that helped you out a lot?
Thanks in advance for any tips and tricks for developers who are new to the world of bigger organizations.
All! Chunking a problem into short and simple segments is best for everyone. Take big PR ideas and break them into little segments which someone would feel comfortable reviewing for 5 minutes on their phone on the train and merging. Make sure you have great test coverage, add any downstream tests that you need, and start to really trust your tests. You should have enough test coverage that with a glance at the code you think “well, this isn’t actively worse, and nothing is broken, so let’s merge. But I’ll post that I think lines 105-115 are probably slow and worth an issue”.
Where I see high-velocity projects run into trouble is when they don’t trust their tests (I’m looking at you, Flux). Will this break anything? Will this break user code downstream? When your system is “maybe a very smart person will think about these code changes enough to find all of the things that could break”, then you are in velocity trouble. However, getting to the point of trusting your tests really requires that you go all in and “just do it”. With the DiffEq stack, we jumped into this philosophy in 2016. And… 2016 through 2017 were a somewhat well-documented mess. But these days, if I see green, I know that I could merge code with a blindfold on and 99.9% of users will not have a broken system or a major performance regression, and that’s a good guarantee that makes “yeah why not” turn into “quick merge and let’s follow up”.
You know who knows the codebase well. Trust those who you trust.
Email bumps. People let you know what they care about. Also chatrooms. When the same thing is talked about 10 times a day, you know what you need to fix.
Some specific ones, like “new algorithm” in OrdinaryDiffEq.jl, help a ton for planning GSoCs. Most are a waste of time.
I mention this in the JuliaSim video a little bit, but it’s less about constructing a software package and more about constructing a community. You don’t want to refuse to merge someone’s PR because they don’t know performance very well. Instead, you want to recruit them to be the one to trail-blaze features, while pointing your performance guru to some fun, self-contained, and well-tested examples that would be good to take a look at. You want to build off of other people’s packages, integrate their developers into yours, package up your successes as things they can put into their talks as something they helped create, and help people see this cool vision that motivates them to add their own ideas as to how that vision can be reached.
As a maintainer, you know the package best, and so at first it might seem like your job is to lead the package from the front of the army, building everything first for everyone. But that’s really not the case after the foundation has been laid. Your job is to use your knowledge of the project to do what others cannot. You help everyone’s 80% PRs turn into successes so that every student can write a cool blog post and everyone goes “wow, how come every student that works on that project is successful?”, driving more people to the project. You want to be the director behind the stage working with the lighting technician. Everyone else is like “wow, that scene was really dramatic!”, while you know that you had to change out a bulb at 5am to make that effect actually work. No one else needs to know the background if you can fill in all of the gaps. Lead with that mindset, always aim to do what’s best for growing the project, and the project will give you back what you need.
GitHub creates an automatic redirect from the old URL, and it keeps working as long as you don’t recreate the repo with the old name, for example by forking the now-moved repository to your account: in that case the redirect is broken.
Thanks so much for your answer @ChrisRackauckas
One question about this part:
Our reviews are often things like: Change formulation/grammar/nuance here and there for this and that tutorial or docstring.
Things that can’t be tested automatically (at least not that well). Regarding that, we currently have tutorials/documentation (other than docstrings) in the feature PR as well. We often ask ourselves whether it would be better to make a separate PR for that to keep PRs smaller. On the other hand, we don’t want to forget about it. What is the best approach in your experience? I can see the option of creating a new issue and doing the PR for the docs later.
I find the culture of pointing out small grammatical mistakes and waiting for someone else to fix them to be needless friction. GitHub’s suggestion feature is a great way to make a comment and merge the change ASAP. If that’s all that was holding up the PR, just fix and merge. They will both see the feedback, but at the same time there’s no reason to make someone else guess at what would make the reviewer happy enough to merge.
Always split. The idea is “you don’t own the code, the project does”. When PRs get too long, it’s “someone else’s job” to finish PR #8942 that has been sitting for 6 months. It’s cookie licking in its most sincere form: someone is actually trying to do the work, but they took on too big of a job that will never get finished. In many cases, they may have 0 intention to ever return and finish the PR (or might say they will “when they have the time”). Big PRs are a mess to maintain, and then no one wants to review them in the end.
Just split it down to make everyone happy. Just merge and open a few issues if it needs more: this feature needs more docs, this needs a performance boost, etc. When you don’t merge the PR, you’re basically saying it’s that one person’s job to do everything before it’s allowed to be merged. No one wants to touch that piece of code until it’s in master. Once it’s in master, people who like to work on docs will clean up the docs, your test people will add a few tests, your performance people will chime in with a few small PRs that get it to top notch, etc. But if you lock it down in the PR stage because it’s not the beautiful Greek sculpture of what the code could be, you lock out the community from fully engaging with it, and that’s what should be avoided. In theory people can do PRs to PRs, but it’s a mess and people do not do that; they wait.
I think the point about trusting your tests is valid, but at the same time it’s quite hard to achieve. What I notice in Makie quite often is that there are additive and multiplicative changes. Additive means there’s something new that works similarly to something existing, so new tests are similar in length and scope to what exists already. Multiplicative changes interact with many existing parts of the system or introduce completely new interactions, thereby greatly increasing the surface area that needs to be tested. In these cases, merging and worrying about follow-up issues later feels dangerous to me, as it’s usually much harder to remove a part than it is to add a new one. So the project tends to only grow organically, and you have to forcibly prune the cruft from it as well.
@ChrisRackauckas has said all the good things I was going to say.
I think Chris and I have been around each other for too long.
Yep, trust your tests.
And if you don’t know if you can trust your tests, release early and release often, until you can trust your tests.
If you tag a release after every non-breaking PR then:
you immediately get feedback as to whether it broke something, as people will complain,
and it is really easy to isolate what it was, as they can pin to the old release, see the problem go away, and then you can revert the PR in question.
(Often it is best to revert the PR immediately, then tag a patch, then go and fix it properly, since the longer you leave an issue to fester, the more people start to depend on the bug or add ugly workarounds.)
Once you fix that thing you can improve your tests so that similar bugs don’t slip through again.
And soon you can trust your tests.
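To make the "pin to the old release and see it go away" step concrete, here is a minimal sketch in Python. It is only an illustration, not tied to any Julia tooling: `first_broken_release` and the `is_broken` callback are hypothetical names, the version numbers are made up, and it assumes that once the bug is introduced it stays in every later release.

```python
from typing import Callable, Optional, Sequence


def first_broken_release(
    releases: Sequence[str], is_broken: Callable[[str], bool]
) -> Optional[str]:
    """Return the earliest release for which is_broken is True, or None.

    Assumes releases are ordered oldest to newest, and that every release
    after the first broken one is broken too.
    """
    lo, hi = 0, len(releases)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_broken(releases[mid]):
            hi = mid  # the culprit is this release or an earlier one
        else:
            lo = mid + 1  # the culprit must be a later release
    return releases[lo] if lo < len(releases) else None


# Hypothetical history: one tagged release per merged PR.
releases = ["v0.5.0", "v0.5.1", "v0.5.2", "v0.5.3", "v0.5.4"]
broken = {"v0.5.3", "v0.5.4"}  # stand-in for "the user pinned it and the bug showed up"
culprit = first_broken_release(releases, lambda r: r in broken)
print(culprit)
```

If every non-breaking PR gets its own release, the release found this way points at a single PR, which is the one to revert.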
We trust the people we trust.
The ColPrac guidance is that either the reviewer or the author needs to have merge rights.
So someone with merge rights can go and make a PR, and get literally any human to review it: maybe the author of the issue that it fixes, maybe an interested random person.
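Written down as a check, the rule as described here could look roughly like the Python sketch below. `may_merge` and its parameters are hypothetical names, and treating "nobody reviewed it" as "do not merge" is my assumption about the intent, not a quote from ColPrac.

```python
from typing import Iterable


def may_merge(author_has_merge_rights: bool, reviewer_merge_rights: Iterable[bool]) -> bool:
    """Return True if a PR satisfies the rule sketched above.

    reviewer_merge_rights holds one entry per person who reviewed the PR:
    True if that reviewer has merge rights, False otherwise.
    """
    reviewers = list(reviewer_merge_rights)
    if not reviewers:
        # Assumption: a PR that nobody reviewed is not merged.
        return False
    # Either the author or at least one reviewer must have merge rights.
    return author_has_merge_rights or any(reviewers)


# A maintainer's PR, reviewed by the person who reported the issue.
print(may_merge(True, [False]))
# An outside contribution, reviewed only by another outside contributor.
print(may_merge(False, [False]))
```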
I think review is generally worthwhile.
Don’t be like Python, where a core pip developer skipped review and broke pip for some large portion of the world.
There is a related social hack to get more maintainers:
if someone makes a good PR (or 3), you can just grant them merge rights after you merge it.
They are probably not going to do evil, and if you tell them the processes to follow (e.g. from a contributors guide like ColPrac) they will not do it by mistake.
And if worst comes to worst, you can revert a bunch of stuff – it is git.
But further: giving people merge rights often causes them to step up to the plate and contribute more.
Including triaging issues etc.
So much this:
The most important work for a maintainer is triaging issues, reviewing PRs, and onboarding new contributors (which is often not more than asking on an issue “Do you want to make a PR to fix this? I think do …”, and then reviewing maybe a few shaky PRs at first).
Open source maintenance is basically a kind of management.
It is very possible to be a package maintainer and have the only code you write yourself be some trivial little bug fixes or something.
I don’t use these consistently everywhere, and they kind of overlap, but I in general get a lot of value out of:
Pending Clear Need: for features that we think might be useful, and have a sketch for how they would be done, but where we do not actually know if anyone needs them.
Speculative: similar to the above but more vague. We have an idea that maybe such a feature might belong in the package.
Requires Careful Thought: indicating that an issue is not at all simple or obvious. Kind of the opposite of Good First Issue. Generally I use it to highlight that the API or Design is going to be hard to get right.
These are useful for de-prioritizing issues.
Bug: I use this as part of triaging things. If someone opens an issue, then at the very least they get that acknowledged: someone has looked at it.
Most of these (interesting!) questions are more general than just Julia. I spent some time looking at other popular repositories on GitHub in other programming environments to learn more about this subject, and I highly recommend it! For example, I learned a lot from talks by Evan Czaplicki, who works on the Elm language. My favourite: The Hard Parts of Open Source.
Some of my experiences to throw into the mix:
Invest some time in managing notifications: try to avoid reading notifications directly as they come in, since it distracts from long-term and less visible work. It also gives time for others to contribute to conversations (or for people to find solutions themselves and close their own issue). I created a new email address just for my GitHub notifications, and I use email filters to make notifications highly customizable.
I use labels mostly to communicate to (relatively) new contributors: which part of the codebase is affected & expected difficulty.
I find that GitHub Discussions is better than GitHub Issues for almost everything: they allow a conversation to “sidetrack” without disrupting the overall conversation.
Finally, for me, video calling is always the best way to communicate. Consider setting up a regular developer video call with a public link. GitHub makes it (too) easy to have cold and technical discussions, you have to work a little to get the opposite.
I saw a suggestion that any time a PR has had to go through more than X rounds of review,
or a discussion has been going for more than 3 days, just get everyone on a call and talk it out.
I don’t do it enough, but every time I do, it has rapidly helped us get on the same page.
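As a rough sketch, that heuristic could be turned into a small check that a bot or a maintainer's script runs over open PRs. This is Python for illustration only: `should_schedule_call` and its parameters are hypothetical names, and since the suggestion leaves X open, the round limit is a required argument rather than a default.

```python
from datetime import datetime, timedelta


def should_schedule_call(
    review_rounds: int,
    max_review_rounds: int,
    discussion_started: datetime,
    now: datetime,
    max_discussion_age: timedelta = timedelta(days=3),
) -> bool:
    """Return True when a PR or discussion has dragged on long enough to warrant a call."""
    too_many_rounds = review_rounds > max_review_rounds
    too_long = (now - discussion_started) > max_discussion_age
    return too_many_rounds or too_long


# Hypothetical example: a team that picked 4 as its round limit.
now = datetime(2021, 6, 10, 12, 0)
print(should_schedule_call(2, 4, datetime(2021, 6, 9, 12, 0), now))  # young thread, few rounds
print(should_schedule_call(2, 4, datetime(2021, 6, 5, 12, 0), now))  # thread older than 3 days
```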