I have more experience with PRs since that post two years ago, but I still flub it up over half the time. At the risk of looking like a fool, I provided a list of my botched PRs at the bottom. I still find GitHub and Documenter extremely confusing. Is there something that can be done for better onboarding of the required concepts?
Details
Git
I only use git on my personal projects, which basically means I just click the buttons that show up here:
I get lost managing forks, branches, origins, merges, and syncs. Most of the git tutorials I found assume you are using the command line and have full access to a repository you own.
Documenter
I don’t use Documenter.jl at all because I have never made a registered package, but updating online documentation forces you to interact with Documenter. Its syntax is somewhat different from other extended markdown flavors, so what I see in Markdown Preview Enhanced is not the same as what I see on the final rendered website. (Even worse if using GitHub’s online preview.) The instructions I did find for testing/building the actual html files do not work out of the box either:
I got around the last error by opening a separate REPL beside my terminal and navigating to the same folder to perform package management, but that still didn’t work.
Why does a documentation preview require a terminal shell? Will the local build even allow me to check my reference links? There is no way I am figuring out how to run my own server.
I don’t think asking beginners to make more documentation PRs is going to go very well without simplifying the on-ramps. What can be done to make this process easier?
I totally get your point, but I also wouldn’t be so hard on yourself. You’ve made over two dozen well-received contributions! And I wouldn’t even say that your list of “botches” makes you look like a fool; most of them I wouldn’t even call botched.
Despite its many flaws, the one superpower of git is that you can’t irreparably break anything. Heck, just look at all the merged commits that were rolled back on Julia itself. Again, I know that this is beside the point, and yes, there could be more/better on-ramps — I see you’re already helping document documenter, too! — but I think level-setting expectations is also helpful. Folks don’t expect PRs to be perfect. It’s ok to have misfires.
My point is: yes, the curve is steep and I get your frustration, but you’re doing it, and that is great! Bravo!
One concrete thing that I think would be super helpful is if the CI infrastructure would allow previewing the docs built from within a PR.
Today, I made a PR updating the README for Tricks.jl (a package I do not own). I did the ENTIRE PR from the GitHub GUI. I clicked “edit file” and then GitHub prompted me through the whole process of forking, committing, and opening a PR.
I admit, it was a very small change, but if you just want to do documentation fixes, I think a lot can be done without ever leaving the GitHub website.
I do feel like I was ultimately helpful. I’m just super burnt out by the whole process.
#10 was a simple docstring edit that took 5 months to merge because three different people couldn’t get the cross-references to work.
#8 is a big beginner tutorial for DataFrames that I finished writing 8 months ago, but it is still not merged because I can’t get the doctests to pass. I was super proud and enthusiastic about it when I finally finished getting the content accepted, but I have lost all motivation to push it across the finish line.
In both cases, I expected the reviewer to be able to quickly find and fix my mistakes, but they did not know how either. Instead the PRs just sat there, and I don’t know a better way.
——
#1 and #7 are “edit file on GitHub” gone wrong. For #1, I was apparently looking at the wrong version of the docs when I clicked the edit button. For #7, I had no way to test the links I needed to create. That button is a great option for just fixing a typo, but if you need to create a link or an admonition, good luck.
——
I was sold on how easy it is to contribute to open source, and I felt like I was really doing something upon submitting the PRs. But reality set in afterwards, and the level of effort and frustration required to actually get a merge does not leave me eager to dive into more.
I hear you. I don’t have a great answer, ultimately. Git is notoriously hard, and I don’t know a way of solving for that. Thinking about file histories at the intersection of commits organized into merges is just pretty complicated.
GitHub has spent a huge amount of resources on trying to make a friendly interface for git, and it is still quite unintuitive (your examples being good ones).
I don’t think there are really any tutorials or tools that can fix the learning curve required.
The JuliaLang/julia CI, at least, does include a PDF of the docs among the artifacts. I use it to check my documentation contributions.
FTR, my experience with git became more enjoyable after discovering I can plug meld into git, with git difftool -t meld (for diffing and simple editing) and git mergetool -t meld (a must for fixing merge conflicts).
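If you don't want to type -t meld every time, git can remember the choice. Below is a minimal sketch, assuming meld is already installed and on your PATH. It uses a scratch repository so nothing in your real setup changes; to apply the same settings to every repository, you would add --global to each git config call.

```shell
set -eu

# Scratch repository so the demo does not touch your real configuration.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Remember meld as the tool for both diffs and merge conflicts.
git config diff.tool meld
git config merge.tool meld

# Optional: open the tool directly instead of asking for each file.
git config difftool.prompt false

# From now on, inside this repository:
#   git difftool     opens changed files in meld
#   git mergetool    opens conflicted files in meld
```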
Building and previewing the documentation for Julia itself is a bit tricky (at least on Windows) since you also need to build the julia binary. Contributing to packages should in theory be much easier, but sometimes the setup is tricky there too.
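For a package, the usual local build is roughly the two commands below, wrapped in a shell function so they are easy to rerun. This is a sketch, assuming the common layout where the package has a docs/ folder containing Project.toml and make.jl; the function name build_docs is just something I made up for illustration, and individual packages may need extra steps.

```shell
# Call this from the root folder of the package you cloned.
build_docs() {
    # Add the package itself to the docs environment and install
    # the docs dependencies (Documenter and friends).
    julia --project=docs -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'

    # Run the build script. By default Documenter writes the HTML
    # pages into docs/build.
    julia --project=docs docs/make.jl
}

# Usage:
#   cd DataFrames.jl
#   build_docs
```

Both commands run from an ordinary terminal, so no separate REPL session is needed for the package management part. Documenter should also report cross-references it cannot resolve while building, which gives you some feedback on links before you push.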
I would suggest you change your workflow to directly clone a fork of your own. A fork on your account is a repository you fully own yourself, so the same workflow you use for personal projects should work then too. I am guessing your current workflow goes something like this:
1. Git clone the upstream repository (git clone https://github.com/JuliaData/DataFrames.jl.git)
2. Make changes
3. Commit with the VS Code buttons
4. ??? I am guessing this is where things get complicated, since you need to attach your own fork URL and configure VS Code to push to it, because you can’t push to https://github.com/JuliaData/DataFrames.jl.git directly?
If instead you did this:
1. In the GitHub UI, click “Fork” to create your own DataFrames repository (e.g. https://github.com/nathanrboyer/DataFrames.jl.git in your case).
2. Git clone your repository (git clone https://github.com/nathanrboyer/DataFrames.jl.git)
3. Make changes just like in a personal repo
4. Commit just like in a personal repo
5. Git push just like in a personal repo
6. In the GitHub UI, in the upstream DataFrames repository you now have the option to create a pull request. If the time between steps 5 and 6 is not too long, GitHub will even show a “Create a pull request from your branch” button that is difficult to miss.
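If you want to rehearse the steps above without touching GitHub, here is a small sketch that plays them out entirely in a temporary folder. Local directories stand in for the GitHub repositories: upstream.git plays the upstream repository and fork.git plays your fork. All names (seed, local, fix-readme) are hypothetical. The one thing I added beyond the list is a dedicated branch for the change, which is common practice but not required.

```shell
set -eu

# Throwaway identity so the demo commits work on a fresh machine.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# Setup: a tiny "upstream" repository (on GitHub this already exists).
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "Hello" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed upstream.git

# Fork: on GitHub this is the "Fork" button.
git clone -q --bare upstream.git fork.git

# Clone your fork, not upstream, so "origin" is a repository you own.
git clone -q fork.git local
cd local

# Make changes, commit, and push, just like in a personal repo.
git checkout -q -b fix-readme            # branch for this change (my addition)
echo "Hello, world" > README.md
git commit -q -am "Fix README greeting"
git push -q -u origin fix-readme         # lands in the fork, never in upstream

# The last step (opening the pull request) happens in the GitHub UI.
```

After this runs, the fork has the fix-readme branch and upstream is untouched, which is the state GitHub needs before it offers the pull request button.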
When it comes to writing and previewing package documentation with Documenter, I spent some time trying to make documentation contributions to Ferrite.jl as easy and enjoyable as possible; see the documentation section of CONTRIBUTING.md. Basically, after cloning the repository you just need to include("docs/liveserver.jl"). The initial build (with precompilation and such) takes a bit, but after a while you should see the URL http://localhost:8000 in the REPL. If you visit this in your browser, you get a preview of the documentation. This sets up a configuration based on LiveServer.jl, which automatically detects changes to the documentation source files, rebuilds the documentation, and refreshes your browser. When you have almost instant feedback in your browser, I think it becomes really fun and enjoyable to write documentation.
Would you mind trying this setup and reporting back whether it is as easy as I think, or if there is more we can do?
I always wonder where to put LiveServer.jl as a dependency, since I don’t want it in my Base environment but it isn’t used by the Documenter.jl CI workflow either. I guess putting it in the docs env nonetheless doesn’t hurt, and makes pre-visualization easier. Thanks for the tip!
I have LiveServer in the docs env and it is very practical. I wonder if it could be integrated even further into Documenter to just provide the preview function.
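For packages that do not ship their own preview script, something like the following should give a similar live preview. It is a sketch, assuming LiveServer is listed in docs/Project.toml, the docs environment is already instantiated, and the package uses the standard docs/make.jl layout. The function name preview_docs is hypothetical; servedocs is the LiveServer.jl function that watches the docs sources and rebuilds on change.

```shell
# Call this from the root folder of the package.
preview_docs() {
    # Builds the docs, serves them locally, and rebuilds whenever a
    # source file under docs/ changes. Stop it with Ctrl+C.
    julia --project=docs -e 'using LiveServer; servedocs()'
}

# Usage:
#   cd MyPackage.jl
#   preview_docs
```

The server keeps running until you interrupt it, so it is best started in its own terminal tab.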
I do generally create a Fork and then a Pull Request from the GitHub website. (GitHub often forces you to, which is good.) If the change is small and fast, this usually works well. However, things get more confusing when merging takes longer.
In my own projects, data only needs to flow in one direction, pushed from my local VSCode to GitHub. With a PR, there are three distinct locations that may each have separate commits injected into them: upstream GitHub, downstream GitHub, and local. I have trouble getting these locations to synchronize and knowing if they are synchronized.
The upstream repository may merge other PRs before yours. Now your downstream main and mypr branch need to be re-synced with upstream before they can be tested and merged.
Commits to your PR may come from suggested commits on the GitHub website and local commits in your VSCode. Sometimes it is hard to know if the GitHub website, GitHub Desktop, and VSCode have all synchronized their branch states, including syncing origin and local versions of branches in VSCode. Trying to commit and push changes from an unsynchronized branch creates problems.
If you try to open a second PR on the same package, then you cannot use the typical buttons to create a fork. The fork already exists, so you must manually make a new branch. Then you must try to keep it synced up but distinct from your other open PR branch.
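For reference, the first and third situations in the list above can be handled from the command line roughly as follows. This is a sketch under stated assumptions, not the only way to do it: local directories stand in for the GitHub repositories (upstream.git for upstream, fork.git for the downstream fork), the branch names mypr and second-pr are hypothetical, and the PR branch is updated with a merge where some projects prefer a rebase.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# Setup: upstream, a fork of it, and a local clone of the fork.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "v1" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Initial commit"
git clone -q --bare seed upstream.git       # plays upstream GitHub
git clone -q --bare upstream.git fork.git   # plays downstream GitHub
git clone -q fork.git local                 # local; "origin" is the fork
cd local

# An open PR branch with one commit, pushed to the fork.
git checkout -q -b mypr
echo "my change" > docs.md
git add docs.md
git commit -q -m "Add docs"
git push -q -u origin mypr

# Meanwhile, upstream merges somebody else's PR.
echo "v2" > "$work/seed/README.md"
git -C "$work/seed" commit -q -am "Someone else's change"
git -C "$work/seed" push -q "$work/upstream.git" main

# Sync, step by step.
git remote add upstream "$work/upstream.git"  # one-time; normally the upstream GitHub URL
git fetch -q upstream                         # download upstream's new commits
git checkout -q main
git merge -q --ff-only upstream/main          # local main now matches upstream
git push -q origin main                       # fork's main now matches upstream

# Check: counts of commits only on main / only on upstream/main.
# Two zeros mean the branches are in sync.
git rev-list --left-right --count main...upstream/main

# Bring the open PR branch up to date and publish it.
git checkout -q mypr
git merge -q --no-edit main
git push -q origin mypr

# Second PR: a new branch started from the freshly synced main,
# so it does not contain the commits of the first PR.
git checkout -q -b second-pr main
git push -q -u origin second-pr
```

The key idea is that the clone knows two remotes: origin (your fork, where you push) and upstream (where you only fetch). Each PR lives on its own branch, and every new branch starts from an up-to-date main.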
The other issues with my linked PRs are related to GitHub expectations and best practices.
What tests need to pass?
I don’t really know anything about how CI works or how to review/fix its results, but reviewers typically expect you to do this on your own.
Where should I ask questions related to my PR?
You can make PR comments, commit comments, code comments, code review comments, new files, etc. Hard to know what content to put where. I’m sure I will learn GitHub etiquette over time.
This worked amazingly! I was able to quickly view and test my updates. I would be very happy to see this strategy employed across the wider ecosystem.
The first reference I tried to make to “Degrees of Freedom” did not work, but links to most other pages did work fine. In any case, it was nice to get that feedback before pushing commits to a PR and waiting for test results.
The REPL spit out a lot of errors and warnings, but they didn’t seem to affect my changes.