Best practices for documentation

Hi all,
I would like to know the best practices for handling package and organization documentation. Our current setup is at http://www.juliafem.org and https://github.com/JuliaFEM/JuliaFEM.jl/tree/master/docs. We would like to move the documentation, and especially the examples, out of the package repo. If this move is a smart thing to do in the first place, we would like to adopt the best practices while making it.

Currently, we are using Literate.jl for examples and it’s perfect for the job.

My feeling is that it makes sense to do this if your docs apply to multiple packages or a whole organization at once.

The current practice seems to be to have a repository like JuliaFEMDocs that contains and builds the docs and deploys them to the root of the docs subdomain, e.g. docs.juliafem.org. For inspiration on how to set it all up, you can check out the docs for DifferentialEquations.jl and Plots.jl.
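
As a rough illustration of that kind of setup, a docs/make.jl in a standalone docs repository might look like the sketch below. It assumes Documenter.jl and Literate.jl are installed, that the Literate sources live in an examples/ folder, and that the Markdown pages live in docs/src/. The repository path, site title, and page names are placeholders, not an existing configuration.

```julia
# docs/make.jl: hypothetical build script for a standalone docs repository.
using Documenter
using Literate

# Assumed layout: Literate sources in examples/, Markdown pages in docs/src/.
examples_dir = joinpath(@__DIR__, "..", "examples")
generated_dir = joinpath(@__DIR__, "src", "generated")

# Turn every Literate script into a Markdown page that Documenter can build.
example_pages = String[]
if isdir(examples_dir)
    for file in sort(readdir(examples_dir))
        endswith(file, ".jl") || continue
        Literate.markdown(joinpath(examples_dir, file), generated_dir)
        push!(example_pages, joinpath("generated", replace(file, ".jl" => ".md")))
    end
end

makedocs(
    sitename = "JuliaFEM documentation",  # placeholder title
    pages = [
        "Home" => "index.md",             # expects docs/src/index.md to exist
        "Examples" => example_pages,
    ],
)

# Pushes the built site to the gh-pages branch of the docs repository.
# The repository path below is a placeholder.
deploydocs(repo = "github.com/JuliaFEM/JuliaFEMDocs.git")
```

Serving the result from a subdomain such as docs.juliafem.org would additionally need the custom domain to be configured on the hosting side, which is not shown here.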

Side note: I am not sure it makes sense to have .jl in the documentation repo name, since it is not an installable package.

One additional question. Is it possible to build and deploy documentation for pull requests? I would like to encourage people to contribute just by using the pen button on the GitHub repo web page. I mean a full workflow one could do in a web browser.

I was always confused about the value of keeping documentation separate from the package (unless, of course, there are multiple packages). What is the reason for the split?

Very valid question, this is exactly why I am asking about the best practices. @ahojukka5 can you explain why you think we should split the documentation from the code?

Maybe also in the spirit of this thread: many popular and important packages don’t provide inline documentation, which I have always found very frustrating (as I’m sure others have as well). Maybe this could become part of “best practice” as well?

For me, it’s because then I don’t need to worry about repo size when adding so many images. That grows fast even when I try to keep most of them as links to issues. The documentation is also just huge and points to its own subdomain, which is easier to control from a separate repo. Building it in a separate repo also means it doesn’t add to the main build time. And since it is an overall documentation, we found it easier to just keep that concern separate from the others.

But we do need to go back and add some docstrings now that things have stabilized.

This definitely used to make sense with the old Pkg. Is this still relevant with Pkg3?

The current doc files are part of the tag (though not the built docs).

I have a couple of points supporting the idea of having a separate documentation repository:

  1. Keep a clean version history. If there are a lot of “pen”-edited doc files, they overwhelm the Git history. It’s not actually a big problem, but I would not like to see 3-4 pages of “update theory.md” commits made with the GitHub pen. This could in principle be avoided by having a separate doc branch that is periodically squashed and merged into master. But then it is kind of the same as having a separate repository.

  2. Permission control. It’s convenient to update documentation by editing directly in GitHub. I would like to go more in the direction of Wikipedia, where you don’t even need an account to edit something. Of course I would not give such access to the actual code. So we need to have two areas with different permissions.

With multiple packages, and assuming there are several maintainers, it makes sense for everyone to take care of the documentation of their own packages, because they are the best people to explain how their packages work and under what conditions. For a reader, however, there should be a single entry point that collects the distributed documentation back into one source.

Here is quite a good interview describing how hard it is to write good documentation:
https://numfocus.org/blog/matplotlib-lead-developer-explains-why-he-cant-fix-the-docs-but-you-can

If several people write documentation with different target audiences in mind, there will probably be variance in the level of detail. Some parts of the documentation will be too hard to follow and other parts, for a reader with the same level of knowledge, will be obvious. I have heard people who work with two different open source projects say that the first one does not have clear enough documentation (= too hard to follow) and the other one gives too much detail (= too much to read), so it looks like it’s hard to write documentation that fits everyone’s needs. After all, it looks like good documentation is not a random success; it’s the result of careful design and good decisions.

The DataFrames ecosystem is in a similar situation, with documentation scattered across different packages (DataFrames, CSV, CategoricalArrays, DataFramesMeta, Query, StatsModels…). I also wonder whether (and how) we should move docs to a common repository. It would still be nice to provide documentation in each package, since they are also useful on their own. However in our case it wouldn’t be absurd to have the main docs in the DataFrames repo (or in an associated repo), plus more technical docs in each package.

One additional question. Is it possible to build and deploy documentation for pull requests?

Not at the moment, at least not with the usual workflow. Allowing PRs to deploy documentation would mean giving anyone who opens a PR access to the DOCUMENTER_KEY variable, which would give them full write access to your repository.

It might be possible to create a bot to help with that. So there’s an idea for a project to hack on if someone is bored :slightly_smiling_face: