In the recent giant thread on Julia popularity, and in the recent user/developer survey, the topic of documentation often comes up.
Usually when people bring it up, they focus on things for brand new users – guides, blogs, etc.
Focusing on new users is great, but as someone who’s no longer quite a newbie (but also makes zero claim to proficiency), I tend to run into a different set of challenges WRT documentation. Specifically, how to quickly get basic information about functions in a way that doesn’t disrupt my code-writing flow.
Rather than add to that giant thread, I figured I would post separately to mention a few aspects of documentation I personally find troublesome, perhaps to see whether others have similar issues, and maybe float a couple of ideas for discussion.
When I’m writing code (as a user, not a package developer – I’ve never tried developing a package), I mainly look to function documentation when I need to jog my memory about one of three things:
- What kind of inputs a function takes
- What kind of outputs a function produces
- What functions I most likely need to create the inputs a function needs, or to process the outputs it creates (e.g., associated functions)
Certain function names (especially in statistics) are very common (e.g., `fit`), and way too many methods pop up when using `?`. To pick a random package, if I do a clean install of GLM.jl (with nothing else) and look up `fit`, I get 5 screens of info (11 methods), dumped in my terminal all at once. Things like this make it hard to quickly look up information on one particular method – which is what I very often want to do.
Function docstrings hardly ever mention details about output, only inputs, which seems like a rather significant omission to me. If I want to find out what sort of “thing” a function creates, I’ve usually got to run the function and then inspect the result with a bunch of terminal commands (e.g., `fieldnames`), which gets very hard if some of the inputs aren’t easily constructed on the fly; browse the source code; or sometimes go browse the package’s website, and hope it’s mentioned in an example vignette somewhere. All of those approaches are (imho) rather time-consuming and distracting.
It’s very rare for packages to list “related functions” in docstrings, even though this is mentioned in the Manual section on Writing Documentation. So after I run a function, if I have trouble remembering “do I run `summary` or something else on this thing”, I can’t usually depend on the help to provide that. I most often have to open my browser, go to Google, find the package website, then scan through one or more long, unstructured vignettes.
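For what it's worth, a few built-in tools can partly cover the second and third issues from the REPL, though none of them replaces a proper docstring. Below is a minimal sketch, using `match` from Base purely as a stand-in for a package function. Note that `Base.return_types` is a non-exported helper that relies on type inference, so I'd treat its answer as a hint rather than a guarantee.

```julia
using InteractiveUtils  # provides methodswith; loaded automatically in the REPL

# Run the function once and inspect what came back.
m = match(r"\d+", "abc123")
T = typeof(m)
println(T)               # concrete type of the result
println(fieldnames(T))   # names of the fields stored in that type

# If inputs are hard to construct, ask inference what the call would return,
# given only the argument types. Non-exported helper; the answer may be a
# Union or Any rather than a single concrete type.
rts = Base.return_types(match, (Regex, String))
println(rts)

# List methods that accept a given type as an argument, as a rough
# substitute for a "related functions" section in the docstring.
related = methodswith(Regex)
println(length(related), " methods take a Regex")
```

All of this still means leaving the help system and poking at objects by hand, which is exactly the distraction described above; it just avoids the trip to the browser.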
I know people will respond[**] with “PRs!” And fair enough (especially for the third issue above).
But I also think there are some small changes to the Manual that might help the situation more generally, by providing more substantial guidance to people writing function documentation, and helping set some firmer community expectations around documentation.
I thought I’d post them here to see if they are things that people find reasonable. Specifically:
Make language in the Manual clearer and stronger about the elements of documentation that are considered essential. Right now, the Manual’s section on Writing Documentation puts important, specific instructions (e.g., “Always show the function signature at the top”) on the same list as much less important stylistic advice (e.g., “Don’t repeat yourself”), and fails to mention certain aspects of documentation (e.g., of outputs) entirely. It also prefaces the entire list using rather weak language (“we recommend following…”), which (imho) sets a weak tone about community expectations for those things. The Manual needs to describe more clearly which elements of documentation are considered essential for functions and types, and differentiate those from aspects that are merely “recommended”, or stylistic things.
As part of the above, make sure to mention function output/results. For example, strongly suggest package authors add a “Results” section to function docstrings to provide some minimal information/references on the output (at least for the functions users are likely to use). E.g., “The output of this function is a MyType.” (This assumes, of course, that “MyType” is well-documented in the first place, and that someone could easily do `?MyType` and get something useful. If that’s not the case, then key elements of the output, like any field names, need to be spelled out here.)
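To make the “Results” suggestion concrete, here is a minimal sketch of what such a docstring could look like. `SummaryStats` and `summarize` are hypothetical names invented for illustration, and the `# Returns` heading is my proposal rather than something the Manual currently prescribes, as far as I can tell. The `See also` line follows the Manual's existing hint about related functions.

```julia
"""
    SummaryStats

Result of [`summarize`](@ref). (Hypothetical type, for illustration only.)

# Fields
- `n::Int`: number of observations
- `mean::Float64`: arithmetic mean of the data
"""
struct SummaryStats
    n::Int
    mean::Float64
end

"""
    summarize(x::AbstractVector{<:Real}) -> SummaryStats

Compute basic summary statistics for `x`.

# Arguments
- `x`: a non-empty vector of real numbers.

# Returns
A [`SummaryStats`](@ref) with fields `n` and `mean`.

See also [`SummaryStats`](@ref).
"""
function summarize(x::AbstractVector{<:Real})
    # Reject empty input so the mean is always well defined.
    isempty(x) && throw(ArgumentError("x must be non-empty"))
    return SummaryStats(length(x), sum(x) / length(x))
end
```

With something like this in place, `?summarize` says what comes back and `?SummaryStats` lists the fields, so nobody has to run the function just to learn what it produces.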
Provide advice in the Manual about dealing with (especially documenting) too-common/popular function names. My understanding from spending time looking for fixes for this is that there’s not an easy technical fix to allow pulling up help on a single specific method added/induced by a particular package. But a next-best step would be to give advice to package authors about ways to handle it. To continue the GLM.jl example I mention above, that package handles it by creating a unique alias that can be looked up. This doesn’t fix the issue (`fit` still produces pages of methods, if I have to look it up for other reasons), but does make things a little better.
This is a more general plea, to package authors: I love example vignettes. They are wonderful ways of showcasing functionality to new users. But online vignettes and model zoos are not a substitute for structured documentation of specific functions. I should be able to get enough basic information to use a function without going to Google to read your vignettes, even if my internet goes down.
[**] People might also respond with “you’re using Julia wrong”. Again, fair enough. But if that’s the case, I think it might say something about the learning pipeline if a person can use Julia for several years but never discover the “correct”/best way to quickly and easily look up some of these things on the fly.