Some thoughts on improving basic function documentation

In the recent giant thread on Julia popularity, and in the recent user/developer survey, the topic of documentation often comes up.

Usually when people bring it up, they focus on things for brand new users – guides, blogs, etc.

Focusing on new users is great, but as someone who’s no longer quite a newbie (but also makes zero claim to proficiency), I tend to run into a different set of challenges WRT documentation. Specifically, how to quickly get basic information about functions in a way that doesn’t disrupt my code-writing flow.

Rather than add to that giant thread, I figured I would post separately to mention a few aspects of documentation I personally find troublesome, perhaps to see whether others have similar issues, and maybe float a couple of ideas for discussion.

What I look for in function documentation

When I’m writing code (as a user, not a package developer – I’ve never tried developing a package), I mainly look to function documentation when I need to jog my memory about one of three things:

  1. What kind of inputs a function takes
  2. What kind of outputs a function produces
  3. What functions I most likely need to create the inputs a function needs, or to process the outputs it creates (e.g., associated functions)

Things that frustrate me when using the terminal help (‘?’)

  1. Certain function names (especially in statistics) are very common (e.g., fit), and way too many methods pop up when using ?. To pick a random package, if I do a clean install of GLM.jl (with nothing else) and look up fit, I get 5 screens of info (11 methods), dumped in my terminal all at once. Things like this make it hard to quickly look up information on one particular method – which is what I very often want to do.

  2. Function docstrings hardly ever mention details about output, only inputs, which seems like a rather significant omission to me. If I want to find out what sort of “thing” a function creates, I’ve usually got to run the function and then use a bunch of terminal commands (e.g., typeof, fieldnames, etc., which gets very hard if some of the inputs aren’t easily constructed on the fly); browse the source code; or sometimes go browse the package’s website, and hope it’s mentioned in an example vignette somewhere. All of those approaches are (imho) rather time-consuming and distracting.

  3. It’s very rare for packages to list “related functions” in docstrings, even though it’s mentioned in the Manual section on Writing Documentation. So after I run a function, if I have trouble remembering “do I run describe or summary or something else on this thing”, I can’t usually depend on the help to provide that. I most often have to open my browser, go to google, find the package website, then scan through one or more long, unstructured vignettes.
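For anyone who hasn't run into item 2, the by-hand inspection workflow looks roughly like this. `match` from Base is only a stand-in for "some function whose output isn't documented". The pattern and input string are made up, and the sequence of commands is the point.

```julia
# Run the function first, then interrogate whatever comes back.
m = match(r"(\d+)-(\d+)", "range: 10-20")

typeof(m)               # what kind of "thing" is this?
fieldnames(typeof(m))   # which fields does it have?
m.captures              # poke at a field that looks promising

# Every step above assumes you could build valid inputs in the first place,
# which is exactly what gets hard for functions with complicated arguments.
```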

Suggested fixes

I know people will respond[**] with “PRs!” And fair enough (especially for the third issue above).

But I also think there are some small changes to the Manual that might help the situation more generally, by giving more substantial guidance to people writing function documentation and helping to set firmer community expectations around it.

I thought I’d post them here to see if they are things that people find reasonable. Specifically:

  1. Make language in the Manual clearer and stronger about the elements of documentation that are considered essential. Right now, the Manual’s section on Writing Documentation puts important, specific instructions (e.g., “Always show the function signature at the top”) on the same list as much less-important stylistic advice (e.g., “Don’t repeat yourself”), and fails to mention certain aspects of documentation (e.g., of outputs) entirely. It also prefaces the entire list using rather weak language (“we recommend following…”), which (imho) sets a weak tone about community expectations for those things. IMHO, the Manual needs to more clearly describe which elements of documentation are considered essential for functions and types, and differentiate those from aspects that are merely “recommended”, or stylistic things.

  2. As part of the above, make sure to mention function output/results. For example, strongly suggest package authors add a “Results” section to function docstrings to provide some minimal information/references on the output (at least for the functions users are likely to use). E.g., “The output of this function is a MyType.” (This assumes, of course, that “MyType” is well-documented in the first place, and that someone could easily do ? MyType and get something useful. If that’s not the case, then key elements of the output, like any fieldnames, need to be spelled out here.)

  3. Provide advice in the Manual about dealing with (and especially documenting) overly common function names. From the time I’ve spent looking for fixes, my understanding is that there’s no easy technical way to pull up help on only the method(s) a particular package adds. But a next-best step would be to give package authors advice on how to handle it. To continue the GLM.jl example I mentioned above, that package handles it by creating a unique alias that can be looked up. This doesn’t fix the issue (fit still produces pages of methods if I have to look it up for other reasons), but it does make things a little better.

  4. This is a more general plea, to package authors: I love example vignettes. They are wonderful ways of showcasing functionality to new users. But online vignettes and model zoos are not a substitute for structured documentation of specific functions. I should be able to get enough basic information to use a function without going to Google to read your vignettes, even if my internet goes down.
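To make suggestions 2 and 3 concrete, here is a minimal sketch of what a package could ship. Everything in it is hypothetical: `MyModel`, `mymodel`, and this toy least-squares `fit` are invented for illustration, and a real statistics package would add a method to a shared `fit` (e.g. the one in StatsAPI.jl) rather than define its own. The idea is that `?mymodel` shows one short docstring with a `# Returns` section, while `?fit` can still list everything.

```julia
# Hypothetical package code. `MyModel`, `mymodel`, and this `fit` are invented
# for illustration; a real package would add a method to a shared `fit`
# (e.g. StatsAPI.fit) instead of defining a new generic function.

"""
    MyModel

Result of fitting a straight line by least squares, as returned by [`mymodel`](@ref).

# Fields
- `intercept::Float64`: estimated intercept.
- `slope::Float64`: estimated slope.
"""
struct MyModel
    intercept::Float64
    slope::Float64
end

"""
    fit(::Type{MyModel}, x, y)

Fit `y ≈ intercept + slope * x` by ordinary least squares.

# Returns
A [`MyModel`](@ref).

See also [`mymodel`](@ref), an alias that is easier to look up with `?`.
"""
function fit(::Type{MyModel}, x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    xmean = sum(x) / length(x)
    ymean = sum(y) / length(y)
    slope = sum((x .- xmean) .* (y .- ymean)) / sum((x .- xmean) .^ 2)
    return MyModel(ymean - slope * xmean, slope)
end

"""
    mymodel(x, y)

Alias for `fit(MyModel, x, y)`, so `?mymodel` shows just this one docstring.

# Returns
A [`MyModel`](@ref) with fields `intercept` and `slope`.
"""
mymodel(x, y) = fit(MyModel, x, y)
```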

Thoughts?

[**] People might also respond with “you’re using Julia wrong”. Again, fair enough. But if that’s the case, I think it might say something about the learning pipeline if a person can use Julia for several years but never discover the “correct”/best way to quickly and easily look up some of these things on the fly.


I couldn’t agree more. All of the things you list are hallmarks of good documentation, and many projects (including the Julia core/stdlib documentation) fall short.

It’s just extremely hard to enforce good documentation. Having “Documentation linters” would be good in theory, but seems nearly impossible to get right. I’d say the best thing to do is to add “how to write documentation” guidelines to the manual that include everything you mention, as well as a pointer to the Diataxis system (at least for packages, it might not be a perfect fit for the language docs).

Beyond that, all you can do is raise awareness and keep convincing as many developers as possible (core or otherwise) that stepping up their documentation game will dramatically improve the robustness of the ecosystem.


Another thing that’s often missing: If you define a type (struct), add a list of methods that are intended to be used with that struct in the docstring – especially for abstract types, like here
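A small sketch of that idea, using an invented shape hierarchy (`AbstractShape`, `Square`, `area`, `perimeter`, and `describe` are all hypothetical): the abstract type's docstring lists the methods subtypes must implement and the generic functions users can then call on them.

```julia
# Hypothetical type hierarchy; only the docstring layout is the point.
"""
    AbstractShape

Supertype for 2-D shapes.

# Interface
Subtypes must implement:
- `area(s)`: the area, as a `Float64`.
- `perimeter(s)`: the perimeter, as a `Float64`.

Once those exist, users can call:
- `describe(s)`: a one-line `String` summary.
"""
abstract type AbstractShape end

function area end
function perimeter end

# Generic function that works for any subtype implementing the interface.
describe(s::AbstractShape) =
    string(nameof(typeof(s)), ": area = ", area(s), ", perimeter = ", perimeter(s))

"""
    Square(side)

A square with side length `side`; implements the `AbstractShape` interface.
"""
struct Square <: AbstractShape
    side::Float64
end

area(s::Square) = s.side^2
perimeter(s::Square) = 4 * s.side
```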


This omits the best thing you can do: submit PRs that improve documentation. Not everyone is great at writing code and documentation, and often a newcomer who has just finished, at great effort, figuring out how something works is literally the best person in the world at that moment to make the documentation better. (When you wrote the code, you know too much about the design to approach it with a “newbie” perspective.)

It would be great if more people leveraged their pain to ensure that future Julia users don’t suffer the same way.


Some docstrings use -> to indicate what’s returned:

help?> findmax

  findmax(f, domain) -> (f(x), index)

  Return a pair of a value in the codomain (outputs of f) and the index of the corresponding value
  in the domain (inputs to f) such that f(x) is maximised. 

Arguably many more should. Making a giant PR adding -> to many functions would be a way to cement this. Say every function in Base which may return two things, for a start?

(I did this some time ago to add many “see also” cross-links, and now I think it’s more common to add those.)

While I know you suggest editing the manual’s section on how to write docs, I’m not sure how many people read that – my guess is that package authors more often follow what Julia does, not what it says you should do.


I’d like to add an additional point to that list:

  1. Expected/Known failure modes.

It’s frustrating to no end when it’s unclear when or why a function throws an error.
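One way to cover this is a `# Throws` section. The function below is hypothetical; what matters is that the docstring states which exception types are thrown and under what conditions, and that the code actually throws exactly those.

```julia
# Hypothetical function; the `# Throws` section tells the reader when and why
# it fails without having to read the source.
"""
    safe_ratio(a, b)

Return `a / b` as a `Float64`.

# Throws
- `DomainError` if either argument is `NaN`.
- `ArgumentError` if `b` is zero.
"""
function safe_ratio(a::Real, b::Real)
    (isnan(a) || isnan(b)) && throw(DomainError((a, b), "arguments must not be NaN"))
    iszero(b) && throw(ArgumentError("denominator must be nonzero"))
    return Float64(a / b)
end
```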


I am keeping track of this thread! For the next JuliaCon I intend to submit a workshop on “writing good documentation for a Julia package”, and I will use some of the concerns raised here as a checklist of points to address!


It would also be great to have a somewhat more stringent PR review process, so that this kind of after-the-fact documentation cleanup for new features is at least reduced. I don’t think people are asking for big example docs, but rather a minimal “what is required/what is given/what are the failure modes” of a function. That should definitely be within reach of anyone who wants to get a new feature merged, as they literally just wrote it (presumably with something specific in mind).


I’m decidedly more optimistic about tooling being able to help with this issue. However, a prerequisite for that is having some established convention(s) for structured documentation in docstrings. Think the [Lang]Docs and [Lang]Oxygens of the world. Julia is currently in a small minority of languages that don’t have any broadly accepted standard for this, and it shows in the lack of consistency across docstring formats in the ecosystem.

To be clear, I’m not suggesting we go full JavaDoc and add an @ element for every conceivable thing someone might want in a docstring. But we can go a lot farther than the current “anything goes” status quo. I know DocStringExtensions exists, but it’s more focused on helping fill in the details once you’ve established a structure that docstrings should conform to.
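As a rough illustration of why a shared convention is the prerequisite: once people agree on section headers, even a naive check becomes trivial. This sketch only scans a docstring string for required `#` headers. The `REQUIRED_SECTIONS` list is an assumption, and the sketch doesn't pull docstrings out of a module, which a real tool would need to do.

```julia
# Toy "documentation linter". The set of required sections is an assumption;
# a real tool would first need the community to agree on one.
const REQUIRED_SECTIONS = ["# Arguments", "# Returns"]

"""
    missing_sections(doc; required = REQUIRED_SECTIONS)

Return the entries of `required` that do not appear as header lines in `doc`.
"""
function missing_sections(doc::AbstractString; required = REQUIRED_SECTIONS)
    # Collect every line that looks like a Markdown header, with whitespace trimmed.
    headers = Set(String(strip(line)) for line in split(doc, '\n')
                  if startswith(strip(line), "#"))
    return [s for s in required if !(s in headers)]
end
```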


Alternatively, I like to use the syntax

```julia
(y, i) = findmax(f, domain)
```

and then talk about what y and i are further down in the docstring.
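A sketch of that style applied to a hypothetical helper. `maxvalue_and_index` is made up; it reimplements the `findmax(f, domain)` behavior with a plain loop so the example doesn't depend on that method being available in your Julia version.

```julia
# Hypothetical function documented in the `(y, i) = f(...)` style: the signature
# line shows how a caller destructures the result, and the body names each part.
"""
    (y, i) = maxvalue_and_index(f, domain)

Evaluate `f` on every element of `domain` and find the largest result.

# Returns
- `y`: the largest value of `f(x)` over `domain`.
- `i`: the index in `domain` of the element that produced `y`.
"""
function maxvalue_and_index(f, domain)
    isempty(domain) && throw(ArgumentError("domain must be non-empty"))
    best_i = firstindex(domain)
    best_y = f(domain[best_i])
    for i in eachindex(domain)
        y = f(domain[i])
        if y > best_y
            best_y, best_i = y, i
        end
    end
    return (best_y, best_i)
end
```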


Now that I look, there are in fact two patterns in Base for indicating the return:

help?> extrema

  extrema(itr; [init]) -> (mn, mx)

  Compute both the minimum mn and maximum mx element in a single pass, and return them as a
  2-tuple.

────────────────────────────────────────────────────────────────────────────────────────────────

  extrema(A::AbstractArray; dims) -> Array{Tuple}

  Compute the minimum and maximum elements of an array over the given dimensions.

Perhaps the first should always be written min, max = extrema(itr; [init]) as you suggest, and reserve -> T for types?
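Under that convention, `-> T` would only ever name a type. A minimal sketch with a hypothetical `column_means` function; the body converts to `Float64` so the documented return type actually holds for integer and `Float32` inputs too.

```julia
# Hypothetical function: `-> Vector{Float64}` states the return *type*,
# not names for the returned values.
"""
    column_means(A::AbstractMatrix) -> Vector{Float64}

Return the mean of each column of `A`.
"""
column_means(A::AbstractMatrix) = [Float64(sum(c)) / length(c) for c in eachcol(A)]
```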


What I find most unfortunate about the section on Writing Documentation is that it actively discourages argument lists, and the community seems to have taken this to heart.

  1. Only provide an argument list when really necessary.

    For simple functions, it is often clearer to mention the role of the arguments directly in the description of the function’s purpose. An argument list would only repeat information already provided elsewhere.

The argument list is the number one thing I’m looking for when I’m looking at documentation, even if it’s for a “simple” function. I want to know exactly what the function expects and accepts, and a paragraph of prose simply can’t provide the same clarity or communicate the details as efficiently.

The number two thing I’m looking for is a similar list of return values; however, as OP pointed out, the guidelines don’t discuss how to document return values at all.

I can only speak for myself, but whenever I’ve tried to write documentation I’ve always looked at the style guide for the relevant language/community to see what conventions experienced people seem to find useful. An excellent example is the numpy docstring style guide. It’s detailed and unambiguous, but not verbose, and emphasizes structured technical writing over prose, most importantly parameter and return lists. As a result, the numpy/scipy docstrings are amazingly clear and always provide the information I’m looking for (or didn’t know I needed).


The Matlab documentation is also a “best-in-class” example, IMHO.


way too many methods pop up when using ?

You can reduce the number of methods that appear if you start filling in the arguments of the function in the help mode.

help?> sort
search: sort sort! sortperm sortperm! sortslices insorted Cshort issorted QuickSort MergeSort Cushort partialsort

  sort(v; alg::Algorithm=defalg(v), lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward)


  Variant of sort! that returns a sorted copy of v leaving v itself unmodified.
...
julia> using DataFrames

julia> df = DataFrame(x = [3,2,1], y = ["c","b","a"])
3×2 DataFrame
 Row │ x      y
     │ Int64  String
─────┼───────────────
   1 │     3  c
   2 │     2  b
   3 │     1  a

help?> sort(df, :x)
  sort(df::AbstractDataFrame, cols=All();
       alg::Union{Algorithm, Nothing}=nothing,
       lt::Union{Function, AbstractVector{<:Function}}=isless,
       by::Union{Function, AbstractVector{<:Function}}=identity,
       rev::Union{Bool, AbstractVector{Bool}}=false,
       order::Union{Ordering, AbstractVector{<:Ordering}}=Forward,
       view::Bool=false,
       checkunique::Bool=false)


  Return a data frame containing the rows in df sorted by column(s) cols. Sorting on multiple columns is done
  lexicographically.
...
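A related approach outside help mode, in case it helps with the "hard to construct inputs" problem: Base's `methods` and `which` accept argument *types* instead of values, so you can narrow a long method list without building any inputs. Shown here with `sort`; the same calls should work on a package function like `fit` once the relevant types are in scope.

```julia
# Every method of `sort` currently defined (Base plus any loaded packages).
all_sort = methods(sort)

# Only the methods applicable to a single `Vector{Int}` argument.
# No actual vector is constructed; we pass the type.
vec_sort = methods(sort, (Vector{Int},))

# The one method dispatch would pick for that argument type,
# and where its source lives.
m = which(sort, (Vector{Int},))
println(m.file, ":", m.line)
```

In the REPL, displaying `vec_sort` prints the matching signatures, which can be enough to decide what to type after `?`.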

Function docstrings hardly ever mention details about output, only inputs

I think the Julia documentation needs to make a decision about how results should be documented so there aren’t so many variations around.

  • Julia itself largely seems to not document the return values if it can avoid it. Otherwise it uses f(x) -> y.
  • BlueStyle says to write f(x::Int) -> Int and avoid giving the output a name.

I like @goerz’s style better than either of these methods, since it documents the function call the same way you would write it (and doesn’t pretend that -> can be used to define variables). I also prefer to document types only in the # Arguments and # Returns lists below the function signatures, and to leave the signatures as bare variable names so they are shorter. Plus, if multiple methods use the same variables, you only need to write out the detailed type information in the list once.
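A sketch of that layout on a hypothetical `scale` function with two methods: the signature lines use bare names (in the `s = ...` call form), and the `# Arguments` / `# Returns` lists carry the types once for both methods.

```julia
# Hypothetical function; one docstring covers both methods below.
"""
    s = scale(x, factor)
    s = scale(x)

Multiply `x` by `factor`.

# Arguments
- `x::Union{Real, AbstractVector{<:Real}}`: value(s) to scale.
- `factor::Real = 2`: multiplier; defaults to `2` when omitted.

# Returns
- `s`: same shape as `x`, with every element multiplied by `factor`.
"""
scale(x::Real, factor::Real = 2) = x * factor
scale(x::AbstractVector{<:Real}, factor::Real = 2) = x .* factor
```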

It’s very rare for packages to list “related functions” in docstrings

If you can guess at the related function name and signature, then ? f(x) can help you check. I agree it would be good to encourage more use of “related functions” though.


I really need to make my WIP CheckDoc.jl public, kinks and all… Documenting the error types a function may throw is one of the checks I’ve implemented.

Perhaps if it’s public I could get help in solving some of the difficulties I’m having that have been (mentally) blocking me from making it public :stuck_out_tongue:


Shouldn’t f()::T be the notation for types?


R’s documentation format has served me well. The structure is generally uniform across many packages. Example. Another.

  • Description
  • Usage
  • Arguments
  • (Details)
  • Return value
  • References
  • See also
  • Examples
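For comparison, here is roughly how that R layout might translate into a Julia docstring. `trimmed_mean` is a hypothetical function written for this example, and treating R's "Usage" as the indented signature line is my assumption about the closest equivalent.

```julia
# Hypothetical function; the docstring mirrors R's Description / Usage /
# Arguments / Return value / See also / Examples layout with Markdown headers.
"""
    trimmed_mean(x; prop = 0.1)

# Description
Mean of `x` after dropping the smallest and largest `prop` fraction of its values.

# Arguments
- `x::AbstractVector{<:Real}`: the data; must be non-empty.
- `prop::Real = 0.1`: fraction trimmed from each end; must lie in `[0, 0.5)`.

# Returns
A `Float64`.

# See also
`Statistics.mean`, `Statistics.median`.

# Examples
    julia> trimmed_mean([1, 2, 3, 4, 100]; prop = 0.2)
    3.0
"""
function trimmed_mean(x::AbstractVector{<:Real}; prop::Real = 0.1)
    isempty(x) && throw(ArgumentError("x must be non-empty"))
    0 <= prop < 0.5 || throw(ArgumentError("prop must lie in [0, 0.5)"))
    s = sort(x)
    k = floor(Int, prop * length(s))  # number of values dropped from each end
    kept = s[(k + 1):(end - k)]
    return Float64(sum(kept)) / length(kept)
end
```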

That’s a good point about ->. When I was writing, I was mainly thinking about my interactions with the larger package ecosystem (e.g., not so much Base). :slight_smile:


I’m also an R user (or used to be, anyway), and to be honest, their documentation guidance is exactly what I have in the back of my head when I think about this issue.

Whatever one’s opinions about some of the features of R as a language, I think it does an excellent job when it comes to documentation.

I think that’s due to two things: (1) the very specific, detailed guidance the developer’s guide provides, and (2) strong community norms around what documentation “should” contain (which are partly driven and supported by (1), imho).

I hesitated to focus on R’s way of doing things because I wasn’t sure if people have (or want to have) different workflows around function documentation, and hence different expectations about what the “essential” documentation for a function should include. I also didn’t want to turn it into a “well, my OLD language” vs. Julia discussion. :slight_smile:


Thanks very much for pointing that out! I actually wasn’t aware that I could look up specific methods that way in ?.

That being said, while I think that’s a perfectly reasonable approach for functions with simpler inputs (e.g., the kind one sees in Base), I’m not sure it’s feasible for functions with more complicated inputs (e.g., types that are hard to construct on the fly, functions with long lists of inputs, etc.).
