The challenges of documenting generic functions

If you have both motivation and a clear vision for improving the documentation, that makes you the best qualified person to start writing the PRs.

I have a clear vision, I think, but I don’t have the expertise to judge whether my vision is a good one or not.

How do you think I should proceed? The following is a long (and not very well-written) discussion of my vision.


All experienced Julia programmers have internalized concepts like “iterables” and “indexables”, so they naturally expect any “indexable” to work as the A argument of findall(predicate, A).

But this isn’t necessarily obvious to a newcomer. So my idea is to write these “concepts” down explicitly in a document and to use them in the API descriptions throughout.

But at this point, the “write your own pull request” workflow wouldn’t work, because I would need to convince everybody to use these concepts in the documentation.

Each time I ask for a helpful description, the experts’ answer has been that it’s impossible to give an “accurate” description because the API is completely general.

What I’m after is a description based on a “convention”, which may not be 100% precise but would act as helpful guidance when you read the manual, and would also work as a strong constraint when you write your own functions.

To elaborate on the latter point . . . For example, when you write your own function similar to findall(predicate, A), you are strongly encouraged to write it in such a way that any “indexable” works as your A.

Here is an example: “Type of array index?”, which is a thread I started more than a year ago. There, some people recommended that I should use Int because it always works as an index into an array. But I ended up writing a function that doesn’t assume that. Then recently I was pleased to realize that my function works for a Dict!

In hindsight, I wrote my function in such a way that any “indexable” works as the argument.
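To make this concrete, here is a minimal sketch of the kind of function I mean. It is not the function from that thread; keys_where is a name I made up for illustration. It only assumes that A responds to keys(A) and to A[k], so the key type is whatever the container uses.

```julia
# Hypothetical helper (made-up name): return the keys of A whose
# values satisfy pred. Nothing here assumes that the keys are Ints.
function keys_where(pred, A)
    return [k for k in keys(A) if pred(A[k])]
end

# Works on a Vector, where the keys are integer indices ...
vector_hits = keys_where(x -> x > 3, [3, 9, 4, 1])

# ... and on a Dict, where the keys here are Strings.
dict_hits = keys_where(x -> x > 3, Dict("a" => 1, "b" => 5))
```

For the Vector case this should agree with what findall(predicate, A) returns.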

If everybody had known the concept, then that discussion would have been much simpler.

I tentatively call these concepts “type classes”, although I’m not at all sure whether this is a good term. The Haskell compiler, for example, enforces at compile time that each argument belongs to the right type classes. The Julia language itself doesn’t have such a mechanism, but by convention, the core standard library is designed as if there were type classes. (And I’m sure that the writers of this excellent set of functions had this kind of convention in mind.)

So, the following is a very rough stab at what I have in mind as a document of these concepts:

“Iterable” or “Iterable Collection”: Formally, any object that responds to the iterate() function is an Iterable. When you query an Iterable, it returns one “element” of the collection; when you query it a second time, it returns the next element; and . . . ; when there are no more elements, it returns nothing by convention. [Now, this description is not accurate. The function iterate() works differently. My description just provides a “mental model”.]

The for construct uses this property of Iterable. [Are there other standard implicit uses of iterate()?]
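For readers who want the real mechanism behind the mental model, here is a rough sketch of what a for loop does, as far as I understand the iterate() protocol: iterate(itr) returns either nothing or a tuple (element, state), and the state is passed back in to get the next element. The function name collect_by_hand is made up for this sketch.

```julia
# Hypothetical helper: walk any Iterable by calling iterate() directly,
# which is roughly what `for item in itr` does behind the scenes.
function collect_by_hand(itr)
    out = Any[]
    next = iterate(itr)              # first call takes no state
    while next !== nothing           # nothing signals "no more elements"
        item, state = next
        push!(out, item)
        next = iterate(itr, state)   # later calls pass the state back
    end
    return out
end

from_vector = collect_by_hand([3, 9, 4, 1])
from_range = collect_by_hand(2:4)
```

As far as I know, destructuring (a, b = itr), splatting (f(itr...)), and collect() also rely on iterate(), so they are other implicit uses besides for.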

For example, a Vector is an Iterable:

# Vector returns 3, then 9, then 4, then 1, and then nothing.
for i in [3, 9, 4, 1]
   println(i)
end

Likewise, a Range is an Iterable:

# Range returns 2, then 3, . . ., then 9, and finally nothing.
for i in 2:9
   println(i)
end

Also, a Dict is an Iterable:

# Dict returns key => value pairs such as "pi" => 3.14 and "ee" => 2.72
# (the order is not guaranteed).
for (k,v) in Dict("pi"=>3.14, "ee"=>2.72)  # . . . more entries as needed
   println(k, "=>", v)
end

You can turn a lot of objects into an Iterable even if they themselves are not Iterables. For example, an input stream of text can be turned into an Iterable by the eachline() function:

istream = open("textfile.txt", "r")
# eachline(istream) returns 1st line, 2nd line, . . . ,
# until the stream is exhausted.
for ll in eachline(istream)
  println(ll)
end
close(istream)

Internally, eachline() returns an object that responds to the iterate() function.
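A small sketch to show this, using an in-memory IOBuffer instead of a file so that it runs anywhere (the sample text is made up). Calling iterate() by hand on the object that eachline() returns gives one line per call and nothing at the end.

```julia
# An in-memory stream stands in for a text file here.
io = IOBuffer("first\nsecond\n")
lines = eachline(io)

r1 = iterate(lines)           # (line, state) tuple for the 1st line
r2 = iterate(lines, r1[2])    # pass the state back to get the 2nd line
r3 = iterate(lines, r2[2])    # the stream is exhausted at this point
```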

Analogously, you can create a different Iterable out of an object which is already an Iterable by itself:

# Vector as an Iterable provides its elements one by one:
for x in vec # vec is an Iterable by itself
  println(x)
end
# eachindex creates an Iterable that provides the indices into the vector.
for i in eachindex(vec) # different Iterable
  println(i)
end

Finally, you can write your own function (like eachline() above) to create an Iterable out of an object, if it makes sense to do so. As an exercise, try writing a function that would return each character from a String:

for c in eachcharacter("this is a sample string.")
  println(c)
end
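One possible solution sketch for the exercise, under my own assumptions: EachCharacter is a made-up wrapper type, and the only thing that makes it an Iterable is the pair of Base.iterate methods. (A String is already an Iterable over its characters, so this is purely for illustration.)

```julia
# Hypothetical wrapper type; the name is made up for this sketch.
struct EachCharacter
    s::String
end

eachcharacter(s::AbstractString) = EachCharacter(String(s))

# First call: start at the first index of the string.
Base.iterate(it::EachCharacter) = iterate(it, firstindex(it.s))

# Later calls: the state is the index of the next character.
function Base.iterate(it::EachCharacter, i::Int)
    i > ncodeunits(it.s) && return nothing   # no more characters
    return (it.s[i], nextind(it.s, i))       # nextind skips multi-byte chars
end

# Optional traits so that collect() works without a length() method.
Base.IteratorSize(::Type{EachCharacter}) = Base.SizeUnknown()
Base.eltype(::Type{EachCharacter}) = Char

chars = collect(eachcharacter("héllo"))
```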

[Link to the documentation of iterate().]

Indexable:
Formally, an Indexable is any object A that responds to getindex(A,I).

If you give it a “key”, an Indexable gives you “the corresponding value”.
A prime example is Vector:

v = [3.2, 4.0, 9.5, 1.1]
v[2] # -> 4.0
getindex(v, 2) # equivalent to v[2]

Dict is also an Indexable.
. . . blah blah blah . . .

An Indexable is usually an Iterable. . . . blah blah blah.
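A short sketch of how the two concepts meet, using the Vector from above and a small Dict as sample data. Both respond to getindex, and both can be iterated, but iteration gives values for a Vector and key => value pairs for a Dict; keys() and values() work on both.

```julia
v = [3.2, 4.0, 9.5, 1.1]
d = Dict("pi" => 3.14, "ee" => 2.72)

# Indexable: a key goes in, the corresponding value comes out.
from_dict = d["pi"]               # same as getindex(d, "pi")

# Iterable: what each step returns depends on the container.
vector_items = collect(v)         # the values themselves
dict_items = collect(d)           # key => value pairs

# keys() and values() are available for both containers.
vector_keys = collect(keys(v))
dict_values = sort(collect(values(d)))
```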
