# Writing for generality

I re-discovered this old thread, which I started well more than a year ago, when I was a complete newbie:

I wanted to write a `findall`-like function that partitions the indices into two sets according to a criterion:

``````function partition(vals, thresh)
idx_l = Vector{keytype(vals)}()
idx_h = Vector{keytype(vals)}()
for i in eachindex(vals)
(vals[i] < thresh) ? push!(idx_l, i) : push!(idx_h, i)
end
(idx_l, idx_h)
end
``````

Note the `keytype(vals)` bit. People argued, why dwell on the type of the index? Why not just use `Int[]`. Fair enough.

But, I’ve recently realized that the above function works for a `Dict`, as is! If you used `Int`, it wouldn’t work for a `Dict`.

For this particular case, the point became moot because an index-free solution was found:

``````function partition(v, thresh)
idx_l = findall(<(thresh), v)
idx_h = setdiff(eachindex(v), idx_l)
(idx_l, idx_h)
end
``````

which also works for a `Dict`. (I’m not comfortable with this solution, though, because the types of `idx_l` and `idx_h` are different.)

In general, you should assume minimum things on your variables as long as your code remains simple. For example, you shouldn’t assume that your index is one or another integer type unless doing so is necessary or it significantly simplifies your code.

Another example is `Float64`. I usually write `Float64` without thinking much. But, sometimes I ask myself, why am I writing `Float64` when any real number will do? Should I be writing `Real` instead?

But, this is hard to learn for a newcomer. There are no tutorials or textbooks to tell you something like the above. The community doesn’t seem to have consensus on the “best practice”.

2 Likes

It works for much more than just `Vector` and `Dict` - it works for anything that has `keytype`, `getindex` and `eachindex`!

In a sense, the “type” of `vals` is a type that has methods for each of the above. Writing for generality means asking the question “What is truly required of this function?”.

It’s not actually index free, because `setdiff` still operates on the indices of `v`, just like your single loop would have done. The only difference between the two is that your first version only loops over `v` (and its indices) once, while your second version necessarily loops twice (in the worst case of all elements in `v` being less than your treshold).

I think we don’t have a “best practice” for this because we can’t (acurately) describe what a function actually requires in a type. We can’t currently annotate `v` with a type describing the exact methods required/expected by `partition` and expect it to behave/work for types that don’t subtype that (abstract) type.

To me, generic programming is thinking about behaviours and abstract properties you can assume to hold. I.e., in your example the generic function `partition` should be such that

``````all(getindex.(Ref(v), il) .< thresh) && all(getindex.(Ref(v), ih) .>= thresh) && union(Set(getindex.(Ref(v), il)), getindex.(Ref(v), ih)) == Set(values(v))
``````

whenever `il, ih = partition(v, thresh)`.
As long as these properties hold, I don’t care about the actual concrete types representing `v` or `il` and `ih`. All that matters is that the generic function can be used in the intended way, i.e., as specified by its properties. Another context to thing about is linear algebra, i.e., what do you need to know about the types in order to use a matrix inverse `inv(A)`? Instead, think about its properties …