Writing for generality

ryofurue · August 17, 2023, 3:21pm

I re-discovered this old thread, which I started well more than a year ago, when I was a complete newbie:

I wanted to write a findall-like function that partitions the indices into two sets according to a criterion:

function partition(vals, thresh)
  idx_l = Vector{keytype(vals)}()
  idx_h = Vector{keytype(vals)}()
  for i in eachindex(vals)
    (vals[i] < thresh) ? push!(idx_l, i) : push!(idx_h, i)
  end
  (idx_l, idx_h)
end

Note the keytype(vals) bit. People argued, why dwell on the type of the index? Why not just use Int[]. Fair enough.

But, I’ve recently realized that the above function works for a Dict, as is! If you used Int, it wouldn’t work for a Dict.

For this particular case, the point became moot because an index-free solution was found:

function partition(v, thresh)
  idx_l = findall(<(thresh), v)
  idx_h = setdiff(eachindex(v), idx_l)
  (idx_l, idx_h)
end

which also works for a Dict. (I’m not comfortable with this solution, though, because the types of idx_l and idx_h are different.)

In general, you should assume minimum things on your variables as long as your code remains simple. For example, you shouldn’t assume that your index is one or another integer type unless doing so is necessary or it significantly simplifies your code.

Another example is Float64. I usually write Float64 without thinking much. But, sometimes I ask myself, why am I writing Float64 when any real number will do? Should I be writing Real instead?

But, this is hard to learn for a newcomer. There are no tutorials or textbooks to tell you something like the above. The community doesn’t seem to have consensus on the “best practice”.

Sukera · August 17, 2023, 4:21pm

It works for much more than just Vector and Dict - it works for anything that has keytype, getindex and eachindex!

In a sense, the “type” of vals is a type that has methods for each of the above. Writing for generality means asking the question “What is truly required of this function?”.

It’s not actually index free, because setdiff still operates on the indices of v, just like your single loop would have done. The only difference between the two is that your first version only loops over v (and its indices) once, while your second version necessarily loops twice (in the worst case of all elements in v being less than your treshold).

I think we don’t have a “best practice” for this because we can’t (acurately) describe what a function actually requires in a type. We can’t currently annotate v with a type describing the exact methods required/expected by partition and expect it to behave/work for types that don’t subtype that (abstract) type.

bertschi · August 17, 2023, 4:21pm

To me, generic programming is thinking about behaviours and abstract properties you can assume to hold. I.e., in your example the generic function partition should be such that

all(getindex.(Ref(v), il) .< thresh) && all(getindex.(Ref(v), ih) .>= thresh) && union(Set(getindex.(Ref(v), il)), getindex.(Ref(v), ih)) == Set(values(v))

whenever il, ih = partition(v, thresh).
As long as these properties hold, I don’t care about the actual concrete types representing v or il and ih. All that matters is that the generic function can be used in the intended way, i.e., as specified by its properties. Another context to thing about is linear algebra, i.e., what do you need to know about the types in order to use a matrix inverse inv(A)? Instead, think about its properties …

Topic		Replies	Views
Type of array index? New to Julia	26	2394	July 13, 2024
Array{Float64,2} -> Array{Float64,1} General Usage indexing , array	5	2796	September 4, 2020
How do I implement array indexing? General Usage	4	291	December 9, 2020
Sort indices based on value New to Julia	13	416	December 8, 2023
1 based (array) indexing New to Julia indexing , arrays	24	2042	November 28, 2022

Writing for generality

Related topics