Keeping well-defined mathematical functions orthogonal to a project structs design

I have a doubt regarding a design pattern. To expose it I’m going to refer to IterativeSolvers.jl’s Conjugate Gradient algorithm.

For the end user, they supply the function cg(A, b; kwargs...), which solves the equation Ax = b and returns the solution x. To solve it, the package internally defines a mutable struct containing all vectors and matrices that will be needed in the iterative algorithm,

mutable struct CGIterable{matT, solT, vecT, numT <: Real}
    A::matT
    x::solT
    r::vecT
    c::vecT
    u::vecT
    tol::numT
    residual::numT
    prev_residual::numT
    maxiter::Int
    mv_products::Int
end

and essentially all the work is done in the following function, which performs a step of the conjugate gradient algorithm, updating the struct of type CGIterable:

###############
# Ordinary CG #
###############

function iterate(it::CGIterable, iteration::Int=start(it))
    # Check for termination first
    if done(it, iteration)
        return nothing
    end

    # u := r + βu (almost an axpy)
    β = it.residual^2 / it.prev_residual^2
    it.u .= it.r .+ β .* it.u

    # c = A * u
    mul!(it.c, it.A, it.u)
    α = it.residual^2 / dot(it.u, it.c)

    # Improve solution and residual
    it.x .+= α .* it.u
    it.r .-= α .* it.c

    it.prev_residual = it.residual
    it.residual = norm(it.r)

    # Return the residual at item and iteration number as state
    it.residual, iteration + 1
end

The problem I see with this is that CG is a mathematically well-defined algorithm which needs a specific set of matrices and vectors as input to produce an output. However, the implementation above depends on the particular design of the struct of type CGIterable, and can only be used if this object is built.

For me it seems more natural to have instead a function CG_step that is independent of the particular design of the structs in my project, such as:

function CG_step(x, A, r, c, u, residual, prev_residual)
    # Check for termination first
    # if done(it, iteration)
    #     return nothing
    # end

    # u := r + βu (almost an axpy)
    β = residual^2 / prev_residual^2
    u .= r .+ β .* u

    # c = A * u
    mul!(c, A, u)
    α = residual^2 / dot(u, c)

    # Improve solution and residual
    x .+= α .* u
    r .-= α .* c

    prev_residual = residual
    residual = norm(r)

    # Return the residual at item and iteration number as state
    residual, iteration + 1
end

and then have an additional function so that I can use my structs throughout my project:

function iterate(it::CGIterable, iteration::Int=start(it))
    return CG_step(it.x, it.A, it.r, it.c, it.u, it.residual, it.prev_residual)
end

That is, what I am trying to convey is that I would think it is desirable to make well-defined mathematical functions that do all the job orthogonal to the particular design of project’s structs. This way, if I ever want to change my structs, I don’t need to touch the actually important function (which could potentially introduce bugs), but just the interface function defined just above. The actual mathematical function would be coded “once and for all” in Julia.

Of course, I guess that I am missing some important points that make this approach undesirable, but I wanted to ask about your opinions on this. What I said makes any sense? What are the advantages/disadvantages of doing what I said? Why didn’t they code the algorithm this way?

4 Likes

Your version can work well in some cases, but look at the downsides:

  • You need to pass a large number of parameters to your CG_step. This could get unwieldy in other cases.
  • How can the function store the changes it makes to the parameters? In-place assignments of arrays will work, but updating scalars will not work. You can play with multiple return values as you do with residual (but you don’t use it actually?). But this makes things more brittle and complicated.

What’s best depends on the particular problem, but in general I think wrapping many related values in a struct is good software engineering practice.

And tangentially related is this quote from Linus Torvalds (see discussion here):

Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

It’s often a good idea to prioritize the choice of data structure even if doing so you lose some mathematical purity.

2 Likes

I guess it’s a good point. My biggest concern was that, if I wanted to check my own implementation of the algorithm with IterativeSolvers.jl’s one, I would need to first create their data structures to be able to call the function, so that what I gained in passing fewer inputs is lost because I still need to create a struct beforehand.

However, I agree that the code is much cleaner from the developer point of view.

1 Like

Multiple dispatch can help with this. While you could write different versions of the function for different structs, you could also write adapters.

function CG_step(x, A, r, c, u, residual, prev_residual; maxiter, mv_products)
   cgiter = CGIterable(A, x, r, c, u, tol, residual, prev_residual, maxiter, mv_products)
   return IterativeSolvers.iterate(cgiter)
end

function IterativeSolvers.iterate(it::MyCGIterable)
    cgiter = convert(IterativeSolvers.CGIterable, it)
    return IterativeSolvers.iterate(cgiter)
end

function Base.convert(::Type{IterativeSolvers.CGIterable}, it::MyCGIterable)
    return IterativeSolvers.CGIterable( ... )
end
1 Like