Efficient approach to generate optimised methods for special cases of a general function

I am a fairly new user of Julia so may have misconceptions about the language’s capabilities and behaviour.

Problem Statement

I have implemented the most general form of an algorithm I am interested in, which is parameterised by multiple vectors and matrices. Although this general form is sometimes used, most of the time I will only be interested in the cases where some of the vectors/matrices are zero, or the matrices are the identity/multiples of the identity.

As a minimal example, let’s consider the function

function linear_transformation(x::Vector{Float64}, A::Matrix{Float64}, b::Vector{Float64})
    return A * x + b
end

I would like to have automatically generated, optimised versions of this function for the cases:

  • A is zero
  • A is the identity
  • b is zero

E.g. for the first case, I would ideally like the code to run as quickly as simply returning b.

It’s also worth noting that such optimisations will likely have ripple effects: one matrix being zero might cause another to be zero, and so on.

I appreciate that this is a lot to ask for (especially for a non-symbolic, eagerly executed language) so I’m not expecting a perfect solution to this. Instead, I would love to have an open discussion about this topic and hopefully find some ways I can get closer to my desired outcome.

Initial Thoughts

A manual way of approaching this would be to write variants of linear_transformation dispatching on zero values. Obviously, this is not a scalable solution, but I am also under the impression that value dispatch is discouraged in Julia and leads to inefficient code.

The UniformScaling matrix seems related to what I’m asking for, though I’m not completely sure it does exactly what I’m after. Perhaps creating similar Identity and Zero matrix types inspired by it is a good approach.
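To make this idea concrete, here is a minimal sketch (the ZeroMatrix name and its methods are hypothetical, not from any library) of what such a structural zero type might look like: the fact that A is zero lives in the type itself, so dispatch can pick a fast method without inspecting any coefficients.

```julia
# Hypothetical sketch: a structural zero-matrix type, so "A is zero"
# is known from the type alone and dispatch can pick a fast method.
struct ZeroMatrix <: AbstractMatrix{Float64}
    n::Int
end

Base.size(Z::ZeroMatrix) = (Z.n, Z.n)
Base.getindex(::ZeroMatrix, ::Int, ::Int) = 0.0

# A zero matrix times any vector is the zero vector, so A * x + b collapses to b.
Base.:*(Z::ZeroMatrix, x::AbstractVector) = zeros(Z.n)

linear_transformation(x, A, b) = A * x + b

linear_transformation(rand(3), ZeroMatrix(3), [1.0, 2.0, 3.0])  # returns b's values
```

The answers below point out that FillArrays.jl already provides exactly this kind of type, so in practice there is no need to hand-roll one.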

Can macros be used to achieve this task?

The natural reflex is to add an if / else to your function like so:

function linear_transformation(x, A, b)
    if all(iszero, A)
        return b
    elseif isone(A)  # A is the identity matrix
        return x + b
    elseif all(iszero, b)
        return A * x
    else
        return A * x + b
    end
end

but that will be rather inefficient: first because of the if itself, and second because statements like all(iszero, A) need to check every single coefficient.

The better method is to encode the matrix structure in the type. You have correctly spotted that the standard library LinearAlgebra does this and already has an identity operator. Meanwhile, FillArrays.jl has types to represent arrays filled with zeros or ones. Both of these libraries overload the necessary operations to make A * x + b as fast as possible without additional intervention.

To sum up, the only thing you need to do is allow arbitrary matrix / vector subtypes in your function:

linear_transformation(x::AbstractVector, A::AbstractMatrix, b::AbstractVector) = A * x + b

Then you can apply it with the following varieties of arrays:

using LinearAlgebra, FillArrays
A = rand(2, 2);  # dense matrix
A = I(2);  # identity matrix
A = FillArrays.Zeros(2, 2);  # zero matrix

Even better, you can combine types of A and b however you want, and the multiple dispatch will still find the optimal method.
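As a minimal check of this (standard library only; the FillArrays types above slot in the same way), the single generic method below handles both a dense matrix and the structural identity, with the identity case reducing to an O(n) diagonal product:

```julia
using LinearAlgebra

linear_transformation(x::AbstractVector, A::AbstractMatrix, b::AbstractVector) = A * x + b

x = [1.0, 2.0]
b = [3.0, 4.0]

# Dense case: generic matrix-vector product.
linear_transformation(x, rand(2, 2), b)

# Identity case: I(2) is a Diagonal{Bool} matrix, so the product is O(n), not O(n^2).
linear_transformation(x, I(2), b)  # == x + b == [4.0, 6.0]
```

No annotations beyond AbstractMatrix / AbstractVector are needed; the compiler specialises the method for each concrete type combination it sees.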


Not every coefficient in the false case: all(iszero, A) short-circuits as soon as it hits a nonzero entry.

Gotcha. That sounds great. Thank you very much.

One follow-up question:

This might be a case of over-engineering, but is UniformScaling as efficient as it can be for the unscaled identity matrix case? It is my understanding that I is indistinguishable from 1.0 * I by the type system and so multiplying a matrix by I would involve multiplying each element (unnecessarily) by 1.0. Is this correct? If so, I assume I would just have to define something similar to I that is strictly for unscaled identity matrices.

Indeed you’re right and the performance difference is visible too, although most of the hit is actually due to the new allocation and not the multiplication by 1. If we use in-place multiplication with LinearAlgebra.mul!, the difference goes away, so I assume 1 * x is optimized (at least for integer 1):

using BenchmarkTools, LinearAlgebra

struct Identity end
Base.:*(::Identity, x) = x
LinearAlgebra.mul!(y, ::Identity, x) = y .= x

x = rand(1000)
y = zeros(1000)
julia> @btime (A * $x) setup=(A = Identity());
  2.455 ns (0 allocations: 0 bytes)

julia> @btime (A * $x) setup=(A = I);
  533.193 ns (1 allocation: 7.94 KiB)

julia> @btime mul!($y, A, $x) setup=(A = Identity());
  74.981 ns (0 allocations: 0 bytes)

julia> @btime mul!($y, A, $x) setup=(A = I);
  64.121 ns (0 allocations: 0 bytes)

You mentioned that allocation is the main culprit here, but I think it’s worth pointing out that the definition

Base.:*(::Identity, x) = x

is not a fair way to compare against multiplication by I, since multiplication using * is supposed to create a new array, not provide a new name for the old one. If you instead define

Base.:*(::Identity, x) = copy(x)

then the first two timings become on my machine

julia> @btime (A * $x) setup=(A = Identity());
  502.525 ns (1 allocation: 7.94 KiB)

julia> @btime (A * $x) setup=(A = I);
  505.102 ns (1 allocation: 7.94 KiB)

so we can conclude that multiplication by the uniform scaling operator is maximally efficient even when using the * operator.
