Efficient approach to generate optimised methods for special cases of a general function

I am a fairly new user of Julia so may have misconceptions about the language’s capabilities and behaviour.

Problem Statement

I have implemented the most general form of an algorithm I am interested in, which is parameterised by multiple vectors and matrices. Although this general form is sometimes used, most of the time I will only be interested in the cases where some of the vectors/matrices are zero, or the matrices are the identity/multiples of the identity.

As a minimal example, let’s consider the function

function linear_transformation(x::Vector{Float64}, A::Matrix{Float64}, b::Vector{Float64})
    return A * x + b
end

I would like to have automatically generated, optimised versions of this function for the cases:

  • A is zero
  • A is the identity
  • b is zero

E.g. for the first case, I would ideally like the code to run as quickly as simply returning b.

It’s also worth noting that such optimisations will likely have ripple effects: one matrix being zero might cause another to be zero, and so on.

I appreciate that this is a lot to ask for (especially for a non-symbolic, eagerly executed language) so I’m not expecting a perfect solution to this. Instead, I would love to have an open discussion about this topic and hopefully find some ways I can get closer to my desired outcome.

Initial Thoughts

A manual way of approaching this would be to write variants of linear_transformation dispatching on zero values. Obviously, this is not a scalable solution, but I am also under the impression that value dispatch is discouraged in Julia and leads to inefficient code.

The UniformScaling matrix seems related to what I’m asking for, though I’m not completely sure it does exactly what I’m after. Perhaps creating similar Identity and Zero matrix types inspired by it is a good approach.
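To make this idea concrete, here is a minimal sketch (the ZeroMatrix name and its methods are hypothetical, not from any library) of what such a structural zero type might look like: the fact that A is zero lives in the type itself, so dispatch can pick a fast method without inspecting any coefficients.

```julia
# Hypothetical sketch: a structural zero-matrix type, so "A is zero"
# is known from the type alone and dispatch can pick a fast method.
struct ZeroMatrix <: AbstractMatrix{Float64}
    n::Int
end

Base.size(Z::ZeroMatrix) = (Z.n, Z.n)
Base.getindex(::ZeroMatrix, ::Int, ::Int) = 0.0

# A zero matrix times any vector is the zero vector, so A * x + b collapses to b.
Base.:*(Z::ZeroMatrix, x::AbstractVector) = zeros(Z.n)

linear_transformation(x, A, b) = A * x + b

linear_transformation(rand(3), ZeroMatrix(3), [1.0, 2.0, 3.0])  # returns b's values
```

The answers below point out that FillArrays.jl already provides exactly this kind of type, so in practice there is no need to hand-roll one.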

Can macros be used to achieve this task?

The natural reflex is to add an if / else to your function like so:

function linear_transformation(x, A, b)
    if all(iszero, A)
        return b
    elseif isone(A)  # A is the identity matrix
        return x + b
    elseif all(iszero, b)
        return A * x
    else
        return A * x + b
    end
end

but that will be rather inefficient: first because of the if itself, and second because statements like all(iszero, A) need to check every single coefficient.

The better method is to encode the matrix structure in the type. You have correctly spotted that the standard library LinearAlgebra does this and already has an identity operator. Meanwhile, FillArrays.jl has types to represent arrays filled with zeros or ones. Both of these libraries overload the necessary operations to make A * x + b as fast as possible without additional intervention.

To sum up, the only thing you need to do is allow arbitrary matrix / vector subtypes in your function:

linear_transformation(x::AbstractVector, A::AbstractMatrix, b::AbstractVector) = A * x + b

Then you can apply it with the following varieties of arrays:

using LinearAlgebra, FillArrays
A = rand(2, 2);  # dense matrix
A = I(2);  # identity matrix
A = FillArrays.Zeros(2, 2);  # zero matrix

Even better, you can combine types of A and b however you want, and the multiple dispatch will still find the optimal method.
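As a minimal check of this (standard library only; the FillArrays types above slot in the same way), the single generic method below handles both a dense matrix and the structural identity, with the identity case reducing to an O(n) diagonal product:

```julia
using LinearAlgebra

linear_transformation(x::AbstractVector, A::AbstractMatrix, b::AbstractVector) = A * x + b

x = [1.0, 2.0]
b = [3.0, 4.0]

# Dense case: generic matrix-vector product.
linear_transformation(x, rand(2, 2), b)

# Identity case: I(2) is a Diagonal{Bool} matrix, so the product is O(n), not O(n^2).
linear_transformation(x, I(2), b)  # == x + b == [4.0, 6.0]
```

No annotations beyond AbstractMatrix / AbstractVector are needed; the compiler specialises the method for each concrete type combination it sees.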


Not every coefficient in the false case: all(iszero, A) short-circuits as soon as it hits a nonzero entry.

Gotcha. That sounds great. Thank you very much.

One follow-up question:

This might be a case of over-engineering, but is UniformScaling as efficient as it can be for the unscaled identity matrix case? It is my understanding that I is indistinguishable from 1.0 * I by the type system and so multiplying a matrix by I would involve multiplying each element (unnecessarily) by 1.0. Is this correct? If so, I assume I would just have to define something similar to I that is strictly for unscaled identity matrices.

Indeed you’re right and the performance difference is visible too, although most of the hit is actually due to the new allocation and not the multiplication by 1. If we use in-place multiplication with LinearAlgebra.mul!, the difference goes away, so I assume 1 * x is optimized (at least for integer 1):

using BenchmarkTools, LinearAlgebra

struct Identity end
Base.:*(::Identity, x) = x
LinearAlgebra.mul!(y, ::Identity, x) = y .= x

x = rand(1000)
y = zeros(1000)
julia> @btime (A * $x) setup=(A = Identity());
  2.455 ns (0 allocations: 0 bytes)

julia> @btime (A * $x) setup=(A = I);
  533.193 ns (1 allocation: 7.94 KiB)

julia> @btime mul!($y, A, $x) setup=(A = Identity());
  74.981 ns (0 allocations: 0 bytes)

julia> @btime mul!($y, A, $x) setup=(A = I);
  64.121 ns (0 allocations: 0 bytes)

You mentioned that allocation is the main culprit here, but I think it’s worth pointing out that the definition

Base.:*(::Identity, x) = x

is not a fair way to compare against multiplication by I, since multiplication using * is supposed to create a new array, not provide a new name for the old one. If you instead define

Base.:*(::Identity, x) = copy(x)

then the first two timings become on my machine

julia> @btime (A * $x) setup=(A = Identity());
  502.525 ns (1 allocation: 7.94 KiB)

julia> @btime (A * $x) setup=(A = I);
  505.102 ns (1 allocation: 7.94 KiB)

so we can conclude that multiplication by the uniform scaling operator is maximally efficient even when using the * operator.
