Performance of Unitful Arrays

Suppose I have a large matrix M composed of homogeneous Unitful data, and a vector v, which contains homogeneus Unitful data as well. (By Unitful data I mean the data with the types from Unitful.jl)

Does the calculation of matrix-vector product M*v suffers from the fact that the arrays are Unitful?
For example, can it use fast BLAS implementations, or does it fall back to generic implementation instead?

1 Like

I haven’t checked (have you?), but there’s no reason it needs to suffer. If in practice it does, you could fix it by adding a method that strips the units and adds them back at the end.

3 Likes

That is true. On other hand, it seems that such stripping and restripping of units requires essentially the copying of array. And, if the matrix-vector product sits in a loop, such copying would lead to a substantial overhead.

It looks like the only way to make it fast is to have arrays which get assigned the type as a whole. Something like

struct UnitfulArray{T, N, Unit} <: AbstractArray{T,N}
    a::Array{T,N}
    u::Unit
end

Guess I really need to do the testing.

1 Like

I have checked the dispatch, and the case with Unitful arrays indeed falls back to generic implementation.
Consider the matrix-vector for normal arrays:

A = randn(10,10)
v = randn(10)
@which A*v

gives me

*(A::StridedArray{T, 2}, x::StridedArray{S, 1}) where {T<:Union{Complex{Float32}, Complex{Float64}, Float32, Float64}, S<:Real}

If I try the same with Unitful arrays

using Unitful
Au = A*1.0u"m"
@which Au*v

I get

*(A::AbstractArray{T,2}, x::AbstractArray{S,1}) where {T, S}

which corresponds to generic fallback.

It is interesting, is there any demand for Unitful arrays that are fast for Linear Algebra operations at all?

No—unitful arrays are stored with the same underlying data format as unitless arrays (the units are attached to the array as a whole, not stored for each element separately). That should make it possible to reinterpret as a dimensionless array without making a copy, or to call BLAS directly on the unitful array.

3 Likes

I’ve tried to find in the repository Unitful.jl the custom definition of unitful arrays, however, did not succeed.
Also, in the example that I wrote, typeof(Au) gives Array{Quantity{...}, 2}, which looks like base array stuffed with unitful data.

May be there is some other library that implements this?

There is no special array implementation here, since Quantity{Float64,... is a bitstype, an ordinary array simply has the data packed tightly, and it has the same bits as the corresponding array of Float64 numbers. All that’s needed is to change Julia’s point of view about the data, which is what reinterpret does:

julia> A = rand(100,100); B = rand(100,100);

julia> @btime $A * $B;
  36.804 μs (2 allocations: 78.20 KiB)

julia> using Unitful: m

julia> Am = A * m; Bm = B * m;

julia> @btime $Am * $Bm;
  747.303 μs (8 allocations: 78.53 KiB)

julia> @btime reinterpret(Float64, $Am) * reinterpret(Float64,$Bm);
  37.370 μs (2 allocations: 78.20 KiB)

julia> reinterpret(Float64, Am) isa StridedArray{Float64}
true

I think that ideally this (or something like it) would be done to some mul! function which gets called by *.

7 Likes

this is fascinating… so even though the array appears looks like (x1,x2,x3::MyType):

[x1,x2,x3]

The memory layout looks compact and is more like:

Array{MyType}[x1.val, x2.val, x3.val]

?
Is this thanks to Quantity is a single-scalar type struct?

1 Like

FWIW, in @mcabbott’s code example the same matrix multiplication performance can be achieved using ustrip():

ustrip(Am) * ustrip(Bm)

My question is: what is the recommended method of attaching units to the result of the stripped matrix multiplication? Are there better alternatives than for example:

ustrip(Am) * ustrip(Bm) * unit(first(Am)*first(Bm))

NB: Win10, Julia 1.7 and Uniftul 1.9.2