Matrix power not memory optimal

jarl · June 25, 2024, 12:46pm

It seems that computing A^p with an integer p for a dense matrix A via ^(A, p) is not optimal in terms of memory allocation:

julia> using BenchmarkTools, LinearAlgebra
julia> A=randn(256,256); A=A/norm(A);
julia> @btime A8=A^8;
  813.054 μs (6 allocations: 1.50 MiB)
julia> @btime begin; 
    A2=similar(A); mul!(A2,A,A); 
    A4=similar(A); mul!(A4,A2,A2); 
    A8=A2; mul!(A8,A4,A4); end;
  729.419 μs (4 allocations: 1.00 MiB)
julia> @btime ((A^2)^2)^2;
  752.717 μs (6 allocations: 1.50 MiB)

In the second @btime, I recycle memory when computing successive squares (A8 reuses the buffer of A2). The difference should be even larger for higher powers. As far as I can tell, the reason is that ^(A, ::Integer) calls Base.power_by_squaring in intfuncs.jl, which allocates a new matrix for every product and does not take memory reuse into account (in particular line 330, the x = mul(x, x) in the squaring loop):
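To put numbers on the scaling argument: binary powering as in power_by_squaring does one squaring per bit of p below the leading bit, plus one extra multiply per additional set bit, and with mul=* each product allocates a fresh matrix. A small sketch that counts this (the helper nproducts is hypothetical, not part of Base):

```julia
# Hypothetical helper: number of matrix products that binary powering
# (as in Base.power_by_squaring) performs for an exponent p >= 2.
# Squarings: one per bit below the leading bit.
# Extra multiplies: one per additional set bit.
nproducts(p::Integer) = (ndigits(p; base = 2) - 1) + (count_ones(p) - 1)

# Bytes in one dense 256×256 Float64 matrix: 524288 bytes, i.e. 0.5 MiB.
const MATBYTES = 256 * 256 * sizeof(Float64)

# If every product allocates a fresh result matrix, memory grows with
# nproducts(p); an in-place scheme can reuse a fixed number of buffers instead.
for p in (8, 64, 1000)
    mib = nproducts(p) * MATBYTES / 2^20
    println("p = $p: $(nproducts(p)) products, $mib MiB if each allocates")
end
```

For p = 8 this gives 3 products, i.e. 1.5 MiB, consistent with the 1.50 MiB reported for A^8 above, while the hand-written version only allocates A2 and A4 (1.0 MiB).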

github.com/JuliaLang/julia/blob/5654e6043823717e085239f6509413410106e902/base/intfuncs.jl#L314-L334

@assume_effects :terminates_locally function power_by_squaring(x_, p::Integer; mul=*)
    x = to_power_type(x_)
    if p == 1
        return copy(x)
    elseif p == 0
        return one(x)
    elseif p == 2
        return mul(x, x)
    elseif p < 0
        isone(x) && return copy(x)
        isone(-x) && return iseven(p) ? one(x) : copy(x)
        throw_domerr_powbysq(x, p)
    end
    t = trailing_zeros(p) + 1
    p >>= t
    while (t -= 1) > 0
        x = mul(x, x)
    end
    y = x
    while p > 0
        t = trailing_zeros(p) + 1

Is there a good, memory-efficient repeated-squaring implementation in some Julia package?
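For a plain Matrix, a hand-rolled version is not hard to sketch. The function matpow_inplace below is hypothetical (not from Base, LinearAlgebra, or any package); it follows the same binary-powering idea as power_by_squaring but writes every product into one of three preallocated buffers with mul!, so it allocates three matrices regardless of p. Negative powers are deliberately left out.

```julia
using LinearAlgebra

# Hypothetical helper (not part of Base or LinearAlgebra): A^p for a dense
# square Matrix by binary powering, writing every product into one of three
# preallocated buffers with mul! instead of allocating a new matrix each time.
function matpow_inplace(A::Matrix{T}, p::Integer) where {T<:Number}
    n = LinearAlgebra.checksquare(A)
    p < 0 && throw(DomainError(p, "negative powers are not handled in this sketch"))
    p == 0 && return Matrix{T}(I, n, n)
    X = copy(A)        # current repeated square A^(2^k)
    Y = similar(A)     # accumulated result, only meaningful once `started`
    tmp = similar(A)   # scratch: mul! destinations must not alias the inputs
    started = false
    while true
        if isodd(p)
            if started
                mul!(tmp, Y, X)      # Y * X into scratch, then swap roles
                Y, tmp = tmp, Y
            else
                copyto!(Y, X)        # first set bit: result starts as X
                started = true
            end
        end
        p >>= 1
        p == 0 && break
        mul!(tmp, X, X)              # square X into scratch, then swap roles
        X, tmp = tmp, X
    end
    return Y
end

A = randn(256, 256); A = A / norm(A)
A8 = matpow_inplace(A, 8)
```

Three buffers are needed in general because mul! requires the destination not to alias its inputs; the hand-written benchmark above gets away with two because p = 8 needs no separate accumulator. Since all factors are powers of the same matrix, the order in Y * X does not matter.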

Oscar_Smith · June 25, 2024, 3:25pm

This would be a pretty good first PR for someone. The fix is quite straightforward.

jarl · June 25, 2024, 3:53pm

The function power_by_squaring seems to be written for scalars and immutable types, so I don’t see an easy fix that replaces mul with mul!. One could add conditions on the type, but since this is in Base, referring to Matrix there is not ideal. What fix did you have in mind?

gdalle · June 25, 2024, 3:55pm

How would it handle immutable matrices, such as those from FillArrays.jl or StaticArrays.jl?

jarl · June 25, 2024, 3:58pm

StaticArrays are not necessarily immutable (MMatrix, for example, is mutable). I’m not sure we have many immutable matrix types, but I get your point.

gdalle · June 25, 2024, 4:01pm

They hide in many places:

julia> using LinearAlgebra

julia> I
UniformScaling{Bool}
true*I

julia> (2I)^3
UniformScaling{Int64}
8*I

jarl · June 25, 2024, 4:09pm

Since I is not an AbstractMatrix, your example does not argue against a type check. I believe your call reduces to power_by_squaring(::Int64, ::Int64) on the scalar 2, which would remain unaffected.

Not saying I have a good solution though…

gdalle · June 25, 2024, 4:11pm

My bad. But I’m not sure it’s easy to deduce statically whether a matrix is mutable.
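One illustration of why this is hard: ismutabletype (Julia 1.7 and later) only tells you whether the wrapper object itself is mutable, which is not the same question as whether the matrix entries can be written in place. A small sketch using stdlib types only:

```julia
using LinearAlgebra

M = zeros(3, 3)
V = view(M, 1:2, 1:2)            # SubArray: immutable struct over mutable memory
D = Diagonal([1.0, 2.0, 3.0])    # immutable struct, but its diagonal is writable

println(ismutabletype(typeof(M)))   # true
println(ismutabletype(typeof(V)))   # false, yet its entries can be written
println(ismutabletype(typeof(D)))   # false, yet its diagonal can be written

V[1, 1] = 1.0                    # writes through to M
D[1, 1] = 10.0                   # allowed on the diagonal
# D[1, 2] = 1.0 would throw, since off-diagonal entries must stay zero
```

So a type-level "is it mutable?" check would both reject writable wrappers and say nothing about partially writable structured types like Diagonal.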

Elrod · June 25, 2024, 4:48pm

julia> using StaticArrays
WARNING: using StaticArrays.pop in module Main conflicts with an existing identifier.

julia> A = @SMatrix rand(3,3);

julia> B = similar(A); B .= 3;

julia> convert(typeof(parent(A)), B)
3×3 SMatrix{3, 3, Float64, 9} with indices SOneTo(3)×SOneTo(3):
 3.0  3.0  3.0
 3.0  3.0  3.0
 3.0  3.0  3.0

One can use similar, mutate the result, and then use convert to return an answer of the correct type.

Using typeof(parent(A)) makes this work for at least some wrappers, such as view.
Alternatively, one could do something more general:

RT = Base.promote_op() do
  # capture `A`, not a type!
  convert(typeof(parent(A)), B)
end
if RT === typeof(parent(A))
    # only try converting if it is known to return the expected type at compile time
    convert(typeof(parent(A)), B)
else
    B
end

An easier approach is probably to introduce some interface.

stevengj · June 25, 2024, 5:27pm

In general, should the return type of ^(A::AbstractMatrix, ::Integer) perhaps be whatever similar(A) returns (which is always mutable)? StaticArrays could then overload ^(::SMatrix, ::Integer) to call power_by_squaring directly and avoid allocating a mutable copy.

jarl · June 25, 2024, 5:45pm

My example shows how performance can be improved for Matrix. The same does not hold for sparse matrices, because of the increased fill-in, which similar does not capture well:

julia> using SparseArrays
julia> A=sprandn(256,256,0.1); A=A/norm(A);
julia> @btime A8=A^8;
  34.995 ms (19 allocations: 3.01 MiB)
julia> @btime begin; 
          A2=similar(A); mul!(A2,A,A); 
          A4=similar(A); mul!(A4,A2,A2); 
          A8=A2; mul!(A8,A4,A4); end;
  205.170 ms (51 allocations: 3.26 MiB)

Therefore, I would currently say it’s better to have a separate implementation specifically for dense matrices, i.e., ^(::Matrix, ::Int), and to keep the current one as a fallback, e.g., for SMatrix and SparseMatrixCSC.
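The fill-in point can be checked directly. For SparseMatrixCSC, similar(A) copies the sparsity pattern of A, so a buffer obtained from similar only has room for nnz(A) stored entries, while A*A is far denser. A sketch with the same setup as the benchmark (the seed is an arbitrary choice):

```julia
using SparseArrays, LinearAlgebra, Random

Random.seed!(1)                      # arbitrary seed, for reproducibility
A = sprandn(256, 256, 0.1)           # about 10% nonzeros
A = A / norm(A)

B = similar(A)                       # same sparsity pattern as A, values uninitialized
A2 = A * A                           # out-of-place product with its own pattern

println("nnz(A)          = ", nnz(A))
println("nnz(similar(A)) = ", nnz(B))
println("nnz(A*A)        = ", nnz(A2))   # much larger: most entries fill in
```

With density 0.1, an entry of A*A is structurally zero only if none of its 256 terms pairs two nonzeros, which happens with probability roughly (1 - 0.01)^256 ≈ 0.08, so A*A is roughly 92% dense.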

jishnub · June 25, 2024, 6:12pm

Perhaps this should be specialized for a StridedMatrix, which should cover most dense types.
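Which types such a specialization would catch can be checked with dispatch. In this sketch, powpath and its return symbols are hypothetical placeholders for an in-place method on StridedMatrix, with the existing generic code as the fallback:

```julia
using LinearAlgebra, SparseArrays

# Hypothetical dispatch: which code path a specialized ^ would take.
powpath(::StridedMatrix) = :inplace       # dense, strided memory: mul! into buffers
powpath(::AbstractMatrix) = :fallback     # everything else: current generic code

M = rand(4, 4)
println(powpath(M))                       # :inplace, Matrix is dense
println(powpath(view(M, 1:2, 1:2)))       # :inplace, view with range indices is strided
println(powpath(view(M, [1, 3], :)))      # :fallback, vector index is not strided
println(powpath(Diagonal(rand(4))))       # :fallback, structured matrix
println(powpath(sprand(4, 4, 0.5)))       # :fallback, sparse matrix
```

Structured and sparse types, and views with non-range indices, would keep the current behaviour.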