How to make norm(::AbstractVector) faster in my case?

ziyiyin97 · December 28, 2021, 4:09pm

import Base.size
import Base.IndexStyle
import Base.getindex
import Base.setindex!

using LinearAlgebra
using BenchmarkTools

struct Dummy{T} <: AbstractVector{T}
    coeffs::Vector{T}
end


############### Base ###############################
size(c::Dummy{T}) where T = size(c.coeffs)

IndexStyle(::Type{<:Dummy}) = IndexLinear()

getindex(c::Dummy{T}, I::Int) where T = c.coeffs[I]

setindex!(c::Dummy{T}, v, I::Int) where T = setindex!(c.coeffs, v, I)

### Experiments

coeff = randn(Float32,10^9);
x = Dummy{Float32}(coeff);

@benchmark norm(x)
@benchmark norm(x.coeffs)

my example is here. I created an abstractvector called Dummy with its attribute coeffs. And then I set getindex/setindex to act directly on the coeffs attributed of Dummy. Now I found huge performance loss on norm of this AbstractVector Dummy, as

julia> @benchmark norm(x)
BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  2.956 s …  2.959 s  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.958 s             ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.958 s ± 1.809 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                      █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.96 s        Histogram: frequency by time        2.96 s <

 Memory estimate: 16 bytes, allocs estimate: 1.

julia> @benchmark norm(x.coeffs)
BenchmarkTools.Trial: 17 samples with 1 evaluation.
 Range (min … max):  302.239 ms … 303.657 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     302.967 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   302.929 ms ± 426.119 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁ ▁  █              ▁  ▁  ▁▁   ▁      █▁ ▁▁      ▁ ▁        ▁  
  █▁█▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁██▁▁▁█▁▁▁▁▁▁██▁██▁▁▁▁▁▁█▁█▁▁▁▁▁▁▁▁█ ▁
  302 ms           Histogram: frequency by time          304 ms <

 Memory estimate: 32 bytes, allocs estimate: 2.

May I get any suggestion on how to make the norm of this AbstractVector as fast as evaluating the norm of its coeffs? An overload of the norm would definitely work but it sounds hacky to me. Thanks much for your input!

Oscar_Smith · December 28, 2021, 4:17pm

I think you just want to define norm(x::Dummy, p) = norm(x.coeffs,p). The problem is that your version is falling back to the generic calculation rather than using BLAS.

Also Julia 1.8 might make this faster if https://github.com/JuliaLang/julia/pull/43256 or https://github.com/JuliaLang/julia/pull/43459 get merged.

mcabbott · December 28, 2021, 4:19pm

Or, since your struct only holds a Vector, it could be declared <:DenseArray. (And will need to define pointer, strides, maybe.)

ziyiyin97 · December 28, 2021, 4:41pm

Thanks for your quick reply. Then following your suggestion, I should overload every operation, e.g. dot, norm which BLAS has its own ways to make it faster, right? That is still a lot of hacky work and I really wonder if there is any easier way to deal with these things for AbstractArrays in Julia.

ziyiyin97 · December 28, 2021, 4:53pm

Do you mean do this instead?

struct Dummy{T} <: AbstractVector{T}
    coeffs::DenseVector{T}
end

No the problem is still there

lmiq · December 28, 2021, 5:02pm

Why is the fallback so bad?

julia> mynorm(x) = sqrt(sum(val->val^2,x))
mynorm (generic function with 1 method)

julia> x = Dummy{Float32}(rand(Float32,10^8));

julia> @btime LinearAlgebra.norm($x)
  305.828 ms (0 allocations: 0 bytes)
5773.3687f0

julia> @btime mynorm($x)
  27.586 ms (0 allocations: 0 bytes)
5773.3687f0

Oscar_Smith · December 28, 2021, 5:04pm

2 reasons:

it is dealing with rescaling to prevent spurious overflow/underflow
it is doing so badly.

mcabbott · December 28, 2021, 5:07pm

No, I mean make the struct <:DenseArray{T,1}, not <: AbstractVector{T}. The field certainly wants to be concrete.

ziyiyin97 · December 28, 2021, 5:12pm

Oh this is just a minimal example. In my practice I still want to make it an AbstractVector so it can include metadata and more attributes

mcabbott · December 28, 2021, 5:22pm

Sure, but what has that got to do with anything? Do you want norm to access these other fields? Then clearly you must write a method. But if all you want is for an existing norm method to be used, then one way to do that is by fitting into its signature.

ziyiyin97 · December 28, 2021, 6:47pm

Other fields have nothing to do with norm. So I did

import Base.size
import Base.IndexStyle
import Base.getindex
import Base.setindex!
import Base.pointer

using LinearAlgebra
using BenchmarkTools

struct Dummy{T} <: DenseVector{T}
    coeffs::Vector{T}
end

############### Base ###############################
size(c::Dummy{T}) where T = size(c.coeffs)

IndexStyle(::Type{<:Dummy}) = IndexLinear()

getindex(c::Dummy{T}, I::Int) where T = c.coeffs[I]

setindex!(c::Dummy{T}, v, I::Int) where T = setindex!(c.coeffs, v, I)

pointer(c::Dummy{T}) where T = pointer(c.coeffs) # removing this line doesn't work either

### Experiments

coeff = randn(Float32,10^4);
x = Dummy{Float32}(coeff);

@benchmark norm(x)
@benchmark norm(x.coeffs)

and got this error for norm(x)

julia> norm(x)
ERROR: conversion to pointer not defined for Dummy{Float32}
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] unsafe_convert(#unused#::Type{Ptr{Float32}}, a::Dummy{Float32})
   @ Base ./pointer.jl:67
 [3] nrm2(n::Int64, X::Dummy{Float32}, incx::Int64)
   @ LinearAlgebra.BLAS /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/blas.jl:487
 [4] nrm2(x::Dummy{Float32})
   @ LinearAlgebra.BLAS /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/blas.jl:493
 [5] norm2
   @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/dense.jl:105 [inlined]
 [6] norm(itr::Dummy{Float32}, p::Int64)
   @ LinearAlgebra /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:627
 [7] norm(itr::Dummy{Float32})
   @ LinearAlgebra /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:625
 [8] top-level scope
   @ REPL[19]:1

What do I do? Sorry didn’t find any online resource about this

ziyiyin97 · December 28, 2021, 6:49pm

@Oscar_Smith

mcabbott · December 28, 2021, 7:01pm

It’s a bit obscure but this is the crucial bit:

Defining that seems to be sufficient:

julia> Base.unsafe_convert(::Type{Ptr{T}}, A::Dummy{T}) where {T} =
           Base.unsafe_convert(Ptr{T}, A.coeffs)

julia> @btime norm($x)  # was 15.625 μs
  min 12.416 μs, mean 12.647 μs (0 allocations)
99.28318f0

julia> @btime norm($(x.coeffs))
  min 12.416 μs, mean 12.562 μs (0 allocations)
99.28318f0

julia> @which pointer(x)  # no method added here
pointer(x::AbstractArray{T}) where T in Base at abstractarray.jl:1170

julia> @btime $(rand(10^4, 10^4)) * $x;  # was 23.167 ms without BLAS
  min 13.536 ms, mean 14.144 ms (4 allocations, 156.34 KiB)

ziyiyin97 · December 28, 2021, 8:28pm

Thanks a lot! This works! One more question:

if I want to do this on GPU, what should I do?

struct CuDummy{T} <: DenseCuVector{T}
    coeffs::CuVector{T}
end

this throws an error as

ERROR: invalid subtyping in definition of CuDummy
Stacktrace:
 [1] top-level scope
   @ REPL[29]:1
 [2] top-level scope
   @ ~/.julia/packages/CUDA/DFAea/src/initialization.jl:52

ziyiyin97 · December 28, 2021, 8:36pm

When I set it up as

struct CuDummy{T} <: CUDA.AbstractGPUArray{T,1}
    coeffs::CuVector{T}
end

the run time for norm(x) and norm(x.coeffs) when x is a CuDummy with length of 10^6 are 12ms and 11ms, which is not too different tho

Topic		Replies	Views
Slow abstractarray.jl, indices: line 64 when starting julia with `--inline=no` General Usage performance	6	782	June 13, 2018
Performance of norm function Performance linearalgebra	21	12166	March 28, 2021
Faster isapprox() General Usage question , performance , linearalgebra , loopvectorization , preallocation	14	669	July 5, 2023
Simple Mat-Vec multiply (understanding performance, without the bugs) Performance tullio	16	3253	August 12, 2020
StructArrays.jl getindex cost memory Performance question	4	97	February 13, 2025

How to make norm(::AbstractVector) faster in my case?

Related topics