# How to make norm(::AbstractVector) faster in my case?

``````
import Base.size
import Base.IndexStyle
import Base.getindex
import Base.setindex!

using LinearAlgebra
using BenchmarkTools

struct Dummy{T} <: AbstractVector{T}
coeffs::Vector{T}
end

############### Base ###############################
size(c::Dummy{T}) where T = size(c.coeffs)

IndexStyle(::Type{<:Dummy}) = IndexLinear()

getindex(c::Dummy{T}, I::Int) where T = c.coeffs[I]

setindex!(c::Dummy{T}, v, I::Int) where T = setindex!(c.coeffs, v, I)

### Experiments

coeff = randn(Float32,10^9);
x = Dummy{Float32}(coeff);

@benchmark norm(x)
@benchmark norm(x.coeffs)
``````

Here is my example. I created an `AbstractVector` subtype called `Dummy` with a `coeffs` field, and defined `getindex`/`setindex!` to act directly on the `coeffs` field of `Dummy`. Now I see a huge performance loss when calling `norm` on this `Dummy`:

``````
julia> @benchmark norm(x)
BenchmarkTools.Trial: 2 samples with 1 evaluation.
Range (min … max):  2.956 s …  2.959 s  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     2.958 s             ┊ GC (median):    0.00%
Time  (mean ± σ):   2.958 s ± 1.809 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

█                                                      █
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
2.96 s        Histogram: frequency by time        2.96 s <

Memory estimate: 16 bytes, allocs estimate: 1.

julia> @benchmark norm(x.coeffs)
BenchmarkTools.Trial: 17 samples with 1 evaluation.
Range (min … max):  302.239 ms … 303.657 ms  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     302.967 ms               ┊ GC (median):    0.00%
Time  (mean ± σ):   302.929 ms ± 426.119 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

▁ ▁  █              ▁  ▁  ▁▁   ▁      █▁ ▁▁      ▁ ▁        ▁
█▁█▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁██▁▁▁█▁▁▁▁▁▁██▁██▁▁▁▁▁▁█▁█▁▁▁▁▁▁▁▁█ ▁
302 ms           Histogram: frequency by time          304 ms <

Memory estimate: 32 bytes, allocs estimate: 2.
``````

Can I get any suggestions on how to make the norm of this `AbstractVector` as fast as evaluating the norm of its `coeffs`? Overloading `norm` would definitely work, but that sounds hacky to me. Thanks very much for your input!

I think you just want to define `norm(x::Dummy, p) = norm(x.coeffs, p)`. The problem is that your version is falling back to the generic calculation rather than using `BLAS`.
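For concreteness, a self-contained sketch of that forwarding approach (repeating a minimal `Dummy` so the snippet runs on its own):

```julia
using LinearAlgebra

struct Dummy{T} <: AbstractVector{T}
    coeffs::Vector{T}
end

Base.size(c::Dummy) = size(c.coeffs)
Base.getindex(c::Dummy, i::Int) = c.coeffs[i]

# Forward `norm` to the wrapped Vector so the call reaches the fast
# BLAS-backed method instead of the generic fallback.
LinearAlgebra.norm(c::Dummy, p::Real=2) = norm(c.coeffs, p)

x = Dummy{Float32}(randn(Float32, 10^4))
norm(x) ≈ norm(x.coeffs)   # true, and both calls now take the same fast path
```

If several functions need the same treatment (`dot`, etc.), the same one-line forwarding pattern applies to each.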

Also, Julia 1.8 might make this faster if [faster and simpler `generic_norm2` by oscardssmith (JuliaLang/julia#43256)](https://github.com/JuliaLang/julia/pull/43256) or [RFC: Add `norm(A, p; dims)` by mcabbott (JuliaLang/julia#43459)](https://github.com/JuliaLang/julia/pull/43459) gets merged.

Or, since your struct only holds a `Vector`, it could be declared `<: DenseArray`. (And it will then need to define `pointer`, `strides`, maybe.)

Thanks for your quick reply. Following your suggestion, then, I should overload every operation that BLAS has its own faster routine for, e.g. `dot` and `norm`, right? That is still a lot of hacky work, and I really wonder if there is an easier way to deal with these things for AbstractArrays in Julia.

Do you mean do this instead?

``````
struct Dummy{T} <: AbstractVector{T}
coeffs::DenseVector{T}
end
``````

No, the problem is still there.

Why is the fallback so bad?

``````
julia> mynorm(x) = sqrt(sum(val->val^2, x))
mynorm (generic function with 1 method)

julia> x = Dummy{Float32}(rand(Float32,10^8));

julia> @btime LinearAlgebra.norm($x)
305.828 ms (0 allocations: 0 bytes)
5773.3687f0

julia> @btime mynorm($x)
27.586 ms (0 allocations: 0 bytes)
5773.3687f0

``````

Two reasons:

1. It is dealing with rescaling to prevent spurious overflow/underflow.
2. It is doing that rescaling badly.
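The overflow issue is easy to demonstrate with a small sketch (`naive_norm` is a made-up name for the unscaled version):

```julia
using LinearAlgebra

# Unscaled two-norm: the squares can overflow even when the result is
# perfectly representable.
naive_norm(x) = sqrt(sum(v -> v^2, x))

x = fill(1f20, 4)   # (1f20)^2 = 1f40 overflows Float32 (floatmax(Float32) ≈ 3.4f38)

naive_norm(x)       # Inf32
norm(x)             # finite, ≈ 2f20 -- the rescaling in `norm` avoids the overflow
```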

No, I mean make the struct `<:DenseArray{T,1}`, not `<: AbstractVector{T}`. The field certainly wants to be concrete.

Oh, this is just a minimal example. In practice I still want it to be an `AbstractVector` so it can include metadata and more attributes.

Sure, but what has that got to do with anything? Do you want `norm` to access these other fields? Then clearly you must write a method. But if all you want is for an existing `norm` method to be used, then one way to do that is by fitting into its signature.

Other fields have nothing to do with `norm`. So I did:

``````
import Base.size
import Base.IndexStyle
import Base.getindex
import Base.setindex!
import Base.pointer

using LinearAlgebra
using BenchmarkTools

struct Dummy{T} <: DenseVector{T}
coeffs::Vector{T}
end

############### Base ###############################
size(c::Dummy{T}) where T = size(c.coeffs)

IndexStyle(::Type{<:Dummy}) = IndexLinear()

getindex(c::Dummy{T}, I::Int) where T = c.coeffs[I]

setindex!(c::Dummy{T}, v, I::Int) where T = setindex!(c.coeffs, v, I)

pointer(c::Dummy{T}) where T = pointer(c.coeffs) # removing this line doesn't work either

### Experiments

coeff = randn(Float32,10^4);
x = Dummy{Float32}(coeff);

@benchmark norm(x)
@benchmark norm(x.coeffs)

``````

and got this error for `norm(x)`

``````
julia> norm(x)
ERROR: conversion to pointer not defined for Dummy{Float32}
Stacktrace:
 error(s::String)
@ Base ./error.jl:33
 unsafe_convert(#unused#::Type{Ptr{Float32}}, a::Dummy{Float32})
@ Base ./pointer.jl:67
 nrm2(n::Int64, X::Dummy{Float32}, incx::Int64)
@ LinearAlgebra.BLAS /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/blas.jl:487
 nrm2(x::Dummy{Float32})
@ LinearAlgebra.BLAS /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/blas.jl:493
 norm2
@ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/dense.jl:105 [inlined]
 norm(itr::Dummy{Float32}, p::Int64)
@ LinearAlgebra /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:627
 norm(itr::Dummy{Float32})
@ LinearAlgebra /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:625
 top-level scope
@ REPL:1
``````

It’s a bit obscure, but the crucial bit of the stack trace is the frame `unsafe_convert(::Type{Ptr{Float32}}, a::Dummy{Float32})`. Defining that method seems to be sufficient:

``````
julia> Base.unsafe_convert(::Type{Ptr{T}}, A::Dummy{T}) where {T} =
Base.unsafe_convert(Ptr{T}, A.coeffs)

julia> @btime norm($x)  # was 15.625 μs
min 12.416 μs, mean 12.647 μs (0 allocations)
99.28318f0

julia> @btime norm($(x.coeffs))
min 12.416 μs, mean 12.562 μs (0 allocations)
99.28318f0

julia> @which pointer(x)  # no method added here
pointer(x::AbstractArray{T}) where T in Base at abstractarray.jl:1170

julia> @btime $(rand(10^4, 10^4)) * $x;  # was 23.167 ms without BLAS
min 13.536 ms, mean 14.144 ms (4 allocations, 156.34 KiB)
``````
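For reference, the whole `DenseVector` route can be assembled into one self-contained sketch (forwarding `strides` as well, as mentioned earlier, in case a routine asks for it):

```julia
using LinearAlgebra

struct Dummy{T} <: DenseVector{T}
    coeffs::Vector{T}
end

Base.size(c::Dummy) = size(c.coeffs)
Base.IndexStyle(::Type{<:Dummy}) = IndexLinear()
Base.getindex(c::Dummy, i::Int) = c.coeffs[i]
Base.setindex!(c::Dummy, v, i::Int) = setindex!(c.coeffs, v, i)

# The crucial piece: let BLAS routines see the memory of the wrapped Vector.
Base.unsafe_convert(::Type{Ptr{T}}, c::Dummy{T}) where {T} =
    Base.unsafe_convert(Ptr{T}, c.coeffs)
Base.strides(c::Dummy) = strides(c.coeffs)

x = Dummy{Float32}(randn(Float32, 10^4))
norm(x) ≈ norm(x.coeffs)   # true, and both now go through BLAS `nrm2`
```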

Thanks a lot! This works! One more question:

If I want to do this on the GPU, what should I do?

``````
struct CuDummy{T} <: DenseCuVector{T}
coeffs::CuVector{T}
end
``````

this throws an error:

``````
ERROR: invalid subtyping in definition of CuDummy
Stacktrace:
 top-level scope
@ REPL:1
 top-level scope
@ ~/.julia/packages/CUDA/DFAea/src/initialization.jl:52
``````

When I set it up as

``````
struct CuDummy{T} <: CUDA.AbstractGPUArray{T,1}
coeffs::CuVector{T}
end
``````

the run times for `norm(x)` and `norm(x.coeffs)`, when `x` is a `CuDummy` of length `10^6`, are `12ms` and `11ms`, which are not too different.