Sum and other functions on custom subtype of AbstractArray 1000× slower than on normal Arrays

performance

#1

Hello, I define the following custom type, which is basically a wrapper, but with the size of the array as a type parameter:

module test

using BenchmarkTools

struct Lattice{S, T, N} <: AbstractArray{T, N}
	lattice::AbstractArray{T, N}

	function Lattice(λ::AbstractArray{T,N}) where {T,N}
		new{Tuple{size(λ)...}, T, N}(λ)
	end
end

Base.length(Λ::Lattice) = length(Λ.lattice)
Base.eltype(Λ::Lattice{S, T, N}) where {S, T, N} = T
Base.ndims(Λ::Lattice{S, T, N}) where {S, T, N} = N

Base.size(Λ::Lattice) = size(Λ.lattice)
Base.size(Λ::Lattice, i::Integer) = size(Λ.lattice, i)
Base.getindex(Λ::Lattice, i...) = Λ.lattice[i...]
Base.setindex!(Λ::Lattice, value, i...) = Λ.lattice[i...] = value
Base.view(Λ::Lattice, i...) = view(Λ.lattice, i...)

const λ = rand([-1,1,0], 4, 4)
const Λ = Lattice(λ)

## testing performance

summation(A) = sum(A)

@btime summation(λ)
@btime summation(Λ)


@code_llvm sum(λ)
@code_llvm sum(Λ)

end

The output is:

WARNING: replacing module test
  56.554 ns (0 allocations: 0 bytes)
  37.156 μs (33 allocations: 1.14 KiB)

define i64 @julia_sum_64840(i8** dereferenceable(40)) #0 !dbg !5{
top:
  %1 = call i64 @julia__mapreduce_64637(i8** nonnull %0)
  ret i64 %1
}

define i8** @julia_sum_64842(i8** dereferenceable(8)) #0 !dbg !5{
top:
  %1 = call i8** @julia_mapfoldl_64828(i8** inttoptr (i64 139717740967168 to i8**), i8** nonnull %0)
  ret i8** %1
}

So I see that something is different for my type, but I cannot understand what makes it 1000 times slower… Could someone help me find the bottleneck in this definition?

Edit: when adding mutable to the struct definition I get a 30× speedup, but it is still much slower:

WARNING: replacing module test
  56.555 ns (0 allocations: 0 bytes)
  3.667 μs (17 allocations: 576 bytes)

define i64 @julia_sum_65143(i8** dereferenceable(40)) #0 !dbg !5{
top:
  %1 = call i64 @julia__mapreduce_64637(i8** nonnull %0)
  ret i64 %1
}

define i8** @julia_sum_65145(i8** dereferenceable(8)) #0 !dbg !5{
top:
  %1 = call i8** @julia_mapfoldl_65131(i8** inttoptr (i64 139717740967168 to i8**), i8** nonnull %0)
  ret i8** %1
}

#2

You need to define

struct Lattice{S, T, N} <: AbstractArray{T, N}
	lattice::Array{T, N}
end

and Base.IndexStyle(::Type{<:Lattice}) = IndexLinear() (the trait is defined on the type, as in the manual's interfaces chapter). Then I get the same times for both types.

I’d like to remark that your S parameter is weird and I’m not sure why you have it (you’re not even using it to specialize length/size)
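Putting both fixes together, a minimal self-contained sketch of the corrected definition might look like this (same names as in the original post; only the concrete field type and the added IndexStyle trait differ):

```julia
struct Lattice{S, T, N} <: AbstractArray{T, N}
	lattice::Array{T, N}   # concrete field type: no dynamic dispatch on each access

	function Lattice(λ::Array{T, N}) where {T, N}
		new{Tuple{size(λ)...}, T, N}(λ)
	end
end

Base.size(Λ::Lattice) = size(Λ.lattice)
Base.getindex(Λ::Lattice, i::Int) = Λ.lattice[i]
Base.setindex!(Λ::Lattice, v, i::Int) = (Λ.lattice[i] = v)

# Defined on the type, so generic code (sum, eachindex, ...) picks it up.
Base.IndexStyle(::Type{<:Lattice}) = IndexLinear()

Λ = Lattice(rand([-1, 0, 1], 4, 4))
sum(Λ) == sum(Λ.lattice)   # true, and now at the same speed
```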


#3

Hey, I just figured it out on my own and wanted to write it up when I saw your post. Thanks, these were exactly the two points :wink:

I need the size as a type later to generate functions depending on the size of the lattice (basically to avoid every lookup of sizes for indexing in a hot loop).
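To illustrate what this buys you (latticesize is a hypothetical helper, not from the thread): since the S parameter carries the dimensions, the size can be recovered from the type alone, without touching the wrapped array at run time:

```julia
struct Lattice{S, T, N} <: AbstractArray{T, N}
	lattice::Array{T, N}
	Lattice(λ::Array{T, N}) where {T, N} = new{Tuple{size(λ)...}, T, N}(λ)
end

# Hypothetical helper: reads the dimensions out of the S = Tuple{4, 4}
# parameter, so the compiler can treat them as constants in a hot loop.
latticesize(::Lattice{S}) where {S} = tuple(S.parameters...)

Λ = Lattice(zeros(Int, 4, 4))
latticesize(Λ)   # (4, 4)
```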


#4

I think the lookup is not that expensive, but specializing the loop on the size can help a lot for small sizes.

I think that in this case you might want

struct Lattice{S, T, N} <: AbstractArray{T, N}
	lattice::Array{T, N}

	function Lattice(L::AbstractArray{T, N}) where {T, N}
		new{Val{(length(L), size(L))}, T, N}(L)
	end
end

Base.size(::Lattice{Val{S}}) where {S} = S[2]
Base.length(::Lattice{Val{S}}) where {S} = S[1]

Also, consider the MArray type from StaticArrays.jl. That might do everything you want, possibly faster.
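For example (a sketch, assuming StaticArrays.jl is installed): an MMatrix carries its size in the type while remaining mutable, which is essentially what the Lattice type is reinventing:

```julia
using StaticArrays

# MMatrix{4,4} is an alias for MArray{Tuple{4,4}, ...}: the size lives in the type.
m = MMatrix{4,4}(rand([-1, 0, 1], 4, 4))
m[1, 1] = 1   # mutable, unlike SMatrix
sum(m)        # the reduction loop is specialized on the 4×4 size
```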


#5

Yes, I am aware of that package; however, the performance is not good for bigger lattices, as is also mentioned on its web page. Also, defining this type on my own is partly a learning exercise for me.

What exactly is Val used for here, and why?