Hi,

I have a custom type, wrapping a matrix, for which I have defined the indexing behaviour.

Interestingly, indexing through the custom type is about 50% slower than indexing the underlying matrix directly, but only when the `eltype` of the underlying array is `Complex{Float64}`. See the benchmark below.

```
using BenchmarkTools
using Printf  # needed for @printf

struct Foo{T<:Number, A<:AbstractMatrix{T}}
    data::A
end

Foo(data::AbstractMatrix) = Foo{eltype(data), typeof(data)}(data)

@inline function Base.getindex(U::Foo, i::Int)
    @inbounds ret = U.data[i]
    ret
end

@inline function Base.setindex!(U::Foo, val::Number, i::Int)
    @inbounds U.data[i] = val
    val
end

Base.eachindex(U::Foo) = eachindex(U.data)

function test(a, b, c, d)
    @simd for i in eachindex(a)
        @inbounds a[i] = b[i] + c[i]*d[i]
    end
    a
end

# size
n = 128
for T in [Float16, Float32, Float64,
          Complex{Float16}, Complex{Float32}, Complex{Float64}]
    a = Foo(zeros(T, n, n))
    b = Foo(zeros(T, n, n))
    c = Foo(zeros(T, n, n))
    d = Foo(zeros(T, n, n))
    t_foo = @belapsed test($a, $b, $c, $d)
    t_arr = @belapsed test($a.data, $b.data, $c.data, $d.data)
    @printf "%s %8.5f μs %8.5f μs\n" lpad(string(T), 16) 10^6*t_foo 10^6*t_arr
end
```

The output of this is

```
         Float16  296.075 μs  287.413 μs
         Float32    4.462 μs    4.455 μs
         Float64   12.278 μs   12.284 μs
Complex{Float16} 1205.071 μs 1192.119 μs
Complex{Float32}   26.366 μs   26.055 μs
Complex{Float64}   46.057 μs   29.107 μs
```
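
My first suspicion is that the wrapped version fails to SIMD-vectorize for this element type. Below is a self-contained sketch (it repeats the definitions above so it can be pasted into a fresh session) that dumps the LLVM IR for both call signatures and does a crude check for vector instructions like `<4 x double>`:

```
# Sketch: compare the LLVM IR generated for the wrapped (Foo) and raw
# (Matrix) versions of test() for Complex{Float64}.
using InteractiveUtils  # provides code_llvm

struct Foo{T<:Number, A<:AbstractMatrix{T}}
    data::A
end
Foo(data::AbstractMatrix) = Foo{eltype(data), typeof(data)}(data)

@inline function Base.getindex(U::Foo, i::Int)
    @inbounds ret = U.data[i]
    ret
end
@inline function Base.setindex!(U::Foo, val::Number, i::Int)
    @inbounds U.data[i] = val
    val
end
Base.eachindex(U::Foo) = eachindex(U.data)

function test(a, b, c, d)
    @simd for i in eachindex(a)
        @inbounds a[i] = b[i] + c[i]*d[i]
    end
    a
end

T = Complex{Float64}

# Capture the IR for each signature as a string instead of printing it.
ir_foo = sprint(code_llvm, test, NTuple{4, Foo{T, Matrix{T}}})
ir_arr = sprint(code_llvm, test, NTuple{4, Matrix{T}})

# Crude check: does the IR contain SIMD vector types such as <4 x double>?
println("Foo vectorized:    ", occursin(r"<\d+ x double>", ir_foo))
println("Matrix vectorized: ", occursin(r"<\d+ x double>", ir_arr))
```

If only the `Matrix` version shows vector types, the slowdown is a missed vectorization rather than an indexing overhead per se.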

Have any of you encountered this behaviour for `Complex{Float64}` before, or is this a known issue?

Thanks.

Davide

P.S. This also shows that `Float16` is much slower than the other floating-point types, but that is not my problem here!