While playing with JETTest.jl, I noticed that a runtime dispatch is reported when view, broadcasting, and an array of CartesianIndex (not CartesianIndices) are used together. However, I haven't observed any suspicious performance difference with BenchmarkTools.
using JETTest, BenchmarkTools

function foo(X, inds, Y)
    view(X, inds) .+= Y
end
X = collect(1:9)
inds = collect(LinearIndices(X)) # Vector{Int} for this 1-D X
Rinds = collect(CartesianIndices(X)) # Vector{CartesianIndex{1}} for this 1-D X
Y = collect(inds)
@report_dispatch foo(X, inds, Y) # no errors
@report_dispatch foo(X, Rinds, Y) # one runtime dispatch
@btime foo($X, $inds, $Y);
# 1.7.0-beta3: 23.129 ns (0 allocations: 0 bytes)
# 1.6.2: 25.331 ns (0 allocations: 0 bytes)
# why would this be 0 allocations when there exists runtime dispatch?
@btime foo($X, $Rinds, $Y);
# 1.7.0-beta3: 23.865 ns (0 allocations: 0 bytes)
# 1.6.2: 25.538 ns (0 allocations: 0 bytes)
I thought runtime dispatch would show up as allocations in the benchmark results, but this time I didn't see any.
As a comparison, I re-tested this with LinearIndices and CartesianIndices directly (without collect), and was surprised by the performance gap here…
X = collect(1:9)
inds = LinearIndices(X)
Rinds = CartesianIndices(X)
Y = collect(inds)
@report_dispatch foo(X, inds, Y) # no errors
@report_dispatch foo(X, Rinds, Y) # no errors
@btime foo($X, $inds, $Y);
# 1.7.0-beta3: 27.935 ns (0 allocations: 0 bytes)
# 1.6.2: 29.897 ns (0 allocations: 0 bytes)
@btime foo($X, $Rinds, $Y);
# 1.7.0-beta3: 14.933 ns (0 allocations: 0 bytes)
# 1.6.2: 14.716 ns (0 allocations: 0 bytes)
Any ideas about this result, or where I should start investigating?
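One place to start might be comparing the concrete view types that the two index objects produce, since view may lower LinearIndices and CartesianIndices differently before any broadcasting happens. The following is only a minimal sketch under that assumption; the names v_lin and v_cart are introduced here for illustration and are not part of the original benchmark.

```julia
X = collect(1:9)
inds = LinearIndices(X)
Rinds = CartesianIndices(X)

# Build the two views that foo would broadcast into.
v_lin = view(X, inds)
v_cart = view(X, Rinds)

# Inspect the concrete SubArray types; the type parameters show which
# index objects are actually stored inside each view.
@show typeof(v_lin)
@show typeof(v_cart)

# parentindices returns the indices kept by the view, which is what
# every later getindex/setindex! call has to go through.
@show parentindices(v_lin)
@show parentindices(v_cart)
```

If the stored indices differ (for example an index array versus a plain range), that could be one candidate explanation for the timing gap, but this sketch only shows where to look and does not confirm the cause.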
The runtime dispatch error that JETTest found is:
julia> @report_dispatch foo(X, Rinds, Y) # one runtime dispatch
═════ 1 possible error found ═════
┌ @ REPL[34]:2 Base.materialize!(Main.view(X, inds), Base.broadcasted(Main.+, Main.view(X, inds), Y))
│┌ @ broadcast.jl:894 Base.Broadcast.copyto!(dest, Base.Broadcast.instantiate(Core.apply_type(Base.Broadcast.Broadcasted, _)(Base.getproperty(bc, :f), Base.getproperty(bc, :args), Base.Broadcast.axes(dest))))
││┌ @ broadcast.jl:980 Base.Broadcast.preprocess(dest, bc)
│││┌ @ broadcast.jl:966 Base.Broadcast.preprocess(dest, Base.getindex(args, 1))
││││┌ @ broadcast.jl:957 Base.Broadcast.unalias(dest, src)
│││││┌ @ subarray.jl:111 Base.copyto!(dest, V)
││││││┌ @ abstractarray.jl:1349 Base.unaliascopy(A)
│││││││┌ @ subarray.jl:112 Base.map(Base._trimmedindex, Base.getproperty(V, :indices))
││││││││┌ @ tuple.jl:213 f(Base.getindex(t, 1))
│││││││││┌ @ subarray.jl:117 Base.oftype(i, Base.reshape(Base.eachindex(Base.IndexLinear(), i), Base.axes(i)))
││││││││││┌ @ essentials.jl:375 Base.convert(Base.typeof(x), y)
│││││││││││┌ @ array.jl:532 _(a)
││││││││││││┌ @ array.jl:540 Base.copyto_axcheck!(Core.apply_type(Base.Array, _, _)(Base.undef, Base.size(x)), x)
│││││││││││││┌ @ abstractarray.jl:1056 Base.copyto!(dest, src)
││││││││││││││┌ @ abstractarray.jl:950 Base.copyto_unaliased!(Base.IndexStyle(dest), dest, Base.IndexStyle(src′), src′)
│││││││││││││││┌ @ abstractarray.jl:970 Base.setindex!(dest, Base.getindex(src, i), Base.+(i, Δi))
││││││││││││││││┌ @ array.jl:839 Base.convert(_, x)
│││││││││││││││││ runtime dispatch detected: Base.convert(_::Type{CartesianIndex{1}}, x::Int64)
││││││││││││││││└────────────────
(Core.PartialStruct(SubArray{Int64, 2, Vector{Int64}, Tuple{Matrix{CartesianIndex{1}}}, false}, Any[Vector{Int64}, Tuple{Matrix{CartesianIndex{1}}}, Core.Const(0), Core.Const(0)]), 1)
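Since the flagged call sits under the unalias step of the trace, one thing worth checking is whether that branch is ever reached at runtime for this input. The sketch below rests on an assumption about Base internals: that broadcasting goes through the unexported helper Base.Broadcast.broadcast_unalias, which returns the source unchanged when it is identical (===) to the destination. The names dest and src are introduced here for illustration only.

```julia
X = collect(1:9)
Rinds = collect(CartesianIndices(X))

# `view(X, Rinds) .+= Y` builds the view twice: once as the destination
# and once as the first broadcast argument.
dest = view(X, Rinds)
src = view(X, Rinds)

# SubArray is an immutable struct, so === compares its fields (same parent
# array, same index array, same offsets) rather than object identity.
@show dest === src

# Internal, unexported helper (assumption: present in this Julia version,
# and it may change between releases). If it hands back `src` itself,
# the unaliascopy path from the trace was not taken for this call.
@show Base.Broadcast.broadcast_unalias(dest, src) === src
```

If both checks print true, the dispatch that JETTest reports would live in a branch that this particular call never executes, which might be why the benchmark shows no allocations. That is a hypothesis to verify, not a confirmed explanation.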