The following code uses AbstractVector as a struct field, and it results in memory being allocated:
struct S1
A::AbstractVector{Int}
end
function f(s::S1)
s.A[end]
end
s = S1([1,2,3])
f(s)
@code_warntype f(s)
md"bytes allocated: $(@allocated f(s))"
In an inner loop, accessing s.A will result in a lot of tiny allocations.
The fix I found was to do this
struct S2{T}
A::T
function S2(A)
new{typeof(A)}(A)
end
end
function f(s::S2)
s.A[end]
end
s = S2(@view [1,2,3][2:3])
@code_warntype f(s)
f(s)
md"bytes allocated: $(@allocated f(s))"
But this is less readable to me. As I understand about Julia, yes an abstract type field will require a wrapper around the concrete type, but it doesn’t explain why accessing an array element allocates memory. Why isn’t s.A[i] dispatched to
getindex(a::AbstractVector{Int}, i)::Int
which should be inferrable and by itself would not allocate any memory?
IIUC, one problem here is: how could the compiler know ahead of time which specific method to call? getindex implementations for a plain Vector, a UnitRange or a SubArray are very different, and if the type of container itself is not inferrable, there needs to be a dynamic dispatch somewhere.
As you said, you can eliminate dynamic dispatch altogether by parameterizing your struct with the concrete type of container stored in it. This is a very good way to deal with the problem IMO.
But if you don’t like it (you seem to find it less readable) and you use your container in a tight loop, an other way would be to use a function barrier in order to have dynamic dispatch occur only once. An example could look like:
struct S
a::AbstractVector{Int}
end
function my_sum_barrier(s::S)
function inner(a)
ret = zero(eltype(a))
for x in a
ret += x
end
return ret
end
return inner(s.a)
end
The function barrier allowed eliminatingdynamic dispatch.
julia> using BenchmarkTools
julia> a = rand(1:10, 100);
julia> s = S(@view a[20:80]);
julia> @btime my_sum_barrier($s)
42.987 ns (0 allocations: 0 bytes)
326
This is in contrast to a more naive implementation in which no function barrier would be used, like:
function my_sum(s::S)
ret = zero(eltype(s.a))
for x in s.a
ret += x
end
return ret
end
With such an implementation, there are allocations at each loop iteration:
I can think of S3 as essentially creating barrier functions for all uses of the struct. Yes the default construct already extracts types from the parameters, the S(A) = new{typeof(A)}(A) was not needed.