I often have to “categorise” a vector of struct elements by a specific field, and I wanted to get rid of some duplicated logic and generalise the categorisation. However, I hit the penalty of a runtime (dynamic) lookup of the field type, which I need in order to create the output dictionary. I thought the inference would somehow magically figure it out, but I was wrong :) and I understand the reason.
Obviously, if the type is known at compile time, the compiler should be able to figure out the field types as well, since structs cannot be redefined, but it’s the call to fieldtype(), which is not pure, that prevents that.
So my (probably dumb) question is something like: is there any way to tell Julia that fieldtype() is pure? If not, are there any approaches other than parametrising the field type into the struct’s definition? (At least that’s the only way I have found so far, but it’s a bit annoying since I have many fields I want to categorise on, involving a couple of different struct definitions.)
Here is an example:
struct Hit
    dom_id::Int16
    time::Float64
end
The hardcoded version:
function domhits(hits::Vector{T}) where T
    hit_map = Dict{Int16, Vector{T}}()
    for hit ∈ hits
        dom_id = hit.dom_id
        if !haskey(hit_map, dom_id)
            hit_map[dom_id] = T[]
        end
        push!(hit_map[dom_id], hit)
    end
    hit_map
end
The general version, which determines the key type dynamically via fieldtype():
function categorize(field::Symbol, elements::Vector{T}) where T
    out = Dict{fieldtype(T, field), Vector{T}}()
    for el ∈ elements
        key = getfield(el, field)
        if !haskey(out, key)
            out[key] = T[]
        end
        push!(out[key], el)
    end
    out
end
Here is some test data (a couple of hits which we will sort by dom_id):
n = 20
dom_ids = Int16.(rand(1:4, n))
times = rand(n);
hits = [Hit(dt...) for dt in zip(dom_ids, times)]
Gives:
20-element Array{Hit,1}:
Hit(3, 0.44280682173150687)
...
Hit(4, 0.9483316068868308)
Hit(4, 0.874216680644536)
And the benchmark results (measured with @btime from BenchmarkTools):
@btime domhits($hits) # 625.762 ns (14 allocations: 1.41 KiB)
Dict{Int16,Array{Hit,1}} with 4 entries:
4 => Hit[Hit(4, 0.257824), Hit(4, 0.948332), Hit(4, 0.874217), Hit(4, 0.46145…
2 => Hit[Hit(2, 0.147367), Hit(2, 0.124529), Hit(2, 0.973964), Hit(2, 0.39077…
3 => Hit[Hit(3, 0.442807), Hit(3, 0.995658), Hit(3, 0.332106), Hit(3, 0.50589…
1 => Hit[Hit(1, 0.292101)]
@btime categorize(:dom_id, $hits) # 5.061 μs (34 allocations: 2.03 KiB)
Dict{Int16,Array{Hit,1}} with 4 entries:
4 => Hit[Hit(4, 0.257824), Hit(4, 0.948332), Hit(4, 0.874217), Hit(4, 0.46145…
2 => Hit[Hit(2, 0.147367), Hit(2, 0.124529), Hit(2, 0.973964), Hit(2, 0.39077…
3 => Hit[Hit(3, 0.442807), Hit(3, 0.995658), Hit(3, 0.332106), Hit(3, 0.50589…
1 => Hit[Hit(1, 0.292101)]
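One direction that might avoid the dynamic lookup, sketched here rather than benchmarked: lift the field name into the type domain with Val, so that both the element type and the field name are known when the method is compiled and the fieldtype() call can in principle be resolved by inference. The name categorize_val and the small sample data are hypothetical and only for illustration; Hit is the struct from above.

```julia
# Same struct as in the example above.
struct Hit
    dom_id::Int16
    time::Float64
end

# Hypothetical variant of `categorize` (the name is an assumption, not from
# the original code): the field name F is a type parameter, so both T and F
# are compile-time constants inside the method body.
function categorize_val(::Val{F}, elements::Vector{T}) where {F, T}
    out = Dict{fieldtype(T, F), Vector{T}}()
    for el in elements
        key = getfield(el, F)
        if !haskey(out, key)
            out[key] = T[]
        end
        push!(out[key], el)
    end
    return out
end

# Convenience wrapper for a runtime Symbol: this costs one dynamic dispatch,
# after which the loop runs in the specialised method (a function barrier).
categorize_val(field::Symbol, elements::Vector) = categorize_val(Val(field), elements)

# Small made-up sample, only to exercise the function.
hits = [Hit(1, 0.1), Hit(2, 0.2), Hit(1, 0.3)]
by_dom = categorize_val(Val(:dom_id), hits)

# Ask inference what it concludes for the Val-based method.
inferred = only(Base.return_types(categorize_val, (Val{:dom_id}, Vector{Hit})))
```

If inferred comes out as a concrete Dict type, the key type was resolved at compile time; @code_warntype categorize_val(Val(:dom_id), hits) gives a more detailed view. Whether this closes the gap to the hardcoded domhits would still need to be measured with @btime.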