How to work with `Base.Generator` efficiently


#1

I was calculating some statistics on irregular data stored in a Dict when I learned that StatsBase.countmap does not support generators. Since it is time I learned how to use them, I thought I would implement a method:

using StatsBase

function StatsBase.addcounts!{T}(cm::Dict{T}, g::Base.Generator)
    ## how to make sure T matches eltype of g?
    for v in g
        cm[v] = get(cm, v, 0) + 1
    end
    cm
end

function StatsBase.countmap(g::Base.Generator)
    ## iteratoreltype returns a trait (HasEltype() or EltypeUnknown()), not a type
    _eltype = isa(Base.iteratoreltype(g), Base.HasEltype) ? eltype(g) : Any
    addcounts!(Dict{_eltype, Int}(), g)
end

But it is about 3x slower than just collecting the values (note: records and the calculation I do on them are just a toy example, to make my code self-contained):

records = Dict(rand(Int) => rand(Int, rand(1:5)) for i in 1:200000)
using BenchmarkTools
@benchmark countmap(collect(length(v) for v in values(records)))
@benchmark countmap(length(v) for v in values(records))

I suspect my code is not type stable: it does not use the element type of the generator, for one thing. So how can I speed this up? Apologies if this is in the manual, I could not find it.


#2

If you suspect the code is not type-stable, have you tried inspecting it with @code_warntype?


#3

I did. I just don’t know how to fix it. I imagine that somehow I would need to dispatch on the eltype of the iterator, but I am not even sure about this.


#4

Base.iteratoreltype always returns EltypeUnknown() for generators. Not sure why.

EDIT: looks like the required code is just missing at the moment, see this thread: https://groups.google.com/d/msg/julia-users/G-olgn3mIks/42WJAyDPBAAJ
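
The trait check described in this post can be sketched as follows. This is only an illustration and assumes a current Julia release, where the trait is spelled `Base.IteratorEltype` rather than `Base.iteratoreltype`; the variable names are made up for the example.

```julia
# Query the element-type trait (modern spelling of `iteratoreltype`).
g = (length(v) for v in [[1], [1, 2]])
trait_gen = Base.IteratorEltype(g)          # trait for the generator
trait_vec = Base.IteratorEltype([1, 2, 3])  # trait for a plain Vector

# Fall back to Any when the trait says the element type is unknown.
key_type = trait_gen isa Base.HasEltype ? eltype(g) : Any
```

With a key type of Any, a Dict built from the generator has abstractly typed keys, which is consistent with the slowdown reported in #1.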


#5

One trick that is used in Base is to pass the result of start to an auxiliary function, see e.g. here (I copied that trick from some other part of Base but I don’t remember where). Hope this helps.
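
A minimal sketch of that idea, written against the `iterate` protocol of current Julia releases (which replaced `start`/`next`/`done`), might look like the code below. The names `count_values` and `_count_rest!` are hypothetical and not part of Base or StatsBase, and the sketch assumes every element can be converted to the type of the first one.

```julia
# Function-barrier sketch: peel off the first element to learn a concrete
# key type, then hand the rest of the iteration to an inner function that
# is compiled for that concrete Dict type.

# Inner kernel (hypothetical helper): specialized per Dict{T,Int}.
function _count_rest!(cm::Dict{T,Int}, itr, state) where {T}
    next = iterate(itr, state)
    while next !== nothing
        v, state = next
        # Assumption: v is convertible to T; otherwise this line throws.
        cm[v] = get(cm, v, 0) + 1
        next = iterate(itr, state)
    end
    return cm
end

# Outer function (hypothetical name): decides the key type at run time.
function count_values(itr)
    next = iterate(itr)
    next === nothing && return Dict{Any,Int}()  # empty iterator: no type to learn
    v, state = next
    cm = Dict{typeof(v),Int}(v => 1)
    return _count_rest!(cm, itr, state)
end

counts = count_values(length(v) for v in [[1], [1, 2], [3]])
```

Nothing here proves that later elements share the type of the first; the sketch simply relies on the Dict assignment failing if they do not.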


#6

@carlobaldassi: Maybe I am not getting it, but how can I (or the compiler) be sure that the type of start(itr) is the same as the type for all elements?

@nalimilan: Do you know if anyone opened an issue in the end?


#7

I think this issue has been resolved in master and in 0.5.1; see 18695.