Is there a reason why fit(Histogram, obs::AbstractArray, bins, ...) in StatsBase does not accept an iterator over the observations? Possible sources of observations are streams, CSV files, databases, etc. Simply binning them should not require indexing with getindex(obs, ...). In my case a 2D histogram is built from ~250 MB of observations in an SQLite database, so this is not a show stopper at present. Typical data sets for histograms fit easily into in-memory arrays, but that is not always the case.
I have tried to read the binning part of the code, but it is a bit beyond me. Are there alternative packages? In MATLAB, accumarray can do the job in vectorized form. It has been adapted to Julia in VectorizedRoutines.
Generally, an iterator interface to the data would avoid allocating an intermediate array and be more “Julian”.
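
To illustrate what I mean, here is a minimal sketch of binning straight from an iterator using only Base. The function name hist2d_stream and the sample data are made up for this post and are not part of StatsBase. I assume left-closed bins [edge[i], edge[i+1]), which I believe matches the StatsBase default, and I simply drop observations that fall outside the edges.

```julia
# Hypothetical helper (not a StatsBase function): bin any iterable of (x, y)
# pairs into a 2D count matrix using iteration only, never getindex on the data.
function hist2d_stream(obs, xedges::AbstractVector, yedges::AbstractVector)
    counts = zeros(Int, length(xedges) - 1, length(yedges) - 1)
    for (x, y) in obs
        # searchsortedlast returns the index of the last edge <= value,
        # i.e. the left-closed bin [edge[i], edge[i+1]) containing the value.
        i = searchsortedlast(xedges, x)
        j = searchsortedlast(yedges, y)
        # Assumption: observations outside the edge ranges are skipped.
        if 1 <= i <= size(counts, 1) && 1 <= j <= size(counts, 2)
            counts[i, j] += 1
        end
    end
    return counts
end

# Usage with a lazy generator standing in for a stream or database cursor.
obs = ((mod(k, 10) + 0.5, mod(k, 3) + 0.5) for k in 1:30)
counts = hist2d_stream(obs, 0:2:10, 0:1:3)
```

Only the count matrix is allocated; the observations are consumed one at a time, so the same loop should work with rows coming from a database query, as long as each row can be destructured into a pair.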