When the dataset is too large to fit in memory, you may want to avoid materializing it and use single-pass (online) algorithms instead. Try OnlineStats.jl:
using OnlineStats
# assuming data_itr yields two-element rows (hospital, amount): hospital is a String, amount is a number
fit!(GroupBy(String, Sum()), data_itr)
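
For illustration, here is a minimal self-contained sketch of the same idea. The `rows` generator is a hypothetical stand-in for whatever lazy row source you actually have (it is not from your data), and the hospital names are made up. Each row is consumed once and nothing is collected into memory:

```julia
using OnlineStats

# Hypothetical lazy source of (hospital, amount) tuples; replace with your own iterator.
rows = (("hospital_$(mod1(i, 3))", Float64(i)) for i in 1:9)

# One running Sum per distinct String key, updated in a single pass.
o = GroupBy(String, Sum())
fit!(o, rows)

# value(o) maps each group key to its Sum stat; value(s) extracts the number.
for (hospital, s) in value(o)
    println(hospital, " => ", value(s))
end
```

Swapping `Sum()` for another stat such as `Mean()` should give a per-group mean with the same single-pass behavior.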