Multi Way Summary

All I am looking for a simple way to perform a summary on multiple levels

Attached is a screenshot from SAS, which shows what I am trying to achieve (consider the first row as well as the subsequent 26 rows).
Is there any package in Julia that provides this?
By default combine only aggregates on the most granular level, see example.

lnk="https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/639388c2cbc2120a14dcf466e85730eb8be498bb/iris.csv"
fi=download(lnk)

using DataFrames
using CSV

df=CSV.read(fi)
agg=combine(groupby(df,[:species,:petal_width]),:sepal_length=>sum)

Just to make sure I understand you correctly, your aim is to have

combine(groupby(df,[:species]),:sepal_length=>sum)
combine(groupby(df,[:petal_width]),:sepal_length=>sum)
combine(groupby(df,[:species,:petal_width]),:sepal_length=>sum)

all in one call, and output into one table? If so I’m unaware of a package that would provide this, but it should be relatively easy to roll your own?

Yes, this is what I want.
Indeed it is probably not too difficult, but I don’t want to reinvent the wheel.
I hope it is clear, that I plan to call this function with an arbitrary number of arguments which could be of different types.
I think I even may have a snippet somewhere that does this but I was hoping “proper code” exists for this somewhere online. (it has happened to me that I coded something that actually existed in some function I did not know…)

As per your suggestion I gave this a try.
I think this suffices for me (for now), although it certainly has potential for improvements.


using DataFrames
using CSV
using IterTools

function multiwayaggregation(df::DataFrame,v::Vector{Symbol},cs::Union{Pair, typeof(nrow), DataFrames.ColumnIndex, DataFrames.MultiColumnIndex}...)
    res=DataFrame()

    for c in v 
        @assert !(any(ismissing,df[!,c])) #otherwise the appending will not be meaningful, as we set the values to missing for columns which are not considered in the multi way summary
    end
    
    for subsetlength=length(v):-1:0
        for subs in IterTools.subsets(v,subsetlength)
            #@show subs
            if subsetlength==0 
                agg = DataFrames.combine(df,cs...)
            else 
                agg = DataFrames.combine(DataFrames.groupby(df,subs),cs...)
            end
            nonAggregatedVars=setdiff(v,subs)
            
            k=1
            DataFrames.insertcols!(agg,k,:_TYPE_ => repeat(vcat(subsetlength),size(agg,1)))
            k+=1
            for addcol in nonAggregatedVars 
                DataFrames.insertcols!(agg,k,addcol => repeat(vcat(missing),size(agg,1)))
                k+=1
            end 
            
            DataFrames.allowmissing!(agg)
            DataFrames.append!(res,agg)            
        end
    end
    
    sort!(res,vcat(:_TYPE_,v))
    return res 
end

lnk="https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/639388c2cbc2120a14dcf466e85730eb8be498bb/iris.csv"
fi=download(lnk)

df=CSV.read(fi)

v=[:species,:petal_width]
rs=multiwayaggregation(df,v,:sepal_length=>sum)