Filtering and Grouping DataFrame

I’ve a csv file of the following:
1- Customer name
2- Customer location
3- Deliveries over last 60 months, so n in the table below is 60 sometimes more

I read the data as:

using CSV, Query, DataFramesMeta, DataFrames
df = CSV.read("deliveries.csv"; header=true, delim=',')

And get it as:


locations×n DataFrame
│ Row │ Customer │ location │ M0      │ M1       │ M2    │ ...   │ Mn
│     │ String   │ String   │ Int64   │ Int64    │ Int64 │ ...   │ Int64 
├─────┼──────────┼──────────┼─────────┼──────────┼───────┼───────┼──────
│ 1   │ x        │ X1       │ 5       │ 4        │ 5     │ ...   │ 3    
│ 2   │ x        │ X2       │ 3       │ 3        │ 4     │ ...   │ 5     
│ 3   │ y        │ X3       │ 6       │ 3        │ 4     │ ...   │ 5      

How can I work it to do the following:
1- Exclude location from the data frame
2- Group by customer, so that ineach month, the number became the sum of deliveries for all locations with this customer under the given month, something as below:


customers×n DataFrame
│ Row │ Customer │ M0      │ M1       │ M2    │ ...   │ Mn 
│     │ String   │ Int64   │ Int64    │ Int64 │ ..... │ Int64 
├─────┼──────────┼─────────┼──────────┼───────┼───────┼───
│ 1   │ x        │ 8       │ 7        │ 9     │ ...   │ 8     
│ 2   │ y        │ 6       │ 3        │ 4     │ ...   │ 5     

In typical SQL, it can be done as:

Select Customer,SUM(M0),SUM(M1),SUM(M2),...,SUM(Mn) From Customers GROUP BY Customer
aggregate(df[:, Not(:location)], :Customer, sum)

with a drawback that it will rename you the columns

or

by(sdf -> mapcols(sum, sdf[:, 3:end]), df, :Customer)

and now your columns will not get renamed.

3 Likes

This one workd perfectly, can you please explain how it works

  1. you group df by customer,
  2. then in each subgroup you drop first two columns (as they are :Customer and :location so you do not want to aggregate them)
  3. after dropping the columns you apply sum function to each column of such a data frame (which is done by mapcols function)

You can find more examples of similar transformations by reading the documentation of by and mapcols functions.

2 Likes

Deeply appreciated, thanks a lot.