Filtering and Grouping DataFrame

hasanOryx · October 9, 2019, 7:05pm

I’ve a csv file of the following:
1- Customer name
2- Customer location
3- Deliveries over last 60 months, so n in the table below is 60 sometimes more

I read the data as:

using CSV, Query, DataFramesMeta, DataFrames
df = CSV.read("deliveries.csv"; header=true, delim=',')

And get it as:


locations×n DataFrame
│ Row │ Customer │ location │ M0      │ M1       │ M2    │ ...   │ Mn
│     │ String   │ String   │ Int64   │ Int64    │ Int64 │ ...   │ Int64 
├─────┼──────────┼──────────┼─────────┼──────────┼───────┼───────┼──────
│ 1   │ x        │ X1       │ 5       │ 4        │ 5     │ ...   │ 3    
│ 2   │ x        │ X2       │ 3       │ 3        │ 4     │ ...   │ 5     
│ 3   │ y        │ X3       │ 6       │ 3        │ 4     │ ...   │ 5

How can I work it to do the following:
1- Exclude location from the data frame
2- Group by customer, so that ineach month, the number became the sum of deliveries for all locations with this customer under the given month, something as below:


customers×n DataFrame
│ Row │ Customer │ M0      │ M1       │ M2    │ ...   │ Mn 
│     │ String   │ Int64   │ Int64    │ Int64 │ ..... │ Int64 
├─────┼──────────┼─────────┼──────────┼───────┼───────┼───
│ 1   │ x        │ 8       │ 7        │ 9     │ ...   │ 8     
│ 2   │ y        │ 6       │ 3        │ 4     │ ...   │ 5

In typical SQL, it can be done as:

Select Customer,SUM(M0),SUM(M1),SUM(M2),...,SUM(Mn) From Customers GROUP BY Customer

bkamins · October 9, 2019, 7:34pm

aggregate(df[:, Not(:location)], :Customer, sum)

with a drawback that it will rename you the columns

or

by(sdf -> mapcols(sum, sdf[:, 3:end]), df, :Customer)

and now your columns will not get renamed.

hasanOryx · October 9, 2019, 7:38pm

This one workd perfectly, can you please explain how it works

bkamins · October 9, 2019, 8:00pm

you group df by customer,
then in each subgroup you drop first two columns (as they are :Customer and :location so you do not want to aggregate them)
after dropping the columns you apply sum function to each column of such a data frame (which is done by mapcols function)

You can find more examples of similar transformations by reading the documentation of by and mapcols functions.

hasanOryx · October 9, 2019, 8:03pm

Deeply appreciated, thanks a lot.

Topic		Replies	Views
DataFrame Groupby New to Julia dataframes	2	2148	April 26, 2018
Regarding filtering/creating dataframe New to Julia	3	853	March 13, 2019
Group, Mutate, Ungroup Data question , dataframes	2	1762	February 4, 2018
How to filter data conditional on two columns General Usage data , dataframes	17	1779	January 11, 2022
Dataframes: Split combined result to different columns General Usage dataframes	3	310	December 13, 2021

Filtering and Grouping DataFrame

Related topics