I’ve a csv file of the following:
1- Customer name
2- Customer location
3- Deliveries over last 60 months, so n
in the table below is 60
sometimes more
I read the data as:
using CSV, Query, DataFramesMeta, DataFrames
df = CSV.read("deliveries.csv"; header=true, delim=',')
And get it as:
locations×n DataFrame
│ Row │ Customer │ location │ M0 │ M1 │ M2 │ ... │ Mn
│ │ String │ String │ Int64 │ Int64 │ Int64 │ ... │ Int64
├─────┼──────────┼──────────┼─────────┼──────────┼───────┼───────┼──────
│ 1 │ x │ X1 │ 5 │ 4 │ 5 │ ... │ 3
│ 2 │ x │ X2 │ 3 │ 3 │ 4 │ ... │ 5
│ 3 │ y │ X3 │ 6 │ 3 │ 4 │ ... │ 5
How can I work it to do the following:
1- Exclude location from the data frame
2- Group by customer, so that ineach month, the number became the sum of deliveries for all locations with this customer under the given month, something as below:
customers×n DataFrame
│ Row │ Customer │ M0 │ M1 │ M2 │ ... │ Mn
│ │ String │ Int64 │ Int64 │ Int64 │ ..... │ Int64
├─────┼──────────┼─────────┼──────────┼───────┼───────┼───
│ 1 │ x │ 8 │ 7 │ 9 │ ... │ 8
│ 2 │ y │ 6 │ 3 │ 4 │ ... │ 5
In typical SQL, it can be done as:
Select Customer,SUM(M0),SUM(M1),SUM(M2),...,SUM(Mn) From Customers GROUP BY Customer