The code is defined inside a function, say `main`, that I have not written here. The `df_flows` variable is defined inside that function. Instead of writing functions, I am using the explicit access to the variable `df_flows.rp`. Since `df_flows` is a local variable, I think that using a function is not necessary here, right? I was just trying to make the example readable.
About the use case, it is a left join that I then group. I can try to explain below what I want:
Explanation in join terms:
`leftjoin(df_cons, df_flows, on = [:rp, :asset => :to])`
- Compute the intersection of the `time_block` of left and right
- Multiply the resulting value by the `flow` column
- Sum `flow` by grouping by `df_cons`' index
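To make the join version concrete, here is a minimal sketch on made-up data. It assumes that `time_block` holds integer ranges, that "the resulting value" means the length of the intersection, and that `flow` is a plain number. The `index` column, the `weighted` and `total` names, and all the values are hypothetical and only there for illustration.

```julia
using DataFrames

# Hypothetical toy data; `index` stands in for the row index of df_cons.
df_cons = DataFrame(
    index = [1, 2, 3],
    rp = [1, 1, 2],
    asset = ["a", "b", "a"],
    time_block = [1:3, 1:6, 1:2],
)
df_flows = DataFrame(
    rp = [1, 1, 1, 2],
    to = ["a", "a", "b", "a"],
    time_block = [1:2, 3:6, 1:6, 2:4],
    flow = [1.0, 2.0, 3.0, 4.0],
)

# Both tables have a `time_block` column, so `makeunique = true` keeps the
# left one as `time_block` and renames the right one to `time_block_1`.
df = leftjoin(df_cons, df_flows; on = [:rp, :asset => :to], makeunique = true)

# Length of the overlap times the flow; rows of df_cons with no matching
# flow come back with `missing` and contribute zero.
df.weighted = [
    ismissing(f) ? 0.0 : length(intersect(tb, tbf)) * f
    for (tb, tbf, f) in zip(df.time_block, df.time_block_1, df.flow)
]

# Sum per row of df_cons, then sort because the join does not promise row order.
result = combine(groupby(df, :index), :weighted => sum => :total)
sort!(result, :index)
```

With these values, `result.total` should have one entry per row of `df_cons`.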
Per row explanation:
- For each row of `df_cons`
- Select/filter `df_flows` by matching `rp = row.rp` and `to = row.asset`
- Compute the intersection of the time blocks
- Multiply the resulting value by the `flow` column
- Sum `flow` and return
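The per-row steps can be sketched in plain Julia, without DataFrames. This is only an illustration under assumptions: the rows are vectors of named tuples standing in for `df_cons` and `df_flows`, `time_block` is an integer range, "the resulting value" is taken to be the length of the intersection, and `flow` is a plain number. The helper name `row_total` and the data are hypothetical.

```julia
# Hypothetical stand-ins for the rows of df_cons and df_flows.
cons = [
    (rp = 1, asset = "a", time_block = 1:3),
    (rp = 1, asset = "b", time_block = 1:6),
    (rp = 2, asset = "a", time_block = 1:2),
]
flows = [
    (rp = 1, to = "a", time_block = 1:2, flow = 1.0),
    (rp = 1, to = "a", time_block = 3:6, flow = 2.0),
    (rp = 1, to = "b", time_block = 1:6, flow = 3.0),
    (rp = 2, to = "a", time_block = 2:4, flow = 4.0),
]

# For one row of cons: filter the flows, weight each by the overlap length, sum.
function row_total(row, flows)
    total = 0.0
    for f in flows
        if f.rp == row.rp && f.to == row.asset
            total += length(intersect(row.time_block, f.time_block)) * f.flow
        end
    end
    return total
end

totals = [row_total(row, flows) for row in cons]
```

This scans all of `flows` for every row of `cons`, so it is meant to pin down the semantics, not to be fast.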
My current solution does not use DataFrames at all and just uses dictionaries to store the indices of the non-zero flows. It is slow, but around 10x faster than this version. The full context is Speeding up JuMP model creation with sets that depend on other indexes - #6 by slwu89