The code lives inside a function (say, `main`) that I have not shown here, and `df_flows` is defined inside that function.
Instead of writing helper functions, I access the column directly as `df_flows.rp`. Since `df_flows` is a local variable, I think that wrapping this in a function is not necessary here, right? I was just trying to keep the example readable.
As for the use case, it is a left join that I then group. I will try to explain below what I want:
Explanation in join terms:
- `leftjoin(df_cons, df_flows, on = [:rp, :asset => :to])`
- Compute the intersection of the `time_block` of left and right
- Multiply the resulting value by the `flow` column
- Sum `flow` by grouping by `df_cons`' index

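To make the join formulation concrete, here is a minimal sketch on toy data. It is not my real code: the values are made up, `flow` is a plain `Float64` standing in for whatever the real column holds, the time blocks are assumed to be integer ranges, and "the resulting value" is assumed to mean the length of the intersection. The names `weighted`, `weighted_flow`, and `flow_sum` are invented for the illustration.

```julia
using DataFrames

# Toy stand-ins: column names follow the post, values are made up.
df_cons = DataFrame(
    index = 1:3,
    rp = [1, 1, 2],
    asset = ["a", "b", "a"],
    time_block = [1:4, 3:6, 1:2],
)
df_flows = DataFrame(
    rp = [1, 1, 1, 2],
    from = ["x", "y", "x", "x"],
    to = ["a", "a", "b", "c"],
    time_block = [1:2, 3:6, 1:6, 1:2],
    flow = [10.0, 20.0, 30.0, 40.0],
)

# Both tables have a time_block column; with makeunique = true the one
# coming from df_flows is renamed to time_block_1.
joined = leftjoin(df_cons, df_flows; on = [:rp, :asset => :to], makeunique = true)

# Rows of df_cons without a matching flow carry `missing` on the right side;
# this sketch counts them as zero.
weighted(tb_cons, tb_flow, flow) =
    ismissing(flow) ? 0.0 : length(intersect(tb_cons, tb_flow)) * flow

joined.weighted_flow = weighted.(joined.time_block, joined.time_block_1, joined.flow)

# Sum per df_cons row, identified here by the hypothetical `index` column.
result = combine(groupby(joined, :index), :weighted_flow => sum => :flow_sum)
sort!(result, :index)
```

With this toy data, the first constraint row overlaps two flows into asset `"a"`, and the third has no matching flow at all, so it falls into the zero branch.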
Per row explanation:
- For each row of `df_cons`
- Select/filter `df_flows` by matching `rp = row.rp` and `to = row.asset`
- Compute the intersection of the time blocks
- Multiply the resulting value by the `flow` column
- Sum `flow` and return

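The per-row steps can also be written as a plain loop. Again, this is only a sketch under the same assumptions (made-up values, `Float64` flows, integer-range time blocks, intersection length as the multiplier), and `row_sum` is a hypothetical helper name, not something from my actual code.

```julia
using DataFrames

# Toy stand-ins: column names follow the post, values are made up.
df_cons = DataFrame(rp = [1, 2], asset = ["a", "a"], time_block = [1:4, 1:2])
df_flows = DataFrame(
    rp = [1, 1, 2],
    to = ["a", "a", "c"],
    time_block = [1:2, 3:6, 1:2],
    flow = [10.0, 20.0, 40.0],
)

# Hypothetical helper: the sum for a single row of df_cons.
function row_sum(row, df_flows)
    total = 0.0
    for f in eachrow(df_flows)
        # Select/filter: keep only flows with the same rp that go into this asset.
        (f.rp == row.rp && f.to == row.asset) || continue
        # Intersection length of the two time blocks, times the flow value.
        total += length(intersect(row.time_block, f.time_block)) * f.flow
    end
    return total
end

sums = [row_sum(row, df_flows) for row in eachrow(df_cons)]
```

This scans all of `df_flows` for every row of `df_cons`, so it is only meant to pin down the semantics, not to be fast.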
My current solution avoids DataFrames entirely and just uses dictionaries to store the indices of the non-zero flows. It is slow, but around 10x faster than this version. The full context is in Speeding up JuMP model creation with sets that depend on other indexes - #6 by slwu89