# prologue: 30 lines to set up result
for r in table
    # 400 lines to filter and make composite variables
    # epilogue: 30 lines to fill the variables into the corresponding slots set up in the prologue
end
The problem comes when the content of the prologue and epilogue depends on what the user wants to do. Right now the development workflow involves editing the source code to comment out one prologue/epilogue pair and leave another active.
This is basically source code dumping, so I’m using @pour from Mixers.jl to reduce clutter, but it’s still the same idea (and, for example, Revise can’t automatically handle macros): I comment out one pair of @intro, @outro depending on what I’m doing.
Is there a better way? The loop structure is here to stay because the data is much larger than RAM and is also impossible to turn into columnar form.
I am not aware of the general use of concepts like prologue and epilogue in loops and/or why they would be useful.
I have looked at @pour and I do not understand why one would use it instead of defining a function. It seems more like an exposed internal used to implement @mix than something having a clear use case.
If your changes are too arbitrary, there is nothing better than editing the code directly. However, if there is a pattern to which parts often change, you could have main take a function as an argument to do that processing, and then just call main with different arguments.
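A minimal sketch of that suggestion, in case it helps. Every name here (main, setup_hist, fill_hist!, setup_ml, fill_ml!) and the toy table are made up for illustration, and the one-line filter stands in for the long loop body. Each workflow becomes a pair of ordinary functions, and switching workflows means changing the arguments rather than commenting code in and out:

```julia
# All names below are hypothetical; nothing here comes from the original code.

# Workflow A: collect a single column.
setup_hist() = Dict{Symbol,Vector{Float64}}(:pt => Float64[])
fill_hist!(result, vars) = push!(result[:pt], vars.pt)

# Workflow B: collect two columns.
setup_ml() = Dict{Symbol,Vector{Float64}}(:pt => Float64[], :eta => Float64[])
function fill_ml!(result, vars)
    push!(result[:pt], vars.pt)
    push!(result[:eta], vars.eta)
    return result
end

function main(table, prologue, epilogue!)
    result = prologue()                  # runs once, before the loop
    for r in table
        r.pt > 20 || continue            # stand-in for the long filtering section
        vars = (pt = r.pt, eta = r.eta)  # stand-in for the composite variables
        epilogue!(result, vars)          # runs once per surviving row
    end
    return result
end

table = [(pt = 10.0, eta = 0.1), (pt = 30.0, eta = -1.2), (pt = 45.0, eta = 2.0)]
hist = main(table, setup_hist, fill_hist!)
ml = main(table, setup_ml, fill_ml!)
```

Since these are plain functions rather than macros, Revise should be able to pick up edits to them in the usual way.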
You can see that the raw number of variables is the problem here. If I want to turn them into functions, I will need to call a function with all these variables and symbol names, and later edits would become 2x the work because of the function interface.
I want to correct my OP: the prologue happens outside of the for loop and the epilogue is inside, so it’s more like prepare once and fill many times.
for r in table
    # here's epilogue
    epilogue!(data_ML, :pt_1, pt_1, :var2, var1, :var1, var3....) # many many more
end
Now every time you make a change, you not only have to change the inside of epilogue!, you also have to change this line, in the correct order! It’s like C++ .h and .cxx files: a pointless double edit whenever you change a function.
The point of this is to keep it completely flat for going to Arrow storage, and there are way too many variables; collapsing a few of them is not going to help.
Why do you pass the Dict, the symbols, and the values? Would it not suffice to pass just the Dict, or to make all the changes in a Dict from the start (that is, instead of variables, always refer to Dict fields)?
Honestly, if your changes can be completely arbitrary, then I do not see much else than always keeping everything inside some structure (a Dict, a NamedTuple, or a Vector+Enum) and then allowing the caller to pass a function that takes such a structure and makes changes to it.
Because the function needs to know which variable goes into which field of data_ML? This is why they need to live in the same function body:
function epilogue!(data_ML; kwargs...)
    # push each kwargs pair
end
epilogue!(data_ML; var1=var1, var2=var2, var3alias=var3)
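For reference, a sketch of how the keyword version could be filled in. The thread does not show what data_ML actually is, so this assumes a Dict of column vectors and creates missing columns on first use; the two-row table is a toy stand-in. Since Julia 1.5, writing a bare name after the semicolon (`; var1`) is shorthand for `; var1=var1`, so only aliased variables need the name spelled twice:

```julia
# Assumption: data_ML is a Dict mapping column names to vectors.
function epilogue!(data_ML; kwargs...)
    for (name, value) in pairs(kwargs)
        # Create the column on first use, then append this row's value.
        push!(get!(() -> typeof(value)[], data_ML, name), value)
    end
    return data_ML
end

data_ML = Dict{Symbol,Vector}()
for r in [(pt = 30.0, n = 2), (pt = 45.0, n = 3)]   # toy stand-in for the table
    var1 = r.pt
    var2 = r.n
    var3 = r.pt * r.n
    # `; var1, var2` is shorthand for `; var1=var1, var2=var2` (Julia 1.5+).
    epilogue!(data_ML; var1, var2, var3alias = var3)
end
```

With this form the order of the arguments does not matter, and adding or removing a variable touches one name in the call rather than a symbol/value pair.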
If you just pass epilogue!(data_ML, var1, var2, var3... var80), you’re relying on positional info? I guess that’s possible, but again, it’s hard to align them by eye when you have 80 variables; when you add or remove one, you need to remember precisely which one it was and what alias you gave it.
I have the strong feeling that we are not understanding each other.
My change does not add any extra = signs, as they would already exist anyway, no? Instead of defining a new variable (as you said you do in the middle of a loop), you set a field in a Dict or any other structure; then you do not need to change the epilogue! call when new variables appear or old ones stop being used. You can also pass any hooks, that is, functions to be called at specific points of the processing, and they will have easy access to all the data.
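To make that concrete, here is a rough sketch. The names process, on_start, on_row, keep_all!, keep_aliased!, and ALIASES are all invented for the example, and the two assignments stand in for the real loop body. The loop writes into a Dict instead of into local variables, and the hook receives the whole Dict, so its call site stays the same no matter how many fields exist:

```julia
# Hypothetical sketch; none of these names come from the original code.
function process(table; on_start, on_row)
    out = on_start()                  # hook: prepare once
    row = Dict{Symbol,Float64}()      # reused for every row
    for r in table
        empty!(row)
        row[:pt_1] = r.pt             # instead of `pt_1 = r.pt`
        row[:mass] = 2 * r.pt         # stand-in for a composite variable
        on_row(out, row)              # hook: fill many times
    end
    return out
end

new_output() = Dict{Symbol,Vector{Float64}}()

# One workflow: keep every field the loop produced.
function keep_all!(out, row)
    for (k, v) in row
        push!(get!(() -> Float64[], out, k), v)
    end
    return out
end

# Another workflow: keep a subset, stored under different names.
const ALIASES = (:pt_1 => :lead_pt,)
function keep_aliased!(out, row)
    for (src, dst) in ALIASES
        push!(get!(() -> Float64[], out, dst), row[src])
    end
    return out
end

table = [(pt = 30.0,), (pt = 45.0,)]
all_cols = process(table; on_start = new_output, on_row = keep_all!)
ml_cols = process(table; on_start = new_output, on_row = keep_aliased!)
```

The alias table lives in one place, next to the hook that uses it, so adding a field to the loop body needs no edit anywhere else unless a workflow wants to rename or drop it.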