How to organize a loop with a long prologue and epilogue that switch depending on the task

Imagine the main loop boils down to this:

function main(table)
    # prologue: 30 lines to set up result

    for r in table
        # 400 lines to filter and make composite variable
        # epilogue: 30 lines to fill variable into corresponding slot at the beginning
    end
    return prologue_made
end

The problem comes when the content of the prologue and epilogue depends on what the user wants to do. Right now the development workflow involves editing the source code to comment out one prologue/epilogue pair and leave another.

This is basically source code dumping, so I’m using @pour from Mixers.jl to reduce clutter, but the idea is still the same (and Revise, for example, can’t automatically handle macros): I comment out one pair of @intro, @outro depending on what I’m doing.

Is there a better way? The loop structure is here to stay because the data is much larger than RAM and is also impossible to turn into columnar form.

I am not aware of the general use of concepts like prologue and epilogue in loops and/or why they would be useful.

I have looked at @pour and I do not understand why one would use it instead of defining a function. It seems more like an exposed internal used to implement @mix than something having a clear use case.

If your changes are too arbitrary, there is nothing better than just editing the code directly. However, if there is a pattern to which parts often change, then you could take a function as an argument of main to do that processing, and then just call main with different arguments.
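A minimal sketch of that idea, assuming the task-specific parts can be expressed as two functions. The names process, setup_pt, fill_pt!, setup_count, fill_count! and the toy rows are hypothetical, not taken from the original code:

```julia
# Hypothetical sketch: the loop takes the task-specific parts as arguments.
# `setup` builds the empty result once; `fill_row!` is called once per row.
function process(table, setup, fill_row!)
    result = setup()                 # "prologue": run once
    for r in table
        r.pt > 10 || continue        # stand-in for the long filtering step
        fill_row!(result, r)         # "epilogue": run per surviving row
    end
    return result
end

# One prologue/epilogue pair per task.
setup_pt() = Dict(:pt => Float64[])
fill_pt!(result, r) = push!(result[:pt], r.pt)

setup_count() = Dict(:Nlep => Int32[])
fill_count!(result, r) = push!(result[:Nlep], r.nlep)

table = [(pt = 5.0, nlep = 2), (pt = 20.0, nlep = 3), (pt = 30.0, nlep = 4)]
res_pt = process(table, setup_pt, fill_pt!)
res_count = process(table, setup_count, fill_count!)
```

Switching tasks then means changing the arguments at the call site rather than commenting code in and out, which should also play better with Revise than a macro does.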

the prologue looks like this:

    data_ML = Dictionary(Dict(    
        :SR => Int32[],    
        :Nlep => Int32[],  
        # 30 more lines with different variable names
))

and the epilogue looks like this:

...   
  push!(data_ML[:pt_1], pt(v_l_tlv[v_l_order[1]]))
  push!(data_ML[:pt_2], pt(v_l_tlv[v_l_order[2]]))
  push!(data_ML[:pt_3], pt(v_l_tlv[v_l_order[3]]))
  push!(data_ML[:pt_4], pt(v_l_tlv[v_l_order[4]]))
  push!(data_ML[:Zlep1_pt], pt(v_l_tlv[l1]))

You can see that the raw number of variables is the problem here. If I want to turn them into functions, I will need to call a function with all these variables and symbol names, and later edits would become 2x work because function interface

I want to correct my OP: the prologue happens outside of the for loop and the epilogue is inside, so it’s more like prepare once and fill many times.

I am not sure what you mean by

and later edits would become 2x work because function interface

I do feel like data_ML[:pt_1] should be data_ML[:pt][1] (i.e., have one field with an array instead of 4 similarly named fields), and then you could do something like:

for i in 1:4
    push!(data_ML[:pt][i], pt(v_l_tlv[v_l_order[i]]))
end

I also mean this as more general advice: abstract your fields so that all fields with similar processing can be processed together. Even if you keep different names you can do:

for (s, i) in [(:pt_1, v_l_order[1]), (:pt_2, v_l_order[2]), (:pt_3, v_l_order[3]), (:pt_4, v_l_order[4]), (:Zlep1_pt, l1)]
    push!(data_ML[s], pt(v_l_tlv[i]))
end

for r in table
    ...
    # here's epilogue
    epilogue!(data_ML, :pt_1, pt_1, :var2, var1, :var1, var3....) # many many more
end

Now every time you make a change, you not only have to change the inside of epilogue!, you also have to change this line, in the correct order! It’s like the pointless C++ .h and .cxx double edit whenever you change a function.


The point of this is to keep it completely flat for going to Arrow storage. And there are way too many variables; collapsing a few of them is not going to help.

Why do you pass the Dict, symbols, and values? Would it not suffice to pass just the Dict, or to make all the changes in a Dict from the start (that is, instead of variables, always refer to Dict fields)?

Honestly, if your changes can be completely arbitrary, then I do not see much else to do than keep everything inside some structure (a Dict, or NamedTuple, or Vector+Enum) and allow passing some function that takes such a structure and makes changes to it.

Because the function needs to know which variable goes into which field of data_ML? This is why they need to live in the same function body:

function epilogue!(data_ML; kwargs...)
    # push each keyword argument into the field of the same name
    for (name, value) in kwargs
        push!(data_ML[name], value)
    end
end

function main()
    # for-loop...
    epilogue!(data_ML; var1=var1, var2=var2, var3alias=var3)
end

If you just pass epilogue!(data_ML, var1, var2, var3... var80) you’re relying on positional information? I guess that’s possible, but again, it’s hard to align them by eye when you have 80 variables; when you add or remove one, you need to remember precisely which one it was and what alias you gave it.

these variables are “derived” in the middle of the loop body
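One thing that may reduce the double edit: since Julia 1.5, a call written f(; a, b) is shorthand for f(; a=a, b=b), so when the variable and the field share a name it only has to be written once, and order does not matter. A small sketch, assuming a plain Dict stands in for data_ML; the variable lead_pt and the toy values are made up for illustration:

```julia
# Assumption: a plain Dict stands in for data_ML; names are illustrative.
data_ML = Dict(:pt_1 => Float64[], :Nlep => Int32[], :Zlep1_pt => Float64[])

# Push every keyword argument into the column of the same name.
function epilogue!(data; kwargs...)
    for (name, value) in kwargs
        push!(data[name], value)
    end
    return data
end

for row in 1:3
    # values "derived" in the loop body
    pt_1 = 10.0 * row
    Nlep = row
    lead_pt = 5.0 * row
    # `; pt_1, Nlep` means `pt_1=pt_1, Nlep=Nlep`; an alias stays explicit
    epilogue!(data_ML; pt_1, Nlep, Zlep1_pt = lead_pt)
end
```

A misspelled or undeclared field fails with a KeyError at the push!, so the explicit prologue still acts as a check on the names.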

I mean, instead of declaring

var_that_goes_into_pt1 = ...
...
epilogue!(data_ML, :pt_1, var_that_goes_into_pt1, ...)

why don’t you write

data_ML[:pt_1] = ...
...
epilogue!(data_ML)

from the start?

Spam = instead of push!()? Aren’t we back to square one? I mean, in this case all epilogue!() does is transfer the per-iteration data_ML to the data_ML outside the loop?

I have the strong feeling that we are not understanding each other.

My change does not add any extra =, since they would already exist anyway, no? Instead of defining a new variable (as you said you do in the middle of the loop), you set a field in a Dict or any other structure; then you do not need to change the epilogue! call when new variables appear or old ones stop being used. You can also pass hooks, that is, functions to be called at specific points of the processing, and they will have easy access to all the data.
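To make the suggestion concrete, here is a rough sketch under assumed names (process, record, columns, and hook are all hypothetical, and the arithmetic stands in for the real derivations). Derived values are written straight into a per-row record, and the epilogue copies whatever the record holds:

```julia
# Hypothetical sketch: derived values go straight into a per-row record,
# and the epilogue copies whatever the record holds into the columns.
function epilogue!(columns, record)
    for (name, value) in record
        push!(columns[name], value)
    end
    return columns
end

function process(rows; hook = record -> nothing)
    columns = Dict(:pt_1 => Float64[], :Nlep => Int[])
    for r in rows
        record = Dict{Symbol,Any}()
        record[:pt_1] = 2.0 * r      # instead of `pt_1 = ...`
        record[:Nlep] = r            # instead of `Nlep = ...`
        hook(record)                 # optional user hook sees all row data
        epilogue!(columns, record)   # this call never changes
    end
    return columns
end

cols = process(1:3)
doubled = process(1:3; hook = rec -> (rec[:pt_1] *= 2))
```

A Dict{Symbol,Any} allocated per row is not free; for a hot loop, a NamedTuple or a reused mutable struct would be the likely replacement, at the cost of a less flexible record.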

Does the prologue contain anything more than declarations?

I can think of some smart epilogue! that does things on the fly and does not need the prologue declarations.
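One way such an epilogue! might look, as a sketch only: each column is created the first time a field is pushed, with its element type taken from the first value. The Dict{Symbol,Vector} container and the toy record are assumptions for illustration:

```julia
# Hypothetical sketch: no prologue; a column is created the first time
# a field is pushed, with its element type taken from the first value.
function epilogue!(columns::Dict{Symbol,Vector}, record::NamedTuple)
    for (name, value) in pairs(record)
        col = get!(() -> typeof(value)[], columns, name)
        push!(col, value)
    end
    return columns
end

columns = Dict{Symbol,Vector}()
for r in 1:3
    epilogue!(columns, (SR = Int32(r), pt_1 = 1.5 * r))
end
```

The trade-off is that nothing checks the field names any more: a typo silently creates a new column, and a field that is skipped in some rows leaves columns of unequal length.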

It’s just for initialization