Run parts of other files from a loop

Hi Julia folks,

This is a somewhat strange/niche question, but I hope the answer ends up being useful to other users. Basically, I have one file that needs to loop over a list of other files, evaluating only the first few lines of each.

For example, suppose the files in my list are file1.jl and file2.jl (below). In a third file, main.jl (also below), I need to loop over these files to get their values of the variable M and perform some operations on them. I don’t mind if main.jl picks up the variables defined prior to M, but I cannot evaluate the lines after defining M, as these include function definitions that conflict with each other.

Is there a way to do this, either by modifying my current approach or doing something more clever? Any advice is greatly appreciated!

Many thanks,

Alex

Example files:

# file1.jl
a = 1;
b = 2;
M = a + b;
# ...followed by stuff we don't want

# file2.jl
a = 1;
b = 2;
c = 3;
M = a + b * c;
# ...followed by stuff we don't want

# main.jl
files = ["file1.jl","file2.jl"]; # files to loop over
N = length(files); # number of files
v = zeros(N); # vector to store manipulated values of M

for n = 1:N
    f = files[n]
    # do something to get M from file f
    v[n] = M^2
end

What an ugly hack of an idea. If you can’t restructure the files to be more reusable, then write code that extracts the part you need and writes it to another file as a reusable function.

Any suggestions for how to do this? I know it’s an ugly idea, and I’ve already come up with an ugly solution. I was hoping to find a better idea with a better solution.

Hey @alexdmeyer .
Here is one possible, minimally invasive change.
Create a file config.jl. In that, create a dictionary which defines each tuple of a, b, M etc.
This might look like: config_dictionary = Dict(1 => (2, 3, 4), 2 => (5, 6, 7))

Then, each of file1.jl, file2.jl can access those like this:

include("config.jl")
a, b, M = config_dictionary[1] # change the key here for each file.
# proceed to do whatever you do, even if it's a conflicting definition

Then, back in main.jl, you can access that same dict and perform your sum of squares on all the Ms. This should have no effect on how you run file1, file2, etc.
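
A minimal sketch of what the main.jl side could look like, assuming each dictionary value is an (a, b, M) tuple as in the example above. The function name squared_Ms is made up for illustration, and the numbers are the placeholder values from the example, not real inputs.

```julia
# Stand-in for the contents of config.jl: one (a, b, M) tuple per input file.
config_dictionary = Dict(1 => (2, 3, 4), 2 => (5, 6, 7))

# Hypothetical helper for main.jl: visit the keys in sorted order and
# store M^2 for each entry.
function squared_Ms(config)
    v = zeros(length(config))
    for (n, k) in enumerate(sort(collect(keys(config))))
        a, b, M = config[k]
        v[n] = M^2
    end
    return v
end

v = squared_Ms(config_dictionary)
```

Doing the loop inside a function keeps a, b, and M local, so they cannot collide with globals of the same name.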

Hope this helps

It depends a lot on what actually happens in those files before the definition of M. Maybe for each file, read lines up to the M assignment. In a new file write

function MfileX()

(where you replace fileX with some name from each file), then add the lines through the M assignment, and finally an end line. Repeat for each file. At the end, write a function, say allMs(), that returns either all the M functions or the results of calling all the M functions. Your client code can include the new file and call allMs().

The files might have code that shouldn’t be part of the M functions; if you can identify it, either omit it or include it outside of the M functions.

At least this makes your M definitions reusable.
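
The steps above could be sketched roughly as follows. This assumes the M assignment is the first line that starts with "M =", which holds for the example files but may not hold for real ones. The names lines_through_M and write_M_functions, and the M_ prefix on the generated functions, are made up for illustration.

```julia
# Hypothetical helper: copy every line of `path` up to and including the first
# line that starts with "M =" (an assumption about how the input files look).
function lines_through_M(path)
    kept = String[]
    for line in eachline(path)
        push!(kept, line)
        startswith(strip(line), "M =") && return kept
    end
    error("no line starting with `M =` found in $path")
end

# Write one zero-argument function per input file, plus allMs(), to `outpath`.
function write_M_functions(files, outpath)
    names = String[]
    open(outpath, "w") do io
        for f in files
            # e.g. "file1.jl" becomes the function name "M_file1"
            name = "M_" * replace(splitext(basename(f))[1], r"\W" => "_")
            push!(names, name)
            println(io, "function ", name, "()")
            foreach(l -> println(io, "    ", l), lines_through_M(f))
            println(io, "    return M")
            println(io, "end\n")
        end
        # allMs() returns the results of calling every generated function.
        println(io, "allMs() = [", join(names .* "()", ", "), "]")
    end
    return outpath
end
```

Client code would then call something like include(write_M_functions(files, "Mdefs.jl")) once at top level and loop over allMs(). Because each file's lines end up inside their own function, variables such as a and b stay local to that function.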

This seems like it might be an XY problem. What is your underlying goal?

Hi everyone,

Thanks for the great suggestions. @Jeff_Emanuel, your idea sounds like it should work, although it’s not what I’d have thought of myself. I’ll give that a try tomorrow.

@stevengj you’re probably right about that. Here’s some context.

I’m performing the same analysis of ~20 outbreaks of an infectious disease. Since the analysis is always the same procedure with different inputs for each outbreak, I have one file (like main.jl in my MWE) that performs the analysis and ~20 files (like file1.jl in my MWE) that define those outbreak-specific inputs. The main file takes the name of the outbreak as an argument and uses it to include the correct input file.

This was fine when I was analyzing each outbreak separately. For the meta-analysis of these results, however, I sometimes need parts of all these input files – in this case, the part that defines a variable M, which is always the first ~10 lines of the input file. I don’t necessarily need all M’s stored at once; I just need to manipulate each one alone and store the output (that is, no operations require M’s from multiple files at once), which is why my first guess used a for-loop.

Hopefully that made sense. I’ve brute-forced a temporary solution to this problem (N = 20 is not that large) and I think Jeff_Emanuel’s idea will work, but I welcome any further suggestions.

An obvious solution that I overlooked is to save M from each input file… I didn’t think of it before because M has a user-defined type with many parts, but writing functions to save it and load it in a loop will be easy compared with trying to run the code to build it in a loop.
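
A minimal sketch of that save-and-load loop using the Serialization standard library. The OutbreakInput struct is a hypothetical stand-in for the real type of M, and save_M and map_saved_Ms are made-up names.

```julia
using Serialization  # standard library

# Hypothetical stand-in for the user-defined type of M.
struct OutbreakInput
    name::String
    rates::Vector{Float64}
end

# Save one M per outbreak into `dir`, e.g. at the end of each per-outbreak run.
save_M(dir, M::OutbreakInput) = serialize(joinpath(dir, M.name * ".jls"), M)

# Load each saved M in turn and apply `f` to it.
function map_saved_Ms(f, dir, names)
    return [f(deserialize(joinpath(dir, name * ".jls"))) for name in names]
end
```

One caveat: the Serialization format is not guaranteed to be readable across different Julia versions, so this suits intermediate results better than long-term storage.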

Thanks everyone for your help!

Ignore the naysayers - go wild !!

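Use readline to read the lines you want and eval them:

M = nothing  # global on purpose: eval always assigns in global scope
open("file2.jl") do io
    # Evaluate one line at a time until M is set or the file runs out.
    # Assumes every statement up to M fits on a single line.
    while M === nothing && !eof(io)
        eval(Meta.parse(readline(io)))
    end
end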
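
A variation on the same read-and-eval idea, sketched under the assumption that every statement up to M fits on one line: evaluate the lines inside a throwaway module so that a, b, and M from one file never collide with another file's. M_from_file is a made-up name, and this is not the code posted above.

```julia
# Hypothetical helper: evaluate `path` line by line inside a throwaway module
# and return M as soon as it is defined. Assumes one statement per line.
function M_from_file(path)
    sandbox = Module()  # fresh namespace, so a, b, M never touch Main
    for line in eachline(path)
        Core.eval(sandbox, Meta.parse(line))
        # invokelatest keeps the check valid for bindings created by eval.
        if Base.invokelatest(isdefined, sandbox, :M)
            return Core.eval(sandbox, :M)
        end
    end
    error("M was never defined in $path")
end
```

The loop in main.jl would then reduce to v[n] = M_from_file(files[n])^2.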

Maybe use a data format, such as JSON or TOML or YAML, to define the parameters, rather than code?

(Forcing yourself to include parameters into the global namespace is asking for trouble in the long run. e.g. suppose you want to run many simulations in parallel.)
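
For instance, a minimal sketch with the TOML standard library. The file contents and the compute_M formula are assumptions that mirror file2.jl from the original question.

```julia
using TOML  # standard library since Julia 1.6

# Hypothetical contents of one input file, e.g. outbreak2.toml.
example = """
a = 1
b = 2
c = 3
"""

# Build M from a parsed parameter table; this formula mirrors file2.jl.
compute_M(p) = p["a"] + p["b"] * p["c"]

params = TOML.parse(example)   # TOML.parsefile(path) reads from disk instead
M = compute_M(params)
```

Since the parameters arrive as an ordinary Dict, nothing is forced into the global namespace, and each file can be processed independently.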

Of course, sometimes you want code to set up the simulation parameters, e.g. if you are running a parameter sweep. But in this case, I would usually recommend organizing things the other way around — have your parameter-generating code call the computational code (as a subroutine in a package) rather than the opposite. As a general principle, writing nontrivial computational code as global “scripts” makes it inflexible and non-reusable in the long run. See also Organizing code in Julia.
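
A small sketch of that inverted structure. All names here (analyze, outbreak1_M, outbreak2_M) are hypothetical, and M^2 stands in for the real analysis.

```julia
# Hypothetical computational routine; in practice it would live in a package.
analyze(M) = M^2

# Each outbreak's inputs are built inside a function, so nothing leaks into
# the global namespace. The values mirror file1.jl and file2.jl above.
function outbreak1_M()
    a = 1; b = 2
    return a + b
end

function outbreak2_M()
    a = 1; b = 2; c = 3
    return a + b * c
end

# The driver calls the computational code once per outbreak.
v = [analyze(build_M()) for build_M in (outbreak1_M, outbreak2_M)]
```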

There is a nice passage on a related subject in a great (language-agnostic) book I would recommend, The Pragmatic Programmer.
