Do I need a macro to compile a list of functions?

I have a bunch of related functions that I expand from time to time. They are grouped into files. I would like to put all functions in a file into an array, so that I can call them in a loop. Can this be accomplished without writing a macro? If not, how should this macro get the vector of functions?

Just put the functions into a vector? Functions are first-class in Julia, so you can do myfuns = [+, -, *, /] just fine (but it won’t be type-stable, so if it needs to be performant, maybe use a tuple instead).
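To make the vector-versus-tuple point concrete, here is a minimal sketch. The arguments 6 and 3 and the variable names are made up for illustration.

```julia
# A Vector of functions: convenient, but its element type is the abstract `Function`
myfuns_vec = [+, -, *, /]
# A Tuple keeps each function's concrete type
myfuns_tup = (+, -, *, /)

# Call every function on the same (made-up) arguments
vec_results = [f(6, 3) for f in myfuns_vec]
tup_results = map(f -> f(6, 3), myfuns_tup)
```

Both forms can be looped over in the same way; the difference only matters once performance does.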

So myfun = [include("function_defs.jl")]? That would require separating function definitions with commas within the file, wouldn’t it?

Files are not a good abstraction for Julia code.
You could define this list of functions in the file you include. So e.g.

Myfuns.jl contains

fun1(x) = ...
fun2(y) = ....

all_functions = [fun1, fun2]

And then in your main script just include it, and afterwards the list of functions will be available in all_functions. Different files would still collect their functions in all_functions. You could also define it beforehand and then use push! inside your files to populate it. If you have multiple collections of these functions in different files and you always include all of them anyway, then you should think about using a better method of organization.

It’s unclear what result you want here, let me ask some questions:

  • Is the list of functions static for a given run, or do you need to change it dynamically?
  • Do you mean “call them in a loop” as in “call each function, one after another” or as in “loop over data, and call all functions on the data as the body of that loop”?

You won’t need a macro either way, but it could still be the best choice for your application.

The set of functions doesn’t change during execution. Each function needs to be called once per data value. At present, when I define a new function, I manually add it to the vector. The vector is then passed to the routine that calls each function for every data value and stores the results in a matrix, one row per data value. The number of columns of the matrix is the sum of the number of output values of each function.

Additionally, every function has an input argument that tells it not to perform the computation but to return a list of symbols, one per output. That is so I can store the results in a CSV file with named columns.

I’d throw the functions into a module to isolate the names from the destination namespace, and import the list (I assume this is a Vector{Function}).

A higher order function-barrier would help, if it were possible to apply each function to all data values rather than all functions to each data value.
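A function barrier of that kind could look roughly like the sketch below. The helper name apply_to_all and the sample functions and data are assumptions for illustration, not part of the original code.

```julia
# Inner kernel: the `where {F}` annotation asks Julia to compile a
# specialized version for each concrete function type F.
function apply_to_all(f::F, data) where {F}
    return [f(x) for x in data]
end

funs = Function[sqrt, abs2]      # hypothetical stand-ins for the real functions
data = [1.0, 4.0, 9.0]

# One dynamic dispatch per function, then a fast specialized inner loop
columns = [apply_to_all(f, data) for f in funs]
```

The dynamic lookup then happens once per function rather than once per function per data value.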

This will almost certainly benefit from a macro, then. That gives the compiler the best chance to perform optimizations such as specializing the function to the inputs and inlining. Iterating over a vector at runtime means the compiler can’t make many assumptions about what will be called; it has to determine at runtime (for each call) which method to use based on the type of the data. This will also result in unnecessary boxing of primitive types, which then allocate and have to be garbage collected.

Consider using a compile-time value here, dispatching to a separate method of the function. Example: fn_a(::Val{true}, args...) does the computation, and fn_a(::Val{false}, args...) returns the symbols.
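A minimal sketch of that Val-based dispatch, assuming a toy fn_a that doubles its input and reports a single made-up column name:

```julia
# Val{true}: perform the computation
fn_a(::Val{true}, x) = 2x
# Val{false}: return the column symbols instead (name is hypothetical)
fn_a(::Val{false}, x) = [:doubled]

value = fn_a(Val(true), 3)
names = fn_a(Val(false), 3)
```

Val(true) constructs an instance of Val{true}, so the choice of method is made by dispatch rather than by a runtime branch inside the function.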

If determining the symbols involves no computation, then manually maintaining the symbol list for each function as a Vector, and passing that to the macro, is going to be a better choice.

Thank you for the thoughtful answer. What I didn’t make clear in my question is that performance is not much of a concern for me, but rather, the amount of work involved in adding a new function.

It is unclear to me how exactly these functions should be aggregated. Would this work for you? Note it will not be performant but we can worry about that later.

Main file:

all_functions = []

include("file1.jl")
include("file2.jl")
if some_condition
    include("file3.jl")
end

# now all included functions are in all_functions and can be called in a loop

The individual function files look like

foo(x) = 1
push!(all_functions, foo)

bar(y) = 2
push!(all_functions, bar)
# more functions ...

Then adding a new function is very simple. You just put it in a file and write a corresponding push! to all_functions beneath it.
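Put together in a single file, the pattern might look like this sketch. The register! helper is a hypothetical convenience wrapper around push!, and foo and bar are the toy functions from above.

```julia
all_functions = Function[]

# Hypothetical helper: add `f` to the list and hand it back
register!(f) = (push!(all_functions, f); f)

foo(x) = 1
register!(foo)

bar(y) = 2
register!(bar)

# Call every registered function in a loop
results = [f(10) for f in all_functions]
```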

Macros are certainly harder to write, but one only has to do it once. This might be better as a @generated function, which is just another way of doing metaprogramming.

If you posted a minimal example of the function Vector, and the function which calls those functions, people could help you get started on a macro or gen-function for it.

In all three cases, adding and subtracting functions from the Vector will automatically change the result of the calling function. So all the extra work is up-front, so to speak: actually using it will be just as easy in all cases, and even if performance is less of a concern, it’s still a nice-to-have, yes?

The purpose of this set of functions is to generate individual problem sets for coursework. The text of a problem statement is fixed, but there’re some parameters that are generated randomly. One function generates data and solution(s) for one problem. For example, parameters for a first problem from the set number 4 (very simple made-up example) would look like this

problems = [problem4_1, problem4_2, ... etc ]

function problem4_1(rng_seed::Integer=0; column_names=false)
    if column_names
        return [:pr4_1_swimming_pool_size_liters, :pr4_1_pipe_inflow_liters_sec,
                :pr4_1_pipe_outflow_liters_sec, :pr4_1_time_to_fill_min]
    end
    Random.seed!(rng_seed)
    #
    swimming_pool_size_liters = rand(1000:10:2000)
    pipe_inflow_liters_sec = rand(20:30)
    pipe_outflow_liters_sec = rand(5:10)
    #
    time_to_fill_min =  problem4_1_solution(swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec)
    #
    return (swimming_pool_size_liters,pipe_inflow_liters_sec,pipe_outflow_liters_sec,
            time_to_fill_min)
end
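function problem4_1_solution(swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec)
    # liters / (net liters per second) gives seconds; divide by 60 for minutes
    return swimming_pool_size_liters/(pipe_inflow_liters_sec-pipe_outflow_liters_sec)/60
end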

And then there’s a function that generates a dataframe that is saved into a CSV file

function make_all_problems(
    roster::AbstractDataFrame,
    problems::AbstractVector{<:Function},
    rng_seed::Integer)

    N_students = nrow(roster)
    col_names = [pr(column_names=true) for pr in problems]
    param_lengths = [length(nms) for nms in col_names]
    col_names = mapreduce(identity, vcat, col_names)
    M = length(col_names)
    problems_data = Matrix{Any}(missing, N_students, M)
    N_all_problems = length(problems)
     for k in 1:N_students
         for n in 1:length(problems)
            idx_1 = sum(param_lengths[1:n - 1]) + 1
            idx_2 = idx_1 + param_lengths[n] - 1
            problems_data[k, idx_1:idx_2] .= problems[n](rng_seed + k + n)
        end
    end
    for m in 1:M
        column_type = promote_type(typeof.(filter(!ismissing, problems_data[:, m]))...)
        roster[!, col_names[m]] = Vector{Union{column_type,Missing}}(problems_data[:,m])
    end

    return roster
end

Then this CSV file is used by the LaTeX package “csvmerge” to generate the text of problem sets.

And you want to generate different problem sets simply by changing the included files? Then I think my approach is sufficient, isn’t it?

One could add a bit more bookkeeping/make it more Julian by using some multiple dispatch but it is the same approach conceptually. Here is a sketch:
Main file

abstract type AbstractProblem end # not needed but nicer for bookkeeping :)

# Generic functions
function column_names end
function problem_data end
function problem_solution end

# a default method - instantiates an RNG object if given a seed (requires `using Random`)
problem_data(prob, rng_seed::Int=0) = problem_data(prob, Random.Xoshiro(rng_seed))

include("problems1.jl")
# .. and so on
function make_all_problems(
    roster::AbstractDataFrame,
    problems::AbstractVector{<:AbstractProblem},
    rng_seed::Integer)

    N_students = nrow(roster)
    col_names = column_names.(problems)
    param_lengths = length.(col_names)
    col_names = reduce(vcat, col_names)
    M = length(col_names)
    problems_data = Matrix{Any}(missing, N_students, M)
    N_all_problems = length(problems)
     for k in 1:N_students
         for n in 1:length(problems)
            idx_1 = sum(param_lengths[1:n - 1]) + 1
            idx_2 = idx_1 + param_lengths[n] - 1
            problems_data[k, idx_1:idx_2] .= problem_data(problems[n], rng_seed + k + n)
        end
    end
    for m in 1:M
        column_type = promote_type(typeof.(filter(!ismissing, problems_data[:, m]))...)
        roster[!, col_names[m]] = Vector{Union{column_type,Missing}}(problems_data[:,m])
    end

    return roster
end

Problem file (problems1.jl)

struct Problem4_1 <: AbstractProblem end

column_names(::Problem4_1) = [
    :pr4_1_swimming_pool_size_liters, 
    :pr4_1_pipe_inflow_liters_sec,
    :pr4_1_pipe_outflow_liters_sec, 
    :pr4_1_time_to_fill_min]

function problem_data(::Problem4_1, rng::AbstractRNG)
    swimming_pool_size_liters = rand(rng, 1000:10:2000)
    pipe_inflow_liters_sec = rand(rng, 20:30)
    pipe_outflow_liters_sec = rand(rng, 5:10)
    #
    time_to_fill_min =  problem_solution(Problem4_1(), swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec)
    #
    return (swimming_pool_size_liters,pipe_inflow_liters_sec,pipe_outflow_liters_sec,
            time_to_fill_min)
end

function problem_solution(::Problem4_1, swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec)
    return swimming_pool_size_liters/(pipe_inflow_liters_sec-pipe_outflow_liters_sec)/60
end

struct Problem4_2 <: AbstractProblem end
# ...
PROBLEM_SET4 = [Problem4_1(), Problem4_2()]

One could additionally use some modules to separate the namespaces a bit more (useful if problems are more complex and require some helper functions). One could also think about giving the Problem types some fields if e.g. you want to be able to generate variations of the same problem (maybe harder, larger or whatever).

A set might have 15 to 20 of those problems, each with different lists of inputs and outputs and a different solution algorithm. My question is whether I can decrease the amount of repetitive boilerplate involved in adding a new problem. Relying on dispatch is an interesting idea, and it does seem to make the job easier. For instance, the lists of inputs and outputs right now need to be manually duplicated in 4 places. By defining a type per function, those lists can be defined just once (by making the parameters fields of a type). The list of symbols can be replaced by a call to fieldnames(), and the argument to the solution function and its return value are now always the same. Maybe a macro can help to get rid of manual input like

PROBLEM_SET4 = [Problem4_1(), Problem4_2()]

What I was thinking is some type of macro-level pragma, like

#pragma_input_list 
swimming_pool_size_liters,pipe_inflow_liters_sec,pipe_outflow_liters_sec

and then inserting this automatically into the symbols list, solution function call, solution function definition and return args. But after looking into the metaprogramming section, I did not see any traces of a capability like this. Macro definition works on parsed source, and I assume the parser discards comments.
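The assumption about comments can be checked directly. The sketch below parses a line with a trailing comment and shows that only the code survives, while a plain tuple of names (here with made-up names a, b, c) does reach a macro as symbols.

```julia
# The comment is dropped by the parser; only the assignment remains
ex = Meta.parse("x = 1  # pragma_input_list a, b")

# A tuple of names, by contrast, is ordinary syntax a macro can inspect
names = :(a, b, c).args
```

So a pragma has to be written as real syntax, such as a macro argument, rather than as a comment.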

Instead of copying around multiple values could you bundle them up in a struct? You could change up what I suggested earlier and maybe do:

struct Problem4_1 <: AbstractProblem
    swimming_pool_size_liters
    pipe_inflow_liters_sec
    pipe_outflow_liters_sec
end

function Problem4_1(rng::AbstractRNG)
    swimming_pool_size_liters = rand(rng, 1000:10:2000)
    pipe_inflow_liters_sec = rand(rng, 20:30)
    pipe_outflow_liters_sec = rand(rng, 5:10)
    return Problem4_1(swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec)
    # or don't assign locally to reduce boilerplate
end

function problem_solution(p::Problem4_1)
    return p.swimming_pool_size_liters/(p.pipe_inflow_liters_sec-p.pipe_outflow_liters_sec)/60
end

struct Problem4_2 <: AbstractProblem
    # fields
end
# ...
PROBLEM_SET4 = [Problem4_1, Problem4_2]

You can get the column_names via fieldnames.

This should reduce the boilerplate as you just pass around the struct and not all of the values separately. Please clarify what else you consider boilerplate.

There might be a macro-based solution to simplify writing this. To write macros, the best starting point is to write down an example of what you would ideally like to write (it needs to be valid Julia syntax, not valid code) and then think about what it needs to expand to so that it does what it should.

You probably wrote your comment while I was editing mine, so there’s some overlap. The return list with this approach also becomes unchanging for every problem() and problem_solution() method, so the savings are definite and I shall move to this way of programming the sets.

Another possible simplification is to add a separate abstract type per set, and then all types within the set are recovered by calling subtypes():

abstract type AbstractSet end
abstract type ProblemSet4<: AbstractSet end 

mutable struct Problem4_1 <: ProblemSet4
    swimming_pool_size_liters
    pipe_inflow_liters_sec
    pipe_outflow_liters_sec
    time_to_fill_min
end
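A rough sketch of recovering the problems with subtypes() follows. It uses the name AbstractProblemSet, chosen here to avoid clashing with Base.AbstractSet, and two empty placeholder structs. Note that subtypes lives in InteractiveUtils, which is loaded automatically in the REPL but not in scripts.

```julia
using InteractiveUtils  # provides `subtypes` outside the REPL

abstract type AbstractProblemSet end
abstract type ProblemSet4 <: AbstractProblemSet end

# Placeholder problem types; real ones would carry fields
struct Problem4_1 <: ProblemSet4 end
struct Problem4_2 <: ProblemSet4 end

# Every type declared under ProblemSet4 is found without a manual list
problem_types = subtypes(ProblemSet4)
```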

As a matter of basic data hygiene, I would handle the column names differently, with something more like this:

const problems = [problem4_1, problem4_2, ... etc ]
const columns_for_problem = Dict{Function,Vector{Symbol}}()
columns_for_problem[problem4_1] = [:pr4_1_swimming_pool_size_liters, :pr4_1_pipe_inflow_liters_sec, :pr4_1_pipe_outflow_liters_sec, :pr4_1_time_to_fill_min]

If you create the Dict before any of the problem functions, the column names can be added under each problem, making it easy enough to keep both in sync.

Then remove the optional keyword from all the problems, and pass in the columns_for_problem Dict to the main function, and just do this:

col_names = vcat((columns_for_problem[pr] for pr in problems)...)

This gives one Vector with all column names for each problem, in order. param_lengths can be replaced with

    param_lengths = [length(columns_for_problem[pr]) for pr in problems]

Given what you’re doing with the function, I don’t think writing a macro to unroll calling them will help you in any way, because you use the iterations to place the data in the Matrix. This is clearly not an application where the loss in performance (if there would even be any here) is relevant.
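Here is a small end-to-end sketch of the Dict approach. The functions problem_a and problem_b and their column names are hypothetical stand-ins for the real problems.

```julia
columns_for_problem = Dict{Function,Vector{Symbol}}()

problem_a() = (1, 2)                                  # stand-in problem
columns_for_problem[problem_a] = [:a_in, :a_out]

problem_b() = (3,)                                    # stand-in problem
columns_for_problem[problem_b] = [:b_out]

problems = [problem_a, problem_b]

# Look columns up through `problems` so the order is well defined;
# iterating the Dict itself yields key => value pairs in arbitrary order.
col_names = vcat((columns_for_problem[pr] for pr in problems)...)
param_lengths = [length(columns_for_problem[pr]) for pr in problems]
```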

AbstractSet is already taken (it’s the supertype of Set) but an approach like this does let you automatically generate the column headers:

julia> abstract type AbstractProblemSet end

julia> abstract type ProblemSet4<: AbstractProblemSet end

julia> mutable struct Problem4_1 <: ProblemSet4
           swimming_pool_size_liters
           pipe_inflow_liters_sec
           pipe_outflow_liters_sec
           time_to_fill_min
       end

julia> function columns_of_problem(problem::AbstractProblemSet)
           header = Symbol[]
           for column in fieldnames(typeof(problem))
               push!(header, column)
           end
           return header
       end
columns_of_problem (generic function with 1 method)

julia> columns_of_problem(Problem4_1(1,2,3,4))
4-element Vector{Symbol}:
 :swimming_pool_size_liters
 :pipe_inflow_liters_sec
 :pipe_outflow_liters_sec
 :time_to_fill_min

You’ll want to type the fields of the struct, which won’t interfere with this approach, and you can also retrieve them with fieldtypes if you need to.

Edit: you could also do the same thing with the type, with a signature like function columns_of_problem(T::Type{<:AbstractProblemSet}), and drop the typeof in the body.
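A sketch of that type-based variant, with typed fields. The field types are assumptions for illustration.

```julia
abstract type AbstractProblemSet end
abstract type ProblemSet4 <: AbstractProblemSet end

struct Problem4_1 <: ProblemSet4
    swimming_pool_size_liters::Int
    pipe_inflow_liters_sec::Int
    pipe_outflow_liters_sec::Int
    time_to_fill_min::Float64
end

# Works on the type itself, so no instance is needed to get the header
columns_of_problem(T::Type{<:AbstractProblemSet}) = collect(fieldnames(T))

header = columns_of_problem(Problem4_1)
types = fieldtypes(Problem4_1)
```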

Also worth noting that this will let you define the problem itself by making the struct callable.

julia> (p::Problem4_1)() = p.swimming_pool_size_liters/(p.pipe_inflow_liters_sec-p.pipe_outflow_liters_sec)/60

julia> Problem4_1(1,2,3,4)()
-0.016666666666666666

The suggested types/functions are reasonable implementations, yet macros might be a good fit for your use case. While harder to write and maybe to understand than functions, they could nicely document the purpose of your definitions. Here is a possible syntax inspired by Test and Turing:

@problemset ProblemSet4 begin
    @problem Ex4_1 begin
        swimming_pool_size_liters ~ 1000:10:2000
        pipe_inflow_liters_sec ~ 20:30
        pipe_outflow_liters_sec ~ 5:10
        @solution time_to_fill_min = swimming_pool_size_liters/(pipe_inflow_liters_sec-pipe_outflow_liters_sec)/60
    end
    @problem Ex4_2 begin
        ...
    end
end

This would then define a vector of problems called ProblemSet4 as well as types and methods for all problems Ex4_1, Ex4_2 …
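The macros themselves are not written here, but it is easy to check that the proposed syntax is something a macro could work with. This sketch only inspects how one of the ~ lines parses; it does not implement @problem or @problemset.

```julia
# Quote one line of the proposed syntax and look at its structure
ex = :(swimming_pool_size_liters ~ 1000:10:2000)

op = ex.args[1]         # the operator, `~`
name = ex.args[2]       # the parameter name as a Symbol
range_ex = ex.args[3]   # the unevaluated range expression
```

A macro would receive such expressions and could collect the names into struct fields and the ranges into rand calls.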
