I have a bunch of related functions that I expand from time to time. They are grouped into files. I would like to put all the functions in a file into an array, so that I can call them in a loop. Can this be accomplished without writing a macro? If not, how should this macro get the vector of functions?
Just put the functions into a vector? Functions are first-class in Julia, so you can do myfuns = [+, -, *, /] just fine (but it won't be type-stable, so if it needs to be performant, maybe use a tuple instead).
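To make the vector-vs-tuple point concrete, here is a minimal sketch (square and cube are made-up names for illustration):

```julia
square(x) = x^2
cube(x) = x^3

myfuns = [square, cube]      # Vector{Function}: abstract eltype,
for f in myfuns              # so each call is resolved dynamically
    println(f(2))
end

myfuns_t = (square, cube)    # Tuple{typeof(square), typeof(cube)}:
map(f -> f(2), myfuns_t)     # concrete types, so calls can specialize
```

The tuple carries each function's concrete type in its own type, which is what lets the compiler specialize the loop body.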
So myfuns = [include("function_defs.jl")]? That would require separating function definitions with commas within the file, wouldn't it?
Files are not a good abstraction for Julia code.
You could define this list of functions in the file you include. So e.g.
Myfuns.jl contains
fun1(x) = ...
fun2(y) = ....
all_functions = [fun1, fun2]
And then in your main script just include it, and afterwards the list of functions will be available in all_functions. Different files would still collect their functions in all_functions. You could also define it beforehand and then use push! inside your files to populate it. If you have multiple collections of these functions in different files and you always include all of them anyway, then you should think about a better method of organization.
It's unclear what result you want here; let me ask some questions:
- Is the list of functions static for a given run, or do you need to change it dynamically?
- Do you mean "call them in a loop" as in "call each function, one after another" or as in "loop over data, and call all functions on the data as the body of that loop"?
You won't need a macro either way, but it could still be the best choice for your application.
The set of functions doesn't change during execution. Each function needs to be called once per data value. At present, when I define a new function, I manually add it to the vector. The vector is then passed to the routine that calls each function for every data value and stores the results in a matrix, one row per data value. The number of columns of the matrix is the sum of the numbers of output values of the functions.
Additionally, every function has an input argument that tells it not to perform the computation but to return the list of symbols, one per output. That is so I can store the results in a CSV file with named columns.
I'd throw the functions into a module to isolate the names from the destination namespace, and import the list (I assume this is a Vector{Function}).
A higher-order function barrier would help, if it were possible to apply each function to all data values rather than all functions to each data value.
This will almost certainly benefit from a macro, then. That gives the compiler the best chance to perform optimizations such as specializing the function to the inputs and inlining. Iterating over a vector at runtime means the compiler can't make much of any assumptions about what will be called; it will have to determine at runtime (for each call) what method should be used based on the type of the data. This will also result in unnecessary boxing of primitive types, which will then allocate and have to be garbage collected.
Consider using a compile-time value here, dispatching to a separate method of the function. Example: fn_a(::Val{true}, args...) does the computation, and fn_a(::Val{false}, args...) returns the symbols.
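As a runnable sketch of that dispatch pattern (fn_a is a placeholder name; note the type parameters must be the lowercase values true/false, and the singleton instances are constructed with Val(true)/Val(false)):

```julia
fn_a(::Val{true}, x) = 2x                     # the actual computation
fn_a(::Val{false}, args...) = [:fn_a_result]  # just the column symbols

fn_a(Val(true), 21)   # 42
fn_a(Val(false))      # [:fn_a_result]
```

Because Val(true) and Val(false) are distinct types, the choice between the two methods is made at compile time rather than by a runtime branch.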
If determining the symbols involves no computation, then manually maintaining the symbol list for each function as a Vector, and passing that to the macro, is going to be a better choice.
Thank you for the thoughtful answer. What I didn't make clear in my question is that performance is not much of a concern for me, but rather the amount of work involved in adding a new function.
It is unclear to me how exactly these functions should be aggregated. Would this work for you? Note it will not be performant but we can worry about that later.
Main file:
all_functions = []
include("file1.jl")
include("file2.jl")
if some_condition
include("file3.jl")
end
# now all included functions are in all_functions and can be called in a loop
The individual function files look like
foo(x) = 1
push!(all_functions, foo)
bar(y) = 2
push!(all_functions, bar)
# more functions ...
Then adding a new function is very simple. You just put it in a file and write a corresponding push! to all_functions beneath it.
Macros are certainly harder to write, but one only has to do it once. This might be better as a @generated function, which is just another way of doing metaprogramming.
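For reference, a minimal @generated sketch of what "unrolling" the vector of functions could look like (call_all is a made-up name; it assumes the functions are passed as a tuple so their types are visible at compile time):

```julia
# Unroll calls to a tuple of functions at compile time.
@generated function call_all(fs::Tuple, x)
    N = length(fs.parameters)            # number of functions, known from the type
    calls = [:(fs[$i](x)) for i in 1:N]  # one literal call per function
    return :(tuple($(calls...)))
end

call_all((sin, cos), 0.0)   # (0.0, 1.0)
```

The generator runs once per tuple type, so each combination of functions gets its own fully specialized method body.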
If you posted a minimal example of the function Vector, and the function which calls those functions, people could help you get started on a macro or generated function for it.
In all three cases, adding and subtracting functions from the Vector will automatically change the result of the calling function. So all the extra work is up-front, so to speak: actually using it will be just as easy in all cases, and even if performance is less of a concern, it's still a nice-to-have, yes?
The purpose of this set of functions is to generate individual problem sets for coursework. The text of a problem statement is fixed, but there are some parameters that are generated randomly. One function generates the data and solution(s) for one problem. For example, parameters for the first problem of set number 4 (a very simple made-up example) would look like this
problems = [problem4_1, problem4_2, ... etc ]
function problem4_1(rng_seed::Integer=0; column_names=false)
    if column_names
        return [:pr4_1_swimming_pool_size_liters, :pr4_1_pipe_inflow_liters_sec,
                :pr4_1_pipe_outflow_liters_sec, :pr4_1_time_to_fill_min]
    end
    Random.seed!(rng_seed)
    #
    swimming_pool_size_liters = rand(1000:10:2000)
    pipe_inflow_liters_sec = rand(20:30)
    pipe_outflow_liters_sec = rand(5:10)
    #
    time_to_fill_min = problem4_1_solution(swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec)
    #
    return (swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec,
            time_to_fill_min)
end
function problem4_1_solution(swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec)
    return swimming_pool_size_liters/(pipe_inflow_liters_sec-pipe_outflow_liters_sec)/60
end
And then there's a function that generates a dataframe that is saved into a CSV file
function make_all_problems(
        roster::AbstractDataFrame,
        problems::AbstractVector{<:Function},
        rng_seed::Integer)
    N_students = nrow(roster)
    col_names = [pr(column_names=true) for pr in problems]
    param_lengths = [length(nms) for nms in col_names]
    col_names = mapreduce(identity, vcat, col_names)
    M = length(col_names)
    problems_data = Matrix{Any}(missing, N_students, M)
    N_all_problems = length(problems)
    for k in 1:N_students
        for n in 1:length(problems)
            idx_1 = sum(param_lengths[1:n - 1]) + 1
            idx_2 = idx_1 + param_lengths[n] - 1
            problems_data[k, idx_1:idx_2] .= problems[n](rng_seed + k + n)
        end
    end
    for m in 1:M
        column_type = promote_type(typeof.(filter(!ismissing, problems_data[:, m]))...)
        roster[!, col_names[m]] = Vector{Union{column_type,Missing}}(problems_data[:, m])
    end
    return roster
end
Then this CSV file is used by the LaTeX package "csvmerge" to generate the text of problem sets.
And you want to generate different problem sets simply by changing the included files? Then I think my approach is sufficient, isn't it?
One could add a bit more bookkeeping/make it more Julian by using some multiple dispatch but it is the same approach conceptually. Here is a sketch:
Main file
abstract type AbstractProblem end # not needed but nicer for bookkeeping :)
# Generic functions
function column_names end
function problem_data end
function problem_solution end
# a default method - instantiates some RNG object if given a seed
problem_data(prob, rng_seed::Int=0) = problem_data(prob, Random.Xoshiro(rng_seed))
include("problems1.jl")
# .. and so on
function make_all_problems(
        roster::AbstractDataFrame,
        problems::AbstractVector{<:AbstractProblem},
        rng_seed::Integer)
    N_students = nrow(roster)
    col_names = column_names.(problems)
    param_lengths = length.(col_names)
    col_names = reduce(vcat, col_names)
    M = length(col_names)
    problems_data = Matrix{Any}(missing, N_students, M)
    N_all_problems = length(problems)
    for k in 1:N_students
        for n in 1:length(problems)
            idx_1 = sum(param_lengths[1:n - 1]) + 1
            idx_2 = idx_1 + param_lengths[n] - 1
            problems_data[k, idx_1:idx_2] .= problem_data(problems[n], rng_seed + k + n)
        end
    end
    for m in 1:M
        column_type = promote_type(typeof.(filter(!ismissing, problems_data[:, m]))...)
        roster[!, col_names[m]] = Vector{Union{column_type,Missing}}(problems_data[:, m])
    end
    return roster
end
struct Problem4_1 <: AbstractProblem end

column_names(::Problem4_1) = [
    :pr4_1_swimming_pool_size_liters,
    :pr4_1_pipe_inflow_liters_sec,
    :pr4_1_pipe_outflow_liters_sec,
    :pr4_1_time_to_fill_min]

function problem_data(::Problem4_1, rng::AbstractRNG)
    swimming_pool_size_liters = rand(rng, 1000:10:2000)
    pipe_inflow_liters_sec = rand(rng, 20:30)
    pipe_outflow_liters_sec = rand(rng, 5:10)
    #
    time_to_fill_min = problem_solution(Problem4_1(), swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec)
    #
    return (swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec,
            time_to_fill_min)
end

function problem_solution(::Problem4_1, swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec)
    return swimming_pool_size_liters/(pipe_inflow_liters_sec-pipe_outflow_liters_sec)/60
end
struct Problem4_2 <: AbstractProblem end
# ...
PROBLEM_SET4 = [Problem4_1(), Problem4_2()]
One could additionally use some modules to separate the namespaces a bit more (useful if problems are more complex and require some helper functions). One could also think about giving the Problem types some fields if e.g. you want to be able to generate variations of the same problem (maybe harder, larger, or whatever).
A set might have 15 to 20 of those problems, each with different lists of inputs and outputs and a different solution algorithm. My question is whether I can decrease the amount of repetitive boilerplate involved in adding a new problem. Relying on dispatch is an interesting idea; it does seem to make the job easier. For instance, the lists of inputs and outputs currently need to be manually duplicated in 4 places.
By defining a type per function, those lists can be defined just once (make the parameters fields of a type). The list of symbols can be replaced by a call to fieldnames(), and the argument to the solution function and its return are now always the same. Maybe a macro can help to get rid of manual input like
PROBLEM_SET4 = [Problem4_1(), Problem4_2()]
What I was thinking is some type of macro-level pragma, like
#pragma_input_list
swimming_pool_size_liters,pipe_inflow_liters_sec,pipe_outflow_liters_sec
and then inserting this automatically into the symbols list, the solution function call, the solution function definition, and the return args. But after looking into the metaprogramming section, I did not see any traces of a capability like this. Macros work on the parsed source, and I assume the parser discards comments.
Instead of copying around multiple values could you bundle them up in a struct? You could change up what I suggested earlier and maybe do:
struct Problem4_1 <: AbstractProblem
    swimming_pool_size_liters
    pipe_inflow_liters_sec
    pipe_outflow_liters_sec
end

function Problem4_1(rng::AbstractRNG)
    swimming_pool_size_liters = rand(rng, 1000:10:2000)
    pipe_inflow_liters_sec = rand(rng, 20:30)
    pipe_outflow_liters_sec = rand(rng, 5:10)
    return Problem4_1(swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec)
    # or don't assign locally to reduce boilerplate
end

function problem_solution(p::Problem4_1)
    return p.swimming_pool_size_liters/(p.pipe_inflow_liters_sec-p.pipe_outflow_liters_sec)/60
end
struct Problem4_2 <: AbstractProblem
    # fields
end
# ...
PROBLEM_SET4 = [Problem4_1, Problem4_2]
You can get the column_names via fieldnames.
This should reduce the boilerplate as you just pass around the struct and not all of the values separately. Please clarify what else you consider boilerplate.
There might be a macro-based solution to simplify writing this. To write macros, the best starting point is to write down an example of what you would ideally like to write (it needs to be valid Julia syntax, not necessarily valid code) and then think about what it needs to expand to so that it does what it should do.
You probably wrote your comment while I was editing mine, so there's some overlap. The return list with this approach also becomes unchanging for every problem() and problem_solution() method, so the savings are definite and I shall move to this way of programming the sets.
Another possible simplification is to add a separate abstract type per set; then all types within the set are recovered by calling subtypes():
abstract type AbstractSet end
abstract type ProblemSet4 <: AbstractSet end

mutable struct Problem4_1 <: ProblemSet4
    swimming_pool_size_liters
    pipe_inflow_liters_sec
    pipe_outflow_liters_sec
    time_to_fill_min
end
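With that hierarchy, the set itself can then be collected automatically rather than listed by hand. A sketch (note subtypes lives in the standard library module InteractiveUtils, which is loaded automatically only in the REPL):

```julia
using InteractiveUtils: subtypes

# Collects every problem type declared as a subtype of ProblemSet4 so far,
# e.g. [Problem4_1, Problem4_2, ...]
PROBLEM_SET4 = subtypes(ProblemSet4)
```

This gives the list of types; to get instances you would still need to construct them, since the structs above have fields.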
As a matter of basic data hygiene, I would handle this part differently:
Something more like this:
const problems = [problem4_1, problem4_2, ... etc ]
const columns_for_problem = Dict{Function,Vector{Symbol}}()
columns_for_problem[problem4_1] = [:pr4_1_swimming_pool_size_liters, :pr4_1_pipe_inflow_liters_sec, :pr4_1_pipe_outflow_liters_sec, :pr4_1_time_to_fill_min]
If you create the Dict before any of the problem functions, the column names can be added under each problem, making it easy enough to keep both in sync.
Then remove the optional keyword from all the problems, pass the columns_for_problem Dict to the main function, and just do this:
col_names = vcat((columns_for_problem[pr] for pr in problems)...)
to get one Vector with all column names for each problem, in order. param_lengths can be replaced with
param_lengths = [length(columns_for_problem[pr]) for pr in problems]
Given what you're doing with the function, I don't think writing a macro to unroll calling them will help you in any way, because you use the iterations to place the data in the Matrix. This is clearly not an application where the loss in performance (if there would even be any here) is relevant.
AbstractSet is already taken (it's the supertype of Set), but an approach like this does let you automatically generate the column headers:
julia> abstract type AbstractProblemSet end

julia> abstract type ProblemSet4 <: AbstractProblemSet end

julia> mutable struct Problem4_1 <: ProblemSet4
           swimming_pool_size_liters
           pipe_inflow_liters_sec
           pipe_outflow_liters_sec
           time_to_fill_min
       end

julia> function columns_of_problem(problem::AbstractProblemSet)
           header = Symbol[]
           for column in fieldnames(typeof(problem))
               push!(header, column)
           end
           return header
       end
columns_of_problem (generic function with 1 method)

julia> columns_of_problem(Problem4_1(1,2,3,4))
4-element Vector{Symbol}:
 :swimming_pool_size_liters
 :pipe_inflow_liters_sec
 :pipe_outflow_liters_sec
 :time_to_fill_min
You'll want to type the fields of the struct, which won't interfere with this approach, and you can also retrieve them with fieldtypes if you need to.
Edit: you could also do the same thing with the type, with a signature like function columns_of_problem(T::Type{<:AbstractProblemSet}), and drop the typeof in the body.
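That type-based variant can be written in one line, and since fieldnames is resolvable from the type alone, no instance is needed (a sketch of the signature suggested above):

```julia
# Same result as the instance version, but dispatched on the type itself.
columns_of_problem(T::Type{<:AbstractProblemSet}) = collect(fieldnames(T))

columns_of_problem(Problem4_1)   # no Problem4_1(1,2,3,4) needed
```

collect turns the tuple returned by fieldnames into the same Vector{Symbol} the loop version builds.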
Also worth noting that this will let you define the problem itself by making the struct callable.
julia> (p::Problem4_1)() = p.swimming_pool_size_liters/(p.pipe_inflow_liters_sec-p.pipe_outflow_liters_sec)/60
julia> Problem4_1(1,2,3,4)()
-0.016666666666666666
The suggested types/functions are reasonable implementations, yet macros might be a good fit for your use case. While harder to write and maybe understand than functions, they could nicely document the purpose of your definitions. Here is a possible syntax inspired by Test and Turing:
@problemset ProblemSet4 begin
    @problem Ex4_1 begin
        swimming_pool_size_liters ~ 1000:10:2000
        pipe_inflow_liters_sec ~ 20:30
        pipe_outflow_liters_sec ~ 5:10
        @solution time_to_fill_min = swimming_pool_size_liters/(pipe_inflow_liters_sec-pipe_outflow_liters_sec)/60
    end
    @problem Ex4_2 begin
        ...
    end
end
This would then define a vector of problems called ProblemSet4, as well as types and methods for all problems Ex4_1, Ex4_2, …