I need to look into your suggestion in more detail. Meanwhile, here is an MWE of the solution using types and multiple dispatch.
using DataFrames, Random
using InteractiveUtils: subtypes  # `subtypes` is only loaded automatically in the REPL

abstract type AbstractProblemSet end
abstract type Set3 <: AbstractProblemSet end

mutable struct Problem3_1 <: Set3
    swimming_pool_size_liters::Int
    pipe_inflow_liters_sec::Int
    pipe_outflow_liters_sec::Int
    time_to_fill_min::Float64
end

function problem(problem_type::Type{Problem3_1})
    swimming_pool_size_liters = rand(1000:10:2000)
    pipe_inflow_liters_sec = rand(20:30)
    pipe_outflow_liters_sec = rand(5:10)
    p = problem_type(swimming_pool_size_liters, pipe_inflow_liters_sec,
                     pipe_outflow_liters_sec, -1)
    problem_solution!(p)
    return p
end

function problem_solution!(p::Problem3_1)
    p.time_to_fill_min = p.swimming_pool_size_liters/(p.pipe_inflow_liters_sec-p.pipe_outflow_liters_sec)/60
    return p
end

function make_set(
    roster::AbstractDataFrame, S::Type{<:AbstractProblemSet}, rng_seed::Integer
)
    N_students = nrow(roster)
    problem_types = subtypes(S)
    # fieldnames returns a tuple; collect it so that vcat concatenates the names
    # instead of building a vector of tuples when there are several problem types
    col_names = [collect(fieldnames(pr)) for pr in problem_types]
    param_lengths = [length(nms) for nms in col_names]
    col_names = reduce(vcat, col_names)
    M = length(col_names)
    problems_data = Matrix{Any}(missing, N_students, M)
    for k in 1:N_students
        for n in eachindex(problem_types)
            # columns occupied by the n-th problem type
            idx_1 = sum(param_lengths[1:n - 1]) + 1
            idx_2 = idx_1 + param_lengths[n] - 1
            Random.seed!(rng_seed + k + n)
            pr = problem(problem_types[n])
            problems_data[k, idx_1:idx_2] .= [getproperty(pr, f) for f
                                              in fieldnames(problem_types[n])]
        end
    end
    for m in 1:M
        # narrow each column from Any to its promoted element type
        column_type = promote_type(typeof.(filter(!ismissing, problems_data[:, m]))...)
        roster[!, col_names[m]] = Vector{Union{column_type,Missing}}(problems_data[:, m])
    end
    return roster
end
There's a great deal going on in DynamicPPL's compiler.jl implementation of the @model macro. Is there a simplified working example of such a macro?
First, this sort of abstract type is always a set (Julia's type system is a lattice), so the Set in the name isn't necessary. With the current names, Problem3_1 <: AbstractProblemSet holds, so every Problem3_1 instance isa AbstractProblemSet, and a "set of one element" doesn't equal "one element", or Peano numbering wouldn't work.
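To make the difference between subtyping and instance membership concrete, here is a minimal sketch. The `Demo`-prefixed names are made up for illustration and are not part of the code in this thread.

```julia
# Hypothetical names, used only to illustrate `<:` versus `isa`.
abstract type DemoAbstractProblemSet end
abstract type DemoSet3 <: DemoAbstractProblemSet end

struct DemoProblem3_1 <: DemoSet3
    pool_liters::Int
end

# `<:` relates a type to a type.
@show DemoProblem3_1 <: DemoAbstractProblemSet     # true

# `isa` relates a value to a type, so it holds for an instance...
p = DemoProblem3_1(1500)
@show p isa DemoAbstractProblemSet                  # true

# ...but not for the type object itself, which is a `DataType`.
@show DemoProblem3_1 isa DemoAbstractProblemSet     # false
```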
So you might try organizing the abstract types like so:
abstract type AbstractProblem end
abstract type ProblemSet3 <: AbstractProblem end
This does mean that ProblemSet3 <: AbstractProblem, which is debatable. You could use AbstractProblem3 if you wanted, but a name with Set3 in it is fine, since it helps you organize the problem domain.
Second, your concrete structs are essentially functors, so I would structure that code this way:
function problem(problem_type::Type{Problem3_1})
    swimming_pool_size_liters = rand(1000:10:2000)
    pipe_inflow_liters_sec = rand(20:30)
    pipe_outflow_liters_sec = rand(5:10)
    p = problem_type(swimming_pool_size_liters, pipe_inflow_liters_sec,
                     pipe_outflow_liters_sec, -1)
    return p()
end

function (p::Problem3_1)()
    p.time_to_fill_min = p.swimming_pool_size_liters/(p.pipe_inflow_liters_sec-p.pipe_outflow_liters_sec)/60
    return p
end
I find this expresses the intention more clearly. You might feel differently; neither version is clearly better than the other.
The call p() modifies the very object being called, which seems somewhat weird. In Julia, functions that modify their arguments are conventionally marked with an exclamation mark, which can't be done in this case.
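For comparison, a small sketch of the two styles side by side. `Counter` and `bump!` are hypothetical names chosen for the example, not anything from the code above.

```julia
# Hypothetical example type; not part of the problem-set code.
mutable struct Counter
    n::Int
end

# Conventional style: the `!` tells the reader that the argument is mutated.
function bump!(c::Counter)
    c.n += 1
    return c
end

# Functor style: calling the object mutates it, and nothing at the
# call site signals that.
function (c::Counter)()
    c.n += 1
    return c
end

c = Counter(0)
bump!(c)   # the mutation is visible in the name
c()        # the mutation is hidden behind the call syntax
@show c.n  # both calls incremented the same object
```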
That's a reasonable comment, although a self-modifying callable struct isn't unheard of. I was considering a whole tangent about whether you want a mutable struct in the first place. You could do something like this:
struct Problem3_1 <: Set3
    swimming_pool_size_liters::Int
    pipe_inflow_liters_sec::Int
    pipe_outflow_liters_sec::Int
    time_to_fill_min::Float64
end

# Outer constructor: compute the solution (in minutes) from the three inputs
Problem3_1(a, b, c) = Problem3_1(a, b, c, a/(b-c)/60)

function problem(problem_type::Type{Problem3_1})
    swimming_pool_size_liters = rand(1000:10:2000)
    pipe_inflow_liters_sec = rand(20:30)
    pipe_outflow_liters_sec = rand(5:10)
    return problem_type(swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec)
end
This has some advantages. I wrote the problem solution in shorthand, which is of course not necessary.
Yes, that has the advantage of not needing to supply the placeholder values when constructing the type. On the other hand, that brings back some of the copying and pasting of variable names, because the function to solve the problem ought to have the meaningful variable names for the purpose of documentation.
The variable names were left out just in the interest of brevity; I can see why you'd want them included in the actual code.
Although it's worth noting that because the structs have the field names, the code to retrieve them doesn't require that the constructor use them. But yes, for clarity you might want the constructor to look like this:
function Problem3_1(
        swimming_pool_size_liters,
        pipe_inflow_liters_sec,
        pipe_outflow_liters_sec)
    # use the arguments directly; there is no instance `p` yet at this point
    time_to_fill_min = swimming_pool_size_liters/(pipe_inflow_liters_sec-pipe_outflow_liters_sec)/60
    return Problem3_1(swimming_pool_size_liters,
                      pipe_inflow_liters_sec,
                      pipe_outflow_liters_sec,
                      time_to_fill_min)
end
I wouldn't do it that way myself, but for pedagogy I understand the case for it.
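On the point that the field names are available from the struct itself, however the constructor spells its arguments, here is a minimal sketch. The `Pool` type and the `field_syms`/`field_vals` variables are hypothetical and only serve the illustration.

```julia
# Hypothetical immutable struct with a convenience constructor that
# uses short argument names.
struct Pool
    size_liters::Int
    inflow_liters_sec::Int
    outflow_liters_sec::Int
    time_to_fill_min::Float64
end

Pool(a, b, c) = Pool(a, b, c, a / (b - c) / 60)

p = Pool(1200, 25, 5)

# Field names come from the struct definition, not from the constructor.
field_syms = fieldnames(Pool)
field_vals = [getfield(p, f) for f in field_syms]

@show field_syms
@show field_vals
```

So code that builds column names from `fieldnames` keeps working even when the constructor arguments are called `a`, `b`, `c`.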
Sorry, I only took the syntax of @model as inspiration, e.g., in using ~ for random sampling. The expansion of your macro could be much simpler.
When I find some time, I will try to put together an example.
using Random
abstract type AbstractProblem end
# Generic functions
function column_names end
function problem_data end
function problem_solution end
# Note: API as in solution by abraemer
# Define sample problem by hand
struct Problem4_1 <: AbstractProblem end
column_names(::Problem4_1) = [
    :pr4_1_swimming_pool_size_liters,
    :pr4_1_pipe_inflow_liters_sec,
    :pr4_1_pipe_outflow_liters_sec,
    :pr4_1_time_to_fill_min]

function problem_data(::Problem4_1, rng::AbstractRNG)
    swimming_pool_size_liters = rand(rng, 1000:10:2000)
    pipe_inflow_liters_sec = rand(rng, 20:30)
    pipe_outflow_liters_sec = rand(rng, 5:10)

    time_to_fill_min = problem_solution(Problem4_1(), swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec)

    return (swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec,
            time_to_fill_min)
end

function problem_solution(::Problem4_1, swimming_pool_size_liters, pipe_inflow_liters_sec, pipe_outflow_liters_sec)
    return swimming_pool_size_liters/(pipe_inflow_liters_sec-pipe_outflow_liters_sec)/60
end

# Macro to define such problems
skiplinenums(exprs) = filter(e -> !(e isa LineNumberNode), exprs)

function parse_body(body)
    defs = []
    sol = nothing
    for expr in skiplinenums(body.args)
        if expr.head == :call && expr.args[1] == :(~)
            push!(defs, expr.args[2] => expr.args[3])
        elseif expr.head == :macrocall && expr.args[1] == Symbol("@solution") && isnothing(sol)
            sol = expr.args[end]
            @assert sol.head == :(=)
        else
            error("TODO: Better error message/handling!")
        end
    end
    defs, sol
end

macro problem(name, body)
    @assert body.head == :block "Syntax error: Expecting block of definitions!"
    defs, sol = parse_body(body)
    colnames = [var for (var, val) in defs]
    # Column names include the solution variable, so that they match the
    # tuple returned by problem_data (as in the hand-written Problem4_1)
    allnames = [colnames; sol.args[1]]
    quote
        begin
            struct $(esc(name)) <: AbstractProblem end
            function $(esc(:column_names))(::$(esc(name)))
                [$([:(Symbol($(string(c)))) for c in allnames]...)]
            end
            function $(esc(:problem_solution))(::$(esc(name)), $(esc.(first.(defs))...))
                $(esc(sol.args[2]))
            end
            function $(esc(:problem_data))(problem::$(esc(name)), rng::AbstractRNG)
                $([:($(esc(var)) = rand(rng, $(esc(val)))) for (var, val) in defs]...)
                $(esc(sol.args[1])) = $(esc(:problem_solution))(problem, $(esc.(colnames)...))
                ($(esc.(colnames)...), $(esc(sol.args[1])))
            end
        end
    end
end

macro problemset(name, body)
    @assert body.head == :block "Syntax error: Expecting block of definitions!"
    prob_names = []
    for prob in skiplinenums(body.args)
        @assert (prob.head == :macrocall && prob.args[1] == Symbol("@problem")) "Only problems allowed in problemset!"
        push!(prob_names, skiplinenums(prob.args)[2])
    end
    quote
        $(esc(body))
        $(esc(name)) = [$([:($(esc(prob))()) for prob in prob_names]...)]
    end
end

# Check with @macroexpand that this basically generates the same code as above for Problem4_1
@problem Problem4_2 begin
    swimming_pool_size_liters ~ 1000:10:2000
    pipe_inflow_liters_sec ~ 20:30
    pipe_outflow_liters_sec ~ 5:10
    @solution time_to_fill_min = swimming_pool_size_liters/(pipe_inflow_liters_sec-pipe_outflow_liters_sec)/60
end

Random.seed!(123)
@show column_names(Problem4_2())
@show problem_data(Problem4_2(), Random.default_rng())

@problemset MyProblems begin
    @problem Problem4_3 begin
        swimming_pool_size_liters ~ 1000:10:2000
        pipe_inflow_liters_sec ~ 20:30
        pipe_outflow_liters_sec ~ 5:10
        @solution time_to_fill_min = swimming_pool_size_liters/(pipe_inflow_liters_sec-pipe_outflow_liters_sec)/60
    end
    @problem Simple4 begin
        x ~ 1:3
        y ~ 2:5
        @solution xy = x + y
    end
end

Random.seed!(123)
@show column_names.(MyProblems)
@show problem_data.(MyProblems, Ref(Random.default_rng()))
Note that the syntax is quite strict and error handling is somewhat rough.
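To see why the syntax is strict, it may help to look at the expressions a macro actually receives for the two kinds of lines accepted in a problem body. This is just an inspection sketch using quoted expressions; the variable names `sample` and `sol_expr` are mine.

```julia
# A sampling line `x ~ 1:3` is an ordinary call to the operator `~`.
sample = :(x ~ 1:3)
@show sample.head        # :call
@show sample.args[1]     # the operator symbol ~
@show sample.args[2]     # the variable name x
@show sample.args[3]     # the range expression 1:3

# A nested macro call arrives unexpanded, with a LineNumberNode
# between the macro name and its arguments.
sol_expr = :(@solution xy = x + y)
@show sol_expr.head                        # :macrocall
@show sol_expr.args[1]                     # Symbol("@solution")
@show sol_expr.args[2] isa LineNumberNode  # true
@show sol_expr.args[end].head              # the assignment head, :(=)
```

Anything that does not match one of these two shapes falls through to the `error` branch, which is where better messages could go.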
I have decided that a macro-based solution saves the greatest amount of drudge work, and so created an implementation that combines the ideas from your prototype and DynamicPPL's compiler. If you find the time to contribute criticisms or suggestions, they will be most appreciated.