Simulated Method of Moments / Indirect inference: how might ModelingToolkit be extended to provide convenient user inteface?

I am looking to estimate a simulated method of moments model. The most similar thread on SMM is from 2019 but does not settle issues: The most similar thread is: Global optimization: Simulated Method of Moments . Julia seems like the perfect language for running SMM. It would be great to find/develop a package that could provide more readable and shorter code.

The ModelingToolkit package has some really nice syntax (defining parameters and variables, creating arrays of equations that can be operated on), but does not appear to be geared towards these sorts of problems (maybe it can be trivially extended but I don’t know how). Below I will develop a pseudo-code example of the sort of features that would be nice. If anyone knows of existing code that does this sort of thing or would like to provide recommendations on developing it then I would be thrilled.

Psuedo code:

First set up the model. For a simple example to illustrate ideas, I am setting up a latent variable ar1 model, where the latent variable causes an outcome. x follows an ar1 process with fixed effect terms. x is observed with error: what is observed is \tilde{x}. y is a function of the latent variable x and and observed variable w.

N = 500
T = 4
@parameters β μ σε σϵ σα σξ a b c
@observations i t
@variables y x ̃x w ε ϵ α ξ

eqns = [xₜ₊₁ ~ αᵢ + β*xᵢₜ + εᵢₜ ∀ t ∈ [2:T] i∈N
        ̃xₜ ~ xₜ + ϵₓₜ t ∈ [1:T]
        yᵢₜ ~ a + b*xᵢₜ + c*wᵢₜ + ξᵢₜ
        εᵢₜ ~ Normal(0, σε) ∀ i t
        ϵᵢₜ ~ Normal(0, σϵ) ∀ i t
        αᵢ ~ Normal(μ, σα) ∀ i
        ξᵢₜ ~ Normal(0, σξ) ∀ i t]


In the imagined syntax above, @parameters and @variables would work in a way similar ModelingToolkit. @observations would define the units of observation: in this case we have i and t, so the constructed data would be a panel (curently DataFrames does not support convenient ways to represent/manipulate panel data… that is something else that would be great to add but is beside the point for this thread).

The last four equations would define how random variables are distributed.

Note that y depends on w, but the process generating w is not defined. This is an incompletely specified model, but that is okay if w is given.

We can simulate this data from this process for given parameters values and values of w. A function method might look like:

simulate(empirical_data::DataFrame, eqns::Array, parms::NamedTuple, observations::Array{observations,1})::DataFrame = …

This method produces a dataframe with all simulated variables. We feed it a datafame which must have w as one of its variables because w is not specified. If we were to feed it a dataframe containing \varepsilon then we would treat \varepsilon as given. Also there must be columns in the dataframe corresponding to the units of observation t and i.

Simulation is crucial for SMM. SMM attempts to estimate parameters from the model by selecting parameter values that minimize a metric on moments/estimated coefficients calculated from both the actual data and the synthesized data. That is, we have good parameters in the the model if statistics calculated on the synthetic data look like the statistics calculated on the actual data.

So we would want to define an array of moments (or auxiliary model coefficients):

moments = [mean(yᵢ) 
           mean(xᵢ)
           var(xᵢₜ)
           cov(xᵢₜ₊₁, xᵢₜ) 
           ...]

and we would fit the model with a command like


fitted_model = optim(df::Data, moments, W, method=:Wald, optimizer=:MCMC)

An approach like this would make it very simple to code up structural models for SMM estimation, and would make the code very readable, reducing risks of mistakes.

Thoughts?

ModelingToolkit is all about transforming numerical systems to other numerical systems. SMM is on my radar for yesterday, so let’s talk about how to get there. Let’s work together.

If you were interested in the continuous time case, there is a similar expansions. For example,

Specifically, https://github.com/SciML/ModelingToolkit.jl/issues/405 mentions using this to do the linear noise approximation (LNA), which is just the case of taking only the mean and variance equations, but the plan has since evolved to do for all n.

But that’s the continuous case. What about the discrete case? Well that’s documented in the issue here:

https://github.com/SciML/ModelingToolkit.jl/issues/571

which is a really high priority right now. What I want to do is the following. First I want to create a DiscreteSystem for systems of the type:

u_{n+1} = f(u_n,p,t)

(like the ODE DiscreteProblem). That would be the canonical symbolic representation of discrete dynamical systems. On top of that it would then be good to add the discrete stochastic system DiscreteStochasticSystem:

u_{n+1} = f(u_n,p,t) + g(u_n,p,t)deta

where eta ~ N(0,1), which is the canonical nonlinear discrete stochastic dynamical system.

Once those system types are in place (with the appropriate transformations to the numerical schemes to simulate), then a transformation of DiscreteStochasticSystemDiscreteSystem of form moment_expansion(sys,5) which symbolically derives the discrete deterministic system of the first 5 moments, which you then generate code for and solve.

So code would look more like:

N = 500
T = 4
@parameters β μ σε σϵ σα σξ a b c
@observations i t
@variables y x ̃x w ε ϵ α ξ
D = Difference(1) # First Difference Operator

eqns = [D(x) ~ αᵢ + β*x
              ̃x ~ xₜ + ϵ
             y ~ a + b*x + c*w]

noiseeqs = [
        σε
        σϵ
        σα
        σξ
]

sys = DiscreteStochasticSystem(eqs,noiseeqs)

would then have a bunch of pre-defined transformations. The reason to split it is because that would be easier to work with and write transformations because that assumption of linearity imparts a lot of structure in the equations.

A separate form could be created for the more general system:

@parameters β μ σε σϵ σα σξ a b c
@observations i t
@variables y x ̃x w ε ϵ α ξ

eqns = [D(x) ~ α + β*x + ε
               ̃x ~ x + ϵ
               y ~ a + b*x + c*w + ξ
]

distributions = [εᵢₜ => Normal(0, σε)
                        αᵢ => Normal(μ, σα)
                        ξᵢₜ => Normal(0, σξ)]

sys = DiscreteRandomSystem(eqns,distributions)

Now that form is very difficult to actually do math on (because technically you can be doing nonlinear transformations of the random quantities :scream_cat:). But the way you’d use it for your end is:

stoch_sys = convert(DiscreteStochasticSystem,sys) # errors if it can't
discrete_sys = moment_expansion(sys,5) # Gives the deterministic system for the first 5 moments

ICs = [
   x => ...
   y => ...
   moment(x,1) => ...
   ...
]

ps = [
   α => [1.0,3.0,...]
   ...
]

prob = DiscreteProblem(discrete_sys, ICs, ps)
sol = solve(prob)
plot(sol,vars=moment(x,2))

That’s at least what I’m planning on doing, and it should cover your case, @sdwfrost 's case, and I think @jlperla might like it.

@YingboMa @shashi looping you in on the whole design here. Let me know what you think.

Yeah, to me the biggest thing is the simple solution to the discrete system with an observational equation. We have the structure and writeup on that, so happy to accelerate a move into a proper sciml package.

The hermite stuff I don’t know anything about, but I think all of this stuff can be layered on. But the basics need to be there, and it needs to have observational equations for many practical cases.

1 Like

Yeah… I need to get on that :laughing:. That’s what I’m planning to be the numerical target for the symbolic interface of the discrete stochastic system here.

The Hermite stuff is just how you do the same thing to a stochastic differential equation since Hermite expensions are a pretty natural way to expand a Brownian motion.

1 Like

Perhaps this package SMM.jl might also be worth a look. I used an earlier version and it was great then and has been developed since.

I am interested. How can I contribute?

The discrete case is more important for most of my applications.

A separate form could be created for the more general system:

I strongly prefer this syntax because it more closely resembles how the structural model would be presented in a paper (so this syntax would increase readability and reduce the risk of semantic errors). As you point out, this form contains all the information necessary to transform it into the prior syntactical form.

A question: I don’t have a strong sense for how you are proposing that the program would operate under the hood. As it usually implemented, SMM runs simulations from a process and computes the empirical moments from the simulated data. But it looks like you would be numerically calculating the moments without running simulations? I think this would be closer to GMM, except it would apply even when the moments can’t be derived analytically.

Running simulations is pretty straightforward when the variables in the model follow a Digraph, but would not be when there is simultaneity (as is common in economic models, where two variables cause each other without either being temporally prior to the others, such as in supply and demand or in strategic interactions).

A comment: one of the primary purposes of structural modelling is to run counterfactuals. Given the structure of the model and the estimated parameters, how would other variables change if we were to set a particular variable to a particular value (counterfactuals can be diverse: set x to x0 for all individuals, set x to x_i + x0, set x to f(w_i,z_i), etc). Syntax should support these counterfactuals.

Another comment: Structural models are often not fully specified. In particular, the process generating some variables is not specified at all, so moments that are conditional on these variables cannot be computed except as a function of these variables. It would be useful if the values of these variables could be conveniently provided, such as through a dataframe, where the variable name would be the same in the dataframe as in the equation. Something like this would be necessary for computing moments that can be compared with the empirical moments.

You can derive programs that directly compute moments without having to sample using a moment expansion.

One comment is that MSM means that the model (the data generating process, dgp) is simulable, which means that we know a lot about the model. For this reason, reasoning about the model can lead us to moments that are informative. For example, the estimated parameters of a well-chosen auxiliary model, as in indirect inference. Use of only generic moments like means, correlations, etc., is not taking into account knowledge of the structure of the dgp, and is likely to lead to inefficient estimates.

For this reason, I’m not so sure that trying to describe models or moments with a really closely defined structure is going to lead to good results. Every model to be estimated needs thought and experimentation to find moments that allow it to be estimated well.

I have been working on things related to MSM for the last few years. An example of MSM estimation of a discrete time model is Econometrics/EstimateSV.jl at main · mcreel/Econometrics · GitHub This is discussed on page 924 (Section 22.4) of the document Econometrics/econometrics.pdf at main · mcreel/Econometrics · GitHub

The approach discussed there advocates chosen moments based upon thinking carefully about the model, and using experimentation. One way to supplement art and introspection with more modern methods is to feed a lot of trial moments into a neural net, and let the net figure out which ones, or what combinations of them, are useful. That’s also discussed in the econometrics.pdf document I mentioned above, and the https://github.com/mcreel/SimulatedNeuralMoments.jl package implements it. So, I would advocate that a package should accept whatever moments are returned by a function, which could compute complicated and “artistic” moments based on knowledge of the dgp, and could also include very inobvious moments, such as those coming out of a trained neural net. Working only with generic moments that don’t rely on knowledge of the dgp would be giving up the tremendous amount of information already embodied in the assumptions that are needed to be able to simulate the model. MSM makes assumptions as strong as those made by maximum likelihood estimators, and we should try to use this information to get MSM estimators that are close to fully efficient, for this reason.

2 Likes

The whole purpose of a symbolic system is to do more model-specific experimentation than you can do by hand.

In the situations where MSM is commonly used, the model itself is often simple and could probably in many cases be conveniently described with a symbolic system. The difficulty is that the data is observed in some way that makes analytic computation of likelihood functions or moments very difficult or impossible. For example, a continuous time SDE that is observed at discrete intervals. As far as I know, symbolic methods can’t capture the expectation of y(t) conditional on y(t-1) for general problems of this sort (that was one of the motivating examples in Gouriéroux et al. https://onlinelibrary.wiley.com/doi/abs/10.1002/jae.3950080507). However, a the parameters of a simple AR model run on the observed data can be informative for the mean reversion parameter of the SDE, and can thus serve as moments. Other such statistics can be used for other parameters of the model.

If there is a way to represent informative statistics that may come from auxiliary models that may require nonlinear optimization, then great! If a framework ruled out this sort of statistic, then I would be concerned that it could not deliver estimates that obtain nearly full asymptotic efficiency.

1 Like

I don’t think anyone was suggesting analytic computations.

Of course. But it would still benefit from a symbolic system even without the analytic solution. That’s the whole point.

Sorry, I didn’t mean only analytic solutions. Approximation methods such as quadrature, etc., are also often unavailable due to curse of dimensionality considerations, as was the case in the original work by McFadden.

What I don’t see is how a symbolic system to describe the model can help to discover informative statistics, when the data is generated from the model in a way such that the density or moments of the data are not available. It may be that simple, not very informative moments can fit into a symbolic system, but more complicated, tailor made moments seem unlikely, to me, to be workable with such a system. So, my question is how a symbolic system could help.

Standard quadrature methods, yes, but there are symbolic-numeric techniques for deriving problem-specific sparse quadratures.

https://dl.acm.org/doi/10.1145/1073884.1073898

The whole point is to derive isometries and build problem-specific quadratures based on the symmetries. If your argument is that standard methods don’t work, that :man_shrugging: okay I agree but that’s why no one suggested doing that.

If you have numerical methods solving equations, symbolic systems can help improve it. Automatic parallelization of code, sparsity handling, tearing, etc. I’m not sure what the hold up is? Is parallelism not good? I’m confused by your argument.

Quite the opposite. The whole purpose is to help the derivation of tailor made moments and expansions in special ways no one would ever want to write out by hand. In the same way, would you ever want to tear nonlinear systems by hand?

I don’t think you’re quite understanding the symbolic-numeric approach and instead are thinking about some pure symbolic world. This isn’t SymPy. It’s more akin to generalized automatic differentiation to problems other than differentiation.

No, I definitely do not understand it :grinning: but I will be eager to give any new developments a try.

I’ll table this for now and put together a more clear video on it for JuliaCon.

You can derive programs that directly compute moments

As you set up the problem, the shocks are all normal. Would your approach also apply to other distributiosn, like Type-1 extreme value?

One of the primary ways that labor economist use SMM is for dynamic discrete choice models, where agents choose the greater of several quantities, and the size of each of these quantities includes a T1EV error.

I’ll need to look at if there’s a literature on non-normal shocks (in the continuous time it would be called “colored noise”, like pink noise) for how this is done in that case. My guess is that you expand it out almost like a Taylor series except in frequency space and then a very similar procedure follows. Which expansion to use is what would need to be pulled from the literature though.