Help me structure my code and extend it to a working example: Using DiffEqFlux to interpolate GDP data and see if there is a singularity

Hello,

I tried to solve my own toy problem with DiffEqFlux since I found the example that comes with DiffEqFlux confusing.
The source is available here; frankly, it is in messy shape right now: GitHub - freemin7/GDPPr-d: Turning a piece of ugly, messy code into something beautiful with the help of the community

It was my first project where I used Atom + Juno for development, and it was a good experience.
Over time I tried different things, stored some results, and used my own shorthands. Being able to execute just what I need was nice for development, but the resulting code became opaque.

How can I structure this so it becomes a presentable piece of code?

I have one time series (but I want it to scale to multiple).
Should I put it in an extra file, maybe in a less Julia-specific format? CSV?
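
To make the CSV idea concrete, here is a minimal sketch of one possible layout: a `year` column plus one column per series, so adding a second time series is just adding a column. It is written in Python only to stay independent of any particular Julia package version; the column names (`world`, `region_a`) and the numbers are placeholders I made up, not real GDP figures.

```python
import csv
import io

def load_series(text):
    """Parse CSV text with a 'year' column and one column per series.

    Returns (years, {series_name: [values...]}).
    """
    reader = csv.DictReader(io.StringIO(text))
    years = []
    series = {name: [] for name in reader.fieldnames if name != "year"}
    for row in reader:
        years.append(int(row["year"]))
        for name in series:
            series[name].append(float(row[name]))
    return years, series

# Placeholder numbers only, not real GDP data.
example = """year,world,region_a
1960,1.0,0.5
1961,1.1,0.6
1962,1.2,0.7
"""

years, series = load_series(example)
print(years, sorted(series))
```

The fitting code then only ever sees `years` and one list of values, regardless of how many series the file holds.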

I have multiple models (which take different numbers of parameters).
Each model leads to a different ODEProblem.
I may want to vary the integrator.
I may want to vary the optimizer.
I want to demonstrate the importance of the right loss function.
How can I show examples from this option space?
How can I keep it lean, so that I can change single examples (without re-rendering everything)?
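
One way to think about this option space is as a small registry per axis (models, losses, optimizer settings) and a generator that yields one configuration per combination. The sketch below is in Python and deliberately avoids any DiffEqFlux calls; the registry names, the ratio loss definition, and the step sizes are my own assumptions for illustration. The two right-hand sides are the two model forms listed in this post.

```python
import itertools

# Each model: (right-hand side f(x, t, p), number of parameters it expects).
MODELS = {
    "power": (lambda x, t, p: p[0] * x ** p[1], 2),
    "power_drift": (
        lambda x, t, p: p[0] * x ** (p[1] * (1 + p[2] * t + p[3] * t * t)),
        4,
    ),
}

# Each loss compares data to a solution sampled at the same times.
LOSSES = {
    "l2": lambda data, sol: sum((d - s) ** 2 for d, s in zip(data, sol)),
    # Assumed reading of "L2 of GDP[i]/solution[i]": squared deviation of the ratio from 1.
    "ratio": lambda data, sol: sum((d / s - 1.0) ** 2 for d, s in zip(data, sol)),
}

STEP_SIZES = [1e-3, 1e-4]  # hypothetical optimizer settings

def experiments():
    """Yield one small config dict per point in the option space."""
    for (m, (rhs, n)), (l, loss_fn), step in itertools.product(
        MODELS.items(), LOSSES.items(), STEP_SIZES
    ):
        yield {"model": m, "n_params": n, "rhs": rhs,
               "loss": l, "loss_fn": loss_fn, "step_size": step}

configs = list(experiments())
print(len(configs), "configurations")
```

Because every configuration is just a dictionary, a single example can be rerun by picking one entry out of `configs` instead of rerunning the whole grid.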

How can I show how the solutions get better over time?
My models are inherently numerically unstable and have finite escape time with the wrong parameters (which is why my Adam is created with such a small step size).
How can I handle this gracefully?
How can I show that it is something you have to keep in mind?
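
One possible way to handle finite escape time gracefully is to let the integrator report failure and map that to a large but finite loss, so the optimizer is pushed away from bad parameters instead of crashing. The sketch below uses a hand-written fixed-step RK4 in Python rather than any real solver API; the cap `1e30`, the penalty `1e12`, and the function names are assumptions chosen for illustration.

```python
import math

def rk4_guarded(rhs, x0, t0, t1, steps_per_unit, p, cap=1e30):
    """Integrate dx/dt = rhs(x, t, p) from integer t0 to integer t1 with fixed-step RK4.

    Returns the solution at t0, t0+1, ..., t1, or None if the state stops
    being a finite real number or exceeds `cap` (treated as finite escape).
    """
    h = 1.0 / steps_per_unit
    x, out = x0, [x0]
    for i in range((t1 - t0) * steps_per_unit):
        t = t0 + i * h
        try:
            k1 = rhs(x, t, p)
            k2 = rhs(x + 0.5 * h * k1, t + 0.5 * h, p)
            k3 = rhs(x + 0.5 * h * k2, t + 0.5 * h, p)
            k4 = rhs(x + h * k3, t + h, p)
            x = x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
            if not math.isfinite(x) or abs(x) > cap:
                return None
        except (OverflowError, ZeroDivisionError, TypeError, ValueError):
            return None
        if (i + 1) % steps_per_unit == 0:
            out.append(x)  # sample at integer times only
    return out

def guarded_loss(rhs, p, data, loss_fn, penalty=1e12):
    """Loss against yearly data; a blown-up solve costs `penalty` instead of raising."""
    sol = rk4_guarded(rhs, data[0], 0, len(data) - 1, 100, p)
    return penalty if sol is None else loss_fn(data, sol)

power = lambda x, t, p: p[0] * x ** p[1]
l2 = lambda data, sol: sum((d - s) ** 2 for d, s in zip(data, sol))

stable = rk4_guarded(power, 1.0, 0, 2, 100, [2.0, 0.5])   # exact solution is (1 + t)^2
escaped = rk4_guarded(power, 1.0, 0, 3, 100, [1.0, 2.0])  # exact solution blows up at t = 1
print(stable, escaped)
```

Counting how often `rk4_guarded` returns `None` during a fit is also a cheap way to show readers that blow-up is something they have to keep in mind.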

From optimizing my parameters I get a sufficiently nice solution.
How can I store results (parameters and solution) for further use?
How can I keep interactivity for exploration in Juno?
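
For storing results, one simple option is a small human-readable file per fit that holds the model name, the parameters, the sampled solution, and the final loss. The sketch below uses JSON from Python purely as an illustration of the layout; the field names and the file name `power_fit.json` are hypothetical. The parameter values are the α1 and β1 reported further down in this post, while the solution values and loss are placeholders.

```python
import json
import os
import tempfile

def save_result(path, model, params, times, solution, loss):
    """Write one fitted result as human-readable JSON."""
    record = {"model": model, "params": params, "times": times,
              "solution": solution, "loss": loss}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2)

def load_result(path):
    """Read a result written by save_result back into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Round trip in a temporary directory. Solution values and loss are placeholders.
path = os.path.join(tempfile.mkdtemp(), "power_fit.json")
save_result(
    path,
    "power",
    {"alpha1": 490.83321111718465, "beta1": 0.692494816182829},
    [1960, 1961],
    [1.0, 1.1],
    0.0,
)
loaded = load_result(path)
print(loaded["params"])
```

Keeping the stored parameters next to the model name means a later session can rebuild the same problem and continue exploring interactively from the saved point.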

Partial answers are welcome. I am looking for ideas on how to structure it and make the parts talk to each other.

To give a more general idea of what I was up to:
I took GDP data from http://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.KD?downloadformat=excel
I developed a family of models based on this form:

dcGDP[1] = α1 * ((cGDP[1]))^β1

I took the 1960 GDP as the initial condition, solved the equation, and minimized a loss function between the solution at each integer time and my GDP data. It turned out that the plain L2 loss is really bad here; my custom one (L2 of GDP[i]/solution[i]) fitted the curve better.
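
To illustrate why the two losses can behave differently, here is a small Python sketch. It uses the closed-form solution of dx/dt = α x^β (valid for β ≠ 1, obtained by separation of variables) instead of a numerical solver, and it assumes that "L2 of GDP[i]/solution[i]" means the sum of squared deviations of the ratio from 1. The "data" is synthetic: the model solution inflated by 10% at every point.

```python
def power_law_solution(x0, alpha, beta, t):
    """Closed-form solution of dx/dt = alpha * x**beta with x(0) = x0 > 0, beta != 1."""
    base = x0 ** (1.0 - beta) + (1.0 - beta) * alpha * t
    if base <= 0.0:
        raise ValueError("solution has blown up (or hit zero) before time t")
    return base ** (1.0 / (1.0 - beta))

def l2_loss(data, sol):
    """Sum of squared absolute errors."""
    return sum((d - s) ** 2 for d, s in zip(data, sol))

def ratio_loss(data, sol):
    """Sum of squared relative errors (assumed reading of the custom loss)."""
    return sum((d / s - 1.0) ** 2 for d, s in zip(data, sol))

# With alpha = 2, beta = 0.5, x0 = 1 the solution is exactly (1 + t)^2.
sol = [power_law_solution(1.0, 2.0, 0.5, t) for t in range(5)]
data = [1.1 * s for s in sol]  # synthetic data: 10% above the model everywhere

print("L2 terms:   ", [round((d - s) ** 2, 4) for d, s in zip(data, sol)])
print("ratio terms:", [round((d / s - 1.0) ** 2, 4) for d, s in zip(data, sol)])
```

With the same 10% relative error everywhere, the L2 terms grow with the square of the solution, so the last years dominate the fit, while the ratio terms weight every year equally.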

After I had a solution, I saved the parameters at the end of the file. I also plotted GDP/Sol to line up my prediction errors with major economic crises. This is not automatically generated yet.

Unexplained error with only α and β


α1 = 490.83321111718465 β1 = 0.692494816182829

Blue: GDP data (inflation adjusted)
Orange: dcGDP[1] = α1 * ((cGDP[1]))^β1
Pink: dcGDP[1] = α1 * ((cGDP[1]))^(β1*(1+δ*(t) + δ2*(t^2)))

Interpretation:
The model does not account for economic bubbles and crises, and crises are clearly detectable in the prediction error. Every downturn in the prediction error has an associated crisis.
Growth before and after 1973 was very different.
Different runs (with different models) suggested that the best β1 is falling over time.
β1 ≈ 0.7 => no singularity in GDP growth; there would be one if β1 > 1. β1 < 1 => growth is in O(t^n) for some finite n.
(Checking the total amount of computational power (or a proxy for it) for a singularity in the same way would be interesting, but I could not find a data set.)
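
The claim about β1 can be checked directly, since the two-parameter model has a closed-form solution. Writing $x$ for cGDP[1], with $x(0) = x_0 > 0$, $\alpha > 0$ and $\beta \neq 1$ (here $t$ is measured from the initial year), separation of variables gives:

$$
\frac{dx}{dt} = \alpha x^{\beta}
\;\Longrightarrow\;
\frac{x^{1-\beta}}{1-\beta} = \alpha t + \frac{x_0^{1-\beta}}{1-\beta}
\;\Longrightarrow\;
x(t) = \left[ x_0^{1-\beta} + (1-\beta)\,\alpha\, t \right]^{\frac{1}{1-\beta}}
$$

For $\beta > 1$ the bracket decreases to zero while the exponent is negative, so the solution escapes to infinity at the finite time

$$
t^{*} = \frac{x_0^{1-\beta}}{(\beta - 1)\,\alpha}
$$

For $\beta < 1$ the bracket grows linearly in $t$, so $x(t) \sim \left((1-\beta)\,\alpha\, t\right)^{1/(1-\beta)}$, which is polynomial growth of degree $1/(1-\beta)$. With the fitted $\beta_1 \approx 0.6925$ that degree is about $3.25$. The borderline case $\beta = 1$ gives exponential growth, which also has no finite-time singularity.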

My hope is that with your suggestions this could become a complete example of a “data science” workflow. It would be amazing if it could both produce a static document and be a playground to experiment in and get your feet wet with DiffEqFlux and cohorts. (Having two different files for that would work too.)

I think you want to generate some kind of work vs precision plot. Essentially show what the fitting time vs final loss was in a plot for various strategies, since something being fast doesn’t mean anything if it didn’t fit well. Then just a bunch of different color lines.
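
As a rough sketch of how the data for such a plot could be collected: log (elapsed time, loss) after every optimizer step for each strategy, then draw one line per strategy. The Python below uses plain gradient descent on a toy one-parameter objective as a stand-in for the real fit; the objective, the strategy names, and the step sizes are all made up for illustration.

```python
import time

def run_strategy(step, loss, grad, p0, n_iters):
    """Plain gradient descent that logs (elapsed_seconds, loss) after every step."""
    p, trace = p0, []
    start = time.perf_counter()
    for _ in range(n_iters):
        p = p - step * grad(p)
        trace.append((time.perf_counter() - start, loss(p)))
    return p, trace

# Toy stand-in for the real fitting problem: minimise (p - 3)^2.
loss = lambda p: (p - 3.0) ** 2
grad = lambda p: 2.0 * (p - 3.0)

traces = {}
for name, step in {"small step": 0.01, "large step": 0.4}.items():
    _, traces[name] = run_strategy(step, loss, grad, 0.0, 50)

for name, trace in traces.items():
    elapsed, final_loss = trace[-1]
    print(f"{name}: final loss {final_loss:.3e} after {elapsed:.2e} s")
```

Plotting each trace with time on the x-axis and loss on a log-scaled y-axis gives the "bunch of different color lines", and the same traces also show how the fit improves over time.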

Other than that, :man_shrugging: I’m still trying to find out the best way to convey information about this as well, since there’s many different dimensions of success/failure in this area.

I reran the fitting with the two different loss functions (I wanted to post pictures showing the badness of L2 here). The difference is not as pathological as I remembered. It seems I was overexcited about my custom loss function. :slightly_frowning_face: