Language impediments to inference of real-time streaming models

marty0801 · May 19, 2020, 1:06am

I’m debating whether to attempt a Julia solution to a particular “streaming” inference problem in which a global data model is constantly being updated with new data. I’m new to the language, but there seem to be language-related complications: Julia seems to favor statically defined, rather than dynamically generated, functions. I am aware of Julia’s metaprogramming facilities, but would rather not delve into that if I can avoid it.

Here’s the problem. I want to perform inference on a global model (a parametric likelihood function) of streaming data, where the model itself is modified in real time as follows. Incoming data is clustered; each cluster defines a parametric likelihood function of its constituent data; the global likelihood function is updated by multiplying it by the likelihoods of the new data clusters. For example: the initial global model is $f_0(x|\theta)$; the new-data likelihood is $g(y|\theta)$; the updated likelihood is $f(x,y|\theta)=f_0(x|\theta)\cdot g(y|\theta)$.

Questions:

1. How to do inference on the current model? This seems problematic since the global likelihood function cannot be recompiled in the same Julia session. (True?) How then do I take code that performed inference against f_0 and point it at f(x,y)? What is the Julian approach? Use Revise? Rename the global model whenever I update with new data? My concern there is having potentially thousands of global models sitting around in memory indefinitely.
2. Should I be worried about the latency of compiling the updated model? In the example above, f_0(x|\theta) may be very complex; g is relatively simple by comparison. When Julia compiles the new global model, f(x,y), does it efficiently reuse the previously compiled f_0, or is the latter recompiled along with g?
3. Does anyone know of an existing Julia project doing something like this already?

Thanks!

Tamas_Papp · May 19, 2020, 5:53am

I would use an approach with higher order functions, chaining together the likelihood incrementally.
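A minimal sketch of what chaining with higher-order functions might look like, working in log space so that multiplying likelihoods becomes adding log-likelihoods. The Gaussian form and the names `normal_logpdf`, `make_loglik`, and `chain` are illustrative assumptions, not anything from the original question.

```julia
# Hypothetical per-observation log-density; the Gaussian form is an assumption.
normal_logpdf(x, μ, σ) = -0.5 * log(2π) - log(σ) - 0.5 * ((x - μ) / σ)^2

# Build a log-likelihood in θ = (μ, σ) for one batch (cluster) of data.
make_loglik(data) = θ -> sum(x -> normal_logpdf(x, θ[1], θ[2]), data)

# Chain two log-likelihoods: the product of likelihoods is the sum of their logs.
chain(f, g) = θ -> f(θ) + g(θ)

f0 = make_loglik([0.1, -0.3, 0.2])   # initial global model
g  = make_loglik([0.4, 0.0])         # likelihood of a newly arrived cluster
f  = chain(f0, g)                    # updated global model, a new closure

θ = [0.0, 1.0]
f(θ) ≈ f0(θ) + g(θ)                  # the chained model agrees with its parts
```

No existing function is redefined here: each update produces a new closure that wraps the old one, and the old one becomes garbage once nothing refers to it. One caveat is that every level of nesting is a distinct closure type, so each update compiles a new specialization when first called; with thousands of updates, keeping the terms in a collection may be preferable.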

mohamed82008 · May 19, 2020, 6:03am

You can define X as [x, y1, y2, y3] and define f(X) = f0(X[1]) * prod(g, X[2:end]). If you have more data, just push! it to X and call f again on X.

Edit: although if you are multiplying many such terms, I would work in terms of the log and add instead to avoid underflow.

Edit 2: to avoid re-computing terms, you can make f a callable struct with a cache field and save the results of f0(X[1]) and every computed value in the cache field in f. Alternatively, there is Memoization.jl to memoize the functions automatically.
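The callable-struct idea from Edit 2, combined with the log-space suggestion, could be sketched roughly as follows. The type `StreamingLogLik`, the helper `add_term!`, and the Gaussian cluster terms are all hypothetical names chosen for illustration, and the cache assumes plain `Float64` parameters.

```julia
# Hypothetical container: per-cluster log-likelihood terms plus a cache of the
# running total at the most recently evaluated θ.
mutable struct StreamingLogLik
    terms::Vector{Function}      # one log-likelihood term per cluster
    cached_θ::Vector{Float64}    # θ at which the cache is valid
    cached_sum::Float64          # sum of the first `ncached` terms at cached_θ
    ncached::Int                 # number of terms covered by the cache
end

StreamingLogLik() = StreamingLogLik(Function[], Float64[], 0.0, 0)

# Adding a cluster only appends a term; no function is redefined.
add_term!(m::StreamingLogLik, term) = (push!(m.terms, term); m)

# Calling the model sums the log terms, reusing the cached partial sum when
# θ is unchanged and only new terms have arrived since the last call.
function (m::StreamingLogLik)(θ::Vector{Float64})
    if θ != m.cached_θ
        m.cached_θ, m.cached_sum, m.ncached = copy(θ), 0.0, 0
    end
    for i in (m.ncached + 1):length(m.terms)
        m.cached_sum += m.terms[i](θ)
    end
    m.ncached = length(m.terms)
    return m.cached_sum
end

# Illustrative Gaussian cluster terms (an assumption, as above).
normal_logpdf(x, μ, σ) = -0.5 * log(2π) - log(σ) - 0.5 * ((x - μ) / σ)^2
cluster_term(data) = θ -> sum(x -> normal_logpdf(x, θ[1], θ[2]), data)

model = StreamingLogLik()
add_term!(model, cluster_term([0.1, -0.3, 0.2]))
θ = [0.0, 1.0]
before = model(θ)
add_term!(model, cluster_term([0.4, 0.0]))
after = model(θ)                 # only the newly added term is evaluated
```

Storing the terms in a `Vector{Function}` trades some dynamic dispatch for not creating a new nested type on every update. The `Vector{Float64}` signature and the cache would need to be generalized, or bypassed, before passing tracked values from an automatic-differentiation package through the model.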

trappmartin · May 19, 2020, 6:36am

I did several projects related to BNP and dynamic compositional likelihoods. In my cases those compositions usually get quite complicated, as they are also nested, and therefore I always used an approach similar to what @mohamed82008 suggested for you in (2). But there are many approaches in Julia to do this, and in my experience Julia is one of the better-suited languages for this.

marty0801 · May 19, 2020, 11:22am

Thank you! Have you, by chance, used this approach with automatic-differentiation packages, in particular ReverseDiff? Do they play nicely together?

marty0801 · May 19, 2020, 11:28am

Thanks very much. Can you describe or point me to examples of the other approaches you alluded to?

mohamed82008 · May 19, 2020, 1:42pm

ReverseDiff should be fine, yes. If it is not, please open an issue with a minimal working example.
