I’m debating whether to attempt a Julia solution to a particular “streaming” inference problem in which a global data model is constantly updated with new data. I’m new to the language, and there seem to be language-related complications: Julia appears to favor statically defined functions over dynamically generated ones. I am aware of Julia’s metaprogramming facilities, but would rather not delve into them if I can avoid it.
Here’s the problem. I want to perform inference on a global model (a parametric likelihood function) of streaming data that is modified in real time as follows: incoming data is clustered; each cluster defines a parametric likelihood function of its constituent data; and the global likelihood function is updated by multiplying in the likelihoods of the new data clusters. For example: initial global model f_0(x|\theta); new-data likelihood g(y|\theta); updated likelihood f(x,y|\theta) = f_0(x|\theta) \cdot g(y|\theta).
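To make the setup concrete, here is a minimal sketch of the kind of update I have in mind, using a toy Gaussian likelihood as a stand-in for my real cluster model (all names here are placeholders, not my actual code):

```julia
# Sketch only: θ = (μ, log σ) is the shared parameter vector; each data cluster
# contributes one log-likelihood term, and the global log-likelihood is their sum.

# per-cluster Gaussian log-likelihood (stand-in for the real cluster model)
gaussian_ll(data, μ, σ) = -sum(abs2, data .- μ) / (2σ^2) - length(data) * log(σ)

# component terms accumulated so far
loglik_terms = Function[]

# initial global model f_0(x | θ)
x0 = randn(100)
push!(loglik_terms, θ -> gaussian_ll(x0, θ[1], exp(θ[2])))

# a new cluster y arrives and contributes g(y | θ); the update is just a push
y = randn(20) .+ 0.5
push!(loglik_terms, θ -> gaussian_ll(y, θ[1], exp(θ[2])))

# updated global log-likelihood: log f(x, y | θ) = log f_0(x | θ) + log g(y | θ)
global_loglik(θ) = sum(t(θ) for t in loglik_terms)
```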
Questions:
- How do I do inference on the current model? This seems problematic, since the global likelihood function cannot be recompiled in the same Julia session. (True?) How then do I take code that performed inference against f_0 and point it at f(x,y)? What is the Julian approach? Use Revise? Rename the global model whenever I update it with new data? My concern there is having potentially thousands of stale global models sitting around in memory indefinitely. (The only workaround I’ve come up with so far is sketched below this list.)
- Should I be worried about the latency of compiling the updated model? In the example above, f_0(x|\theta) may be very complex, while g is simple by comparison. When Julia compiles the new global model f(x,y), does it efficiently reuse the previously compiled f_0, or is the latter recompiled along with g?
- Does anyone know of an existing Julia project doing something like this already?
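Regarding the first question, the one workaround I’ve thought of is to pass the current likelihood into the inference code as an argument, rather than baking a named global function into it, so the inference routine never cares which “generation” of the model it is looking at. Roughly (using Optim.jl purely as a stand-in for whatever inference routine I end up with, and `global_loglik` from the sketch above):

```julia
using Optim  # stand-in: any optimizer or sampler that accepts a function would do

# the inference code only sees whatever log-likelihood it is handed
function fit_current(loglik; θ0 = [0.0, 0.0])
    result = optimize(θ -> -loglik(θ), θ0)   # maximize the log-likelihood
    return Optim.minimizer(result)
end

θ̂ = fit_current(global_loglik)   # re-run after each update of the model
```

I have no idea whether this is idiomatic, or whether it just moves the recompilation question somewhere else.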
Thanks!