Hi there,
I am looking for ways to learn probabilistic models on time series data in an incremental, online fashion, processing one data point at a time as it arrives.
Hence my question: are there online algorithms for training a Turing.jl model that can do this incremental learning efficiently and need only constant memory (i.e., they don't need to store all the data)?
Any help is highly appreciated. If other packages besides Turing.jl would be a good fit, I'd be happy to hear about them; it's just that Turing.jl seems to be the go-to package for Bayesian probabilistic modelling.
Hello! You are probably looking for what people refer to as "Recursive Bayes". Searching for that will also lead you to this discussion. It's also worth mentioning RxInfer.jl from the biaslab GitHub organization, which I don't think comes up in that first discussion.
With traditional MCMC you have to rerun the sampler over all of your data every time a new measurement arrives. I think of all the other techniques as, more or less, efficient approximations of that.
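To make the "Recursive Bayes" idea concrete: in a conjugate model the posterior after n observations is simply the prior for observation n+1, so only a few summary numbers are stored and nothing is ever re-run on old data. The sketch below is in Python rather than Julia and uses no Turing.jl or RxInfer.jl API; the model (unknown mean `mu` with a Normal prior, observations Normal(`mu`, `noise_var`) with known variance) and all names are hypothetical, chosen only because the update has a closed form.

```python
from dataclasses import dataclass


@dataclass
class NormalPosterior:
    """Posterior N(mean, var) over an unknown mean mu; only two numbers are stored."""
    mean: float
    var: float


def update(post: NormalPosterior, y: float, noise_var: float) -> NormalPosterior:
    """Condition on one observation y ~ N(mu, noise_var); the result is the next prior."""
    precision = 1.0 / post.var + 1.0 / noise_var
    new_var = 1.0 / precision
    new_mean = new_var * (post.mean / post.var + y / noise_var)
    return NormalPosterior(new_mean, new_var)


def batch_posterior(prior: NormalPosterior, ys, noise_var: float) -> NormalPosterior:
    """Closed-form posterior using all data at once, for comparison only."""
    precision = 1.0 / prior.var + len(ys) / noise_var
    var = 1.0 / precision
    mean = var * (prior.mean / prior.var + sum(ys) / noise_var)
    return NormalPosterior(mean, var)


prior = NormalPosterior(mean=0.0, var=10.0)
stream = [1.2, 0.8, 1.5, 0.9, 1.1]  # hypothetical measurements arriving one by one

online = prior
for y in stream:
    online = update(online, y, noise_var=0.5)  # constant memory: old y is discarded

batch = batch_posterior(prior, stream, noise_var=0.5)
print(online)
print(batch)
```

The online and batch results agree up to floating-point rounding; this exactness only holds for conjugate models, which is why non-conjugate models fall back on approximations such as SMC or variational methods.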
If you need something like a discount factor, i.e. the parameters of interest evolve over time, you definitely need to move away from traditional MCMC. Sequential Monte Carlo (SMC), for example, targets exactly this (there is a tool called SMC.jl, although I haven't tried it out).
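As an illustration of how SMC handles a parameter that drifts over time, here is a minimal bootstrap particle filter. It is a Python sketch, not SMC.jl's or Turing.jl's API; the model (a parameter `x_t` following a Gaussian random walk with variance `q`, observed with Gaussian noise of variance `r`) and every function name are hypothetical. Memory stays at `n_particles` values no matter how many observations arrive.

```python
import math
import random


def simulate(n_steps, q, r, rng):
    """Hypothetical data: a parameter x_t doing a Gaussian random walk, observed with noise."""
    xs, ys = [], []
    x = 0.0
    for _ in range(n_steps):
        x += rng.gauss(0.0, math.sqrt(q))
        xs.append(x)
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
    return xs, ys


def particle_filter(ys, q, r, n_particles, rng):
    """Bootstrap SMC: memory is O(n_particles), independent of len(ys)."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # draws from prior N(0, 1)
    estimates = []
    for y in ys:
        # 1. Propagate each particle through the random-walk transition.
        particles = [p + rng.gauss(0.0, math.sqrt(q)) for p in particles]
        # 2. Weight by the Gaussian likelihood of y (computed in log space for stability).
        logw = [-0.5 * (y - p) ** 2 / r for p in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        # 3. Weighted posterior-mean estimate for this time step.
        estimates.append(sum(wi * p for wi, p in zip(w, particles)))
        # 4. Multinomial resampling keeps the particle set at a fixed size.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return estimates


rng = random.Random(0)
q, r = 0.01, 0.25
true_xs, obs = simulate(200, q, r, rng)
est = particle_filter(obs, q, r, n_particles=500, rng=rng)

mae_filter = sum(abs(e - x) for e, x in zip(est, true_xs)) / len(true_xs)
mae_raw = sum(abs(y - x) for y, x in zip(obs, true_xs)) / len(true_xs)
print(f"filter MAE {mae_filter:.3f} vs raw-observation MAE {mae_raw:.3f}")
```

Resampling at every step is the simplest choice; practical SMC implementations usually resample only when the effective sample size drops, and may also use better proposals than the transition prior.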
For time series, Kalman filters are widely used. If you find them too restrictive, Gaussian processes are a very powerful alternative. Both are Bayesian and provide nice uncertainty estimates. However, both restrict you to (multivariate) normal distributions; in the case of Gaussian processes that shouldn't scare you too much, because the choice of kernel makes them very flexible.
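For a feel of why a Kalman filter is naturally online and constant-memory, here is a minimal one-dimensional version for a local level model: `x_t = x_{t-1} + w_t`, `y_t = x_t + v_t`, with `w_t ~ N(0, q)` and `v_t ~ N(0, r)`. It is a Python sketch under those assumptions (the function name and parameters are hypothetical, not any Julia package's API); between observations it keeps only the current mean `m` and variance `p`.

```python
import math


def kalman_local_level(ys, q, r, m0=0.0, p0=1.0):
    """1-D Kalman filter for a Gaussian random walk observed with Gaussian noise.

    Returns the filtered (mean, variance) after each observation; only (m, p)
    is carried from one step to the next.
    """
    m, p = m0, p0
    out = []
    for y in ys:
        # Predict: the random walk keeps the mean and inflates the variance by q.
        p_pred = p + q
        # Update: blend prediction and observation using the Kalman gain k.
        k = p_pred / (p_pred + r)
        m = m + k * (y - m)
        p = (1.0 - k) * p_pred
        out.append((m, p))
    return out


filtered = kalman_local_level([3.0] * 500, q=0.01, r=0.25)
m_last, p_last = filtered[-1]
print(f"mean {m_last:.4f}, variance {p_last:.5f}")
```

With constant observations the mean converges to the observed value, and the variance settles at a steady state determined by `q` and `r` alone, which is how the filter "forgets" old data at a rate set by the process noise.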
Thank you very very much for your help.
I read through the older discussion. It seems to me there is no perfect fit: Gen.jl seems to support SMC approaches, but may be hard to tune and set up. SMC.jl itself doesn't seem as widespread or user-friendly as the Turing.jl ecosystem.
I am really looking for an approach that I can recommend to others because it is well tested, has a large user base, and is very stable.
With Turing I can use variational inference instead of MCMC. Shouldn't that solve all the performance problems, since it does a (local) gradient-based optimization instead of drawing samples?
RxInfer.jl actually lists support for streaming datasets directly on its landing page, as one of its key features. That sounds very promising indeed! Thank you for the pointer.
EDIT: This seems to be the example notebook demonstrating streaming: Infinite Data Stream · RxInfer.jl