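Some stateful optimizers, like ADAM, have an IdDict field that stores the
momentum of each parameter. When the optimizer is deserialized with BSON, the
identity information is lost in the newly created IdDict: its keys are no
longer the same objects as the model's parameters, so the stored state can't be
looked up. This makes it hard to truly resume training from a checkpoint.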
(I think in TF each variable has a name, so things may be easier there?)
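
To make the problem concrete, here is a minimal sketch in plain Julia, without Flux or BSON. The names `w`, `state`, `w_loaded` and `state_loaded` are made up for illustration, and `deepcopy` is only a stand-in for saving and loading the model and the optimizer independently; I'm assuming that is roughly what happens on a separate BSON round trip. The last part shows one possible workaround (storing the state by position and re-keying it after loading), which is just an idea, not something Flux provides.

```julia
# Hypothetical stand-ins: `w` plays the role of a model parameter and
# `state` the role of the optimizer's IdDict of per-parameter momentum.
w = [1.0, 2.0]
state = IdDict{Any,Any}(w => zeros(2))

# Stand-in for loading the model and the optimizer separately:
# each copy is made of fresh objects with new identities.
w_loaded = deepcopy(w)
state_loaded = deepcopy(state)

found_before = haskey(state, w)               # same object as the key
found_after = haskey(state_loaded, w_loaded)  # equal contents, different object
values_equal = (w_loaded == w)

# Possible workaround (an assumption, not Flux API): save the state as a
# Vector in parameter order, then rebuild the IdDict against the loaded
# parameters.
params = [w]
ordered_state = [state[p] for p in params]
params_loaded = [w_loaded]
restored = IdDict{Any,Any}()
for (p, s) in zip(params_loaded, deepcopy(ordered_state))
    restored[p] = s
end
found_restored = haskey(restored, w_loaded)

println((found_before, found_after, values_equal, found_restored))
```

The lookup fails after the copy even though the values are equal, because IdDict compares keys with `===` (object identity) rather than `==`. Keying the saved state by position or by name sidesteps that, at the cost of relying on a stable parameter order.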