I think that your solution is the only solution at the moment.
I also think that the overhead would be small. In the union, you essentially create a shallow copy of an IdDict, and that should be pretty fast compared to the cost of computing the gradient.
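To illustrate what "shallow copy" means here, below is a minimal sketch that uses only Base Julia. The arrays `W` and `b` are hypothetical stand-ins for model parameters, and the dictionary is built by hand rather than taken from any library's internals, so treat it as an illustration of the idea and not as the actual union code.

```julia
# Hypothetical stand-ins for parameter arrays; the names are illustrative only.
W = randn(1000, 1000)
b = randn(1000)

# An IdDict keys on object identity (===), not on array contents.
d = IdDict{Any,Any}(W => nothing, b => nothing)

# copy(d) is shallow: it builds a new dictionary but reuses the same arrays.
d2 = copy(d)

same_keys = all(k -> haskey(d2, k), keys(d))  # the very same array objects are keys in the copy
d2[randn(3)] = nothing                         # adding an entry to the copy ...
untouched = length(d) == 2                     # ... leaves the original dictionary alone

# Compare the cost of the shallow copy with a deep copy that duplicates the arrays.
copy(d); deepcopy(d)                           # warm up so compilation time is excluded
t_shallow = @elapsed copy(d)
t_deep = @elapsed deepcopy(d)
println((same_keys, untouched, t_shallow, t_deep))
```

The shallow copy only touches the dictionary's bookkeeping, so its cost depends on the number of parameter arrays and not on their sizes.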
You can check this yourself. Run a few iterations where you take the gradient with respect to the parameters only (no union), then a few iterations with your solution, and compare the timings. The performance difference should be small.