Ahh, interesting, I didn’t know it only did the one way there. That must be why it currently has a feel of “I think it mostly works, but a few things seem to get hit”. Is there a reason Enzyme cannot just use mixed mode for that? ReverseDiff did that first: broadcast has a diagonal sparsity pattern, so you can always switch that step to forward mode at no extra cost (and sometimes that even reduces the cost).
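To make the diagonal-sparsity point concrete, here is a minimal sketch in plain Python. It is not Enzyme or ReverseDiff code; `Dual`, `dsin`, `broadcast_with_pullback`, and the scalar function `f` are hypothetical names made up for illustration. The idea is that an elementwise map has a diagonal Jacobian, so one forward-mode evaluation per element (seeded with tangent 1) already gives every nonzero entry, and the reverse pass reduces to an elementwise multiply.

```python
import math

class Dual:
    """Minimal forward-mode dual number: a value plus one tangent."""
    def __init__(self, val, tan=0.0):
        self.val = val
        self.tan = tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule on the tangent component.
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)
    __rmul__ = __mul__

def dsin(x):
    """sin for Dual inputs (chain rule on the tangent)."""
    return Dual(math.sin(x.val), math.cos(x.val) * x.tan)

def broadcast_with_pullback(f, xs):
    """Apply f elementwise; return the outputs and a reverse-mode pullback.

    The forward sweep runs f on dual numbers, so the diagonal of the
    Jacobian is computed alongside the primal values.
    """
    duals = [f(Dual(x, 1.0)) for x in xs]
    ys = [d.val for d in duals]
    diag = [d.tan for d in duals]   # d f(x_i) / d x_i for each i

    def pullback(ybar):
        # Reverse step: xbar_i = ybar_i * f'(x_i); no tape of f is needed.
        return [g * d for g, d in zip(ybar, diag)]

    return ys, pullback

def f(x):
    # Hypothetical scalar function being broadcast.
    return x * x + dsin(x)

xs = [0.0, 1.0, 2.0]
ys, pullback = broadcast_with_pullback(f, xs)
xbar = pullback([1.0, 1.0, 1.0])
```

Here `xbar[i]` equals `2 * xs[i] + cos(xs[i])`, the derivative of `f` at each element. The reverse sweep never has to replay the body of `f`, which is where the potential cost reduction comes from.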
I mean, it’s probably useful to use mixed mode for broadcast regardless.
However, what that PR does is generically say that autodiff of @cuda is @cuda of autodiff (which is presently needed by broadcasting among other things).
It’s definitely useful to also consider what higher-level utilities we want to add, but this is useful as a baseline so that generic code doesn’t need to pipe an inner autodiff call into every @cuda.
That’s why I’d argue that the linked PR makes broadcast work in reverse mode (among other things), though it can potentially be improved further with additional broadcast-specific tuning.
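As a rough illustration of "autodiff of @cuda is @cuda of autodiff", here is a CPU-only sketch in plain Python. Everything in it is an assumption made for illustration: `launch` is a stand-in for a kernel launch, and the reverse kernel is written by hand rather than generated by Enzyme. It only shows the shape of the rule, namely that the gradient of a launched kernel is obtained by launching the kernel's derivative over the same index space.

```python
def launch(kernel, n, *args):
    """CPU stand-in for a GPU launch: run the kernel once per thread index."""
    for i in range(n):
        kernel(i, *args)

def square_kernel(i, x, y):
    """Primal kernel: each thread writes one output element."""
    y[i] = x[i] * x[i]

def square_kernel_rev(i, x, xbar, ybar):
    """Hand-written reverse pass of square_kernel for thread i.

    An AD tool would generate this; it is written out here for clarity.
    """
    xbar[i] += 2.0 * x[i] * ybar[i]

def loss(x):
    """Scalar loss built on top of a kernel launch."""
    y = [0.0] * len(x)
    launch(square_kernel, len(x), x, y)
    return sum(y)

def grad_loss(x):
    """Gradient of loss: launch the derivative kernel over the same indices."""
    n = len(x)
    xbar = [0.0] * n      # shadow of x, accumulates the gradient
    ybar = [1.0] * n      # d(sum(y)) / dy_i = 1
    launch(square_kernel_rev, n, x, xbar, ybar)
    return xbar

x = [1.0, -2.0, 3.0]
g = grad_loss(x)
```

For this input `g` is `[2.0, -4.0, 6.0]`, matching the analytic gradient `2 * x`. The caller of `grad_loss` never has to place an autodiff call inside the kernel body, which is the baseline behaviour described above.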