Differentiating Jacobian-vector product for sliced score matching?

I think you will get more of a speedup from not computing the entire Jacobian than from anything you gain by switching between modes of AD. A full dense Jacobian is expensive to compute no matter how you choose to do so. RD and FD should be asymptotically comparable when computing a full n x n Jacobian, since either one needs on the order of n passes, but FD will probably be faster for small n.
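
To make the point concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `Dual` is a toy hand-rolled forward-mode number (not any library's API), `score` is a made-up stand-in for a score network, and `jvp`, `full_jacobian` and `sliced_term` are hypothetical helpers. It compares the quantity v^T J v obtained from a single Jacobian-vector product against the same quantity obtained by first building the dense Jacobian with n passes.

```python
import math
import random


class Dual:
    """Toy forward-mode number: a value plus one directional derivative."""

    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    @staticmethod
    def _lift(o):
        return o if isinstance(o, Dual) else Dual(o)

    def __add__(self, o):
        o = self._lift(o)
        return Dual(self.val + o.val, self.tan + o.tan)

    __radd__ = __add__

    def __mul__(self, o):
        o = self._lift(o)
        # product rule
        return Dual(self.val * o.val, self.tan * o.val + self.val * o.tan)

    __rmul__ = __mul__


def tanh(x):
    t = math.tanh(x.val)
    return Dual(t, (1.0 - t * t) * x.tan)


def score(x):
    # Hypothetical stand-in for a score model s(x): R^n -> R^n.
    n = len(x)
    return [tanh(x[i] * x[(i + 1) % n]) + 0.5 * x[i] for i in range(n)]


def jvp(f, x, v):
    # One forward pass gives J(x) @ v without ever forming J.
    out = f([Dual(xi, vi) for xi, vi in zip(x, v)])
    return [o.tan for o in out]


def sliced_term(f, x, v):
    # v^T J v from a single JVP.
    return sum(vi * ji for vi, ji in zip(v, jvp(f, x, v)))


def full_jacobian(f, x):
    # Dense Jacobian: n forward passes, one per basis vector (one column each).
    n = len(x)
    cols = [jvp(f, x, [1.0 if j == i else 0.0 for j in range(n)]) for i in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


random.seed(0)
n = 5
x = [random.gauss(0.0, 1.0) for _ in range(n)]
v = [random.gauss(0.0, 1.0) for _ in range(n)]

J = full_jacobian(score, x)  # n passes
dense = sum(v[i] * J[i][j] * v[j] for i in range(n) for j in range(n))
cheap = sliced_term(score, x, v)  # 1 pass
```

Both routes give the same number up to floating-point rounding, but the dense route does n passes through `score` and the sliced route does one. With a real AD package you would use its own JVP primitive instead of a hand-written `Dual`; the cost argument is the same.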
