Are there any obvious problems/code smells with this GPU code?

I am working on a small package to implement the Sinkhorn algorithm for optimal transport.

Since this is frequently used as a loss function for generative models, my goal is to have an implementation that is generic enough to work on GPU or CPU, and to work with normal arrays as well as autodiff code (currently Flux TrackedArray).

All of the relevant code is here; it’s a pretty simple algorithm to implement. I am wondering, though, if there’s anything obviously dubious or non-idiomatic in the way I’ve written it.

(Some of the allocations may appear unnecessary, but in fact to work with TrackedArrays I am pretty sure it does have to be written this way, rather than mutating an array and avoiding the allocations.)

I was just looking for something like this!

A few thoughts:

  1. Citations of algorithm? The log-domain version looks like this implementation: https://arxiv.org/pdf/1705.09634.pdf
  2. hist_dim = size(K, 1)
    v = copy(a)
    v .= 1.0
    

hist_dim is no longer used, and you can rewrite v = copy(a) to v = similar(a) to avoid the copying.

Other than that it’s pretty straightforward. I’d be interested in adding some convenience wrappers for use with Distances and maybe some alternative implementations for Array types (i.e. on the CPU)

1 Like

Thanks! I took everything from the excellent “Computational Optimal Transport” https://arxiv.org/abs/1803.00567 but you’re right that I should dig up the original papers and cite them.

Thanks for the recommendation to use “similar”. A CPU implementation could have many fewer allocations (and perhaps use the logsumexp trick) so perhaps I should specialize for that. It would be nice to keep it generic though.