Yeah, I am in the situation where I want to solve many cheap OT problems a lot of times
Merde. More specifically, are you by chance planning to run KNN-like constructions on a metric space, where each experiment corresponds to one point cloud of samples and the metric between experiments is EMD, so that in the worst case you need quadratically many EMD solutions? That’s my guess because of the Machine Learning tag (learn distributions, not medium-dimensional points) combined with “many cheap” problems.
I have met people running such computations; they were very, very unhappy with their compute needs (the context was some medical data that came in the form of a sampled distribution for each patient, so you have a partially labeled distribution-of-distributions).
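To make the cost concrete, here is a minimal sketch of the pairwise construction I am guessing at. Everything in it is an assumption for illustration: the names `emd_1d` and `pairwise_emd` are made up, and the solver is a 1D shortcut (for two equal-size samples with uniform weights on the real line, EMD is the mean absolute difference of the sorted samples), not a general OT solver. For real point clouds you would pass something like a wrapper around POT's `ot.emd2` as `solver` instead.

```python
from itertools import combinations


def emd_1d(xs, ys):
    """EMD (Wasserstein-1) between two 1D samples with uniform weights.

    Only valid for equal sample sizes on the real line; this is a
    stand-in for a general solver, not a replacement for one.
    """
    if len(xs) != len(ys):
        raise ValueError("this shortcut assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def pairwise_emd(clouds, solver=emd_1d):
    """Fill the symmetric distance matrix between all experiments.

    Returns the matrix and the number of solver calls, which is
    n * (n - 1) / 2 for n experiments, i.e. quadratic in n.
    """
    n = len(clouds)
    dist = [[0.0] * n for _ in range(n)]
    calls = 0
    for i, j in combinations(range(n), 2):
        d = solver(clouds[i], clouds[j])
        dist[i][j] = dist[j][i] = d  # EMD is symmetric, so solve once per pair
        calls += 1
    return dist, calls


# Hypothetical toy data: four "experiments", three samples each.
clouds = [[0, 1, 2], [1, 2, 3], [0, 0, 0], [5, 5, 5]]
dist, calls = pairwise_emd(clouds)
```

With four experiments that is already six solves, and the count grows as n(n-1)/2. Keeping the solver as an argument also means the loop does not care which OT backend sits behind it.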
I am pretty sure that there is a lot one could do in these settings; but this is algorithmic research rather than coding or googling for packages. It’s a fun problem, though, that I spent some off-time working on.
Edit: If you end up writing a paper-thin wrapper around POT using PyCall, it would be much appreciated if you could make a GitHub repo for it. That way you can also swap out POT for another solver if a faster one can be found.