I should probably mention that this isn’t fully at double float precision. By making some reasonable assumptions about what the errors and polynomial sizes will be, you get more than native precision (my goal) while being about 2x faster than a fully correctly double float accurate implementation.