LBNL and CMU appear to have a project, FFTX, that uses CMU’s Spiral framework to generate efficient FFT libraries for CPUs, GPUs, etc. It’s a more recent project than FFTW, and unlike FFTW, the libraries are BSD licensed.
Does anyone in the Julia community have experience with FFTX specifically or the Spiral framework in general? On paper, combining Spiral’s kernel generation with Julia seems like it could be a good way to develop efficient exascale codes…