[ANN] RustFFT.jl: Compute forward and inverse FFTs with RustFFT

Taaitaaiger · May 29, 2023, 6:52pm

RustFFT is a high-performance, SIMD-accelerated FFT library written in pure Rust. It can compute FFTs of any size, including prime-number sizes, in O(n log n) time. You can now use it from Julia!

Usage

Forward FFT:

using RustFFT

planner64 = RustFFT.FftPlanner64()
instance = RustFFT.plan_fft_forward(planner64, UInt(1))
data = complex([1.0])
RustFFT.fft!(instance, data)
@assert data[1] ≈ 1.0

Inverse FFT:

using RustFFT

planner64 = RustFFT.FftPlanner64()
instance = RustFFT.plan_fft_inverse(planner64, UInt(1))
data = complex([1.0])
RustFFT.fft!(instance, data)
@assert data[1] ≈ 1.0

Note that RustFFT does not normalize outputs:

Callers must manually normalize the results by scaling each element by 1/len().sqrt(). Multiple normalization steps can be merged into one via pairwise multiplication, so when doing a forward FFT followed by an inverse callers can normalize once by scaling each element by 1/len().
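To make the scaling concrete, here is a minimal sketch in plain Julia. It does not call RustFFT: `naive_dft` is a hypothetical O(n²) helper that stands in for any unnormalized transform, so the snippet runs without extra packages. It assumes the usual sign convention (negative exponent for the forward transform).

```julia
# Hypothetical helper: unnormalized DFT, O(n^2), for illustration only.
# sign = -1 gives the forward transform, sign = +1 the inverse.
function naive_dft(x::Vector{ComplexF64}, sign::Int)
    n = length(x)
    out = zeros(ComplexF64, n)
    for k in 0:n-1, j in 0:n-1
        out[k+1] += x[j+1] * cis(sign * 2π * j * k / n)
    end
    return out
end

x = complex([1.0, 2.0, 3.0, 4.0])
n = length(x)

# Forward followed by inverse, both unnormalized, scales the input by n.
roundtrip = naive_dft(naive_dft(x, -1), +1)

# Option 1: normalize once at the end by 1/n.
restored_once = roundtrip ./ n

# Option 2: normalize after each step by 1/sqrt(n); the factors multiply to 1/n.
restored_twice = naive_dft(naive_dft(x, -1) ./ sqrt(n), +1) ./ sqrt(n)
```

Both `restored_once` and `restored_twice` should match `x` up to floating-point error. The single 1/n scaling saves one pass over the data.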

A few other limitations apply. It’s currently not possible to choose the specific algorithm that will be used to compute the transform. It’s also not possible to compute the FFT of an array with a rank not equal to 1. The interface provided by AbstractFFTs is not used yet, either.

Documentation
GitHub

RoyiAvital · May 29, 2023, 7:10pm

It would be great if you added a performance comparison with other implementations in the ecosystem.

Taaitaaiger · May 29, 2023, 7:28pm

Good point! I’ve opened an issue for it.

giordano · May 29, 2023, 7:30pm

julia> using BenchmarkTools, RustFFT, FFTW, LinearAlgebra

julia> v = randn(ComplexF64, 1<<16);

julia> rust_planner64 = RustFFT.FftPlanner64();

julia> rust_instance = RustFFT.plan_fft_forward(rust_planner64, UInt(length(v)));

julia> fftw_plan! = FFTW.plan_fft!(v);

julia> @benchmark RustFFT.fft!($(rust_instance), data) setup=(data=copy(v)) evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  337.667 μs … 569.458 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     385.834 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   384.350 μs ±  13.164 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                  ▁▆█▇▅▁
  ▂▃▃▃▂▂▂▂▂▂▂▂▃▃▃▂▂▂▂▂▂▁▂▂▄▇▇▇▆▆▅▆██████▆▅▄▄▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  338 μs           Histogram: frequency by time          423 μs <

 Memory estimate: 64 bytes, allocs estimate: 2.

julia> @benchmark mul!(data, fftw_plan!, data) setup=(data=copy(v)) evals=1
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  314.833 μs … 478.750 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     316.834 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   320.026 μs ±   8.955 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▇█▆▃▅▅▄▁▂▂▁      ▁▁▁▁                                        ▂
  ████████████████████████████▇█▇▇▇▇▇▇▆▇█▇▇▆▇▆▆▅▆▅▅▅▅▄▅▄▅▅▄▅▄▅▆ █
  315 μs        Histogram: log(frequency) by time        357 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Seems to be in the same ballpark as FFTW, but RustFFT.fft! isn’t fully in-place: there are a couple of allocations.

Feedback: the API could be simplified a bit, for example with methods like

RustFFT.plan_fft_forward(v::Vector{ComplexF32}) =
    RustFFT.plan_fft_forward(RustFFT.FftPlanner32(), UInt(length(v)))

RustFFT.plan_fft_forward(v::Vector{ComplexF64}) =
    RustFFT.plan_fft_forward(RustFFT.FftPlanner64(), UInt(length(v)))

or something like that.

Edit: for the record, my platform is

julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 1 on 4 virtual cores

DNF · May 29, 2023, 7:36pm

Some more ‘difficult’ lengths would be interesting.

giordano · May 29, 2023, 7:39pm

Just slightly off a power of 2:

julia> v = randn(ComplexF64, 1<<16 + 7);

julia> rust_planner64 = RustFFT.FftPlanner64();

julia> rust_instance = RustFFT.plan_fft_forward(rust_planner64, UInt(length(v)));

julia> fftw_plan! = FFTW.plan_fft!(v);

julia> @benchmark RustFFT.fft!($(rust_instance), data) setup=(data=copy(v)) evals=1
BenchmarkTools.Trial: 1487 samples with 1 evaluation.
 Range (min … max):  3.024 ms …  3.692 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.272 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.274 ms ± 62.298 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                   ▄▆█▆
  ▂▃▄▃▂▂▂▁▁▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▃▆█████▇▆▅▅▄▄▄▃▃▃▃▂▃▂▃▂▁▂▁▂ ▃
  3.02 ms        Histogram: frequency by time        3.42 ms <

 Memory estimate: 64 bytes, allocs estimate: 2.

julia> @benchmark mul!(data, fftw_plan!, data) setup=(data=copy(v)) evals=1
BenchmarkTools.Trial: 2228 samples with 1 evaluation.
 Range (min … max):  2.101 ms …  3.082 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.130 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.153 ms ± 56.421 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▇█
  ▃▇██▆▄▃▄▄▂▃▃▃▃▃▃▃▃▃▃▃▄▅▆▇▅▄▃▃▃▃▃▃▃▃▂▃▂▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂ ▃
  2.1 ms         Histogram: frequency by time         2.3 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

Using a length whose only prime factors are 2, 3, 5 and 7, which is still an optimal choice for FFTW:

julia> v = randn(ComplexF64, nextprod((2,3,5,7), 1<<16+1));

julia> rust_planner64 = RustFFT.FftPlanner64();

julia> rust_instance = RustFFT.plan_fft_forward(rust_planner64, UInt(length(v)));

julia> fftw_plan! = FFTW.plan_fft!(v);

julia> @benchmark RustFFT.fft!($(rust_instance), data) setup=(data=copy(v)) evals=1
BenchmarkTools.Trial: 5304 samples with 1 evaluation.
 Range (min … max):  841.375 μs …  1.117 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     890.855 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   890.873 μs ± 15.784 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                             ▄█▇█▆▃
  ▁▂▂▂▂▁▁▁▁▂▂▂▁▁▁▁▁▁▁▁▂▃▄▅▅▇████████▆▄▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  841 μs          Histogram: frequency by time          941 μs <

 Memory estimate: 64 bytes, allocs estimate: 2.

julia> @benchmark mul!(data, fftw_plan!, data) setup=(data=copy(v)) evals=1
BenchmarkTools.Trial: 8733 samples with 1 evaluation.
 Range (min … max):  467.458 μs … 893.875 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     468.583 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   475.325 μs ±  46.382 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▃                                                            ▁
  ███▇▆▆▅▄▅▃▄▄▄▄▃▁▁▁▁▃▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▆▇ █
  467 μs        Histogram: log(frequency) by time        822 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
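
For readers wondering why the three lengths above behave so differently, the following sketch factorizes each of them. It is not part of the original benchmarks. `factorize` is a hypothetical trial-division helper, and `nextprod` is the same Base function used above.

```julia
# Hypothetical helper: trial-division factorization, fine for numbers this small.
function factorize(n::Int)
    factors = Int[]
    d = 2
    while d * d <= n
        while n % d == 0
            push!(factors, d)
            n ÷= d
        end
        d += 1
    end
    n > 1 && push!(factors, n)
    return factors
end

# The three benchmark lengths: a power of 2, slightly off a power of 2,
# and the next length built only from the factors 2, 3, 5 and 7.
sizes = [1 << 16, (1 << 16) + 7, nextprod((2, 3, 5, 7), (1 << 16) + 1)]
for n in sizes
    println(n, " = ", join(factorize(n), " × "))
end
```

The second length, 65543, is prime, so it cannot be split into smaller transforms the way the other two lengths can. That is consistent with it being the slowest case for both libraries.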

stevengj · May 29, 2023, 8:00pm

If you care about performance, you should generally pass a planner flag to enable MEASURE or PATIENT mode. Also, for 1d transforms FFTW may be faster out-of-place with a preallocated output array.
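
A minimal sketch of that suggestion, assuming FFTW.jl is installed. The array size is arbitrary, and planning on a scratch copy is a precaution because MEASURE and PATIENT may overwrite the array handed to the planner.

```julia
using FFTW, LinearAlgebra

v = randn(ComplexF64, 1 << 16)
out = similar(v)  # preallocated output buffer

# Plan on a scratch copy: planning with MEASURE or PATIENT may overwrite
# the array passed to the planner.
plan = plan_fft(copy(v); flags=FFTW.MEASURE)

# Out-of-place transform into the preallocated buffer; v is left untouched.
mul!(out, plan, v)
```

To benchmark this variant, time the `mul!(out, plan, v)` call in the same way as the in-place calls above.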

giordano · May 29, 2023, 8:34pm

I had tried PATIENT for the 2^16 vector and didn’t find any significant difference.

Side note: I did my benchmarks on aarch64 Darwin. I don’t know how well RustFFT is optimised for this platform, or whether we need to turn on special flags in our build on Yggdrasil.

stevengj · May 29, 2023, 9:34pm

Julia’s FFTW build for aarch64 is missing planner support due to a missing configure flag, if I recall correctly, so that’s why MEASURE made no difference.

giordano · May 29, 2023, 9:59pm

On x86_64

julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, haswell)
  Threads: 1 on 8 virtual cores

Code:

using BenchmarkTools, RustFFT, FFTW, LinearAlgebra

function bench(N)
    v = randn(ComplexF64, N)
    rust_planner64 = RustFFT.FftPlanner64()
    rust_instance = RustFFT.plan_fft_forward(rust_planner64, UInt(length(v)))
    fftw_plan! = FFTW.plan_fft!(copy(v); flags=FFTW.PATIENT)
    println("RustFFT:")
    display(@benchmark RustFFT.fft!($(rust_instance), data) setup=(data=copy($(v))) evals=1)
    println("FFTW:")
    display(@benchmark mul!(data, $(fftw_plan!), data) setup=(data=copy($(v))) evals=1)
end

Benchmarks:

julia> bench(1<<16)
RustFFT:
BenchmarkTools.Trial: 7155 samples with 1 evaluation.
 Range (min … max):  540.918 μs …  1.741 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     569.428 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   579.482 μs ± 48.960 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▄█▅▄▇▃▁                                                     
  ▂▄████████▆▆▅▄▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  541 μs          Histogram: frequency by time          801 μs <

 Memory estimate: 64 bytes, allocs estimate: 2.
FFTW:
BenchmarkTools.Trial: 7151 samples with 1 evaluation.
 Range (min … max):  506.168 μs …  1.013 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     573.886 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   583.459 μs ± 33.517 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                ▄█▆▄▁                                           
  ▂▂▂▂▁▁▁▂▁▂▂▃▄▅█████▇▆▇▆▇█▆█▇▅▅▄▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  506 μs          Histogram: frequency by time          715 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> bench(1<<16 + 7)
RustFFT:
BenchmarkTools.Trial: 1458 samples with 1 evaluation.
 Range (min … max):  3.055 ms …   4.598 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.263 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.296 ms ± 153.412 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▂▁       ▃▄▆██▆▅▄▄▃▂▂▁   ▁                                 ▁
  ████▆▇▇▅▅████████████████▇████▇▅▅▅▄▄▄▅▅▅▆▇▇▇▆▁▄▅▁▁▄▁▅▄▅▄▄▄▄ █
  3.06 ms      Histogram: log(frequency) by time      3.91 ms <

 Memory estimate: 64 bytes, allocs estimate: 2.
FFTW:
BenchmarkTools.Trial: 1297 samples with 1 evaluation.
 Range (min … max):  3.619 ms …   6.162 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.699 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.729 ms ± 134.475 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

        █▇▁▁                                                   
  ▃▃▃▃▃▆████▇▇▇▅▅▄▄▃▃▃▃▃▂▃▂▂▂▂▂▂▂▁▁▁▁▂▁▂▂▁▁▂▃▂▃▁▂▂▂▂▂▁▂▂▁▁▁▁▂ ▃
  3.62 ms         Histogram: frequency by time        4.14 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> bench(nextprod((2, 3, 5, 7), 1<<16 + 1))
RustFFT:
BenchmarkTools.Trial: 6508 samples with 1 evaluation.
 Range (min … max):  615.043 μs …  1.369 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     650.530 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   656.128 μs ± 36.265 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

           ▂▄█▅                                                 
  ▂▂▂▃▃▄▄▆▇████▇▆▆▄▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▁▁▂▁▂▂▂▂▂▂▂▂▂▂ ▃
  615 μs          Histogram: frequency by time          800 μs <

 Memory estimate: 64 bytes, allocs estimate: 2.
FFTW:
BenchmarkTools.Trial: 5800 samples with 1 evaluation.
 Range (min … max):  710.186 μs …  1.572 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     730.624 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   763.972 μs ± 79.095 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄█▆▆▄▂▂▂▁▃▄▄▃▃▃▂▁▁                                           ▂
  ████████████████████▇▇▇█▇▇▆▆▆▆▇▇▇▇██▇██▇▆▆▅▅▁▄▃▃▄▅▄▄▅▃▃▅▃▆▇▅ █
  710 μs        Histogram: log(frequency) by time      1.14 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

Here RustFFT seems to beat FFTW on non-powers-of-2 sizes.

jling · May 29, 2023, 10:06pm

Is the main advantage that it’s not GPL?

Taaitaaiger · May 30, 2023, 1:07pm

It should be. The library is always compiled with the release flag, and the neon feature should be enabled by default:

On AArch64, the neon feature enables compilation of Neon-accelerated code. This requires rustc 1.61 or newer, and is enabled by default. If this feature is disabled, rustc 1.37 or newer is required.

On other platforms than AArch64, this feature does nothing and RustFFT will behave like it is not set.

There is some additional overhead. In particular, the array and planner are tracked while the FFT is computed to avoid creating multiple mutable references, and the returned data is boxed and requires building the DataType. That should be a reasonably constant amount of overhead per call, though.

stevengj · May 30, 2023, 2:01pm

FFTW also seems a lot more general. It looks like RustFFT doesn’t currently have multidimensional transforms, real-input transforms, DCTs/DSTs, or multi-threading?

Taaitaaiger · May 30, 2023, 7:10pm

Yeah, RustFFT does not support those directly as far as I’m aware. There is RealFFT, written by a contributor to RustFFT; it might be nice to add support for that as well.

For this first release I decided I wanted to only release a PoC to show that a binding library could be written using jlrs and that most of the necessary glue code could be generated automatically.
