Help making function faster

favba · May 18, 2017, 9:18pm

In order to save memory in my computations I created a module with array wrappers (named TransformableFields) and in-place rfft! and irfft!.
The module can be found here InPlaceRealFFT.jl. There is no dependencies, so It can be just copied and one is able to use it.
I am using fft’s to compute derivates (among other things), but the inplace version is given much slower results. Here is a sample code

using InPlaceRealFFT

function rfftfreq(n::I,s::R) where {I<:Integer,R<:Real}
  d = 2π*s/n
  [(n/2 - i)/(d*n) for i = n/2:-1:0]
end

function dx(field::A,len::Float64,n=1::Int) where {T<:Union{Float32,Float64},A<:AbstractArray{T,3}}
  nx,ny,nz = size(field)
  fieldhat = rfft(field,1)
  kim = (rfftfreq(nx,len) .* im) .^ n

  Threads.@threads for l = 1:nz
    for j = 1:ny
      for i = 1:length(kim)
        @inbounds fieldhat[i,j,l] = fieldhat[i,j,l]*kim[i]
      end
    end
  end
  return irfft(fieldhat,nx,1)
end

function dx!(field::TransformableField{T,3},len::T,n=1::Int) where T<:Union{Float64,Float32}

  nx,ny,nz = size(field.r)

  rfft!(field,1)

  kim = (rfftfreq(nx,len) .* im) .^ n
  fieldhat = field.c
  nx,ny,nz = size(field)
  Threads.@threads for l = 1:nz
    for j = 1:ny
      for i = 1:nx
        @inbounds fieldhat[i,j,l] = fieldhat[i,j,l]*kim[i]
      end
    end
  end
  return irfft!(field,1)
end

lx=1.0;
a = randt(1024,512,512); # randomly generated TransformableField
b = copy(a.r); # b Simple Array
@time c = dx(b,lx);
@time dx!(a,lx);

If I run this script I have the following output:

21.337080 seconds (1.60 M allocations: 4.088 GiB, 2.14% gc time)
93.240138 seconds (777.40 M allocations: 27.121 GiB, 3.63% gc time)

It seems to me that dx! is spending most of the time in the for loop. I can’t figure out why that is allocating so much.

Probably both functions can be optimised, but I expected that they would give similar results the way they are written now.

This was run using Julia 0.6rc1.

ksmcreynolds · May 19, 2017, 2:18pm

The issue appears to be due to the behavior of threads in 0.6. I’ve had a similar problem (see Multithreading performance regressions in 0.6?). You can solve this by wrapping the threaded loop in its own function and calling that. Doing that I get dx! to be about twice as fast as dx.

Also you should run each function once before timing it or you will include the compilation time that only occurs on the first run.

kristoffer.carlsson · May 19, 2017, 2:32pm

The code that is profiled is inside a function?

favba · May 19, 2017, 7:29pm

Thank you, that was it. I also get dx! to be about twice as fast as dx now.

Topic		Replies	Views
Inplace fft convolution General Usage	2	984	August 13, 2017
About In-line-fft Performance question , fftw	6	712	October 24, 2023
Is multithreaded FFTW on multiple arrays causing too much overhead? Performance fftw	15	882	August 19, 2023
ANN: InplaceRealFFT.jl : inplace real-to-complex and complex-to-real FFT Numerics fftw	25	3307	January 5, 2018
Scaled FFT implementation seems to only use one thread despite setting no. of threads to 8 New to Julia fftw , multithreading	2	491	October 27, 2022

Help making function faster

Related topics