Hi experts, I'm just learning Julia, but I'm in trouble. The 2D turbulence simulation code below, which I translated from Matlab, uses too much memory: about 32 GB in my simulation environment. I found some information about in-place FFTs for saving memory, but it is not working well, I mean it is not saving memory. I would appreciate your help. Sorry for my bad English.
using FFTW
using Printf
using LinearAlgebra
using MAT
# set parameters
N = 128 ; # N*N grid,
dt = 0.02 ; # timestep
simutimeSeconds = 10 ;
HowOftenSave = 10 ; # save velocity field per 'HowOftenSave' timesteps
HowOftenVisu = 10 ; # show illustration per 'HowOftenVisu' timesteps
simutimeSteps = round(Int, simutimeSeconds/dt); # total number of timesteps (integer, for the loop below)
nu = 1/1000 ; # kinematic viscosity [nu] = m^2/s
L = 2*pi ; # domain side length
dx = L/N ; # grid spacing assumed dx=dy
Umax = 1 ;
F_0 = 0.03 ;
# create fields
x = (L-dx)*(0:(N-1))/(N-1) ; # to be precise, dx is subtracted here
y = (L-dx)*(0:(N-1))/(N-1) ; # otherwise the derivative of e.g. sin(x) wouldn't be differentiable on this periodic grid
X=zeros(N,N);
Y=zeros(N,N);
for jx=1:N
for ix=1:N
X[ix,jx]=x[jx]
Y[ix,jx]=y[ix]
end
end
# wavenumbers
kx1 = mod.(1/2 .+ (0:(N-1))/N, 1) .- 1/2;
ky1 = mod.(1/2 .+ (0:(N-1))/N, 1) .- 1/2;
kx = kx1*(2*pi/dx); # wavenumbers
ky = ky1*(2*pi/dx); # wavenumbers
KX=zeros(N,N);
KY=zeros(N,N);
for jkx=1:N
for ikx=1:N
KX[ikx,jkx]=kx[jkx]
KY[ikx,jkx]=ky[ikx]
end
end
# Anti-aliasing filter based on the 2/3*kNyq rule
AA = (abs.(KX) .< (2/3) * maximum(kx)) .* (abs.(KY) .< (2/3) * maximum(ky))
# Runge-Kutta 4 coefficients
a=[1/6 1/3 1/3 1/6]; b=[1/2 1/2 1 1];
#-- Taylor-Green --#
kmax = 4;
U = -Umax .* cos.(kmax .* X) .* sin.(kmax .* Y)
V = Umax .* sin.(kmax .* X) .* cos.(kmax .* Y)
U = U + randn(N,N)*F_0;
V = V + randn(N,N)*F_0;
Uhat = fft(U)
Vhat = fft(V)
Uhat = Uhat - (KX .* Uhat + KY .* Vhat) .* KX ./ (KX .^ 2 + KY .^ 2) # make the initial field divergence-free (projection)
Uhat[isnan.(Uhat)] .= 0
Vhat = Vhat - (KX .* Uhat + KY .* Vhat) .* KY ./ (KX .^ 2 + KY .^ 2)
Vhat[isnan.(Vhat)] .= 0
# U = p_ifft * Uhat;
# V = p_ifft * Vhat;
matwrite("uv0000.mat", Dict("Uhat" => U, "Vhat" => V))
dUdx = zeros(N, N)
dUdy = zeros(N, N)
dVdx = zeros(N, N)
dVdy = zeros(N, N)
dUconv = complex(zeros(N, N))
dVconv = complex(zeros(N, N))
dUdiff = complex(zeros(N, N))
dVdiff = complex(zeros(N, N))
dU = complex(zeros(N, N))
dV = complex(zeros(N, N))
Uc = complex(zeros(N, N))
Vc = complex(zeros(N, N))
Uold = complex(zeros(N, N))
Vold = complex(zeros(N, N))
p_fft_U = plan_fft(U; flags=FFTW.ESTIMATE)
p_ifft_Uhat = plan_ifft(Uhat; flags=FFTW.ESTIMATE)
p_fft_V = plan_fft(V; flags=FFTW.ESTIMATE)
p_ifft_Vhat = plan_ifft(Vhat; flags=FFTW.ESTIMATE)
p_ifft_dUdx = plan_ifft((1im * KX .* AA .* Uc); flags=FFTW.ESTIMATE)
p_ifft_dUdy = plan_ifft((1im * KY .* AA .* Uc); flags=FFTW.ESTIMATE)
p_ifft_dVdx = plan_ifft((1im * KX .* AA .* Vc); flags=FFTW.ESTIMATE)
p_ifft_dVdy = plan_ifft((1im * KY .* AA .* Vc); flags=FFTW.ESTIMATE)
p_fft_dUconv = plan_fft((dt*(-U .* dUdx -V .* dUdy) / 2); flags=FFTW.ESTIMATE)
p_fft_dVconv = plan_fft((dt*(-U .* dVdx -V .* dVdy) / 2); flags=FFTW.ESTIMATE)
# Navier-Stokes simulation begins
@time begin
for t=1:simutimeSteps
@printf("| Time step # %04d |\n",t)
#SolveNavierStokes2D;
Uold = Uhat;
Vold = Vhat;
Uc = Uhat;
Vc = Vhat;
for rk in 1:4
U = real.(p_ifft_Uhat*(AA .* Uc))
V = real.(p_ifft_Vhat*(AA .* Vc))
dUdx .= real.(p_ifft_dUdx * (1im * KX .* AA .* Uc));
dUdy .= real.(p_ifft_dUdy * (1im * KY .* AA .* Uc));
dVdx .= real.(p_ifft_dVdx * (1im * KX .* AA .* Vc));
dVdy .= real.(p_ifft_dVdy * (1im * KY .* AA .* Vc));
dUconv .= p_fft_dUconv*(dt*(-U .* dUdx -V .* dUdy) / 2);
dVconv .= p_fft_dVconv*(dt*(-U .* dVdx -V .* dVdy) / 2);
dUdiff .= nu * dt * ((-KX .* KX .* Uc) + (- KY .* KY .* Uc)) ;
dVdiff .= nu * dt * ((-KX .* KX .* Vc) + (- KY .* KY .* Vc)) ;
dU .= dUconv + dUdiff;
dV .= dVconv + dVdiff;
Uhat = Uhat-(KX.*Uhat + KY.*Vhat).*KX./(KX.^2 + KY.^2);
Uhat[isnan.(Uhat)] .= 0;
Vhat = Vhat-(KX.*Uhat + KY.*Vhat).*KY./(KX.^2 + KY.^2);
Vhat[isnan.(Vhat)] .= 0;
if rk < 4
Uhat = Uold + b[rk] * dU
Vhat = Vold + b[rk] * dV
end
Uc = Uc + a[rk] * dU;
Vc = Vc + a[rk] * dV;
end
Uhat = Uc
Vhat = Vc
Uhat = Uhat - ((KX .* Uhat + KY .* Vhat) .* KX ./ (KX .^ 2 + KY .^ 2))
Uhat[isnan.(Uhat)] .= 0
Vhat = Vhat - ((KX .* Uhat + KY .* Vhat) .* KY ./ (KX .^ 2 + KY .^ 2))
Vhat[isnan.(Vhat)] .= 0
U = real.(p_ifft_Uhat*Uhat);
V = real.(p_ifft_Vhat*Vhat);
# SaveInstantField2D
if mod(t, HowOftenSave) == 0
filename = "uv$(@sprintf("%04d", t)).mat"
matwrite(filename, Dict("U" => U, "V" => V))
end
end
end
It seems that you are running your code in global scope, which is terrible for performance and will lead to excessive allocations. Make sure that your code is wrapped in a function when you run it.
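In concrete terms that just means something like this skeleton (with the body elided and the parameter list made up), so that only the final call lives at global scope:

using FFTW

function run_simulation(; N = 128, dt = 0.02, simutimeSeconds = 10, nu = 1/1000)
    simutimeSteps = round(Int, simutimeSeconds / dt)
    # ... build grids, wavenumbers, FFT plans and work buffers here ...
    for t in 1:simutimeSteps
        # ... one time step of the solver here ...
    end
    return nothing
end

run_simulation()   # everything above now runs in local scope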
Also, I'm not a differential equation expert, but the references to "Runge-Kutta coefficients" in your code make me suspect that you are rolling your own RK4 implementation here? If that's the case, I would concur with Oscar's advice in a thread a few weeks ago:
Thanks for the advice.
I have tried to enclose all code in functions.
However, nothing has changed.
Specifically, I wrapped everything from “Setparameter” to “SaveInstantFields2D” in functions.
Then I call that function to run the simulation.
I honestly want to use the RK4 from DifferentialEquations.jl, but my lab's RK4 code progresses in a slightly unusual way, so I cannot use the RK4 from the package…
There are a few performance gotchas in your code that can be easily addressed:
The FFT plans you created allocate extra memory whenever you apply them with the * operator. To avoid that, you should preallocate storage for those results and use the mul! function to evaluate the FFT into a buffer (see the short sketch after this list). This is mentioned in the docs here (which unfortunately are separated from the docs of FFTW.jl): API · AbstractFFTs.jl
I see you sprinkled in many dots (.) to broadcast operations; however, there are a few lines where you missed one or two. Perhaps consider the @. macro instead, which automatically applies . to every operation in the expression it precedes, see Arrays · The Julia Language
Try following the performance tips that were linked earlier. In particular try wrapping everything (including the non-constant parameters) into a function.
Try using tools like BenchmarkTools.jl, PProf.jl or ProfileView.jl to get detailed information on where your code is bottlenecked.
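For example, the dUdx line from your loop could look roughly like this with a pre-allocated buffer, mul!, and @. (the names dUdx_hat and dUdx_cplx are just illustrative, and the arrays below are stand-ins for your real ones):

using FFTW, LinearAlgebra

N  = 128
KX = randn(N, N); AA = trues(N, N)          # stand-ins for the wavenumber and filter arrays
Uc = fft(randn(N, N))

dUdx_hat  = similar(Uc)                     # pre-allocated Fourier-space buffer
dUdx_cplx = similar(Uc)                     # pre-allocated complex output of the inverse FFT
dUdx      = zeros(N, N)
p_ifft    = plan_ifft(Uc; flags = FFTW.ESTIMATE)

@. dUdx_hat = 1im * KX * AA * Uc            # one fused broadcast, no temporaries
mul!(dUdx_cplx, p_ifft, dUdx_hat)           # evaluate the plan into the buffer
dUdx .= real.(dUdx_cplx)                    # in-place assignment of the real part

The same pattern applies to every fft/ifft call inside the time loop, so each transform reuses the same buffers instead of allocating fresh arrays every step.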
I don’t believe this code is doing what it is supposed to do or at least it is horribly inefficient in what it tries to do. I put some annotations below.
To use in-place FFTs, I think it is sufficient to append a "!" to your plan_fft calls.
To apply a plan with no allocations, you have two choices (sketched below):
Use a pre-allocated output array and mul!, e.g. mul!(output_array, p_ifft_dUdx, input_array)
Create an in-place plan with plan_fft!
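Roughly, the two options look like this (the names are placeholders, not your actual arrays):

using FFTW, LinearAlgebra

buf = ComplexF64.(randn(128, 128))          # placeholder Fourier-space data
out = similar(buf)

# Choice 1: out-of-place plan, result written into a pre-allocated array
p = plan_ifft(buf; flags = FFTW.ESTIMATE)
mul!(out, p, buf)

# Choice 2: in-place plan, which overwrites its input
p! = plan_ifft!(buf; flags = FFTW.ESTIMATE)
p! * buf                                    # buf now holds its own inverse transform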
Note that you have lots of other unnecessary memory allocations.
e.g. 1im * KX .* AA .* Uc allocates two arrays, one for 1im * KX and one for the result of the other multiplications (since the 1im * operation is not fused, see e.g. the more dots blog post). If you want to make it fully allocation-free, you would also want to pre-allocate the output, e.g. @. dUdx_fourier = 1im * KX * AA * Uc (@. makes it a lot more readable), then do dUdx .= real.(p_ifft_dUdx * dUdx_fourier) with your in-place plan (via plan_fft!). A combined sketch follows this list.
complex(zeros(N, N)) allocates an array of real zeros, then allocates a new array of complex zeros. To use a single allocation, do zeros(ComplexF64, N, N).
dU .= dUconv + dUdiff allocates an array for the right hand side, because you didn’t use .+
Uhat = Uhat-(KX.*Uhat + KY.*Vhat).*KX./(KX.^2 + KY.^2) allocates, because you didn’t use .+ or .=. (Would have been clearer just to use @.)
Uhat[isnan.(Uhat)] .= 0 allocates an array for the result of isnan.(Uhat)
etc.
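To make that concrete, here is roughly what one slice of the update could look like with single-allocation buffers, @., an in-place plan, and no temporary NaN mask (the data below are stand-ins, and this is only a sketch of the pattern, not a drop-in replacement for your RK step):

using FFTW

N  = 128
KX = randn(N, N); KY = randn(N, N); AA = trues(N, N)   # stand-ins for the wavenumber / filter arrays
nu = 1/1000; dt = 0.02

Uhat = fft(randn(N, N)); Vhat = fft(randn(N, N))
Uc   = copy(Uhat)
dUconv = zeros(ComplexF64, N, N)        # one allocation, instead of complex(zeros(N, N))
dUdiff = zeros(ComplexF64, N, N)
dU     = zeros(ComplexF64, N, N)
U      = zeros(N, N)
p_ifft! = plan_ifft!(Uc; flags = FFTW.ESTIMATE)        # in-place plan

@. dUdiff = -nu * dt * (KX^2 + KY^2) * Uc              # fully fused, no temporaries
@. dU     = dUconv + dUdiff                            # in place, no right-hand-side array
@. Uhat   = Uhat - (KX * Uhat + KY * Vhat) * KX / (KX^2 + KY^2)
@. Uhat   = ifelse(isnan(Uhat), 0, Uhat)               # no temporary isnan mask array
@. Uc     = AA * Uhat                                  # fill the Fourier-space buffer
U        .= real.(p_ifft! * Uc)                        # transform in place, copy out the real part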
PS. There is no need for a semicolon (;) after every statement.
PPS. In general, porting Matlab code line by line will result in suboptimal Julia code. You take full advantage of Julia by writing in a Julia style, not in a Matlab style.
PPPS. If your FFT outputs are always real, i.e. your FFT inputs have conjugate symmetry, then you can save a factor of two in time and memory by using the rfft interface (where you only store the non-redundant half in the Fourier domain). This is a little trickier to work with, however, so I would put that off until you’ve gotten everything else going smoothly and have a better understanding of Julia.
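A minimal sketch of what the rfft-based transforms could look like (placeholder data; the wavenumber arrays would also need to be built on the reduced grid):

using FFTW

N = 128
U = randn(N, N)

Uhat_r  = rfft(U)                         # (N ÷ 2 + 1) × N array: only the non-redundant half is stored
p_rfft  = plan_rfft(U; flags = FFTW.ESTIMATE)
p_irfft = plan_irfft(Uhat_r, N; flags = FFTW.ESTIMATE)   # needs the original first dimension N

Uhat_r .= p_rfft * U                      # forward transform (mul! into a buffer works here too)
U      .= p_irfft * Uhat_r                # back to a real N × N field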