Achieving C++ Speeds for a CFD Algorithm: Julia spends too much time on pre-allocation?

That documentation is out of date and the comment about O3 removed. The SLP optimizer pass is enabled by default now.

5 Likes

Hi -

Thanks to those of you who helped me make sense of the compilation/pre-allocation questions I had earlier. Once I found some time to make changes to the flow of my program, I have come up with the following logical flow.

The following code is in a file called main.jl

include("GaussSiedel.jl"); #
include("functions.jl"); # 
include("Loop.jl"); # these are files which contain all of my function declarations

# The following are values that don't change.
const delx = 1/64;
const dely = 1/64;
const x = 0:delx:1;
const y = 0:dely:1;
const dV = delx*dely;
const Re = 100;
const Co = 0.9;
const ncol = length(x);
const nrow = length(y);
const nc = nrow*ncol;
const interiorCells = nc - (2*nrow + 2*ncol - 4);
# In the following line, I initialize and pre-allocate all of the arrays in the problem.
u,v,oldu,oldv,deltau,deltav,Convx,Convy,Diffx,Diffy,Divergence,p,residue,gradxP,gradyP = initializeArrays(delx,dely);

# Here, I declare the "main" function for my program.
function main(u,v,oldu,oldv,deltau,deltav,Convx,Convy,Diffx,Diffy,Divergence,p,residue,gradxP,gradyP)
deltaVL2Norm = 1;
EnforceBoundaryConditions(u,v,p,nrow,ncol,nc);
coefficients = stencilFunction(delx,dely);
# Here is the while loop in which my function "Loop" is called repeatedly. All substantative calculations happen in the Loop function.
while deltaVL2Norm > 1e-8
	deltaVL2Norm = Loop(deltaVL2Norm,u,v,oldu,oldv,deltau,deltav,Convx,Convy,Diffx,Diffy,Divergence,p,residue,gradxP,gradyP);
end
printResults(u,v,p,nrow,ncol);
end

The way I run my code is by first typing include("main.jl") at the Julia REPL. This, I believe, pre-allocates arrays, defines constants, and (crucially) declares the function main(...) which in turn declares Loop(...). Next, I type main() together with all of its arguments at the Julia REPL, which executes everything.

Currently, this method gives me fairly good performance and accurate results compared to a benchmark CFD problem for coarse grid sizes. For finer grid sizes, the iterative solver takes too many iterations. While this is most probably my own fault, it is certainly taking much, much more time than C++ does for this problem.

I have the following questions:

  • When I execute main(...) once, the function is being executed for the first time and so it is also compiled at this time. However, I can’t execute main(...) again because now, all my variables are already at their correct values and the Loop() function will only be called once, since all variables satisfy the physical solution to this problem. One solution to this could be to place the initializing line (which resets all variables to zero)
    u,v,oldu,oldv,deltau,deltav,Convx,Convy,Diffx,Diffy,Divergence,p,residue,gradxP,gradyP = initializeArrays(delx,dely);
    inside the main() function. This way, when I call main(...) a second time, it will be re-solving the problem from scratch, and it should not be being compiled and so should be much faster. BUT when I do this, the following things happen. Why?
  1. It takes the same time in the second try as in the first, so my intuition about compiling of main(...) is wrong.
  2. Curiously, the solution is weird. The Poisson solver for pressure (an iterative Gauss-Siedel method that occurs in each time step) no longer converges in 2 iterations toward the end of the algorithm (when things are reaching steady state and the pressure equation should be near-immediately satisfied, physically) and instead always takes at least 28 iterations. This is extremely weird and I cannot see why placing the function initializeArrays(delx,dely) inside main(...) should make any difference.
  • It appears to make no difference to my code whether or not the arrays u,v,oldu,oldv etc. are included as arguments to main(..) or not. Even when I define main() to have no arguments, the code works exactly the same. What’s going on? Are my arrays (which are the crucial variables which the Loop() function operates on and modifies at each time step) living in the ‘right’ place, or are they floating around in relatively inaccessible parts of the memory?

  • Does it make sense to define constants using the keyword const in my program?

  • I am finding it harder and harder to stick to my original goal with this project, which (as the title of the original post states) was to achieve C++ speeds for an identical algorithm in Julia, at least once the code has been compiled. However, as written, there appears to be no good way to do this if I put everything into functions, as I believe I should.

Any other advice about how to improve my code, make it more ‘Julian’, and improve things like memory management would be highly appreciated.

  • It does make sense to define your consts
  • Your deltaVL2Norm should probably start out as a Float64, i.e., deltaVL2Norm = 1.0
  • If Loop takes a long time to run, the dynamic dispatch that happens due to lookup of the global variables in the arguments is probably insignificant in comparison to the runtime of Loop, hence you might not see a speedup when sending the arrays as arguments to main.
  • I can not see why it should make any difference initializing the arrays inside main, I have no idea what the function initializeArrays does.