Optimizing nested loops with conditional inside

Hello, I’m trying to build hydrodynamic codes in 1D, 2D and 3D in Julia. So far I have the codes working with first order solvers, and I’ve managed to make the 1D and 2D faster than Python and C++, now I’m trying to implement a new solver called HLL (and HLLC) and I’ve been using @turbo to improve the performance of the loops I have in the code, but it seems that I can’t use @turbo if I have an if statement inside the for. This is the code I need to optimize:

function p2f(F,UPr)
    F[1] = UPr[1]*UPr[2]
    F[2] = UPr[1]*UPr[2]*UPr[2] + UPr[4]
    F[3] = UPr[1]*UPr[2]*UPr[3]
    F[4] = UPr[2]*(0.5*UPr[1]*(UPr[2]^2 + UPr[3]^2) + 
        gamma/(gamma-1)*UPr[4])
end
function p2g(F,UPr)
    F[1] = UPr[1]*UPr[3]
    F[2] = UPr[1]*UPr[2]*UPr[3]
    F[3] = UPr[1]*UPr[3]*UPr[3] + UPr[4]
    F[4] = UPr[3]*(0.5*UPr[1]*(UPr[2]^2 + UPr[3]^2) + 
        gamma/(gamma-1)*UPr[4])
end
function HLLFluxes(U,F,G,UPrim)
    FL = zeros(Float64, neq)
    FR = zeros(Float64, neq)
    GL = zeros(Float64, neq)
    GR = zeros(Float64, neq)
    wavespeed(UPrim)
    for i in 1:nx+1
        for j in 1:ny+1
            if sl[i,j] >= 0
                @views p2f(F[:,i,j],UPrim[:,i,j])
            elseif sr[i,j] <= 0
                @views p2f(F[:,i,j],UPrim[:,i+1,j])
            else
                @views p2f(FL,UPrim[:,i,j])
                @views p2f(FR,UPrim[:,i+1,j])
                @views F[:,i,j] = (sr[i,j]*FL[:] .- sl[i,j]*FR[:] .+ sl[i,j]*sr[i,j]*(U[:,i+1,j] .- U[:,i,j]))./(
                sr[i,j]-sl[i,j])
            end
            if sd[i,j] >= 0
                @views p2g(G[:,i,j],UPrim[:,i,j])
            elseif su[i,j] <= 0
                @views p2g(G[:,i,j],UPrim[:,i,j+1])
            else
                @views p2g(GL,UPrim[:,i,j])
                @views p2g(GR,UPrim[:,i,j+1])
                @views G[:,i,j] = (su[i,j]*GL[:] .- sd[i,j]*GR[:] .+ sd[i,j]*su[i,j]*(U[:,i,j+1] .- U[:,i,j]))./(
                su[i,j]-sd[i,j])
            end
        end
    end
end

The first two functions only make the code more readable, the third function has the conditions of when the code takes the fluxes from the “left” or the “right” but since i need this condition, I can’t use the turbo macro to accelerate the code. I also tried doing A = sl[i,j] .>= 0 to get a BitMatrix and do the operations from p2f and p2g but didn’t improve the performance, so is there a way to use turbo with an if statement or there’s another way to ptimize the function HLLFluxes? Thanks in advance.

1 Like

You can try using ifelse to do a branchless if/else. It works by evaluating both branches. You can also use min/max…
These are basic branchless ops. I believe

@views p2f(FL,UPrim[:,i,j])
@views p2f(FR,UPrim[:,i+1,j])

and

@views p2g(GL,UPrim[:,i,j])
@views p2g(GR,UPrim[:,i,j+1])

could be taken outside then you do some ifelse or min/max arithmetic to get results without branch, then you could use Turbo. Not sure if it would help though, it depends on how frequently the code goes through each branch.

Yes, having conditionals in inner loops of finite-difference codes can really hurt performance (as we’ve found in other languages and problems too). If at all possible, I generally try to put these conditionals outside the loops: divide your domain into rectangular “chunks” where the conditionals are constants. If your conditionals are just there to handle boundary conditions, then it is better to use “ghost cells” for the boundary conditions and loop only over the interior of the domain without conditionals.

On a separate note, if things like UPrim[:,i,j] are N-component arrays in N dimensions, representing 1d/2d/3d vectors for velocity etcetera, I would strongly consider using StaticArrays instead.

Sorry my ignorance, but what is a branchless ifelse? Is it a ? b : c or the function ifelse(cond, b, c) or something else?

Yeah I’m trying to think a way to put the conditionals outside de for, I was thinking something like the thing I put at the end, doing A = sl[I,j] .<= 0 and then B = sr[I,j] .>= 0 doing the left or right or the average depending on these matrixes, (I haven thought this much, I’m afk) but I thing is a bit involve and I’m not sure if it would really improve performance

Also, the UPrim[1:4,1:nx+2,1:ny+2] has the primitives (the 1:4 are density, v_x, v_y and pressure in that order) in all the extension of the grid, and I checked the link you provided and it says that is recommended for dimensions <100 and the nx and ny are normally bigger than that ( between 200 and 1000 usually ) so I don’t know if that would be an issue

The ifelse function.

You would only use a StaticArray to replace the first dimension 1:4. That is, you would have a 2d (x by y) array of 4-component SVectors. This should enormously speed up your 4-component vector operations, which you are currently doing by slicing.

Ok so I tried using the ifelse like so:

@turbo for i in 1:nx+1
        for j in 1:ny+1
            @views p2f(FL,UPrim[:,i,j])
            @views p2f(FR,UPrim[:,i+1,j])
            @views F[:,i,j] = (sr[i,j]*FL[:] .- sl[i,j]*FR[:] .+ sl[i,j]*sr[i,j]*(U[:,i+1,j] .- U[:,i,j]))./(
                sr[i,j]-sl[i,j])
            @views F[:,i,j] = ifelse(sr[i,j] <= 0,FR[:],F[:,i,j])
            @views F[:,i,j] = ifelse(sl[i,j] >= 0,FL[:],F[:,i,j])
            @views p2g(GL,UPrim[:,i,j])
            @views p2g(GR,UPrim[:,i,j+1])
            @views G[:,i,j] = (su[i,j]*GL[:] .- sd[i,j]*GR[:] .+ sd[i,j]*su[i,j]*(U[:,i,j+1] .- U[:,i,j]))./(
                su[i,j]-sd[i,j])
            @views G[:,i,j] = ifelse(su[i,j] <= 0, GR[:], G[:,i,j])
            @views G[:,i,j] = ifelse(sd[i,j] >= 0, GL[:], G[:,i,j])
        end
    end

but apparently turbo cant handle the p2f and p2g functions, any idea why?

LoadError: Don't know how to handle expression.
p2f
in expression starting at In[23]:7

Note that the right hand side of this allocates a new array because you aren’t using .=. An even better fix is to use SVector as I suggested above.

Oh, I didn’t think of that, I’ll try that thx

Unfortunately I have to check the condition in all the domain to be sure if the flow of the conserved variables come from the left or right cells (up or down also) and since I’m doing the code to try to see the evolution of gas in presence of several explosions, I don’t know where the conditions are fulfilled.