Example C vs Julia vs Python

I have been using Julia for about a year. I am an EE, not a programmer. In one of my graduate EE courses I requested to use Julia instead of C for an assignment and got the task of comparing performance of C, Julia, and Python. As a test case I did a 3D finite difference time domain simulation of electromagnetic wave propagation. I compared timing and got the following results. More detail on the approach is below. Also the code is posted below. Any suggestions for how to speed things up further?

EDIT/Update - Substantial improvements were made through the suggestions in this thread. See this post for a summary of some of the improvements.

Table 1 - Execution Time (Seconds)
Run Number	C	        Julia	     Python
1	        0.410	    22.7	     144.4
2	        0.438	    22.5	     144.6
3	        0.438	    23.7	     146.7
4	        0.428	    21.7	     144.4
5	        0.420	    24.0	     146.8
6	        0.421	    21.8	     144.5
			
Avg    	    0.426	    22.6	     145.2
Ratio to C	1.00	    52.9	     341.1

Test Information:

  1. The grid size is 48 x 47 x 47 (106,032 points). I changed the C code to match this grid size.
  2. MaxTime is 150 (giving total points calculated of 15,904,800). This was enough time for the waves to propagate beyond the edge of the grid.
  3. Computer is a Windows 10 64 bit machine with:
    a. Intel Core i7-6820HQ CPU @ 2.7 GHz
    b. 32 GB of RAM
  4. Julia Version: 1.1.1
  5. Python Version: 3.7.3
  6. MingW/GCC Version: 8.2.0-3
  7. The code in each platform was set to complete the computations and store constant Ex plane results for each time-step to a 3x3x3 array called Eout (dimensions SizeY x SizeZ x MaxTime). No output plots or animations were included. I adapted your C code by removing the snapshot3d related files. I added the output variable (Eout) to the 3ddemo.c file.
  8. I did a detailed evaluation of the Julia output to make sure the results made sense and that the wave behaved as expected.
  9. I compared snapshots of results across all three platforms to make sure each was computing the same result. There were only very small variations (on the order of 0.001% difference between platforms).
  10. I closed all unnecessary applications on my computer before running the timing comparisons, one platform at a time. I waited until CPU usage was below 3% before running any script or code. I then watched the task manager throughout the execution to make sure Windows did not open applications that consumed significant resources. Windows did open burdensome network applications on a couple of occasions so I repeated runs as necessary until I got 6 sets of ā€œundisturbedā€ results for each platform.
  11. Timing of the Julia script was done with a module called ā€œCPUtimeā€ which tracks total CPU time for the script execution.
  12. Timing of the Python script was done with a module called ā€œtimeitā€ which calls the ā€œdefault_timer()ā€ function. This links to ā€œtime.perfcounter()ā€. According to discussions on StackOverflow (see link) this function is a CPU counter. python - Understanding time.perf_counter() and time.process_time() - Stack Overflow.
  13. Where possible I compared the CPU time against ā€œrealā€ time measurements. In all cases, the CPU time and real time were very close. Julia and Python are not set with hyperthreading so CPU times are expected to be close to real time. I tried hyperthreading in Julia, but it did not improve performance. I found a thread in a Julia forum where one of the developers said this was an expected result for Basic Linear Algebra Subprograms (Blas). https://github.com/JuliaLang/julia/issues/6293.
  14. I used the Jupyter Notebook Environment for both the Julia and Python scripts. I also ran the scripts in a separate IDE to make sure that the timing results were not significantly affected by Jupyter.
  15. Timing of the C code was done using a Windows PowerShell command: C:\C_FDTD>powershell ā€œMeasure-Command { .\3ddemo.exe}ā€. The output of this command is real time. A custom option exists to get CPU time, but I do not have rights on my computer to import custom scripts into PowerShell. As mentioned above, the CPU time and real time will be close unless parallel cores are being used. To preclude the possibility of having multiple cores in play I used the task manager to force commands to one core.

Code:

#3D finite difference time domain (FDTD) for an elementary dipole antenna excited by a Ricker Wavelet
#This code only generates tabular results and does not generate the animated plot.

using CPUTime

CPUtic()
start = time()

#Set constants - these are physical domain sizes for the FDTD problem  (nearly a cubic space)
const SizeX = 48
const SizeY = 47
const SizeZ = 47

const MaxTime = 150 #this is the time domain size for the FDTD problem

const Cdtds = 1.0/(sqrt(3.0)) #Courant number
const imp0 = 377.0 #characteristic impedance of free space
const abccoef = (Cdtds - 1.0)/(Cdtds + 1.0) #coefficient associated with boundary conditions
const ppw = 15 #points per wave

#initialize arrays to store electric and magnetic field values in 3D space 
Ex = zeros(Float64,(SizeX,SizeY,SizeZ)) 
Ey = zeros(Float64,(SizeX,SizeY,SizeZ)) 
Ez = zeros(Float64,(SizeX,SizeY,SizeZ)) 

Hx = zeros(Float64,(SizeX,SizeY,SizeZ)) 
Hy = zeros(Float64,(SizeX,SizeY,SizeZ)) 
Hz = zeros(Float64,(SizeX,SizeY,SizeZ)) 

#Define arrays for plotting results.  
Eout = zeros(Float64,(SizeY,SizeZ,MaxTime)) #Ouput will be electric field on a 2D cross section 
tvar = collect(1:MaxTime) #time array
    
#Set electric field update coefficients
Cexe = fill(1.0,(SizeX,SizeY,SizeZ))
Cexh = fill(Cdtds * imp0,(SizeX,SizeY,SizeZ))

Ceye = fill(1.0,(SizeX,SizeY,SizeZ))
Ceyh = fill(Cdtds * imp0,(SizeX,SizeY,SizeZ))

Ceze = fill(1.0,(SizeX,SizeY,SizeZ))
Cezh = fill(Cdtds * imp0,(SizeX,SizeY,SizeZ))

#Set magnetic field update coefficients
Chxh = fill(1.0,(SizeX,SizeY,SizeZ))
Chxe = fill(Cdtds / imp0,(SizeX,SizeY,SizeZ))

Chyh = fill(1.0,(SizeX,SizeY,SizeZ))
Chye = fill(Cdtds / imp0,(SizeX,SizeY,SizeZ))

Chzh = fill(1.0,(SizeX,SizeY,SizeZ))
Chze = fill(Cdtds / imp0,(SizeX,SizeY,SizeZ))

#Initialize temporary variables for absorbing boundary condition (ABC) function
Eyx0 = zeros(Float64, (SizeY-1,SizeZ))
Ezx0 = zeros(Float64, (SizeY,SizeZ-1)) 
Eyx1 = zeros(Float64, (SizeY-1,SizeZ))
Ezx1 = zeros(Float64, (SizeY,SizeZ-1))

Exy0 = zeros(Float64, (SizeX-1,SizeZ))
Ezy0 = zeros(Float64, (SizeX,SizeZ-1))
Exy1 = zeros(Float64, (SizeX-1,SizeZ))
Ezy1 = zeros(Float64, (SizeX,SizeZ-1))

Exz0 = zeros(Float64, (SizeX-1,SizeY))
Eyz0 = zeros(Float64, (SizeX,SizeY-1))
Exz1 = zeros(Float64, (SizeX-1,SizeY))
Eyz1 = zeros(Float64, (SizeX,SizeY-1))

#update magnetic field (H) over all space at a given time step given electric and magnetic fields from the previous time step
#achieved by nested for loops
function updateH(Ex,Ey,Ez,Hx,Hy,Hz)  

    for pp = 1:SizeZ-1
        for nn = 1:SizeY-1
            for mm = 1:SizeX
                    Hx[mm,nn,pp] = (Chxh[mm,nn,pp] * Hx[mm,nn,pp] + 
                        Chxe[mm,nn,pp] * ((Ey[mm,nn,pp+1] - Ey[mm,nn,pp]) - (Ez[mm,nn+1,pp] - Ez[mm,nn,pp])))
    end;end;end


    for pp = 1:SizeZ-1
        for nn = 1:SizeY
            for mm = 1:SizeX-1
                    Hy[mm,nn,pp] = (Chyh[mm,nn,pp] * Hy[mm,nn,pp] + 
                        Chye[mm,nn,pp] * ((Ez[mm+1,nn,pp] - Ez[mm,nn,pp]) - (Ex[mm,nn,pp+1] - Ex[mm,nn,pp])))
    end;end;end
    
    for pp = 1:SizeZ
        for nn = 1:SizeY-1
            for mm = 1:SizeX-1
                    Hz[mm,nn,pp] = (Chzh[mm,nn,pp] * Hz[mm,nn,pp] + 
                        Chze[mm,nn,pp] * ((Ex[mm,nn+1,pp] - Ex[mm,nn,pp]) - (Ey[mm+1,nn,pp] - Ey[mm,nn,pp])))
    end;end;end
    
    return Hx,Hy,Hz
end

#update electric field  (E) over all space at a given time step given electric and magnetic fields from the previous time step
function updateE(Ex,Ey,Ez,Hx,Hy,Hz) #update electric field

    for pp = 2:SizeZ-1
        for nn = 2:SizeY-1
            for mm = 1:SizeX-1
                    Ex[mm,nn,pp] = (Cexe[mm,nn,pp] * Ex[mm,nn,pp] + 
                        Cexh[mm,nn,pp] * ((Hz[mm,nn,pp] - Hz[mm,nn-1,pp]) - (Hy[mm,nn,pp] - Hy[mm,nn,pp-1])))
    end;end;end

    for pp = 2:SizeZ-1
        for nn = 1:SizeY-1
            for mm = 2:SizeX-1
                    Ey[mm,nn,pp] = (Ceye[mm,nn,pp] * Ey[mm,nn,pp] + 
                        Ceyh[mm,nn,pp] * ((Hx[mm,nn,pp] - Hx[mm,nn,pp-1]) - (Hz[mm,nn,pp] - Hz[mm-1,nn,pp])))
    end;end;end
    
    for pp = 1:SizeZ-1
        for nn = 2:SizeY-1
            for mm = 2:SizeX-1
                    Ez[mm,nn,pp] = (Ceze[mm,nn,pp] * Ez[mm,nn,pp] + 
                        Cezh[mm,nn,pp] * ((Hy[mm,nn,pp] - Hy[mm-1,nn,pp]) - (Hx[mm,nn,pp] - Hx[mm,nn-1,pp])))
    end;end;end
    
    return Ex,Ey,Ez
end

#apply absorbing boundary conditions (ABCs) at the boundary planes of all x,y,z dimensions
function abc3dfirst(Ex,Ey,Ez)
    global Eyx0, Ezx0, Eyx1, Ezx1, Exy0, Ezy0, Exy1, Ezy1, Exz0, Eyz0, Exz1, Eyz1
    
 
    # =============================================================================
    #ABC at x boundary plane 1 
    mm = 1
    for pp = 1:SizeZ
        for nn = 1:SizeY-1
            Ey[mm,nn,pp] = Eyx0[nn,pp] + abccoef * (Ey[mm+1,nn,pp] - Ey[mm,nn,pp])
            Eyx0[nn,pp] = Ey[mm+1,nn,pp]
    end;end
    
    for pp = 1:SizeZ-1
        for nn = 1:SizeY
            Ez[mm,nn,pp] = Ezx0[nn,pp] + abccoef * (Ez[mm+1,nn,pp] - Ez[mm,nn,pp])
            Ezx0[nn,pp] = Ez[mm+1,nn,pp]
    end;end
    
    
    #ABC at x boundary plane 2
    mm = SizeX
    for pp = 1:SizeZ
        for nn = 1:SizeY-1
            Ey[mm,nn,pp] = Eyx1[nn,pp] + abccoef * (Ey[mm-1,nn,pp] - Ey[mm,nn,pp])
            Eyx1[nn,pp] = Ey[mm-1,nn,pp]
    end;end
    
    for pp = 1:SizeZ-1
        for nn = 1:SizeY
            Ez[mm,nn,pp] = Ezx1[nn,pp] + abccoef * (Ez[mm-1,nn,pp] - Ez[mm,nn,pp])
            Ezx1[nn,pp] = Ez[mm-1,nn,pp]
    end;end
    
     
    # =============================================================================    
    #ABC at  y boundary plane 1
    nn = 1
    for pp = 1:SizeZ
        for mm = 1:SizeX-1
            Ex[mm,nn,pp] = Exy0[mm,pp] + abccoef * (Ex[mm,nn+1,pp] - Ex[mm,nn,pp])
            Exy0[mm,pp] = Ex[mm,nn+1,pp]
    end;end
    
    for pp = 1:SizeZ-1
        for mm = 1:SizeX
            Ez[mm,nn,pp] = Ezy0[mm,pp] + abccoef * (Ez[mm,nn+1,pp] - Ez[mm,nn,pp])
            Ezy0[mm,pp] = Ez[mm,nn+1,pp]
    end;end
    
 
    
    #ABC at  x boundary plane 2
    nn = SizeY
    for pp = 1:SizeZ
        for mm = 1:SizeX-1
            Ex[mm,nn,pp] = Exy1[mm,pp] + abccoef * (Ex[mm,nn-1,pp] - Ex[mm,nn,pp])
            Exy1[mm,pp] = Ex[mm,nn-1,pp]
    end;end
    
    for pp = 1:SizeZ-1
        for mm = 1:SizeX
            Ez[mm,nn,pp] = Ezy1[mm,pp] + abccoef * (Ez[mm,nn-1,pp] - Ez[mm,nn,pp])
            Ezy1[mm,pp] = Ez[mm,nn-1,pp]
    end;end
    
 
    # =============================================================================    
    #ABC at  z boundary plane 1
    pp = 1
    for nn = 1:SizeY
        for mm = 1:SizeX-1
            Ex[mm,nn,pp] = Exz0[mm,nn] + abccoef * (Ex[mm,nn,pp+1] - Ex[mm,nn,pp])
            Exz0[mm,nn] = Ex[mm,nn,pp+1]
    end;end
    
    for nn = 1:SizeY-1
        for mm = 1:SizeX
            Ey[mm,nn,pp] = Eyz0[mm,nn] + abccoef * (Ey[mm,nn,pp+1] - Ey[mm,nn,pp])
            Eyz0[mm,nn] = Ey[mm,nn,pp+1]
    end;end
    

    
    #ABC at  z boundary plane 2
    pp = SizeZ
    for nn = 1:SizeY
        for mm = 1:SizeX-1
            Ex[mm,nn,pp] = Exz1[mm,nn] + abccoef * (Ex[mm,nn,pp-1] - Ex[mm,nn,pp])
            Exz1[mm,nn] = Ex[mm,nn,pp-1]
    end;end
    
    for nn = 1:SizeY-1
        for mm = 1:SizeX
            Ey[mm,nn,pp] = Eyz1[mm,nn] + abccoef * (Ey[mm,nn,pp-1] - Ey[mm,nn,pp])
            Eyz1[mm,nn] = Ey[mm,nn,pp-1]
    end;end

    return  Ex,Ey,Ez
end

#define the driving source - a Ricker wavelet:  https://en.wikipedia.org/wiki/Mexican_hat_wavelet
function ricker(t,location)
    src = 0
    delay = 5
    if t > delay
        arg = Ļ€ * ((Cdtds * (t - (delay+1)) - location) / ppw - 1)
        src = (1.0 - 2.0 * arg^2) * exp(-(arg^2))
    end
    return src
end


#Main control code block.  Step through applicable functions
for qtime = 1:MaxTime #time stepping loop
    Hx,Hy,Hz = updateH(Ex,Ey,Ez,Hx,Hy,Hz)  #magnetic field update equation
    Ex,Ey,Ez = updateE(Ex,Ey,Ez,Hx,Hy,Hz)  #electric field update equation
    Ex[24,23,23] += ricker(qtime,0)        #update Ricker wavelet source
    Ex,Ey,Ez = abc3dfirst(Ex,Ey,Ez)        #apply absorbing boundary conditions

    Eout[:,:,qtime] = Ex[24,:,:] #Save constant-x plane time domain data to a 3D matrix
    #source[qtime] = ricker(qtime,0)       #source output plot for troubleshooting - normally commented out
end  


CPUtoc()
println("elapsed real time: ", round(time() - start;digits=3)," seconds")
println("Computation Complete")

The corresponding C code can be found on page 252 of:
https://www.eecs.wsu.edu/~schneidj/ufdtd/ufdtd.pdf

Understanding the FDTD Method by John B. Schneider is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Itā€™s hard for people to get much out of this post without being able to see the code. Is there any chance you can link to the different implementations to people can snoop around? I find it suspicious that the julia code is running 55x slower than the C code.


Edit: the original post now has the julia code.

3 Likes

summing the columns instead of the lines?

1 Like

Seems like youā€™re using a lot of non-constant globals. See:

https://docs.julialang.org/en/v1/manual/performance-tips/

5 Likes

So I just quickly went through and wrapped the code in a let block, removed all the const annotations and put @inbounds in front of all the loops and the first time I run the code I get

 elapsed CPU time: 0.668674 seconds
 elapsed real time: 0.799 seconds

and the second time (now that all the functions are compiled) I get

 elapsed CPU time: 0.543445 seconds
 elapsed real time: 0.551 seconds

Hereā€™s my modified code:

using CPUTime

let

    CPUtic()
    start = time()

    #Set constants - these are physical domain sizes for the FDTD problem  (nearly a cubic space)
    SizeX = 48
    SizeY = 47
    SizeZ = 47

    MaxTime = 150 #this is the time domain size for the FDTD problem

    Cdtds = 1.0/(sqrt(3.0)) #Courant number
    imp0 = 377.0 #characteristic impedance of free space
    abccoef = (Cdtds - 1.0)/(Cdtds + 1.0) #coefficient associated with boundary conditions
    ppw = 15 #points per wave

    #initialize arrays to store electric and magnetic field values in 3D space 
    Ex = zeros(Float64,(SizeX,SizeY,SizeZ)) 
    Ey = zeros(Float64,(SizeX,SizeY,SizeZ)) 
    Ez = zeros(Float64,(SizeX,SizeY,SizeZ)) 

    Hx = zeros(Float64,(SizeX,SizeY,SizeZ)) 
    Hy = zeros(Float64,(SizeX,SizeY,SizeZ)) 
    Hz = zeros(Float64,(SizeX,SizeY,SizeZ)) 

    #Define arrays for plotting results.  
    Eout = zeros(Float64,(SizeY,SizeZ,MaxTime)) #Ouput will be electric field on a 2D cross section 
    tvar = collect(1:MaxTime) #time array

    #Set electric field update coefficients
    Cexe = fill(1.0,(SizeX,SizeY,SizeZ))
    Cexh = fill(Cdtds * imp0,(SizeX,SizeY,SizeZ))

    Ceye = fill(1.0,(SizeX,SizeY,SizeZ))
    Ceyh = fill(Cdtds * imp0,(SizeX,SizeY,SizeZ))

    Ceze = fill(1.0,(SizeX,SizeY,SizeZ))
    Cezh = fill(Cdtds * imp0,(SizeX,SizeY,SizeZ))

    #Set magnetic field update coefficients
    Chxh = fill(1.0,(SizeX,SizeY,SizeZ))
    Chxe = fill(Cdtds / imp0,(SizeX,SizeY,SizeZ))

    Chyh = fill(1.0,(SizeX,SizeY,SizeZ))
    Chye = fill(Cdtds / imp0,(SizeX,SizeY,SizeZ))

    Chzh = fill(1.0,(SizeX,SizeY,SizeZ))
    Chze = fill(Cdtds / imp0,(SizeX,SizeY,SizeZ))

    #Initialize temporary variables for absorbing boundary condition (ABC) function
    Eyx0 = zeros(Float64, (SizeY-1,SizeZ))
    Ezx0 = zeros(Float64, (SizeY,SizeZ-1)) 
    Eyx1 = zeros(Float64, (SizeY-1,SizeZ))
    Ezx1 = zeros(Float64, (SizeY,SizeZ-1))

    Exy0 = zeros(Float64, (SizeX-1,SizeZ))
    Ezy0 = zeros(Float64, (SizeX,SizeZ-1))
    Exy1 = zeros(Float64, (SizeX-1,SizeZ))
    Ezy1 = zeros(Float64, (SizeX,SizeZ-1))

    Exz0 = zeros(Float64, (SizeX-1,SizeY))
    Eyz0 = zeros(Float64, (SizeX,SizeY-1))
    Exz1 = zeros(Float64, (SizeX-1,SizeY))
    Eyz1 = zeros(Float64, (SizeX,SizeY-1))

    #update magnetic field (H) over all space at a given time step given electric and magnetic fields from the previous time step
    #achieved by nested for loops
    function updateH(Ex,Ey,Ez,Hx,Hy,Hz)  
        @inbounds for pp = 1:SizeZ-1
            for nn = 1:SizeY-1
                for mm = 1:SizeX
                    Hx[mm,nn,pp] = (Chxh[mm,nn,pp] * Hx[mm,nn,pp] + 
                                    Chxe[mm,nn,pp] * ((Ey[mm,nn,pp+1] - Ey[mm,nn,pp]) - (Ez[mm,nn+1,pp] - Ez[mm,nn,pp])))
                end;end;end


        @inbounds for pp = 1:SizeZ-1
            for nn = 1:SizeY
                for mm = 1:SizeX-1
                    Hy[mm,nn,pp] = (Chyh[mm,nn,pp] * Hy[mm,nn,pp] + 
                                    Chye[mm,nn,pp] * ((Ez[mm+1,nn,pp] - Ez[mm,nn,pp]) - (Ex[mm,nn,pp+1] - Ex[mm,nn,pp])))
                end;end;end
        
        @inbounds for pp = 1:SizeZ
            for nn = 1:SizeY-1
                for mm = 1:SizeX-1
                    Hz[mm,nn,pp] = (Chzh[mm,nn,pp] * Hz[mm,nn,pp] + 
                                    Chze[mm,nn,pp] * ((Ex[mm,nn+1,pp] - Ex[mm,nn,pp]) - (Ey[mm+1,nn,pp] - Ey[mm,nn,pp])))
                end;end;end
        
        return Hx,Hy,Hz
    end

    #update electric field  (E) over all space at a given time step given electric and magnetic fields from the previous time step
    function updateE(Ex,Ey,Ez,Hx,Hy,Hz) #update electric field

        @inbounds for pp = 2:SizeZ-1
            for nn = 2:SizeY-1
                for mm = 1:SizeX-1
                    Ex[mm,nn,pp] = (Cexe[mm,nn,pp] * Ex[mm,nn,pp] + 
                                    Cexh[mm,nn,pp] * ((Hz[mm,nn,pp] - Hz[mm,nn-1,pp]) - (Hy[mm,nn,pp] - Hy[mm,nn,pp-1])))
                end;end;end

        @inbounds for pp = 2:SizeZ-1
            for nn = 1:SizeY-1
                for mm = 2:SizeX-1
                    Ey[mm,nn,pp] = (Ceye[mm,nn,pp] * Ey[mm,nn,pp] + 
                                    Ceyh[mm,nn,pp] * ((Hx[mm,nn,pp] - Hx[mm,nn,pp-1]) - (Hz[mm,nn,pp] - Hz[mm-1,nn,pp])))
                end;end;end
        
        @inbounds for pp = 1:SizeZ-1
            for nn = 2:SizeY-1
                for mm = 2:SizeX-1
                    Ez[mm,nn,pp] = (Ceze[mm,nn,pp] * Ez[mm,nn,pp] + 
                                    Cezh[mm,nn,pp] * ((Hy[mm,nn,pp] - Hy[mm-1,nn,pp]) - (Hx[mm,nn,pp] - Hx[mm,nn-1,pp])))
                end;end;end
        
        return Ex,Ey,Ez
    end

    #apply absorbing boundary conditions (ABCs) at the boundary planes of all x,y,z dimensions
    function abc3dfirst(Ex,Ey,Ez)
        # global Eyx0, Ezx0, Eyx1, Ezx1, Exy0, Ezy0, Exy1, Ezy1, Exz0, Eyz0, Exz1, Eyz1
        
        
        # =============================================================================
        #ABC at x boundary plane 1 
        mm = 1
        @inbounds for pp = 1:SizeZ
            for nn = 1:SizeY-1
                Ey[mm,nn,pp] = Eyx0[nn,pp] + abccoef * (Ey[mm+1,nn,pp] - Ey[mm,nn,pp])
                Eyx0[nn,pp] = Ey[mm+1,nn,pp]
            end;end
        
        for pp = 1:SizeZ-1
            for nn = 1:SizeY
                Ez[mm,nn,pp] = Ezx0[nn,pp] + abccoef * (Ez[mm+1,nn,pp] - Ez[mm,nn,pp])
                Ezx0[nn,pp] = Ez[mm+1,nn,pp]
            end;end
        
        
        #ABC at x boundary plane 2
        mm = SizeX
        @inbounds for pp = 1:SizeZ
            for nn = 1:SizeY-1
                Ey[mm,nn,pp] = Eyx1[nn,pp] + abccoef * (Ey[mm-1,nn,pp] - Ey[mm,nn,pp])
                Eyx1[nn,pp] = Ey[mm-1,nn,pp]
            end;end
        
        @inbounds for pp = 1:SizeZ-1
            for nn = 1:SizeY
                Ez[mm,nn,pp] = Ezx1[nn,pp] + abccoef * (Ez[mm-1,nn,pp] - Ez[mm,nn,pp])
                Ezx1[nn,pp] = Ez[mm-1,nn,pp]
            end;end
        
        
        # =============================================================================    
        #ABC at  y boundary plane 1
        nn = 1
        @inbounds for pp = 1:SizeZ
            for mm = 1:SizeX-1
                Ex[mm,nn,pp] = Exy0[mm,pp] + abccoef * (Ex[mm,nn+1,pp] - Ex[mm,nn,pp])
                Exy0[mm,pp] = Ex[mm,nn+1,pp]
            end;end
        
        @inbounds for pp = 1:SizeZ-1
            for mm = 1:SizeX
                Ez[mm,nn,pp] = Ezy0[mm,pp] + abccoef * (Ez[mm,nn+1,pp] - Ez[mm,nn,pp])
                Ezy0[mm,pp] = Ez[mm,nn+1,pp]
            end;end
        
        
        
        #ABC at  x boundary plane 2
        nn = SizeY
        @inbounds for pp = 1:SizeZ
            for mm = 1:SizeX-1
                Ex[mm,nn,pp] = Exy1[mm,pp] + abccoef * (Ex[mm,nn-1,pp] - Ex[mm,nn,pp])
                Exy1[mm,pp] = Ex[mm,nn-1,pp]
            end;end
        
        @inbounds for pp = 1:SizeZ-1
            for mm = 1:SizeX
                Ez[mm,nn,pp] = Ezy1[mm,pp] + abccoef * (Ez[mm,nn-1,pp] - Ez[mm,nn,pp])
                Ezy1[mm,pp] = Ez[mm,nn-1,pp]
            end;end
        
        
        # =============================================================================    
        #ABC at  z boundary plane 1
        pp = 1
        @inbounds for nn = 1:SizeY
            for mm = 1:SizeX-1
                Ex[mm,nn,pp] = Exz0[mm,nn] + abccoef * (Ex[mm,nn,pp+1] - Ex[mm,nn,pp])
                Exz0[mm,nn] = Ex[mm,nn,pp+1]
            end;end
        
        @inbounds for nn = 1:SizeY-1
            for mm = 1:SizeX
                Ey[mm,nn,pp] = Eyz0[mm,nn] + abccoef * (Ey[mm,nn,pp+1] - Ey[mm,nn,pp])
                Eyz0[mm,nn] = Ey[mm,nn,pp+1]
            end;end
        

        
        #ABC at  z boundary plane 2
        pp = SizeZ
        @inbounds for nn = 1:SizeY
            for mm = 1:SizeX-1
                Ex[mm,nn,pp] = Exz1[mm,nn] + abccoef * (Ex[mm,nn,pp-1] - Ex[mm,nn,pp])
                Exz1[mm,nn] = Ex[mm,nn,pp-1]
            end;end
        
        @inbounds for nn = 1:SizeY-1
            for mm = 1:SizeX
                Ey[mm,nn,pp] = Eyz1[mm,nn] + abccoef * (Ey[mm,nn,pp-1] - Ey[mm,nn,pp])
                Eyz1[mm,nn] = Ey[mm,nn,pp-1]
            end;end

        return  Ex,Ey,Ez
    end

    #define the driving source - a Ricker wavelet:  https://en.wikipedia.org/wiki/Mexican_hat_wavelet
    function ricker(t,location)
        src = 0
        delay = 5
        if t > delay
            arg = Ļ€ * ((Cdtds * (t - (delay+1)) - location) / ppw - 1)
            src = (1.0 - 2.0 * arg^2) * exp(-(arg^2))
        end
        return src
    end

    #Main control code block.  Step through applicable functions
    for qtime = 1:MaxTime #time stepping loop
        Hx,Hy,Hz = updateH(Ex,Ey,Ez,Hx,Hy,Hz)  #magnetic field update equation
        Ex,Ey,Ez = updateE(Ex,Ey,Ez,Hx,Hy,Hz)  #electric field update equation
        Ex[24,23,23] += ricker(qtime,0)        #update Ricker wavelet source
        Ex,Ey,Ez = abc3dfirst(Ex,Ey,Ez)        #apply absorbing boundary conditions

        Eout[:,:,qtime] = Ex[24,:,:] #Save constant-x plane time domain data to a 3D matrix
        #source[qtime] = ricker(qtime,0)       #source output plot for troubleshooting - normally commented out
    end  


    CPUtoc()
    println("elapsed real time: ", round(time() - start; digits=3)," seconds")
    println("Computation Complete")

end

Thereā€™s probably some remaining low-hanging fruit to improve the speed but thatā€™s a good start. I should also mention that this is not written in a recommended julia style, but thatā€™s less important.

16 Likes

Excellent! Would you mind posting the code with your changes? Iā€™m not quite sure what you mean, but will definitely do some research. Thanks!

Thank you for the reference!

I have edited my comment to include the code.

1 Like

Thanks for posting the revised code. You mention recommended Julia style. I did a quick search and came up with this link:

https://docs.julialang.org/en/v1/manual/style-guide/index.html

Any other sources you would recommend?

Donā€™t each of those index calls have to check to see if theyā€™re inbounds? This is probably a bit beyond someone just starting Julia, but if you do appropriate bounds checking at the beginning of all of this then the rest can be considered inbounds.

2 Likes

From 20 minutes of research on this subject I think bounds checking means making sure the index calls donā€™t actually reference an array element that doesnā€™t exist? If so, the FDTD calculations are carefully worked out so the bounds are well established beforehand. Is there a way to disable bounds checking in a more global fashion rather than using @inbounds around every for loop?

What are the specs of your computer? I just ran the revised code on my machine and am still averaging around 25 seconds.

Sorry, I should be more detailed. But Iā€™m writing this on my phone. Iā€™m also having trouble reading the large wall of code on my phone, so Iā€™m hesitant to make any big suggestions that could crash stuff. You can call @inbounds at different levels, not just for loops. It has a docstring you can get in REPL.

4 Likes

Wait, how did you get your original code to run? I get an error when I try to run it because the arrays that arenā€™t declared as constant are out of scope of the functions that try to use them.

Since they are not declared inside a function are they not inherently in the global scope?

Thatā€™s probably good enough, but there are some other style guides out there that people like, such as GitHub - invenia/BlueStyle: A Julia style guide that lives in a blue world. Iā€™m not much one for ā€œstyle guidesā€, but theyā€™re at least helpful in the beginning before you absorb what the rest of the community is doing.

Generally, I would strongly recommend against all these global variables, and writing your loops either in the form

for pp = 1:SizeZ-1, nn = 1:SizeY-1, mm = 1:SizeX
    # ... 
end

or

for pp = 1:SizeZ-1
    for nn = 1:SizeY-1
        for mm = 1:SizeX
            # ...
        end
    end
end

Iā€™d also never write a for loop in the global scope.

Are you sure youā€™re running the revised code? I got the same timings you got on the original code and then my edits brought that down to 0.54 seconds. I suspect you might be accidentally running the old code.

I ran the code on a macbook pro 13 inch 2016 with a 2.3 GHz i5 processor and 8GB of 2133 MHz LPDDR3 ram.

2 Likes

Ah, good call. It wasnā€™t the original code, but it was a block of code that had another issue. Fixed that and I now get the same as you. Thanks. Comparable machines. Is there a way to globally block bound checking without having to do individual @inbound macros on each of the outer for-loops?

You can always startup julia like julia --check-bounds=no to globally turn off bounds checking. I usually prefer to do things like

@inbounds begin
    for pp = 1:SizeZ-1, nn = 1:SizeY-1, mm = 1:SizeX
        Hx[mm,nn,pp] = (Chxh[mm,nn,pp] * Hx[mm,nn,pp] + 
                        Chxe[mm,nn,pp] * ((Ey[mm,nn,pp+1] - Ey[mm,nn,pp]) - (Ez[mm,nn+1,pp] - Ez[mm,nn,pp])))
    end

    for pp = 1:SizeZ-1, nn = 1:SizeY, mm = 1:SizeX-1
        Hy[mm,nn,pp] = (Chyh[mm,nn,pp] * Hy[mm,nn,pp] + 
                        Chye[mm,nn,pp] * ((Ez[mm+1,nn,pp] - Ez[mm,nn,pp]) - (Ex[mm,nn,pp+1] - Ex[mm,nn,pp])))
    end
    
    for pp = 1:SizeZ, nn = 1:SizeY-1, mm = 1:SizeX-1
        Hz[mm,nn,pp] = (Chzh[mm,nn,pp] * Hz[mm,nn,pp] + 
                        Chze[mm,nn,pp] * ((Ex[mm,nn+1,pp] - Ex[mm,nn,pp]) - (Ey[mm+1,nn,pp] - Ey[mm,nn,pp])))
    end
end

if Iā€™m going to be turning off bounds checks on a big chunk of code.

6 Likes

Summary thus far:
Thanks all for the input. I did some experimentation with the various suggestions and my results are posted below. Further input is welcome. However, this progress got me more firmly in the Julia camp. I was beginning to doubt the ā€œruns like Cā€ bit and was starting to develop a C program based on my Julia prototype for another project - glad I can give up on that approach. I suspect my Python code in the original comparison is probably in need of optimization as well, but no time for that.

FDTD Code CPU Time (seconds)								
	    Original	Case A	Case B	Case C	Case D	Case E	Case F	Case G
Compile	    43.6	  33.9	  24.0	   3.7	   2.4	  1.31	  1.20	 0.828
Run 1	    40.9	  33.2	  21.8	   2.9	   2.1	  1.11	  1.09	 0.609
Run 2	    41.7	  32.9	  23.3	   2.8	   2.2	  1.14	  1.11	 0.595
Run 3	    42.6	  32.3	  22.3	   2.5	   2.0	  1.13	  1.08	 0.657
Run 4	    40.1	  32.2	  23.7	   2.8	   2.1	  1.16	  1.11	 0.609
Run 5	    40.5	  33.1	  22.7	   2.9	   2.1	  1.14	  1.12	 0.625
Avg (1-5)	41.2	  32.7	  22.8	   2.8	   2.1	  1.14	  1.10	 0.620
Ratio to C 	96.6	  76.9	  53.4	   6.5	   4.9	  2.7	  2.6	 1.5

  • Original: Written by an EE relatively new to Julia.
  • Case A: =Nested for-loops reordered to process array columns before rows.
  • Case B: =Case A plus define global constants with ā€œconstā€
  • Case C: =Case B plus move Ceā€¦, Chā€¦ global fill terms within their respective functions.
  • Case D: =Case C plus wrap entire block (other than ā€˜usingā€¦ā€™) in ā€˜letā€¦endā€™. Removed ā€˜constā€™ from global constants. Ceā€¦, Chā€¦ terms still in respective functions.
  • Case E: =Case A plus ā€˜letā€¦endā€™ wrapping all but the ā€˜usingā€¦ā€™ line. Ceā€¦ And Chā€¦ terms are not in their respective functions.
  • Case F: =Case E plus ā€˜@inbound beginā€¦endā€™ wrapping all but ā€˜usingā€¦ā€™.
  • Case G: =Case E plus ā€˜@ inboundā€™ individually on for-loops used to operate on 3D arrays.

The full code block in the first post corresponds to Case B
The full code block in the post by Mason corresponds to Case G

17 Likes

I tried wrapping the whole code block (minus the package import) in ā€˜@inbounds beginā€¦endā€™ and didnā€™t see the same improvement as was evident with application of the macro to individual for-loops. See Case E, F, and G in the summary post.