I have been using Julia for about a year. I am an EE, not a programmer. In one of my graduate EE courses I requested to use Julia instead of C for an assignment and got the task of comparing performance of C, Julia, and Python. As a test case I did a 3D finite difference time domain simulation of electromagnetic wave propagation. I compared timing and got the following results. More detail on the approach is below. Also the code is posted below. Any suggestions for how to speed things up further?
EDIT/Update - Substantial improvements were made through the suggestions in this thread. See this post for a summary of some of the improvements.
Table 1 - Execution Time (Seconds)
Run Number C Julia Python
1 0.410 22.7 144.4
2 0.438 22.5 144.6
3 0.438 23.7 146.7
4 0.428 21.7 144.4
5 0.420 24.0 146.8
6 0.421 21.8 144.5
Avg 0.426 22.6 145.2
Ratio to C 1.00 52.9 341.1
Test Information:
- The grid size is 48 x 47 x 47 (106,032 points). I changed the C code to match this grid size.
- MaxTime is 150 (giving total points calculated of 15,904,800). This was enough time for the waves to propagate beyond the edge of the grid.
- Computer is a Windows 10 64 bit machine with:
a. Intel Core i7-6820HQ CPU @ 2.7 GHz
b. 32 GB of RAM - Julia Version: 1.1.1
- Python Version: 3.7.3
- MingW/GCC Version: 8.2.0-3
- The code in each platform was set to complete the computations and store constant Ex plane results for each time-step to a 3x3x3 array called Eout (dimensions SizeY x SizeZ x MaxTime). No output plots or animations were included. I adapted your C code by removing the snapshot3d related files. I added the output variable (Eout) to the 3ddemo.c file.
- I did a detailed evaluation of the Julia output to make sure the results made sense and that the wave behaved as expected.
- I compared snapshots of results across all three platforms to make sure each was computing the same result. There were only very small variations (on the order of 0.001% difference between platforms).
- I closed all unnecessary applications on my computer before running the timing comparisons, one platform at a time. I waited until CPU usage was below 3% before running any script or code. I then watched the task manager throughout the execution to make sure Windows did not open applications that consumed significant resources. Windows did open burdensome network applications on a couple of occasions so I repeated runs as necessary until I got 6 sets of āundisturbedā results for each platform.
- Timing of the Julia script was done with a module called āCPUtimeā which tracks total CPU time for the script execution.
- Timing of the Python script was done with a module called ātimeitā which calls the ādefault_timer()ā function. This links to ātime.perfcounter()ā. According to discussions on StackOverflow (see link) this function is a CPU counter. python - Understanding time.perf_counter() and time.process_time() - Stack Overflow.
- Where possible I compared the CPU time against ārealā time measurements. In all cases, the CPU time and real time were very close. Julia and Python are not set with hyperthreading so CPU times are expected to be close to real time. I tried hyperthreading in Julia, but it did not improve performance. I found a thread in a Julia forum where one of the developers said this was an expected result for Basic Linear Algebra Subprograms (Blas). https://github.com/JuliaLang/julia/issues/6293.
- I used the Jupyter Notebook Environment for both the Julia and Python scripts. I also ran the scripts in a separate IDE to make sure that the timing results were not significantly affected by Jupyter.
- Timing of the C code was done using a Windows PowerShell command: C:\C_FDTD>powershell āMeasure-Command { .\3ddemo.exe}ā. The output of this command is real time. A custom option exists to get CPU time, but I do not have rights on my computer to import custom scripts into PowerShell. As mentioned above, the CPU time and real time will be close unless parallel cores are being used. To preclude the possibility of having multiple cores in play I used the task manager to force commands to one core.
Code:
#3D finite difference time domain (FDTD) for an elementary dipole antenna excited by a Ricker Wavelet
#This code only generates tabular results and does not generate the animated plot.
using CPUTime
CPUtic()
start = time()
#Set constants - these are physical domain sizes for the FDTD problem (nearly a cubic space)
const SizeX = 48
const SizeY = 47
const SizeZ = 47
const MaxTime = 150 #this is the time domain size for the FDTD problem
const Cdtds = 1.0/(sqrt(3.0)) #Courant number
const imp0 = 377.0 #characteristic impedance of free space
const abccoef = (Cdtds - 1.0)/(Cdtds + 1.0) #coefficient associated with boundary conditions
const ppw = 15 #points per wave
#initialize arrays to store electric and magnetic field values in 3D space
Ex = zeros(Float64,(SizeX,SizeY,SizeZ))
Ey = zeros(Float64,(SizeX,SizeY,SizeZ))
Ez = zeros(Float64,(SizeX,SizeY,SizeZ))
Hx = zeros(Float64,(SizeX,SizeY,SizeZ))
Hy = zeros(Float64,(SizeX,SizeY,SizeZ))
Hz = zeros(Float64,(SizeX,SizeY,SizeZ))
#Define arrays for plotting results.
Eout = zeros(Float64,(SizeY,SizeZ,MaxTime)) #Ouput will be electric field on a 2D cross section
tvar = collect(1:MaxTime) #time array
#Set electric field update coefficients
Cexe = fill(1.0,(SizeX,SizeY,SizeZ))
Cexh = fill(Cdtds * imp0,(SizeX,SizeY,SizeZ))
Ceye = fill(1.0,(SizeX,SizeY,SizeZ))
Ceyh = fill(Cdtds * imp0,(SizeX,SizeY,SizeZ))
Ceze = fill(1.0,(SizeX,SizeY,SizeZ))
Cezh = fill(Cdtds * imp0,(SizeX,SizeY,SizeZ))
#Set magnetic field update coefficients
Chxh = fill(1.0,(SizeX,SizeY,SizeZ))
Chxe = fill(Cdtds / imp0,(SizeX,SizeY,SizeZ))
Chyh = fill(1.0,(SizeX,SizeY,SizeZ))
Chye = fill(Cdtds / imp0,(SizeX,SizeY,SizeZ))
Chzh = fill(1.0,(SizeX,SizeY,SizeZ))
Chze = fill(Cdtds / imp0,(SizeX,SizeY,SizeZ))
#Initialize temporary variables for absorbing boundary condition (ABC) function
Eyx0 = zeros(Float64, (SizeY-1,SizeZ))
Ezx0 = zeros(Float64, (SizeY,SizeZ-1))
Eyx1 = zeros(Float64, (SizeY-1,SizeZ))
Ezx1 = zeros(Float64, (SizeY,SizeZ-1))
Exy0 = zeros(Float64, (SizeX-1,SizeZ))
Ezy0 = zeros(Float64, (SizeX,SizeZ-1))
Exy1 = zeros(Float64, (SizeX-1,SizeZ))
Ezy1 = zeros(Float64, (SizeX,SizeZ-1))
Exz0 = zeros(Float64, (SizeX-1,SizeY))
Eyz0 = zeros(Float64, (SizeX,SizeY-1))
Exz1 = zeros(Float64, (SizeX-1,SizeY))
Eyz1 = zeros(Float64, (SizeX,SizeY-1))
#update magnetic field (H) over all space at a given time step given electric and magnetic fields from the previous time step
#achieved by nested for loops
function updateH(Ex,Ey,Ez,Hx,Hy,Hz)
for pp = 1:SizeZ-1
for nn = 1:SizeY-1
for mm = 1:SizeX
Hx[mm,nn,pp] = (Chxh[mm,nn,pp] * Hx[mm,nn,pp] +
Chxe[mm,nn,pp] * ((Ey[mm,nn,pp+1] - Ey[mm,nn,pp]) - (Ez[mm,nn+1,pp] - Ez[mm,nn,pp])))
end;end;end
for pp = 1:SizeZ-1
for nn = 1:SizeY
for mm = 1:SizeX-1
Hy[mm,nn,pp] = (Chyh[mm,nn,pp] * Hy[mm,nn,pp] +
Chye[mm,nn,pp] * ((Ez[mm+1,nn,pp] - Ez[mm,nn,pp]) - (Ex[mm,nn,pp+1] - Ex[mm,nn,pp])))
end;end;end
for pp = 1:SizeZ
for nn = 1:SizeY-1
for mm = 1:SizeX-1
Hz[mm,nn,pp] = (Chzh[mm,nn,pp] * Hz[mm,nn,pp] +
Chze[mm,nn,pp] * ((Ex[mm,nn+1,pp] - Ex[mm,nn,pp]) - (Ey[mm+1,nn,pp] - Ey[mm,nn,pp])))
end;end;end
return Hx,Hy,Hz
end
#update electric field (E) over all space at a given time step given electric and magnetic fields from the previous time step
function updateE(Ex,Ey,Ez,Hx,Hy,Hz) #update electric field
for pp = 2:SizeZ-1
for nn = 2:SizeY-1
for mm = 1:SizeX-1
Ex[mm,nn,pp] = (Cexe[mm,nn,pp] * Ex[mm,nn,pp] +
Cexh[mm,nn,pp] * ((Hz[mm,nn,pp] - Hz[mm,nn-1,pp]) - (Hy[mm,nn,pp] - Hy[mm,nn,pp-1])))
end;end;end
for pp = 2:SizeZ-1
for nn = 1:SizeY-1
for mm = 2:SizeX-1
Ey[mm,nn,pp] = (Ceye[mm,nn,pp] * Ey[mm,nn,pp] +
Ceyh[mm,nn,pp] * ((Hx[mm,nn,pp] - Hx[mm,nn,pp-1]) - (Hz[mm,nn,pp] - Hz[mm-1,nn,pp])))
end;end;end
for pp = 1:SizeZ-1
for nn = 2:SizeY-1
for mm = 2:SizeX-1
Ez[mm,nn,pp] = (Ceze[mm,nn,pp] * Ez[mm,nn,pp] +
Cezh[mm,nn,pp] * ((Hy[mm,nn,pp] - Hy[mm-1,nn,pp]) - (Hx[mm,nn,pp] - Hx[mm,nn-1,pp])))
end;end;end
return Ex,Ey,Ez
end
#apply absorbing boundary conditions (ABCs) at the boundary planes of all x,y,z dimensions
function abc3dfirst(Ex,Ey,Ez)
global Eyx0, Ezx0, Eyx1, Ezx1, Exy0, Ezy0, Exy1, Ezy1, Exz0, Eyz0, Exz1, Eyz1
# =============================================================================
#ABC at x boundary plane 1
mm = 1
for pp = 1:SizeZ
for nn = 1:SizeY-1
Ey[mm,nn,pp] = Eyx0[nn,pp] + abccoef * (Ey[mm+1,nn,pp] - Ey[mm,nn,pp])
Eyx0[nn,pp] = Ey[mm+1,nn,pp]
end;end
for pp = 1:SizeZ-1
for nn = 1:SizeY
Ez[mm,nn,pp] = Ezx0[nn,pp] + abccoef * (Ez[mm+1,nn,pp] - Ez[mm,nn,pp])
Ezx0[nn,pp] = Ez[mm+1,nn,pp]
end;end
#ABC at x boundary plane 2
mm = SizeX
for pp = 1:SizeZ
for nn = 1:SizeY-1
Ey[mm,nn,pp] = Eyx1[nn,pp] + abccoef * (Ey[mm-1,nn,pp] - Ey[mm,nn,pp])
Eyx1[nn,pp] = Ey[mm-1,nn,pp]
end;end
for pp = 1:SizeZ-1
for nn = 1:SizeY
Ez[mm,nn,pp] = Ezx1[nn,pp] + abccoef * (Ez[mm-1,nn,pp] - Ez[mm,nn,pp])
Ezx1[nn,pp] = Ez[mm-1,nn,pp]
end;end
# =============================================================================
#ABC at y boundary plane 1
nn = 1
for pp = 1:SizeZ
for mm = 1:SizeX-1
Ex[mm,nn,pp] = Exy0[mm,pp] + abccoef * (Ex[mm,nn+1,pp] - Ex[mm,nn,pp])
Exy0[mm,pp] = Ex[mm,nn+1,pp]
end;end
for pp = 1:SizeZ-1
for mm = 1:SizeX
Ez[mm,nn,pp] = Ezy0[mm,pp] + abccoef * (Ez[mm,nn+1,pp] - Ez[mm,nn,pp])
Ezy0[mm,pp] = Ez[mm,nn+1,pp]
end;end
#ABC at x boundary plane 2
nn = SizeY
for pp = 1:SizeZ
for mm = 1:SizeX-1
Ex[mm,nn,pp] = Exy1[mm,pp] + abccoef * (Ex[mm,nn-1,pp] - Ex[mm,nn,pp])
Exy1[mm,pp] = Ex[mm,nn-1,pp]
end;end
for pp = 1:SizeZ-1
for mm = 1:SizeX
Ez[mm,nn,pp] = Ezy1[mm,pp] + abccoef * (Ez[mm,nn-1,pp] - Ez[mm,nn,pp])
Ezy1[mm,pp] = Ez[mm,nn-1,pp]
end;end
# =============================================================================
#ABC at z boundary plane 1
pp = 1
for nn = 1:SizeY
for mm = 1:SizeX-1
Ex[mm,nn,pp] = Exz0[mm,nn] + abccoef * (Ex[mm,nn,pp+1] - Ex[mm,nn,pp])
Exz0[mm,nn] = Ex[mm,nn,pp+1]
end;end
for nn = 1:SizeY-1
for mm = 1:SizeX
Ey[mm,nn,pp] = Eyz0[mm,nn] + abccoef * (Ey[mm,nn,pp+1] - Ey[mm,nn,pp])
Eyz0[mm,nn] = Ey[mm,nn,pp+1]
end;end
#ABC at z boundary plane 2
pp = SizeZ
for nn = 1:SizeY
for mm = 1:SizeX-1
Ex[mm,nn,pp] = Exz1[mm,nn] + abccoef * (Ex[mm,nn,pp-1] - Ex[mm,nn,pp])
Exz1[mm,nn] = Ex[mm,nn,pp-1]
end;end
for nn = 1:SizeY-1
for mm = 1:SizeX
Ey[mm,nn,pp] = Eyz1[mm,nn] + abccoef * (Ey[mm,nn,pp-1] - Ey[mm,nn,pp])
Eyz1[mm,nn] = Ey[mm,nn,pp-1]
end;end
return Ex,Ey,Ez
end
#define the driving source - a Ricker wavelet: https://en.wikipedia.org/wiki/Mexican_hat_wavelet
function ricker(t,location)
src = 0
delay = 5
if t > delay
arg = Ļ * ((Cdtds * (t - (delay+1)) - location) / ppw - 1)
src = (1.0 - 2.0 * arg^2) * exp(-(arg^2))
end
return src
end
#Main control code block. Step through applicable functions
for qtime = 1:MaxTime #time stepping loop
Hx,Hy,Hz = updateH(Ex,Ey,Ez,Hx,Hy,Hz) #magnetic field update equation
Ex,Ey,Ez = updateE(Ex,Ey,Ez,Hx,Hy,Hz) #electric field update equation
Ex[24,23,23] += ricker(qtime,0) #update Ricker wavelet source
Ex,Ey,Ez = abc3dfirst(Ex,Ey,Ez) #apply absorbing boundary conditions
Eout[:,:,qtime] = Ex[24,:,:] #Save constant-x plane time domain data to a 3D matrix
#source[qtime] = ricker(qtime,0) #source output plot for troubleshooting - normally commented out
end
CPUtoc()
println("elapsed real time: ", round(time() - start;digits=3)," seconds")
println("Computation Complete")
The corresponding C code can be found on page 252 of:
https://www.eecs.wsu.edu/~schneidj/ufdtd/ufdtd.pdf
Understanding the FDTD Method by John B. Schneider is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.