I am writing some code that performs a simple time-history integration solver (Newmark method for structural dynamics) and I looking for some suggestions in how I can improve the performance of my code. The below snippet of code is part of a larger function that is called from another script.

Currently, I am calculating a parameter called displacements for many time steps (think on the order of 5,000,000 steps). From observing the code and running ProfileView to profile the code it is apparent that the matrix multiplication seems to be slowing down the script in the for-loop. Some details about the input:

dofs = number of degrees of freedom in the system (at most 1282)

num_time_steps = 5,000,000

yF = vector of size (num_steps, 1)

loading_vec = vector of size (dofs-2, 1)

a1, a2, a3, k_hat_inv = matrices of size (dofs-2, dofs-2) (a1, a2, a3 are sparsely populated matrices, k_hat_inv is a full matrix)

vel_const1, vel_const2, vel_const3 = Float64 constant

acc_const1, acc_const1, acc_const1= Float64 constant

For a simple example when dofs = 12 this code runs very quickly (~3-4s) but as dofs increases to 1282 the run time increases very significantly (still running after 1/2 day) and I would like to find some ways to decrease this run time. Are there any other ways that I can improve the performance of this code?

You can also see in the for-loop some commented out lines, these are all things I have tried but resulted in poorer performance compared to the uncommented lines.

```
# Initial state, forst time step at t = 0s
cdisp = zeros(dofs - 2)
cvel = zeros(dofs - 2)
#caccel = m_inv * (loading_vec .- (c * cvel) .- k * cdisp) -> Always zero for zero initial conditions
caccel = zeros(dofs - 2)
u_i_p1 = zeros(dofs-2)
udot_i_p1 = zeros(dofs-2)
u2dot_i_p1 = zeros(dofs-2)
a1_vec = zeros(dofs-2)
a2_vec = zeros(dofs-2)
a3_vec = zeros(dofs-2)
P_i_p1_hat = zeros(dofs-2)
# Initialize the displacement vector
displacements = zeros(dofs - 2, disp_steps)
# Start the time history analysis as per Chopra, 2017
disp_counter = 1
println(" Time history started")
for ii = 1:num_time_steps-1
# Obtain the force for the next time step
@inbounds loading_vec[impact_dof] = yF[ii+1]
# Obtain P_i_p1_hat
#P_i_p1_hat = loading_vec + a1 * cdisp + a2 * cvel + a3 * caccel
P_i_p1_hat .= loading_vec .+ mul!(a1_vec, a1, cdisp) .+ mul!(a2_vec, a2, cvel) .+ mul!(a3_vec, a3, caccel)
# Calculate the displacement due to this force, u_i+1 = k_hat_inv * P_i_p1
#u_i_p1 = k_hat_inv * P_i_p1_hat
mul!(u_i_p1, k_hat_inv, P_i_p1_hat)
#u_i_p1 = k_hat_inv * (loading_vec + a1 * cdisp + a2 * cvel + a3 * caccel)
#mul!(u_i_p1, k_hat_inv, (loading_vec .+ (a1 * cdisp) .+ (a2 * cvel) .+ (a3 * caccel)))
#mul!(u_i_p1, k_hat_inv, (loading_vec .+ mul!(a1_vec, a1, cdisp) .+ mul!(a2_vec, a2, cvel) .+ mul!(a3_vec, a3, caccel)))
# Calculate the velocity due to this displacement, u_dot_i+1
#udot_i_p1 = vel_const1 * (u_i_p1 - cdisp) + vel_const2 * cvel + vel_const3 * caccel
udot_i_p1 .= (vel_const1 .* (u_i_p1 .- cdisp)) .+ (vel_const2 .* cvel) .+ (vel_const3 .* caccel)
# Calculate the acceleration due to this displacement, u_2dot_i+1
#u2dot_i_p1 = acc_const1 * (u_i_p1 - cdisp) - acc_const2 * cvel - acc_const3 * caccel
u2dot_i_p1 .= (acc_const1 .* (u_i_p1 .- cdisp)) .- (acc_const2 .* cvel) .- (acc_const3 .* caccel)
# Get current values for the next step
cdisp .= u_i_p1
cvel .= udot_i_p1
caccel .= u2dot_i_p1
# Write the displacement vector
if ii % disp_save == 0
disp_counter += 1
displacements[:, disp_counter] .= u_i_p1
end
end
```