Code that uses MPI.jl does not run on cluster

I have written a version of my code that uses MPI.jl to achieve a potentially large-scale parallelization. I already had MPI installed on my PC and followed the instructions from here to make sure that MPI.jl and the system use the same mpiexec.
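
For reference, the configuration step was essentially the one from the MPI.jl documentation (a sketch; the exact invocation I ran may have included extra options):

using MPIPreferences
MPIPreferences.use_system_binary()   # records the system libmpi/mpiexec in LocalPreferences.toml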

I tested the code on my PC to confirm that it runs and produces the correct outputs (they agree with the serial version); the results can be found in my other thread. The full code is given in one of my posts in that thread; the only change is that I used dd=11 in the benchmark below that I ran on the cluster.

However, on running the job on the cluster, I am presented with the following error:

[details=“Error”]

[cn152][[15784,1],2][btl_openib_component.c:1705:init_one_device] error obtaining device attributes for mlx5_0 errno says No space left on device
[cn152][[15784,1],33][btl_openib_component.c:1705:init_one_device] error obtaining device attributes for mlx5_0 errno says No space left on device
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   cn152
  Local device: mlx5_0
--------------------------------------------------------------------------
[... the same init_one_device error ("No space left on device" for mlx5_0) is repeated by each of the remaining ranks on cn152 ...]
[cn152:447286] 39 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[cn152:447286] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
ERROR: LoadError: ArgumentError: ichunk must be less or equal to nchunks
Stacktrace:
 [1] getchunk(array::
[... the same ArgumentError and stack trace are printed, interleaved, by every rank; output truncated ...]

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[15784,1],21]
  Exit code:    1
--------------------------------------------------------------------------

I used mpiexec -n 40 julia mpi_parallel_timeloop.jl in the SLURM script to run my code.

It seems like the system is out of disk space(?). But then again, it is also showing an error regarding ichunk, which did not occur when running the code on my PC.

The formatting of your message is a bit off; code quoting swallowed your last couple of lines, and it took me a bit to realise that was your text and not part of the error message.

Also, it’s a bit hard to provide concrete help without seeing the code you’re running.

What’s the MPI library configured to be used by MPI.jl? Is mpiexec the corresponding launcher? What’s the output of

using MPI
MPI.versioninfo()

?

Yes, it does seem that way.

Were you launching the same number of MPI processes on your PC?

Stupid reply from me. Forget Julia for the moment. Log onto your cluster and ‘module load openmpi’ or whatever mechanism you use to set the correct PATHs.

Search for ‘hello world mpi’ and create a small Hello World C program
mpicc -o hello hello.c
Then submit a batch job to run that.
Using Slurm, you can use salloc to allocate a few nodes and then mpirun hello

I know this seems very simple; however, if you cannot run a Hello World you cannot take things further.


Very good point. I would also suggest trying to run a simple hello world to make sure MPI on the cluster works as expected. The error you are getting about

There was an error initializing an OpenFabrics device
mlx5_0

could mean that MPI cannot access the Mellanox IO hardware (the fast interconnect MPI uses for inter-node communication).
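
A minimal hello world through MPI.jl could look like this (just a sketch, assuming MPI.jl is already configured; save it as, say, hello_mpi.jl and launch it with mpiexec):

using MPI
MPI.Init()
comm = MPI.COMM_WORLD
println("Hello from rank $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm)) on $(gethostname())")
MPI.Finalize()

If the reported ranks and size do not match what mpiexec was asked for, the launcher and the MPI library are not talking to each other.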

Below is the output

[swapanc.uc@login06 anik]$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.3 (2023-08-24)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using MPI

julia> MPI.versioninfo()
MPIPreferences:
  binary:  system
  abi:     OpenMPI
  libmpi:  libmpi
  mpiexec: mpiexec

Package versions
  MPI.jl:             0.20.19
  MPIPreferences.jl:  0.1.11

Library information:
  libmpi:  libmpi
  libmpi dlpath:  /home/swapanc.uc/software/openmpi-4.1.1/lib/libmpi.so
  MPI version:  3.1.0
  Library version:  
    Open MPI v4.1.1, package: Open MPI swapanc.uc@login08.iitkgp.ac.in Distribution, ident: 4.1.1, repo rev: v4.1.1, Apr 24, 2021

julia> 

No. My PC only has 12 hardware threads, so I was running with 11 processes. I was using the --use-hwthread-cpus option to make MPI bind to hardware threads instead of physical CPU cores.

Stupid question from me: is there a system-wide OpenMPI?
Please type ‘module avail’

Maybe going a bit further than needed, here are some useful utilities:

ibv_devinfo

ofed_info

ompi_info (assuming openmpi is being used)

My apologies, I do not know how this works. I just typed module avail in the cluster terminal, but that seems to show no result.

[swapanc.uc@login02 swapanc.uc]$ module avail 
^CMsg: /opt/ohpc/admin/lmod/lmod/libexec/loadModuleFile.lua:123: interrupted!
Lmod has detected the following error: Spider search timed out.

After quite some time I terminated the command.

It is there in the thread I mentioned. I couldn’t put it in this post itself due to the character limit. Either way, I have given it below.

using MPI
MPI.Init()
using MKL
using DelimitedFiles
using LinearAlgebra
using LaTeXStrings
using TimerOutputs
using Plots
using Plots.PlotMeasures
using FFTW
using SharedArrays
using ChunkSplitters

BLAS.set_num_threads(1)
const to = TimerOutput()


@views function GDSO(
    t::Float64,
    b::Float64,
    Δ::Float64,
    sysize::Int64,
    Ωf::Diagonal{Float64},
    Ωi::Diagonal{Float64},
    J::Matrix{Float64},
    D::Vector{Float64},
    ρ::Diagonal{Float64},
    Si::Diagonal{ComplexF64},
    Sf::Diagonal{ComplexF64},
    Bi::Diagonal{ComplexF64},
    Bf::Diagonal{ComplexF64},
    JpBi::Matrix{ComplexF64},
    JpSi::Matrix{ComplexF64},
    BB::Matrix{ComplexF64},
    AA::Matrix{ComplexF64},
    W::Matrix{ComplexF64},
    Wapp::Matrix{ComplexF64},
    Wapp_i1::Matrix{ComplexF64},
    V::Vector{ComplexF64},
    V_int1::Vector{ComplexF64},
    V_int2::Matrix{ComplexF64},
    BimSi::Diagonal{ComplexF64}
    
)
    t = t * 4.1341373335e16                     # converting time to atomic units
    tp = -im * b - t
    for i = 1:sysize
        Sf[i, i] = Ωf[i, i] / sin(Ωf[i, i] * t)
        Si[i, i] = Ωi[i, i] / sin(Ωi[i, i] * tp)
        Bf[i, i] = Ωf[i, i] / tan(Ωf[i, i] * t)
        Bi[i, i] = Ωi[i, i] / tan(Ωi[i, i] * tp)
    end
    # BB = (Bf + J' * Bi * J)
    # AA = (Sf + J' * Si * J)
    mul!(JpBi,J',Bi)
    mul!(BB,JpBi,J)
    BB .+= Bf
    mul!(JpSi,J',Si)
    mul!(AA,JpSi,J)
    AA .+= Sf

    # W = [BB -AA
    #     -AA BB] 
    W[1:sysize,1:sysize] .= BB
    W[1:sysize,sysize+1:2*sysize] .= -AA
    W[sysize+1:2*sysize,1:sysize] .= -AA
    W[sysize+1:2*sysize,sysize+1:2*sysize] .= BB

    # We won't calculate the 2Nx2N determinant directly.
    # Wapp = BB * (BB - AA * (BB \ AA))
    BBinvAA = BB \ AA
    mul!(Wapp_i1,AA,BBinvAA,-1,0)
    Wapp_i1 .+= BB
    mul!(Wapp,BB,Wapp_i1)

    # V = [J' * (Bi-Si) * D
    #      J' * (Bi-Si) * D]
    BimSi .= Bi - Si
    mul!(V_int1,BimSi,D)
    mul!(view(V, 1:sysize), J', V_int1)
    V[sysize+1:end] .=  V[1:sysize]
    # mul!(V_int2,J',V_int1)
    # V[1:sysize,1] .= V_int2
    # V[sysize+1:2*sysize,1] .= V_int2

    F1 = sqrt(det(Si * Sf * (Wapp \ ρ^2)))
    F2 = (-(im / 2) * (transpose(V) * (W \ V))) + ((im) * (transpose(D) * (Bi-Si) * D))
    g = F1 * exp(F2)
    Δ = Δ * 0.0367493  # in atomic units
    g = g * exp(im * Δ * t)
    η = 2 * 4.55633e-6  # 10 cm^-1 to hartree energy
    g = g * exp(-η * abs(t))
    return g[1]
end

@views function GSV(
    t::Float64,
    b::Float64,
    sysize::Int64,
    Ωf::Diagonal{Float64},
    Ωi::Diagonal{Float64},
    J::Matrix{Float64},
    D::Vector{Float64},
    Si::Diagonal{ComplexF64},
    Sf::Diagonal{ComplexF64},
    Bi::Diagonal{ComplexF64},
    Bf::Diagonal{ComplexF64},
    gdsopart::ComplexF64,
    T::Matrix{Float64},       ## Product of NAC and SO 
    Hso::Float64,
    X::Matrix{ComplexF64},
    Y::Vector{ComplexF64},
    W::Matrix{ComplexF64},
    V::Vector{ComplexF64},
    WinV::Vector{ComplexF64},
    XWin::Matrix{ComplexF64}
)

    ################################## Preliminary Part , Common for both Frank Condon and Herzberg Teller ##############################
    t = t * 4.1341373335e16                     # converting time to atomic units
    # tp = -im * b - t
    # for i = 1:sysize
    #     Sf[i, i] = Ωf[i, i] / sin(Ωf[i, i] * t)
    #     Si[i, i] = Ωi[i, i] / sin(Ωi[i, i] * tp)
    #     Bf[i, i] = Ωf[i, i] / tan(Ωf[i, i] * t)
    #     Bi[i, i] = Ωi[i, i] / tan(Ωi[i, i] * tp)
    # end
    # W = [(Bf+J'*Bi*J) -(Sf + J' * Si * J)
    #     -(Sf + J' * Si * J) (Bf+J'*Bi*J)]
    # # U = Bi - Si
    # V = [J' * (Bi-Si) * D
    #     J' * (Bi-Si) * D]




    ################# Final Calculation and returning value ##################
    @timeit to "Pre Loop Allocations" begin
    s = zero(ComplexF64)
    MJSJ = J' * Si * J
    MJBJ = J' * Bi * J
    MJUD = J' * (Bi-Si) * D
    Win = inv(W)
    mul!(WinV,Win,V)
    end
    @timeit to "Most Expensive Loop" begin
    for m in 1:sysize
        for mp in 1:sysize
            ########## Defining the X and Y Matrices ##################################
            @timeit to "X Y Matrices Writing" begin
            X[m, 1:sysize] .= (MJSJ)[mp, :] .* (-Bf[m, m])
            X[m, sysize+1:2*sysize] .= (MJBJ)[mp, :] .* (Bf[m, m])
            X[sysize+m, 1:sysize] .= (MJSJ)[mp, :] .* (Sf[m, m])
            X[sysize+m, sysize+1:2*sysize] .= (MJBJ)[mp, :] .* (-Sf[m, m])

            Y[m, 1] = Bf[m, m] * (MJUD)[mp, 1]
            Y[m+sysize, 1] = -Sf[m, m] * (MJUD)[mp, 1]
            end
            @timeit to "Final Multiplications" begin

            # g = im * (tr(mul!(XWin, X, Win))  + (transpose(WinV)*X*WinV)[1] - (transpose(Y)*WinV)[1])
            g = im * (tr(mul!(XWin, X, Win))  + dot(WinV, X, WinV) - dot(Y, WinV))
            


            s = s + g * gdsopart * T[m, mp]
            end

        end
        X[m, 1:sysize] .= 0
        X[m, sysize+1:2*sysize] .= 0
        X[sysize+m, 1:sysize] .= 0
        X[sysize+m, sysize+1:2*sysize] .= 0
        Y[m, 1] = 0
        Y[m+sysize, 1] = 0
    end
end
    s = s + abs(Hso)^2 * gdsopart   #### LATEST MODIFICATION
    return s
end

const hso_s1t3 = 1.49 * 0.0000046  ## SOC between S1 and T3 = 0.48 cm-1, converting to atomic units
const Delta = 0.11 # in eV
const Delta_s1t3 = 0.091 * 0.037 # Difference between S1 and T3 = 0.091 eV in atomic units
const dd = 40                            # Change this to modify number of points
const Temp = 300
kT_temp = Temp * 1.380649e-23  # Boltzmann constant times temperature in Joules
const kT = kT_temp * 2.29371227840e17         # in atomic units
const b = 1 / kT
const omegaf = Diagonal(vec(readdlm("wf.txt", '\t', Float64, '\n'))) * 4.55633e-6   # Final state  frequencies in atomic units
const omegai = Diagonal(vec(readdlm("wi.txt", '\t', Float64, '\n'))) * 4.55633e-6  # Initial state  frequencies in atomic units
const sysize = size(omegai)[1]
const P = @. 2 * sinh(b * omegai / 2)
T_temp = readdlm("nac.txt", '\t', Float64, '\n') # NAC matrix
const T = (T_temp * T_temp')* (hso_s1t3^2/Delta_s1t3^2)
const D = readdlm("d.txt", Float64)[:,1] # Displacement Matrix
const J = readdlm("j.txt", Float64) # Duschinsky matrix
const t = collect(range(-5e-12, stop=5e-12, length=dd))
t[div(dd + 1, 2)] = 10e-25


function worker_function(xrange, t, b, Δ, sysize, Ωf, Ωi, J, D, P, T, Hso,mpi_rank,mpi_size)
    Sf = Diagonal(zeros(ComplexF64, sysize))
    Si = Diagonal(zeros(ComplexF64, sysize))
    Bf = Diagonal(zeros(ComplexF64, sysize))
    Bi = Diagonal(zeros(ComplexF64, sysize))
    X = zeros(ComplexF64, 2 * sysize, 2 * sysize)
    Y = zeros(ComplexF64, 2 * sysize)
    BB = zeros(ComplexF64, sysize, sysize)
    AA = zeros(ComplexF64, sysize, sysize)
    JpBi = zeros(ComplexF64, sysize, sysize)
    JpSi = zeros(ComplexF64, sysize, sysize)
    W = zeros(ComplexF64, 2 * sysize, 2 * sysize)
    Wapp = zeros(ComplexF64, sysize, sysize)
    Wapp_i1 = zeros(ComplexF64, sysize, sysize)
    V =  zeros(ComplexF64, 2 * sysize)
    V_int1 = zeros(ComplexF64, sysize)
    V_int2 = zeros(ComplexF64, sysize,1)
    BimSi = Diagonal(zeros(ComplexF64, sysize, sysize))
    WinV = zeros(ComplexF64, 2 * sysize)
    WinV_transpose = zeros(ComplexF64, 1, 2 * sysize)
    XWin = zeros(ComplexF64, 2 * sysize, 2 * sysize)
    Hso = 0.48 * 0.0000046  ## SOC between S1 and T2 = 0.48 cm-1, converting to atomic units
    x_worker = zeros(ComplexF64, length(xrange))
    for (local_xind,main_xind) in enumerate(xrange)
        gdsopart = GDSO(t[main_xind], b, Δ, sysize, Ωf, Ωi, J, D, P, Si, Sf, Bi, Bf, JpBi, JpSi, BB, AA, W, Wapp, Wapp_i1, V, V_int1, V_int2, BimSi)
        gtotal = GSV(t[main_xind], b, sysize, Ωf, Ωi, J, D, Si, Sf, Bi, Bf, gdsopart, T, Hso, X, Y, W, V, WinV, XWin)
        x_worker[local_xind] = gtotal
        open("ISC-MPI.TXT", "a") do f
            write(f, "ITERATION $main_xind COMPLETED BY PROCESS $(mpi_rank) \n")
        end
    end
    return x_worker
end




function calc(t, b, Δ, sysize, Ωf, Ωi, J, D, P)
    comm = MPI.COMM_WORLD
    mpi_rank = MPI.Comm_rank(comm)
    mpi_size = MPI.Comm_size(comm)
    open("ISC-MPI.TXT", "w") do f
        write(f, "STARTING LOOP WITH NUMBER OF CORES = $(mpi_size) \n====================================================\n")
    end

    Hso = 0.48 * 0.0000046  ## SOC between S1 and T2 = 0.48 cm-1, converting to atomic units
    xrange_chunknum = chunks(range(1, dd), mpi_size)
    @timeit to "GT Calculation" begin  
        # for ichunk in 1:nchunks
        #     xrange,chunknum = xrange_chunknum[ichunk]
        #     inter_result = worker_function(xrange, t, b, Δ, sysize, Ωf, Ωi, J, D, P, T, Hso,mpi_rank,mpi_size)
        # end
        xrange,chunknum = xrange_chunknum[mpi_rank+1]
        inter_result = worker_function(xrange, t, b, Δ, sysize, Ωf, Ωi, J, D, P, T, Hso,mpi_rank,mpi_size)
        MPI.Barrier(comm)
        x = MPI.gather(inter_result,comm,root=0)

    end


    if mpi_rank == 0
        x = vcat(x...)
        println("LOOP COMPLETE")
        open("ISC-MPI.TXT", "a") do f
            write(f, "LOOP COMPLETE \n====================================================")
        end
        yr = real.(x)
        yi = imag.(x)
    



    begin 
        open("ISC-MPI.TXT", "a") do f
            write(f, "-------------------------\n")
            write(f, "NUMBER OF CORES = $(1)\n")
            write(f, "The value of Δ is $Δ\n")
            write(f, "Number of time points = $dd\n")
            write(f, "Central Value = $(yr[div(dd + 1, 2)])\n")
            write(f, "-------------------------\n")
            write(f, "The values of t,Re(g(t)),Im(g(t)) are \n\n")
            for i in 1:dd
                write(f, "$(t[i])  $(yr[i])  $(yi[i])\n")
            end
        end

        open("GT-MPI.TXT", "w") do f
            for i in 1:dd
                write(f, "$(t[i])  $(yr[i])  $(yi[i])\n")
            end
        end

    end
    display(to)
    end
end


calc(t, b, Delta, sysize, omegaf, omegai, J, D, P)
MPI.Finalize()
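
For reference, the chunking above is intended to behave like this small sketch (hypothetical numbers: 8 time points split across 4 ranks, using the same ChunkSplitters call as in calc; the exact return type depends on the ChunkSplitters version):

using ChunkSplitters
xrange_chunknum = chunks(range(1, 8), 4)        # 4 chunks over the indices 1:8
for rank in 0:3
    xrange, chunknum = xrange_chunknum[rank+1]  # each MPI rank takes its own chunk
    println("rank $rank -> time indices $xrange (chunk $chunknum)")
end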

So this is an OpenHPC cluster. I hate to be negative; however, it looks like there are basic problems on the cluster.

Please start by listing:
OS on cluster head node
OS on cluster nodes
Network interfaces on cluster nodes - what ethernet types, which IB cards.

What IB switch are you using?
Where is your Subnet Manager running?

What shared storage are you using? Is this NFS shared from head node?

PS: I have been configuring and supporting clusters for 25 years. These are not idle questions.

Please run the following diagnostics on a cluster node with an IB card:
ibstat
sminfo
ibhosts
ibswitches
ibdiagnet

Run ibstat on all cluster nodes and head node. Use pdsh or clush to do this.

The output of ‘module avail’ is interesting. It should examine your MODULESPATH and list any Modules files in the path.
Please run echo $MODULESPATH
Also run id and whoami

Also, let's look at the health of your filesystems:

On head node: lsblk and df -h
On a compute node: lsblk and df -h

Run ‘dmesg’ as root on your head node also. Please don't send the output here! However, look for errors, especially any errors regarding disk devices.

[swapanc.uc@login02 swapanc.uc]$ echo $MODULESPATH

[swapanc.uc@login02 swapanc.uc]$ echo $MODULEPATH
/opt/ohpc/pub/moduledeps/gnu8:/opt/ohpc/pub/modulefiles:/home/apps/modulefiles:/opt/ohpc/pub/apps/pgi/modulefiles:/home/swapanc.uc/modulefiles
[swapanc.uc@login02 swapanc.uc]$ id
uid=6327(swapanc.uc) gid=6327(swapanc.uc) groups=6327(swapanc.uc)
[swapanc.uc@login02 swapanc.uc]$ whoami
swapanc.uc
[swapanc.uc@login02 swapanc.uc]$
[swapanc.uc@login02 swapanc.uc]$ df -h
Filesystem                                                                          Size  Used Avail Use% Mounted on
/dev/mapper/centos-root                                                             2.4T   57G  2.3T   3% /
devtmpfs                                                                            189G     0  189G   0% /dev
tmpfs                                                                               189G  6.7M  189G   1% /dev/shm
tmpfs                                                                               189G  4.0G  185G   3% /run
tmpfs                                                                               189G     0  189G   0% /sys/fs/cgroup
/dev/sda2                                                                          1014M  209M  806M  21% /boot
/dev/mapper/centos-tmp                                                              500G   56M  500G   1% /tmp
/dev/mapper/centos-apps                                                             496G   33M  496G   1% /apps
/dev/mapper/centos-var                                                              500G  7.2G  493G   2% /var
/dev/mapper/centos-opt                                                              2.5T  6.2G  2.5T   1% /opt
/dev/mapper/centos-var_log_audit                                                    100G   38G   63G  38% /var/log/audit
/dev/mapper/centos-var_tmp                                                          100G   72M  100G   1% /var/tmp
172.20.3.213@o2ib,172.20.3.219@o2ib1:172.20.3.214@o2ib,172.20.3.220@o2ib1:/scratch  1.7P  647T  1.1P  38% /scratch
172.20.3.213@o2ib,172.20.3.219@o2ib1:172.20.3.214@o2ib,172.20.3.220@o2ib1:/home     345T   20T  322T   6% /home
tmpfs                                                                                38G   12K   38G   1% /run/user/42
tmpfs                                                                                38G     0   38G   0% /run/user/6613
tmpfs                                                                                38G     0   38G   0% /run/user/6039
tmpfs                                                                                38G     0   38G   0% /run/user/6060
tmpfs                                                                                38G     0   38G   0% /run/user/7393
tmpfs                                                                                38G     0   38G   0% /run/user/6762
tmpfs                                                                                38G     0   38G   0% /run/user/7397
tmpfs                                                                                38G     0   38G   0% /run/user/6689
tmpfs                                                                                38G     0   38G   0% /run/user/6745
tmpfs                                                                                38G     0   38G   0% /run/user/6083
tmpfs                                                                                38G     0   38G   0% /run/user/6499
tmpfs                                                                                38G     0   38G   0% /run/user/7269
tmpfs                                                                                38G     0   38G   0% /run/user/6279
tmpfs                                                                                38G     0   38G   0% /run/user/6186
tmpfs                                                                                38G     0   38G   0% /run/user/6165
tmpfs                                                                                38G     0   38G   0% /run/user/6348
tmpfs                                                                                38G     0   38G   0% /run/user/6292
tmpfs                                                                                38G  8.0K   38G   1% /run/user/6766
tmpfs                                                                                38G     0   38G   0% /run/user/6068
tmpfs                                                                                38G     0   38G   0% /run/user/7079
tmpfs                                                                                38G     0   38G   0% /run/user/6599
tmpfs                                                                                38G     0   38G   0% /run/user/6987
tmpfs                                                                                38G     0   38G   0% /run/user/6652
tmpfs                                                                                38G     0   38G   0% /run/user/6742
tmpfs                                                                                38G     0   38G   0% /run/user/6252
tmpfs                                                                                38G     0   38G   0% /run/user/6935
tmpfs                                                                                38G     0   38G   0% /run/user/7320
tmpfs                                                                                38G     0   38G   0% /run/user/7199
tmpfs                                                                                38G     0   38G   0% /run/user/6859
tmpfs                                                                                38G     0   38G   0% /run/user/6850
tmpfs                                                                                38G     0   38G   0% /run/user/6302
tmpfs                                                                                38G     0   38G   0% /run/user/6319
tmpfs                                                                                38G     0   38G   0% /run/user/7239
tmpfs                                                                                38G     0   38G   0% /run/user/7245
tmpfs                                                                                38G     0   38G   0% /run/user/6731
tmpfs                                                                                38G     0   38G   0% /run/user/6717
tmpfs                                                                                38G     0   38G   0% /run/user/6696
tmpfs                                                                                38G     0   38G   0% /run/user/6327

This is the output when I run them on the login node. I do not have direct access to the head node.

I do not know what an IB card is; however, on running these commands on the login node, I get the following results.

[swapanc.uc@login02 swapanc.uc]$ ibstatus
Infiniband device 'mlx5_0' port 1 status:
	default gid:	 fe80:0000:0000:0000:9803:9b03:0086:d662
	base lid:	 0x21
	sm lid:		 0x1f
	state:		 4: ACTIVE
	phys state:	 5: LinkUp
	rate:		 100 Gb/sec (4X EDR)
	link_layer:	 InfiniBand

[swapanc.uc@login02 swapanc.uc]$ sminfo
ibwarn: [115937] mad_rpc_open_port: can't open UMAD port ((null):0)
sminfo: iberror: failed: Failed to open '(null)' port '0'
[swapanc.uc@login02 swapanc.uc]$ ibhosts
ibwarn: [115965] mad_rpc_open_port: can't open UMAD port ((null):0)
/var/tmp/OFED_topdir/BUILD/rdma-core-56mlnx40/libibnetdisc/ibnetdisc.c:802; can't open MAD port ((null):0)
/usr/sbin/ibnetdiscover: iberror: failed: discover failed
[swapanc.uc@login02 swapanc.uc]$ ibdiagnet
Running version:   "IBDIAGNET 2.9.0.MLNX20220418.60b8156","IBDIAG 2.1.1.60b8156","IBDM 2.1.1.60b8156","IBIS 6.0.0.5480bba"
Running command:   ibdiagnet 
Running timestamp: 2024-08-15 12:35:06 IST +0530

Switch label port numbering explanation:
  Quantum2 switch split mode: ASIC/Cage/Port/Split, e.g 1/1/1/1
  Quantum2 switch no split mode: ASIC/Cage/Port
  Quantum switch split mode: Port/Split
  Quantum switch no split mode: Port


----------
Load Plugins from:
/usr/share/ibdiagnet2.1.1/plugins/
(You can specify more paths to be looked in with "IBDIAGNET_PLUGINS_PATH" env variable)

Plugin Name                                   Result     Comment
libibdiagnet_cable_diag_plugin-2.1.1          Succeeded  Plugin loaded
libibdiagnet_phy_diag_plugin-2.1.1            Succeeded  Plugin loaded

---------------------------------------------
Discovery
-E- Failed to initialize - Failed to set port of ibis object, err=No viable ports found in the system
-I- Start Fabric Discover
-I- Discovering ... 0 Nodes (0 Switches & 0 CAs) discovered.
-I- Fill NodeDesc data
-I- Retrieving... 0/0 Request Port Nodes (0/0 Switches & 0/0 CAs) retrieved.
-I- NodeDesc finished successfully 
-E- Fabric Discover failed, err=IBDiag initialize wasn't done
-E- Fabric Discover failed, MAD err=No viable ports found in the system

---------------------------------------------
Fabric Summary

Total Nodes             : 0
IB Switches             : 0
IB Channel Adapters     : 0
IB Aggregation Nodes    : 0
IB Routers              : 0

Adaptive Routing is enabled on 0 switches.
Hashed Based Forwarding is enabled on 0 switches.

Total number of links   : 0

Master SM  : No Master SM
Standby SM : No Standby SM

---------------------------------------------
Summary
-I- Stage                               Warnings   Errors     Comment   
-I- Discovery                                                 NA
-I- Lids Check                                                NA
-I- Links Check                                               NA
-I- Subnet Manager                                            NA
-I- Port Counters                                             NA
-I- Nodes Information                                         NA
-I- Speed / Width checks                                      NA
-I- Virtualization                                            NA
-I- Partition Keys                                            NA
-I- Temperature Sensing                                       NA
-I- Create IBNetDiscover File                                 NA

-I- You can find detailed errors/warnings in: /var/tmp/ibdiagnet2/ibdiagnet2.log


-I- Database                            : /var/tmp/ibdiagnet2/ibdiagnet2.db_csv


-E- A fatal error occurred, exiting...

Also a stupid question from me. From your output you are running 40 MPI processes on compute node cn152. I guess you have 40 cores.
Why is OpenMPI trying to use the Mellanox interfaces for the communications?

Run ompi_info please

Is this a virtualised or cloud based cluster?

Sure :

[swapanc.uc@login02 swapanc.uc]$ ompi_info 
                 Package: Open MPI swapanc.uc@login08.iitkgp.ac.in
                          Distribution
                Open MPI: 4.1.1
  Open MPI repo revision: v4.1.1
   Open MPI release date: Apr 24, 2021
                Open RTE: 4.1.1
  Open RTE repo revision: v4.1.1
   Open RTE release date: Apr 24, 2021
                    OPAL: 4.1.1
      OPAL repo revision: v4.1.1
       OPAL release date: Apr 24, 2021
                 MPI API: 3.1.0
            Ident string: 4.1.1
                  Prefix: /home/swapanc.uc/software/openmpi-4.1.1
 Configured architecture: x86_64-pc-linux-gnu
          Configure host: login08.iitkgp.ac.in
           Configured by: swapanc.uc
           Configured on: Sat Feb  5 06:50:04 UTC 2022
          Configure host: login08.iitkgp.ac.in
  Configure command line: '--prefix=/home/swapanc.uc/software/openmpi-4.1.1/'
                Built by: swapanc.uc
                Built on: Sat Feb  5 07:19:32 UTC 2022
              Built host: login08.iitkgp.ac.in
              C bindings: yes
            C++ bindings: no
             Fort mpif.h: yes (all)
            Fort use mpi: yes (limited: overloading)
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: no
 Fort mpi_f08 compliance: The mpi_f08 module was not built
  Fort mpi_f08 subarrays: no
           Java bindings: no
  Wrapper compiler rpath: runpath
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
  C compiler family name: GNU
      C compiler version: 4.8.5
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
           Fort compiler: gfortran
       Fort compiler abs: /usr/bin/gfortran
         Fort ignore TKR: no
   Fort 08 assumed shape: no
      Fort optional args: no
          Fort INTERFACE: yes
    Fort ISO_FORTRAN_ENV: yes
       Fort STORAGE_SIZE: no
      Fort BIND(C) (all): no
      Fort ISO_C_BINDING: yes
 Fort SUBROUTINE BIND(C): no
       Fort TYPE,BIND(C): no
 Fort T,BIND(C,name="a"): no
            Fort PRIVATE: no
          Fort PROTECTED: no
           Fort ABSTRACT: no
       Fort ASYNCHRONOUS: no
          Fort PROCEDURE: no
         Fort USE...ONLY: no
           Fort C_FUNLOC: no
 Fort f08 using wrappers: no
         Fort MPI_SIZEOF: no
             C profiling: yes
           C++ profiling: no
   Fort mpif.h profiling: yes
  Fort use mpi profiling: yes
   Fort use mpi_f08 prof: no
          C++ exceptions: no
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, ORTE progress: yes, Event lib:
                          yes)
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
              dl support: yes
   Heterogeneous support: no
 mpirun default --prefix: no
       MPI_WTIME support: native
     Symbol vis. support: yes
   Host topology support: yes
            IPv6 support: no
      MPI1 compatibility: no
          MPI extensions: affinity, cuda, pcollreq
   FT Checkpoint support: no (checkpoint thread: no)
   C/R Enabled Debugging: no
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v4.1.1)
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v4.1.1)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                 MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.1.1)
                 MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.1.1)
                 MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.1.1)
                 MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.1.1)
                 MCA btl: uct (MCA v2.1.0, API v3.1.0, Component v4.1.1)
            MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v4.1.1)
            MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                 MCA crs: none (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v4.1.1)
               MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)
               MCA hwloc: hwloc201 (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v4.1.1)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v4.1.1)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v4.1.1)
               MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v4.1.1)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v4.1.1)
                MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA pmix: pmix3x (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v4.1.1)
               MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v4.1.1)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v4.1.1)
           MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v4.1.1)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v4.1.1)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v4.1.1)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v4.1.1)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v4.1.1)
              MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component
                          v4.1.1)
              MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component
                          v4.1.1)
              MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component
                          v4.1.1)
              MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component
                          v4.1.1)
                 MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v4.1.1)
                 MCA ess: env (MCA v2.1.0, API v3.0.0, Component v4.1.1)
                 MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v4.1.1)
                 MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component
                          v4.1.1)
                 MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v4.1.1)
                 MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v4.1.1)
               MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v4.1.1)
             MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v4.1.1)
                 MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                 MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                 MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA odls: default (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA odls: pspawn (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                 MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                 MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                 MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                 MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                 MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)
                 MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA regx: naive (MCA v2.1.0, API v1.0.0, Component v4.1.1)
                MCA regx: fwd (MCA v2.1.0, API v1.0.0, Component v4.1.1)
                MCA regx: reverse (MCA v2.1.0, API v1.0.0, Component v4.1.1)
               MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)
               MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)
               MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v4.1.1)
               MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v4.1.1)
               MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)
               MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                 MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v4.1.1)
              MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v4.1.1)
              MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v4.1.1)
              MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v4.1.1)
                 MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v4.1.1)
              MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v4.1.1)
              MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v4.1.1)
              MCA schizo: jsm (MCA v2.1.0, API v1.0.0, Component v4.1.1)
              MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v4.1.1)
              MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v4.1.1)
               MCA state: tool (MCA v2.1.0, API v1.0.0, Component v4.1.1)
               MCA state: app (MCA v2.1.0, API v1.0.0, Component v4.1.1)
               MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v4.1.1)
               MCA state: novm (MCA v2.1.0, API v1.0.0, Component v4.1.1)
               MCA state: orted (MCA v2.1.0, API v1.0.0, Component v4.1.1)
                 MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA coll: han (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA coll: monitoring (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)
                MCA coll: self (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA coll: adapt (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v4.1.1)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v4.1.1)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)
               MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)
               MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v4.1.1)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)
                  MCA fs: lustre (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                  MCA io: romio321 (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                  MCA op: avx (MCA v2.1.0, API v1.0.0, Component v4.1.1)
                 MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.1.1)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v4.1.1)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v4.1.1)
                 MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
                          v4.1.1)
                 MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v4.1.1)
                 MCA pml: v (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                 MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                 MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)
                 MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                 MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v4.1.1)
                 MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v4.1.1)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v4.1.1)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v4.1.1)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                          v4.1.1)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v4.1.1)

Why are you using OpenMPI 4.1.1 located under your home directory?
Surely there is a system-wide MPI?

IB = InfiniBand

This is not something I decided. MPI.jl seems to have defaulted to the copy in my home directory after I followed the MPI.jl instructions for using a system-provided MPI.

I have no idea. Is there any command I can run to check?

Yes, on executing whereis mpiexec I get mpiexec: /usr/bin/mpiexec /home/swapanc.uc/software/openmpi-4.1.1/bin/mpiexec. However, trying to run my code with the former, using

time /usr/bin/mpiexec -n 40 julia mpi_parallel_timeloop.jl >& mpi_parallel_timeloop.jl.out

in the SLURM script, results in the code not running and exiting with the following error.

Error
[hm026:423007] OPAL ERROR: Unreachable in file pmix3x_client.c at line 112
[hm026:422980] OPAL ERROR: Unreachable in file pmix3x_client.c at line 112
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--------------------------------------------------------------------------
--------------------------------------------------------------------------