Julia crashed - Core dumped

I tried to report the bug with julia --bug-report=yes, but that gives a segmentation fault as well. I updated to Julia 1.9, the latest stable version, and now it crashes.

[ Info: Loading BugReporting package...
[ Info: Package `BugReporting` not found - attempting temporary installation

[1385509] signal (11.1): Segmentation fault
in expression starting at none:0
gc_mark_scan_objarray at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gc.c:2080 [inlined]
gc_mark_loop at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gc.c:2487
_jl_gc_collect at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gc.c:3407
ijl_gc_collect at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gc.c:3713
maybe_collect at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gc.c:1083 [inlined]
jl_gc_pool_alloc_inner at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gc.c:1450 [inlined]
jl_gc_pool_alloc_noinline at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gc.c:1511 [inlined]
jl_gc_alloc_ at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/julia_internal.h:460 [inlined]
jl_gc_alloc at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gc.c:3760
_new_array_ at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/array.c:134 [inlined]
_new_array at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/array.c:198 [inlined]
ijl_alloc_array_1d at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/array.c:436
Array at ./boot.jl:477 [inlined]
Array at ./boot.jl:486 [inlined]
similar at ./abstractarray.jl:849 [inlined]
similar at ./abstractarray.jl:840 [inlined]
_similar_for at ./array.jl:661 [inlined]
_collect at ./array.jl:717
collect at ./array.jl:707 [inlined]
#split#470 at ./strings/util.jl:607 [inlined]
split at ./strings/util.jl:605 [inlined]
#read_tarball#45 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/Tar/src/extract.jl:408
read_tarball at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/Tar/src/extract.jl:342 [inlined]
#19 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/Pkg/src/Registry/registry_instance.jl:257 [inlined]
#open#770 at ./process.jl:427
open at ./process.jl:414 [inlined]
uncompress_registry at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/Pkg/src/Registry/registry_instance.jl:256
RegistryInstance at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/Pkg/src/Registry/registry_instance.jl:323
#reachable_registries#25 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/Pkg/src/Registry/registry_instance.jl:429
reachable_registries at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/Pkg/src/Registry/registry_instance.jl:399 [inlined]
#download_default_registries#39 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/Pkg/src/Registry/Registry.jl:99
download_default_registries at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/Pkg/src/Registry/Registry.jl:98 [inlined]
#add#27 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/Pkg/src/API.jl:146
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gf.c:2940
add at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/Pkg/src/API.jl:145
#add#24 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/Pkg/src/API.jl:143 [inlined]
add at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/Pkg/src/API.jl:143
jfptr_add_67321.clone_1 at /home/cornell/.julia/juliaup/julia-1.9.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gf.c:2940
#82 at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/InteractiveUtils/src/InteractiveUtils.jl:353
#mktempdir#24 at ./file.jl:762
unknown function (ip: 0x7f472bffb91d)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gf.c:2940
mktempdir at ./file.jl:758
mktempdir at ./file.jl:758 [inlined]
report_bug at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/InteractiveUtils/src/InteractiveUtils.jl:348
unknown function (ip: 0x7f472bffa952)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gf.c:2940
exec_options at ./client.jl:242
_start at ./client.jl:522
jfptr__start_49509.clone_1 at /home/cornell/.julia/juliaup/julia-1.9.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
true_main at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/jlapi.c:573
jl_repl_entrypoint at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/src/jlapi.c:717
main at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-9/cli/loader_exe.c:59
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 571189 (Pool: 570591; Big: 598); GC: 0
Segmentation fault (core dumped)

How is Julia installed, and what CPU and OS are you on?
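For example, the output of versioninfo() covers most of this (a two-line snippet; InteractiveUtils is loaded automatically in the REPL):

using InteractiveUtils   # versioninfo() lives here
versioninfo()            # prints the Julia version, commit, OS, CPU, and library info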

I have juliaup to switch between versions.

Commit 8e5136fa297 (2023-11-14 08:46 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 28 × Intel(R) Core(TM) i9-10940X CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, cascadelake)
  Threads: 1 on 28 virtual cores

Does it crash already when you start Julia? Or when you load a package? Or when you run a script?

Probably unrelated to the issue, but --bug-report=yes isn’t a valid flag. You need to specify something like --bug-report=rr.

  BugReporting.jl
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

  This package implements Julia's --bug-report flag, simplifying bug reporting by enabling users to easily generate and upload reports to help developers fix bugs.

      julia --bug-report=REPORT_TYPE[,REPORT_FLAG,...]

  Currently, only the rr (https://github.com/rr-debugger/rr) tool is supported to generate bug reports, but in the future other types of reports may be supported as
  well.

  Available bug report types and flags
  ======================================

  --bug-report=help
  –––––––––––––––––––

  Print help message and exit.

  --bug-report=rr
  –––––––––––––––––

  Run julia inside rr record and upload the recorded trace.

  --bug-report=rr-local
  –––––––––––––––––––––––

  Run julia inside rr record but do not upload the recorded trace. Useful for local debugging.

  --bug-report=XXX,timeout=SECONDS
  ––––––––––––––––––––––––––––––––––

  Generate a bug report, but limit the execution time of the debugged process to SECONDS seconds. This is useful for generating reports for hangs.

  --bug-report=rr,chaos
  –––––––––––––––––––––––

  Generate an rr trace, while enabling so-called chaos mode. This is useful for flushing out threading-related issues (refer to the rr documentation for more
  details).

  Using the traces for local debugging
  ======================================

  You can use this package also for debugging your own Julia code locally. Use --bug-report=rr-local to record a trace, and replay() to replay the latest trace.

  For example, if you have a script in a project that you'd like to trace, run julia --bug-report=rr -- --project=foo run.jl.
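  For local debugging, a minimal session could look like this (a sketch, assuming BugReporting.jl is already installed in the active environment; myscript.jl is a placeholder):

# Record a trace locally without uploading it:
#   julia --bug-report=rr-local myscript.jl
# Then, in a separate Julia session, replay the most recent trace:
using BugReporting
BugReporting.replay()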

Since you have juliaup, could you try the following to see if Julia still crashes? And do I understand correctly that Julia is crashing on startup?

juliaup add 1.9.0
julia +1.9.0
juliaup remove 1.9.0

If that does not crash, could you try 1.9.1, 1.9.2, and 1.9.3?

Also, are you using a conda environment, or do you have any environment variables such as LD_PRELOAD or LD_LIBRARY_PATH set?
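For example, a quick check from a freshly started Julia session (the variables below are just the usual suspects, not an exhaustive list):

# Print environment variables that commonly interfere with Julia's bundled libraries
for var in ("LD_PRELOAD", "LD_LIBRARY_PATH", "CONDA_PREFIX")
    println(var, " = ", get(ENV, var, "<not set>"))
end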

Bringing this up because I’m getting similar problems on my university’s HPC.

I’m also using juliaup, with Julia 1.10.2. I’m working with MPI.jl and PencilFFTs.jl, and I can’t get a good understanding of why it works sometimes and segfaults other times.

Slurm sbatch script:

#!/bin/bash

#SBATCH -p free
#SBATCH --job-name=SRRS
#SBATCH --constraint=intel&mlx5_ib
#SBATCH --nodes=1-2
#SBATCH --ntasks=20
#SBATCH --error=err-%j.log
#SBATCH --output=output-%j.log
#SBATCH --mem-per-cpu=8G
#SBATCH --mail-type=begin
#SBATCH --mail-type=end         # send email when job ends
#SBATCH --mail-type=fail         # send email if job fails
#SBATCH --mail-user=ernestob@uci.edu


echo $SLURM_JOB_NUM_NODES >> "julia-$SLURM_JOB_ID.out"
echo $SLURM_NTASKS >> "julia-$SLURM_JOB_ID.out"
echo $SLURM_CPUS_PER_TASK >> "julia-$SLURM_JOB_ID.out"
module load openmpi/4.1.2/gcc.11.2.0
module load hdf5/1.14.1/gcc.11.2.0-openmpi.4.1.2

export JULIA_CPU_TARGET="generic;skylake-avx512,clone_all; skylake,clone_all;icelake-server,clone_all; znver3,clone_all; znver2,clone_all"
export JULIA_MPI_PATH="/opt/apps/openmpi/4.1.2/gcc/11.2.0/lib"
export JULIA_MPI_LIBRARY="/opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libmpi.so"
export JULIA_HDF5_PATH="/opt/apps/hdf5/1.14.1/gcc/11.2.0/openmpi/4.1.2/lib"



julia --project=. -e 'using Pkg;  Pkg.instantiate(); Pkg.precompile()' > "julia-$SLURM_JOB_ID.out"

mpiexec -n $SLURM_NTASKS /data/homezvol0/ernestob/.juliaup/bin/julia --project=. /dfs6/pub/ernestob/Julia/NRL/DistrrubedSRRS/PencilTest.jl > "julia-$SLURM_JOB_ID.out"

Julia Script:

using MPI
#using MPIPreferences

# ENV["JULIA_MPI_BINARY"]="system"
#Pkg.add("AbstractFFTs")
using BenchmarkTools
using Test
using LinearAlgebra
using FFTW

using Base.Threads
using Printf
using IJulia
using Revise
using PkgTemplates
using Plots
using Pkg
using Base.Threads: nthreads, @threads, @spawn
using Base.Iterators: partition
using Krylov
using LinearOperators

#Pkg.build("MPI"; verbose=true)
#using MPI
#using MPIPreferences
using HDF5
using PencilArrays
using PencilFFTs
using TimerOutputs
using PencilArrays.PencilIO

using Random
using AbstractFFTs: fftfreq, rfftfreq


MPI.Init()




comm = MPI.COMM_WORLD       # MPI communicator
rank = MPI.Comm_rank(comm)  # rank of local process
root = 0


MPI.Barrier(comm)


if rank == root
    println("Finished Activating Project and Loading Packages")
    flush(stdout)
    
    
    println("-------------------")
    flush(stdout)
    println("Is ther HDF5 Parallel?")
    println("-------------------")
    flush(stdout)
    println(HDF5.has_parallel())
    flush(stdout)
    
    println("-------------------")
    flush(stdout)
    println("Beginning PencilFFT Tutorial")
    println("-------------------")
    flush(stdout)

end
#wait for all MPI processes to catch up
MPI.Barrier(comm)

# Input data dimensions (Nx × Ny × Nz)
Nx = 64
Ny = 32
dims = (64, 32)#(64, 32, 12)

# Apply a 2D real-to-complex (r2c) FFT.
transform = (Transforms.FFT(), Transforms.NoTransform())#, Transforms.NoTransform())
inplacefft = (Transforms.FFT!(), Transforms.NoTransform!())

pen = Pencil(dims,  comm)
#wait for all MPI processes to catch up
MPI.Barrier(comm)
if rank==root

    println("-------------------")
    flush(stdout)
    println("Pencils")
    println("-------------------")
    flush(stdout)
    println(pen)
    flush(stdout)
    # println(pen1D)
    # flush(stdout)
    # println("Sizes: ")
    # flush(stdout)
    # println()

end
#wait for all MPI processes to catch up
MPI.Barrier(comm)

if rank==root
    println("-------------------")
    flush(stdout)
    println("Creating Pencil Array")
    flush(stdout)
    println("-------------------")
    flush(stdout)
end
# Create plan
#plan = PencilFFTPlan(pen, transform)#, permute_dims = Val(false))
planinplace = PencilFFTPlan(pen, inplacefft )

MPI.Barrier(comm)
# Allocate data and initialise field
# theta = allocate_input(plan)
# randn!(theta)
Theta1 = allocate_input(planinplace)
Theta2 = allocate_input(planinplace)
theta1 = first(Theta1)
theta2 = first(Theta2)
randn!(theta1)
@. theta2 = theta1

if rank==root
    println("-------------------")
    flush(stdout)
    println("Gathering Pencil Array theta To Root")
    println("-------------------")
    flush(stdout)

end
theta0 = gather(theta1)
MPI.Barrier(comm)

if rank==root
    println("-------------------")
    flush(stdout)
    println("Checking Gathering To Root")
    println("-------------------")
    flush(stdout)

    println("Size of gather theta0: ",size(theta0))
    flush(stdout)
    
end
MPI.Barrier(comm)

if rank==root
    println("-------------------")
    flush(stdout)
    println("Creating 1D Array theta1D in Root")
    flush(stdout)
    println("-------------------")
    flush(stdout)
    theta1D = zeros(ComplexF64, dims[1])#allocate_input(plan1D)
    println("Size of this array: ", size(theta1D))
    flush(stdout)

    println("-------------------")
    flush(stdout)
    println("Setting 1D Array theta1D to gather theta0[:,1,1]" )
    flush(stdout)
    println("-------------------")
    flush(stdout)

    @. theta1D = theta0[:,1]#theta0[:,1,1]
    
end



MPI.Barrier(comm)

if rank==root
    println("-------------------")
    flush(stdout)
    println("Checking Sizes of Pencil Array")
    println("-------------------")
    flush(stdout)
    
end

MPI.Barrier(comm)

flush(stdout)
println(size(theta1))
flush(stdout)
MPI.Barrier(comm)


if rank==root
    println("-------------------")
    flush(stdout)
    println("Performing Pencil FFT")
    println("-------------------")
    flush(stdout)
    
end


# theta_hat = plan * theta
planinplace * Theta1;
theta_hat = last(Theta1)
thetaf = gather(theta_hat)

MPI.Barrier(comm)

if rank==root
    println("-------------------")
    flush(stdout)
    println("Checking size of Pencil FFT Output")
    println("-------------------")
    flush(stdout)
    
end
println(size(theta_hat))
MPI.Barrier(comm)

if rank==root
    println("-------------------")
    flush(stdout)
    println("Performing 1D FFTW")
    println("-------------------")
    flush(stdout)
    thetaft1D = FFTW.fft(theta1D) #plan1D * theta1D
end


#wait for all MPI processes to catch up
MPI.Barrier(comm)
if rank==root

    println("-------------------")
    flush(stdout)
    println("Check to see if 1D transform worked")
    println("-------------------")
    flush(stdout)

    
    errors = zeros(Float64, length(thetaft1D ))
    # @. errors = abs2(thetaf_glob1D - thetaf_glob[:,1,1])
    @. errors = abs2(thetaft1D - thetaf[:,1,1])
    NotZero = any(x->x<(1.0e-5), errors)  

    if NotZero
        println(rank, ": The Error is less than 1.0e-5")
        flush(stdout)
    elseif !NotZero
        println(rank, ": The Error is greater than 1.0e-5")
        flush(stdout)
    end


end


#wait for all MPI processes to catch up
MPI.Barrier(comm)

# Finally, we initialise the output that will hold ∇θ in Fourier space. Noting that ∇θ is a vector field, we choose to store it as a tuple of 3 PencilArrays.

gradTheta_hat = last(Theta1)#allocate_output(plan)
# # gradTheta_hat2 = allocate_output(plan, Val(3))
#wait for all MPI processes to catch up
MPI.Barrier(comm)
# # This is equivalent:
# # ∇θ_hat = ntuple(d -> similar(θ_hat), Val(3))
if rank==root
    println("-------------------")
    flush(stdout)
    println("FFT Output Plan of Grad Theta")
    println("-------------------")
    flush(stdout)
    # println(summary(gradTheta_hat))
    # flush(stdout)
end

println(size(gradTheta_hat))

#wait for all MPI processes to catch up
MPI.Barrier(comm)
# #fourier wave number vectors

if rank==root
    println("-------------------")
    flush(stdout)
    println("Creating FFT Wave Numbers")
    println("-------------------")
    flush(stdout)
    # println(summary(gradTheta_hat))
    # flush(stdout)
end
box_size = (2*pi, 2*pi)  # Lx, Ly, Lz
sample_rate = 2*pi .* dims ./ box_size

# In our case (Lx = 2π and Nx even), this gives kx = [0, 1, 2, ..., Nx/2].
kx = fftfreq(dims[1], sample_rate[1])
ky = ones(ComplexF64, dims[2])
#wait for all MPI processes to catch up
MPI.Barrier(comm)


if rank==root
    println("-------------------")
    flush(stdout)
    println("Creating FFT Wave Numbers Local Grid")
    println("-------------------")
    flush(stdout)
    println("Need to create a 2D local grid and Kvec grid")
    # println(summary(gradTheta_hat))
    # flush(stdout)
end
# #Local Indexing
# PencilFFTs.localgrid()
xs = range(1, Nx; length = Nx)
ys = range(1, Ny; length = Ny)
gridx = localgrid(theta_hat, (xs,ys))
yones = ones(Float64, Ny)
grid_fourier = localgrid(theta_hat, (kx,yones))

MPI.Barrier(comm)
println(grid_fourier)
MPI.Barrier(comm)
# #fourier wave number vectors
if rank==root
    println("-------------------")
    flush(stdout)
    println("Local Grid Indexing")
    println("-------------------")
end
flush(stdout)
println(summary(grid_fourier))
flush(stdout)

MPI.Barrier(comm)
# #computing gradient
if rank==root
    println("-------------------")
    flush(stdout)
    println("Computing FFT Gradient")
    println("-------------------")
    flush(stdout)
    # println(summary(gradTheta_hat))
    # flush(stdout)
end

@inbounds for I in eachindex(grid_fourier)
    # Wave number vector associated to current Cartesian index.
    #i, j = Tuple(I)
    kkx, yy= grid_fourier[I]
    # u = im * θ_hat[I]
    gradTheta_hat[I] = 1.0im * kkx * theta_hat[I]
end


if rank==root
    println("-------------------")
    flush(stdout)
    println("Gathering PencilFFT grad To Root")
    println("-------------------")
    flush(stdout)

end
gradtheta0 = gather(gradTheta_hat)
MPI.Barrier(comm)


# check gradient
if rank==root
    println("-------------------")
    flush(stdout)
    println("Check PencilFFT is the Same as FFTW gradient")
    println("-------------------")
    flush(stdout)
    println("Doing Root FFTW gradient calculation")

    # @. gradTheta_hat = 1.0im * grid_fourier * theta_hat
    gradthetaft1D = zeros(ComplexF64, size(thetaft1D))
    @. gradthetaft1D = 1.0im * kx * thetaft1D
end


#wait for all MPI processes to catch up
MPI.Barrier(comm)
if rank==root

    println("-------------------")
    flush(stdout)
    println("Check to see if 1D FFT gradient worked")
    println("-------------------")
    flush(stdout)

    
    errors = zeros(Float64, length(gradthetaft1D))
    # @. errors = abs2(thetaf_glob1D - thetaf_glob[:,1,1])
    @. errors = abs2(gradthetaft1D - gradtheta0[:,1,1])
    NotZero = any(x->x<(1.0e-5), errors)  

    if NotZero
        println(rank, ": The Error is less than 1.0e-5")
        flush(stdout)
    elseif !NotZero
        println(rank, ": The Error is greater than 1.0e-5")
        flush(stdout)
    end


end


MPI.Barrier(comm)
if rank==root

    println("-------------------")
    flush(stdout)
    println("Saving Data to HDF5")
    println("-------------------")
    flush(stdout)
end

# comm = get_comm(gradTheta_hat)

ff = open(PHDF5Driver(), "/dfs6/pub/ernestob/Julia/data/hdf5test.hdf", comm; write=true)

gradtheta1 = last(Theta1)
ff["gradTheta_hat1"] = gradtheta1
gradtheta2 = first(Theta2)
ff["gradTheta_hat2"] = gradtheta2
# ff["gradTheta_hat2"] = gradTheta_hat[2]
# ff["gradTheta_hat3"] = gradTheta_hat[3]

close(ff)
MPI.Barrier(comm)
if rank==root

    println("-------------------")
    flush(stdout)
    println("Finished Saving to HDF5")
    println("-------------------")
    flush(stdout)
end

Julia starts up fine. The output file contains:

e]0;Julia e]0;Julia e]0;Julia e]0;Julia e]0;Julia … (the same sequence repeated)

But the error file:

[4142417] signal (11.128): Segmentation fault
in expression starting at /dfs6/pub/ernestob/Julia/NRL/DistrrubedSRRS/PencilTest.jl:68
ibv_reg_mr_iova2 at /lib64/libibverbs.so.1 (unknown line)
udcm_component_query at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/openmpi/mca_btl_openib.so (unknown line)
opal_btl_openib_connect_base_select_for_local_port at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/openmpi/mca_btl_openib.so (unknown line)
btl_openib_component_init at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/openmpi/mca_btl_openib.so (unknown line)
mca_btl_base_select at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libopen-pal.so.40 (unknown line)
mca_bml_r2_component_init at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/openmpi/mca_bml_r2.so (unknown line)
mca_bml_base_init at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libmpi.so (unknown line)
ompi_mpi_init at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libmpi.so (unknown line)
PMPI_Init_thread at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libmpi.so (unknown line)
MPI_Init_thread at /data/homezvol0/ernestob/.julia/packages/MPI/z2owj/src/api/generated_api.jl:1899 [inlined]
_init_thread at /data/homezvol0/ernestob/.julia/packages/MPI/z2owj/src/environment.jl:207 [inlined]
#Init#6 at /data/homezvol0/ernestob/.julia/packages/MPI/z2owj/src/environment.jl:127
Init at /data/homezvol0/ernestob/.julia/packages/MPI/z2owj/src/environment.jl:114
jfptr_Init_3236 at /data/homezvol0/ernestob/.julia/compiled/v1.10/MPI/nO0XF_RKZWp.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:617
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
include_string at ./loading.jl:2076
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
_include at ./loading.jl:2136
include at ./Base.jl:495
jfptr_include_46403.1 at /data/homezvol0/ernestob/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
exec_options at ./client.jl:318
_start at ./client.jl:552
jfptr__start_82738.1 at /data/homezvol0/ernestob/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 1882102 (Pool: 1880314; Big: 1788); GC: 3

[4142462] signal (11.1): Segmentation fault
in expression starting at /dfs6/pub/ernestob/Julia/NRL/DistrrubedSRRS/PencilTest.jl:68
ibv_reg_mr_iova2 at /lib64/libibverbs.so.1 (unknown line)
udcm_component_query at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/openmpi/mca_btl_openib.so (unknown line)
opal_btl_openib_connect_base_select_for_local_port at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/openmpi/mca_btl_openib.so (unknown line)
btl_openib_component_init at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/openmpi/mca_btl_openib.so (unknown line)
mca_btl_base_select at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libopen-pal.so.40 (unknown line)
mca_bml_r2_component_init at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/openmpi/mca_bml_r2.so (unknown line)
mca_bml_base_init at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libmpi.so (unknown line)
ompi_mpi_init at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libmpi.so (unknown line)
PMPI_Init_thread at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libmpi.so (unknown line)
MPI_Init_thread at /data/homezvol0/ernestob/.julia/packages/MPI/z2owj/src/api/generated_api.jl:1899 [inlined]
_init_thread at /data/homezvol0/ernestob/.julia/packages/MPI/z2owj/src/environment.jl:207 [inlined]
#Init#6 at /data/homezvol0/ernestob/.julia/packages/MPI/z2owj/src/environment.jl:127
Init at /data/homezvol0/ernestob/.julia/packages/MPI/z2owj/src/environment.jl:114
jfptr_Init_3236 at /data/homezvol0/ernestob/.julia/compiled/v1.10/MPI/nO0XF_RKZWp.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:617
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:985
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------

[4142477] signal (15): Terminated
in expression starting at /dfs6/pub/ernestob/Julia/NRL/DistrrubedSRRS/PencilTest.jl:42
pthread_cond_wait at /lib64/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
#138 at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:281 [inlined]
lock at ./lock.jl:229
lock at ./condition.jl:78 [inlined]
#137 at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:279
unknown function (ip: 0x7f4b01c43132)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/task.c:1238
unknown function (ip: (nil))
pthread_cond_wait at /lib64/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
_trywait at ./asyncevent.jl:142
wait at ./asyncevent.jl:159 [inlined]
profile_printing_listener at ./Base.jl:572
#1055 at ./Base.jl:608

[4142418] signal (15): Terminated
in expression starting at /dfs6/pub/ernestob/Julia/NRL/DistrrubedSRRS/PencilTest.jl:68
pthread_cond_wait at /lib64/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
task_done_hook at ./task.jl:675
jfptr_task_done_hook_75297.1 at /data/homezvol0/ernestob/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_finish_task at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/task.c:320
jl_threadfun at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/partr.c:191
start_thread at /lib64/libpthread.so.0 (unknown line)
clone at /lib64/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib64/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
.
.
.
.
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node hpc3-21-05 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

There seems to be a problem with the MPI setup over IB. Does mpiexec work with some simple test program rather than the Julia script? There could be an IB problem on some cluster nodes.
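On the Julia side, a bare MPI.jl hello world run through the same mpiexec would also help narrow things down, since it shows whether the crash needs PencilFFTs/HDF5 at all (a minimal sketch; the file name and process count are arbitrary):

# mpi_hello.jl -- minimal MPI.jl smoke test (hypothetical file name)
# Run with, e.g.:  mpiexec -n 4 julia --project=. mpi_hello.jl
using MPI
MPI.Init()
comm = MPI.COMM_WORLD
println("Hello from rank ", MPI.Comm_rank(comm), " of ", MPI.Comm_size(comm))
MPI.Barrier(comm)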

From what the HPC group tells me, all nodes are on InfiniBand.
Is there something in the error that makes you say it’s InfiniBand-related?

I’ll check whether mpiexec works with other non-Julia examples.

The top of the stack says ibv_reg_mr_iova2 at /lib64/libibverbs.so.1, which is inside the IB setup of OpenMPI (called from your MPI_Init()). It could be that Julia's MPI wrapper calls it with inappropriate arguments, but then it should segfault every time, not just sometimes. My guess, and it is just a guess, is that this happens on some specific nodes in the cluster, and occasionally your job starts on such a node. (It’s been some years since I managed a large HPC installation and had to fight its demons, so I may be wrong.)

I do not think those environment variables (JULIA_MPI_PATH, JULIA_MPI_LIBRARY, JULIA_HDF5_PATH) work anymore. See:

https://juliaio.github.io/HDF5.jl/stable/mpi/
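For reference, the preferences-based setup that replaced those variables looks roughly like this (a sketch based on the linked docs; the HDF5 library paths are guessed from the module path in your script and may need adjusting):

# One-off configuration in the project environment (run once, then restart Julia)
using MPIPreferences
MPIPreferences.use_system_binary()   # record the system MPI from the loaded openmpi module

using HDF5
# Point HDF5.jl at the MPI-enabled system HDF5 (paths assumed, not verified)
HDF5.API.set_libraries!(
    "/opt/apps/hdf5/1.14.1/gcc/11.2.0/openmpi/4.1.2/lib/libhdf5.so",
    "/opt/apps/hdf5/1.14.1/gcc/11.2.0/openmpi/4.1.2/lib/libhdf5_hl.so",
)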

@sgaure got it. Thanks for the direction! I did find that, since our HPC moved to Rocky Linux, there have been some issues with OpenMPI and the IB.
Here’s the website: Job example scripts — RCIC 1.0.0 documentation

The example shows that the following

#SBATCH --constraint="mlx5_ib"   ## run only on nodes with updated IB firmware
# set these UCX parameters for openmpi
export OMP_NUM_THREADS=1
export UCX_TLS=rc,mm
export UCX_NET_DEVICES=mlx5_0:1

should be set, and it looks like they suggest using mpirun with:

# original command is updated  with: -mca pml ucx 
mpirun -np $SLURM_NTASKS -mca pml ucx vasp_std

Although I notice that there might be a missing - in front of what should be --mca?
@mkitti Thanks. With all of the above I’ve reformulated my Slurm script. I’ve constrained it to just Intel processors to make troubleshooting easier:

#!/bin/bash

#SBATCH -p free
#SBATCH --job-name=SRRS
#SBATCH --constraint="mlx5_ib&intel"
#SBATCH --nodes=1-4
#SBATCH --ntasks=20
#SBATCH --error=err-%j.log
#SBATCH --output=output-%j.log
#SBATCH --mem-per-cpu=8G
#SBATCH --mail-type=begin
#SBATCH --mail-type=end         # send email when job ends
#SBATCH --mail-type=fail         # send email if job fails
#SBATCH --mail-user=ernestob@uci.edu


echo $SLURM_JOB_NUM_NODES >> "julia-$SLURM_JOB_ID.out"
echo $SLURM_NTASKS >> "julia-$SLURM_JOB_ID.out"
echo $SLURM_CPUS_PER_TASK >> "julia-$SLURM_JOB_ID.out"
module load openmpi/4.1.2/gcc.11.2.0
module load hdf5/1.14.1/gcc.11.2.0-openmpi.4.1.2

# set these UCX parameters for openmpi
export OMP_NUM_THREADS=1
export UCX_TLS=rc,mm
export UCX_NET_DEVICES=mlx5_0:1
export UCX_ERROR_SIGNALS="SIGILL,SIGBUS,SIGFPE"
export JULIA_CPU_TARGET="generic;skylake-avx512,clone_all; skylake,clone_all;icelake-server,clone_all;"
export JULIA_MPI_LIBRARY="/opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libmpi"



echo "Precompiling Master" >> "julia-$SLURM_JOB_ID.out"

julia --project=. -e 'using Pkg;  Pkg.instantiate(); Pkg.precompile()' >> "julia-$SLURM_JOB_ID.out"

mpirun --mca pml ucx -n $SLURM_NTASKS /data/homezvol0/ernestob/.juliaup/bin/julia --project=. /dfs6/pub/ernestob/Julia/NRL/DistributedSRRSMASTER/PencilTest.jl >> "julia-$SLURM_JOB_ID.out"

Below is my Project.toml:


[deps]
AbstractFFTs = "621f4979-c628-5d54-868e-fcf4e3e8185c"
BPMDistributed = "634e16ad-23c4-4c3d-8de6-2ae1fdbe117a"
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
FFTW = "7a1cc6ca-52ef-59f5-83cd-3a7055c09341"
HDF5 = "f67ccb44-e63f-5c2f-98bd-6dc0ccc4ba2f"
IJulia = "7073ff75-c697-5162-941a-fcdaad2a7d2a"
Krylov = "ba0b0d4f-ebba-5204-a429-3ac8c609bfb7"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
LinearOperators = "5c8ed15e-5a4c-59e4-a42b-c7e8811fb125"
MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"
MPIPreferences = "3da0fdf6-3ccc-4f1b-acd9-58baa6c99267"
PencilArrays = "0e08944d-e94e-41b1-9406-dcf66b6a9d2e"
PencilFFTs = "4a48f351-57a6-4416-9ec4-c37015456aae"
PkgTemplates = "14b8a8f1-9102-5b29-a752-f990bacb7fe1"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
Revise = "295af30f-e4ad-537b-8983-00126c2a3abe"
SRRSCalc = "c215410e-53fa-4750-bffd-73de35b87ca4"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
TimerOutputs = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"

[HDF5]
libhdf5 = "/path/to/your/libhdf5.so"
libhdf5_hl = "/path/to/your/libhdf5_hl.so"

[MPIPreferences]
__clear__ = ["preloads_env_switch"]
_format = "1.0"
abi = "OpenMPI"
binary = "system"
cclibs = []
libmpi = "/opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libmpi"
mpiexec = "/opt/apps/openmpi/4.1.2/gcc/11.2.0/bin/mpirun"
preloads = []

[extras]
HDF5_jll = "0234f1f7-429e-5d53-9886-15a909be8d59"

My PencilTest.jl is below. As an FYI, I am trying to test out part of this example, Gradient of a scalar field · PencilFFTs.jl, but with in-place transforms.

using Pkg
using MPI
#using MPIPreferences
# ENV["JULIA_MPI_BINARY"]="system"
using BenchmarkTools
using Test
using LinearAlgebra
using FFTW
using Krylov
using LinearOperators
using HDF5
using PencilArrays
using PencilFFTs
using TimerOutputs
using PencilArrays.PencilIO
using Random
using AbstractFFTs: fftfreq, rfftfreq


#loading personal modules
# println("Loading SRRSCalc.jl and BPMDistributed.jl Functions");

# # # include("SRRSCalc.jl")
# # # include("BPMFFT.jl")
# # #include("SRRSCalcMod.jl")
# # #include("BPMFFTMod.jl")
# # # Pkg.develop(PackageSpec(path="./BPMFFT")) # dev ./BPMFFT
# # # Pkg.develop(PackageSpec(path="./SRRSCalc"))
# # # using .SRRSCalcMod
# # # using .BPMFFTMod
# push!(LOAD_PATH, "/dfs6/pub/ernestob/Julia/NRL/DistributedSRRSMASTER/SRRSCalc");
# push!(LOAD_PATH, "/dfs6/pub/ernestob/Julia/NRL/DistributedSRRSMASTER/BPMDistributed");

# using SRRSCalc
# using BPMDistributed


MPI.Init()
comm = MPI.COMM_WORLD       # MPI communicator
rank = MPI.Comm_rank(comm)  # rank of local process
root = 0

MPI.Barrier(comm)

if rank == root
    println("Finished Activating Project and Loading Packages")
    flush(stdout)
    
    
    println("-------------------")
    flush(stdout)
    println("Is ther HDF5 Parallel?")
    println("-------------------")
    flush(stdout)
    println(HDF5.has_parallel())
    flush(stdout)
    
    println("-------------------")
    flush(stdout)
    println("Beginning PencilFFT Tutorial")
    println("-------------------")
    flush(stdout)

end
#wait for all MPI processes to catch up
MPI.Barrier(comm)

# Input data dimensions (Nx × Ny × Nz)
Nx = 64
Ny = 32
dims = (64, 32)#(64, 32, 12)

# Apply a 2D real-to-complex (r2c) FFT.
transform = (Transforms.FFT(), Transforms.NoTransform())#, Transforms.NoTransform())
inplacefft = (Transforms.FFT!(), Transforms.NoTransform!())

pen = Pencil(dims,  comm)
#wait for all MPI processes to catch up
MPI.Barrier(comm)
if rank==root

    println("-------------------")
    flush(stdout)
    println("Pencils")
    println("-------------------")
    flush(stdout)
    println(summary(pen))
    flush(stdout)

end
#wait for all MPI processes to catch up
MPI.Barrier(comm)

if rank==root
    println("-------------------")
    flush(stdout)
    println("Creating Pencil Array")
    flush(stdout)
    println("-------------------")
    flush(stdout)
end
# Create plan
#plan = PencilFFTPlan(pen, transform)#, permute_dims = Val(false))
planinplace = PencilFFTPlan(pen, inplacefft )

MPI.Barrier(comm)
# Allocate data and initialise field
# theta = allocate_input(plan)
# randn!(theta)
Theta1 = allocate_input(planinplace)
Theta2 = allocate_input(planinplace)
theta1 = first(Theta1)
theta2 = first(Theta2)
randn!(theta1)
#@. theta2 = theta1
randn!(theta2)

if rank==root
    println("-------------------")
    flush(stdout)
    println("Gathering Pencil Array theta To Root")
    println("-------------------")
    flush(stdout)

end

theta0 = gather(theta1)

MPI.Barrier(comm)

if rank==root
    println("-------------------")
    flush(stdout)
    println("Checking Gathering To Root")
    println("-------------------")
    flush(stdout)

    println("Size of gather theta0: ",size(theta0))
    flush(stdout)
    
end
MPI.Barrier(comm)

if rank==root
    println("-------------------")
    flush(stdout)
    println("Creating 1D Array theta1D in Root")
    flush(stdout)
    println("-------------------")
    flush(stdout)
    theta1D = zeros(ComplexF64, dims[1])#allocate_input(plan1D)
    println("Size of this array: ", size(theta1D))
    flush(stdout)

    println("-------------------")
    flush(stdout)
    println("Setting 1D Array theta1D to gather theta0[:,1,1]" )
    flush(stdout)
    println("-------------------")
    flush(stdout)

    @. theta1D = theta0[:,1]#theta0[:,1,1]
    
end



MPI.Barrier(comm)
# theta_glob = global_view(theta)
# @. theta1D = theta_glob[:,1,1]


if rank==root
    println("-------------------")
    flush(stdout)
    println("Checking Sizes of Pencil Array")
    println("-------------------")
    flush(stdout)
    
end

MPI.Barrier(comm)

flush(stdout)
println(size(theta1))
flush(stdout)
MPI.Barrier(comm)

if rank==root
    println("-------------------")
    flush(stdout)
    println("Performing Pencil FFT")
    println("-------------------")
    flush(stdout)
    
end


# theta_hat = plan * theta
planinplace * Theta1;
theta_hat = last(Theta1)
thetaf = gather(theta_hat)

MPI.Barrier(comm)

if rank==root
    println("-------------------")
    flush(stdout)
    println("Checking size of Pencil FFT Output")
    println("-------------------")
    flush(stdout)
    
end
println(size(theta_hat))
MPI.Barrier(comm)

if rank==root
    println("-------------------")
    flush(stdout)
    println("Performing 1D FFTW")
    println("-------------------")
    flush(stdout)
    thetaft1D = FFTW.fft(theta1D) #plan1D * theta1D
end

#wait for all MPI processes to catch up
MPI.Barrier(comm)
if rank==root

    println("-------------------")
    flush(stdout)
    println("Check to see if 1D transform worked")
    println("-------------------")
    flush(stdout)

    
    errors = zeros(Float64, length(thetaft1D ))
    # @. errors = abs2(thetaf_glob1D - thetaf_glob[:,1,1])
    @. errors = abs2(thetaft1D - thetaf[:,1,1])
    NotZero = any(x->x<(1.0e-5), errors)  

    if NotZero
        println(rank, ": The Error is less than 1.0e-5")
        flush(stdout)
    elseif !NotZero
        println(rank, ": The Error is greater than 1.0e-5")
        flush(stdout)
    end


end

#wait for all MPI processes to catch up
MPI.Barrier(comm)

# Finally, we initialise the output that will hold ∇θ in Fourier space. Noting that ∇θ is a vector field, we choose to store it as a tuple of 3 PencilArrays.

gradTheta_hat = last(Theta1)#allocate_output(plan)
MPI.Barrier(comm)
# # This is equivalent:
# # ∇θ_hat = ntuple(d -> similar(θ_hat), Val(3))
if rank==root
    println("-------------------")
    flush(stdout)
    println("FFT Output Plan of Grad Theta")
    println("-------------------")
    flush(stdout)
    # println(summary(gradTheta_hat))
    # flush(stdout)
end

println(size(gradTheta_hat))

#wait for all MPI processes to catch up
MPI.Barrier(comm)
# #fourier wave number vectors

if rank==root
    println("-------------------")
    flush(stdout)
    println("Creating FFT Wave Numbers")
    println("-------------------")
    flush(stdout)
    # println(summary(gradTheta_hat))
    # flush(stdout)
end
box_size = (2*pi, 2*pi)  # Lx, Ly, Lz
sample_rate = 2*pi .* dims ./ box_size

# In our case (Lx = 2π and Nx even), this gives kx = [0, 1, 2, ..., Nx/2].
kx = fftfreq(dims[1], sample_rate[1])
ky = ones(ComplexF64, dims[2])
#wait for all MPI processes to catch up
MPI.Barrier(comm)

MPI.Barrier(comm)
# #fourier wave number vectors

if rank==root
    println("-------------------")
    flush(stdout)
    println("Creating FFT Wave Numbers Local Grid")
    println("-------------------")
    flush(stdout)
    println("Need to create a 2D local grid and Kvec grid")
    # println(summary(gradTheta_hat))
    # flush(stdout)
end
# #Local Indexing
# PencilFFTs.localgrid()
xs = range(1, Nx; length = Nx)
ys = range(1, Ny; length = Ny)
gridx = localgrid(theta_hat, (xs,ys))
yones = ones(Float64, Ny)
grid_fourier = localgrid(theta_hat, (kx,yones))

MPI.Barrier(comm)
println(grid_fourier)
MPI.Barrier(comm)
# #fourier wave number vectors
if rank==root
    println("-------------------")
    flush(stdout)
    println("Local Grid Indexing")
    println("-------------------")
end
flush(stdout)
println(summary(grid_fourier))
flush(stdout)

MPI.Barrier(comm)
# #computing gradient
if rank==root
    println("-------------------")
    flush(stdout)
    println("Computing FFT Gradient")
    println("-------------------")
    flush(stdout)
    # println(summary(gradTheta_hat))
    # flush(stdout)
end

@inbounds for I in eachindex(grid_fourier)
    # Wave number vector associated to current Cartesian index.
    #i, j = Tuple(I)
    kkx, yy= grid_fourier[I]
    # u = im * θ_hat[I]
    gradTheta_hat[I] = 1.0im * kkx * theta_hat[I]
end


if rank==root
    println("-------------------")
    flush(stdout)
    println("Gathering PencilFFT grad To Root")
    println("-------------------")
    flush(stdout)

end
gradtheta0 = gather(gradTheta_hat)
MPI.Barrier(comm)


# check gradient
if rank==root
    println("-------------------")
    flush(stdout)
    println("Check PencilFFT is the Same as FFTW gradient")
    println("-------------------")
    flush(stdout)
    println("Doing Root FFTW gradient calculation")

    # @. gradTheta_hat = 1.0im * grid_fourier * theta_hat
    gradthetaft1D = zeros(ComplexF64, size(thetaft1D))
    @. gradthetaft1D = 1.0im * kx * thetaft1D
end

#wait for all MPI processes to catch up
MPI.Barrier(comm)
if rank==root

    println("-------------------")
    flush(stdout)
    println("Check to see if 1D FFT gradient worked")
    println("-------------------")
    flush(stdout)

    
    errors = zeros(Float64, length(gradthetaft1D))
    # @. errors = abs2(thetaf_glob1D - thetaf_glob[:,1,1])
    @. errors = abs2(gradthetaft1D - gradtheta0[:,1,1])
    NotZero = any(x->x<(1.0e-5), errors)  

    if NotZero
        println(rank, ": The Error is less than 1.0e-5")
        flush(stdout)
    elseif !NotZero
        println(rank, ": The Error is greater than 1.0e-5")
        flush(stdout)
    end


end


MPI.Barrier(comm)
if rank==root

    println("-------------------")
    flush(stdout)
    println("Saving Data to HDF5")
    println("-------------------")
    flush(stdout)
end

# comm = get_comm(gradTheta_hat)

ff = open(PHDF5Driver(), "/dfs6/pub/ernestob/Julia/NRL/DistributedSRRSMASTER/data/hdf5test.hdf", comm; write=true)

gradtheta1 = last(Theta1)
ff["gradTheta_hat1"] = gradtheta1
gradtheta2 = first(Theta2)
ff["gradTheta_hat2"] = gradtheta2
# ff["gradTheta_hat2"] = gradTheta_hat[2]
# ff["gradTheta_hat3"] = gradTheta_hat[3]

close(ff)
MPI.Barrier(comm)
if rank==root

    println("-------------------")
    flush(stdout)
    println("Finished Saving to HDF5")
    println("-------------------")
    flush(stdout)
end

The errors I got for this run:

--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           hpc3-14-13
  Local device:         irdma1
  Local port:           1
  CPCs attempted:       rdmacm, udcm
--------------------------------------------------------------------------
[hpc3-14-11:290255] 1 more process has sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[hpc3-14-11:290255] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

I’m still getting the earlier errors in certain cases, though. Sometimes everything loads properly and then it hits MPI.Init() or one of the MPI.Barrier(comm) lines and crashes with the segfault error:

[2073321] signal (11.128): Segmentation fault
in expression starting at /dfs6/pub/ernestob/Julia/NRL/DistributedSRRSMASTER/PencilTest.jl:39
ibv_reg_mr_iova2 at /lib64/libibverbs.so.1 (unknown line)
udcm_component_query at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/openmpi/mca_btl_openib.so (unknown line)
opal_btl_openib_connect_base_select_for_local_port at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/openmpi/mca_btl_openib.so (unknown line)
btl_openib_component_init at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/openmpi/mca_btl_openib.so (unknown line)
mca_btl_base_select at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libopen-pal.so.40 (unknown line)
mca_bml_r2_component_init at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/openmpi/mca_bml_r2.so (unknown line)
mca_bml_base_init at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libmpi.so (unknown line)
ompi_mpi_init at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libmpi.so (unknown line)
PMPI_Init_thread at /opt/apps/openmpi/4.1.2/gcc/11.2.0/lib/libmpi.so (unknown line)
MPI_Init_thread at /data/homezvol0/ernestob/.julia/packages/MPI/z2owj/src/api/generated_api.jl:1899 [inlined]
_init_thread at /data/homezvol0/ernestob/.julia/packages/MPI/z2owj/src/environment.jl:207 [inlined]
#Init#6 at /data/homezvol0/ernestob/.julia/packages/MPI/z2owj/src/environment.jl:127
Init at /data/homezvol0/ernestob/.julia/packages/MPI/z2owj/src/environment.jl:114
jfptr_Init_3230 at /data/homezvol0/ernestob/.julia/compiled/v1.10/MPI/nO0XF_T1lKE.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:617
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:985

Both -mca and --mca will work (this is an old, deadly serious schism in Unix/Linux; OpenMPI has chosen to please both schools).

That aside, since you sometimes get that error message about OpenFabrics, it seems something is wrong with the setup somewhere, e.g. some node has an inoperable IB (perhaps a process that should be gone is holding on to it, or something similar), and your job occasionally lands on such a node, e.g. the hpc3-14-13 node in your error message.
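One cheap way to confirm the bad-node theory is to print the host from each process before MPI_Init() is ever called, so the node is visible even when Init itself segfaults (a sketch to paste at the very top of the script):

# Print host and PID before MPI.Init(), so the node is identified even if Init crashes
println("starting on ", gethostname(), ", pid ", getpid())
flush(stdout)

using MPI
MPI.Init()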

I also see there is a warning at Running UCX — OpenUCX documentation about disabling a module called uct (I don’t know what that is, and I have never used UCX) because it can mess things up, but I suppose the cluster admins have a custom-built OpenMPI in which it has already been disabled.

One more thing, which I don’t know is a problem: when you set environment variables in Slurm scripts, they are not automatically exported to the remote nodes (only OMPI_* variables, and of course SLURM_* variables, are). It could be that the UCX_* and/or JULIA_* environment variables are needed over there. There is an option --tune filename which can be used to transfer environment variables and other parameters; see e.g. the mpirun(1) man page (version 4.1.6).
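To verify from the Julia side which of those variables actually reach each rank, something like this would do (a sketch; the variable list is just the one from your sbatch script):

# env_check.jl -- hypothetical helper: print selected environment variables from every rank
using MPI
MPI.Init()
rank = MPI.Comm_rank(MPI.COMM_WORLD)
for var in ("UCX_TLS", "UCX_NET_DEVICES", "OMP_NUM_THREADS", "JULIA_CPU_TARGET")
    println("rank ", rank, ": ", var, " = ", get(ENV, var, "<not set>"))
end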