How to submit distributed-memory jobs to a cluster?

I have a run_jobs.jl file that supports distributed-memory parallelism. What should the beginning of run_jobs.jl look like if I want to run it on a cluster (Univa Grid Engine) via qsub?

I can use ClusterManagers.jl to run my code interactively with:

# addprocs_sge will request 16 cores (possibly on different nodes)
using ClusterManagers, Distributed
ClusterManagers.addprocs_sge(16; qsub_flags=`-l h_rt=24:00:00,h_data=4G,arch=intel-gold-61\*`)

# run my actual code that supports distributed memory
...

But this job will run for a long time, and I don’t want to wait until it finishes.

If I qsub a single-core job to run the script above, I get this error:

Base.IOError("could not spawn `qsub -N julia-2229 -wd /u/home/b/biona001 -terse -j y -R y -t 1-4 -V -l 'h_rt=24:00:00,h_data=4G,arch=intel-gold-61*'`: no such file or directory (ENOENT)", -2)
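For reference, the single-core submission script I used looked roughly like this (the module-loading line is a placeholder for whatever my site uses, not my exact setup):

```shell
#!/bin/bash
#$ -cwd                          # run from the current working directory
#$ -j y                          # merge stdout and stderr
#$ -l h_rt=24:00:00,h_data=4G    # same resource requests as in the interactive run

module load julia                # placeholder for the site's Julia module

# run the driver script; it then tries to invoke qsub itself via addprocs_sge
julia run_jobs.jl
```

My guess is that qsub is not on the PATH of the compute node this job lands on (ENOENT = file not found), so addprocs_sge cannot spawn it from inside the batch job.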

I also tried submitting a distributed-memory job (specifying -pe dc* 16 in my shell script), with run_jobs.jl beginning simply with using Distributed; addprocs(16), but the job crashes and produces core dumps. If I instead request a shared-memory allocation (-pe shared 16), the job runs successfully on a single node with 16 cores, but this massively increases queue time.
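The shared-memory variant that does run looks roughly like this (again, the module name is a placeholder):

```shell
#!/bin/bash
#$ -cwd                          # run from the current working directory
#$ -j y                          # merge stdout and stderr
#$ -l h_rt=24:00:00,h_data=4G    # per-slot resource requests
#$ -pe shared 16                 # 16 cores, all on a single node

module load julia                # placeholder for the site's Julia module

# run_jobs.jl begins with: using Distributed; addprocs(16)
julia run_jobs.jl
```

This works, but because all 16 cores must come from one node, the job can sit in the queue for a long time, which is why I'd prefer the multi-node (-pe dc*) route.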

Any tips are appreciated. Thanks!