I am running a simple script on a cluster. I simply run many instances of the same script.
I get the following error:
┌ Warning: The call to compilecache failed to create a usable precompiled cache file for FileIO [5789e2e9-d7fb-5bc7-8068-2c6fae9b9549]
│ exception = ArgumentError: Invalid checksum in cache file /home/labs/orenraz/roiho/.julia/compiled/v1.9/FileIO/6iKRU_gncnE.so.
└ @ Base loading.jl:1818
Followed by:
└ @ Base loading.jl:1793
┌ Warning: Module FileIO with build ID ffffffff-ffff-ffff-001c-a130a3b6bf6f is missing from the cache.
│ This may mean FileIO [5789e2e9-d7fb-5bc7-8068-2c6fae9b9549] does not support precompilation but is imported by a module that does.
My script was working on Julia 1.7.2, but I just upgraded to 1.9.3, and it stopped working. Does anyone know why?
Julia 1.9 caches native compiled code (package images); Julia 1.7.2 did not. However, this is only a warning, so I'm assuming your code is still working? If not, please share more details.
I think you might be right and that the code may still be running well. I am checking this now and will update.
@carstenbauer It seems that every time I open an ssh session with the cluster, I need to precompile my project. Is that the case in Julia 1.9? Is there a way to avoid it?
I have had a lot of difficulty getting things to work on a cluster as well. I think at this point I just compile on one node first and hope for the best.
Getting the local (login node) cache to be reused seems impossible now.
where you may need to replace the different architectures as appropriate. Please do this before you start Julia. This will lead to the compiled images being compatible with different CPU instruction set architectures.
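The command this reply refers to is missing from the thread as captured here. One plausible reading, offered only as an assumption, is that it set the JULIA_CPU_TARGET environment variable, which takes a semicolon-separated list of targets like the one described in the documentation sentence quoted later in the thread. A minimal sketch of that idea follows; build_cpu_target is a hypothetical helper and the two CPU names are placeholders, not a recommendation for any particular cluster.

```shell
#!/bin/sh
# build_cpu_target is a hypothetical helper, not part of Julia.
# It joins CPU names (taken from `julia -C help`) into a string of the form
#   generic;<cpu1>,clone_all;<cpu2>,clone_all
build_cpu_target() {
    target="generic"                      # fallback target listed first
    for cpu in "$@"; do
        target="${target};${cpu},clone_all"
    done
    printf '%s\n' "$target"
}

# Placeholder CPU names: replace with the ones found on your own nodes.
JULIA_CPU_TARGET="$(build_cpu_target icelake-server skylake-avx512)"
export JULIA_CPU_TARGET                   # must be set before Julia starts
echo "$JULIA_CPU_TARGET"
```

With the placeholder names above, the script prints generic;icelake-server,clone_all;skylake-avx512,clone_all. Putting the export in the job script as well as the login shell keeps the setting identical wherever Julia is started.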
Thanks very much!
I am looking at the documentation, but I am not sure how to find out what the appropriate settings are for my system. Do you know if there is some guidance somewhere for this?
I think you need to check what CPU is being used on the cluster nodes, and set the architecture accordingly. The list of accepted architectures may be found using julia -C help.
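When the exact processor name is hard to match, one coarse alternative is to pick one of the generic levels x86-64-v2, x86-64-v3 or x86-64-v4, which are among the names accepted by julia -C help. The sketch below assumes a Linux node and uses the flag names found in /proc/cpuinfo; x86_64_level and has_all are hypothetical helpers written for this illustration, and the flag lists reflect my understanding of the level definitions rather than anything stated in this thread.

```shell
#!/bin/sh
# has_all FLAGS NAME...: succeed only if every NAME occurs as a whole word
# in the space-separated FLAGS string. (Hypothetical helper.)
has_all() {
    flags=" $1 "; shift
    for f in "$@"; do
        case "$flags" in
            *" $f "*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}

# x86_64_level FLAGS: print the highest x86-64 level whose required flags
# (Linux /proc/cpuinfo spelling) are all present. (Hypothetical helper.)
x86_64_level() {
    has_all "$1" cx16 lahf_lm popcnt pni sse4_1 sse4_2 ssse3 || { echo "x86-64"; return; }
    has_all "$1" avx avx2 bmi1 bmi2 f16c fma abm movbe xsave || { echo "x86-64-v2"; return; }
    has_all "$1" avx512f avx512bw avx512cd avx512dq avx512vl || { echo "x86-64-v3"; return; }
    echo "x86-64-v4"
}

# On Linux, classify the first CPU listed in /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    cpu_flags="$(sed -n '/^flags/{s/^[^:]*: *//p;q;}' /proc/cpuinfo)"
    [ -n "$cpu_flags" ] && x86_64_level "$cpu_flags"
fi
```

Julia also exposes the LLVM name of the CPU it is running on as Sys.CPU_NAME, which can be compared directly against the julia -C help list.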
Thank you.
Though I do not know what to do with the information I got:
julia> cpuinfo()
  Cpu Property       Value
  ────────────────── ──────────────────────────────────────────────────────────
  Brand              Intel(R) Xeon(R) Gold 5320 CPU @ 2.20GHz
  Vendor             :Intel
  Architecture       :UnknownIntel
  Model              Family: 0x06, Model: 0x6a, Stepping: 0x06, Type: 0x00
  Cores              26 physical cores, 52 logical cores (on executing CPU)
                     Hyperthreading hardware capability detected
  Clock Frequencies  2200 / 3400 MHz (base/max), 100 MHz bus
  Data Cache         Level 1:3 : (48, 1280, 39936) kbytes
                     64 byte cache line size
  Address Size       57 bits virtual, 46 bits physical
  SIMD               512 bit = 64 byte max. SIMD vector size
  Time Stamp Counter TSC is accessible via `rdtsc`
                     TSC runs at constant rate (invariant from clock frequency)
  Perf. Monitoring   Performance Monitoring Counters (PMC) revision 5
                     Available hardware counters per logical core:
                     4 fixed-function counters of 48 bit width
                     8 general-purpose counters of 48 bit width
  Hypervisor         No
I don't find anything that matches this here:
$ julia -C help
Available CPUs for this target:
alderlake - Select the alderlake processor.
amdfam10 - Select the amdfam10 processor.
athlon - Select the athlon processor.
athlon-4 - Select the athlon-4 processor.
athlon-fx - Select the athlon-fx processor.
athlon-mp - Select the athlon-mp processor.
athlon-tbird - Select the athlon-tbird processor.
athlon-xp - Select the athlon-xp processor.
athlon64 - Select the athlon64 processor.
athlon64-sse3 - Select the athlon64-sse3 processor.
atom - Select the atom processor.
barcelona - Select the barcelona processor.
bdver1 - Select the bdver1 processor.
bdver2 - Select the bdver2 processor.
bdver3 - Select the bdver3 processor.
bdver4 - Select the bdver4 processor.
bonnell - Select the bonnell processor.
broadwell - Select the broadwell processor.
btver1 - Select the btver1 processor.
btver2 - Select the btver2 processor.
c3 - Select the c3 processor.
c3-2 - Select the c3-2 processor.
cannonlake - Select the cannonlake processor.
cascadelake - Select the cascadelake processor.
cooperlake - Select the cooperlake processor.
core-avx-i - Select the core-avx-i processor.
core-avx2 - Select the core-avx2 processor.
core2 - Select the core2 processor.
corei7 - Select the corei7 processor.
corei7-avx - Select the corei7-avx processor.
generic - Select the generic processor.
geode - Select the geode processor.
goldmont - Select the goldmont processor.
goldmont-plus - Select the goldmont-plus processor.
haswell - Select the haswell processor.
i386 - Select the i386 processor.
i486 - Select the i486 processor.
i586 - Select the i586 processor.
i686 - Select the i686 processor.
icelake-client - Select the icelake-client processor.
icelake-server - Select the icelake-server processor.
ivybridge - Select the ivybridge processor.
k6 - Select the k6 processor.
k6-2 - Select the k6-2 processor.
k6-3 - Select the k6-3 processor.
k8 - Select the k8 processor.
k8-sse3 - Select the k8-sse3 processor.
knl - Select the knl processor.
knm - Select the knm processor.
lakemont - Select the lakemont processor.
nehalem - Select the nehalem processor.
nocona - Select the nocona processor.
opteron - Select the opteron processor.
opteron-sse3 - Select the opteron-sse3 processor.
penryn - Select the penryn processor.
pentium - Select the pentium processor.
pentium-m - Select the pentium-m processor.
pentium-mmx - Select the pentium-mmx processor.
pentium2 - Select the pentium2 processor.
pentium3 - Select the pentium3 processor.
pentium3m - Select the pentium3m processor.
pentium4 - Select the pentium4 processor.
pentium4m - Select the pentium4m processor.
pentiumpro - Select the pentiumpro processor.
prescott - Select the prescott processor.
rocketlake - Select the rocketlake processor.
sandybridge - Select the sandybridge processor.
sapphirerapids - Select the sapphirerapids processor.
silvermont - Select the silvermont processor.
skx - Select the skx processor.
skylake - Select the skylake processor.
skylake-avx512 - Select the skylake-avx512 processor.
slm - Select the slm processor.
tigerlake - Select the tigerlake processor.
tremont - Select the tremont processor.
westmere - Select the westmere processor.
winchip-c6 - Select the winchip-c6 processor.
winchip2 - Select the winchip2 processor.
x86-64 - Select the x86-64 processor.
x86-64-v2 - Select the x86-64-v2 processor.
x86-64-v3 - Select the x86-64-v3 processor.
x86-64-v4 - Select the x86-64-v4 processor.
yonah - Select the yonah processor.
znver1 - Select the znver1 processor.
znver2 - Select the znver2 processor.
znver3 - Select the znver3 processor.
Available features for this target:
16bit-mode - 16-bit mode (i8086).
32bit-mode - 32-bit mode (80386).
3dnow - Enable 3DNow! instructions.
3dnowa - Enable 3DNow! Athlon instructions.
64bit - Support 64-bit instructions.
64bit-mode - 64-bit mode (x86_64).
adx - Support ADX instructions.
aes - Enable AES instructions.
amx-bf16 - Support AMX-BF16 instructions.
amx-int8 - Support AMX-INT8 instructions.
amx-tile - Support AMX-TILE instructions.
avx - Enable AVX instructions.
avx2 - Enable AVX2 instructions.
avx512bf16 - Support bfloat16 floating point.
avx512bitalg - Enable AVX-512 Bit Algorithms.
avx512bw - Enable AVX-512 Byte and Word Instructions.
avx512cd - Enable AVX-512 Conflict Detection Instructions.
avx512dq - Enable AVX-512 Doubleword and Quadword Instructions.
avx512er - Enable AVX-512 Exponential and Reciprocal Instructions.
avx512f - Enable AVX-512 instructions.
avx512fp16 - Support 16-bit floating point.
avx512ifma - Enable AVX-512 Integer Fused Multiple-Add.
avx512pf - Enable AVX-512 PreFetch Instructions.
avx512vbmi - Enable AVX-512 Vector Byte Manipulation Instructions.
avx512vbmi2 - Enable AVX-512 further Vector Byte Manipulation Instructions.
avx512vl - Enable AVX-512 Vector Length eXtensions.
avx512vnni - Enable AVX-512 Vector Neural Network Instructions.
avx512vp2intersect - Enable AVX-512 vp2intersect.
avx512vpopcntdq - Enable AVX-512 Population Count Instructions.
avxvnni - Support AVX_VNNI encoding.
bmi - Support BMI instructions.
bmi2 - Support BMI2 instructions.
branchfusion - CMP/TEST can be fused with conditional branches.
cldemote - Enable Cache Demote.
clflushopt - Flush A Cache Line Optimized.
clwb - Cache Line Write Back.
clzero - Enable Cache Line Zero.
cmov - Enable conditional move instructions.
crc32 - Enable SSE 4.2 CRC32 instruction.
cx16 - 64-bit with cmpxchg16b.
cx8 - Support CMPXCHG8B instructions.
enqcmd - Has ENQCMD instructions.
ermsb - REP MOVS/STOS are fast.
f16c - Support 16-bit floating point conversion instructions.
false-deps-lzcnt-tzcnt - LZCNT/TZCNT have a false dependency on dest register.
false-deps-popcnt - POPCNT has a false dependency on dest register.
fast-11bytenop - Target can quickly decode up to 11 byte NOPs.
fast-15bytenop - Target can quickly decode up to 15 byte NOPs.
fast-7bytenop - Target can quickly decode up to 7 byte NOPs.
fast-bextr - Indicates that the BEXTR instruction is implemented as a single uop with good throughput.
fast-gather - Indicates if gather is reasonably fast.
fast-hops - Prefer horizontal vector math instructions (haddp, phsub, etc.) over normal vector instructions with shuffles.
fast-lzcnt - LZCNT instructions are as fast as most simple integer ops.
fast-movbe - Prefer a movbe over a single-use load + bswap / single-use bswap + store.
fast-scalar-fsqrt - Scalar SQRT is fast (disable Newton-Raphson).
fast-scalar-shift-masks - Prefer a left/right scalar logical shift pair over a shift+and pair.
fast-shld-rotate - SHLD can be used as a faster rotate.
fast-variable-crosslane-shuffle - Cross-lane shuffles with variable masks are fast.
fast-variable-perlane-shuffle - Per-lane shuffles with variable masks are fast.
fast-vector-fsqrt - Vector SQRT is fast (disable Newton-Raphson).
fast-vector-shift-masks - Prefer a left/right vector logical shift pair over a shift+and pair.
fma - Enable three-operand fused multiple-add.
fma4 - Enable four-operand fused multiple-add.
fsgsbase - Support FS/GS Base instructions.
fsrm - REP MOVSB of short lengths is faster.
fxsr - Support fxsave/fxrestore instructions.
gfni - Enable Galois Field Arithmetic Instructions.
hreset - Has hreset instruction.
idivl-to-divb - Use 8-bit divide for positive values less than 256.
idivq-to-divl - Use 32-bit divide for positive values less than 2^32.
invpcid - Invalidate Process-Context Identifier.
kl - Support Key Locker kl Instructions.
lea-sp - Use LEA for adjusting the stack pointer.
lea-uses-ag - LEA instruction needs inputs at AG stage.
lvi-cfi - Prevent indirect calls/branches from using a memory operand, and precede all indirect calls/branches from a register with an LFENCE instruction to serialize control flow. Also decompose RET instructions into a POP+LFENCE+JMP sequence..
lvi-load-hardening - Insert LFENCE instructions to prevent data speculatively injected into loads from being used maliciously..
lwp - Enable LWP instructions.
lzcnt - Support LZCNT instruction.
macrofusion - Various instructions can be fused with conditional branches.
mmx - Enable MMX instructions.
movbe - Support MOVBE instruction.
movdir64b - Support movdir64b instruction.
movdiri - Support movdiri instruction.
mwaitx - Enable MONITORX/MWAITX timer functionality.
nopl - Enable NOPL instruction.
pad-short-functions - Pad short functions.
pclmul - Enable packed carry-less multiplication instructions.
pconfig - platform configuration instruction.
pku - Enable protection keys.
popcnt - Support POPCNT instruction.
prefer-128-bit - Prefer 128-bit AVX instructions.
prefer-256-bit - Prefer 256-bit AVX instructions.
prefer-mask-registers - Prefer AVX512 mask registers over PTEST/MOVMSK.
prefetchwt1 - Prefetch with Intent to Write and T1 Hint.
prfchw - Support PRFCHW instructions.
ptwrite - Support ptwrite instruction.
rdpid - Support RDPID instructions.
rdrnd - Support RDRAND instruction.
rdseed - Support RDSEED instruction.
retpoline - Remove speculation of indirect branches from the generated code, either by avoiding them entirely or lowering them with a speculation blocking construct.
retpoline-external-thunk - When lowering an indirect call or branch using a `retpoline`, rely on the specified user provided thunk rather than emitting one ourselves. Only has effect when combined with some other retpoline feature.
retpoline-indirect-branches - Remove speculation of indirect branches from the generated code.
retpoline-indirect-calls - Remove speculation of indirect calls from the generated code.
rtm - Support RTM instructions.
sahf - Support LAHF and SAHF instructions in 64-bit mode.
serialize - Has serialize instruction.
seses - Prevent speculative execution side channel timing attacks by inserting a speculation barrier before memory reads, memory writes, and conditional branches. Implies LVI Control Flow integrity..
sgx - Enable Software Guard Extensions.
sha - Enable SHA instructions.
shstk - Support CET Shadow-Stack instructions.
slow-3ops-lea - LEA instruction with 3 ops or certain registers is slow.
slow-incdec - INC and DEC instructions are slower than ADD and SUB.
slow-lea - LEA instruction with certain arguments is slow.
slow-pmaddwd - PMADDWD is slower than PMULLD.
slow-pmulld - PMULLD instruction is slow.
slow-shld - SHLD instruction is slow.
slow-two-mem-ops - Two memory operand instructions are slow.
slow-unaligned-mem-16 - Slow unaligned 16-byte memory access.
slow-unaligned-mem-32 - Slow unaligned 32-byte memory access.
soft-float - Use software floating point features.
sse - Enable SSE instructions.
sse-unaligned-mem - Allow unaligned memory operands with SSE instructions.
sse2 - Enable SSE2 instructions.
sse3 - Enable SSE3 instructions.
sse4.1 - Enable SSE 4.1 instructions.
sse4.2 - Enable SSE 4.2 instructions.
sse4a - Support SSE 4a instructions.
ssse3 - Enable SSSE3 instructions.
tagged-globals - Use an instruction sequence for taking the address of a global that allows a memory tag in the upper address bits..
tbm - Enable TBM instructions.
tsxldtrk - Support TSXLDTRK instructions.
uintr - Has UINTR Instructions.
use-aa - Use alias analysis during codegen.
use-glm-div-sqrt-costs - Use Goldmont specific floating point div/sqrt costs.
use-slm-arith-costs - Use Silvermont specific arithmetic costs.
vaes - Promote selected AES instructions to AVX512/AVX registers.
vpclmulqdq - Enable vpclmulqdq instructions.
vzeroupper - Should insert vzeroupper instructions.
waitpkg - Wait and pause enhancements.
wbnoinvd - Write Back No Invalidate.
widekl - Support Key Locker wide Instructions.
x87 - Enable X87 float instructions.
xop - Enable XOP instructions.
xsave - Support xsave instructions.
xsavec - Support xsavec instructions.
xsaveopt - Support xsaveopt instructions.
xsaves - Support xsaves instructions.
From the documentation, it seems like this is the generic target (they write "This creates a system image with three separate targets; one for a generic x86_64 processor").
Does this mean that I need to run ?
I think the first is better, as the generic is a fallback. Can you log in to the compute nodes and check the CPU details? There may also be documentation for the cluster.
If your cluster runs the Slurm workload manager, you should be able to log into a compute node using srun --pty /bin/bash. From there you can then run lscpu or cpuinfo() from Julia to gather the information.
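To make the lscpu step concrete, here is a small sketch that pulls just the processor model out of lscpu output, which is usually the line needed to pick a name from julia -C help. cpu_model is a hypothetical helper, and the sketch assumes lscpu prints an English "Model name:" field, which may not hold on every system.

```shell
#!/bin/sh
# cpu_model is a hypothetical helper: read `lscpu` output on stdin and print
# the value of the first "Model name:" field, trimmed of surrounding spaces.
# Lines such as "BIOS Model name:" are deliberately not matched.
cpu_model() {
    sed -n 's/^[[:space:]]*Model name:[[:space:]]*//p' \
        | sed 's/[[:space:]]*$//' \
        | head -n 1
}

# Run this on a compute node, for example from inside `srun --pty /bin/bash`.
# LC_ALL=C asks for untranslated field names.
if command -v lscpu >/dev/null 2>&1; then
    LC_ALL=C lscpu | cpu_model
fi
```

If the login node and the compute nodes report different models, that difference is a likely reason for a cache built on one not being reused on the other.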