I want to build a Docker container that will run on Amazon EC2 M6a and M7a instances. These have Zen 3 / Zen 4 based AMD EPYC processors.
The container should include some precompiled Julia packages. For the precompile cache to be reusable on the cloud machines, I need to specify a CPU target.
I came up with the following and want to check if I understand it correctly:
JULIA_CPU_TARGET="generic;znver3,clone_all;znver4,base(1)"
When julia looks for a precompile cache, it will roughly do the following:
- It looks at the first target, generic. Except for very exotic ISAs, this will match. However, before picking it, it will also look at the other targets.
- The next target is znver3. This would match M6a (=Zen3) and M7a (=Zen4), since Zen4 is a superset of Zen3. It would not be a match for most Intel CPUs.
- The last target is znver4. This would be a match only for M7a (=Zen4).
So M6a (=Zen3) would pick up the znver3 target while M7a (=Zen4) would pick up the znver4 target.
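To make the rule I have in mind concrete, here is a toy Python model of it. This is only a sketch of my reading, not Julia's actual loader: `pick_target` and `TARGET_FEATURES` are hypothetical names, the feature sets are heavily abbreviated placeholders, and the "last matching target wins" rule is my assumption. The real loader may rank matches by other criteria.

```python
# Toy model of the selection rule described above; NOT Julia's real loader.
# Feature sets are abbreviated placeholders, not the full LLVM feature lists.
TARGET_FEATURES = {
    "generic": set(),
    "znver3": {"avx2"},
    "znver4": {"avx2", "avx512f"},
}

def pick_target(targets, host_features):
    """Return the last target whose features the host fully supports (assumed rule)."""
    chosen = None
    for name in targets:
        # A target "matches" when its feature set is a subset of the host's.
        if TARGET_FEATURES[name] <= host_features:
            chosen = name
    return chosen

targets = ["generic", "znver3", "znver4"]
m6a = {"avx2"}             # Zen 3: assumed feature subset
m7a = {"avx2", "avx512f"}  # Zen 4: assumed feature subset

print(pick_target(targets, m6a))  # znver3
print(pick_target(targets, m7a))  # znver4
```

Under these assumptions a host with neither feature falls back to generic.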
Now for the flags clone_all and base(1).
znver3,clone_all means that every single function in the precompile cache will be duplicated with a version specialized for the znver3 instruction set.
znver4,base(1) means that for znver4 we don’t clone every single function. Instead, LLVM uses a heuristic to either decide to clone a function or fall back to the znver3 version. Why znver3? Because of base(1), where 1 is the zero-based index of znver3 in the list above.
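The zero-based indexing I am assuming for base(1) can be spelled out with a small Python sketch. `parse_cpu_target` is a hypothetical helper that handles only the syntax used in this post (names, clone_all, base(n)); it is not how Julia itself parses the variable.

```python
import re

def parse_cpu_target(spec):
    """Split a JULIA_CPU_TARGET-style string into one dict per target."""
    parsed = []
    for entry in spec.split(";"):
        # The first comma-separated field is the CPU name, the rest are flags.
        name, *options = entry.split(",")
        base = None
        for opt in options:
            m = re.fullmatch(r"base\((\d+)\)", opt)
            if m:
                base = int(m.group(1))
        parsed.append(
            {"name": name, "clone_all": "clone_all" in options, "base": base}
        )
    return parsed

parsed = parse_cpu_target("generic;znver3,clone_all;znver4,base(1)")
base_index = parsed[2]["base"]        # base(1) on the znver4 entry
print(parsed[base_index]["name"])     # znver3
```

With generic at index 0, znver3 at index 1, and znver4 at index 2, base(1) points at znver3, which is the reading I describe above.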
Do I get this correctly so far?
Here is another example, that I think would not do what I want:
JULIA_CPU_TARGET="generic;znver4,clone_all;znver3,clone_all"
Here the Zen4 CPU would also pick up the znver3 target, because it is the rightmost match. Do I get this correctly as well?
Also, how can I debug this? Can I ask julia, LLVM, or some other tool which target was chosen when loading a precompile cache, or why a certain target is not a match?