I am trying to build OpenBLAS for Power10. However, OpenBLAS seems to default to a Power8 build, which is ~2x slower than the Power10 implementation, because the Power10 kernels have MMA support for BLAS level 3 routines like DGEMM.
I believe I have identified a prime suspect in julia/Make.inc, which forces Power8 without checking for newer processors:
# If we are running on powerpc64le or ppc64le, set certain options automatically
ifneq (,$(filter $(ARCH), powerpc64le ppc64le))
JCFLAGS += -fsigned-char
OPENBLAS_DYNAMIC_ARCH:=0
OPENBLAS_TARGET_ARCH:=POWER8
BINARY:=64
# GCC doesn't do -march= on ppc64le
MARCH=
endif
I have tried overriding this by setting OPENBLAS_TARGET_ARCH=POWER10 in Make.user, as well as passing it on the make command line, without luck.
The OpenBLAS log (at juliabuild_dir/usr/logs/OpenBLAS/OpenBLAS.log.gz), generated with the override above in place, still says it is using POWER8:
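For reference, here is a minimal sketch of how plain GNU make resolves this kind of override, using a throwaway Makefile rather than Julia's build system. The temporary directory and the small Makefile are made up for illustration, and the sketch assumes Make.user is read before the `:=` assignment. In that situation a later `:=` replaces the value from the included file, while a value given on the command line takes precedence over both.

```shell
# Throwaway directory for the demo (hypothetical layout, not Julia's tree).
demo_dir=$(mktemp -d)

# Stand-in for Make.user: the value we would like to use.
cat > "$demo_dir/Make.user" <<'EOF'
OPENBLAS_TARGET_ARCH=POWER10
EOF

# Stand-in for Make.inc: include Make.user, then force POWER8 with :=.
# $(info ...) prints the final value while the Makefile is being parsed.
cat > "$demo_dir/Makefile" <<'EOF'
-include Make.user
OPENBLAS_TARGET_ARCH:=POWER8
$(info OPENBLAS_TARGET_ARCH=$(OPENBLAS_TARGET_ARCH))
all: ;
EOF

# Value from Make.user is replaced by the later := assignment.
make -s -C "$demo_dir"

# A command-line variable wins over assignments inside the Makefile.
make -s -C "$demo_dir" OPENBLAS_TARGET_ARCH=POWER10
```

This only shows make's own precedence rules; it says nothing about where the value that ends up in the OpenBLAS log actually comes from.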
---> flags+=(TARGET=POWER8)
The following snippet also appears in the log file (from the OpenBLAS source? I can’t find it there). It looks like another problem, since the last elif sets TARGET=POWER8 without checking for newer processors:
# On Intel and most aarch64 architectures, engage DYNAMIC_ARCH.
# When using DYNAMIC_ARCH the TARGET specifies the minimum architecture requirement.
if [[ ${proc_family} == intel ]]; then
    flags+=(DYNAMIC_ARCH=1)
    # Before OpenBLAS 0.3.13, there appears to be a miscompilation bug with `clang` on setting `TARGET=GENERIC`
    # As that is the case, we're just going to be safe and only use `TARGET=GENERIC` on 0.3.13+
    if [ ${version_patch} -gt 12 ]; then
        flags+=(TARGET=GENERIC)
    else
        flags+=(TARGET=)
    fi
elif [[ ${target} == aarch64-* ]] && [[ ${bb_full_target} != *-libgfortran3* ]]; then
    flags+=(TARGET=ARMV8 DYNAMIC_ARCH=1)
# Otherwise, engage a specific target
elif [[ ${bb_full_target} == aarch64*-libgfortran3* ]]; then
    # Old GCC versions, with libgfortran3, can't build for newer
    # microarchitectures, let's just use the generic one
    flags+=(TARGET=ARMV8)
elif [[ ${target} == arm-* ]]; then
    flags+=(TARGET=ARMV7)
elif [[ ${target} == powerpc64le-* ]]; then
    flags+=(TARGET=POWER8)
fi
Looking at the compile command for one of the BLAS routines in the log, we can see that -mcpu=power8 -mtune=power8 were used:
cc -O2 -DMAX_STACK_ALLOC=2048 -fopenmp -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DUSE_OPENMP -DNO_WARMUP -DMAX_CPU_NUMBER=512 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.20\" -mcpu=power8 -mtune=power8 -mvsx -fno-fast-math -DUSE_OPENMP -fopenmp -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=saxpy -DASMFNAME=saxpy_ -DNAME=saxpy_ -DCNAME=saxpy -DCHAR_NAME=\"saxpy_\" -DCHAR_CNAME=\"saxpy\" -DNO_AFFINITY -I.. -I. -UDOUBLE -UCOMPLEX -c axpy.c -o saxpy.o
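To check every compile line in the log rather than a single routine, something like the following sketch could be used. The function name `count_mcpu_flags` is made up for illustration; it only relies on standard `gzip`, `grep`, `sort` and `uniq`, and assumes the log is either plain text or gzip-compressed.

```shell
# Count how often each -mcpu= value appears in a build log.
# Accepts a plain-text log or a gzip-compressed one (*.gz).
count_mcpu_flags() {
    log="$1"
    case "$log" in
        *.gz) gzip -dc "$log" ;;   # decompress to stdout
        *)    cat "$log" ;;
    esac |
        grep -o -e '-mcpu=[A-Za-z0-9_.+-]*' |   # keep only the -mcpu=... tokens
        sort | uniq -c                          # one line per distinct value, with a count
}
```

Called as `count_mcpu_flags usr/logs/OpenBLAS/OpenBLAS.log.gz` from the build directory, it prints one line per distinct -mcpu= value, so a mix of power8 and power10 objects would show up immediately.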
Another option that looked promising was setting MCPU=power10 in Make.user, because of this snippet in julia/Make.inc:
# Set MCPU-specific flags
ifneq ($(MCPU),)
CC += -mcpu=$(MCPU)
CXX += -mcpu=$(MCPU)
FC += -mcpu=$(MCPU)
JULIA_CPU_TARGET ?= $(MCPU)
endif
However, this caused an error in corecompiler.jl at build time: JULIA_CPU_TARGET, which defaults to MCPU, is used in julia/sysimage.mk, and power10 isn’t recognized there as a valid CPU option. Setting MARCH=power10 isn’t possible either, since GCC doesn’t accept -march= on ppc64le (Make.inc clears MARCH for that reason).
Lastly, I found one patch that may affect OpenBLAS’ makefile, but I have no idea how to use it or whether it took effect: julia/deps/patches/openblas-ofast-power.patch
Am I missing anything, or is this doomed to fail?