Warning: Linking two modules of different target triples: 'bcloader' ... 'start'

I started getting the following warnings:

warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-unknown-linux-gnu' whereas 'start' is 'x86_64-linux-gnu'

warning: Linking two modules of different target triples: 'bcloader' is 'x86_64-unknown-linux-gnu' whereas 'start' is 'x86_64-linux-gnu'

┌ Warning: Using fallback BLAS replacements, performance may be degraded
└ @ Enzyme.Compiler ~/.julia/packages/GPUCompiler/qdoh1/src/utils.jl:50

when running some SciML stuff. Everything still seems to work as it should, so it is not really a problem, but I got curious and tried to find where it came from.

I managed to reduce it to this, which seems to produce the warnings on fresh installations of both Julia 1.8.5 and 1.9.0-beta3:

using Lux, SciMLSensitivity, DifferentialEquations
using Optimization, OptimizationOptimisers

using ComponentArrays
using Random

rng = Random.default_rng()

# Training data: a sine/cosine trajectory sampled on [0, 1]
ts = collect(0:0.01:1)
xs = [sin.(ts)'; cos.(ts)']

# In-place ODE right-hand side driven by the Lux network
function df!(dx, x, p, t)
    dx .= Lux.apply(model, x, p.nnps, nnst)[1]
end

model = Lux.Chain(
    Lux.Dense(2, 32),
    Lux.Dense(32, 2),
)
nnps, nnst = Lux.setup(rng, model)

ps = ComponentVector{Float64}(; nnps)
prob_f = ODEProblem(df!, xs[:, 1], (ts[begin], ts[end]), ps)

# Loss: solve the ODE with the current parameters and compare to the data
function loss(ps, _)
    _prob = remake(prob_f, u0 = xs[:, 1], tspan = (ts[1], ts[end]), p = ps)
    xhat = Array(solve(_prob, saveat = ts))
    sum(abs2, xs .- xhat)
end

optf = Optimization.OptimizationFunction(loss, Optimization.AutoZygote())
optprob1 = Optimization.OptimizationProblem(optf, ps)
res1 = Optimization.solve(optprob1, ADAM(0.01), maxiters = 5)

I have tried to reduce it further, but several different things seem to make the warning go away and I haven't really managed to pin down why.

  • If I make the Lux network a single layer, no warning.
  • If I use AutoForwardDiff, no warning.
  • If I skip DifferentialEquations and replace the loss function with the code below, no warning.
function loss(ps, _)
    xhat = Lux.apply(model, xs, ps.nnps, nnst)[1]
    sum(abs2, xs .- xhat)
end

I was thinking it was Enzyme since it is mentioned in part of the warning, but I'm unsure exactly how Enzyme is tied into this, since I don't use it directly and specifically use Zygote for AD.

I found some other mentions of similar warnings:

  • An Optim GitHub issue with exactly the same warning, where the reporter is told it might be Enzyme.
  • An Enzyme GitHub issue linked from the Optim issue. It also looks like it should have been fixed about six months ago.
  • A Discourse thread with exactly the same set of warnings, though the discussion there is about an error that seems unrelated to it…

I have already spent more time trying to figure this out than I really have at the moment, so I thought I would post it here for now in case anyone knows anything or wants to take a look at it.

Some of the SciML stuff (including ODEProblem/SciMLSensitivity) tries to use Enzyme internally by default for performance reasons.

In any case, you can ignore the warning (though the BLAS note may be relevant for performance, which we have on our radar to handle efficiently soon).


Hi, any idea where the fallback BLAS replacements are originating from? I’m getting a similar warning from GPUCompiler and would like to investigate.

Cross-posting my answer in case someone comes across this discussion but not the other one (Solving UDE segfaults or runs with poor performance - #6 by wsmoses).

Warning: Using fallback BLAS replacements, performance may be degraded

In essence, this says we have not finished implementing our internal BLAS and have a fallback implementation which exists and is correct, but is single-core and may be substantially slower. Ironically, we are presently on the verge of merging a much faster one for most cases of dot product, matrix multiply, and matrix-vector multiply (Gemv by ZuseZ4 · Pull Request #1208 · EnzymeAD/Enzyme · GitHub), with more coming soon. cc @ZuseZ4 who is leading that effort (though FYI that branch doesn't support runtime activity yet).

The BLAS warning isn't something you can fix normally; it refers to an internal feature that is under development. That said, you can work around it by writing a custom Enzyme rule (Custom rules · Enzyme.jl). Finding where to write it, however, may be a bit trickier without the better backtraces having landed in LLVM.jl.
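For reference, a minimal sketch of what such a rule can look like, loosely following the pattern on the Custom rules docs page. The wrapper mydot and its rule are purely illustrative (not something Lux or SciML defines), a forward-mode rule is shown for brevity, and the exact EnzymeRules signature depends on the Enzyme.jl version:

using Enzyme, LinearAlgebra
import Enzyme.EnzymeRules: forward

# Hypothetical wrapper around a BLAS-backed operation that we don't want
# Enzyme to differentiate through internally.
mydot(x, y) = LinearAlgebra.dot(x, y)

# Hand-written derivative: d(x·y) = dx·y + x·dy, so Enzyme uses this rule
# instead of falling back to its internal BLAS replacement.
function forward(func::Const{typeof(mydot)}, ::Type{<:Duplicated},
                 x::Duplicated, y::Duplicated)
    return Duplicated(mydot(x.val, y.val),
                      mydot(x.dval, y.val) + mydot(x.val, y.dval))
end

With that in place, something like Enzyme.autodiff(Forward, mydot, Duplicated(x, dx), Duplicated(y, dy)) would hit the rule rather than tracing into the BLAS call.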

Unfortunately the backtraces at the moment are limited to basically a few function calls back, until the corresponding PRs to the latest Julia and LLVM.jl land to enable better backtraces (Add Interface to julia OJIT by gbaraldi · Pull Request #346 · maleadt/LLVM.jl · GitHub), (Expose the Julia JIT with a C API by gbaraldi · Pull Request #49858 · JuliaLang/julia · GitHub). I'm told this will only enable it for the (not yet released) Julia 1.10, but maybe if you ask nicely enough someone can be convinced to backport it.

It's not just performance reasons, it's also compatibility reasons. For the definition of the adjoint we have to build an ODE which calculates a vjp of the ODE right-hand side with respect to the state, as sketched below.
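Roughly, in one common continuous-adjoint formulation (the exact signs and boundary terms depend on the cost functional and on which sensitivity algorithm SciMLSensitivity picks, so treat this as illustrative), for $u' = f(u, p, t)$ and a loss $L$ one solves a backwards adjoint ODE and accumulates the parameter gradient:

$$
\frac{d\lambda}{dt} = -\left(\frac{\partial f}{\partial u}\right)^{\top} \lambda,
\qquad
\frac{dL}{dp} = \lambda(t_0)^{\top} \frac{\partial u_0}{\partial p} + \int_{t_0}^{t_1} \lambda^{\top} \frac{\partial f}{\partial p} \, dt
$$

Every step of that backwards solve needs vjps of the form $\lambda^{\top} \partial f/\partial u$ (and $\lambda^{\top} \partial f/\partial p$), with $f$ here being df!.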

Since df! is mutating, that vjp must be done via Enzyme or ReverseDiff, and Enzyme is tried first for performance reasons. Thus the simplest workaround, if you have a model like this, is to just avoid mutation:

# Out-of-place RHS: return the derivative instead of mutating dx in place
function df(x, p, t)
    Lux.apply(model, x, p.nnps, nnst)[1]
end

In which case it will default to Zygote and be perfectly fine for this case.
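To make that concrete, here is a hypothetical follow-up (not in the original reply) showing the only other line of the MWE that changes; with a three-argument RHS, ODEProblem treats the problem as out-of-place:

# Same as before, but with the out-of-place RHS; the adjoint can now use
# Zygote for its vjps instead of needing Enzyme or ReverseDiff.
prob_f = ODEProblem(df, xs[:, 1], (ts[begin], ts[end]), ps)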

Note we’ve been doing this for years and the only thing that’s new is that we now warn people that their code may not be optimal. As mentioned in the other thread, we are trying it out and may remove the warning, or delay it for a bit.

The key really is that Enzyme is improving so much that it is increasingly useful for a user to know when Enzyme isn't being used on their code, but it's still not far enough along that the vast majority of codes can use it. This makes the warning a bit noisy right now, but as @wsmoses says there's a lot moving. I think I may just silence the warning for a bit until all of the BLAS support is out. It's somewhat of a couples dance between these packages to give the optimal user experience while Enzyme is still developing.

I'm not a Julia expert, so correct me if I'm wrong, but if I understood the situation for Julia correctly, most BLAS calls come from simple matrix-matrix, matrix-vector, or vector-scalar products. Those cases should hopefully be supported by the end of tonight, once I finish updating all the tests. RuntimeActivity should also be pretty easy to support, so it's probably not worth updating your side if that's all you are waiting for.

If Lux, however, calls e.g. the full BLAS.gemv/gemm functions directly, with all the matrices, vectors, and scalars set explicitly, then you might hit some activity combinations which I don't cover yet and which might take another week to support.
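For anyone wondering what that distinction looks like in code, a small hypothetical illustration (not from the original thread):

using LinearAlgebra

A = rand(3, 3); B = rand(3, 3); x = rand(3)
y = zeros(3);   C = zeros(3, 3)

# High-level products: these lower to BLAS gemv/gemm with default scaling
# factors, the cases expected to be covered by the new Enzyme BLAS rules.
y1 = A * x
C1 = A * B

# Direct BLAS calls with every argument (including the alpha/beta scalars)
# set explicitly; these hit more activity combinations and may not be
# covered yet.
BLAS.gemv!('N', 2.0, A, x, 0.5, y)
BLAS.gemm!('N', 'N', 2.0, A, B, 0.5, C)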