In a large codebase, I’ve noticed substantial performance degradation with v1.12. I’m seeing large numbers of allocations in parts of the code that are threaded and involve passing functions as arguments to other functions (perhaps related to this Performance Tip). I tried varying how the functions being passed are defined and how the type of the function argument is declared, and the allocations can change dramatically. This does not happen with the same code on v1.11, nor does it happen if I remove the `Threads.@threads` in this section of the code. In the threaded v1.12 case, I also find that small changes to the function definitions can greatly affect these results (reducing allocations in one call but increasing them in another).
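The Performance Tip in question may be the one about Julia avoiding specialization on `Function` arguments that are only passed through to another function. That identification is an assumption. Under it, a minimal sketch of the pattern looks like this; `apply_passthrough`, `apply_forced` and `square` are hypothetical names:

```julia
# `f` is only forwarded to `map`, never called in this body, so Julia's
# heuristics may compile one generic method instance instead of specializing
# on typeof(f).
apply_passthrough(f, xs) = map(f, xs)

# A type parameter `F` forces a specialization for each concrete callable type.
apply_forced(f::F, xs) where {F} = map(f, xs)

square(x) = x^2
xs = [0.0, 0.5, 1.0]
ys_passthrough = apply_passthrough(square, xs)
ys_forced = apply_forced(square, xs)
```

`define_Y!` in the MWE already uses the `where {F1,F2}` form. `inner_product` does not: it takes untyped callables and only uses them inside its local closure `g`. Annotating those arguments the same way is one variant to try, but whether it changes the threaded v1.12 numbers is untested.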
Here’s a minimal example demonstrating the problem. I’ll show the results of `demo()` for v1.11 and v1.12, with and without the `@threads` in `define_Y!()`. The main conclusions:
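- v1.12 threaded: allocations depend strongly on how the callables are defined (61 allocations for named local functions and callable structs, 10827 for arrow closures)
- v1.11 threaded: all three definitions give the same result (45 allocations)
- Unthreaded: both versions give the same result for all three definitions (3 allocations)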
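The differences only appear with `@threads`, which wraps the loop body in a closure. One variant that might be worth testing is a function barrier: move the whole per-iteration body into an ordinary function with type parameters on the callables, so the `@threads` closure makes only a single call per iteration. This sketch has not been tested against the v1.12 allocation issue. It also simplifies the problem: it uses a plain `Vector{Float64}` instead of a `BandedMatrix` and a single sine mode instead of `inner_product`. The names `fill_mode!`, `fill_threaded!`, `fill_serial!` and `barrier_demo` are hypothetical.

```julia
using Base.Threads

# Per-iteration kernel: all the work lives in an ordinary, fully specialized
# function (hypothetical stand-in for the body of define_Y!'s loop).
function fill_mode!(out::AbstractVector{Float64}, f1::F1, f2::F2,
                    X::AbstractVector{Float64}, m::Int) where {F1,F2}
    acc = 0.0
    for i in 1:length(X)-1
        x1 = X[i]; x2 = X[i+1]
        y1 = sin(m * π * x1) * f1(x1) * f2(x1)
        y2 = sin(m * π * x2) * f1(x2) * f2(x2)
        acc += 0.5 * (x2 - x1) * (y1 + y2)   # trapezoid rule
    end
    out[m] = acc
    return nothing
end

# Threaded driver: the @threads loop body is a single call through the barrier.
function fill_threaded!(out, f1::F1, f2::F2, X) where {F1,F2}
    @threads for m in eachindex(out)
        fill_mode!(out, f1, f2, X, m)
    end
    return out
end

# Serial reference, used only to check that both paths agree.
function fill_serial!(out, f1::F1, f2::F2, X) where {F1,F2}
    for m in eachindex(out)
        fill_mode!(out, f1, f2, X, m)
    end
    return out
end

function barrier_demo(; N = 100, c = 0.7)
    X = collect(range(0.0, 1.0; length = N))
    f1 = x -> exp(-c * x) * (1 + x)   # arrow closures, as in variant (B)
    f2 = x -> c * exp(-c * x) + x
    a = fill_threaded!(zeros(N), f1, f2, X)
    b = fill_serial!(zeros(N), f1, f2, X)
    return a, b
end

a, b = barrier_demo()
```

In the MWE, the analogous change would move the two `inner_product` assignments in `define_Y!` into a helper taking `(Y, gg, hh, X, order, m)`.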
MWE
```julia
import Pkg; Pkg.activate(; temp=true)
Pkg.add(["BandedMatrices", "FillArrays", "BenchmarkTools"])
using BandedMatrices
using FillArrays
using LinearAlgebra
using BenchmarkTools
using Base.Threads

# --- Problem scaffold -----
# toy "physics": just something smooth and nontrivial
g(x, X::Vector, c) = exp(-c*x) * (1 + x) + sum(X) * 1e-6
h(x, X::Vector, c) = c * exp(-c*x) + x

# even/odd "basis" and their X-derivatives (analytic for the toy basis)
sinm(m::Int, X::Real) = sin(m * π * X)
cosm(m::Int, X::Real) = cos(m * π * X)
D_sinm(m::Int, X::Real) = m * π * cos(m * π * X)
D_cosm(m::Int, X::Real) = -m * π * sin(m * π * X)

function inner_product(op1, m, f1, op2, f2, _basis, k, X::AbstractVector, _order)
    @inbounds begin
        acc = 0.0
        @fastmath for i in 1:length(X)-1
            x₁ = X[i]; x₂ = X[i+1]
            # integrand: (op1 on mode m * f1) * (op2 on mode k * f2);
            # this local `g` shadows the global `g` only inside the loop body
            g(x) = (op1(m, x) * f1(x)) * (op2(k, x) * f2(x))
            acc += 0.5 * (x₂ - x₁) * (g(x₁) + g(x₂)) # trapezoid
        end
        return acc
    end
end

# --- Three front-ends that differ only in how they define the callables -------
# (A) Named local functions (the form my code currently uses)
function define_Y_named(X::Vector, c; order::Union{Nothing,Integer}=5)
    gg(x) = g(x, X, c)
    hh(x) = h(x, X, c)
    Y = BandedMatrix(Zeros(2length(X), 2length(X)), (3, 3))
    define_Y!(Y, gg, hh, X, order)
    return Y
end

# (B) Arrow closures capturing X, c
function define_Y_arrow(X::Vector, c; order::Union{Nothing,Integer}=5)
    gg = x -> g(x, X, c)
    hh = x -> h(x, X, c)
    Y = BandedMatrix(Zeros(2length(X), 2length(X)), (3, 3))
    define_Y!(Y, gg, hh, X, order)
    return Y
end

# (C) Concrete callable structs (forces concrete callee types)
struct G{X,T}; X::X; c::T; end
struct H{X,T}; X::X; c::T; end
(gg::G)(x) = g(x, gg.X, gg.c)
(hh::H)(x) = h(x, hh.X, hh.c)

function define_Y_functor(X::Vector, c; order::Union{Nothing,Integer}=5)
    gg, hh = G(X, c), H(X, c)
    Y = BandedMatrix(Zeros(2length(X), 2length(X)), (3, 3))
    define_Y!(Y, gg, hh, X, order)
    return Y
end

# --- Threaded fill (parametric on the callables to encourage specialization) ---
function define_Y!(Y::BandedMatrix, gg::F1, hh::F2,
                   X::AbstractVector{<:Real}, order::Union{Nothing,Integer}) where {F1,F2}
    Threads.@threads for m in eachindex(X)
        Me = 2m
        Mo = Me - 1
        Y[Mo, Mo] = inner_product(D_cosm, m, gg, D_cosm, hh, cosm, m, X, order)
        Y[Me, Me] = inner_product(D_sinm, m, gg, D_sinm, hh, sinm, m, X, order)
    end
    return Y
end

# --- Driver / benchmark -------------------------------------------------------
function demo(; N=400)
    X = collect(range(0.0, 1.0; length = N))
    c = 0.7
    define_Y_named(X, c); define_Y_arrow(X, c); define_Y_functor(X, c)  # warm-up / compile
    println("\nAllocations / time (Named local functions)")
    @btime define_Y_named($X, $c);
    println("\nAllocations / time (Arrow closures)")
    @btime define_Y_arrow($X, $c);
    println("\nAllocations / time (Callable structs)")
    @btime define_Y_functor($X, $c);
end

# Start Julia with several threads (e.g. `julia -t auto`) for the threaded numbers
demo()
```
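A dependency-free reduction might help check whether `BandedMatrices`/`FillArrays` play any role. This is a sketch under assumptions, not the MWE itself. It fills a plain `Vector{Float64}`. It replaces `inner_product` with a single-mode trapezoid rule, `toy_integral`, which (like `inner_product`) only uses the callables inside a local closure. It reports bytes from `@allocated` rather than `@btime`. All names (`toy_g`, `toy_h`, `ToyG`, `ToyH`, `toy_integral`, `toy_fill!`, `measure!`, `compare_callables`) are hypothetical, and it has not been checked whether this reproduces the v1.12 pattern.

```julia
using Base.Threads

# Hypothetical stand-ins for the MWE's g and h (same formulas, distinct names)
toy_g(x, X, c) = exp(-c * x) * (1 + x) + sum(X) * 1e-6
toy_h(x, X, c) = c * exp(-c * x) + x

# Callable structs with concrete field types, mirroring G/H in the MWE
struct ToyG{V,T}; X::V; c::T; end
struct ToyH{V,T}; X::V; c::T; end
(f::ToyG)(x) = toy_g(x, f.X, f.c)
(f::ToyH)(x) = toy_h(x, f.X, f.c)

# Trapezoid rule for sin(m*pi*x) * f1(x) * f2(x) on the grid X; like
# inner_product, the callables are only used inside a local closure
function toy_integral(f1, f2, m, X)
    acc = 0.0
    for i in 1:length(X)-1
        x1 = X[i]; x2 = X[i+1]
        integrand(x) = sin(m * π * x) * f1(x) * f2(x)
        acc += 0.5 * (x2 - x1) * (integrand(x1) + integrand(x2))
    end
    return acc
end

# Threaded fill of a plain Vector (stand-in for the BandedMatrix in define_Y!)
function toy_fill!(out::Vector{Float64}, f1::F1, f2::F2, X) where {F1,F2}
    @threads for m in eachindex(out)
        out[m] = toy_integral(f1, f2, m, X)
    end
    return out
end

# Warm up once so compilation is excluded, then count bytes for one call
function measure!(out, f1::F1, f2::F2, X) where {F1,F2}
    toy_fill!(out, f1, f2, X)
    return @allocated toy_fill!(out, f1, f2, X)
end

# Build the three callable styles inside a function so X and c are locals
function compare_callables(; N = 200, c = 0.7)
    X = collect(range(0.0, 1.0; length = N))
    named_g(x) = toy_g(x, X, c)      # (A) named local functions
    named_h(x) = toy_h(x, X, c)
    arrow_g = x -> toy_g(x, X, c)    # (B) arrow closures
    arrow_h = x -> toy_h(x, X, c)
    out_named, out_arrow, out_functor = zeros(N), zeros(N), zeros(N)
    b_named = measure!(out_named, named_g, named_h, X)
    b_arrow = measure!(out_arrow, arrow_g, arrow_h, X)
    b_functor = measure!(out_functor, ToyG(X, c), ToyH(X, c), X)  # (C) callable structs
    println("bytes allocated: named = $b_named, arrow = $b_arrow, functor = $b_functor")
    return out_named, out_arrow, out_functor
end

out_named, out_arrow, out_functor = compare_callables()
```

Running it once with `julia -t 1` and once with `julia -t auto` gives the unthreaded/threaded comparison for each Julia version.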
v1.12, threaded
```
Allocations / time (Named local functions)
  20.153 ms (61 allocations: 69.08 KiB)

Allocations / time (Arrow closures)
  20.195 ms (10827 allocations: 348.36 KiB)

Allocations / time (Callable structs)
  20.166 ms (61 allocations: 69.08 KiB)
```

```
Installation Information
==========================
Julia Version 1.12.0
Commit b907bd0600f (2025-10-07 15:42 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 12 × Apple M4 Pro
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, apple-m4)
  GC: Built with stock GC
  Threads: 8 default, 1 interactive, 8 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = auto
Status `/private/var/folders/nt/ct_lf2n94_1c21908mmbx55w0000gq/T/jl_HXcSaX/Project.toml`
  [aae01518] BandedMatrices v1.9.5
  [6e4b80f9] BenchmarkTools v1.6.0
  [1a297f60] FillArrays v1.14.0
Status `/private/var/folders/nt/ct_lf2n94_1c21908mmbx55w0000gq/T/jl_HXcSaX/Manifest.toml`
  [4c555306] ArrayLayouts v1.12.0
  [aae01518] BandedMatrices v1.9.5
  [6e4b80f9] BenchmarkTools v1.6.0
  [34da2185] Compat v4.18.1
  [1a297f60] FillArrays v1.14.0
⌅ [682c06a0] JSON v0.21.4
  [69de0a69] Parsers v2.8.3
  [aea7be01] PrecompileTools v1.3.3
  [21216c6a] Preferences v1.5.0
  [90137ffa] StaticArrays v1.9.15
  [1e83bf80] StaticArraysCore v1.4.3
  [10745b16] Statistics v1.11.1
  [56f22d72] Artifacts v1.11.0
  [ade2ca70] Dates v1.11.0
  [8f399da3] Libdl v1.11.0
  [37e2e46d] LinearAlgebra v1.12.0
  [56ddb016] Logging v1.11.0
  [a63ad114] Mmap v1.11.0
  [de0858da] Printf v1.11.0
  [9abbd945] Profile v1.11.0
  [9a3f8284] Random v1.11.0
  [ea8e919c] SHA v0.7.0
  [f489334b] StyledStrings v1.11.0
  [fa267f1f] TOML v1.0.3
  [cf7118a7] UUIDs v1.11.0
  [4ec0a83e] Unicode v1.11.0
  [e66e0078] CompilerSupportLibraries_jll v1.3.0+1
  [4536629a] OpenBLAS_jll v0.3.29+0
  [8e850b90] libblastrampoline_jll v5.13.1+1
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated -m`
```
v1.12, unthreaded
```
Allocations / time (Named local functions)
  149.662 ms (3 allocations: 64.08 KiB)

Allocations / time (Arrow closures)
  149.204 ms (3 allocations: 64.08 KiB)

Allocations / time (Callable structs)
  147.845 ms (3 allocations: 64.08 KiB)
```
v1.11, threaded
```
Allocations / time (Named local functions)
  20.179 ms (45 allocations: 68.83 KiB)

Allocations / time (Arrow closures)
  20.220 ms (45 allocations: 68.83 KiB)

Allocations / time (Callable structs)
  20.010 ms (45 allocations: 68.83 KiB)
```

```
Installation Information
==========================
Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 12 × Apple M4 Pro
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m1)
  Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = auto
Status `/private/var/folders/nt/ct_lf2n94_1c21908mmbx55w0000gq/T/jl_M6vZl9/Project.toml`
  [aae01518] BandedMatrices v1.9.5
  [6e4b80f9] BenchmarkTools v1.6.0
  [1a297f60] FillArrays v1.14.0
Status `/private/var/folders/nt/ct_lf2n94_1c21908mmbx55w0000gq/T/jl_M6vZl9/Manifest.toml`
  [4c555306] ArrayLayouts v1.12.0
  [aae01518] BandedMatrices v1.9.5
  [6e4b80f9] BenchmarkTools v1.6.0
  [34da2185] Compat v4.18.1
  [1a297f60] FillArrays v1.14.0
⌅ [682c06a0] JSON v0.21.4
  [69de0a69] Parsers v2.8.3
⌅ [aea7be01] PrecompileTools v1.2.1
  [21216c6a] Preferences v1.5.0
  [90137ffa] StaticArrays v1.9.15
  [1e83bf80] StaticArraysCore v1.4.3
  [10745b16] Statistics v1.11.1
  [56f22d72] Artifacts v1.11.0
  [ade2ca70] Dates v1.11.0
  [8f399da3] Libdl v1.11.0
  [37e2e46d] LinearAlgebra v1.11.0
  [56ddb016] Logging v1.11.0
  [a63ad114] Mmap v1.11.0
  [de0858da] Printf v1.11.0
  [9abbd945] Profile v1.11.0
  [9a3f8284] Random v1.11.0
  [ea8e919c] SHA v0.7.0
  [fa267f1f] TOML v1.0.3
  [cf7118a7] UUIDs v1.11.0
  [4ec0a83e] Unicode v1.11.0
  [e66e0078] CompilerSupportLibraries_jll v1.1.1+0
  [4536629a] OpenBLAS_jll v0.3.27+1
  [8e850b90] libblastrampoline_jll v5.11.0+0
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated -m`
```
v1.11, unthreaded
```
Allocations / time (Named local functions)
  145.687 ms (3 allocations: 64.08 KiB)

Allocations / time (Arrow closures)
  146.928 ms (3 allocations: 64.08 KiB)

Allocations / time (Callable structs)
  147.664 ms (3 allocations: 64.08 KiB)
```