julia> using BenchmarkTools, LazyArrays
julia> N = 64; A = randn(N,N); B = randn(N,N); C = similar(A,1,N);
Using LazyArrays:
julia> @btime sum!($C,LazyArray(@~ $A.*$B))
3.047 μs (0 allocations: 0 bytes)
Without LazyArrays:
julia> @btime sum!($C,$A.*$B)
1.163 μs (3 allocations: 32.08 KiB)
Even though it allocates a 32 KiB temporary for A.*B, the regular sum! is about 2.6x faster than the LazyArrays version on my machine. Is this expected behavior?
Interestingly, using map! with eachcol outperforms both approaches:
julia> using LinearAlgebra
julia> @btime map!(⋅,$C,eachcol($A),eachcol($B))
788.132 ns (0 allocations: 0 bytes)
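For reference, a hand-written loop is one way to get an allocation-free baseline that does not go through LazyArrays or broadcasting at all. This is only a sketch: colsum_prod! is a hypothetical helper name, not something from Base or LazyArrays, and it assumes C has one entry per column of A. I am not quoting a timing for it here.

```julia
using LinearAlgebra

# Hypothetical helper: writes the column sums of A .* B into C
# without materializing the product.
function colsum_prod!(C, A, B)
    axes(A) == axes(B) || throw(DimensionMismatch("A and B must have the same axes"))
    length(C) == size(A, 2) || throw(DimensionMismatch("C needs one entry per column of A"))
    for (k, j) in zip(eachindex(C), axes(A, 2))
        acc = zero(eltype(C))
        @simd for i in axes(A, 1)
            acc += A[i, j] * B[i, j]
        end
        C[k] = acc
    end
    return C
end

N = 64; A = randn(N, N); B = randn(N, N); C = similar(A, 1, N)
colsum_prod!(C, A, B)

# Sanity checks: the loop should agree with both versions from the post.
same_as_sum = C ≈ sum(A .* B; dims=1)
same_as_dot = vec(C) ≈ map(⋅, eachcol(A), eachcol(B))
```

It can be benchmarked the same way as the calls above, e.g. @btime colsum_prod!($C,$A,$B).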
Here’s the output of versioninfo:
julia> versioninfo()
Julia Version 1.12.6
Commit 15346901f00 (2026-04-09 19:20 UTC)
Build Info:
Official https://julialang.org release
Platform Info:
OS: macOS (arm64-apple-darwin24.0.0)
CPU: 8 × Apple M2
WORD_SIZE: 64
LLVM: libLLVM-18.1.7 (ORCJIT, apple-m2)
GC: Built with stock GC
Threads: 4 default, 1 interactive, 4 GC (on 4 virtual cores)
Environment:
JULIA_EDITOR = code
JULIA_VSCODE_REPL = 1
JULIA_PKG_USE_CLI_GIT = true