This isn't really an issue with Setfield.jl itself, but that is where I've run into it most often.
When updating a value of an immutable type (e.g., one stored in an array or a reference), we need to construct an entirely new object with the desired fields replaced. Taken literally, this yields a large number of redundant load/store operations, which in simple cases seem to be eliminated. However, the optimization is brittle, and one case that breaks it appears to be a type that includes a Union field. I've seen it fail in other scenarios as well, and my guess is that the analysis can't handle much control flow.
Is this something that can be improved in the Julia compiler, or does the problem sit deeper, in LLVM?
An example that highlights the problem:
using Setfield

struct Good
    a::Int
    b::NTuple{4, Int}
    c::Int
end

struct Bad
    a::Int
    b::NTuple{4, Int}
    c::Union{Int, Float64}
end

function update(x_ref::Ref)
    x = x_ref[]
    x = @set x.a = 123
    x_ref[] = x
    nothing
end
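To make the "construct an entirely new object" step explicit, here is a minimal hand-written sketch of the same update without Setfield. `BadCopy` and `update_manual` are hypothetical names introduced only for this illustration (`BadCopy` just repeats the layout of `Bad` so the snippet runs on its own), and I'm assuming that `@set` conceptually boils down to a constructor call like this one rather than claiming it expands to exactly this code.

```julia
# Same field layout as `Bad`, repeated so this snippet is self-contained.
struct BadCopy
    a::Int
    b::NTuple{4, Int}
    c::Union{Int, Float64}
end

# Hypothetical hand-written counterpart of `update`: rebuild the whole
# struct, replacing only `a`, then store the new value into the reference.
function update_manual(x_ref::Ref{BadCopy})
    x = x_ref[]
    x_ref[] = BadCopy(123, x.b, x.c)  # `b` and `c` are copied over unchanged
    nothing
end

r = Ref(BadCopy(1, (2, 3, 4, 5), 6.0))
update_manual(r)
```

Semantically every field is read and written back, so whether the stores to `b` and `c` survive depends entirely on the optimizer recognizing them as redundant.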
Updating Good optimizes well:
code_native(update, (Base.RefValue{Good},), debuginfo = :none, syntax = :intel)

        .text
        mov     qword ptr [rsp - 8], rsi
        mov     rax, qword ptr [rsi]
        mov     qword ptr [rax], 123
        movabs  rax, offset jl_system_image_data
        ret
Updating Bad performs a redundant load/store of every field in the object:
code_native(update, (Base.RefValue{Bad},), debuginfo = :none, syntax = :intel)

        .text
        push    rax
        mov     qword ptr [rsp], rsi
        mov     rax, qword ptr [rsi]
        mov     dl, byte ptr [rax + 48]
        mov     r8, qword ptr [rax + 8]
        mov     r9, qword ptr [rax + 16]
        mov     rsi, qword ptr [rax + 24]
        mov     rdi, qword ptr [rax + 32]
        mov     rcx, qword ptr [rax + 40]
        inc     dl
        and     dl, 127
        cmp     dl, 1
        je      L50
        cmp     dl, 2
        jne     L94
        mov     dl, 1
        jmp     L52
L50:
        xor     edx, edx
L52:
        mov     byte ptr [rax + 48], dl
        mov     qword ptr [rax + 40], rcx
        mov     qword ptr [rax + 32], rdi
        mov     qword ptr [rax + 24], rsi
        mov     qword ptr [rax + 16], r9
        mov     qword ptr [rax + 8], r8
        mov     qword ptr [rax], 123
        movabs  rax, offset jl_system_image_data
        pop     rcx
        ret
L94:
        movabs  rax, offset jl_throw
        movabs  rdi, offset jl_system_image_data
        call    rax