I apologize in advance for the long post, but I did a good deal of my own analysis first to make sure I was understanding the situation correctly. I wanted to post my findings to save others the trouble of going through the same steps I did.
Getting to the actual problem, I am puzzled by the code generated by an overload of the getproperty function. I created a struct to represent a 2D vector:
struct Vec2
    x::Float64
    y::Float64
end
For implementing math operators, reinterpreting the vector as a tuple is convenient. Likewise, swizzle operators are often useful. Since these are both cheap, data-rearranging operations, getproperty seems like a reasonable way to implement them:
function Base.getproperty(v::Vec2, name::Symbol)
    if name == :tuple
        (v.x, v.y)
    elseif name == :yx
        Vec2(v.y, v.x)
    else
        getfield(v, name)
    end
end
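As a quick illustration of how these properties are meant to be used, here is a minimal, self-contained sketch. The `+` method is a hypothetical example of a math operator built on the tuple view; it is not part of my actual code.

```julia
struct Vec2
    x::Float64
    y::Float64
end

# Same overload as above: two virtual properties, everything else is a real field.
function Base.getproperty(v::Vec2, name::Symbol)
    if name == :tuple
        (v.x, v.y)
    elseif name == :yx
        Vec2(v.y, v.x)
    else
        getfield(v, name)
    end
end

# Hypothetical operator: add component-wise via the tuple view, then rebuild a Vec2.
Base.:+(a::Vec2, b::Vec2) = Vec2((a.tuple .+ b.tuple)...)

v = Vec2(1.0, 2.0)
t = v.tuple   # the vector viewed as a tuple
s = v.yx      # swizzle: components swapped
w = v + s     # uses the hypothetical + defined above
```

The point is only that both properties are pure data rearrangements, so I would expect them to compile down to a handful of moves.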
Wrapping one of the getproperty calls in a function and calling code_native on it reveals the first problem: getproperty is not inlined:
julia> f(v::Vec2) = v.yx;
julia> code_native(f, (Vec2,); syntax=:intel, debuginfo=:none);
.text
push rbp
mov rbp, rsp
push rsi
push rdi
sub rsp, 48
mov rsi, rcx
movabs rax, offset getproperty
lea rdi, [rbp - 32]
mov r8d, 397788528
mov rcx, rdi
call rax
test dl, dl
cmovns rax, rdi
vmovups xmm0, xmmword ptr [rax]
vmovups xmmword ptr [rsi], xmm0
mov rax, rsi
add rsp, 48
pop rdi
pop rsi
pop rbp
ret
nop
This is easily fixed by adding the @inline macro to the getproperty definition. After that change, accessing the yx property produces the expected code:
julia> code_native(f, (Vec2,); syntax=:intel, debuginfo=:none);
.text
push rbp
mov rbp, rsp
mov rax, rcx
vpermilps xmm0, xmmword ptr [rdx], 78 # xmm0 = mem[2,3,0,1]
vmovups xmmword ptr [rax], xmm0
pop rbp
ret
nop word ptr cs:[rax + rax]
nop dword ptr [rax]
Not only does the compiler succeed in all the expected constant propagation and branch elimination optimizations, it also recognizes that the swizzle operation can be implemented using a single permute instruction. Impressive. However, the tuple property is still a problem. I won't bother posting the assembly output because it is lengthy; suffice it to say that it is very wrong. So, for some reason, wrapping the tuple operation in the getproperty function seems to confuse the optimizer. Using code_warntype reveals a potential problem. I introduce a temporary variable x to make the type of the tuple property clear:
julia> function f(v::Vec2)
           x = v.tuple
           x
end
julia> code_warntype(f, (Vec2,))
Variables
#self#::Core.Compiler.Const(f, false)
v::Vec2
x::Tuple{Union{Float64, Vec2, Tuple{Any,Any}},Union{Float64, Vec2, Tuple{Any,Any}}}
Body::Tuple{Union{Float64, Vec2, Tuple{Any,Any}},Union{Float64, Vec2, Tuple{Any,Any}}}
1 ─ (x = Base.getproperty(v, :tuple))
└── return x
Doing the same exercise for the yx property tells a different story:
julia> function f(v::Vec2)
           x = v.yx
           x
end
julia> code_warntype(f, (Vec2,))
Variables
#self#::Core.Compiler.Const(f, false)
v::Vec2
x::Vec2
Body::Vec2
1 ─ (x = Base.getproperty(v, :yx))
└── return x
So it looks like this is a type inference problem. Question #1 is: how is it that the type of the yx property can be properly inferred, but not the type of the tuple property?
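The comparison above can also be reproduced outside the REPL. The following is a rough sketch that uses `Base.return_types` to ask the compiler what it infers for each wrapper; the wrapper names `get_tuple` and `get_yx` are made up for this example. I have deliberately left out expected output, since what gets inferred appears to depend on the Julia version.

```julia
struct Vec2
    x::Float64
    y::Float64
end

# The @inline version of the original overload, still using v.x and v.y internally.
@inline function Base.getproperty(v::Vec2, name::Symbol)
    if name == :tuple
        (v.x, v.y)
    elseif name == :yx
        Vec2(v.y, v.x)
    else
        getfield(v, name)
    end
end

# Hypothetical wrappers so that the property name is a compile-time constant.
get_tuple(v::Vec2) = v.tuple
get_yx(v::Vec2) = v.yx

# Base.return_types returns a vector with one inferred return type per matching method.
tuple_types = Base.return_types(get_tuple, (Vec2,))
yx_types = Base.return_types(get_yx, (Vec2,))

println("inferred for v.tuple: ", tuple_types)
println("inferred for v.yx:    ", yx_types)
```

A concrete result such as `Tuple{Float64, Float64}` means inference succeeded; a `Union` or `Any` in the result corresponds to the problem shown in the code_warntype output.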
Moving on, the fact that a Tuple{Any,Any} is part of the type indicates to me that the type inference of (v.x, v.y) is going awry. After some trial and error, I was able to coerce getproperty into doing the right thing:
@inline function Base.getproperty(v::Vec2, name::Symbol)
    if name == :tuple
        #(v.x, v.y) # Original implementation
        #NTuple{2,Float64}(v.x, v.y) # Doesn't work
        (Float64(v.x), Float64(v.y)) # Works!
        #(getfield(v, :x), getfield(v, :y)) # Also works
    elseif name == :yx
        Vec2(v.y, v.x)
    else
        getfield(v, name)
    end
end
With this implementation, I finally get the assembly I expect:
julia> f(v::Vec2) = v.tuple;
julia> code_native(f, (Vec2,); syntax=:intel, debuginfo=:none);
.text
push rbp
mov rbp, rsp
mov rax, rcx
vmovups xmm0, xmmword ptr [rdx]
vmovups xmmword ptr [rax], xmm0
pop rbp
ret
nop word ptr cs:[rax + rax]
nop dword ptr [rax + rax]
This leads me to question #2: why does this particular incantation fix the type inference problem? I would have expected the type inference system to be able to do this transformation on its own.
And finally, question #3: is there a better way to write getproperty that avoids this problem?
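One candidate I can sketch, based on the observation that the getfield variant above works: inside the overload, `v.x` is itself a call to `getproperty(v, :x)`, so the function calls itself, and the union in the code_warntype output happens to list exactly the return types of its three branches. If that recursion is the cause (an assumption on my part, not something I have confirmed), then reading the fields with getfield throughout should sidestep it. The `propertynames` method and the `as_tuple` wrapper are additions for illustration only.

```julia
struct Vec2
    x::Float64
    y::Float64
end

@inline function Base.getproperty(v::Vec2, name::Symbol)
    # Read the real fields directly so that getproperty never calls itself.
    x = getfield(v, :x)
    y = getfield(v, :y)
    if name === :tuple
        return (x, y)
    elseif name === :yx
        return Vec2(y, x)
    else
        return getfield(v, name)
    end
end

# Optional: advertise the virtual properties (e.g. for tab completion).
Base.propertynames(::Vec2) = (:x, :y, :tuple, :yx)

# Hypothetical wrapper, used only to check the result.
as_tuple(v::Vec2) = v.tuple

v = Vec2(1.0, 2.0)
println(as_tuple(v))
println(Base.return_types(as_tuple, (Vec2,)))
```

This keeps the single-function layout of the original. Whether it is actually the better way, or merely another incantation that happens to work, is part of what I am asking.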
In case it is relevant, here is my system information:
julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, haswell)