Compiler determines return value(!) but still generates additional code

Hello

This is my first contribution to this Julia forum. I hope my issue belongs to the “Performance” category, if not i would be glad if someone can tell me a more appropriate place.

I ran into an issue with a function where the compiler is able to determine the return value (not just the type) but still additional code is generated that slows down the performance.

I’m using version 1.1.0 of Julia

Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin17.7.0)
  CPU: Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, ivybridge)

I have defined multiple structs that have a single field whose type is a tuple with values of different types

abstract type AbstractT end
struct T1 <: AbstractT end
struct T2 <: AbstractT end
struct T3 <: AbstractT end

abstract type AbstractStruct end;
struct A <: AbstractStruct
  val::Tuple{T1,T2,T3}
end
struct B <: AbstractStruct
  val::Tuple{T2,T1}
end

The difference between struct A and struct B is the number and order in which values of different types appear in the tuple val.
Now I would like to know what kind of types are stored in val except for type T1. I could do it manually whenever a type is defined but since those types can be user-defined using macros I would like to avoid doing that.
I implemented a method using recursion for a value of any of the structs.

NonT1Types(x::T) where {T<:AbstractStruct} = _NonT1Types(x.val)
_NonT1Types(x::Tuple{T1, Vararg{<:AbstractT}}) = _NonT1Types(Base.tail(x))
_NonT1Types(x::Tuple{T, Vararg{<:AbstractT}}) where {T<:AbstractT} = (T, _NonT1Types(Base.tail(x))...)
_NonT1Types(x::Tuple{T}) where {T<:AbstractT} = tuple(T)
_NonT1Types(x::Tuple{T,T1}) where {T<:AbstractT} = tuple(T)

This function recursively removes the first element of a tuple and checks if it starts with a type T1 or not and then proceed with the tail of the tuple. This works very well since (to my surprise) the compiler is able to not only determine the return type but also the return value.

pA=A(tuple(T1(),T2(),T3()));
@code_warntype NonT1Types(pA)
Body::Tuple{DataType,DataType}
1 ─ %1 = (Core.tuple)(T2, T3)::Tuple{DataType,DataType}
└──      return %1

In my code i sometimes don’t have an instance of struct A or struct B but just the type (being the parameter of another type for example). So i extended the function to also work with types. In order to keep using the same kind of approach i extract the type of field val from a Type, wrap them in a Val (so i can dispatch on them) and place them in a tuple again.

NonT1Types(x::Type{T}) where {T<:AbstractStruct} = _NonT1Types(Val.(tuple(fieldtype(T,:val).parameters...)))
_NonT1Types(x::Tuple{Val{T1}, Vararg{Val{<:AbstractT}}}) = _NonT1Types(Base.tail(x))
_NonT1Types(x::Tuple{Val{T}, Vararg{Val{<:AbstractT}}}) where {T<:AbstractT} = (T, _NonT1Types(Base.tail(x))...)
_NonT1Types(x::Tuple{Val{T}}) where {T<:AbstractT} = tuple(T)
_NonT1Types(x::Tuple{Val{T},Val{T1}}) where {T<:AbstractT} = tuple(T)

Now the function is much slower (in the order of 30 times).

@btime NonT1Types($pA);
  9.768 ns (1 allocation: 32 bytes)
@btime NonT1Types(typeof($pA));
  350.071 ns (2 allocations: 64 bytes)

I’m assuming the reason for the slowdown is that a tuple is allocated although the compiler can again determine the return value(!).

@code_warntype NonT1Types(typeof(pA))
Body::Tuple{DataType,DataType}
1 ─ %1 = (Base.getfield)(Tuple{T1,T2,T3}, :parameters)::Core.SimpleVector
│        (Core._apply)(Main.tuple, %1)
│   %3 = (Core.tuple)(T2, T3)::Tuple{DataType,DataType}
└──      return %3

What is the reason for this “useless” creation of a tuple (%1). Is this because the compiler doesn’t know if getfield() or tuple() have any side-effects, so its better to call them even if unnecessary for the return value of the function? (If the compiler removes those functions calls also the side-effects wouldn’t happen).

Is there a way for me to hint the compiler that it is safe to remove the line?

I also noticed that it is faster to define a dummy constructor and create an instance of the struct and then call the method that works with values not with types.

A()=A((T1(),T2(),T3()))
DummyNonT1Types(x::Type{T}) where {T<:AbstractStruct} = NonT1Types(T())

@code_warntype DummyNonT1Types(typeof(pA))
Body::Tuple{DataType,DataType}
1 ─ %1 = (Core.tuple)(T2, T3)::Tuple{DataType,DataType}
└──      return %1

It appears the compiler is removing the call to the constructor altogether. But can i rely on this behavior, when the types T1, T2 and T3 are more complicated (e.g. containing multiple fields)?

Thanks in advance for any hints
Martin

PS: here is the complete example code

abstract type AbstractT end
struct T1 <: AbstractT end
struct T2 <: AbstractT end
struct T3 <: AbstractT end

abstract type AbstractStruct end;
struct A <: AbstractStruct
  val::Tuple{T1,T2,T3}
end
A()=A((T1(),T2(),T3()))

struct B <: AbstractStruct
  val::Tuple{T2,T1}
end
B()=B((T2(),T1()))

NonT1Types(x::T) where {T<:AbstractStruct} = _NonT1Types(x.val)
_NonT1Types(x::Tuple{T1, Vararg{<:AbstractT}}) = _NonT1Types(Base.tail(x))
_NonT1Types(x::Tuple{T, Vararg{<:AbstractT}}) where {T<:AbstractT} = (T, _NonT1Types(Base.tail(x))...)
_NonT1Types(x::Tuple{T}) where {T<:AbstractT} = tuple(T)
_NonT1Types(x::Tuple{T,T1}) where {T<:AbstractT} = tuple(T)

NonT1Types(x::Type{T}) where {T<:AbstractStruct} = _NonT1Types(Val.(tuple(fieldtype(T,:val).parameters...)))
_NonT1Types(x::Tuple{Val{T1}, Vararg{Val{<:AbstractT}}}) = _NonT1Types(Base.tail(x))
_NonT1Types(x::Tuple{Val{T}, Vararg{Val{<:AbstractT}}}) where {T<:AbstractT} = (T, _NonT1Types(Base.tail(x))...)
_NonT1Types(x::Tuple{Val{T}}) where {T<:AbstractT} = tuple(T)
_NonT1Types(x::Tuple{Val{T},Val{T1}}) where {T<:AbstractT} = tuple(T)

DummyNonT1Types(x::Type{T}) where {T<:AbstractStruct} = NonT1Types(T())

pA=A(tuple(T1(),T2(),T3()));
NonT1Types(pA)
NonT1Types(typeof(pA))
DummyNonT1Types(typeof(pA))

@btime NonT1Types($pA)
@btime NonT1Types(typeof($pA))
@btime DummyNonT1Types(typeof($pA))


@code_warntype NonT1Types(pA)
@code_warntype NonT1Types(typeof(pA))
@code_warntype DummyNonT1Types(typeof(pA))

Just a small update: I tested the code using Julia-1.2 and the versions of the code with types and instances have the same performance now.

@btime NonT1Types($pA);
8.674 ns (1 allocation: 32 bytes)
@btime NonT1Types(typeof($pA));
8.688 ns (1 allocation: 32 bytes)

1 Like