Type instability possibly due to variable playing role of input and output

EDIT: contrary to the title, I have come to the conclusion that the problem lies elsewhere (see post below). Unless this contradicts etiquette, I will not modify the original title, nor the original question below.

NEW EDIT: The bottom line is that the issue presented here was due to the inherent instability of iterating over a heterogeneous Tuple.


I have coded my own neural network that I have been using for some time. My neural network is an array of layers, and I have defined a struct for a layer. The code works well, but it contains a type instability that I only recently detected via @code_warntype. Though the cause is probably obvious to the experienced eye, I am puzzled as to why this type instability arises. Below I paste my code (sorry for the idiosyncratic formatting) and the output of @code_warntype.

using LinearAlgebra, Random
#########################################################
# Constructor
#########################################################

struct layer{F}
    Din::Int64  # number of inputs to layer
    Dout::Int64 # number of outputs of layer
    f::F
end


function makelayer(; out::Int64, in::Int64, f = tanh)

    layer(in, out, f)

end


#########################################################
# Evaluation
#########################################################

#----------------------------------------------------
function (a::layer)(weights, x)
#----------------------------------------------------

    MARK = 0

    aux1 = @view weights[MARK+1:MARK +  a.Dout * a.Din]

    w = reshape(aux1, a.Dout, a.Din)

    MARK += a.Dout * a.Din

    aux2 = @view weights[MARK+1:MARK + a.Dout]
    
    b = reshape(aux2, a.Dout)

    MARK += a.Dout

    @assert(MARK == length(weights)) # make sure all weights have been used up

    a.f.(w * x .+ b) # evaluation of layer

end


#----------------------------------------------------
function (a::Array{layer,1})(weights, x)
#----------------------------------------------------

    MARK = 0

    out = x

    for l in a
        
        # the output of each layer is the input of the next layer
        # the input to the first layer is x

        out = @views l(weights[MARK+1:MARK+numweights(l)], out)
        
        MARK += numweights(l)

    end

    @assert(MARK == numweights(a)) # make sure all weights have been used up

    return out

end




#########################################################
# Number of parameters in network
#########################################################

numweights(a::layer)          = a.Din * a.Dout + a.Dout

numweights(a::Array{layer,1}) = mapreduce(numweights, +, a)

numlayers(a::Array{layer,1})  = length(a)

I have verified that the method (a::layer)(weights, x) is type stable.
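A single layer can be checked in isolation along these lines (the layer sizes below are arbitrary):

l = makelayer(out=3, in=2, f=tanh)

wl = randn(numweights(l))

@code_warntype l(wl, randn(2, 10)) # fully inferred: the body is reported as Matrix{Float64}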

As an example of creating a network and evaluating it for a set of weights and an input, we can use the following:

net = [makelayer(out=50, in=1, f=tanh); makelayer(in=50,out=50,f=tanh);  makelayer(in = 50, out=1, f=identity)]; # create a 3-layer network

w = randn(numweights(net)); # instantiate random weights

x = randn(1,1000); # instantiate 1000 random inputs

net(w,x) # evaluate the network

The output of @code_warntype net(w,x) follows below:

MethodInstance for (::Vector{ForwardNeuralNetworks.layer})(::Vector{Float64}, ::Matrix{Float64})
  from (a::Vector{ForwardNeuralNetworks.layer})(weights, x) @ ForwardNeuralNetworks ~/.julia/dev/ForwardNeuralNetworks/src/neuralnetwork.jl:49
Arguments
  a::Vector{ForwardNeuralNetworks.layer}
  weights::Vector{Float64}
  x::Matrix{Float64}
Locals
  @_4::Union{Nothing, Tuple{ForwardNeuralNetworks.layer, Int64}}
  out::Any
  MARK::Int64
  l::ForwardNeuralNetworks.layer
Body::Any
1 ─       (MARK = 0)
│         (out = x)
│   %3  = a::Vector{ForwardNeuralNetworks.layer}
│         (@_4 = Base.iterate(%3))
│   %5  = (@_4 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_4::Tuple{ForwardNeuralNetworks.layer, Int64}
│         (l = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│   %11 = (MARK + 1)::Int64
│   %12 = MARK::Int64
│   %13 = ForwardNeuralNetworks.numweights(l)::Int64
│   %14 = (%12 + %13)::Int64
│   %15 = (%11:%14)::UnitRange{Int64}
│   %16 = (Base.maybeview)(weights, %15)::Core.PartialStruct(SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}, Any[Vector{Float64}, Tuple{UnitRange{Int64}}, Int64, Core.Const(1)])
│         (out = (l)(%16, out))
│   %18 = MARK::Int64
│   %19 = ForwardNeuralNetworks.numweights(l)::Int64
│         (MARK = %18 + %19)
│         (@_4 = Base.iterate(%3, %10))
│   %22 = (@_4 === nothing)::Bool
│   %23 = Base.not_int(%22)::Bool
└──       goto #4 if not %23
3 ─       goto #2
4 ┄ %26 = MARK::Int64
│   %27 = ForwardNeuralNetworks.numweights(a)::Int64
│   %28 = (%26 == %27)::Bool
└──       goto #6 if not %28
5 ─       goto #7
6 ─ %31 = Base.AssertionError("MARK == numweights(a)")::Any
└──       Base.throw(%31)
7 ┄       return out

I don’t understand why out is type unstable. out is bound to x, whose type we know. Also, the method (a::layer)(weights, x) is type stable.

I apologise if the formatting above is not helpful.

It looks like the problem is that the return type of f is unconstrained and cannot be inferred. This helps, but requires the weights, inputs, and outputs to all be the same type:

#----------------------------------------------------
function (a::layer)(weights::Vector{T}, x::Matrix{T}) where T
#----------------------------------------------------

    MARK = 0

    aux1 = @view weights[MARK+1:MARK +  a.Dout * a.Din]

    w = reshape(aux1, a.Dout, a.Din)

    MARK += a.Dout * a.Din

    aux2 = @view weights[MARK+1:MARK + a.Dout]

    b = reshape(aux2, a.Dout)

    MARK += a.Dout

    @assert(MARK == length(weights)) # make sure all weights have been used up

    a.f.(w * x .+ b)::Matrix{T} # evaluation of layer

end

Edit: Oops, all that is really required is to constrain the weights and inputs to be the same type:

function (a::layer)(weights::Vector{T}, x::Matrix{T}) where T

The type assertion on the return type doesn’t do anything here.

Thanks for your suggestion. I am afraid, however, that the type instability persists. Here is the code again (slightly shortened) incorporating the suggestion made by @contradict:

using LinearAlgebra, Random

struct layer{F}
    Din::Int64  # number of inputs to layer
    Dout::Int64 # number of outputs of layer
    f::F
end

makelayer(; out::Int64, in::Int64, f = tanh) = layer(in, out, f)

#----------------------------------------------------------------------------
function (a::layer)(weights::Vector{T}, x::Matrix{T}) where T
#----------------------------------------------------------------------------

    MARK = 0

    w = reshape(weights[MARK+1:MARK +  a.Dout * a.Din], a.Dout, a.Din)

    MARK += a.Dout * a.Din

    b = weights[MARK+1:MARK + a.Dout]

    MARK += a.Dout

    @assert(MARK == length(weights)) # make sure all weights have been used up

    a.f.(w * x .+ b) # evaluation of layer

end


#----------------------------------------------------
function (a::Array{layer,1})(weights, x)
#----------------------------------------------------

    MARK = 0

    out = x

    for l in a
        
        # the output of each layer is the input of the next layer
        # the input to the first layer is x

        out = l(weights[MARK+1:MARK+numweights(l)], out)
        
        MARK += numweights(l)

    end

    @assert(MARK == numweights(a)) # make sure all weights have been used up

    return out

end

numweights(a::layer)          = a.Din * a.Dout + a.Dout

numweights(a::Array{layer,1}) = mapreduce(numweights, +, a)

Create a neural net and dummy weights and input:

# create three layer net
net = [makelayer(out=50, in=1, f=tanh); makelayer(in=50,out=50,f=tanh);  makelayer(in = 50, out=1, f=identity)]; 

# create some weights
const w = randn(numweights(net));

# create some inputs
const x = randn(1,1000);

let x = x, w = w
   @code_warntype net(w,x)
end

I omit the output of @code_warntype.

EDIT: Earlier I posted while mistakenly thinking that the above suggestion got rid of the type instability. I have heavily edited this post and deleted one that doesn’t apply any more.

After further experimentation, I figured out that my problem lies elsewhere. In particular, it has to do with how I use the struct layer. I have created the following MWE that demonstrates the issue I am facing:

struct example{F}
    
    f::F

end

function (e::example)(x)

    e.f(x)

end

function (V::Vector{example})(x)

    out = x

    for v in V

        out = v(out)

    end

    out

end

Let’s put this to work via the following:

V = [example(cos); example(sin)]

V(1.0) # try whether this works - yes it does

@code_warntype V(1.0) # check type stability - it fails!

Please help me understand why there is a type instability when calling V(1.0). I believe I am facing the same issue in my simple neural network code.

I suspect the issue has to do with me instantiating a Vector{example} type when I create V.

Yes: your vector V is of type Vector{example}, containing two elements of different types: the first is example{typeof(cos)}, the second is example{typeof(sin)}.

You could add another element of type example to that vector whose .f returns an integer (or anything else):

julia> push!(V, example(x -> round(Int, x)))
3-element Vector{example}:
 example{typeof(cos)}(cos)
 example{typeof(sin)}(sin)
 example{var"#1#2"}(var"#1#2"())

By just knowing the type of V, the output type of each element cannot be deduced.
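One way to see this at the REPL, and to contrast it with a Tuple, which does record each element's concrete type:

julia> eltype(V)
example

julia> isconcretetype(eltype(V))
false

julia> typeof((example(cos), example(sin)))
Tuple{example{typeof(cos)}, example{typeof(sin)}}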

Thank you. In the context of the MWE, I wonder how I should implement the “chained evaluation”, i.e. one function feeding into the next, of example(cos) and example(sin).

I guess I should avoid working with a vector such as V = [example(cos); example(sin)], but I am not sure what I should use instead.

If your functions all have identical input and output types, you can use FunctionWrappers.jl to wrap them and then place them in a vector without the instability that normally arises from mixing different functions.

If you’re only using relatively few functions (<16, let’s say) then you could make a tuple of functions instead of a vector.
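For instance, one type-stable way to apply a tuple of callables in sequence is to recurse over the arguments instead of looping (applyall is just an illustrative helper name, not an existing function):

applyall(x) = x
applyall(x, f, rest...) = applyall(f(x), rest...)

fs = (example(cos), example(sin)) # a Tuple records each element's concrete type

applyall(1.0, fs...) # == sin(cos(1.0)), same result as V(1.0), but inferrable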

Another option is to use ∘ (aka \circ, aka ComposedFunction) to build chained functions. This is probably better behaved than the tuple but is very similar under the hood.

Based on your precise usage, there may be something a little more idiomatic that doesn’t require these techniques. But it’s hard to say from what we know so far.


In the context of the MWE above, I would like to be able to flexibly specify in my code a composition of functions, as in V = [example(cos); example(sin)]. When I say flexibly I mean that I would like to easily change it to e.g. V = [example(cos); example(sin); example(exp)].

In terms of the original motivation (see neural net code at the top post), I would like to be able to flexibly specify a composition of layers, as in net = [makelayer(out=50, in=1, f=tanh); makelayer(in=50,out=50,f=tanh); makelayer(in = 50, out=1, f=identity)].

I hope I am making myself at least a bit clearer.

For the MWE above, I would strongly consider V = exp ∘ sin ∘ cos (or V = ∘(exp, sin, cos)) if it’s just a line you’re writing and changing manually. But ∘ won’t work well for long or dynamic compositions.
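The same idea works with the example wrappers from the MWE, since the composed object has a fully concrete type (V2 is just an illustrative name):

V2 = example(sin) ∘ example(cos) # a ComposedFunction; cos is applied first, then sin

V2(1.0) # == sin(cos(1.0)), same as V(1.0) above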

For long or dynamic compositions, I’ll recommend either accepting instability (the cost of dynamic dispatch once per layer is modest compared to the calculation within a layer of decent size) or using FunctionWrappers.jl.

For example, if you define your activation functions like f = FunctionWrapper{Float64, Tuple{Float64}}(tanh), they will all have the common type FunctionWrapper{Float64, Tuple{Float64}} (notice: unrelated to tanh), and so your entire array will be made of identically typed layer{FunctionWrapper{Float64, Tuple{Float64}}} elements and will be entirely type stable.

Initially, I would experiment with just accepting the instability and benchmarking to see if the cost is actually significant. You can probably mitigate a lot of the cost with function barriers or the occasional type annotation, if it’s relevant. FunctionWrappers looks like it would solve your problem, but is a little involved so is not my initial recommendation.
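As a minimal sketch of the FunctionWrappers.jl route (assuming the package is installed; F64Fun and chainapply are just illustrative names):

using FunctionWrappers: FunctionWrapper

const F64Fun = FunctionWrapper{Float64, Tuple{Float64}} # one concrete type for all activations

function chainapply(fs::Vector{F64Fun}, x::Float64)
    y = x
    for f in fs
        y = f(y) # the wrapper pins the return type to Float64, so y stays inferred
    end
    return y
end

fs = [F64Fun(tanh), F64Fun(identity)] # Vector{F64Fun}: concretely typed

chainapply(fs, 0.3)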


It’s worth noting that a lot of this knowledge is encapsulated in existing container layers like Chain. If one looks at the implementation:

  • Tuples are used by default to preserve type stability
  • Vectors can be used for longer lists of child layers/functions to selectively mitigate the compilation overhead of long tuples
  • Instead of performing mild type piracy on a built-in type like Vector, a dedicated callable struct is used. This is also what Base.ComposedFunction (what ∘ returns) does; a rough sketch of this pattern follows the list.
  • Under specific circumstances (e.g. Zygote is being used for AD) a generated function may be more efficient than working recursively over tuples.
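For reference, the core of that pattern (a Tuple of layers stored in a dedicated callable struct and applied by recursion) can be sketched roughly as follows; SimpleChain is an illustrative name, not the actual Chain implementation:

struct SimpleChain{T<:Tuple}
    layers::T
end

SimpleChain(layers...) = SimpleChain(layers)

# recurse over the tuple so each layer's concrete type is visible to the compiler
applylayers(::Tuple{}, x) = x
applylayers(ls::Tuple, x) = applylayers(Base.tail(ls), first(ls)(x))

(c::SimpleChain)(x) = applylayers(c.layers, x)

# e.g. SimpleChain(cos, sin, exp)(1.0) == exp(sin(cos(1.0)))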

First of all, thanks very much for all the valuable input. It has been really useful.

I find the comment made above by @mikmoore important: before getting obsessed with type instabilities (like I do), it is worth first investigating their actual impact.

I suspected that something like this must have already been dealt with in current neural network libraries, like in the case of Chain, as you have pointed out. I have had a look at the implementation, but I find it difficult to strip it down to the fundamental problem.

(1) Also, I am afraid that I don’t quite understand in what sense Tuples help with type stability. At the risk of digressing (maybe I should start a separate thread), it all boils down to the type instabilities in the following simple examples, which I hope demonstrate the core of this seemingly simple issue:

function test_type_stability_1(result1)

    V = [cos, sin, exp]

    for v in V

        result1 = v(result1)

    end

    result1

end

function test_type_stability_2(result2)

    T = (cos, sin, exp)

    for t in T

        result2 = t(result2)

    end

    result2

end

If I test the above two functions with @code_warntype, I see that type instability crops up in both cases. Clearly, the naive way in which I have used Tuples did not help.

(2) To make matters more complicated: I wonder whether “intermediary” type instabilities are a thing to worry about, even if the return type of the method can be reliably inferred (i.e. when checking with Test.@inferred). The reason I think this might be a worry is that it is not clear to me whether automatic differentiation packages, e.g. Zygote, differentiating such a function can be affected by this “intermediary” type instability and propagate it further, resulting in the whole gradient calculation becoming type unstable.

Iterating over a heterogeneous Tuple in a for loop is still unstable because, in this case, t is not known at compile time. You should unroll the loop either using ntuple or a @generated function + Base.@nexprs:

function unstable(T, x)
    for t in T
        x += t(x)
    end
    return x
end

function stable1(T::NTuple{N, Function}, x0) where N
    x = Ref(x0)
    ntuple(Val(N)) do i
        x[] += T[i](x[])
    end
    return x[]
end

@generated function stable2(T::NTuple{N, Function}, x) where N
    quote 
        Base.@nexprs $N i -> x += T[i](x)
        return x
    end
end

and then

In [6]: T = (cos, sin, exp)
(cos, sin, exp)

In [7]: x=rand()
0.8447985810701389

In [8]: @btime stable1($T,$x)
  13.500 ns (0 allocations: 0 bytes)
8.946441053806423

In [9]: @btime stable2($T,$x)
  13.200 ns (0 allocations: 0 bytes)
8.946441053806423

In [10]: @btime unstable($T, $x)
  25.476 ns (0 allocations: 0 bytes)
8.946441053806423

You can check for yourself with @code_warntype that stable1 and stable2 are type stable while unstable isn’t.
