Is this a bug? Nested Functions and NLsolve

I was using NLsolve and couldn’t get type stability in my model no matter what I tried. I’m new to Julia, so I thought I wasn’t fully understanding how type stability works. But I think this is part of a bug that dates from at least 2016 (see #15276).

Just wanted to confirm it, since it basically determines that the order and place where you put functions is crucial for type stability.
If it’s not a bug, I’d appreciate it if you could explain how I should think about it (in particular CASE 1 and CASE 2 below, which completely puzzle me!!!).

I read the documentation, but in my examples I’m only changing the order of the functions. Many thanks!!!

I start from the issue, and then show the consequences for NLsolve.
All cases provide the same results (tested in Julia 1.8 and NLsolve v4.5.1).

### NESTED FUNCTIONS - all cases provide the same result
using BenchmarkTools   # needed for @btime below

# CASE 1 - type unstable and hence slow
function result1(zz::Int64)
    function intermediate_result1(xx::Int64)      
        yy = xx + 1
        ff1(yy)
    end

    ff1(yy) = yy + 2
    intermediate_result1(zz)
end

@code_warntype  result1(2)
@btime          result1(2)


# CASE 2 - type stable and hence faster
    #just changing the order of the functions (weird!!!)
function result2(zz::Int64)
    ff2(yy) = yy + 2

    function intermediate_result2(xx::Int64)        
        yy = xx + 1
        ff2(yy)
    end    

    intermediate_result2(zz)
end

@code_warntype  result2(2)
@btime          result2(2)


# CASE 3 - type stable and hence similar to CASE 2 
    #putting function ff outside

ff3(yy) = yy + 2

function result3(zz::Int64)
    function intermediate_result3(xx::Int64)        
        yy = xx + 1
        ff3(yy)
    end    

    intermediate_result3(zz)
end

@code_warntype  result3(2)
@btime          result3(2)

# CASE 3b - type stable and hence similar to CASE 2
    #defining the function afterwards behaves like CASE 3

function result3b(zz::Int64)
    function intermediate_result3b(xx::Int64)
        yy = xx + 1
        ff3b(yy)
    end

    intermediate_result3b(zz)
end

ff3b(yy) = yy + 2

@code_warntype  result3b(2)
@btime          result3b(2)


# CASE 4 - type stable and hence similar to CASE 2
# adding ff as an argument of the inner function
function result4(zz::Int64)
    function intermediate_result4(ff4,xx::Int64)
        yy = xx + 1
        ff4(yy)
    end

    ff4(yy) = yy + 2
    intermediate_result4(ff4,zz)
end

@code_warntype  result4(2)
@btime          result4(2)

What confuses me most is that if you wrap CASE 3 into a function execute() and then run it, it becomes even slower than CASE 1 (while CASES 2 and 4 are not affected).
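To make concrete what I mean by wrapping CASE 3, here is a minimal sketch (the name execute and the w suffixes are just illustrative). The structural change is that ff3 becomes a local variable again, captured across the nested functions, instead of a global function:

# CASE 3 wrapped into a function (illustrative sketch)
function execute()
    ff3w(yy) = yy + 2                        # now a local definition again

    function result3w(zz::Int64)
        function intermediate_result3w(xx::Int64)
            yy = xx + 1
            ff3w(yy)                         # captures the local ff3w
        end
        intermediate_result3w(zz)
    end

    result3w(2)
end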

As for NLsolve: for my model, I have to solve several systems of equations that are nested. The model is quite long and hard to reproduce, so I provide an extremely simple example.
I even added specific numbers, so you can see the problem more clearly. If it’s not a bug, I think it would be really useful to document this behavior in NLsolve.

using NLsolve, BenchmarkTools
# NLsolve - all cases provide the same result
##CASE 1 - type unstable and hence slow
function ff1(xx2)
    function solving1!(res,sol; xx2=xx2)
        yy1 = sol[1:2]
        yy2 = sol[3:4]
        
        res[1:2]     =   yy1 .- xx2
        res[3:4]     =   yy2 .- xx2
        return res
    end

        sol0        =   [ [0.99,0.99]; [0.99,0.99]]
        solution    =   nlsolve(solving1!, sol0)
        
        yy1         =   solution.zero[1:2]
        yy2         =   solution.zero[3:4]
        
        (yy1,yy2)  
end


@code_warntype ff1([2.0, 3.0])
@btime ff1([2.0, 3.0])

## CASE 2 - type stable and hence faster
# we only add `let` and `end` (or wrap the second part into a function)
function ff2(xx2)
    function solving2!(res,sol; xx2=xx2)
        yy1 = sol[1:2]
        yy2 = sol[3:4]
        
        res[1:2]     =   yy1 .- xx2
        res[3:4]     =   yy2 .- xx2
        return res
    end   
    
        let # or `function execute()`
        sol0        =   [ [0.99,0.99]; [0.99,0.99]]
        solution    =   nlsolve(solving2!, sol0)
        
        yy1         =   solution.zero[1:2]
        yy2         =   solution.zero[3:4]
        
        (yy1,yy2)
        end
end

@code_warntype ff2([2.0, 3.0])
@btime ff2([2.0, 3.0])



#CASE 3 - type stable and hence similar to CASE 2
# we define `solving` outside the function
function input(xx2)
    function solving3!(res,sol;parameter=xx2)
        yy1 = sol[1:2]
        yy2 = sol[3:4]
        
        res[1:2]     =   yy1 .- xx2
        res[3:4]     =   yy2 .- xx2
        return res
    end    
end

function ff3(xx2)
        sol0        =   [ [0.99,0.99]; [0.99,0.99]]
        solution    =   nlsolve(input(xx2), sol0)
        
        yy1         =   solution.zero[1:2]
        yy2         =   solution.zero[1+2 : 2*2]
        
        (yy1,yy2)  
end

@code_warntype ff3([2.0, 3.0])
@btime ff3([2.0, 3.0])
2 Likes

That’s for sure an interesting case :slight_smile:

I think the perception of this depends a lot on what your previous programming experiences are. Coming from a C++ background, I’m more surprised that result1 works at all, simply since ff1 is used before being defined. But there are many languages which allow this kind of reference.

I’m not sure if I would call it a bug or a missing feature,
but maybe it helps to see how one could look at the situation.

A slightly shorter example:

function easy_capture()
    y = [0]
    get_y() = y
    y[1] += 1
    @code_warntype get_y()
    return get_y()
end
easy_capture()

If we run this code, the inner @code_warntype prints

Arguments
  #self#::var"#get_y#91"{Vector{Int64}}
Body::Vector{Int64}
1 ─ %1 = Core.getfield(#self#, :y)::Vector{Int64}
└──      return %1

This essentially says that get_y() is a function which returns data of type Vector{Int64}, defined in the local scope of the surrounding function.

Since the variable y was already defined before get_y() was defined, there is never a situation
where one would need to worry about calling get_y() before the variable is defined.

To give a counterexample of what could go wrong:

function bad_capture()
    get_y() = y 
    @code_warntype get_y()
    x = get_y()   # Error: 'y' is not yet defined! 
    y = 10
    return x 
end

The problem is clearly that calling get_y() too early is bad, which is reflected in the generated
code for this function

Arguments
  #self#::var"#get_y#11"
Locals
  y::Union{}
Body::Any
1 ─ %1 = Core.getfield(#self#, :y)::Core.Box
│   %2 = Core.isdefined(%1, :contents)::Bool
└──      goto #3 if not %2
2 ─      goto #4
3 ─      Core.NewvarNode(:(y))
└──      y
4 ┄ %7 = Core.getfield(%1, :contents)::Any
└──      return %7

It says that it has to check if y is defined (since it cannot predict when get_y() will be called).

Now, let’s spice this up a little bit. You can even get type instability when you define y before the capture!

function tricky_capture()
    y = [0]
    get_y() = y
    @code_warntype get_y()
    y = [1]  # this creates a new array and binds the variable `y` to this new data
    return get_y()
end
tricky_capture()

Suddenly, get_y() is more complicated since Julia cannot predict if the variable is bound to the array [0] or the array [1] or if it even exists!
Note that the line y = [1] does not overwrite the existing array but creates a new array and binds y to this new array!

Looking at the code for the inner function we get

Arguments
  #self#::var"#get_y#8"
Locals
  y::Union{}
Body::Any
1 ─ %1 = Core.getfield(#self#, :y)::Core.Box
│   %2 = Core.isdefined(%1, :contents)::Bool
└──      goto #3 if not %2
2 ─      goto #4
3 ─      Core.NewvarNode(:(y))
└──      y
4 ┄ %7 = Core.getfield(%1, :contents)::Any
└──      return %7

This is similar to before: when get_y() is called it checks whether y is defined, and when it returns it, it cannot predict the type since the variable y might have been rebound to new data.


Overall: if you want type stability, make sure that whatever you want to capture is defined before
you define the function which captures it (to avoid the bad_capture case where you refer to undefined data)
and, in addition, make sure that the variable is never reassigned to new data. That means in-place operations are fine (such as y[1] += 1), but operations which bind y to new data are problematic.
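Here is a minimal sketch of this advice (the name good_capture is just for illustration): the captured variable is assigned once before the closure is defined and afterwards only mutated in place, so the closure stays type stable.

function good_capture()
    y = [0]          # defined once, before the closure
    get_y() = y      # captures `y`; its type can be inferred
    y[1] += 1        # in-place mutation, no rebinding of `y`
    return get_y()   # inferred as Vector{Int64}
end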

1 Like

Thanks for your response! My point is that my case is different: I’m not calling any function. I should have rephrased this and said
“since it basically determines that the order and place where you DEFINE functions is crucial for type stability”.
If you compare CASE 1 and CASE 2, I’m not executing anything. I’m just defining the functions, which in Julia shouldn’t matter as long as both functions are visible when you call them.

Mh, I think it is very similar, since ff1 is essentially a variable name which is bound to a function. The data type is Function instead of Int64, but we still use a variable name to refer to it, and it could be undefined or overwritten. I admit it is a bit strange to think of functions as something stored in variables.

For example:

function result1_bad_capture(zz::Int64)
    function intermediate_result1(xx::Int64)      
        yy = xx + 1
        ff1(yy)
    end

    result = intermediate_result1(zz)   # Error since `ff1` not yet defined.
    ff1(yy) = yy + 2
    return result
end

or

function result1_overwrite(zz::Int64)
    ff1(yy) = yy + 2
    function intermediate_result1(xx::Int64)      
        yy = xx + 1
        ff1(yy)
    end
    @code_warntype intermediate_result1(1)
    ff1(yy) = yy + 100
    intermediate_result1(zz)
end

If you check the output of @code_warntype it shows that it is the same as in the other examples:

Arguments
  #self#::var"#intermediate_result1#8"
  xx::Int64
Locals
  yy::Int64
  ff1::Union{}
Body::Any
1 ─      (yy = xx + 1)
│   %2 = Core.getfield(#self#, :ff1)::Core.Box
│   %3 = Core.isdefined(%2, :contents)::Bool
└──      goto #3 if not %3
2 ─      goto #4
3 ─      Core.NewvarNode(:(ff1))
└──      ff1
4 ┄ %8 = Core.getfield(%2, :contents)::Any
│   %9 = (%8)(yy)::Any
└──      return %9

Case 1 behaves as documented in the performance tips, while Case 2 behaves better than the documentation would lead you to expect. So basically there’s nothing surprising here; you just got lucky with Case 2.

Lucky? What do you mean?

I’m defining exactly the same functions in all cases and not executing any of them. I’m only changing the order in which I define functions before any execution. And then the execution performs the calculations following the same steps in all cases.
So this is quite surprising. In fact, the more I read, the more I think it’s a bug.

A bug is when software behaves worse than documented. The situation is the opposite here.

Where exactly is this documented? You even said in the post you deleted that this is a known bug

1 Like

Performance tips, you provided the link.

You even said in the post you deleted that this is a known bug

Well I deleted the post because I changed my mind. I don’t think the issue you discuss has anything to do with the bugs I linked in the deleted post.

IMO it is this bug, effectively. I would just avoid that kind of construct (avoid nested functions and/or pass them as parameters, as in case 4). Even if it “worked”, placing a function definition after the function that calls it makes the code pretty hard to read anyway.

1 Like

Thanks @lmiq. I can see why nested functions could make the code hard to read. My concern was about two specific cases.

  1. I thought that wrapping the main script into a function was sometimes good for performance (see here for example). Do you think this could create some issues in this respect?
  2. My case arose due to NLsolve. CASE 1 was actually my attempt to keep the code clearer: in my model, the solution of one NLsolve call is part of the variables of another system of equations to be solved with NLsolve. So I was trying to keep each part within a function. I guess the lesson here is to keep everything as modular as possible.

Maybe this should be documented in the Performance Tips and in NLsolve? I guess I’ll write a post on the use of NLsolve for economics. Models with nested objects/equilibria are quite common there, and it’s not obvious that the order in which you define functions should matter within a function (but not outside functions, as in CASE 3 and CASE 3b, which I added).

Thanks again!

:+1: Yes, in general writing type-stable code in Julia is a bit of an art, since sometimes the compiler is smarter than one would expect and in other situations it’s the opposite.

When you write on this topic, you might have to talk about scopes (which is a surprisingly challenging topic): Scope of Variables · The Julia Language
On this page you find documentation on the scoping rules which apply in your cases, and these rules are exactly what makes the difference between wrapping everything inside a main() function or not!

Details:

First, a look at global scoping rules, i.e. inside the REPL. As you can see we have two variables here, a global y and a local y. They share nothing but their name!

y = 10  # variable in global scope
function do_stuff()
  y = 0  # new variable in local scope 
         # hard scope rules: ignore the global variable in this case!
  return y^2
end 
do_stuff()
@show y  # y = 10!

In contrast, this is how it behaves when wrapped into a function. Now we have only one local variable y. I.e. the nested function inherits the local variables from the surrounding local scope!

function main()
  y = 10  # variable in local scope
  function do_stuff()
    y = 0  # hard scope rules: use (pre-)existing local variables!
    return y^2
  end 
  do_stuff()
  @show y  # y = 0
end

As you said, the behaviour changes between defining everything in REPL mode (i.e. the Main module)
vs defining everything inside a function. In REPL mode one works in the global scope, and if you create a new local scope (e.g. a function) it creates new local variables (see the documentation). Whereas inside functions, one works with the “hard” local scoping rules: if you create a new function inside such a local scope (i.e. nested local scopes), then you still re-use the local variables from the surrounding local scope.

This is what happens in CASE 1: yy1 and yy2 are local variables inside ff1, and the function solving1! uses these local variables instead of creating new ones. nlsolve will actually update yy1 and yy2 many, many times, and you could read yy1 before you assign it manually (of course, one shouldn’t do that, but it’s a detail which was new to me as well :smiley: ).

For example:

##CASE 1 - type unstable and hence slow
function ff1(xx2)
    function solving1!(res,sol; xx2=xx2)
        yy1 = sol[1:2]
        yy2 = sol[3:4]
        
        res[1:2]     =   yy1 .- xx2
        res[3:4]     =   yy2 .- xx2
        return res
    end

        sol0        =   [ [0.99,0.99]; [0.99,0.99]]
        solution    =   nlsolve(solving1!, sol0)
        
        @show yy1    # this will show you the last values which were used by nlsolve

        yy1         =   solution.zero[1:2]
        yy2         =   solution.zero[3:4]
        
        (yy1,yy2)  
end

I hope this doesn’t add confusion. I just tried to provide a link to relevant documentation :slight_smile:

2 Likes

Thanks Steffen. Definitely, identifying when Julia can infer the types is an art.
The issue here is a little bit different, I think. Take this example I was writing (forget about the details of what each object means).

# TYPE-UNSTABLE FUNCTION
function market_share(beta::Int64)    
    price(beta)   =  elast(beta) / (elast(beta) - 1)  
    elast(beta)   =  beta -1

    return price(beta)^(1-elast(beta))    
end

@code_warntype  market_share(3) #type unstable
@btime          market_share(3) #slow

Notice I didn’t execute either of the functions price and elast; I only defined them. Just changing the order in which the functions are defined completely removes the type instability:

# TYPE-STABLE FUNCTION
# define function "elast" first
function market_share1(beta::Int64)
    elast(beta)   =  beta -1
    price(beta)   =  elast(beta) / (elast(beta) - 1)    
    
    return price(beta)^(1-elast(beta))    
end

@code_warntype  market_share1(3) #type stable
@btime          market_share1(3) #fast

The price() = ... closure captures the elast local variable, and in the first example this variable is not yet defined at the point of capture. So, Julia boxes it, leading to worse performance.

Can this be improved in principle? I think yes; closures sometimes have overly tricky performance characteristics. But at least the current behavior is logical, agrees with the Julia performance tips, and doesn’t seem like a bug.

3 Likes

First, I would make a distinction between:

  • a bug: incorrect behavior
  • a performance issue: correct behavior but slow performance

The case we observe here is a performance issue in my opinion. In particular, what we discuss is the performance of Julia in its current version (I tested it on 1.8.0; in future versions the performance of the same code might change).

Let me narrow down the MWE even more, as you do not need two closures to have the undesired behavior:

julia> function market_share()
           price()   =  elast / (elast - 1)
           elast = 10
           return price()
       end
market_share (generic function with 1 method)

julia> @code_warntype market_share()
MethodInstance for market_share()
  from market_share() in Main at REPL[1]:1
Arguments
  #self#::Core.Const(market_share)
Locals
  elast::Core.Box
  price::var"#price#1"
Body::Any
1 ─      (elast = Core.Box())
│        (price = %new(Main.:(var"#price#1"), elast))
│        Core.setfield!(elast, :contents, 10)
│   %4 = (price)()::Any
└──      return %4

What does it show us? As @aplavin explained in an answer, the elast variable is defined after the price closure is created. The current state of the Julia compiler in this case does not do look-ahead processing of the whole function body to check that elast is guaranteed to be a constant (this might change in the future). Therefore it assumes its type is not known and thus does boxing.

When Julia creates a closure (like price in your examples), what it does is described here. As you can see, a special struct is created, and this struct needs to capture the elast variable as its field. Again, this is the situation at the current state of the Julia compiler; it might change in the future.
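Roughly speaking, the closure behaves like a hand-written functor. This is only an illustrative sketch (names and details differ from what lowering actually emits): when elast can be captured with a concrete type, the field is typed; when it cannot, the field is effectively a Core.Box whose contents have type Any.

struct Price{T}
    elast::T
end
(p::Price)() = p.elast / (p.elast - 1)        # inferable: the field type is known

struct PriceBoxed
    elast::Core.Box                           # contents::Any
end
(p::PriceBoxed)() = p.elast.contents / (p.elast.contents - 1)   # return type is Any

# Price(10)() and PriceBoxed(Core.Box(10))() both give 1.111..., but only the
# first call can be inferred.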

If you move elast in front of price things work as expected:

julia> function market_share2()
           elast = 10
           price()   =  elast / (elast - 1)
           return price()
       end
market_share2 (generic function with 1 method)

julia> @code_warntype market_share2()
MethodInstance for market_share2()
  from market_share2() in Main at REPL[3]:1
Arguments
  #self#::Core.Const(market_share2)
Locals
  price::var"#price#2"{Int64}
  elast::Int64
Body::Float64
1 ─      (elast = 10)
│   %2 = Main.:(var"#price#2")::Core.Const(var"#price#2")
│   %3 = Core.typeof(elast::Core.Const(10))::Core.Const(Int64)
│   %4 = Core.apply_type(%2, %3)::Core.Const(var"#price#2"{Int64})
│        (price = %new(%4, elast::Core.Const(10)))
│   %6 = (price::Core.Const(var"#price#2"{Int64}(10)))()::Core.Const(1.1111111111111112)
└──      return %6

as the current compiler can prove that elast is a constant.

However, any assignment to elast after the price definition currently makes inference fail:

julia> function market_share3()
           elast = 10
           price()   =  elast / (elast - 1)
           elast = 10
           return price()
       end
market_share3 (generic function with 1 method)

julia> @code_warntype market_share3()
MethodInstance for market_share3()
  from market_share3() in Main at REPL[5]:1
Arguments
  #self#::Core.Const(market_share3)
Locals
  price::var"#price#3"
  elast::Core.Box
Body::Any
1 ─      (elast = Core.Box())
│        Core.setfield!(elast, :contents, 10)
│        (price = %new(Main.:(var"#price#3"), elast))
│        Core.setfield!(elast, :contents, 10)
│   %5 = (price)()::Any
└──      return %5

The same happens even if you assign to elast twice before price is defined:

julia> function market_share4()
           elast = 10
           elast = 11
           price()   =  elast / (elast - 1)
           return price()
       end
market_share4 (generic function with 1 method)

julia> @code_warntype market_share4()
MethodInstance for market_share4()
  from market_share4() in Main at REPL[3]:1
Arguments
  #self#::Core.Const(market_share4)
Locals
  price::var"#price#2"
  elast::Core.Box
Body::Any
1 ─      (elast = Core.Box())
│        Core.setfield!(elast, :contents, 10)
│        Core.setfield!(elast, :contents, 11)
│        (price = %new(Main.:(var"#price#2"), elast))
│   %5 = (price)()::Any
└──      return %5

I hope this helps.

6 Likes

I had planned to come back to this thread, but it seems you’ve already arrived at a conclusion!

It does manage to at least recognize that elast is local rather than attempting to find a global variable.

The linked performance tips section in the OP doesn’t actually help my understanding so much, because there the captured r value was assigned before the closure was defined. So, reading about how the parser decides on boxing before type inference can run, I wouldn’t expect that assigning elast just once prior to the closure could be inferred either. I think this is why the OP keeps bringing up that the order of definition matters for inference. Perhaps there was some progress made on issue 15276 since 2016 that allowed market_share2 and the OP’s case 2 to infer, but the trailing comments from 2021 say “a full solution” requires rewriting “the lowering stage” in Julia.
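For reference, the captured-variable example in that Performance Tips section looks roughly like this (reproduced from memory, so treat it as an approximation): r may be reassigned before the closure is created, and it is boxed anyway.

function abmult(r::Int)
    if r < 0
        r = -r          # reassignment of the captured variable
    end
    f = x -> x * r      # `r` is boxed here, so calls to `f` are not fully inferable
    return f
end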

1 Like

Yes, because AFAICT the compiler does do a look-ahead to identify all local variables, since this is a relatively easy task. What is hard is proving that elast is a constant (except for the case where there is a single assignment to elast before price is defined; in that case it seems to manage to prove this).
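A small illustration of that look-ahead (hypothetical names): y is recognized as a local variable even though its assignment comes after the closure definition, so the global y is never consulted.

y = 100                    # global y
function local_lookahead()
    get_y() = y            # refers to the *local* y below, not the global one
    y = 1
    return get_y()
end
local_lookahead()          # returns 1 (and get_y is boxed, as discussed above)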

Some more thoughts for the OP, because I’m taking the OP’s word that they are new to Julia, though I have to say this is not a beginner’s question.

  1. The scoping rules mentioned earlier reminded me to say that closures capture by variable rather than by value (see the first sketch after this list). It’s not worded this way, but the behavior is shown in the “Let Blocks” and “Loops and Comprehensions” sections of Scope of Variables · The Julia Language. Note how different closures capturing the same variable will use that variable’s most updated value, not the value at capture; to replicate the behavior of capturing values, you have to create different variables, and Julia’s let/for scoping rules make that easy.

  2. As mentioned and linked by @bkamins, a closure that captures variables is implemented as a hidden functor. The fields of the struct component correspond to the captured variables. Given this implementation, it actually makes a little sense that inference only happens when the captured variable is assigned just once prior to the closure. The closure is just a struct instance, and you can’t reassign a field there. If the captured variable is assigned after the closure or is assigned multiple times, then when the closure is instantiated, the corresponding field must be a mutable placeholder, the Core.Box. Core.Box has been described as an internal analog of Base.RefValue{Any}, so hypothetically it could be redesigned to be parametric like a RefValue, which I expect a fix of issue 15276 to do.

  3. The workaround let r=r in the Performance Tips section creates a new local r in a new local scope and assigns the outer r’s value to it; boxing does not happen because that assignment occurs only once and prior to the closure instantiation (see the second sketch after this list). The let block is assigned to f because a block expression’s value is the value of its last line, so it somewhat acts like a function’s return. Note that not all block expressions do this; for and while do not.

  4. Case 3 and case 4 in the OP are a couple known workarounds to get type inference working. Case 3: global struct and function definitions are implicitly const variables (local variables cannot be const!), and since functions only check global variables when compiled, they can compile with known constants, which are great for type inference. Case 4: functions compile with argument instances present at the function call, so passing in a function won’t be a problem for type inference. You should be careful that input variables are type stable; while the function call itself might be type-stable, you would have to dispatch to one of many functions at runtime rather than dispatching to one known function at compile-time of the surrounding method. Functions are singletons (a type with only 1 instance), so you should make sure the input variable is only ever assigned that one function, even if multiple times for no reason e.g. ff4 = ff4.
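Two small sketches for points 1 and 3 above (the names are illustrative only):

# Point 1: closures capture the variable, not its value at capture time.
function capture_by_variable()
    x = 1
    get_x() = x
    x = 2
    return get_x()   # returns 2, not 1 (and `x` is boxed because of the reassignment)
end

# Point 3: `let x = x` creates a fresh local that is assigned exactly once
# before the closure, so it can be captured with a concrete type.
function capture_with_let()
    x = 1
    get_x = let x = x
        () -> x      # captures the let-local `x`
    end
    x = 2
    return get_x()   # returns 1, and the closure is type stable
end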

1 Like

Thanks to everyone! This is really subtle! I thought that only conditional modifications to a local variable could hurt type stability.
The fact that the following example breaks type stability is really surprising.

function market_share4()
    elast = 10
    elast = 11
    price()   =  elast / (elast - 1)
    return price()
end

So, correct me if I’m wrong. From a practical point of view: basically, Julia looks inside the function and outside of it to see if the function/variable exists. My understanding in this respect should not change. However, Julia “struggles” to infer types when you reassign a variable that is local to the function and captured by a closure. Essentially, the problem is similar to what @SteffenPL was indicating.

Is there any advice to stay on the safe side in this respect?

I can imagine easily falling into this when you translate a huge model into a script. For example, take your suggestion of wrapping main scripts into function main() (link).

Thanks again!

EDIT: there was an example that was wrong, I deleted it.

The problem with this advice is exactly what I think led @alfaromartino to post this issue. The thing is that NLsolve.jl (and similar packages like Optim.jl) expect that the functions you pass to them accept only variables as arguments (not parameters).

What I do if I have this constraints:

  1. I define functions in global scope, if possible (in this way they go to the global method table and do not lead to boxing).
  2. If I need to pass a parameter to a local closure (as said, this is sometimes unavoidable), I make sure that these parameters are never assigned to. If I need to manipulate them, I use a function barrier to ensure that this restriction is met. Here is an example. I want to minimize the function x -> (x - p) ^ 2, where p is a computed parameter (I have not even checked whether we would have a problem with inference in this case, but the way I write it makes sure I do not have an inference problem):
using Optim

function test1()
    fun(x) = (x-p)^2
    p = 0.0
    res = Float64[]
    for x in 1.0:0.1:2.0
        p = x
        push!(res, optimize(fun, -2p, 2p).minimizer)
    end
    return res
end

function test2()
    function fun(p)
        fun2(x) = (x-p)^2
        return optimize(fun2, -2p, 2p).minimizer
    end
    p = 0.0
    res = Float64[]
    for x in 1.0:0.1:2.0
        p = x
        push!(res, fun(p))
    end
    return res
end

test1 is bad, test2 is OK. I know it could have been written in a better way here, but I want to show a general pattern that makes sure things are always OK (i.e. without having to check whether Julia was or was not able to prove that it does not need to do boxing).

Note that in test2 the fun function could also go to global scope (i.e. to the method table), per my advice #1.
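For example, a sketch of that variant (the names fun_global and test3 are just illustrative): the objective passed to optimize is still a closure over p, but the outer function lives in global scope and p only ever enters it as an argument.

fun_global(p) = optimize(x -> (x - p)^2, -2p, 2p).minimizer   # uses Optim, as above

function test3()
    res = Float64[]
    for x in 1.0:0.1:2.0
        push!(res, fun_global(x))
    end
    return res
end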

2 Likes