ResumableFunctions typing issue


#1

@oxinabox - thanks for the great Lazy Sequences blog post.

ResumableFunctions is a really cool package but I was curious why it ended up being slower than the Channel implementation for generating primes. I did some benchmarking and it appears that the problem may come from the typing of the internal states - known_primes and x.

Expanding the macro, the type of both known_primes and x became Any (where they should have been Array{BigInt,1} and BigInt respectively. In additional, the literal big"3" became an Int in the generated code. In addition, a::Array{BigInt,1} and z::BigInt fields were generated but unused. It’s looking more like a bug to me.

Just sharing :slight_smile:

julia> @macroexpand(
       @resumable function primes_rf()
           known_primes = BigInt[2]
           x = big"3"
           @yield big"2" # Output the first prime, as we already put int in the list of known primes
           while true
               for p in known_primes
                   if p > sqrt(x)
                       # Must be prime as we have not found any divisor
                       push!(known_primes, x)
                       @yield x
                       break
                   end
                   if x % p == 0 # p divides
                       # not prime
                       break
                   end
               end
               x+=1            
           end
       end
       )|> MacroTools.striplines
quote 
    begin 
        mutable struct ##824 <: ResumableFunctions.FiniteStateMachineIterator
            _state::UInt8
            x::Any
            a::Array{BigInt,1}
            p::Any
            _iterstate_1::Any
            known_primes::Any
            _iterator_1::Any
            z::BigInt
            function ##824(; )::##824
                fsmi = new()
                fsmi._state = 0x00
                fsmi
            end
        end
    end
    function (_fsmi::##824)(_arg::Any=nothing; )::Any
        _fsmi._state == 0x00 && $(Expr(:symbolicgoto, Symbol("#159#_STATE_0")))
        _fsmi._state == 0x01 && $(Expr(:symbolicgoto, Symbol("#157#_STATE_1")))
        _fsmi._state == 0x02 && $(Expr(:symbolicgoto, Symbol("#158#_STATE_2")))
        error("@resumable function has stopped!")
        $(Expr(:symboliclabel, Symbol("#159#_STATE_0")))
        _fsmi._state = 0xff
        _arg isa Exception && throw(_arg)
        _fsmi.known_primes = BigInt[2]
        _fsmi.x = 3
        _fsmi._state = 0x01
        return 2
        $(Expr(:symboliclabel, Symbol("#157#_STATE_1")))
        _fsmi._state = 0xff
        _arg isa Exception && throw(_arg)
        while true
            _fsmi._iterator_1 = _fsmi.known_primes
            _fsmi._iterstate_1 = start(_fsmi._iterator_1)
            while !(done(_fsmi._iterator_1, _fsmi._iterstate_1))
                (_fsmi.p, _fsmi._iterstate_1) = next(_fsmi._iterator_1, _fsmi._iterstate_1)
                if _fsmi.p > sqrt(_fsmi.x)
                    push!(_fsmi.known_primes, _fsmi.x)
                    _fsmi._state = 0x02
                    return _fsmi.x
                    $(Expr(:symboliclabel, Symbol("#158#_STATE_2")))
                    _fsmi._state = 0xff
                    _arg isa Exception && throw(_arg)
                    break
                end
                if _fsmi.x % _fsmi.p == 0
                    break
                end
            end
            _fsmi.x += 1
        end
    end
    function primes_rf(; )::##824
        ##824()
    end
end

#2

Hi

Hereby a copy of my email to the creator of the lazy sequences blog explaining the results.

I was intrigued by the bad behaviour of @Resumable function primes_rf.
For 1000 primes I get the same result (factor 2 slower) but for 90 primes ResumableFunctions is 3 times faster and consumes 1/8 of the memory.
The problem is that the iterator for p in known_primes is stored every time we do a @yield and this is not needed because we do a break afterwards. A partial solution can be found for the inner loop:

for p in 1:length(known_primes)
                  if known_primes[p] > sqrt(x)
                      # Must be prime as we have not found any divisor
                      push!(known_primes, x)
                      @yield x
                      break
                  end
                  if x % known_primes[p] == 0 # p divides
                      # not prime
                      break
                  end
              end
              x+=1

On my computer this is a little slower than primes_ch but consumes less memory.
I get even a little better result when I manually remove p and the iterator variables from the list of stored variables:

mutable struct PrimesStruct <: ResumableFunctions.FiniteStateMachineIterator
                  _state::UInt8
                  x::Any
                  a::Array{BigInt,1}
                  known_primes::Any
                  z::BigInt
                  function PrimesStruct()
                      fsmi = new()
                      fsmi._state = 0x00
                      fsmi
                  end
              end
function (_fsmi::PrimesStruct)(_arg::Any=nothing; )::Any
              _fsmi._state == 0x00 && @goto _STATE_0
              _fsmi._state == 0x01 && @goto _STATE_1
              _fsmi._state == 0x02 && @goto _STATE_2
              error("@resumable function has stopped!")
              @label _STATE_0
              _fsmi._state = 0xff
              _arg isa Exception && throw(_arg)
              _fsmi.known_primes = BigInt[2]
              _fsmi.x = 3
              _fsmi._state = 0x01
              return 2
              @label _STATE_1
              _fsmi._state = 0xff
              _arg isa Exception && throw(_arg)
              while true
                  _iterator_1 = _fsmi.known_primes
                  _iterstate_1 = start(_iterator_1)
                  while !(done(_iterator_1, _iterstate_1))
                      (p, _iterstate_1) = next(_iterator_1, _iterstate_1)
                      if p > sqrt(_fsmi.x)
                          push!(_fsmi.known_primes, _fsmi.x)
                          _fsmi._state = 0x02
                          return _fsmi.x
                          @label _STATE_2
                          _fsmi._state = 0xff
                          _arg isa Exception && throw(_arg)
                          break
                      end
                      if _fsmi.x % p == 0
                          break
                      end
                  end
                  _fsmi.x += 1
              end
          end

The iterator over the known primes is not lazy but the iterator over the integers is … what makes a big difference. I think this is a nice supplement to your blog about lazy sequences…
As a final remark, the difference between ResumableFunctions and Tasks fades away when the calculations in the function take more time than the task switching (loop over all known primes and calling the remainder) … This is the case in the primes example and the overhead of accessing non local memory becomes important, e.g. precalculating the sqrt(x) outside the inner loop improves already the performance.

Benchmark results on my computer:
Primes_ch 1000: 30.724 ms (354943 allocations: 13.77 MiB)
Primes_rf 1000: 55.424 ms (365256 allocations: 13.28 MiB)
Primes_rf modified 1000: 35.873 ms (255592 allocations: 7.07 MiB)
PrimesStruct 1000: 35.103 ms (255592 allocations: 7.07 MiB)
Primes_ch 90: 5.029 ms (62717 allocations: 2.42 MiB)
Primes_rf 90:1.677 ms (10722 allocations: 351.93 KiB)
Primes_rf modified 90: 1.294 ms (9190 allocations: 265.56 KiB)
PrimesStruct 90: 1.279 ms (9190 allocations: 265.56 KiB)

A nice additions to ResumableFunctions could be a@local` macro that excludes variables from being stored between function calls. The problem is how to do this in a straightforward way for hidden variables as the iterator and its state.

I hope that this clarifies why the primes benchmark is worse for @resumable functions.


#3

Hi

I reviewed the code and I found another problem. During the creation of the state-machine the Julia compiler optimises too aggressively. This is the new output of the transformation:

@macroexpand(
              @resumable function primes_rf()
                  known_primes = BigInt[2]
                  x = big"3"
                  @yield big"2" # Output the first prime, as we already put int in the list of known primes
                  while true
                      for p in known_primes
                          if p > sqrt(x)
                              # Must be prime as we have not found any divisor
                              push!(known_primes, x)
                              @yield x
                              break
                          end
                          if x % p == 0 # p divides
                              # not prime
                              break
                          end
                      end
                      x+=1
                  end
              end
              )|> MacroTools.striplines
quote
    begin
        mutable struct ##662 <: ResumableFunctions.FiniteStateMachineIterator
            _state::UInt8
            _result
            x::BigInt
            p::Any
            _iterstate_1::Any
            known_primes::Array{BigInt,1}
            _iterator_1::Any
            function ##662(; )::##662
                fsmi = new()
                fsmi._state = 0x00
                fsmi
            end
        end
    end
    function (_fsmi::##662)(_arg::Any=nothing; )::Any
        _fsmi._state == 0x00 && $(Expr(:symbolicgoto, Symbol("#4#_STATE_0")))
        _fsmi._state == 0x01 && $(Expr(:symbolicgoto, Symbol("#2#_STATE_1")))
        _fsmi._state == 0x02 && $(Expr(:symbolicgoto, Symbol("#3#_STATE_2")))
        error("@resumable function has stopped!")
        $(Expr(:symboliclabel, Symbol("#4#_STATE_0")))
        _fsmi._state = 0xff
        _arg isa Exception && throw(_arg)
        _fsmi.known_primes::Array{BigInt, 1} = BigInt[2]
        _fsmi.x::BigInt = 3
        _fsmi._state = 0x01
        return 2
        $(Expr(:symboliclabel, Symbol("#2#_STATE_1")))
        _fsmi._state = 0xff
        _arg isa Exception && throw(_arg)
        while true
            _fsmi._iterator_1 = _fsmi.known_primes
            _fsmi._iterstate_1 = start(_fsmi._iterator_1)
            while !(done(_fsmi._iterator_1, _fsmi._iterstate_1))
                (_fsmi.p, _fsmi._iterstate_1) = next(_fsmi._iterator_1, _fsmi._iterstate_1)
                if _fsmi.p > sqrt(_fsmi.x)
                    push!(_fsmi.known_primes, _fsmi.x)
                    _fsmi._state = 0x02
                    return _fsmi.x
                    $(Expr(:symboliclabel, Symbol("#3#_STATE_2")))
                    _fsmi._state = 0xff
                    _arg isa Exception && throw(_arg)
                    break
                end
                if _fsmi.x % _fsmi.p == 0
                    break
                end
            end
            _fsmi.x += 1
        end
    end
    function primes_rf(; )::##662
        ##662()
    end
end

So thanks @tk3369 for pointing out this issue!

Ben


#4

Hi

I implemented a let block in ResumableFunctions to allow some variables not to be stored in between function calls (@yields).

@resumable function primes_rf()
                  known_primes = BigInt[2]
                  x = big"3"
                  @yield big"2" # Output the first prime, as we already put int in the list of known primes
                  while true
                      is_prime=false
                      let iter = known_primes, state = 0x00, p = big"0"
                          state = start(iter)
                          while !done(iter, state)
                              p, state = next(iter, state)
                              if p > sqrt(x)
                                  # Must be prime as we have not found any divisor
                                  push!(known_primes, x)
                                  is_prime=true
                                  break
                              end
                              if x % p == 0 # p divides
                                  # not prime
                                  break
                              end
                          end
                      end
                      if is_prime
                          @yield x
                      end
                      x+=1
                  end
              end

On my computer to collect 1000 primes, this is faster (2 to 3) than primes_ch and takes less memory (10 to 14).
I could automate this behaviour by wrapping every for loop without @yield statement inside into a let block.