The default iteration interface can cause memory leak?

I just read about the post “Iterate on it” which says the current implementation of iterate could cause a memory leak since the xs must remain alive over the entire loop and that is uneconomic for linked lists:

next = iterate(xs)
while next != nothing
  x, state = next
  # do something with x
  next = iterate(xs, state)
end

And the author suggested using the following protocol:

itr = iterator(xs)
while true
  next = iterate(itr)
  next == nothing && break
  x, itr = next
  # do something with x
end

Is it still true as of 2022? If so, will there be an improvement in the future?

This behavior is not a memory leak, since the object is (intentionally) still accessible. A memory leak would be if you can’t access the object anymore in your code, but the corresponding memory is not being freed. This being uneconomic for linked lists is unfortunate, but not really a problem (as the article says - it’s a minor optimization at most).

What I’m missing from the article though his some more details on this paragraph:

In many cases, this is a minor missed optimisation opportunity. If you have very large functional data structures, even infinite ones (the functional equivalent of an indefinitely long generator / channel), it’s a disaster.

What I think the author is getting at is that in their particular implementation of lazy datastructures, they’ve chosen to represent infinite lists as linked lists and would like them freed eagerly when iterating over them (but only in same cases, and cache the computed elements in others?). So knowing when to throw elements away and when not to is a property that can’t be deduced just from looking at the loop itself.

It also seems to me that it would have been better to represent such infinite lists as almost-no-state iterables, where the produced list elements are passed along in the state variable of iterate, instead of caching them explicitly in the List struct itself.

2 Likes

This is not a memory leak because Julia still release the memory if it got unused.

Julia’s iteration interface does not cause more memory assumption than other programming languages (Python/Java/C++/C#). Just like in other programming languages, Julia’s iterable object itself (xs in your example) should usually keep alive during the iteration, and a specified iteration interface creates a helper object (traditionally called iterator) to perform the following 4 operations:

  1. getiterator: create an iterator from the iterable object
  2. current: get the current element from the iterator
  3. movenext: move the iterator to the next
  4. hasnext: check if the iterator has ended

Java uses exactly 4 methods for the above 4 operations, C# and Python uses 2 methods (combine movenext, hasnext and current together).

Julia also uses 2 methods (iterate(itr) and iterate(itr, state)), which is not that different from those in other languages. This is to say, the following statement might not be appropriate:

state here can be regarded as the iterator, and we can implement the iteration interface for LinkedList this way:

# return iterator
function iterate(itr)
       if itr is empty
           return nothing
       else
           return (the first element, the second node of the linked list)
       end
end 

# hasnext + movenext + current
function iterate(itr #= unused =#, node)
     if node has next
         return (the value of the node, the next node)
     else
         return nothing
     end
end
#

Such implementation iterates LinkedLists in O(n) time.

Specifically, using above implementation, Julia is smart enough to not remain xs alive by inlining the code. As can be seen, the itr argument from iterate(itr, node) method is unused, hence disappear after inlined. If there is no use of itr after the iteration, Julia can release itr before stopping the iteration, if needed.

6 Likes

I think this is a complete non-issue? I.e. both protocols are mostly equivalent under code transformations that the julia compiler is good at. Consider:

#julia iteration
next = vanilla_iterate(xs)
while next != nothing
  x, state = next
  # do something with x
  next = vanilla_iterate(xs, state)
end

#mikeInnes' original proposal
itr = mi_iterator(xs)
while true
  next = mi_iterate(itr) #problematic line
  next == nothing && break
  x, itr = next
  # do something with x
end

#mikeInnes proposal (improved)
itr = mi_iterator(xs)
next = mi_iterate(itr)
while next != nothing 
  x, itr = next
  # do something with x
  next = mi_iterate(itr)
end

Now suppose we have a code construction that works well with the mikeInnes proposal. Then we can simply define

@inline vanilla_iterator(xs) = mi_iterate(mi_iterator(xs))
@inline vanilla_iterate(xs, state) = mi_iterate(state)

If you plug that into the snippet for vanilla iteration, you get:

itr = mi_iterator(xs)
next = mi_iterate(itr)
while next != nothing
  x, state = next
  # do something with x
  next = mi_iterate(state)
end

So that’s awesome! Julia iteration is at least as good / flexible / performant as Mike’s proposal in cases where type inference works.

Now next: Suppose julia moved to my modification of Mike’s proposal. Then we modify old-style iteration in the following way:

struct IterHelper{T} item::T end

@inline mi_iterator(xs) = IterHelper(xs)

@inline mi_iterate(wrappedXS::IterHelper) = begin
    xs = wrappedXS.item
    next = vanilla_iterate(xs)
    if next == nothing
        nothing
    else
        nextItem, nextState = next
        (nextItem, (xs, nextState))
    end
end

@inline mi_iterate(itr) = begin 
    xs, state = itr
    next = vanilla_iterate(xs, state)
    if next == nothing
        nothing
    else
        nextItem, nextState = next
        (nextItem, (xs, nextState))
    end
end

then let’s plug that into the modified iteration protocol proposal:

itr =  IterHelper(xs)
next =  vanilla_iterate(xs)
if next != nothing 
        x, state = next
        next = (x, (xs, state))
end
while next != nothing 
  x, itr = next
  # do something with x
  xs, state = itr
  next = vanilla_iterate(xs, state)
  if next != nothing
    nextItem, nextState = next
    next = (nextItem, (xs, nextState))
  end
end

It is quite clear how the compiler will continue to simplify that and end up with the original vanilla julia iteration protocol.

If you try the same with Mike’s original proposal, then you will encounter a problem in the marked problematic line

  next = mi_iterate(itr) #problematic line

The problem is that itr is either of type IterHelper or of type typeof((xs, state)). This is a problematic type instability, because the necessary compiler transforms (e.g. loop rotate) to double this line and make it type stable afaiu only happen after type inference. Maybe julia emits good enough code with suboptimal type inference that llvm can fix that; maybe not. I wouldn’t bet on it.

But maybe the compiler improved in that regard since I last looked?

cc @MikeInnes since you disabled comments on your blog post.

4 Likes

Languages have to be a bit cautious about what they promise as an optimisation. Whatever benefit is given when the compiler works, is taken away when it fails. It’s usually not the end of the world to lose a constant speed factor, but changes in time or space complexity are a bigger deal: they can make the difference between a valid, working program and a failing one.

This is an example of that. Sure, in simple cases the list will almost always be optimised away, but it’s not guaranteed, and it’s not hard to foil that optimisation. Julia’s heuristics are also not formally part of its guaranteed API/spec, so if you rely on this then in principle a change of compiler version could break your code (admittedly that’s unlikely in practice).

I don’t want to overstate the case, though. This is (just about) a real technical problem, but also a minor one. I don’t think anyone desperately wants to build web servers that iterate over lazy lists of requests or whatever in Julia. Stateful, self-destructing iterators are idiomatic and unproblematic. The blog post is more about obscure language design nerdery than any kind of critical flaw. The vast majority of people need not worry about it.

6 Likes

Inspired by the blog post and this issue I made SetupIterators.jl which I beleive addresses the concerns brought up here and is compatible with Julia’s current lowering with no language changes required

3 Likes