Proposal for deterministic memory management

Amin_Yahyaabadi · May 10, 2020, 7:36am

I gave a proposal for deterministic memory management here:
https://github.com/JuliaLang/julia/issues/7721#issuecomment-626123507

StefanKarpinski · May 10, 2020, 3:35pm

That proposal addresses finalization, not memory management. Unless you can prove that no references to the memory escape the block, you have to keep it around or you’ll get segfaults. But if you can prove that no references escape, then you can do both finalization and memory reclamation automatically.

Amin_Yahyaabadi · May 10, 2020, 6:03pm

I think I was clear about the references in the Scope Propagation section. Can you give an example that does not work with the system?

StefanKarpinski · May 10, 2020, 10:09pm

What happens if someone accesses a reference that has escaped after the scope has ended?

Amin_Yahyaabadi · May 10, 2020, 10:45pm

An error would be returned. Similar to what happens here:

function fun()
    for i = 1:10
        x = 0
        y = x
    end
    return x + y
end

julia> fun()
ERROR: UndefVarError: x not defined
Stacktrace:
 [1] fun() at .\untitled-cd39f4c585141dc6f008275b9240b511:6
 [2] top-level scope at none:0

StefanKarpinski · May 10, 2020, 11:43pm

How? If the memory that an escaped reference points to has been collected and reused, how do you prevent memory corruption?

Amin_Yahyaabadi · May 11, 2020, 10:46am

For stack variables, RAII should work normally.
For heap variables, Var can be like a smart pointer. When the object goes out of the scope, the memory is freed and the pointer/reference to the object is deleted.

Var could be like this pseudocode (the actual thing should be in C/C++):

mutable struct Var{scope, type}
    value::type
    pointer::Ptr{type}
    scope::Symbol
    Var{scope}(value::type) where {scope, type} = new{scope, type}(value, Ptr{type}(pointer_from_objref(value)), scope)
end

If the concern is about the actual implementation, I understand that this might involve a lot of work and changes.

StefanKarpinski · May 11, 2020, 3:18pm

This does not work. If you free the object’s memory when the object goes out of scope but a reference to the object escapes, then when some code accesses that reference, boom bad things happen—i.e. segfault or memory corruption. The only ways to make that safe are: (1) prove that the reference doesn’t escape, in which case we can already reclaim the memory without any language changes, or (2) you don’t reclaim the memory so that accessing the reference doesn’t cause memory corruption—but in that case you haven’t actually solved the memory management problem. So, as I said, your proposal addresses finalization but not memory management. You can finalize an object at the end of the scope, but you have to keep the memory around for as long as any reference to it might exist.

Amin_Yahyaabadi · May 11, 2020, 3:52pm

I am not sure why you say this is impossible. How does RAII work in C++ (and other languages) then? Even C has a cleanup macro that does this. Can’t we remove the reference?

StefanKarpinski · May 11, 2020, 4:08pm

C and C++ are not memory-safe languages. In C++ if a reference to an RAII variable escapes and you access it after the scope has terminated, your program crashes (or worse, has memory corruption).

Amin_Yahyaabadi · May 11, 2020, 4:43pm

You say if a reference escapes the scope, but this is not what I said.

In this code, how Julia returns an error?

function fun()
    x = 1
    return
end

print(x)

or this

function fun()
    for i = 1:10
        x = 0
        y = x
    end
    return x + y
end

There is some mechanism that detects if a variable is known inside the function. Why cannot we use the same thing here?

Even if such mechanism does not exist, the compiler can easily track the references in Julia. As I said there is no scope propagation (except in raw assignment). The compiler can detect that a reference is called outside of its scope.

Mason · May 11, 2020, 5:03pm

Variable names and memory references aren’t the same thing. In particular, I don’t think it’s possible to prove that a reference doesn’t escape general julia code without restricting the language semantics quite a bit because it’s undecidable.

By contrast, local variable scoping is decidable.

mbauman · May 11, 2020, 5:07pm

Julia’s variables aren’t references. They’re just names. They don’t have locations or memory addresses.

In all your examples, you’re using things like x = 0 inside of functions. In such a case, not only will the x not have a memory address, it’s highly likely that there won’t even be memory allocated for the 0 itself, either! The case that requires memory management is things like y = [1,2,3]. And there, you’re not just tracking the y name itself but also anywhere you pass that array. It could, for example, get stored into a dictionary inside an inner function, and then if that dictionary gets returned you can no longer “clean up” y.

These two concepts are orthogonal.

Amin_Yahyaabadi · May 11, 2020, 5:08pm

For sure we need to have a layer of limitation over the program, so we can apply automatic memory management. In C++, people can easily break everything because there is no limitation. But that is user’s choice.

StefanKarpinski · May 11, 2020, 5:17pm

This is the entire problem.

Mason · May 11, 2020, 5:21pm

Okay, so Stefan’s point was that the difficult part of your proposal wasn’t the syntax, it is “how would you actually implement this in a memory safe way”. You responded to that by comparing it to variable name analysis which didn’t address the issue because variable name analysis is easily decidable in julia.

How would you go about limiting the language semantics here such that you don’t cause memory safety issues?

For instance, here’s an example with you proposal based on what @mbauman said

f(v, A) = rand([push!, vcat])(v, A)

let v = Vector{Any}[]
    Scope{:Foo} begin 
        Scope{:Foo} A = [1,2,3]
        f(v, A)
    end
    v
end

How do you make a robust mechanism that could detect that this is invalid at compile time and throw an error?

mbauman · May 11, 2020, 5:31pm

You want something like:

f(v, A) = rand([push!, vcat])(v,A)

Mason · May 11, 2020, 5:42pm

right, edited.

Amin_Yahyaabadi · May 11, 2020, 8:57pm

So I think instead of tracking the references, we can use my other option which was scope propagation:

With propagation: scope propagates in the operations unless it is explicitly overwritten by the user. The variable returned from a function applied to the arguments with different scopes will have the outer scope.

function fun5()
	Scope{:Foo} begin
		a = Var{:Foo}( rand(3) )
		b = a 		# a is also a `Var{:Foo}` now.
		c = a + 1	        # c is a `Var{:fun5_local}` now (because 1 was local)
		
		Scope{:Bar} begin
			# two different scopes
			d = Var{:Bar}( rand(3) )
			e = d .+ a 		# e is considered Var{:Foo} 
		end
		# d and e are removed here

	end
	# a, b are removed here
	return nothing
        # c is removed here
end

For instance, when a variable is passed to make a Dict, the scope is propagated to the resulting Dict. This ensures that both are deleted at the same time.

Note: I updated the original proposal.

kristoffer.carlsson · May 11, 2020, 9:04pm

It feels like you are edging (cc @mbauman) your way towards a reference counting system (Reference counting - Wikipedia). That is indeed a way of doing automatic memory management and it has the advantage that clean up happens as soon as you can no longer reference something. It has some other problems though, in general worse throughput performance than the GC strategy Julia uses and you can end up with cycles of references that are not freed.