Proposal for deterministic memory management

I think that all of the removal and propagation can be preplanned and determined by the compiler. Why would this have less performance than a runtime GC?

It cannot be preplanned because object lifetimes and references are dynamic. We don't have any borrow checker like Rust, and even there, when you want to use certain dynamic things you opt into ref counting (Rc in std::rc - Rust).
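To make the `Rc` opt-in concrete, here is a minimal Rust sketch (my own illustration, not from the post) showing how shared ownership moves the "when to free" decision to a runtime count:

```rust
use std::rc::Rc;

fn main() {
    // Two handles to the same heap allocation: neither is the unique
    // owner, so compile-time ownership rules alone cannot decide when
    // to free it. The runtime reference count does.
    let a = Rc::new(vec![1, 2, 3]);
    let b = Rc::clone(&a); // bumps the count; no data is copied
    assert_eq!(Rc::strong_count(&a), 2);
    drop(b); // one owner gone; the data survives via `a`
    assert_eq!(Rc::strong_count(&a), 1);
    assert_eq!(a[0], 1);
} // the last owner drops here; only now is the vector freed
```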

2 Likes

To add to this, maybe worth spelling out how you'll handle an example like:

```julia
store, stored = let s = Any[]
    function store(x)
        push!(s, x)
        nothing
    end

    function stored()
        s
    end

    store, stored
end
```

```julia
function leak()
    x = [1, 2, 3]
    store(x)
    s = sum(x)
    s
end

leak()

stored()

leak()

stored()
```

It's worth trying to implement that exact kind of code in Rust to get yourself thinking about the ownership issues explicitly.

9 Likes

A Brief History of Time :wink: I was there when Java was a little wannabe tyke called "Oak" lol, and I saw and worked through its growing pains, including the interminable GC pauses in Java v1. Honestly it was kind of an ugly baby duckling back then, but BEA WebLogic Java JRockit Real Time GC is a beautiful swan now, so I hope Julia can get through that duckling-to-swan phase quickly lol.

So some food for thought about Industrial strength Java JVM Deterministic garbage collection strategies in production at large financial institutions today >>

Deterministic garbage collection is the ability to specify and maintain a maximum pause time for the memory system with a high level of confidence.

It is designed to deliver short, PREDICTABLE pause times - absolutely critical for trading systems.


There are no ideal garbage collectors, but the deterministic garbage collection functionality in JRockit Real Time represents a significant improvement over traditional methods.

More alternatives - but NOTE: really iffy about predictable pauses, i.e. NOT deterministic >>

Related to (I believe) Stefan's previous remark about the LLVM mark-and-sweep algorithm...

As of Java 5 (note v5 is very old; the latest is v13), the IBM WebSphere JVM has added a generational GC configuration option to its classic mark-and-sweep algorithm.

BTW @StefanKarpinski I don't seek the anger of the ancient Norse gods, IOW I'm NOT griping :smile:. I'm very glad Julia addressed GC early on in some fashion. I assume this thread is forward-looking, about whether there is any need for a new approach, so I just wanted to add my two cents about what I know actually works in production at scale for large, high-transaction-volume financial institutions today, so Julia can successfully grow.

HTH,

Marc

Ps> If not already, I also wonder about applying C's lint or its more advanced descendants to this issue >>

Static automatic syntax analysis

Humans aren't the only ones who can read source code, of course. You should also make static syntax analysis part of your development process. Static syntax analysis is what lint, strict compilation, and several commercial products do: scan a source text and spot items that a compiler accepts, but that are likely to be symptoms of mistakes.

Expect to make your code lint-free. While lint is old and limited, the many programmers who don't bother with it (or its more advanced descendants) make a big mistake.

https://www.ibm.com/developerworks/aix/library/au-memorytechniques.html

4 Likes

Besides all the technical issues (that I do not understand), is there any hope (or expectation) that Julia can be tweaked to be deterministic in terms of allocation? I do have amazing plans to embed Julia into Cubesats for attitude control purposes. However, those problems with allocation can be a no-go.

At least we now have good ways to avoid compilation latency by calling all the methods at startup. This memory issue might be the last barrier!

2 Likes

Good question. Somewhat more focused toward your actual application, I'd slightly reword it to "What can be done so Julia can reliably be used in real-time control?"

I think the answer is "Lots; Julia can be a really great platform for this in the future". To expand, I think there are two very different ways to attack this:

Static compilation of a restricted Julia subset to native code for hard realtime

Allow a restricted subset of Julia (simple functions with concrete type signatures) to be compiled down to native code, with compile-time checking of the ways in which the Julia runtime is used. For instance, you might instruct the compiler to verify that no dynamic allocation is used, that no exceptions are thrown, or various other things.

In some sense this kind of works already (you can see with @code_native that many Julia functions are as tight and efficient as the corresponding function written in C). However, there's no convenient way to emit the native code into an external shared library, and no system for using the compiler to statically prove invariants about your code. Also, I believe that Julia Tasks running hard-realtime control loops still need to pause for GC on other Tasks, but this problem can be fixed in the future if the compiler can prove that the hard-realtime loop allocates no memory.

I think proving invariants, and having a user interface to require these invariants, may be needed for the most demanding hard-realtime control loops, where there are time deadlines which absolutely must be met for safety and reliability. However, it has the rather large downside that some core Julia constructs will not be available. Therefore Julia code written with this in mind will form its own subset of the wider Julia ecosystem.

Note that none of this requires fiddling with the language semantics or syntax. There is no need to add scoped lifetimes or any kind of RAII to the language; as others have pointed out, these features are difficult to reconcile with Julia's semantics. Rather, you write your Julia code in a particular style and ask the compiler to statically prove invariants about the native code which are required for your application.

Low latency GC for soft realtime

Soft realtime, where GC and other latency is nearly always below some threshold, seems doable in the future without changing any language semantics. It's significant engineering effort, though, so I guess it's a matter of time and resources. This is the approach that @Marc.Cox is referring to above: tweaks and optimizations to the Julia GC to get very low pause times on average.

The larger picture

As I understand it, control systems often have a hard-realtime loop embedded within a larger soft-realtime planning layer or layers. I think Julia can and will be a really excellent platform for having all of this within one application and in one language. People like @rdeits already use it this way for robotics research applications and have for some time, but I suspect the tooling has a fair way to go before it will be widely accepted in industry (perhaps Robin or someone else can correct me on that, though :slight_smile:)

15 Likes

I'd love to see Julia's GC move in that direction, where soft-realtime applications are possible.

I work in quantitative finance and already do most of my trading strategy research in Julia, but am hesitant to move my production execution systems to Julia until GC pauses can reliably be limited to no more than 1-2 milliseconds.

If Julia can do that, then it would truly be a single-language solution for me.

3 Likes

The GPU people have done a lot of the work for this. In some sense, you would "just" need to change the target from e.g. PTX to x86-64.

6 Likes

Nice! Heh, that's why I added that weasel word "convenient" :laughing: I had a feeling that in some sense this was already possible... for certain people.

Here's some really interesting reading on the journey to short pause times which the Go language runtime went through: Getting to Go: The Journey of Go's Garbage Collector - The Go Programming Language. They were last targeting 0.5 ms pause times, which seems pretty remarkable. Do you have a quantitative feeling for how large Julia's GC latency would be in your use case?

The Go case is interesting because I think the Go runtime is very similar to ours in the macro aspects that are relevant. (Their concurrency model is similar and they also have a non-moving GC.) It would be super interesting to follow a similar journey for Julia, I suspect more a matter of having the time/money to resource the project than any technical problem.

3 Likes

I have seen that Go presentation. Very impressive results.

I wrote a simple market data handler in Julia a while ago and did some tests that resulted in pauses of up to 20 milliseconds. My existing solution could keep the pauses under 0.8 milliseconds. I'm not doing anything ultra-high-frequency where I'd be counting single microseconds, but 20 milliseconds is a bit too much.

I tried to be careful with allocations in my test, but didn't make much of an effort at a deeper dive. There's probably some low-hanging fruit with the TCP messaging.

2 Likes

I didn't want to use complex lingo and make this part of Julia another frustrating Rust. The rules I set cover ownership:

- Things coming from outside of the scope (entry points) should not be removed inside the scope.
- A variable returned from a function applied to arguments with different scopes takes the outer scope.
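These two rules resemble Rust's borrow and move semantics; a speculative sketch of the analogous behavior (my own illustration, not part of the proposal):

```rust
// Rule 1: an argument borrowed from outside is not freed inside the callee.
fn total(v: &[i32]) -> i32 {
    v.iter().sum()
} // `v` is only a borrow; the caller still owns the data

// Rule 2: a returned value takes on the caller's (outer) scope.
fn build() -> Vec<i32> {
    let v = vec![1, 2, 3]; // created here...
    v // ...but moved out, so it is NOT freed when `build` exits
}

fn main() {
    let v = build(); // `v` now lives in `main`'s scope
    assert_eq!(total(&v), 6);
} // `v` is freed here, at the end of the outer scope
```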

Here the scope exports functions (like a small module) that are executed outside of it, and I am not sure we can handle this. See my global variables and generic programming section.

We may be able to handle this only if everything is written in one module. When you return a function and expect the scoping information to be available outside, you should write everything in one module. I should note that "module-wide memory management when the code includes global variables" is not the first priority of this proposal.

I think of this let that exports functions as a limited global scope. It is like a small module, since it has global variables and exports functions; it is not like a callable function. In my proposal, I explicitly stated that this kind of global variable should be deleted manually. Two functions are defined in the let scope and returned, but called from outside of it; this does not change anything regarding its let_globals.

```julia
store, stored = let s = Any[]     # `s` is `Var{:let_global}`
    function store(x)             # `x` comes from outside, so it should not be removed here
        push!(s, x)               # `s` is still `Var{:let_global}`
        nothing
    end

    function stored()
        s     # `s` comes from outside, so it is `Var{:let_global}`
    end

    # The let's `s` is an outsider to this scope. Unless removed from outside, it will be kept in memory.
    store, stored  # these functions are executed in `{:let_global}`
end

function leak()
    x = [1, 2, 3]   # `x` is `Var{:leak_local}`
    store(x)        # executed in `let_global`. `s` gets `[1, 2, 3]`
    s = sum(x)      # `s` becomes `Var{:leak_local}` with propagation. Now `s` and `x` should be removed together
    s               # `s` is returned, so it is not removed (same for its associated `x`)
end

leak()    # return is not stored, so leak's `s` and `x` will be removed here

stored()  # executed in `let_global`. In that scope `s` was let_global, so it is not removed.

leak()    # return is not stored, so leak's `s` and `x` will be removed here

stored()  # executed in `let_global`. In that scope `s` was let_global, so it is not removed.

# The let's `s` was global, so it is not removed here.
```

You are writing a kind of code I have never written in my lifetime. If it were me, instead of thinking about ownership in Rust, I would change my coding style. I am not afraid to say this system cannot handle all valid Julia code. Also, see the next post.

We can do this in a backward-compatible manner, by only managing the memory for the variables that are defined using Var{:scope}() for named scopes and Var() for local scopes.

We can easily set limitations for the developers then. They should not call @eval in a scope that uses Var (we can return an error), they should manage global variables manually, and any other situation that we think we cannot handle automatically.

By setting these limitations I cannot think of any code that breaks. Either someone wants automatic memory management and follows the limitations or they should use GC.

That is my goal, and I hope we can help achieve it. Exporting native code from Julia is already being done in StaticCompiler.jl. Its limitations are totally fine by me. We can neither cover all the ground, nor should we limit ourselves trying to.
Even in C++, there are specific guidelines and limitations that people should follow to write a very tight program. See this for example:
Embedded Template Library - A C++ template library for embedded applications

These two sound like two different paths to me. We should follow both. Improving GC helps the desktop users that don't need realtime performance. Automatic management helps the other group that wants to use Julia in realtime applications.

1 Like

How about when you store objects created inside a scope into an object that outlives the scope (note this is dynamic behavior and cannot be statically known)? The system needs some way to know if the memory for that object can be freed when the scope exits. This brings us back to reference counting again; see std::shared_ptr - cppreference.com for its use in yet another language.
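To make the dynamic part concrete, here is a small Rust sketch (names are my own, for illustration) where whether an allocation escapes its scope is decided at runtime, so only a reference count can know when to free it:

```rust
use std::rc::Rc;

fn maybe_keep(keep: bool, outlive: &mut Vec<Rc<Vec<i32>>>) -> usize {
    let local = Rc::new(vec![1, 2, 3]); // created inside this scope
    if keep {
        // Stored into an object that outlives the scope. Only the runtime
        // reference count "knows" the allocation must survive scope exit.
        outlive.push(Rc::clone(&local));
    }
    local.len()
} // `local` is dropped here; the vector is freed only when keep == false

fn main() {
    let mut outer: Vec<Rc<Vec<i32>>> = Vec::new();
    assert_eq!(maybe_keep(false, &mut outer), 3);
    assert_eq!(outer.len(), 0); // nothing escaped; freed at scope exit
    assert_eq!(maybe_keep(true, &mut outer), 3);
    assert_eq!(outer.len(), 1); // escaped; still alive
    assert_eq!(Rc::strong_count(&outer[0]), 1); // sole owner after scope exit
}
```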

The soundness of the system cannot depend on a certain coding style though, right?

4 Likes

When one object is stored in another object with a different scope, the outer scope should be chosen for both. If one is from outside, the other one should live as long as the outer one lives.

Yes, I am not afraid to say that this system cannot handle all valid Julia code. See my comment here.

Okay, so you need some kind of "manager" that keeps track of every object's actual "scope"? And this manager runs at every scope exit to clean up the objects that are no longer referenced? Sounds a bit like the current GC, except it runs on every scope exit. Won't that be slow?

Do you have any links or reference to any other programming language incorporating this strategy or is this a novel algorithm for automatic deterministic memory management?

3 Likes

Exactly. I think the compiler can plan this.

The runtime slowness depends on the user. If the user wants a variable to be removed when the scope exits, the compiler should obey. I would argue that overuse of scopes can be harmful. This has an overhead similar to that of calling a function.

In some cases, the compiler may be able to optimize this by deferring the removal (if it sees that a lot of memory is available, and this is safe).

Not exactly. I think Beef, C++, and Rust are good places to start.

No, I mean... it is dynamic. Which is why it needs to be a "manager" that runs alongside the code and keeps track of stuff during runtime. Unless you have a "borrow checker", the lifetime of objects is not statically known.

Anything particular in those languages? Rust has a borrow checker and drops values (Drop in std::ops - Rust) when they go out of scope, but there are also reference-counted objects to opt into when needed.
C++ has stuff like RAII, but for handling dynamic lifetimes you can use unique_ptr and shared_ptr, which can kind of be mapped to a borrow checker (only one owner) and reference counting, respectively.
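For what it's worth, a small sketch of those two Rust mechanisms together: `Drop` firing deterministically at scope exit, with a reference-counted log used to observe the order (names are my own invention):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A value whose destructor records its name when it runs.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        // `Drop::drop` runs at a statically known point: scope exit.
        self.log.borrow_mut().push(self.name);
    }
}

fn main() {
    let log = Rc::new(RefCell::new(Vec::new()));
    let _outer = Noisy { name: "outer", log: Rc::clone(&log) };
    {
        let _inner = Noisy { name: "inner", log: Rc::clone(&log) };
    } // `_inner` is dropped here, deterministically
    assert_eq!(*log.borrow(), vec!["inner"]); // `_outer` is still alive
} // `_outer` is dropped here, at the end of `main`
```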

None of these feel really applicable here so perhaps you could elaborate.

2 Likes