Julia for real time, worried about the garbage collector

Give me a pitfall. I'm interested. I managed to get a maximum time jitter of 0.3 ms running Octave code in real time at 80% CPU load on stock Linux. I will do more testing, but if I run it for a week or a month with similar results I will call it hard real time.
People running real-time Linux get similar results.
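For reference, a minimal Julia sketch of how a measurement like this can be taken (the 1 ms period and iteration count are arbitrary illustration values, and a busy-wait loop is only one of several ways to do it):

```julia
# Hedged sketch: measure the worst-case wakeup jitter of a periodic
# busy-wait loop, in milliseconds.
function worst_case_jitter(; period_ns::Int = 1_000_000, iters::Int = 10_000)
    worst = 0
    next = time_ns() + period_ns
    for _ in 1:iters
        while time_ns() < next end          # spin until the deadline
        lateness = Int(time_ns() - next)    # how late we woke up, in ns
        worst = max(worst, lateness)
        next += period_ns
    end
    return worst / 1e6                      # worst observed lateness, in ms
end
```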

1 Like

Hard RTC is not about empirically demonstrating the (lack of) pitfalls, but about constructing the language and the whole stack so that pitfalls are ruled out by construction.

Julia has no such guarantees. This does not mean that you cannot write code that is more or less predictable in practice, it's just that this is not guaranteed. Also, technically a new version of Julia is free to introduce changes that break this (in practice, things move in the opposite direction, but again, this is not guaranteed).

If you are OK with this, fine. Very few people need hard RTC (in the sense that they are willing to put up with the inconvenience of obtaining it).

7 Likes

Interesting, it seems to be a theory-vs-practice thing. As an engineer I focus on the pragmatic side of things. I will test, and if it works I will call it "Empirical Hard Real Time".
There are countless examples of practical methods beating or matching theoretical ones.

Thanks for your info, it was helpful.

1 Like

There are already terms for this approach, eg “soft real time”.

And there are some common practical heuristics for making it robust: eg your analysis loop can check whether unprocessed frames are piling up, and then drop some and warn the user.
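A sketch of that heuristic in Julia (the channel setup, `analyze` callback, and backlog threshold are all made up for illustration; `Base.n_avail` is an internal helper that reports how many items are buffered in a `Channel`):

```julia
# Hedged sketch of the "drop frames when falling behind" heuristic.
# `frames` is a bounded queue of incoming frames; `analyze` is the slow work.
function process_frames(frames::Channel, analyze::Function; max_backlog::Int = 8)
    for frame in frames
        # If too many frames are already waiting, discard the stale backlog.
        dropped = 0
        while Base.n_avail(frames) > max_backlog
            take!(frames)
            dropped += 1
        end
        dropped > 0 && @warn "Falling behind; dropped $dropped frames"
        analyze(frame)
    end
end
```

The key design point is that the drop-and-warn decision happens inside the loop itself, so a temporarily overloaded consumer degrades gracefully instead of accumulating unbounded latency.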

Note that Julia is not the only variable you have to deal with: eg in a typical Linux environment (desktop, but to some extent even server), various processes start all the time, potentially causing small lags. So if you eliminate most of Julia's GC activity and your process is sensitive to delays, these can become relatively important. The Linux kernel has some sophisticated schedulers to deal with this.

5 Likes

It's not just a question of theory vs practice. Say you have to write a program that controls a plane: do you use a language that cannot guarantee it will not pause randomly?

3 Likes

I use `isolcpus`, which isolates a CPU core from the scheduler so it is not used for ordinary processes; there is also a command, `cset --shield`, that I haven't gotten to work yet and that is almost as good.
But I can manually pin processes to the isolated core, so nothing can stop or interrupt the process execution.
If someone can tell me what could be problematic about this approach, that would be useful.
In the example above I was running a heavy soft-real-time plotter on the other core, and it was only a 2-core CPU.
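For anyone unfamiliar with the setup being described, it roughly looks like this (the core number is an example, and exact flags depend on your distribution):

```shell
# Kernel boot parameter (eg in /etc/default/grub), telling the scheduler
# never to place ordinary tasks on core 1:
#   GRUB_CMDLINE_LINUX="isolcpus=1"

# Pin a Julia process onto the isolated core:
taskset -c 1 julia my_realtime_loop.jl

# Optionally also give it a real-time FIFO scheduling priority:
chrt -f 80 taskset -c 1 julia my_realtime_loop.jl
```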

When do unprocessed frames happen?

1 Like

Why would it pause randomly?

How do you guarantee that there is not a bug in your code that breaks the car's engine? Or a bug that shuts the jet engine off? Or a bug that sprays too much fuel so that it blows up?
How do you guarantee that the landing gear doesn't break? Or that the propeller doesn't break?
How do you guarantee that a cylinder doesn't fly in through the window? In engineering you test a lot, so why can't this very common approach also be used in hard RTC?

You need to communicate with the external world: reading a file, writing to the console, or using a network socket. A kernel without hard-RT guarantees can block you there for some time.

There are methods for formal program verification, and there are formal proofs for some small RT kernels. There is no way to formally verify the Linux kernel: it has far too many lines of code, so it has not been done and probably never will be.

1 Like

There are non-blocking sockets in Linux.

So you are telling me there are never bugs in hard-RT code? I have seen bugs.
The F-35 (a fighter jet) is known as a very buggy plane.

I like that you are coming with real examples, though. That's exactly how you guarantee quality in industry: you sit down, go through everything that can go wrong, and prevent it.

1 Like

For general information on getting more control over GC operation, see the docs and look at functions/macros like `GC.enable`, `GC.@preserve` or `GC.safepoint` (there may be others too):
https://docs.julialang.org/en/v1/base/base/#Internals-1

I recall seeing functions to block the GC temporarily too.
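For instance, a sketch of disabling collection around a latency-sensitive section (these are real `Base.GC` calls, but whether this is sufficient for a given workload is another matter; `timed_section` and the buffer are made up for illustration):

```julia
# Sketch: keep the GC out of a latency-sensitive section. GC.enable returns
# the previous setting, so nesting is safe. Allocations still happen while
# collection is disabled; they are just not collected until re-enabled.
function timed_section(f)
    was_enabled = GC.enable(false)   # stop collections (allocation still works)
    try
        return f()
    finally
        GC.enable(was_enabled)       # restore the previous GC state
        GC.gc()                      # collect at a moment we choose instead
    end
end

buf = zeros(Float64, 1024)
result = GC.@preserve buf begin      # buf is guaranteed rooted in this block
    timed_section(() -> sum(buf))
end
```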

2 Likes

Just want to contribute some things to this wonderful discussion. I am new to Julia, but I come from practice and research on (hard) real-time systems, and I stumbled upon this thread randomly.

A hard real-time system is a system that performs every task within a strict, given time frame, and I/O is serviced strictly within a small time window: not earlier, not later. Now, what you are proposing here is merely the setup for a real-time system. What you lack are guarantees, in the form of formal proofs, that the execution of the algorithms and programs will respect the hard deadlines.

You can't provide these guarantees, especially on a Linux machine running on a multiprocessor system with shared memory and caches (pretty much every modern general-purpose PC). What I mean is, even if you isolate the execution of an algorithm to one CPU, it is still not isolated, since it still uses shared memory, which means that any time the other cores access that memory, your CPU may be stalled. Cache coherence and contention are another issue that can stall your CPU's execution for a nondeterministic amount of time. These are things that you have NO control over, unless you design your entire stack from the ground up to be hard-deadline aware (read: deterministic, which is very hard). That you have some empirical evidence to support some "real-time" performance on your setup is absolutely irrelevant.

To guarantee your system is hard real-time compliant, one usually needs to:

  1. Formally verify the correctness of the program
  2. Ensure that each task satisfies the given deadlines – i.e. use static scheduling
  3. Eliminate ANY task communication interference
  4. Analyze and fix a deterministic flow of data

That is something handled at the very bottom of the stack (read: drivers and custom hardware). Entire hardware architectures are built just for this purpose, as are RTOS kernels. Julia is way up the layers of the stack, with little to no control over these aspects. And that's okay, because as far as I can see, it addresses a completely different market. Furthermore, it is not the language and its infrastructure that should address this issue; that is up to the designer. The language and its infrastructure should only provide the tools and capability to design the real-time system. Julia, unfortunately, does not provide these tools.

11 Likes

In psychophysical experiments, I usually just want to show the stimulus frames without dropping any, and generally deliver stimuli with low and predictable latency. Nobody dies if you drop frames or happen to have higher latency in a couple of trials; you can always throw those out. But of course it's ideal to have that happen as little as possible. That's what I'd call "real-time" in my field, and I'd also be interested in whether Julia can be useful here. Python and Matlab are widely used in this space.

1 Like

Thanks for your insight, but I'd like to question what is meant by "correctness." Something other than "the program will work"? I can believe that one can formally verify the number of machine cycles a code segment will take, but how much more can be proven?

For example, I imagine NASA uses hard real time, and probably tries very hard to verify their control systems. Yet the Mars Climate Orbiter still failed due to metric/Imperial units mismatch. Perhaps each engineer could claim their own subroutine had been “proven” correct, but the overall program was evidently not correct.

I’m certainly a believer in algorithmic correctness. But whatever is proven still relies on humans and their assumptions, e.g. that Intel floating point calculations are always correct.

This isn't to diminish the importance of hard real-time. It's just that I have heard talks about "formal verification of control systems," say for autonomous driving, that provide a guarantee of correctness and heavily imply safety. The problem is not whether the steering system is stable (which I still question, because it assumes certain conditions that are hard to guarantee in real life); it's whether the vision system has noticed that you're driving into a semi truck.

I'm no expert, but formal verification isn't some black art. If you're interested in it, this is a possible place to start research (also try searching Hacker News for "verification"):

Regarding your question about whether it is possible to prove the correctness of a computer program, I think it might be helpful to consider the program as any other difficult mathematical proof: it may be possible to prove some proposition, but it still only holds under some known assumptions.

EDIT: If I had to guess, I'd say that it's not humanly possible to verify the software that Tesla or Waymo use for their self-driving systems, and furthermore it stands no chance of ever being proven correct, because it's (AFAIK) based on statistical methods.

2 Likes

@apo383

TL;DR - Without indulging in a long discussion about this, which clearly does not belong here: what we mean by correctness is the adherence of the software to a predefined set of liveness and safety properties. It then boils down to how good the models you use to perform the formal analysis are, and how well they approximate "reality".

Long version:
From a pure software perspective, verifying correctness is accomplished by constructing a mathematical model (automata, FTS, Petri nets) of the software, and specifying our informal requirements in a formal language, e.g. propositional logic, temporal logic, etc. Verification and model checking are then performed to verify that a) no unwanted state is reached (safety) and b) something good is eventually done by the system (liveness). This is related to the so-called reachability problem. Theorem provers are also used for this task (e.g. Coq, Isabelle/HOL, Lean).
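In temporal-logic notation, the two property classes mentioned above are typically written like this (using LTL, where $\Box$ reads "always" and $\Diamond$ reads "eventually"; the propositions are placeholders):

```latex
% Safety: nothing bad ever happens (no reachable state satisfies "bad").
\Box\, \neg \mathit{bad}

% Liveness: something good eventually happens, e.g. every request is
% eventually granted.
\Box\, (\mathit{request} \rightarrow \Diamond\, \mathit{granted})
```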

For real-time systems in particular, real-time behavior is modeled using e.g. data-flow graphs, max-plus algebra, LPs, real-time calculus, etc. Analysis and scheduling are done by solving optimization problems.
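As a tiny example of the kind of analysis involved, the classic Liu & Layland bound for rate-monotonic scheduling (a sufficient, not necessary, condition; the task set below is hypothetical) says that $n$ periodic tasks with worst-case costs $C_i$ and periods $T_i$ are schedulable if $\sum_i C_i/T_i \le n(2^{1/n}-1)$:

```julia
# Sufficient schedulability test for rate-monotonic scheduling
# (Liu & Layland, 1973). C = worst-case execution times, T = periods.
function rm_schedulable(C::Vector{Float64}, T::Vector{Float64})
    n = length(C)
    utilization = sum(C ./ T)
    bound = n * (2^(1 / n) - 1)   # approaches ln 2 ≈ 0.693 as n grows
    return utilization <= bound
end

rm_schedulable([1.0, 1.0], [4.0, 6.0])   # U ≈ 0.417 ≤ 0.828 → true
```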

Obviously, however, we deal with the real world, where our software interacts with the physical environment. We have an umbrella name for such systems: cyber-physical systems. It's a very active research area, with a lot going on. Here we also take real-world phenomena and systems into account when performing analysis and verification.

Indeed, you are right that the crux is how well your models approximate reality, and how much margin of error can be tolerated. This was an active topic of my research for the past 4 years, and I can assure you that it is a hard problem to tackle. Simple things such as the sampling rate can be the difference between a stable and an unstable controller, and between feasible and infeasible real-time deadlines. Yet clearly a certain amount of jitter and deadline misses can be tolerated; see my answers (at the bottom) here: https://www.researchgate.net/post/How-do-i-derive-a-relationship-between-the-sampling-time-and-the-dead-time-of-a-real-time-system
https://www.researchgate.net/post/Why-is-proper-selection-of-sampling-important

4 Likes

Based on

I thought you were saying that Julia isn’t in a position to provide this kind of support. But then you said

Because you said “unfortunately” I’m now wondering if Julia could provide such tools without compromising its value proposition, but that it simply isn’t providing them at the moment. What tools would it need to provide?

1 Like

Julia depends on a "beefy" OS, like Linux. A Julia program therefore cannot be hard real time, because Julia can't run on an RTOS or on bare metal. I think.

Also, the way Julia does GC is quite incompatible with hard real time, I think.

To be more precise, when I talk about "tools" and "infrastructure", I mainly refer to the toolchain's low-level access to the bare metal and hardware (the HAL): tools that C/C++ and Rust provide (in fact, I highly recommend Rust over C++ for safety-critical work). Julia as a language, with its semantics (and syntax), is amazing, and it would be really nice to bring this kind of high-level programming to, say, embedded systems. The problem is that, when it comes to implementation, in its current state you don't have complete control of the hardware resources, but someone correct me on this. To facilitate this, you would need to change the core philosophy of the language, i.e. you would have to give up GC, JIT compilation and optimization, and turn it into a low-level, AOT-compiled language. This is IMHO a step backwards, not forwards, relative to what the language tries to achieve.

On the other hand, one direction would be to go the MicroPython way and design a Julia "kernel" specific to real-time/embedded systems. In that regard, there has been research on real-time GC, but it is at a very early stage. Even so, in its current state I think MicroPython is fine for small embedded projects, but it is definitely not real-time worthy.

With that said, for me personally Julia provides tools for the design and analysis of real-time systems, specifically the modeling part: amazing optimization packages, linear algebra, abstractions, and dynamical-system modeling. I think this more than justifies its use in the hard real-time domain :slight_smile:

EDIT: To complement my last point a bit more, I think Julia is an excellent language for hardware design (something like MyHDL). Its semantics, and more specifically its multiple dispatch and metaprogramming, fit perfectly IMHO for the specification of hardware components, and I'm surprised that there has been little to no research in that domain yet.

6 Likes

I would keep an eye on StaticCompiler.jl:

Specifically, this PR: Basic rewrite for GPUCompiler by tshort · Pull Request #46 · tshort/StaticCompiler.jl · GitHub
And this write-up: Static compilation in Julia - Google Docs