Anyone interested in coding a Julia ECS?

If you know me, you know that I often make lots of attempts at trying something, and fail at many of them, many times before I even got to coding because I realized it would be too big or I wouldn’t win or whatever.

Here goes another attempt!

If you believe or wish that Julia will be the best tool for making games, you are not alone. If so, let’s carve out the path to glory together.

Let’s start simple, with a humble thing, Entity Component System.
This ECS would derive its performance from the fused update function and, once it’s done, the LoopModel library.

Unfortunately, I still have some problems with my ideas, for example, how to delete or change the archetype of entities efficiently and so on. If you’re up to the challenge, I invite you.

If a team can be made such that the project has a chance of being viable, I’ll start coding. If it’s not, then I’ll bury this project for maybe a few years and perhaps dig it up when the circumstances have changed. You do not have to start coding right away. Once I get confirmed team member(s), I will start programming what I can as the frame and then you can add more features until we get a rich library. An ECS can be small (a few hundred LOCs) or really large (like flecs). The size of this project will depend on a number of factors like features supported, our capabilities, etc.

I may not be a very smart programmer here, but Julia gives me the best chance of making a game I want come true, so, I will push even with my puny strength.

P.S. I’m sorry for everyone who tries to support me. I may not be able to complete most things I set out to do, or had to cancel them for various reasons, but I have to keep trying.

6 Likes

Did you check GitHub - heyx3/Bplus.jl: A modern OpenGL 4.6 rendering framework, written in Julia.? Seems like they have implemented an ECS here: GitHub - heyx3/BplusTools.jl: Extra add-on systems for the Julia game framework B+, I’m not at all an expert in this field but maybe it could be a good starting point?

1 Like

Actually there are more:

what about contributing to one of the systems? My first impression is that Overseer.jl seems to be in good shape. Or do you see shortcomings in those implementations?

2 Likes

They’re good. However, Overseer is a sparse ECS, not archetypal ECS, which is what enables my optimization. The B+ ECS is explicitly (by the developer) not performance-focused. Mine, on the other hand, is so performance-conscious that I’ve yet to figure out many key features because I’ve yet to find a performant way to implement them, which is actually why I’m asking for help, to find ways to efficiently implement various features.

Anyway… let’s continue the discussion here, where it’s more relevant.

The issue is that if you just try to make it work without trying to make it fast at all, you could store every entity as a dictionary and store every update as a function that checks each element whether or not they have the right components. The reason you need a complex ECS is because you need it to be fast, otherwise, you could’ve stuck to OOP or sloppily implemented ECS.

This doesn’t sound right in my ears because fast doesn’t need to be complex necessarily. Perhaps just phrasing though.

I also agree that the order is not quite as static as presented. Of course you should strive to make a reasonable guess what architecture of your code would be able to solve your problem. A bit like a painter has a motive in mind when they start painting. But still you don’t need to the flowers in the top-level window of the fifth house in the background if you didn’t put down some sky color yet.
So what I want to say is this: These types of “mad stuff”-hyperoptimizations sound very much like they are way too much for reasonable performance. Try something simpler but reasonable first. Maybe just good ol’ runtime computation, maybe DynamicSumTypes.jl, maybe make a post and try to be as specific as possible (perhaps with some demo code even!) what the problem is you want to solve to get some more ideas. But then go and actually focus on getting something that you can run to evaluate! Perhaps you will find that your reasonable approach is godd enough already. More likely there some bugs in the implementation that need fixing first and afterwards you realize that performance is good enough. If it still isn’t and you have the numbers, then you can come back here with (a simplified version of) the code and the numbers and will find many folks willing to optimize the heck out of your code. You are not alone if you don’t want to be :slight_smile:

Okay, I’ll give you some explanation as to how I ended up here.

Imagine you want a game with as many entities as possible. (Players/modders want as many entities as possible, so there is no “enough” here, only as good as possible.)

Now, each entity can have types, status effects, etc. If you store all entities with different types in the same array, you get type instability. Maybe you could do some virtual function tricks, and that’s OOP. However, virtual function calls are expensive for things that are supposed to be like a single update, like “if on fire, deal 5 damage to the entity.” So, what’s the solution? Store entities with the same combination of traits in an archetype. Now, you can loop over the entities that have the same data structure. Really fast. You can also do SIMD updates and so on.

However, here comes another realization. If an entity is on fire and poisoned, for example, and each deal fixed 5 damage, you can combine them into a single update! This update fusion can also add performance.
Another issue is that the compiler can sometimes do SIMD optimization, however, the standard compiler isn’t quite good at that, so, you’ll need something like (the now deprecated) LoopVectorization to ensure vectorization.

And well, the entity updates are supposed to take the most time. Entity interactions/promotion/etc are sparse, so you need to update them less, but you’d still need to support them somehow, and I have yet to figure out how.

And another thing is that I’ll leave the update fusion task to the LLVM for now. However, one could realize that the rabbit hole goes pretty deep. For example, if one update is v → A1v and another update is v → A2v, where v is a vector and A1 and A2 are fixed matrices, you could also precompute (A2A1) to squeeze out performance. The rabbit hole goes pretty deep. (Perhaps I could add XLA backends or options to store some data in a CUarray (GPU) for performance?) But I’ll get to that later on. As of now, there are still a bunch of issues I’ve yet to figure out, which is why this project is still stuck. My goal is to provide the fastest ECS in order to lay a strong foundation for Julia game devs. To be honest, many game devs won’t need such speed either. Many games are bottlenecked by rendering and not entity simulation, however, for games that do need fast entity simulation (and perhaps even willing to compromise some graphics quality for), including the game I (and I believe many others)want (and currently doesn’t exist, nor will likely come out of any known major studio or indie developers out there), a fast ECS would be really valuable.

That’s not correct in general but depends on how you design your code. There are many ways around this.

Don’t think this is a reasonable approach. I think you forgot the combinatorial explosion. Say you have 10 traits and each has like 5 possible behaviors/values then you could potentiall generate 10^5 archetypes and for each archetype you’d need to compile specialized functions. This is a huge investment of RAM and also might not even be faster execution wise because the processor has a finite bandwidth for fetching instructions as well (and this competes with the memory bandwidth because the code lives in RAM just like your data).

Conversely, if you don’t comoile special version for each combinations then you just need 50 small functions. And I don’t know why you say that this approach needs to be slow. Dynamic dispatch is slow yes, but you don’t need that. Done right (e.g. with DynamicSumTypes.jl, FunctionWrappers.jl or similar approaches) your “dispatch” can be just a bunch of conditional branches. While not the cheapest instruction possible this is still rather fast (and can get essentially free if you manage to structure your things such that the branch predictor has few misses).

Yes, that’s correct. However, archetypal ECS operates under the principle that entities will not take on all possible combinations. For example, if you have 900 pixies, 2 orcs, and 1 dragon, assuming no status effect, then you’d have 3 archetypes. In the worst case, you will have as many archetypes as the entities. In practice, many entities will share the same archetype. Again, think of warriors fighting on a large battlefield or something along that line. Maybe some of them would be afflicted with status effects, but these are likely highly correlated.

Flecs is an archetypal ECS in C. It stores entities in archetypes. It doesn’t actually compile fused updates like my plan. (Though a manual fusion benchmark suggests that if possible, fusing updates would actually increase performance) Initially, I had the same apprehension as you do regarding combinatorial explosion, but in practice, Flecs works.

I had a brief look and IIUC then flecs uses these archetypes as an organizational tool for the data essentially. I does not compile specific versions of functions for every archetype. The linked blog about the archetypes explicitly mentions type erasure to make things more uniform.

I think that your plan of specializing on every archetype is bound for failure. The main reason being the huge overhead of actually compiling methods (because the compiler is slow). So you’d never want to compile new methods on the fly because that for sure would cause a lag spike. Which means you either need to figure out which archetypes can possible exists (likely impossible) or precompile every variant which is also impossible.
Additionally, I am not convinced that there is much optimization to be gained from having specialized variants for different trait combinations considering the additional resources it takes. You mention a couple of easy cases where you can save like a few instructions but a few instructions. But I think in general it will be very hard to figure out these optimizations reliably. It is also not quite necessary because it could be done by hand, if a certain combinations arises very frequently. Then the game programmer could profile their game, see that there is an optimization opportunity and introduce a new combined trait (i.e. “PoisonedAndBurning” which gets applied whenever something burning also gets poisoned or vice versa).
In summary, I think too much specialization does more harm than good. But I totally understand the appeal. From a theoretical perspective it seems so nice and clean just to offload everything to the compiler but in practice it is likely a bad idea.

I understand the issue. One problem is that you likely don’t know in advance which combination arises frequently before it happens. Your game engine might support various types of troops for example but you won’t necessarily know in advance what kind of loadout would be equipped before the game gets run or what kind of spells would be cast. Perhaps there would need to be a cost model where you only compile fused updates if a large number of entities are present. That being said, let’s say that the entities update at 60 fps and the archetype lives for 20 seconds, and there is only 1 such entity. Well, in such the worst case…
I’ve compiled a simple function using RuntimeGeneratedFunctions.jl and it cost about 5 μs, so, 5000 ns. On the other hand, adding two numbers cost me about 17 ns. If you could save 1 addition per update, that’s already worth it.

The real issue is that the function, when compiled, stays in the ram forever though, outliving its archetype.

However, this is the real issue that prevents my ECS from getting implemented. Unlike other ECSs, mine does not have an indexable ID, each entity is an anonymous entity. This means I have no easy way of expressing things like “These two entities interact with each other.”

@Tarny_GG_Channie : Which ECS Julia package did you end up using? Or did you just dev your own?

I’m taking a break from Julia to do some other projects. I think for Julia, I’ll make my own if the plan doesn’t change, but who knows what happens long-term?

As a disclaimer, I have no experience with Julia, so bear with me for using some wrong terminology. However, I have experience with archetype-based ECS and implemented Ark, which is probably the fastest one for Go.

Unfortunately, I am not allowed to post links yet, so I hope I can provide them in a second reply for convenience.

I think that your plan of specializing on every archetype is bound for failure. The main reason being the huge overhead of actually compiling methods (because the compiler is slow). So you’d never want to compile new methods on the fly because that for sure would cause a lag spike. Which means you either need to figure out which archetypes can possible exists (likely impossible) or precompile every variant which is also impossible.

@abraemer This seems based on a misconception of how (archetypal) ECS works, presumably coming from OOP habits. There is no dynamic dispatch required at all to make this work.

The key point here is that entities don’t “contain” components, but are just IDs. Components are stored in archetypes, which are “tables” with array-like columns, one for each component that is present in the archetype. See Ark’s user guide chapter on architecture for a graphical representation of the concept.

Finally, there are queries (or “systems”), which do the logic. A query iterates the matching archetypes, i.e. those that have the components the query is interested in. The query fetches (only!) the relevant component columns and either provides these columns to the user, or iterates over the entities (i.e. rows) of the archetype and provides the user access to the components at that row.

There is zero dynamic dispatch required for that, and no function specialization. Also, there is no runtime check per entity, as checking archetypes for their components is already sufficient.

This can already be super fast, particularly faster than “array of structs” (i.e., iterating an array of entity “objects”) because it is very cache friendly. Only the required data (i.e. component columns) is fetched, and components are contiguous in memory. My ECS Ark achieves 2ns per entity for the classic Position+Velocity example, i.e. fetch 2 components, and add X and Y of one (velocity) to the other (position). Julia could probably be do this faster due to more aggressive compiler optimizations compared to Go. Additionally, Juila may use SIMD…

Regarding combinatorial explosion: this is not really an issue, as it does just not occur in real use cases like games or simulations. Even with entity relationships (see Ark’s resp. user guide chapter), which create an archetype per relation target entity, this is not really a problem when not over-used mindlessly.

Some Julia-specific considerations for performance:

  • Components should be immutable, so that they actually end up stored in component arrays, and don’t escape to the heap as mutable types would. This means that after updates, components must be replaced. Not nice for the API, but well…
  • SIMD would be very useful to speed up further, but I have no idea (yet) how that could be realized in Julia
  • The “update fusion” optimization does probably not gain much, but could still be easily implemented by the user (i.e. not worth it to try to implement it in the ECS itself)

On the other hand, adding two numbers cost me about 17 ns. If you could save 1 addition per update, that’s already worth it.

:astonished_face: This should be below 1ns!

@Tarny_GG_Channie in case you still want to build an ECS, feel free to take Ark’s architecture as a basis. This was also the way to implement the high performance Mojo ECS Larecs.

I can also highly recommend reading the ECS blog post deries by Sander Mertens, the author of Flecs, on Medium.

Building a high-performance ECS is definitely something that can be done by a single developer in a few weeks, even far beyond the basic functionality (not speaking about something as sophisticated and feature-rich as Flecs though). I am considering doing this for Julia, following Ark’s architecture. In case I really start, I will let you know.

5 Likes

Links here:

  • Ark: (https)://github.com/mlange-42/ark
  • Ark’s architecture: (https)://mlange-42.github.io/ark/architecture/
  • Ark’s entity relationships: (https)://mlange-42.github.io/ark/relations/
  • Mojo ECS Larecs: (https)://github.com/samufi/larecs
  • ECS blog series by Sander Mertens: (https)://ajmmertens.medium.com/building-an-ecs-1-where-are-my-entities-and-components-63d07c7da742
4 Likes

I have a feeling that we could maybe try to join forces on this: I’m not sure if it would be possible to keep the current design we are working at in our packages, but the work @nhz2 and I are doing seems really related to ECS-style simulations, even though they were constructed for a different purpose. I would be interested in knowing if these packages could be extended to become full ECS (I’m actually guessing on the fact that everything can fit togheter), if interested, take a look at GitHub - Tortar/KeyedTables.jl: Tables with stable identifiers in Julia and GitHub - nhz2/UniqueIDs.jl: Manage IDs and indexes.

edit: actually I think that what we have in mind at the moment is a bit less structured than an ECS, but if I understand it correctly it should allow to make an ECS on top of it

Okay… I’ve seen a lot… and I still have a lot to learn… but it seems like I’m subjectively busy, as in, I got a lot of time on my hand, but no time to do these kinds of stuff because I think I don’t have the time. Or sometimes I just feel like it’s getting a bit difficult to finish so I stopped.

A bit ironic… yeah… but that’s the current situation. Sometimes I even contemplate if I should stay with Julia, go somewhere else, or even start a new programming language. So many ideas and not much capacity.

I‘m interested in this too. A port of Ark to Julia would be an interesting experiment. Some comments/questions:

  • I‘d be interested to hear your view on existing efforts like Overseer.jl and maybe ReactiveECS.jl
  • To get mutable semantics with immutables, you could use Accessors.jl under the hood. But wouldn‘t defining components and how they are transformed be something that happens in user space?
1 Like

Regarding Overseer and ReactiveECS, they both use architectures I am not convinced of for my typical use case, which are scientific agent-based/individual-based simulation models. Did not really look into the code details, but the docs give some insights.

Overseer states to be inspired by EnTT and uses sparse set ECS. This is architecture is optimized for adding and removing, but is way slower for iteration compared to archetypal ECS. Queries require a check for each entity whether it has the requested components, not per archetype as with archetypal ECS. Further, the iterated entities’ components are not densely packed in memory, so it is less cache-friendly. The performance difference that you can expect is at least one order of magnitude.

Regarding ReactiveECS, the docs are not that clear. However, the architcture docs here suggest that components aren’t densely packed (gaps for removed entities), presumably causing similar cache isues as sparse-set ECS. It also requires significantly more memory than archetypes, as all entities use up memory for all components, no matter if they really possess them. Further, my understanding of what they call “reactive” is that entities are somehow “dispatched” to systems on creation or changes. This introduces unnecessary overhead. It seems to me that this design ignores the advantages of archetypes, which make such dispatch unnecessary, as the querie’s checks are very cheap and only performed per archetype. Also, it seems to prevent ad-hoc queries, or at least would make them quite expensive.

The benchmarks further down the document are also not very convincing. I may understand them wrong (they are not really clearly explained), but numbers like 800ns for creating an entity with one component is horrendous. Ark does this in 30ns.

Regarding immutable components, sure, users would define them. But it would probably be possible somehow to check for immuatbility on component registration and panic on mutables. Or to just prompt users in the docs to use immutable components if this is not possible. I have not yet figured out how the query API for updating immutable components could look like, but there are certainly ways to do this ideomatically.

2 Likes