I would definitely like to support GPU, but I have no experience with it. So contributions or discussions in this direction are highly appreciated.
Thanks, I'll think about how that might work. We have a physics simulation that we plan to parallelize on the GPU, and it could at the same time benefit from moving to an ECS paradigm. But how that would work is not yet fully clear in my head.
Hi,
I am following this discussion eagerly, as I have a possibly naive question. I was recently profiling Julog.jl, because it is an essential part of the planning ecosystem in Julia. Julog is notoriously type-unstable, but looking at the definition of the structures used within the engine (Julog.jl/src/structs.jl at master · ztangent/Julog.jl · GitHub), there are relatively few types. I therefore wonder if there is any hope of using an ECS for this. I think the main difficulty would stem from the fact that predicates can have an arbitrary number of arguments, but possibly, if the maximum number is known in advance, this can be worked out.
Thanks in advance for your thoughts.
Components and systems can be on the GPU; there is no restriction. Are you thinking about the underlying storage, and therefore the query logic, being on the GPU as well? There seems to be very little work on putting the entire ECS execution on the GPU.
Yes, the idea would be to run per-entity system loops (parallelized) on the GPU, and also a non-per-entity (n-body interaction) system.
I don't know Julog.jl at all, so it is hard to say anything in this regard.
Ark.jl v0.2.0 will come with configurable storage backends per component type, currently Vector and StructArray(-like). It can already be tested on the main branch. So it is probably possible to move the entire ECS data, or the data for selected components, to the GPU.
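To give an idea of what that could look like, here is a minimal sketch borrowing the Component => Storage{...} pair syntax that appears in the GPU example further down in this thread; the backend type names VectorStorage and StructArrayStorage are placeholders for illustration, not the actual Ark.jl identifiers:

using Ark

struct Position
    x::Float64
    y::Float64
end

struct Mass
    m::Float64
end

# Placeholder backend names, for illustration only: one component stored in a
# plain Vector, the other in a StructArray-like (struct-of-arrays) layout.
world = World(
    Position => Storage{StructArrayStorage},
    Mass => Storage{VectorStorage},
)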
Oh, that sounds great!
Ark.jl v0.2.0 is out! See the release announcement.
@simsurace Just wanted to let you know that entity relationships are now fully functional and available on the main branch. It will take some more time until we release it officially, but if you want to give it a try, we would love to hear your feedback!
Awesome, can't test right now, but sometime in the coming weeks I'll try converting some of our logistics stuff to use relations (delivery items in vehicles, vehicles in depots, etc.) and see how it goes.
@oschulz I would like to let you know that I'm trying to add GPU support to Ark.jl in this PR: Implement GPUVector by Tortar · Pull Request #470 · ark-ecs/Ark.jl · GitHub. If anyone spots a possible improvement for an agnostic hybrid container which runs most things on the CPU but can also offload to the GPU (I guess it may be possible to improve the back-ends separately with custom stuff, which can go in extensions), or wants to comment on the interface, it would be very much appreciated. For now, the benchmarks look good!
using CUDA
using Ark

struct Position
    x::Float32
    y::Float32
end

struct Velocity
    dx::Float32
    dy::Float32
end

# CUDA kernel: grid-stride loop over all entities in the query result.
function update!(positions, velocities)
    index = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = blockDim().x * gridDim().x
    @inbounds for i in index:stride:length(positions)
        pos = positions[i]
        vel = velocities[i]
        positions[i] = Position(pos.x + sin(vel.dx), pos.y + cos(vel.dy))
    end
    return
end

function run_world_gpu()
    # Both components use the GPU-backed storage from the PR.
    world = World(
        Position => Storage{GPUVector{CuVector}},
        Velocity => Storage{GPUVector{CuVector}},
    )
    for i in 1:10^6
        new_entity!(world, (Position(i, i * 2), Velocity(i, i)))
    end
    for _ in 1:1000
        for (entities, positions, velocities) in Query(world, (Position, Velocity))
            gpu_pos = gpuview(positions)
            gpu_vel = gpuview(velocities)
            blocks = cld(length(gpu_pos), 256)
            @cuda threads=256 blocks=blocks update!(gpu_pos, gpu_vel)
        end
    end
    return world
end

function run_world_cpu()
    world = World(Position, Velocity)
    for i in 1:10^6
        new_entity!(world, (Position(i, i * 2), Velocity(i, i)))
    end
    for _ in 1:1000
        for (entities, positions, velocities) in Query(world, (Position, Velocity))
            Threads.@threads for i in eachindex(entities)
                @inbounds pos = positions[i]
                @inbounds vel = velocities[i]
                @inbounds positions[i] = Position(pos.x + sin(vel.dx), pos.y + cos(vel.dy))
            end
        end
    end
    return world
end
which gives:
julia> # AMD Ryzen 5 5600H
       @time run_world_cpu() # 1 core
  7.373623 seconds (7.53 k allocations: 141.863 MiB, 3.06% gc time)

julia> @time run_world_cpu() # 6 cores
  1.576263 seconds (32.53 k allocations: 143.663 MiB, 1.89% gc time)

julia> # NVIDIA GeForce GTX 1650
       @time run_world_gpu()
  0.240809 seconds (19.61 k allocations: 141.952 MiB, 42.24% gc time)
Oh, nice, I'll definitely have to try it out!
So, I finally got around to trying entity relationships. So far they seem really easy to use.
I wonder whether the order of entities returned by a query with relations is guaranteed to be insertion order. That would greatly simplify some things.
In my use case, I use a relation to express that an item is stored in another item (a vehicle). When checking for items to drop off, if they are loaded in order, I only need to check the first n; once I reach the first item that is not deliverable, I can stop checking the rest. Up to now, I have used a manual storage component. It would be nice to use entity relationships for this and not have to duplicate the information in a storage container, but that requires the queried entities to be returned in the order they were inserted (i.e. the order in which their relations were set).
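To make the pattern concrete, roughly what I mean (items_in and deliverable are hypothetical helpers standing in for the relation query, not Ark.jl API):

# Early-exit drop-off check, assuming items come back in loading order.
function dropoff_candidates(vehicle, stop)
    items = items_in(vehicle)            # hypothetical: loaded items, in order
    candidates = similar(items, 0)
    for item in items
        deliverable(item, stop) || break # first non-deliverable item: stop
        push!(candidates, item)
    end
    return candidates
end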
Any thoughts?
@simsurace No, such an ordering can't be guaranteed. The reason is that swap-remove is used for efficiency when entities are removed or moved between archetypes (as well as between "relation tables"). The order will only be preserved as long as no entities are ever removed from an archetype, except for the last one added (this also applies to moves, i.e. adding or removing components or changing relation targets).
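In case it is unclear what swap-remove does, here is a generic sketch (not the actual Ark.jl internals): the hole left by a removed element is filled with the last element, which is O(1) but scrambles insertion order.

# Generic swap-remove on a Vector: O(1), but does not preserve order.
function swap_remove!(v::Vector, i::Integer)
    v[i] = v[end]   # move the last element into the hole
    pop!(v)         # drop the now-duplicated tail element
    return v
end

v = [10, 20, 30, 40]
swap_remove!(v, 2)   # -> [10, 40, 30]; 40 is no longer last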
Even if, as @mlange-42 says, the order can't be guaranteed with the standard storages, you can use a custom storage type with this property if you need to (a feature still only in the dev version of the package). A mutable linked list could be particularly suitable, I think, though a new storage should be a subtype of AbstractVector, so one would wrap the linked list and implement the necessary methods: the ones listed here https://ark-ecs.github.io/Ark.jl/dev/manual/components.html#new-component-storages
Actually, I think a mutable linked list is not the best structure, and in any case, to preserve insertion order we would need an API to change the removal strategy for a new storage. I'm not really sure it is worth it, though. A good structure for this could be a vector with elements of type Union{Nothing, T}, so that removals just mark entries as nothing until too many removals have accumulated, and then a batch removal compacts the vector. Clearly, iteration speed would be reduced, though.
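A rough sketch of that idea, as an AbstractVector subtype in the spirit of the custom storages mentioned above (names and details are purely illustrative, not an actual Ark.jl storage, and the Ark-specific methods from the linked docs page are left out):

# Illustrative tombstone-based storage: preserves the insertion order of the
# surviving elements, at the cost of holes during iteration.
mutable struct TombstoneVector{T} <: AbstractVector{Union{Nothing,T}}
    data::Vector{Union{Nothing,T}}
    holes::Int
end
TombstoneVector{T}() where {T} = TombstoneVector{T}(Union{Nothing,T}[], 0)

Base.size(v::TombstoneVector) = size(v.data)
Base.getindex(v::TombstoneVector, i::Int) = v.data[i]
Base.setindex!(v::TombstoneVector, x, i::Int) = (v.data[i] = x)
Base.push!(v::TombstoneVector, x) = (push!(v.data, x); v)

# Remove by marking as nothing; compact once more than a quarter of the
# slots are holes. Compaction keeps the relative order of survivors.
function mark_remove!(v::TombstoneVector, i::Int)
    v.data[i] = nothing
    v.holes += 1
    if v.holes > length(v.data) ÷ 4
        filter!(!isnothing, v.data)
        v.holes = 0
    end
    return v
end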
If you want to support GPU backends, I would highly recommend using KernelAbstractions.jl, as it allows you to write your kernels in a non-vendor-specific way. It's an easy way to support AMD, Intel, Metal, etc. for "free" (i.e. no vendor-specific code).
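For example, the update! kernel from the benchmark above could look roughly like this with KernelAbstractions.jl (an untested sketch; it reuses the Position struct and the gpuview buffers from the earlier snippet, only the kernel and the launch change):

using KernelAbstractions

# Same per-entity update as before, but vendor-agnostic.
@kernel function update_ka!(positions, @Const(velocities))
    i = @index(Global, Linear)
    pos = positions[i]
    vel = velocities[i]
    positions[i] = Position(pos.x + sin(vel.dx), pos.y + cos(vel.dy))
end

function launch_update!(gpu_pos, gpu_vel)
    backend = KernelAbstractions.get_backend(gpu_pos)  # CUDABackend, ROCBackend, ...
    update_ka!(backend, 256)(gpu_pos, gpu_vel; ndrange = length(gpu_pos))
    KernelAbstractions.synchronize(backend)
end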
My first thought was a priority queue rather than a linked list. But I'm not sure it is worth it; having custom components is probably more versatile for cases like this. But it is a lot of overhead compared to the relations, so the latter are definitely nice when order is not important.
Yes, it's already implemented in a backend-agnostic way: e.g. in this section https://ark-ecs.github.io/Ark.jl/dev/manual/components.html#component-storage you can see the example using KernelAbstractions.jl.