[ANN] Vahana.jl - Framework for large-scale agent-based models

I’m happy to announce the release of Vahana.jl, a framework for (not only) large-scale agent-based models with a focus on complex social networks.
There will also be a prerecorded talk at JuliaCon tomorrow.
The repository is here, and there is also extensive documentation, including tutorials, guides and API references.

14 Likes

Hey there @sfuerst ,

congratulations on the release! :tada: I’ve just briefly skimmed the documentation and it clearly shows that you have put a lot of effort into it. The three examples in the tutorial section, the tips sprinkled throughout the API section, and the paragraphs on performance tuning especially stand out.

Now of course I couldn’t read through all of it, so please bear with me if I’ve missed something obvious, but I’m wondering how Vahana.jl compares to Agents.jl. Why did you opt to design and build your own package instead of extending the Agents.jl functionality? Was there something that you fundamentally disagreed with regarding the architecture of Agents.jl? It seems as if an Agents.ABM model with an underlying network graph as a model property would achieve much of the same as Vahana.jl. Did you run into any issues for your type of application (econ/social networks) when using it? Looking forward to hearing your thoughts on this. (Maybe it would even be a good idea to put a short subsection on this into the docs so that users can more easily decide which ABM package best fits their use case and how they compare/differ?)

Anyways, great work and compliments for bringing it into v1.0 shape. :slight_smile:

6 Likes

The development of Vahana grew out of work with models simulating the entire population of Germany (with a reduced number of agents for performance reasons, but still with up to 20 million of them). Simulations of such models require parallelization of a single run, a requirement that Agents.jl does not meet. Existing frameworks for ABMs that also work in the HPC context are scarce and, from my point of view, very unsatisfactory (in general, but even more so for our applications that have a focus on social networks), so we decided that there was room for another framework here. My goal was to hide the complexity of parallel simulations as much as possible (take a look at the Repast HPC documentation and you will understand what I mean), at the price of some implementation constraints that follow from the restriction that the model must be specified as a Graph Dynamical System.

We then evaluated the options for implementing such an HPC framework and decided that Julia is probably the best choice (and even though I had my problems, especially with the quality of the documentation of some packages, I don’t regret the decision).

I hope this explanation makes it clear why Agents.jl was not an option for us, and I have to admit that I haven’t worked with Agents.jl enough to really work out the differences. But I do think that Agents.jl is a very good framework, and for many model types it is certainly a better choice than Vahana, e.g. Vahana only supports discrete space, and as the Predator-Prey model shows, models where the agents move in this space are implementable, but certainly not Vahana’s strength. Let’s not even talk about path-finding.

When it comes to implementing very network-centric models, I think Vahana has the edge :wink: . Especially because it very naturally supports different subgraphs, but also because the visualization is aimed exactly at networks.

But there is at least one different design decision. From the Agents.jl documentation: “when we have to make a choice between a simpler API or a more performant implementation, we tend to lean in favor of simplicity”. This is not the case for Vahana.jl: performance was very much in the foreground during development, and a lot of development time was invested in it. The various edge hints, for example, have to be taken into account in almost all places, be it in the output in the REPL, in the conversion to a DataFrame, or when saving in HDF5, and they exist only for performance reasons.

There are certainly models where performance does not play such a big role. In our case, however, where the models require dozens of core-h for each simulation, the situation is different.

This sounds interesting, I’ll try to have a look at the package when time allows! What is the current research project that you are using this tool for?

This has never actually happened though :smiley: maybe it’s time to remove this comment because, to the best of my knowledge, Agents.jl is the fastest general purpose open source ABM framework around while still being the simplest to learn, so this comment gives an incorrect vibe. Proof is here. Every single functionality in Agents.jl has been optimized to oblivion and beyond. Nevertheless, I have no doubt that specialized tools may significantly outperform Agents.jl.

Sure, but can you provide quantitatively concrete proof? How much faster are we talking about, 2x or 2000x? It makes a huge difference. We have GraphSpace as well, so hopefully there is some plot with the number of agents on the x axis and the performance on the y axis, with one curve for Agents.jl and one curve for Vahana.jl?

It is completely true that Agents.jl does not support in-model parallelization. It supports parallelizing the entire model run though, e.g., when doing parameter scans, so I am not so sure it is correct to say “it lacks parallelization”. In any realistic scientific context you will for sure be running a model several times, either changing parameters or simply changing the RNG seed for concrete statistics. So a performance comparison would only make sense if this feature of Agents.jl was included, and you ran a model 1000 times with a different seed each time. EDIT: Ah, but I now read in your message that in-model parallelization was extremely important for your application, so forget what I said above. (I would still like to know how much faster Vahana.jl is though!)

Regarding in-model parallelization, I am curious: how did you guys handle simultaneous write access to model properties? In several scenarios agents may modify model properties in-place, and this may lead to arbitrary kinds of race conditions. Do you forbid such in-place modifications of model-wide properties altogether?

More importantly, from what you have argued so far, nothing sounds like a blocker for making this work in connection with Agents.jl (while still being an independent package) instead of being something entirely isolated. We have declared a full Model API, and e.g., we have UnremovableABM that enforces the single rule that agents cannot be removed, yet gains big performance from that. You could have constructed a more specialized model type that enforces your network type structure and you could still allow people to use intuitive names like move_agent! etc. But I haven’t had a look yet at Vahana.jl so I am only speculating whether this would have been possible. I need to have a closer look!

I am very much looking forward to the JuliaCon talk!

2 Likes

The last project I was involved in before developing Vahana was an epidemiological simulation. The benchmarks I will show in tomorrow’s talk are a comparison between that model and my reimplementation.

My current project is about sustainable mobility (link goes to a German text), where we will implement a new version of a model that started my HPC adventure.

I really meant implementing the model, and not the performance. The only comparison between Agents.jl and Vahana.jl is the implementation of the Hegselmann-Krause Model, where the Vahana implementation is slightly faster.

Yes, of course. But e.g. even for the “small” Berlin scenario of MATSim Episim the memory usage allowed starting only 32 simulations on a node with 384 GB RAM and 96 cores, so 1/3 of the node was not used. And for the simulation of all of Germany, the entire memory is used by one simulation.

Like a cellular automaton. All agents update their state in parallel and the new state is not available to the other agents until the next call to a transition function. Of course, there are many models that cannot be implemented this way or require an additional layer of complexity (e.g., in the Schelling model, one could add a coordinator to be contacted by all agents who want to move and who then assigns them a new position. However, since this coordinator then runs serially, the performance of a parallel simulation would probably not be very good, at least with a higher number of processes).
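The synchronous update described above can be pictured with a small plain-Julia sketch. It uses no Vahana code; `sync_step` and the `halfway` rule are hypothetical names chosen only for illustration, and the real framework distributes this work across processes.

```julia
# Synchronous ("cellular automaton" style) update: every agent reads the
# old states and writes its new state into a separate buffer.
function sync_step(rule, read::Vector{T}, neighbors::Vector{Vector{Int}}) where {T}
    write = similar(read)
    for i in eachindex(read)
        # Only `read` is visible here, so the iteration order is irrelevant
        # and the loop body could run on different processes.
        write[i] = rule(read[i], read[neighbors[i]])
    end
    return write  # becomes the `read` buffer of the next transition
end

# Example rule (an assumption for the demo): move halfway towards the
# mean state of the neighbors.
halfway(own, others) = isempty(others) ? own : (own + sum(others) / length(others)) / 2

states = [0.0, 1.0, 4.0]
ring = [[3, 2], [1, 3], [2, 1]]   # each agent sees the other two
next = sync_step(halfway, states, ring)
```

Because `states` is never modified, no agent can observe a half-updated neighbor, which is the property that makes the step safe to parallelize.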

I don’t think that this is possible. But if you still see a way after watching the talk and having a closer look, I am happy to discuss your idea.

I expected something like that. I used to work at the MPIDS for a long time and was always in contact with the Ecobus teams. I’ll let them know of your software, maybe they are interested!

Sure, but in what sense? If I look at the same implementation of Hegselmann-Krause in Agents.jl and in Vahana.jl, the Agents.jl one appears “simpler” to me: it requires defining fewer structs, and requires giving fewer instructions to the software. In what sense is the Vahana.jl implementation advantageous here?

Hm, okay. Cellular automata can be limiting when it comes to complex ABMs that cross-update properties, either global ones or neighboring agents’ ones. Even the trivial Schelling model cannot be implemented in such a way. Each position may host at most one agent, so if you updated all agents in parallel, and all agents pick new random “empty” positions, it is statistically guaranteed that many agents will pick the same new location.
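As a back-of-the-envelope check of the collision argument, here is a minimal sketch. It assumes that `n` movers each pick one of `m` empty cells independently and uniformly at random, and that only one mover per cell can be accepted; `expected_rejected` is a hypothetical helper name, not part of either package.

```julia
# Expected number of movers that lose out when `n` agents independently pick
# one of `m` empty cells uniformly at random and each cell accepts only one.
function expected_rejected(n::Integer, m::Integer)
    occupied = m * (1 - (1 - 1 / m)^n)  # expected number of distinct cells chosen
    return n - occupied
end

expected_rejected(100, 1000)   # roughly 4.8 of 100 movers collide
expected_rejected(1000, 1000)  # roughly 368 of 1000 movers collide
```

Even when empty cells outnumber movers ten to one, some conflicts are expected in every step, so a purely parallel Schelling update needs some form of conflict resolution.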

However, I think you can make the fair assumption that in your application scenarios a position may hold an arbitrarily large number of agents, in which case this limitation isn’t really a limitation.

The framework looks very interesting! thank you @sfuerst

Indeed, I think that the memory usage is the crux of the problem when running multiple simulations at once (for very memory-intensive models). From what I understand though, if you compare independent runs in parallel vs. one run in parallel, it seems reasonable to say that running simulations independently should actually be faster, not slower (if you can utilize all cores in both situations), simply because there is no between-core communication overhead.

With all due respect to Agents.jl, I disagree here. You have an implementation of the HK model in the Agents.jl example Zoo here. Since the version in the Vahana documentation does not include the termination condition, I created a version with that feature here. The Vahana version has fewer lines of code, fewer functions, fewer instructions (e.g., no need to handle and copy the old and new opinions), and is, in my opinion, easier to understand (but, of course, all developers of a framework have a strong bias here). And most importantly, the Vahana version immediately supports other graph structures, while the Agents.jl version only works for the full graph.
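To make the point about graph structures concrete, here is a minimal plain-Julia sketch of one synchronous Hegselmann-Krause step. It is neither the Vahana nor the Agents.jl implementation linked above; `hk_step` and the adjacency-list representation are assumptions made for illustration only.

```julia
# One synchronous Hegselmann-Krause step on an arbitrary graph.
# `neighbors[i]` lists the agents whose opinions agent i can see.
function hk_step(opinions::Vector{Float64}, neighbors::Vector{Vector{Int}}, ε::Float64)
    return map(eachindex(opinions)) do i
        own = opinions[i]
        # Keep only opinions within the confidence bound ε; the own opinion always counts.
        accepted = [opinions[j] for j in neighbors[i] if abs(opinions[j] - own) < ε]
        (own + sum(accepted; init = 0.0)) / (1 + length(accepted))
    end
end

opinions = [0.0, 0.1, 0.15]
full = [[2, 3], [1, 3], [1, 2]]   # complete graph
line = [[2], [1, 3], [2]]         # path graph 1 - 2 - 3

on_full = hk_step(opinions, full, 0.2)
on_line = hk_step(opinions, line, 0.2)
```

Only the adjacency lists change between the two calls; the update rule itself is written from the point of view of a single vertex and does not care about the overall topology.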

However, what I also mean when I talk about the HK model being an example of what might be easier to implement in Vahana than in Agents.jl are functions that can support the development process, like create_graphplot, which creates an interactive GraphMakie plot, or show_agent, which can e.g. also show the state of the agent’s neighbors. Unfortunately, my presentation on Vahana, which I hope will clarify all of this a little, was not streamed today.

But to avoid some confusion: The graph in Vahana is not comparable to a GraphSpace in Agents.jl, in Vahana the agents are the nodes of the graph. In Vahana there is more the philosophy of “Thinking like a vertex” from vertex-centric graph processing tools like Pregel. And I don’t consider Vahana to be a general purpose framework, even though you can implement significantly more models than you probably think if you read through the Wikipedia entry on Graph Dynamical Systems. I really don’t want to start a competition about which framework is the better one, I think the typical use-cases are actually quite different.

1 Like

FWIW, my team is looking at ABM disease modeling. We need to set up many subgraphs of familial, work, and accidental encounters between agents. The GraphSpace in Agents.jl doesn’t meet the need because the nodes represent locations that many agents can occupy, while what we want to represent is nodes as agents and edges as relationships between individuals. We want to build and tear down graphs quickly between steps.

I read through the docs and examples, but I didn’t see anything that really met our use case. We’ve been talking over shoehorning our design into Agents.jl or rolling our own thing, but I’m hopeful that Vahana might actually be a good fit.

I could be totally wrong and there is a great way to model this in Agents.jl, but the docs didn’t help me figure that out!

1 Like

Yes, certainly. Vahana even detects whether the simulation runs in parallel and compiles optimized code when it does not, with the consequence that the scaling from 1 to 2 processes is most of the time worse than the subsequent scaling from 2 to 4 processes. Vahana also optimizes for the case that only one node is used, and accesses the state of agents on other processes via shared memory (if possible, i.e. if the agents are on the same node), but e.g. the creation of new edges of the graph generates overhead, since it must then be checked which process is responsible for managing the edge.

But even independent of this, it can still be very pleasant to be able to fully utilize the processor during the development phase of larger models. For example, loading data via parallel HDF5 with many processes is much faster than if only one has to take care of it.

1 Like

Your assessment is correct. To do that in Agents.jl you would make the graph a model property.

I am sure that Vahana.jl is more suited for these applications and likely to be faster. So your team is making the correct choice.

But perhaps we can have the best of both worlds: a model type that still exists in the Agents.jl API while being the network-centric framework that you need. This way, if there are research topics that require a more general ABM approach (or e.g., you want to model explicit movement in a city with our open street map space), it doesn’t take such a huge investment from the scientist to switch frameworks.

Once I see the talk I’ll see if this integration is possible at all (please post the link here? I haven’t found it online yet, but I went through the JuliaCon talk schedule!)

CU!

George

1 Like

Do not worry at all. Competition is healthy and an opportunity for everyone to improve. It is nevertheless clear that Vahana.jl has a user audience with a specific graph-oriented application scenario in mind and is surely more suited for that. Still, comparisons should be done, as that’s a major avenue for improving.

Agents.jl is very poor in parallelization in anything else beyond parallelizing the entire model run. We can gain a lot from learning what others parallelize and how. Some things are just de-facto not parallelizable in Agents.jl, but some others may be.

1 Like

In case you didn’t see this above, I implemented an epidemiological model using Vahana. The source code documentation of that model is not really good, so if you have questions feel free to contact me.

1 Like

Unfortunately, the organizers of JuliaCon have changed the handling of the Online Only Talks at short notice, and will now show them in one piece tomorrow at JuliaCon in room 32-082, but without streaming this live as well. This last-minute change also broke the links to the presentations themselves. At some point the video should be made available on YouTube, when it is, I will post the link here.

4 Likes

The video is available at Vahana.jl - A framework for large-scale agent-based models - YouTube - don’t know if this is definitive.

1 Like

Thank you for the link. I did not know it was already uploaded.

1 Like

I have now watched the Vahana.jl talk twice. Congratulations on finishing this project! How many people worked on this? Was it only you @sfuerst ?

I have some comments/questions/etc:

  1. Is it really true that the agent types (i.e., Vertex types) can only be bits types? I don’t remember exactly, but doesn’t this mean that agent fields can only be numbers? What made this restriction necessary?
  2. A crucial/core difference between Agents.jl and Vahana.jl is the representation of interactions. In Agents.jl this is specified as an arbitrary user-defined function which may access nearby_agents and facilitate a hand-coded interaction, but also may do any other interaction with the model space or model global variables. In Vahana.jl it is specified in three ways: first, by defining an edge type and connecting two agents with this edge; second, by attaching a “transition” (function) to the edge type. This is arguably the most core difference, and I’ll return to it later. Third, by applying the transition function to the simulation, and specifying agent dispatch (this is similar in Agents.jl to giving the agent_step! function to run! and adding native-Julia multiple dispatch, agent_step!(agent::Seller, ..)).
  3. A limitation of this Vahana.jl approach of transition functions appears to be its 4th argument: you need to decide a priori what data an agent should access. If this information is dynamic and changes during the simulation, then I guess you need to specify every single thing in your model, which may (a) negatively impact performance or (b) become tedious if you have in total 10 different agent/edge types.
  4. I did not fully understand the separation Vahana makes between agent id and agent state (in the video, when presenting the calc_demand function). In Agents.jl an agent has an id, and any other number of “state variables” such as money or name or position or whatever. What is actually an agent in Vahana? I thought it was the vertex, and that vertex carries various metadata like its id, and other state variables. I think this point can be explained better: what is an agent, a vertex, an id, and a state variable.
  5. In the video it is stated that an agent may “expand” their state space by adding self-referential edges. I don’t understand this. Edges were presented as accounting for interactions across agents. Is the case here that we create a new edge type that also has some metadata fields and create an edge from the agent to the agent? But how would a neighboring agent access this sort of information? Can an agent access the edges of other agents? Can a Seller obtain the edges of a Buyer, even if the edge type of buyers was not specified during the apply! step?
  6. The code implementation part seems very well thought out.
  7. Doesn’t the synchronous update of all agents imply that all agent-related data have to have a 2nd copy (the current and “next” version)?
  8. I have to admit that the spatial component of the framework appears very complicated. I am inclined to believe that for single-core execution Agents.jl would be much faster for spatial models due to the significantly smaller amount of data and computations necessary to handle spatial movement and neighborhoods. Additionally, spatial movement and neighborhood searches are the component of Agents.jl that is the most optimized, and it has been optimized over 4 major version increments of Agents.jl. v1 was a graph-based version like the one in Vahana.jl and from v2 onwards we moved to using arrays. I would be very interested to see data that either prove or disprove this when it comes to the performance of spatial simulations. A run of the wolf-sheep-grass model should be enough. If I am correct about this, Vahana.jl probably stands to gain by using the GridSpace of Agents.jl as the management for spatial simulations: agent ids are stored in vectors of integers, one vector in each cell of the raster. Then an optimized search function based on a custom iterator efficiently iterates over all vectors in all neighboring cells. To my understanding, it would be possible to use this spatial handling instead of the network-based one.
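The grid layout described in the last point can be sketched in a few lines of plain Julia. This is not the Agents.jl GridSpace implementation; `CellGrid`, `add_id!` and `nearby_ids_sketch` are hypothetical names, and the sketch assumes non-periodic boundaries and a square (Chebyshev) neighborhood.

```julia
# Grid storage as described above: each cell holds a vector of agent ids.
struct CellGrid
    cells::Matrix{Vector{Int}}
end

CellGrid(rows::Integer, cols::Integer) = CellGrid([Int[] for i in 1:rows, j in 1:cols])

add_id!(grid::CellGrid, id::Integer, pos::Tuple{Int,Int}) = push!(grid.cells[pos...], id)

# Lazily iterate over all ids within Chebyshev distance `r` of `pos`
# (non-periodic boundaries), without allocating a result vector.
function nearby_ids_sketch(grid::CellGrid, pos::Tuple{Int,Int}, r::Integer = 1)
    rows, cols = size(grid.cells)
    is = max(1, pos[1] - r):min(rows, pos[1] + r)
    js = max(1, pos[2] - r):min(cols, pos[2] + r)
    return Iterators.flatten(grid.cells[i, j] for i in is, j in js)
end

grid = CellGrid(3, 3)
add_id!(grid, 1, (1, 1))
add_id!(grid, 2, (2, 2))
add_id!(grid, 3, (3, 3))
add_id!(grid, 4, (2, 2))
near_corner = sort(collect(nearby_ids_sketch(grid, (1, 1))))
```

A neighborhood query here touches only the handful of cells around `pos`, with no edges to store or traverse, which is the reason given above for expecting array-based grids to be cheaper for agents that move.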

At the moment it is indeed the case that there is no straightforward way to connect Vahana.jl and Agents.jl due to the way agent interactions are specified. The remaining parts, however, could in my view have been done as part of both projects. I.e., I think it would have been possible to create a VahanaModel as a version of an AgentBasedModel, and to have separated Agents.jl into an AgentsBase.jl that only declares and exports an ABM API. This way, the same function names like move_agent! and add_agent! and replicate! etc. could have been used. The problem is: it may be too late now or require too much work. If we had had this discussion at the start of the project, I am confident that we could have had a more similar interface (also necessitating changes in Agents.jl of course).

2 Likes

Oh, also, where does the name Vahana come from and what does it mean?

This is not easy to answer. Vahana has evolved over time (more than 5 years, see e.g. Working Paper) in various projects with many fruitful discussions and also help, especially in HPC. When I use the word “we” in the talk and in the postings, it is not a pluralis majestatis. But the Julia implementation is completely written by me.

Yes, the agents are the vertices of the graph, the agent type defines the label/metadata of the agent/vertex like the age of the agent. Ids are temporary pointers to a vertex. By temporary I mean that the id of the same agent can be different at different times. And at different times the same id can point to different agents. This is because the id carries the information on which process and at which index of the vertices vector the agent state is stored.
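To illustrate how an id can carry both the process and the index, here is a minimal sketch with a hypothetical bit layout. The split into 40 index bits and the helper names `make_id`, `rank_of` and `index_of` are assumptions made for this example; the layout Vahana actually uses may differ.

```julia
# Hypothetical id layout: upper bits hold the process rank, lower bits the
# index into that process's vertex vector. The real layout may differ.
const INDEX_BITS = 40
const INDEX_MASK = (UInt64(1) << INDEX_BITS) - 1

make_id(rank::Integer, index::Integer) = (UInt64(rank) << INDEX_BITS) | UInt64(index)
rank_of(id::UInt64) = Int(id >> INDEX_BITS)
index_of(id::UInt64) = Int(id & INDEX_MASK)

id = make_id(3, 17)      # agent stored on process 3 at index 17
moved = make_id(5, 2)    # the same agent after being moved to process 5, index 2
```

With such an encoding, looking up an agent needs no global table, but an id kept from an earlier point in time may no longer refer to the same agent once agents have been redistributed.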

Perhaps one comment before I give the answer: It is a design decision that a Vahana model is always parallelizable, even if the model developer may not plan to use this feature. That is, even if it might be possible to bypass limitations of Vahana for a single threaded run, we deliberately do not offer this option. That this is a design decision, which one does not necessarily have to share, is completely clear to me.

Now to the bits type. This restriction results mainly from the use of the Message Passing Interface (MPI) for the parallel calculations. A bits type is immutable and contains no references to other values; only primitive types and other bits types are allowed as fields. As no references are allowed, I can move an instance of a bits type to another process and can be sure that I do not “forget” necessary information when I am doing this. The restriction to primitive types and other bits types gives me a fixed (C-compatible) memory layout for those instances, which allows me to move the instances to other processes as memory blocks without any serialization.
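The restriction can be checked directly with `isbitstype` from Base. The struct names below are made up for this example, and the byte round trip only stands in for what a message-passing layer would do with the memory block; it is not Vahana's transfer code.

```julia
struct Person            # immutable, only primitive fields
    age::Int64
    opinion::Float64
end

struct NamedPerson       # String is a reference, so this is not a bits type
    name::String
    age::Int64
end

mutable struct MutablePerson
    age::Int64
end

isbitstype(Person)         # true
isbitstype(NamedPerson)    # false
isbitstype(MutablePerson)  # false

# A bits type has a fixed memory layout, so instances can be viewed as raw bytes
# and restored on the other side without any serialization step.
bytes = collect(reinterpret(UInt8, [Person(42, 0.5)]))
restored = reinterpret(Person, bytes)[1]
```

The 16 bytes in `bytes` are all that is needed to reconstruct the `Person`, which is not the case for a struct holding a `String` or a `Vector`.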

The restriction also sounds stricter than it actually is. E.g. in almost all example models I found in Agents.jl, agent types are also bits types. It would be nice to be able to use strings, which in principle works with InlineStrings.jl, since composite types with InlineStrings are still bits types. But the current HDF5 implementation can’t write InlineStrings. I hope to find some time in the future to solve this problem.

The main problem I see with bits types is that in ABMs vectors are sometimes needed, e.g. for agents to remember previous decisions. Which leads to the next question:

One way to use edges is for interactions. But they can also be used for simulating vectors. Create a new edge type Foo with the fields that you want to store in the vector. Give this edge type the :IgnoreFrom hint. Create edges to the agent that needs this information in its agent state (I usually do this as self-referential edges, but as the source of the edge is not stored when the :IgnoreFrom hint is set, this is not important). Then the agent can access this vector by calling edgestates(sim, id, Foo), which is more or less the same as sim.Foo.read[id], where depending on the :SingleType hint, read is a Dict or a Vector.
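The idea can be pictured in plain Julia without any Vahana calls. In this sketch the `Decision` type, the `memory` dictionary and the helpers `remember!` and `recall` are all hypothetical stand-ins: the dictionary plays the role of the per-edge-type storage keyed by the target agent, and the source of each entry is simply not recorded.

```julia
# Plain-Julia picture of "edges as a vector": the edge state is stored under the
# target agent only, and the source is dropped (what :IgnoreFrom is described as doing).
struct Decision          # hypothetical edge type holding one remembered decision
    step::Int
    bought::Bool
end

memory = Dict{Int,Vector{Decision}}()   # stands in for the per-type edge storage

# Adding a self-referential "edge" just appends to the target's vector.
remember!(store, target::Int, d::Decision) = push!(get!(Vector{Decision}, store, target), d)

# Stand-in for the lookup by target id; an agent without entries gets an empty vector.
recall(store, target::Int) = get(store, target, Decision[])

remember!(memory, 7, Decision(1, true))
remember!(memory, 7, Decision(2, false))
history = recall(memory, 7)
```

Since `Decision` is itself a bits type, each entry can be moved between processes in the same way as agent state, while the agent type stays free of vector fields.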

No, edges can only be accessed by the target node/agent of the edge. For the “extended” state space, however, this is not a problem at all, since in this case the agent creates edges with itself as the target.

Do you have an example of such a dynamic simulation? But yes, in a parallel run this would not be ideal, since a lot of agent/edge states have to be transferred between transition functions, just on suspicion that something might have changed.

Yes.

Yes, definitely. I compared my Predator/Prey model implementation with the NetLogo performance of the same model some time ago and it was about the same, which is of course catastrophically bad :wink: We added a spatial component to Vahana mainly for two reasons:

  • We want to access GIS (raster) data and aggregate results for these cells
  • We want to create social networks dependent on the spatial distance between agents (but this is only done a single time in a simulation)

From Wikipedia: “Vahana (Sanskrit: वाहन, Vāhanam or animal vehicle, literally “that which carries, that which pulls”) denotes the being, typically an animal or mythical, a particular Hindu God is said to use as a vehicle.” I liked the “that which carries, that which pulls” as a description for a framework, and the “vehicle” element fits our research context. And Agent.jl was already taken :wink:

1 Like