I want to introduce v0.1 of ReactiveGraphs.jl. You can find the documentation here.
The purpose of this package is to facilitate the development of (currently single-threaded) data processing applications that consume multiple asynchronous data sources. I’ve noticed that building these kinds of data processing apps quickly becomes complex and involves a lot of boilerplate code. This repository aims to provide a minimal framework to simplify the task.
The main idea is to construct a computation graph that clearly defines the data dependencies between nodes. With the graph definition in place, we can achieve the following:
- Propagate data by triggering computations of the descendants in a topological order.
- Conditionally deactivate parts of the graph based on the value of specific nodes. For instance, if a data source is lost, paused, not initialized, or contains invalid data, we may need to deactivate certain descendants. The package includes tools to accomplish this, as it tends to be where most of the boilerplate code is required.
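To make the two ideas above concrete, here is a deliberately naive sketch in plain Julia. It is not the ReactiveGraphs.jl API. The names ToyGraph, ToyNode, add_node!, input! and push_input! are hypothetical, and "inactive" is simplified to a node holding `nothing`, which in turn deactivates every descendant. It also ignores performance and allocates on every update, whereas the package represents the graph by a type.

```julia
# Naive sketch for illustration only: NOT the ReactiveGraphs.jl API.
# A node holds its parent indices, the function computing its value,
# and its current value (`nothing` means "inactive").
mutable struct ToyNode
    parents::Vector{Int}
    f::Function
    value::Any
end

# Nodes are stored in insertion order. A node's parents must already exist
# when it is added, so insertion order is a topological order.
struct ToyGraph
    nodes::Vector{ToyNode}
end

function add_node!(g::ToyGraph, f, parents::Int...)
    @assert all(p -> 1 <= p <= length(g.nodes), parents) "parents must already exist"
    push!(g.nodes, ToyNode(Int[parents...], f, nothing))
    return length(g.nodes)   # index of the new node
end

# An input node has no parents; its value is only set by `push_input!`.
input!(g::ToyGraph) = add_node!(g, () -> nothing)

# Set input `i` to `x`, then recompute its descendants in topological order.
function push_input!(g::ToyGraph, i::Int, x)
    g.nodes[i].value = x
    touched = falses(length(g.nodes))
    touched[i] = true
    for j in (i + 1):length(g.nodes)
        node = g.nodes[j]
        any(p -> touched[p], node.parents) || continue   # not a descendant of `i`
        touched[j] = true
        args = [g.nodes[p].value for p in node.parents]
        # Simplified conditional deactivation: an inactive parent deactivates the node.
        node.value = any(isnothing, args) ? nothing : node.f(args...)
    end
    return g
end

g = ToyGraph(ToyNode[])
a = input!(g)
b = input!(g)
s = add_node!(g, +, a, b)          # s = a + b
d = add_node!(g, v -> 2v, s)       # d = 2s

push_input!(g, a, 1)               # b is not initialized yet, so s and d stay inactive
push_input!(g, b, 10)              # g.nodes[d].value == 22
push_input!(g, a, 5)               # only a's descendants are recomputed: g.nodes[d].value == 30
```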
Additionally, I believe this package can be beneficial in the following ways:
- Testing: an application can be broken into nodes that are tested independently of the rest of the graph.
- Clarifying the overall behavior of applications by constructing a computational graph.
The focus of this package is on low-latency processing. The library is fast and allocation-free. To achieve this, the graph definition is represented by a type rather than by an instance of a graph. This lets methods that update the state of the graph be dispatched on, and specialized for, the graph's structure.
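Here is a rough illustration of what "representing the graph by a type" can mean in Julia. It assumes nodes are stored in a tuple and each node's parent indices are a type parameter. Node, parents and evaluate are hypothetical names for this sketch, not ReactiveGraphs.jl internals. Because the graph's shape is part of the argument types, Julia compiles a specialized evaluate method for each graph. Whether a given call is allocation-free depends on the compiler and is not verified here.

```julia
# Illustrative sketch only: NOT how ReactiveGraphs.jl is implemented.
# The parent indices `P` of a node are a type parameter, so the graph's
# shape is encoded in the types of its nodes rather than in runtime data.
struct Node{F, P}
    f::F
end
Node(f, parents::Int...) = Node{typeof(f), parents}(f)

parents(::Node{F, P}) where {F, P} = P

# A graph is a tuple of nodes in topological order; `vals` holds the values
# computed so far (inputs first). Julia specializes `evaluate` on the
# concrete tuple types, i.e. on the graph shape.
evaluate(::Tuple{}, vals::Tuple) = vals
function evaluate(nodes::Tuple, vals::Tuple)
    node = first(nodes)
    args = map(p -> vals[p], parents(node))   # gather this node's parent values
    return evaluate(Base.tail(nodes), (vals..., node.f(args...)))
end

# Values 1 and 2 are inputs; value 3 = input 1 + input 2; value 4 = 2 * value 3.
graph = (Node(+, 1, 2), Node(v -> 2v, 3))
result = evaluate(graph, (1, 10))   # (1, 10, 11, 22)
```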
I am looking forward to receiving feedback on the package’s API, implementation, and your experiences with data processing applications. Thank you!
What is the difference from Dagger.jl? Both seem to process a computational graph.
They both require the user to divide one computation into smaller ones, but their objectives differ, I believe.
In the case of Dagger.jl, the aim is to utilize multiple processors to accelerate the entire computation. In this scenario, the computations should be substantial enough to offset the costs associated with scheduling and data movement.
On the other hand, with ReactiveGraphs.jl, the goal is not to speed up the computations but rather to structure the application in a different (and hopefully improved) manner. I anticipate that this package would assist in handling the asynchronicity of data sources, typically relying on the graph to manage conditional evaluation.
Overall, the use cases seem distinct: Dagger.jl is better suited to parallelizing a single large computation at a time, whereas ReactiveGraphs.jl is aimed at processing asynchronous data sources in a low-latency environment.
Interesting project! I was confused by the documentation starting with the second code block, because the dependencies of node_2 don’t seem to match what is in the figure at the beginning.
node_2 = map(node_1, input_1, input_2)
node_2 will depend on those arguments, no? But in the figure it looks like node_2 should depend only on input_2 (box B).
I wonder whether they could share the same API and just swap the backend. That would be fun. But to be honest, I am more interested in Dagger + automatic differentiation.
Nice package! How can I get node values?
Thanks! That’s an interesting question!
First of all, you can create a node that extracts its parent node’s value and stores it in a reference or vector (that’s how I test the package).
Technically, I could also provide efficient functions to access the values directly. However, I chose not to, because there shouldn’t be a need for it. This may be a subjective opinion, but here is my reasoning: the package is meant to simplify the handling of edge cases (such as uninitialized or invalid data) through the select methods, so by accessing node values “outside of the graph,” you may end up manually handling edge cases that could otherwise have been avoided. I therefore decided it was better not to facilitate this usage.
I would love to hear your thoughts on this. I can certainly provide accessor methods and (optionally) document that they should primarily be used for debugging, to make the package more user-friendly.
Hey @LeePhillips. The graph and the example are actually independent. I realize it is unclear. I will improve the documentation to avoid this confusion. Thank you!
I would also expect printing any node in the REPL to show its current value.
Very cool package! There’s definitely been an unfulfilled need for a DAG processor which supports streaming and also is subject to cross-vertex optimizations. In the future, when Dagger gains support for streaming and for DAG optimizations, we could look at “lowering” DAGs to ReactiveGraphs for better performance.