The problem is that even when a tool is available, it’s too barebones. For example, if you want to use Kafka, there is RDKafka.jl, which technically covers the bare minimum (the ability to read from and write to Kafka), but for a proper data pipeline you need a lot more than the minimum. You need ways to guarantee exactly-once semantics (such as transactions, or some fine-grained control over consumer commits), which you can implement with those primitives, but it’s added effort. In the same way, you probably want a lot of consumers working in parallel, which you’ll have to manage yourself, while many libraries in other languages let you simply define a callback for each message/batch and then manage all the Kafka consumer groups for you. And then I have no idea what errors/exceptions can happen, and would probably have to hunt down the C library return codes.
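To make the "define a callback and let the library manage the consumers" idea concrete, here is a rough sketch of the shape I have in mind. It uses only Base Julia and an in-memory list instead of a real topic: `Message`, `consume` and the `commit` keyword are names I made up for illustration, not part of RDKafka.jl. It also only commits after the handler succeeds, so it gives at-least-once rather than exactly-once delivery, and it ignores per-partition offset ordering.

```julia
# Hypothetical stand-in for a Kafka message; not an RDKafka.jl type.
struct Message
    partition::Int
    offset::Int
    payload::String
end

# Run `handler` on every message using `nworkers` concurrent tasks.
# `commit` is called only after the handler returned without throwing.
# Failed messages are collected and returned together with their error.
function consume(handler, messages; nworkers = 4, commit = msg -> nothing)
    queue = Channel{Message}(32)
    failures = Channel{Tuple{Message,Any}}(Inf)
    workers = map(1:nworkers) do _
        Threads.@spawn for msg in queue
            try
                handler(msg)
                commit(msg)
            catch err
                put!(failures, (msg, err))
            end
        end
    end
    foreach(msg -> put!(queue, msg), messages)
    close(queue)            # workers stop once the queue is drained
    foreach(wait, workers)
    close(failures)
    return collect(failures)
end

# Usage: the caller only writes the callback.
msgs = [Message(0, i, "event-$i") for i in 1:10]
committed = Int[]
lk = ReentrantLock()        # commits may arrive from several threads
record_commit = m -> lock(() -> push!(committed, m.offset), lk)

failed = consume(msgs; nworkers = 3, commit = record_commit) do msg
    msg.offset == 7 && error("boom")   # simulate one bad message
    uppercase(msg.payload)
end
```

A real version would need to replace the in-memory list with a poll loop over the consumer and decide what to do with the failed messages (retry, dead-letter topic, stop the group), which is exactly the part I would like a library to own.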
Same thing for databases. LibPQ is a very solid wrapper, but you’ll have to write most advanced features yourself: declaring cursors to implement database streaming, starting transactions and issuing commits/rollbacks, using raw SQL for any kind of database reflection. In contrast, its Tables.jl integration is absolutely amazing and has more features for reading results than pretty much every database library I’ve used (and you can’t even figure that out from the documentation).
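As an example of the kind of helper you end up writing by hand, here is a minimal transaction wrapper. It is a sketch, not LibPQ API: `with_transaction` is a name I made up, and `run_sql` is assumed to be any function that sends one SQL string to the database (with LibPQ you would pass something like `sql -> execute(conn, sql)`). The example below uses a fake that just records the statements, so it runs without a database.

```julia
# Run `f()` between BEGIN and COMMIT; roll back and rethrow on any error.
# `run_sql` is an assumed callback that executes a single SQL statement.
function with_transaction(f, run_sql)
    run_sql("BEGIN")
    try
        result = f()
        run_sql("COMMIT")
        return result
    catch
        run_sql("ROLLBACK")
        rethrow()
    end
end

# Fake backend: records every statement instead of talking to Postgres.
sent = String[]
run_sql = sql -> push!(sent, sql)

# Successful block: committed.
with_transaction(run_sql) do
    run_sql("INSERT INTO events VALUES (1)")
end

# Failing block: rolled back, and the error still reaches the caller.
rolled_back = try
    with_transaction(run_sql) do
        error("constraint violated")
    end
    false
catch
    true
end
```

Cursor-based streaming would follow the same pattern: a `DECLARE ... CURSOR` and a loop of `FETCH` statements inside the transaction block.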
And those are the ones that have libraries. It would be a giant effort to integrate Julia with Prometheus for metrics/alerts, since it requires creating an HTTP endpoint that receives information from all over the application so Prometheus can scrape it. It’s similar work if you want to integrate with the Kubernetes probes (liveness and readiness). This kind of architecture is much easier to implement with a solid actors library (you just create a server actor that receives messages from all over the application and manages the endpoint), though I’m kinda biased here because of Elixir and Scala.
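Here is roughly what I mean by a server actor, sketched with nothing but a `Channel` and a task from Base Julia. All the names (`Inc`, `Render`, `start_metrics_actor`, `scrape`) are made up for illustration, and the HTTP part is left out: the assumption is that whatever serves the `/metrics` endpoint would simply call `scrape` and return the text.

```julia
# Messages the actor understands. All names here are illustrative.
abstract type MetricMsg end

struct Inc <: MetricMsg          # increment a counter
    name::String
    by::Float64
end

struct Render <: MetricMsg       # ask for the current metrics as text
    reply::Channel{String}
end

# The state (`counters`) is only ever touched by the actor task.
function handle!(counters, msg::Inc)
    counters[msg.name] = get(counters, msg.name, 0.0) + msg.by
    return nothing
end

function handle!(counters, msg::Render)
    names = sort!(collect(keys(counters)))
    lines = ["$name $(counters[name])" for name in names]
    put!(msg.reply, join(lines, "\n") * "\n")
    return nothing
end

# Start the actor: one task that owns the counters and drains the inbox.
function start_metrics_actor()
    inbox = Channel{MetricMsg}(1024)
    task = @async begin
        counters = Dict{String,Float64}()
        for msg in inbox
            handle!(counters, msg)
        end
    end
    return inbox, task
end

# What an HTTP handler for the metrics endpoint would call.
function scrape(inbox)
    reply = Channel{String}(1)
    put!(inbox, Render(reply))
    return take!(reply)
end

# Usage: any part of the application just sends messages.
inbox, task = start_metrics_actor()
for _ in 1:3
    put!(inbox, Inc("events_processed_total", 1))
end
put!(inbox, Inc("errors_total", 1))
text = scrape(inbox)

close(inbox)   # shut the actor down
wait(task)
```

The same task could back a liveness probe, since checking `istaskdone(task)` tells you whether the actor is still running. What is missing compared to Elixir or Scala is everything around it: supervision, restarts and a standard way to address actors.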
If we want to be ambitious, imagine a library that abstracts Kafka behind an “array” interface and lets you compose transformations using broadcast, with the library mapping the events from the source topic to a target topic while automatically managing the exactly-once semantics and distributed processing. Or imagine if the Kafka library supported a Tables.jl interface, so you could load the events directly into a DataFrame or JuliaDB and then serialize them back to another topic. For databases as well, multiple dispatch would allow for a fantastic query builder library that is much cleaner and more extensible than SQLAlchemy.
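To give a taste of the query builder idea, here is a toy sketch of how multiple dispatch could drive both the expression syntax and the SQL rendering. Every type and function here (`Col`, `Lit`, `BinOp`, `Select`, `render`) is invented for this example and does not come from any existing package. It also inlines literals into the SQL text for brevity, where a real builder should emit bound parameters instead.

```julia
# A tiny expression tree for SQL. All names are illustrative.
abstract type SQLNode end

struct Col <: SQLNode
    name::Symbol
end

struct Lit{T} <: SQLNode
    value::T
end

struct BinOp <: SQLNode
    op::String
    lhs::SQLNode
    rhs::SQLNode
end

struct Select <: SQLNode
    table::Symbol
    columns::Vector{Col}
    condition::Union{Nothing,SQLNode}
end

# Wrap plain Julia values as literals; leave nodes untouched.
node(x::SQLNode) = x
node(x) = Lit(x)

# Operators build tree nodes instead of computing a value.
Base.:>(a::SQLNode, b) = BinOp(">", a, node(b))
Base.:&(a::SQLNode, b::SQLNode) = BinOp("AND", a, b)

# Rendering is one method per node type, so new node types or dialects
# can be added from outside by defining more `render` methods.
render(c::Col) = string(c.name)
render(l::Lit{<:Number}) = string(l.value)
render(l::Lit{<:AbstractString}) = "'" * replace(l.value, "'" => "''") * "'"
render(b::BinOp) = "(" * render(b.lhs) * " " * b.op * " " * render(b.rhs) * ")"

function render(s::Select)
    sql = "SELECT " * join(render.(s.columns), ", ") * " FROM " * string(s.table)
    s.condition === nothing || (sql *= " WHERE " * render(s.condition))
    return sql
end

# Usage
q = Select(:users, [Col(:id), Col(:name)], (Col(:age) > 18) & (Col(:score) > 9.5))
sql = render(q)
# sql == "SELECT id, name FROM users WHERE ((age > 18) AND (score > 9.5))"
```

The part I find appealing is the extension story: supporting a new literal type or a vendor-specific function is just another method, with no base class to subclass or registry to patch.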
In my opinion, rather than trying to support everything at once (every pubsub including gcloud pubsub, for example, or databases like Oracle, BigQuery, Presto…), the ideal way to bootstrap Julia is to support one of each very well (well documented, feature complete so people can do everything, making good use of Julia’s strengths to sell the language), even if that initially means some infrastructure lock-in. And since Julia’s main strength is composability, hopefully the high-level tools for pubsubs, databases or endpoints can have their backends swapped out to support all the other vendors with much less effort.
I made a prototype of a SQL builder for Julia a while ago, but I got overwhelmed with work right after. I really want to go back and make it a thing. And maybe something over the Kafka wrapper.