Hi everyone! My team is thinking about using Julia for stream processing in production, and we are curious what you think about it. Are there any best practices or examples of other companies using it for streaming? We want to put Julia between two Kafka topics and do some computation.
I haven't run it in production, but it should be pretty straightforward, provided that you:
Run as many workers as you have partitions and assign unique partition IDs manually.
Save offsets after each processed batch and restart from these offsets in case of failures.
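To make the second point concrete, here is a minimal sketch of the checkpoint-and-restart loop. It doesn't use a real Kafka client: the partition is just an in-memory vector (index i holds the message at offset i - 1), the offset is stored in a local file, and all names (`load_offset`, `save_offset`, `run_worker`) are made up for illustration. With a real client you would swap the vector for a consumer and the file for whatever offset store you trust.

```julia
# Hypothetical stand-in for one Kafka partition: a plain Vector, where
# index i holds the message at offset i - 1. No real Kafka client is used.

"Read the next offset to process from `path`, or 0 if no checkpoint exists yet."
load_offset(path::AbstractString) =
    isfile(path) ? parse(Int, strip(read(path, String))) : 0

"Persist `offset` by writing a temp file and renaming it over the checkpoint."
function save_offset(path::AbstractString, offset::Integer)
    tmp = path * ".tmp"
    write(tmp, string(offset))
    mv(tmp, path; force=true)
    return offset
end

"Process `partition` in batches from the saved offset, checkpointing after each batch."
function run_worker(process, partition::Vector, path::AbstractString; batch_size::Int=2)
    offset = load_offset(path)                 # next offset to process
    while offset < length(partition)
        stop = min(offset + batch_size, length(partition))
        batch = partition[offset+1:stop]       # Julia arrays are 1-based
        process(batch)                         # may throw; the offset is then not advanced
        offset = stop
        save_offset(path, offset)              # commit only after the batch succeeded
    end
    return offset
end
```

Each worker would get its own partition ID and its own checkpoint path, e.g. `run_worker(handle_batch, messages, "partition-0.offset")`, where `handle_batch` is your computation. Because the offset is saved only after a batch succeeds, a crash in the middle of a batch means that batch is processed again on restart, so this gives at-least-once behaviour, not exactly-once.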
Best practices for distributed applications also apply: health checks, logging, monitoring, etc. are still needed. It's also worth monitoring the lag between the latest offset in a partition and the consumer's current offset.
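The lag check is just a subtraction per partition. A rough sketch, assuming you can already get both sets of offsets as `Dict`s keyed by partition ID (the function names and the threshold are hypothetical, and a partition with no consumer offset yet is treated as being at offset 0):

```julia
"Lag per partition: latest offset in the partition minus the consumer's current offset."
function consumer_lag(latest::Dict{Int,Int}, current::Dict{Int,Int})
    # Assumption: a partition missing from `current` has not been consumed yet (offset 0).
    return Dict(p => latest[p] - get(current, p, 0) for p in keys(latest))
end

"IDs of partitions whose lag exceeds `threshold`, sorted, e.g. to feed an alert."
lagging(lag::Dict{Int,Int}, threshold::Int) =
    sort([p for (p, l) in lag if l > threshold])
```

For example, with `latest = Dict(0 => 100, 1 => 250)` and `current = Dict(0 => 100, 1 => 40)`, partition 0 has a lag of 0 and partition 1 has a lag of 210, so `lagging(consumer_lag(latest, current), 50)` flags only partition 1. A lag that keeps growing usually means the worker for that partition is too slow or stuck.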
There are many different kinds of distributed systems, e.g.:
streaming applications, in which you are mostly concerned with 24x7 uptime and storing intermediate results;
batch applications that act rarely, but do a lot of work at once;
microservices, where you mostly think about good system decomposition and the reliability of each service;
MPI-like systems that prioritize speed over reliability;
distributed databases whose main goals are data persistence and consistency, etc.
I don't think there's a single good reference text for all of them. In my experience, it's easier to start with one concrete distributed system or application and move from specific tasks to more general ones as needed.