@dhairyagandhi96 Many times in the past, I have wanted to use FluxML/DaggerFlux.jl ("Distributed computation of differentiation pipelines to use multiple workers, devices, GPU, etc. since Julia wasn't fast enough already") for my machine learning projects, since my models naturally form DAGs and the concurrency offered by Dagger seems like a good fit. We also recently touched on this problem on Slack, where we were discussing how to distribute a large language model across multiple GPUs, a task for which Dagger also seems well suited. But I am naturally reluctant to use a project whose internals I do not understand. I do not need complete knowledge, but a good working understanding would be nice to have.
I would therefore like to ask whether someone can explain to me how ChainRules works together with Dagger. My understanding is that, as the model is executed, it has to add tasks to the computational graph used by Dagger. That would mean that Dagger can work with a dynamic computational graph. Or does it work differently? Some sort of explanation would be nice to have.