Horovod and Flux / Julia

Hi all,

I wonder whether anyone has tried using Horovod with Flux (or Julia in general).

Thanks for any answers.
Best wishes,
Tomas


I had a look through and Horovod is a C++ library under the hood, no? Unless somebody has wrapped it with a C API, I imagine the only straightforward way to use it would be via PyCall (if indeed that jibes with all the Distributed machinery). AFAICT the only collective communication library that does expose a C interface is NCCL. It would be nice to have something like this though, I'd rather not bother with getting CUDA-aware MPI working on my local machines.
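For readers unfamiliar with what a collective communication library provides here: the core operation in data-parallel training is an allreduce that leaves every worker holding the same averaged gradient. Below is a minimal single-process sketch of that semantics in plain Python. The function name `allreduce_mean` and the list-of-lists layout are made up for illustration; this is not the Horovod or NCCL API, and real libraries perform this across processes and devices.

```python
from typing import List


def allreduce_mean(per_worker_grads: List[List[float]]) -> List[List[float]]:
    """Simulate an averaging allreduce in a single process.

    per_worker_grads[w][i] is worker w's local gradient for parameter i.
    Every worker receives an identical copy of the element-wise mean.
    (Hypothetical helper for illustration, not a real library call.)
    """
    n_workers = len(per_worker_grads)
    n_params = len(per_worker_grads[0])
    mean = [
        sum(grads[i] for grads in per_worker_grads) / n_workers
        for i in range(n_params)
    ]
    # Each worker gets its own copy of the reduced result.
    return [list(mean) for _ in range(n_workers)]


# Two simulated workers, two parameters each.
grads = [[1.0, 2.0], [3.0, 6.0]]
reduced = allreduce_mean(grads)
```

After the call, both simulated workers hold the same averaged gradient and can apply identical optimizer updates, which is what keeps the model replicas in sync.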

Thanks,

Wrapping C++ in C seems doable, and PyCall should be easy.


Why Horovod? It seems to me DeepSpeed (and the DeeperSpeed fork of it) serves a similar purpose, and I think it might be better. I might be wrong, but at least it has the intriguing 1-bit LAMB (and before that, 1-bit Adam) breakthroughs:

I just think that if anyone is planning to support these tools, they should start with the best one. Are there other similar tools?

DeepSpeed addresses the underlying performance difficulties and improves the speed and scale of the training with only a few lines of code change to the PyTorch model.


Agreed, one should go for the best.

I was asking at the time because a former colleague was promoting Horovod, but it never went beyond talk and we never acted on it. By now, many other things have piled up and I would not have time to do this.

DeepSpeed is a C++ extension for a Python frontend. Horovod is a C++ library with a Python wrapper. Neither is terribly relevant for distributed training in Julia at the moment. What would be relevant is trying to port over some of the techniques used in ZeRO-{1,2,3,offload}, either in https://github.com/DhairyaLGandhi/DaggerFlux.jl or some alternative library.
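To make the ZeRO reference a bit more concrete, here is a rough single-process sketch of the idea behind stage 1 (partitioning optimizer state across workers), written in plain Python with SGD plus momentum. Everything in it (`shard_bounds`, `zero1_step`, the contiguous sharding scheme, the hyperparameters) is an assumption made for illustration and not DeepSpeed's implementation. In a real system the gradient averaging would be a reduce-scatter and the final parameter exchange an all-gather.

```python
from typing import List


def shard_bounds(n_params: int, n_workers: int, rank: int) -> range:
    """Contiguous parameter indices owned by `rank` (hypothetical scheme)."""
    per = (n_params + n_workers - 1) // n_workers  # ceiling division
    return range(min(rank * per, n_params), min((rank + 1) * per, n_params))


def zero1_step(
    params: List[float],
    per_worker_grads: List[List[float]],
    momenta: List[List[float]],
    lr: float = 0.1,
    beta: float = 0.9,
) -> List[float]:
    """One simulated ZeRO-1-style step with SGD + momentum.

    momenta[rank] holds momentum ONLY for the shard owned by `rank`, so each
    worker stores roughly 1/n_workers of the optimizer state.
    """
    n_workers = len(per_worker_grads)
    n = len(params)
    # Stand-in for reduce-scatter: average gradients across workers.
    avg = [sum(g[i] for g in per_worker_grads) / n_workers for i in range(n)]
    new_params = list(params)
    for rank in range(n_workers):
        # Each worker updates only the parameters in its own shard.
        for local, i in enumerate(shard_bounds(n, n_workers, rank)):
            momenta[rank][local] = beta * momenta[rank][local] + avg[i]
            new_params[i] = params[i] - lr * momenta[rank][local]
    # Stand-in for all-gather: every worker would receive `new_params`.
    return new_params


params = [1.0, 2.0, 3.0, 4.0]
worker_grads = [[1.0] * 4, [3.0] * 4]  # two simulated workers
momenta = [[0.0] * len(shard_bounds(4, 2, r)) for r in range(2)]
updated = zero1_step(params, worker_grads, momenta)
```

The point of the sketch is only that the parameter update is unchanged while the momentum buffers are split between workers. Stages 2 and 3 extend the same partitioning to gradients and parameters respectively.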


Why not FastAI?
https://github.com/FluxML/FastAI.jl

The high-level abstractions in FastAI.jl are almost completely orthogonal to how distributed training is conducted. What will most likely happen is that FastAI.jl adopts the distributed functionality once it stabilizes.