Will Julia be more efficient than PyTorch?

Hello All,

I am very interested in deep learning, but I realize that I cannot buy the expensive hardware that industry uses to train things like GPT-NeoX. If a neural network is built and trained in PyTorch, and the same network is implemented in Flux.jl, will the Julia implementation require less GPU/compute?

Why am I asking this?

First, I don’t know much about deep learning.

Second, I want to challenge Tabnine with a free-software code-completion assistant, which I have named GNU Ghost.

What I know is this:

  1. A network of volunteer computers can be more powerful than any system a corporation puts together.
  2. If I can find a way to distribute neural-network training asynchronously across networked computers, then we can build better free (as in freedom) machine-learning models.

With the approach in point 2, training will be slow, but that’s okay. I want to attempt such a project (a rough sketch of the idea is below). Do you think I am insane, or not?
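To make point 2 a bit more concrete, here is a minimal sketch of what distributed data-parallel training could look like, using only the Distributed and Statistics standard libraries. The toy linear model, the fake data shards, and the `train`/`local_gradient` helpers are all illustrative; a real system would train a Flux model, and an asynchronous variant would apply each worker's update as it arrives instead of waiting for all of them.

```julia
# Sketch: synchronous data-parallel gradient averaging across Julia worker
# processes. Everything here is illustrative, not a real project API.
using Distributed, Statistics

addprocs(4)  # pretend these are volunteer machines on the network

@everywhere function local_gradient(w, b, x, y)
    # Gradient of mean squared error for a toy linear model y ≈ w*x + b.
    n = length(x)
    err = w .* x .+ b .- y
    return 2 * sum(err .* x) / n, 2 * sum(err) / n
end

# Fake data shards, one per worker (in reality: local code corpora).
shards = map(workers()) do _
    x = rand(100)
    (x, 2 .* x .+ 1)          # targets follow y = 2x + 1
end

function train(shards; epochs = 100, lr = 0.1)
    w, b = 0.0, 0.0
    for _ in 1:epochs
        # Each worker computes a gradient on its own shard, in parallel.
        grads = pmap(shards) do (x, y)
            local_gradient(w, b, x, y)
        end
        # The coordinator averages the gradients and takes one step.
        w -= lr * mean(first.(grads))
        b -= lr * mean(last.(grads))
    end
    return w, b
end

w, b = train(shards)  # should approach w ≈ 2, b ≈ 1
```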


To start with: Flux will not be faster than PyTorch in most cases right now, simply because PyTorch has more money and developer resources. I also think you’ll have a tough time beating Tabnine (and, really, GPT-3), both of which have a ton of developer resources and experience behind them.

That said, I think this is a laudable goal, and something that I would love to see happen in Julia. I think that our ML stack is rapidly getting to the tipping point where we can beat out PyTorch for certain problems, especially as projects like https://github.com/DhairyaLGandhi/ResNetImageNet.jl and https://github.com/DhairyaLGandhi/DaggerFlux.jl gain steam.

I think your first step for this project should be to get a much simpler (than GPT-3) auto-completion model set up and running in Julia, find open datasets to use for training, and set up an end-to-end demo of how to use these together to do basic text auto-completion on common hardware. This demo should be able to run without any non-Julia dependencies, and should take less than 1 week to train on a CPU (with maybe 8-16 threads) and less than 1 day to train on a common laptop or desktop GPU. The auto-completion results don’t have to be great; they just need to show some amount of sensibility. We can work on improving the model later.
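To give a feel for the scale of starting point I have in mind, here is a minimal sketch of a toy next-character completion model in Flux. It assumes a recent Flux with the explicit-gradient API (`Flux.setup`/`Flux.update!`); the character vocabulary, context length, repeated corpus string, and `complete` helper are all toy placeholders, not anything resembling the real training setup.

```julia
# Toy next-character completion model: predict character i+ctx from the
# previous ctx characters, each one-hot encoded and concatenated.
using Flux
using Flux: onehotbatch, onecold, logitcrossentropy

vocab  = collect("abcdefghijklmnopqrstuvwxyz_ (){},=+.\n")  # illustrative character set
nvocab = length(vocab)
ctx    = 8   # context window length

model = Chain(
    Dense(ctx * nvocab, 128, relu),   # concatenated one-hot context
    Dense(128, nvocab),               # logits over the vocabulary
)

# Build (input, target) pairs from a corpus string.
function make_data(text)
    chars = [c in vocab ? c : ' ' for c in lowercase(text)]
    xs, ys = Vector{Float32}[], Char[]
    for i in 1:length(chars)-ctx
        push!(xs, vec(Float32.(onehotbatch(chars[i:i+ctx-1], vocab))))
        push!(ys, chars[i+ctx])
    end
    return reduce(hcat, xs), onehotbatch(ys, vocab)
end

# Placeholder corpus; a real demo would use an open code dataset.
X, Y = make_data("function add(x, y)\n    return x + y\nend\n" ^ 50)

opt_state = Flux.setup(Adam(), model)
for _ in 1:20
    grads = Flux.gradient(m -> logitcrossentropy(m(X), Y), model)
    Flux.update!(opt_state, model, grads[1])
end

# Predict the character that follows a prompt (prompt must be ≥ ctx chars).
complete(prompt) =
    onecold(model(vec(Float32.(onehotbatch(collect(prompt)[end-ctx+1:end], vocab)))), vocab)

complete("function")   # e.g. suggests ' '
```

Something of this size trains in minutes on a laptop; the point is only to prove the end-to-end pipeline (data, model, training, completion) before worrying about model quality or distribution.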

If you can get this working on your hardware and achieve the above goals, I’d be willing to help port this to AMD GPUs, and also help port it to a multi-server setup. We can consider how to do distributed multi-user training securely and anonymously later (since many users will want to try out your code locally first to determine whether it’s even worth investing time and compute resources in this approach). If we can get both CUDA and AMD GPUs working, I will gladly dedicate 1 full-time (AMD) GPU to training/development for this project.
