I think it is important to note that much of the core language work is being upstreamed:
https://forums.swift.org/t/differentiable-programming-for-gradient-based-machine-learning/42147
https://github.com/rxwei/swift-evolution/blob/autodiff/proposals/0000-differentiable-programming.md
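For a flavor of what the proposal enables, here is a minimal sketch, assuming a Swift toolchain that ships the experimental `_Differentiation` module (the module name and attribute spelling have shifted over the proposal's lifetime, so treat this as illustrative, not definitive):

```swift
// Experimental: assumes a toolchain with the _Differentiation module.
import _Differentiation

// @differentiable(reverse) asks the compiler to synthesize a
// reverse-mode derivative for this function.
@differentiable(reverse)
func cubed(_ x: Double) -> Double {
    x * x * x
}

// gradient(at:of:) evaluates the synthesized derivative at a point.
let slope = gradient(at: 2.0, of: cubed)
print(slope) // 12.0, since d/dx x^3 = 3x^2
```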
I’m a bit of a beginner here. My understanding is:
- a few years ago Google was looking for a new language for TensorFlow
- they considered Swift, Julia & others:
  > "In the end, we narrowed the list based on technical merits down to Swift, Rust, C++, and potentially Julia. We next excluded C++ and Rust due to usability concerns, and picked Swift over Julia because Swift has a much larger community, is syntactically closer to Python, and because we were more familiar with its internal implementation details - which allowed us to implement a prototype much faster."
- they decided on Swift
- now, they have archived Swift for TensorFlow
- Does that mean they will move to Julia next?
Probably not. It appears that JAX is instead picking up a lot of the momentum that was behind TensorFlow.
I’m confused; I thought JAX was “just” for differentiation. What are they going to do with JAX?
I’m far from well informed, but I think the only reasons they had to pursue an alternative language to Python were automatic differentiation (AD) and being more hardware agnostic. JAX is a Python approach that tackles both of those problems, so it’s less clear that TF needs to be rewritten in another language.
Probably wait for someone better informed than me to offer thoughts though
As @OmarElrefaei said, S4TF is heading upstream into Swift proper. Specifically, the core Swift team expressed interest in incorporating AD. Community members of S4TF responded by figuring out how to do that, and have had an open proposal since November 2020.
It appears that the Google Brain people are pulling the plug on the original S4TF project, but this comes after the upstreaming was proposed. It’s unfortunate that they didn’t leave any pointers to the new project. (I’m assuming these are separate people from the ones doing the upstreaming.)
TL;DR: It’s hardly the end; it’s more like the parasite has subsumed the host.
I believe this AD work has since been fully upstreamed, but that still leaves the “TF” part of S4TF. It will be interesting to see what, if anything, steps in to fill that void.
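For the curious, the upstreamed machinery goes beyond taking gradients of already-differentiable code: you can retroactively register custom derivatives. A rough sketch under the same `_Differentiation` assumption as above (the function names here are just illustrative):

```swift
import _Differentiation

// An ordinary function the compiler can't differentiate on its own
// (imagine it wraps a C call or other opaque code).
func square(_ x: Double) -> Double {
    x * x
}

// @derivative(of:) registers a reverse-mode derivative after the fact:
// it returns the original value plus a pullback closure.
@derivative(of: square)
func squareVJP(_ x: Double) -> (value: Double, pullback: (Double) -> Double) {
    (value: square(x), pullback: { v in 2 * x * v })
}

// square is now usable with the differentiation APIs.
let g = gradient(at: 3.0, of: square)
print(g) // 6.0
```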
I have strong doubts about this. Yes, the TensorFlow people have foisted a big, complicated automatic differentiation (AD) engine onto the Swift language developers, but it’s unclear to me that this will get used.
Before, the premise was that Google’s investments in Swift4TF would foster a whole scientific / data-analytic ecosystem in Swift that would use the AD engine they upstreamed. Now, with them pulling out, who exactly is going to use (let alone maintain!) this AD engine?
I strongly suspect this AD engine will just end up bitrotting in the base language until a major version change when it gets removed because nobody has the time, expertise or reason to maintain it.
wow! that was abrupt
I’m much more hopeful, because I am convinced that differentiable programming is the future. Right now there are two approaches:
- Build your own TensorFlow/PyTorch/JAX/etc., most likely including yet another rewrite of NumPy.
- Incorporate differentiable programming from the ground up, with a sensible language and first-class support, e.g. Swift and Julia (see the sketch after this list).
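To make Approach 2 concrete, here is roughly what “first-class support” looks like on the Swift side, again assuming the experimental `_Differentiation` module (`Affine` is just an illustrative type): user-defined types conform to `Differentiable`, and the compiler synthesizes the tangent-vector plumbing so you can differentiate with respect to a whole struct.

```swift
import _Differentiation

// A user-defined type; the Differentiable conformance (including a
// TangentVector with matching stored properties) is synthesized.
struct Affine: Differentiable {
    var w: Double
    var b: Double

    @differentiable(reverse)
    func callAsFunction(_ x: Double) -> Double {
        w * x + b
    }
}

let model = Affine(w: 2.0, b: 1.0)

// Differentiate with respect to the whole struct at once.
let grads = gradient(at: model) { m in m(3.0) }
print(grads.w, grads.b) // 3.0 1.0
```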
Right now the vast majority of the world is on Approach 1, with some companies putting in massive resources, to great success and many, many users. It’s quite possible that there’s enough momentum to carry on indefinitely, but not at the grass-roots level. Each new iteration gets tougher and tougher, because fundamentally it’s built on a house of cards (Python/NumPy). Nevertheless, most companies and individuals are betting on this approach.
Approach 2 has gone surprisingly far with grass-roots support. Of course, S4TF was not started at that level, but its top contributors over the past couple years are surprisingly few in number, and yet they pulled off the upstreaming. (For those at Google Brain, I’m assuming they did this on their 20% time.)
The problem for both Julia and Swift is reaching enough critical mass to hit the mainstream. I agree that right now it’s uncertain whether Swift is there, and perhaps it’s too complicated to continue without corporate support. As for Julia, the future is certainly bright. Both Zygote and Flux seem to consist of an absurdly small amount of code (based on casual observations), which is good for maintainability and extensibility. Even without huge corporate investment, Julia will continue to grow. Not sure if it will legitimately compete with PyTorch anytime soon, though.
What should get one of Julia/Swift over the hump is the future. To oversimplify a tiny bit, all of computing is (or will be) optimization, and all of optimization benefits from differentiability. The next Google might be an outfit with a new application for differentiable, computational photography that blows everyone else out of the water. (Apologies for my naivete, these companies probably already exist.) Once this explodes, there will be a race to differentiate everything, and Approach 1 will just be too slow. I have no idea if this will be Julia or Swift or something else, but right now there are only two horses to bet on.
An advantage for Swift is that it can do AD on a phone, tablet, or PC, all with a manageable runtime. All that’s needed is one killer app, and it will take off.
Yeah, I definitely don’t want to be too negative here. I just can’t help but feel it was a mistake to bake AD directly into the language implementation. The danger with this approach is that if a reason for the feature’s existence doesn’t materialize rather soon, the feature could end up getting removed.
On the other hand, Julia does not implement AD in the base language but instead has infrastructure for custom compiler hooks, which AD packages like Zygote plug into. This makes it so that the language-level machinery has many diverse users and stakeholders across a large variety of interesting code-transformation tools, not just AD: probabilistic programming, code mocking, and GPU/TPU compilation, to name a few.