Swift for TensorFlow rationale


This is interesting and is reminiscent of Julia’s approach in Flux.

What seems divergent is the lack of multiple dispatch (+1 for Julia) and the static memory analysis/guarantees (+1 for Swift).

Any thoughts?

Tagging @MikeInnes in case he doesn’t see this post, given that it’s in the off-topic section.


The topic of programming languages for machine learning was extensively discussed here: On Machine Learning and Programming Languages


I’m excited to see what the Swift folks come up with here. We’re obviously targeting a similar set of problems, and there are more than enough differences in philosophy and design choices to keep things interesting.

Since our blog last December, several “ML as programming”-style frameworks have arrived on the scene: Swift/TF being one, but also Myia and Fluid. Where the original frameworks were built by practising ML researchers filling their own needs (Theano, Torch, Chainer), and second-wave ones were largely industrialised clones (TensorFlow, PyTorch), this third generation is increasingly being built by compiler and languages people, and it looks very, very different from what came before.

One thing that’s notably missing from the landscape is a truly new language for ML. The attempts in Swift and Julia are beginning to highlight the semantics and engineering challenges and tradeoffs involved, and I expect we’ll see much more on that front over the next few years. Exciting times!


It’s also interesting that they considered Julia, and had many kind things to say about it. :grinning:
Swift (and Rust) are also two languages of interest to me.


In what ways do they seem different? Are user-friendliness or expressiveness among them?

Perhaps you’ve answered this in the blog post?


Isn’t Julia that new language? Or do you foresee the need for something else?


Yeah, exactly. When I saw the presentation about Swift for TensorFlow (https://www.youtube.com/watch?v=Yze693W4MaU), it reminded me of that blog post.


I’m also interested in Swift, both for general programming and for data science/ML. So far the data science/ML story outside of iOS/macOS (which has Core ML, Accelerate, etc.) has been very weak (apart from some hobby projects), but TF for Swift could change all that. It’s exciting to see a modern, statically typed, AOT-compiled native language in the data science/ML space. I’ve been wanting this for years. I’ll keep a close eye on it.

The strong interop with Python they are building also seems exciting (especially for data science/ML).


Absolutely. The main difference right now is providing an intuitive programming model while also being able to take advantage of optimisations and new hardware accelerators easily. There’s been less in the way of new PL features that support ML so far, but projects like Myia have a good opportunity to start exploring that area.

As existing languages go, Julia and Swift are by far the best suited, but they’re ultimately still general-purpose languages that weren’t designed with ML in mind. They inherently bring engineering challenges and expressiveness issues that something more specialised might not have.

An engineering example – The Swift docs give a good idea of the challenges involved in extracting TensorFlow-compatible graphs from a program. It sounds like it should be pretty easy to turn m = Chain(Dense(10, 5, relu), ...) into a graph, for example, until you realise that a model might do m.layers[3] = Dense(...) halfway through the forward pass.
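To make that failure mode concrete, here is a small, self-contained Julia sketch (toy types of my own, not Flux’s actual Chain/Dense implementations) showing why in-place mutation defeats ahead-of-time graph extraction:

```julia
# Toy layer: holds its own parameters and applies an affine map.
struct ToyDense
    W::Matrix{Float64}
    b::Vector{Float64}
end
ToyDense(in::Int, out::Int) = ToyDense(randn(out, in), randn(out))
(d::ToyDense)(x) = d.W * x .+ d.b

# Toy chain with a *mutable* layer list, as in the example above.
mutable struct ToyChain
    layers::Vector{Any}
end
(c::ToyChain)(x) = foldl((acc, l) -> l(acc), c.layers; init = x)

m = ToyChain([ToyDense(10, 5), ToyDense(5, 2)])
y1 = m(randn(10))            # a graph traced here would have output size 2...
m.layers[2] = ToyDense(5, 3)
y2 = m(randn(10))            # ...but after mutation the "same" model yields size 3.
```

A graph extracted at the first call would silently be wrong for the second, which is exactly the kind of case the Swift docs have to work around.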

While these things are solvable, mutable data structures cause a lot of issues here, as well as with AD and other optimisations, and they’re not even necessary for the way people code against data frames and GPUs. A new language could easily have a functional data model and simplify things hugely.
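As a rough illustration of what a functional data model buys you (my own Julia sketch, not a proposal from anyone in this thread): if “replacing” a layer returns a new model rather than mutating the old one, anything traced from the original stays valid.

```julia
# Immutable chain: layers live in a tuple and can never change in place.
struct FChain{T<:Tuple}
    layers::T
end
FChain(ls...) = FChain(ls)
(c::FChain)(x) = foldl((acc, l) -> l(acc), c.layers; init = x)

# "Updating" a layer builds a whole new chain; the original is untouched,
# so a graph extracted from it remains correct.
replace_layer(c::FChain, i::Int, new_layer) =
    FChain(ntuple(j -> j == i ? new_layer : c.layers[j], length(c.layers)))

c  = FChain(x -> x .+ 1, x -> 2 .* x)
c2 = replace_layer(c, 2, x -> 3 .* x)
```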

For an expressiveness issue, consider my ideal definition of the Dense (FullyConnected) layer:

Dense(in, out) = x -> W * x .+ b
  where W = randn(out, in), b = randn(out)

The Flux docs actually introduce layering this way, but in real layers we have to define a struct and a bunch of boilerplate. To actually make it work we need to be able to treat closures as data structures (to move them to the GPU, for example) and perhaps have nicer ways to name and refer to closures based on where they came from (i.e. not (::#9)). These really seem like general language-level features that just happen not to be supported anywhere.
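For contrast, here is roughly what the two styles look like side by side in today’s Julia (my own approximation: a `let` block stands in for the value-level `where` clause above, which isn’t Julia syntax, and `DenseLayer` is a hypothetical name chosen to avoid clashing with Flux’s Dense):

```julia
# Closure style: one line plus captured parameters, but W and b are
# hidden inside the closure, so there is no easy way to collect them
# for training or move them to the GPU.
dense(in, out) = let W = randn(out, in), b = randn(out)
    x -> W * x .+ b
end

# Struct style: the boilerplate exists precisely to make the parameters
# a visible, traversable data structure.
struct DenseLayer{M,V}
    W::M
    b::V
end
DenseLayer(in::Int, out::Int) = DenseLayer(randn(out, in), randn(out))
(d::DenseLayer)(x) = d.W * x .+ d.b
```

Both compute the same thing; the second exists only so the parameters are reachable from the outside, which is the gap a closures-as-data-structures feature would close.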

ML has a bunch of cases like this, where certain patterns seem unusual or even downright backwards from the traditional software engineering standpoint, but turn out to be valid use cases that just aren’t prioritised by mainstream languages. Novel abstractions and language features could help hugely here, which is part of what makes the field exciting for PL people.



Is this something you’ve been working on lately? Could Julia handle this via metaprogramming (i.e. embedding an ML-specific DSL using macros)?


I’m not sure… it seems like a type-system matter, given that use of the where clause.

@MikeInnes, thank you for the explanation. I’m also curious whether anything is planned for these ML-specific facilities.