Hi All,
I have made a trio of packages for neural architecture search, mostly because I fell in love with the language and couldn’t think of anything more useful to do which wasn’t already done (and maybe a little bit because I’m oddly amused by the thought of a computer doing crazy stuff while I sleep).
The first member is GitHub - DrChainsaw/NaiveNASlib.jl: Relentless mutation!!
Despite NAS being part of the name, it does not do any architecture search. In fact, it does not even do neural networks (by itself). What it does do (and does quite well I must say) is figuring out how to modify the structure of a neural network so that all matrix dimensions stay consistent with each other. It also (to the largest extent possible) ensures that output neurons from one layer stay connected to the same input neurons of the next layer as before a size change.
This perhaps does not sound very impressive when thinking about simple architectures which just stack layers on top of each other. However, things get out of hand quickly when one adds elementwise operations (e.g. as used in residual connections) and concatenations (e.g. as used in Inception modules) into the mix.
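To make the problem concrete, here is a minimal plain-Julia toy (just made-up matrices for illustration, no NaiveNASlib involved) of what a single pruning decision has to propagate through once a residual connection is in the picture:

```julia
# Pruning output neurons of one branch means the other branch of the elementwise
# add must shrink in exactly the same way, and the consumer must drop the
# matching input weights.
W1 = randn(4, 3)   # branch 1: 3 inputs -> 4 outputs
W2 = randn(4, 3)   # branch 2: also 3 -> 4 so the elementwise add works
W3 = randn(2, 4)   # consumes the sum of the two branches: 4 -> 2

keep = [1, 2, 4]   # decide to prune output neuron 3 of branch 1...
W1 = W1[keep, :]   # ...so branch 1 now outputs 3 values...
W2 = W2[keep, :]   # ...which forces branch 2 to drop the same outputs...
W3 = W3[:, keep]   # ...and the consumer to drop the matching input weights

x = randn(3)
y = W3 * (W1 * x .+ W2 * x)   # still dimensionally consistent, size(y) == (2,)
```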
Here is the list of supported operations:
Change the input/output size of vertices
Parameter pruning/insertion (policy excluded)
Add vertices to the graph
Remove vertices from the graph
Add edges to a vertex
Remove edges to a vertex
These operations are sometimes useful even outside of the NAS context, for example when doing transfer learning or network compression (although I guess one can argue that these are some form of lightweight NAS).
Perhaps interesting to note is that all the Python libraries I peeked at seemed to do this by recursively traversing the computation graph, propagating the size changes to each visited vertex. Maybe it is because I suck at graphs (and at reading Python code, f**n write-only language), but (despite some pretty deep level of sophistication) I could not get this approach to survive more than a couple of generations in the hands of an aggressive genetic algorithm.
Luckily enough, my new friend JuMP (and Cbc) came to the rescue, and after formulating the size change and pruning task as a MILP (mixed-integer linear programming) problem I did not see any more failures. As a bonus, it allowed me to radically shrink the code base as well. I have not seen this approach used anywhere else, so I guess this might be the only novel contribution of this work.
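For a flavour of the idea, here is a toy sketch of my own (not the formulation NaiveNASlib actually builds): an elementwise addition ties the output sizes of three vertices together, and the solver is asked for new sizes which satisfy a requested change while deviating as little as possible from the current ones.

```julia
using JuMP, Cbc

vertices = [:v1, :v2, :v3]
current  = Dict(v => 8 for v in vertices)

model = Model(Cbc.Optimizer)
set_silent(model)

@variable(model, 1 <= newsize[v in vertices] <= 16, Int)

# The elementwise addition forces all three sizes to agree
@constraint(model, newsize[:v1] == newsize[:v2])
@constraint(model, newsize[:v2] == newsize[:v3])

# The requested size change (a hard constraint here to keep the sketch short)
@constraint(model, newsize[:v1] == 6)

# Deviate as little as possible from the current sizes elsewhere
@variable(model, dev[v in vertices] >= 0)
@constraint(model, [v in vertices], dev[v] >=  newsize[v] - current[v])
@constraint(model, [v in vertices], dev[v] >= -(newsize[v] - current[v]))
@objective(model, Min, sum(dev[v] for v in vertices))

optimize!(model)
value.(newsize)   # the size change propagates: all three vertices end up at 6
```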
Next member is GitHub - DrChainsaw/NaiveNASflux.jl: Your local Flux surgeon
This package simply dresses up the layers defined in Flux for use with the mutation capabilities of NaiveNASlib. It knows how to add/remove inputs and outputs for each of the layers defined in Flux through what is just a manual annotation of which dimensions of the weight arrays are input and which are output. It also does some generic but still implementation-dependent things, such as hooking into gradients to compute pruning metrics.
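A conceptual sketch of what such an annotation amounts to (my own illustration, not NaiveNASflux's actual code, and the field names assume a recent Flux version):

```julia
using Flux

# The "annotation" boils down to knowing which weight dimension is input and
# which is output for each layer type.
indim(::Dense) = 2;  outdim(::Dense) = 1        # Dense weight is out × in
indim(::Conv)  = 3;  outdim(::Conv)  = 4        # Conv weight is (k, k, in, out)

# With that knowledge, pruning output neurons of a Dense layer is just indexing
# (prune_outputs is a hypothetical helper, not part of NaiveNASflux):
prune_outputs(d::Dense, keep) = Dense(d.weight[keep, :], d.bias[keep], d.σ)

d = Dense(3 => 4, relu)
prune_outputs(d, [1, 2, 4])   # a Dense layer with 3 inputs and 3 outputs
```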
Last member is GitHub - DrChainsaw/NaiveGAflux.jl: Evolve Flux networks from scratch!
This one actually does neural architecture search (despite ironically not having NAS in the name), including the obligatory model = fit(data) gimmick. It does so through genetic optimization (a.k.a. the bogosort of optimization), so it might not be the best thing one can make out of the other two packages.
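For the uninitiated, the whole approach boils down to a loop of roughly this shape (an illustrative sketch of a generic genetic algorithm, not NaiveGAflux's actual API):

```julia
# Keep the fittest half of the population and refill it with mutated copies.
# fitness and mutate are user-supplied functions over candidate models.
function evolve(population, fitness, mutate; generations = 10)
    for _ in 1:generations
        scored  = sort(population; by = fitness, rev = true)
        parents = scored[1:end ÷ 2]
        population = vcat(parents, mutate.(deepcopy.(parents)))
    end
    return population
end
```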
I originally intended to use this package only to do usability and mutation testing (yeah, I know this usually means something else) of the other two and didn't even think I would release it. After a while it dawned on me that despite my efforts to make NaiveNASlib very easy to use (and it really is, please try it out!), there is still a significant amount of code needed to do the actual architecture search, even for something as simple as genetic optimization. As I figured people might be hesitant to try the other libs if there is not a single working example of how to actually use them for what they are intended to do, I decided to release it (with a disclaimer that it is only meant as an example).
It should also be noted that while the first two are released as version 1.x.x, NaiveGAflux is still preliminary. Most notably, I have not tried to use it for anything other than image classification, and that is the only thing supported by the fit method. Suggestions on example problems in other domains are more than welcome, especially as PRs!
In its current state, NaiveGAflux is probably more suited to people who want to mess around with genetic algorithms rather than people who want to outsource the model building to a computer.
Oh well, that's it. Let the tickets flow or the tumbleweed roll!!
QnA (Questions nobody Asked):
Why are the packages prefixed with the word Naive?
- Well, given that these are my first Julia packages, there are probably more than a couple of newbie mistakes when it comes to the design. I wanted to keep the names NAS.jl and/or AutoFlux.jl free for when someone like Mike Innes gets around to hacking everything you need into the compiler.
What kind of accuracy can one expect from NaiveGAflux?
- I have not made any large effort to tune the (hyper-)hyper-parameters, nor have I investigated how good it is at converging. It seems to be able to get about 10% test error rate on CIFAR10 after less than 100 generations (where one generation is trained on 400 batches with 32 examples per batch) using a population of 50 models. I have not investigated how reproducible this is or if it will get a better accuracy if allowed to run longer.
Is it really feasible to use the MILP formulation?
- I was indeed worried that the method would be impractical as the solver could randomly get stuck on "hard architectures", but after some light experiments with very "transparent" architectures of up to 10000 layers my worries were relieved. I have not seen it take more than a fraction of a second in "normal" use, which should be insignificant in comparison to the time it takes to train the model. Should problems surface, I guess as a last resort one could just set a time limit for the solver and treat a timeout the same way as an infeasible problem.
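With JuMP, that fallback could look something like this (a hedged sketch; the actual model building is elided):

```julia
using JuMP, Cbc

# Give the solver a small time budget and treat a timeout like an infeasible
# problem, i.e. simply reject the proposed change.
model = Model(Cbc.Optimizer)
set_time_limit_sec(model, 5.0)
# ... build the size/pruning MILP here ...
optimize!(model)
accept = termination_status(model) == MOI.OPTIMAL
```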
Are you aware that many cool state of the art NAS methods (e.g. DARTS, NEAT, DNW) don’t really use the operations provided by NaiveNASlib?
- Yes. Yet another reason why I didn’t want to call it NASlib.jl I guess.
Does it do Neural ODEs or more advanced control structures?
- Not easily as of now, unfortunately. There is an issue posted for NaiveNASlib with some suggestions. Feel free to submit a PR for it!