Why do some Flux models train in parallel but not others?

After actually running these models locally, it turns out the answer was the simplest one and I was barking up the wrong tree :slight_smile:

By default, Julia allocates only a single thread to the default thread pool (you can check with `Threads.nthreads()`). Because Flux conv layers use this thread pool, they end up running (mostly; more on that below) single-threaded. To make Julia use multiple threads, pass either `-t <nthreads>` or `-t auto` at startup. If you're using VS Code, this is also exposed via the "Julia: Num Threads" setting.
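To see this for yourself, a quick check at startup (the exact thread count shown is just an example):

```julia
# Started as plain `julia` (no -t flag):
Threads.nthreads()  # 1 -- conv layers get no parallelism here

# Started as `julia -t auto` on, say, an 8-core machine:
Threads.nthreads()  # 8 -- now the default pool can run conv work in parallel
```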

Now, if conv layers are running single-threaded, why does model1 appear to use multiple threads? That's because the matrix multiplication calls in Dense layers use a separate BLAS thread pool, which is >1 by default (you can check this with `using LinearAlgebra; BLAS.get_num_threads()`). Because model1 has two large dense layers to model2's one small one, it spends a lot more time there and thus a lot more time in multi-threaded code. Conv layers also use matmuls under the hood, but those are generally smaller and need the aforementioned default thread pool for any significant parallelism.
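The key point is that the two pools are configured independently, so you can end up with single-threaded conv layers and multi-threaded Dense layers in the same process. A minimal sketch of inspecting (and adjusting) both:

```julia
using LinearAlgebra

# Default Julia thread pool -- used by NNlib's conv loops.
# Fixed at startup; can only be changed via `julia -t ...`.
Threads.nthreads()

# BLAS thread pool -- used by the matmuls inside Dense layers.
# Independent of the above, and adjustable at runtime:
BLAS.get_num_threads()
BLAS.set_num_threads(4)
```

This is why profiling thread usage can be misleading: a model that looks "parallel" may just be BLAS-heavy rather than actually using Julia's own threads.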
