Flux.jl and the state of multi-processing

I am trying to implement a convolutional autoencoder in Flux.
The model consists essentially of several convolutional layers, max pooling, and nearest-neighbor interpolation (using repeat(...; inner = (2,2,1,1))). The issue is that the latter is not yet implemented to run on a GPU (https://github.com/JuliaGPU/GPUArrays.jl/pull/126).
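For readers unfamiliar with this trick, here is a minimal sketch of what the repeat call does on the CPU. It assumes the usual Flux array layout (width, height, channels, batch); the toy 2x2 input and the variable names x and y are only for illustration and are not taken from the model above.

```julia
# A single 2x2 image with one channel and a batch size of one
x = reshape(Float32.(1:4), 2, 2, 1, 1)

# inner = (2, 2, 1, 1) repeats every element twice along the two spatial
# dimensions and leaves the channel and batch dimensions untouched
y = repeat(x; inner = (2, 2, 1, 1))

size(y)        # (4, 4, 1, 1)
y[:, :, 1, 1]  # each input pixel now fills a 2x2 block
```

Each pixel of the input becomes a 2x2 block in the output, which is exactly nearest-neighbor upsampling by a factor of two.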

However, if I run Flux.jl on the CPU, only one core is used, even though I set JULIA_NUM_THREADS before starting Julia (as suggested here: Flux parallel execution):

# ...

Is this the correct approach?
Does Flux.jl support multi-processing? Or does it rely only on the threaded BLAS implementations used for dense layers, which are of no use for convolutional layers?

Is there any other machine learning framework in Julia that can use multiple CPU cores?
Thanks a lot for any insights!


AFAIK, (almost) none of the Julia code used by Flux is multithreaded; all parallelization comes from libraries such as BLAS.
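If it helps, here is a rough sketch for checking the two thread settings separately, since Julia's own threads and the BLAS thread pool are configured independently. It assumes Julia 1.6 or newer (for BLAS.get_num_threads), and using Sys.CPU_THREADS as the BLAS thread count is just one possible choice.

```julia
using LinearAlgebra

# Number of Julia threads, fixed at startup by JULIA_NUM_THREADS
julia_threads = Threads.nthreads()

# BLAS keeps its own thread pool, independent of Julia's threads
# (BLAS.get_num_threads assumes Julia 1.6 or newer)
blas_threads_before = BLAS.get_num_threads()

# Ask BLAS to use up to one thread per logical core
BLAS.set_num_threads(Sys.CPU_THREADS)

println("Julia threads: ", julia_threads)
println("BLAS threads:  ", BLAS.get_num_threads())
```

Raising the BLAS thread count should only affect operations that actually go through BLAS (e.g. the matrix multiplications in dense layers), so it may not change much for the convolutional layers.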


Thanks a lot for confirming.