I am trying to implement a convolutional autoencoder in Flux.
The model consists essentially of several convolutional layers, max pooling, and nearest-neighbor interpolation (using repeat(...; inner = (2,2,1,1))).
The issue is that the latter is not yet implemented to run on a GPU (GPU support for Base.repeat by americast · Pull Request #126 · JuliaGPU/GPUArrays.jl · GitHub).
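For reference, the upsampling step amounts to something like the following minimal sketch. The array `x` is a made-up example, assumed to be in the width × height × channels × batch layout that Flux's convolutional layers expect; only `Base.repeat` is used.

```julia
# Hypothetical 2×2 single-channel "image", batch size 1 (W × H × C × N).
x = reshape(Float32[1, 2, 3, 4], 2, 2, 1, 1)

# Nearest-neighbor upsampling by a factor of 2 in both spatial dimensions:
# each element is repeated twice along dims 1 and 2, once along dims 3 and 4.
y = repeat(x; inner = (2, 2, 1, 1))

println(size(y))      # spatial dimensions are doubled
println(y[:, :, 1, 1])
```

Each pixel of the input becomes a 2×2 block in the output, while the channel and batch dimensions are left unchanged.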
However, when I run Flux.jl on the CPU, only one CPU core is used, despite setting JULIA_NUM_THREADS
before starting Julia (as suggested here: Flux parallel execution):
export JULIA_NUM_THREADS=2
julia
# ...
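A quick way to confirm that the setting reached the session is a check like the one below. This is only a sketch using Base's `Threads` module; the loop length of 8 and the `ids` array are arbitrary choices for illustration. It shows whether Julia's own threads are available, not whether Flux uses them.

```julia
# Number of threads this Julia session was started with.
n = Threads.nthreads()
println("Julia threads: ", n)

# Record which thread handles each iteration of a threaded loop.
ids = zeros(Int, 8)
Threads.@threads for i in 1:8
    ids[i] = Threads.threadid()
end

# With more than one thread, several distinct ids are expected here.
println("Thread ids used: ", sort(unique(ids)))
```

If this reports a single thread, the environment variable was not picked up; if it reports several, the threads exist but the library code may simply not be using them.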
Is this the correct approach?
Does Flux.jl support multi-processing? Or does it rely only on threaded BLAS implementations for dense layers, which are of no use for convolutional layers?
Is there any other machine learning framework in Julia that can use multiple CPUs?
Thanks a lot for any insights!