Going beyond single-core execution with FluxML (no CUDA)

I tried a few examples from the FluxML model-zoo, and only one core is fully utilized on my 4-core Intel CPU with an integrated GPU. Is there any cure for that, other than buying an Nvidia GPU?

Ideally, I would like to have OpenCL support, which could use both the CPU and the iGPU in my PC, but AFAIK CLArrays has not been ported to Julia 1.x yet. Is there any way to at least use all CPU cores in the meantime?

The CPU convolution and maxpool layers implemented in Julia are, AFAIU, single-threaded. https://github.com/FluxML/NNlib.jl/pull/67 might be worth a look.
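
In the meantime, here is a minimal sketch of the two generic knobs Julia itself offers for using more cores: the BLAS thread pool (used by dense matrix multiplies) and `Threads.@threads` over independent samples. It uses only Base and the LinearAlgebra stdlib; `per_sample` and `batch_apply` are hypothetical names made up for illustration, not part of Flux or NNlib, and the thread count of 4 is just an assumption matching the CPU in the question. Whether this speeds up a given Flux model is something you would have to measure.

```julia
using LinearAlgebra

# Julia's own thread count is fixed at startup, e.g. by launching with
# the environment variable JULIA_NUM_THREADS=4.
println("Julia threads: ", Threads.nthreads())

# BLAS has a separate thread pool; 4 is an assumption (4-core CPU).
BLAS.set_num_threads(4)

# Hypothetical stand-in for a single-threaded per-sample operation.
per_sample(x) = sum(abs2, x)

# Apply `per_sample` to every sample, splitting the loop across threads.
# Each iteration writes to its own slot, so there is no data race.
function batch_apply(xs::Vector{Vector{Float64}})
    out = Vector{Float64}(undef, length(xs))
    Threads.@threads for i in eachindex(xs)
        out[i] = per_sample(xs[i])
    end
    return out
end

xs = [rand(1000) for _ in 1:64]
@assert batch_apply(xs) ≈ map(per_sample, xs)
```

With only one Julia thread the loop still runs correctly, just serially, so it is safe to try before changing how Julia is launched.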

I really don’t think training ML models on a CPU is worth pursuing. The performance difference compared with a GPU is too stark.

@xiaodai You are right in the general case, but my models are small and my purpose is far from mainstream. It’s not a big problem; I just don’t want to miss an existing solution. Ultimately, I will use CLArrays when it is ready.