Flux and cpu cores

johnbb · August 21, 2020, 10:53am

When I run the following code

using Flux
n = 100_000
p = 50
x = rand(Float32, p, n)
y = rand(Float32, n)    
trdata = Flux.Data.DataLoader(x, y, batchsize=100)
m = Chain(Dense(p, 100), Dense(100,100), Dense(100,1))
loss(x, y) = Flux.mse(m(x), y)
@time Flux.@epochs 10 Flux.train!(loss, Flux.params(m), trdata, Flux.ADAM())

on my laptop, all cores/threads run at 100%, which surprises me because Threads.nthreads() returns 1. Is this expected? I use Julia 1.4.1 with Flux 0.10.4 on Ubuntu 18 with an Intel(R) Core™ i7-6600U CPU (2 cores, 2 threads per core).

Suppose I want to train my model multiple times with different random initial weights. What would be the recommended way to do this?

Tomas_Pevny · August 21, 2020, 11:52am

That is because OpenBLAS is multi-threaded. It manages its own threads, independently of Threads.nthreads().
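To see the two thread counts side by side, a minimal sketch along these lines may help. It assumes Julia 1.6 or newer, because BLAS.get_num_threads() is not available on the 1.4.1 release used in this thread; the variable names are only illustrative.

```julia
using LinearAlgebra

# Julia's own task threads (controlled by JULIA_NUM_THREADS)
julia_threads = Threads.nthreads()

# Threads used internally by the BLAS library (OpenBLAS by default)
blas_threads = BLAS.get_num_threads()   # assumes Julia >= 1.6

println("Julia threads: ", julia_threads)
println("BLAS threads:  ", blas_threads)

# The BLAS thread count can be changed at runtime
BLAS.set_num_threads(1)
println("BLAS threads after set_num_threads(1): ", BLAS.get_num_threads())
```

With the default OpenBLAS backend, the first BLAS count typically reflects the number of available cores even when Julia itself runs on a single thread.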

johnbb · August 21, 2020, 1:14pm

Ok, thanks. Does this imply that I should avoid Threads.@threads for ... end in order to train several nets simultaneously?

Tomas_Pevny · August 26, 2020, 2:33pm

I think that if you use multi-threading, it would automatically set the number of threads for OpenBLAS to one. But in your case, it can still be a win.

Elrod · August 26, 2020, 2:35pm

You have to set the number of OpenBLAS threads manually; it does not happen automatically. (When comparing the charts, note the different scales on the x-axis!)
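One way to do this manually is sketched below, under stated assumptions. The helper with_blas_threads and the stand-in workload fake_train are hypothetical names, not part of LinearAlgebra or Flux, and BLAS.get_num_threads() assumes Julia 1.6 or newer. The matrix products only imitate the dense-layer work of training; they are not a Flux training loop.

```julia
using LinearAlgebra
using Random

# Hypothetical helper: run `f` with a temporary BLAS thread count,
# then restore whatever was set before.
function with_blas_threads(f, n::Integer)
    old = BLAS.get_num_threads()   # assumes Julia >= 1.6
    BLAS.set_num_threads(n)
    try
        return f()
    finally
        BLAS.set_num_threads(old)
    end
end

# Stand-in for training one network: repeated 100x100 matrix products,
# with a different random initialisation per seed.
function fake_train(seed)
    rng = MersenneTwister(seed)
    W = rand(rng, Float32, 100, 100)
    x = rand(rng, Float32, 100, 100)
    for _ in 1:200
        x = W * x                  # this call goes through BLAS
        x ./= maximum(abs, x)      # rescale to keep values bounded
    end
    return sum(x)
end

# One BLAS thread per Julia thread, so the two thread pools do not compete
results = with_blas_threads(1) do
    out = zeros(Float32, 4)
    Threads.@threads for i in 1:4
        out[i] = fake_train(i)     # each iteration writes its own slot
    end
    out
end
```

Restoring the previous setting in a finally block keeps the change local, which matters if later code in the same session relies on multi-threaded BLAS for large single matrix operations.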

johnbb · August 27, 2020, 10:21am

Thank you for the comments. I will make a basic comparison in a few days.

johnbb · September 1, 2020, 9:29am

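I ran a few tests, which confirm that BLAS.set_num_threads(1) should be set when training with Threads.@threads. On my system the threaded code ran ~10 times faster with this setting than with the default number of BLAS threads. The code and results are below.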

# Case 1: serial loop, default number of BLAS threads
# Start Julia with: JULIA_NUM_THREADS=1 julia
using Flux
using BenchmarkTools
using LinearAlgebra
n = 100_000
p = 50
x = rand(Float32, p, n)
y = rand(Float32, n)    
trdata = Flux.Data.DataLoader(x, y, batchsize=100)
m = [Chain(Dense(p, 100), Dense(100,100), Dense(100,1)) for i in 1:4]
@btime for i in 1:4
    loss(x, y) = Flux.mse(m[i](x), y)
    Flux.@epochs 1 Flux.train!(loss, Flux.params(m[i]), trdata, Flux.ADAM())
end
#  6.286 s (1992500 allocations: 2.24 GiB)

# Case 2: threaded loop, default number of BLAS threads
# Start Julia with: JULIA_NUM_THREADS=4 julia
using Flux
using BenchmarkTools
using LinearAlgebra
n = 100_000
p = 50
x = rand(Float32, p, n)
y = rand(Float32, n)    
trdata = Flux.Data.DataLoader(x, y, batchsize=100)
m = [Chain(Dense(p, 100), Dense(100,100), Dense(100,1)) for i in 1:4]
@btime Threads.@threads for i in 1:4
    loss(x, y) = Flux.mse(m[i](x), y)
    Flux.@epochs 1 Flux.train!(loss, Flux.params(m[i]), trdata, Flux.ADAM())
end
#  10.864 s (1992523 allocations: 2.24 GiB)  

# Case 3: threaded loop, BLAS restricted to a single thread
# Start Julia with: JULIA_NUM_THREADS=4 julia
using Flux
using BenchmarkTools
using LinearAlgebra
BLAS.set_num_threads(1)
n = 100_000
p = 50
x = rand(Float32, p, n)
y = rand(Float32, n)    
trdata = Flux.Data.DataLoader(x, y, batchsize=100)
m = [Chain(Dense(p, 100), Dense(100,100), Dense(100,1)) for i in 1:4]
@btime Threads.@threads for i in 1:4
    loss(x, y) = Flux.mse(m[i](x), y)
    Flux.@epochs 1 Flux.train!(loss, Flux.params(m[i]), trdata, Flux.ADAM())
end
#  1.076 s (1992515 allocations: 2.24 GiB)
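For reference, the speedups implied by the three timings above can be worked out directly. The variable names are illustrative; the numbers are the ones reported in this post.

```julia
# Timings reported above, in seconds
t_serial              = 6.286    # plain loop, default BLAS threads
t_threads_default     = 10.864   # Threads.@threads, default BLAS threads
t_threads_single_blas = 1.076    # Threads.@threads, BLAS.set_num_threads(1)

speedup_vs_threaded = t_threads_default / t_threads_single_blas
speedup_vs_serial   = t_serial / t_threads_single_blas

println(round(speedup_vs_threaded, digits=1))  # 10.1
println(round(speedup_vs_serial, digits=1))    # 5.8
```

So the single-threaded BLAS setting is roughly 10 times faster than the threaded run without it, and roughly 6 times faster than the serial loop.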
