Flux: GPU slower than CPU?

question
performance
flux

#1

I have a loss function which I’ve defined using Flux. On my boring old CPU on one batch, it performs like this:

julia> @time loss(x, y)
  0.071089 seconds (402 allocations: 20.235 MiB, 4.37% gc time)
1.3092503770133925 (tracked)

When running on a machine with a fancy GPU, it performs like this:

julia> @time loss(x, y)
  1.685316 seconds (1.24 M allocations: 61.493 MiB, 0.59% gc time)
2.6197212f0 (tracked)

(And yes, I have warmed up / compiled everything before timing in both cases.) What’s going on here? Why does the GPU version make roughly 3000x as many allocations and take 25x longer?
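
(The numbers above are plain @time after a warm-up run; a sketch of a more careful harness, e.g. with BenchmarkTools, would be something like this:)

# Sketch of a sturdier timing harness: BenchmarkTools runs the call many
# times and interpolates the arguments, so compilation is excluded.
using BenchmarkTools
@btime loss($x, $y)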


#2

What is the loss function here?


#3

I’m using the logitcrossentropy function that comes with Flux
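
For reference, I believe it is (roughly) a logsoftmax followed by a sum and a mean, something like this (from memory, so the exact code in Flux 0.5.4 may differ):

# Roughly what Flux.logitcrossentropy computes (from memory; the exact
# definition in this Flux version may differ) -- logsoftmax plus reductions:
myloss(logits, targets) = mean(-sum(targets .* Flux.logsoftmax(logits), 1))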


#4

Can you post a reproducible example?


#5

Here goes:

using Flux
# using CuArrays if you are on the GPU
d = Dense(50, 10) |> gpu
n = 50000
x = rand(50, n) |> gpu
y = hcat([[i == j for i = 1:10] for j = rand(1:10, n)]...) .* 1.0 |> gpu  # one-hot targets, 10×n
loss(a, b) = Flux.logitcrossentropy(d(a), b)

CPU:

julia> @time loss(x, y)
  0.331867 seconds (139.06 k allocations: 77.120 MiB, 35.57% gc time)
2.4550202681078717 (tracked)

GPU:

julia> @time loss(x, y)
 24.348809 seconds (17.95 M allocations: 892.661 MiB, 0.62% gc time)
2.5094972f0 (tracked)

[edited to remove the prompt to allow for easier copying]


#6

Also for what it’s worth:

thing I am using    version of that thing
julia               0.6.4
Flux                0.5.4
CuArrays            0.6.2
ubuntu              16.04
gpu                 tesla k80
cuda                9.2

#7

Still stumped. Is it an issue with my setup? Can anyone else reproduce?


#8

Flux is great, but many of its operations have not yet been optimized for the GPU. See for example this issue: https://github.com/FluxML/Flux.jl/issues/189
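
One way to see this in the example above is to time the two pieces separately; a sketch (untested on that exact setup) is below. The Dense layer is essentially a CUBLAS matmul, while logitcrossentropy goes through logsoftmax and reductions, which is where I would expect the time to go:

h = d(x)                             # forward pass through the Dense layer (also serves as warm-up)
@time d(x)                           # just the matmul part
@time Flux.logitcrossentropy(h, y)   # just the loss on the precomputed activations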