rkat
August 9, 2018, 9:25am
1
I have a loss function which I’ve defined using Flux. On my boring old CPU on one batch, it performs like this:
julia> @time loss(x, y)
0.071089 seconds (402 allocations: 20.235 MiB, 4.37% gc time)
1.3092503770133925 (tracked)
When running on a machine with a fancy GPU, it performs like this:
julia> @time loss(x, y)
1.685316 seconds (1.24 M allocations: 61.493 MiB, 0.59% gc time)
2.6197212f0 (tracked)
(and yes, I have run everything once to compile before timing in both cases). What’s going on here? Why does the GPU version make roughly 3,000x as many allocations and take about 25x longer?
What is the loss function here?
rkat
August 9, 2018, 9:45am
3
I’m using the logitcrossentropy function that comes with Flux.
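For reference, on these versions it is roughly equivalent to the following (just a sketch, not the exact source; rough_logitcrossentropy is an illustrative name, and it assumes logsoftmax is reachable through Flux/NNlib):

rough_logitcrossentropy(logŷ, y) = -sum(y .* Flux.logsoftmax(logŷ)) / size(y, 2)  # stable log-softmax, averaged over the batch columns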
Can you post a reproducible example?
rkat
August 9, 2018, 10:07am
5
Here goes:
using Flux
# using CuArrays  # uncomment if you are on the GPU
d = Dense(50, 10) |> gpu
n = 50000
x = rand(50, n) |> gpu
y = hcat([[i == j for i = 1:10] for j = rand(1:10, n)]...) .* 1.0 |> gpu  # one-hot-style target matrix
loss(a, b) = Flux.logitcrossentropy(d(a), b)
CPU:
julia> @time loss(x, y)
0.331867 seconds (139.06 k allocations: 77.120 MiB, 35.57% gc time)
2.4550202681078717 (tracked)
GPU:
julia> @time loss(x, y)
24.348809 seconds (17.95 M allocations: 892.661 MiB, 0.62% gc time)
2.5094972f0 (tracked)
[edited to remove the prompt to allow for easier copying]
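In case it helps narrow things down, the two pieces can also be timed separately after a warm-up call (just a sketch, reusing the d, x, y and loss defined above):

loss(x, y)                               # warm up so compilation is excluded
@time d(x)                               # the dense layer (matmul + bias) alone
@time Flux.logitcrossentropy(d(x), y)    # adds the log-softmax and reductions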
rkat
August 9, 2018, 10:24am
6
Also for what it’s worth:
| Thing I am using | Version |
| --- | --- |
| Julia | 0.6.4 |
| Flux | 0.5.4 |
| CuArrays | 0.6.2 |
| Ubuntu | 16.04 |
| GPU | Tesla K80 |
| CUDA | 9.2 |
rkat
August 10, 2018, 12:38am
7
Still stumped. Is it an issue with my setup? Can anyone else reproduce?
Flux is great, but many of its operations have not yet been optimized for the GPU; see, for example, https://github.com/FluxML/Flux.jl/issues/189.
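If the reductions inside the log-softmax turn out to be the slow part, one possible stopgap (only a sketch: manual_logitcrossentropy is a hypothetical name, and I haven't verified that Flux's tracker differentiates every step on these package versions) is to spell the loss out with plain broadcasts and dimension-wise reductions:

# Numerically stabilised log-sum-exp per column, then the batch-averaged cross-entropy.
function manual_logitcrossentropy(logŷ, y)
    m = maximum(logŷ, 1)                        # per-column max (Julia 0.6 reduction syntax)
    lse = log.(sum(exp.(logŷ .- m), 1)) .+ m    # log-sum-exp of each column
    return -sum(y .* (logŷ .- lse)) / size(y, 2)
end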