I defined the same neural network on the CPU and on the GPU, but the GPU version is slower. It's not due to data transfer, as I have already moved the data to the GPU with cu.
Can anyone shed light on this? I am on Julia 1.5rc1, Flux 0.11, and CUDA v1.1.0, and my GPU is a reasonably capable RTX 2080.
# in this attempt, I shall play a set of games with the policy and move the policy using
# the data
using Game2048
using Flux, CUDA
policy = Chain(
board->reshape(board, 4, 4, 1, 1),
Conv((2,2), 1 => 256, relu),
Conv((2,2), 256 => 128, relu),
Conv((2,2), 128 => 64, relu),
Dense(64, 4),
) |> gpu
board = initboard()
cb = cu(Float32.(board))
using BenchmarkTools
@benchmark policy(cb)
# BenchmarkTools.Trial:
# memory estimate: 60.33 KiB
# allocs estimate: 1421
# --------------
# minimum time: 418.799 μs (0.00% GC)
# median time: 470.701 μs (0.00% GC)
# mean time: 520.493 μs (2.57% GC)
# maximum time: 36.679 ms (43.01% GC)
# --------------
# samples: 9507
# evals/sample: 1
policy_cpu = Chain(
board->reshape(board, 4, 4, 1, 1),
Conv((2,2), 1 => 256, relu),
Conv((2,2), 256 => 128, relu),
Conv((2,2), 128 => 64, relu),
Dense(64, 4),
)
board = initboard()
fb = Float32.(board)
@benchmark policy_cpu(fb)
# BenchmarkTools.Trial:
# memory estimate: 153.36 KiB
# allocs estimate: 211
# --------------
# minimum time: 232.399 μs (0.00% GC)
# median time: 295.501 μs (0.00% GC)
# mean time: 310.653 μs (2.14% GC)
# maximum time: 4.700 ms (91.46% GC)
# --------------
# samples: 10000
# evals/sample: 1
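Two things I have not ruled out: GPU calls in CUDA.jl are asynchronous, so the GPU timing above may not be measuring the finished computation, and a single 4×4 board gives the GPU very little work per call. Below is a minimal sketch of a comparison that accounts for both. The batch of 1024 random boards, the `Flux.flatten` step, the batch-preserving reshape, and the names `batched_policy` and `run_synced` are assumptions made for illustration and are not part of the code above.

```julia
using Flux, CUDA, BenchmarkTools

# Same conv/dense layers as above, with two assumed changes: the reshape
# keeps a batch dimension, and Flux.flatten turns the 1×1×64×N conv output
# into a 64×N matrix before the Dense layer.
batched_policy = Chain(
    boards -> reshape(boards, 4, 4, 1, :),
    Conv((2, 2), 1 => 256, relu),
    Conv((2, 2), 256 => 128, relu),
    Conv((2, 2), 128 => 64, relu),
    Flux.flatten,
    Dense(64, 4),
)

# Hypothetical batch of random boards standing in for real game states.
boards = rand(Float32, 4, 4, 1024)

# Illustrative helper: CUDA.@sync blocks until the queued GPU work is done,
# so the benchmark times the whole forward pass rather than the launch.
run_synced(model, x) = CUDA.@sync model(x)

cpu_trial = @benchmark batched_policy($boards)
display(cpu_trial)

# Only run the GPU part when a working CUDA setup is available.
if CUDA.functional()
    gpu_policy = batched_policy |> gpu
    gpu_boards = cu(boards)
    gpu_trial = @benchmark run_synced($gpu_policy, $gpu_boards)
    display(gpu_trial)
end
```

If the GPU is still slower with a large batch and synchronized timing, that would point to something other than launch overhead.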