I find that my Flux model runs much more slowly on the CPU than on the GPU:

    julia> m = Chain(Dense(2250, 500, σ), Dense(500, 50, tanh), Dense(50, 7, σ));

    julia> X = Float32.(X);

    julia> size(X)
    (2250, 484)

    julia> @btime m(X);
      10.718 ms (10 allocations: 2.06 MiB)

    julia> X_gpu = X |> gpu;

    julia> m_gpu = m |> gpu;

    julia> @btime m_gpu(X_gpu);
      21.864 μs (134 allocations: 3.80 KiB)

Training the model on the CPU was also intolerably slow, and I am really confused about why the gap is so large.
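
One caveat about the GPU number above: CUDA.jl launches kernels asynchronously, so `@btime m_gpu(X_gpu)` may mostly be timing the launch rather than the computation itself. Below is a minimal sketch of a synchronized comparison. It assumes CUDA.jl and BenchmarkTools.jl are installed, uses random `Float32` data as a stand-in for my real `X`, and keeps the same `Dense(in, out, σ)` constructor form as above (newer Flux releases prefer `Dense(in => out, σ)`).

```julia
using Flux, CUDA, BenchmarkTools

# Same architecture as in the question.
m = Chain(Dense(2250, 500, σ), Dense(500, 50, tanh), Dense(50, 7, σ))

# Random stand-in for the real data (assumption: same shape and element type).
X = rand(Float32, 2250, 484)

# CPU timing; `$` interpolates the globals so the benchmark does not
# include the cost of looking up untyped global variables.
@btime $m($X);

# GPU timing, only attempted when a working CUDA device is available.
if CUDA.functional()
    m_gpu = m |> gpu
    X_gpu = X |> gpu
    # CUDA.@sync blocks until the GPU has finished, so the measurement
    # covers the whole forward pass and not just the kernel launches.
    @btime CUDA.@sync $m_gpu($X_gpu);
end
```

If the synchronized GPU time comes out much larger than 21.864 μs, then part of the gap in my original measurement is a benchmarking artifact rather than a real difference in speed.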