I found that my Flux model runs much more slowly on the CPU than on the GPU:
```julia
julia> m = Chain(
           Dense(2250, 500, σ),
           Dense(500, 50, tanh),
           Dense(50, 7, σ));

julia> X = Float32.(X);

julia> size(X)
(2250, 484)

julia> @btime m(X);
  10.718 ms (10 allocations: 2.06 MiB)

julia> X_gpu = X |> gpu;

julia> m_gpu = m |> gpu;

julia> @btime m_gpu(X_gpu);
  21.864 μs (134 allocations: 3.80 KiB)
```
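A rough back-of-the-envelope estimate helps put the two timings in perspective. It assumes the matrix multiplies in the `Dense` layers dominate the cost (about 2 FLOPs per weight per sample) and ignores biases and activation functions; the timings are the ones measured above.

```julia
# Rough FLOP count for one forward pass of the Chain above.
# Assumption: each Dense(n_in, n_out) layer costs about 2*n_in*n_out FLOPs
# per sample (one multiply and one add per weight); biases and activations
# are ignored.
layer_sizes = [(2250, 500), (500, 50), (50, 7)]
batch = 484                              # size(X, 2)

flops = sum(2 * n_in * n_out * batch for (n_in, n_out) in layer_sizes)

cpu_time = 10.718e-3                     # seconds, from @btime m(X)
gpu_time = 21.864e-6                     # seconds, from @btime m_gpu(X_gpu)

println("FLOPs per forward pass: ", flops)
println("implied CPU throughput: ", round(flops / cpu_time / 1e9; digits=1), " GFLOP/s")
println("implied GPU throughput: ", round(flops / gpu_time / 1e12; digits=1), " TFLOP/s")
```

This works out to roughly 1.1 GFLOP per pass, so about 100 GFLOP/s on the CPU, which is plausible for a multithreaded BLAS. The GPU figure comes out near 50 TFLOP/s, which is likely above the FP32 peak of many GPUs. One probable explanation is that CUDA.jl launches kernels asynchronously, so `@btime m_gpu(X_gpu)` may be timing only the kernel launches; `@btime CUDA.@sync m_gpu(X_gpu)` waits for the GPU to finish and would probably give a fairer comparison.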
Training the model on the CPU is also intolerably slow, and I am really confused about why the gap is so large.