Flux cpu() type stability

Did a quick check, also on Flux 0.12.9, and the type instability from cpu() does seem to have a measurable effect when the computation itself is cheap. For larger matrices the difference diminishes, presumably because the matmul cost dominates (a loop for checking this is at the end of the post). A quick search through the issues and PRs on GitHub didn't turn anything up, so maybe it's worth opening an issue there?

julia> using Flux, BenchmarkTools  # cpu comes from Flux, @btime from BenchmarkTools

julia> a = randn(5, 5);

julia> b = randn(5, 5);

julia> f1(a, b) = a * b
f1 (generic function with 2 methods)

julia> f2(a, b) = a * cpu(b)
f2 (generic function with 2 methods)

julia> @code_warntype f1(a, b)
MethodInstance for f1(::Matrix{Float64}, ::Matrix{Float64})
  from f1(a, b) in Main at REPL[39]:1
Arguments
  #self#::Core.Const(f1)
  a::Matrix{Float64}
  b::Matrix{Float64}
Body::Matrix{Float64}
1 ─ %1 = (a * b)::Matrix{Float64}
└──      return %1


julia> @code_warntype f2(a, b)
MethodInstance for f2(::Matrix{Float64}, ::Matrix{Float64})
  from f2(a, b) in Main at REPL[40]:1
Arguments
  #self#::Core.Const(f2)
  a::Matrix{Float64}
  b::Matrix{Float64}
Body::Any
1 ─ %1 = Main.cpu(b)::Any
│   %2 = (a * %1)::Any
└──      return %2
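
As an aside (not something Flux provides; f3 is just a made-up name for illustration): when the argument is known to already be a plain Array, as in this example, a type assertion on the cpu call is enough to recover inference. The assertion would of course error if cpu actually had to copy data off the GPU.

julia> f3(a, b) = a * (cpu(b)::typeof(b))  # assert that cpu(b) is a no-op here, so the matmul infers again
f3 (generic function with 1 method)

With that assertion, the body should infer as Matrix{Float64} again.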


julia> @btime f1($a, $b)
  190.318 ns (1 allocation: 256 bytes)
5×5 Matrix{Float64}:
  0.473592   0.450893  -2.49009   1.48155     4.01978
 -0.636979   0.493621  -1.71473   0.0855036   3.45803
  0.806991   1.10065   -3.76954   2.63519     4.33381
  1.01176    0.280512  -1.57056   2.21027    -0.41367
 -1.80214   -1.33488    4.65632  -3.00634    -1.24369

julia> @btime f2($a, $b)
  298.087 ns (3 allocations: 592 bytes)
5×5 Matrix{Float64}:
  0.473592   0.450893  -2.49009   1.48155     4.01978
 -0.636979   0.493621  -1.71473   0.0855036   3.45803
  0.806991   1.10065   -3.76954   2.63519     4.33381
  1.01176    0.280512  -1.57056   2.21027    -0.41367
 -1.80214   -1.33488    4.65632  -3.00634    -1.24369
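
To check the larger-matrix claim, a loop like the one below works (the sizes are arbitrary examples; timings omitted since they'll depend on the machine and BLAS setup):

julia> for n in (5, 50, 500)
           A, B = randn(n, n), randn(n, n)
           println("n = ", n)
           @btime f1($A, $B)   # type-stable path
           @btime f2($A, $B)   # goes through cpu(), result inferred as Any
       end

The overhead of the dynamic dispatch on the cpu(b) result is roughly constant, so it shrinks relative to the O(n^3) matmul as n grows.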