Flux cpu() type stability

While reviewing a loss function I wrote, I noticed that Flux's cpu() function introduces a type instability:

julia> using Flux

julia> @code_warntype cpu(rand(10))
MethodInstance for Flux.cpu(::Vector{Float64})
  from cpu(x) in Flux at ~/.julia/packages/Flux/qAdFM/src/functor.jl:146
Arguments
  #self#::Core.Const(Flux.cpu)
  x::Vector{Float64}
Locals
  #136::Flux.var"#136#137"
Body::Any
1 ─      (#136 = %new(Flux.:(var"#136#137")))
│   %2 = #136::Core.Const(Flux.var"#136#137"())
│   %3 = Flux.fmap(%2, x)::Any
└──      return %3

This Any propagated throughout the loss function, which probably isn't good, though I haven't benchmarked it yet. Is this a known issue? For reference, I'm on Flux v0.12.9.

Thanks in advance for any insight!


Did a quick check, also on Flux 0.12.9, and the instability does seem to have a measurable effect when the calculation cost is small; for larger matrices the difference diminishes. I couldn't find anything with a quick search of the issues and PRs on GitHub, so it may be worth opening one there.

julia> a = randn(5, 5);

julia> b = randn(5, 5);

julia> f1(a, b) = a * b
f1 (generic function with 2 methods)

julia> f2(a, b) = a * cpu(b)
f2 (generic function with 2 methods)

julia> @code_warntype f1(a, b)
MethodInstance for f1(::Matrix{Float64}, ::Matrix{Float64})
  from f1(a, b) in Main at REPL[39]:1
Arguments
  #self#::Core.Const(f1)
  a::Matrix{Float64}
  b::Matrix{Float64}
Body::Matrix{Float64}
1 ─ %1 = (a * b)::Matrix{Float64}
└──      return %1


julia> @code_warntype f2(a, b)
MethodInstance for f2(::Matrix{Float64}, ::Matrix{Float64})
  from f2(a, b) in Main at REPL[40]:1
Arguments
  #self#::Core.Const(f2)
  a::Matrix{Float64}
  b::Matrix{Float64}
Body::Any
1 ─ %1 = Main.cpu(b)::Any
│   %2 = (a * %1)::Any
└──      return %2


julia> @btime f1($a, $b)
  190.318 ns (1 allocation: 256 bytes)
5×5 Matrix{Float64}:
  0.473592   0.450893  -2.49009   1.48155     4.01978
 -0.636979   0.493621  -1.71473   0.0855036   3.45803
  0.806991   1.10065   -3.76954   2.63519     4.33381
  1.01176    0.280512  -1.57056   2.21027    -0.41367
 -1.80214   -1.33488    4.65632  -3.00634    -1.24369

julia> @btime f2($a, $b)
  298.087 ns (3 allocations: 592 bytes)
5×5 Matrix{Float64}:
  0.473592   0.450893  -2.49009   1.48155     4.01978
 -0.636979   0.493621  -1.71473   0.0855036   3.45803
  0.806991   1.10065   -3.76954   2.63519     4.33381
  1.01176    0.280512  -1.57056   2.21027    -0.41367
 -1.80214   -1.33488    4.65632  -3.00634    -1.24369

Worth checking against an older version (v0.11, v0.12.3) and master as well. In most cases cpu shouldn't need to be called very frequently (especially in code that needs to be differentiated). It may be that for small enough arrays the instability adds inference cost at runtime. For the GPU path to be worthwhile, the operations would then need to remain runtime-competitive even when the smaller array is on the GPU, I suppose.
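
To make the "not called frequently" point concrete, here is a minimal sketch (mine, not from Flux) of hoisting the unstable call behind a function barrier, so the dynamic dispatch is paid once per call rather than polluting every downstream operation:

```julia
using Flux

# Inside kernel(), a and b have concrete types, so everything infers.
kernel(a, b) = a * b

function run_once(a, b)
    b_cpu = cpu(b)           # type-unstable: inferred as Any
    return kernel(a, b_cpu)  # barrier: Julia dispatches once on the
                             # runtime type, then runs specialized code
end
```
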


I checked v0.12.3 and got the same result. It's honestly not a big deal, because I can hard-code the type to Float32 for now. In most cases I think you're right that cpu() isn't called often, but I was following this example about variational autoencoders, where cpu() or gpu() is called every time the loss function is called, so it might have some impact there. I'll raise an issue on their GitHub just as an FYI. Thank you for the help!
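
For anyone else who hits this, a minimal sketch of the hard-coded-type workaround, reusing the f2 example from above. The ::Matrix{Float64} assertion is an assumption about what cpu returns for this input; with it, @code_warntype infers Body::Matrix{Float64} instead of Any:

```julia
using Flux

a = randn(5, 5);
b = randn(5, 5);

# Asserting the concrete return type of cpu() stops the Any from
# propagating through the rest of the function.
f2_stable(a, b) = a * (cpu(b)::Matrix{Float64})
```
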