Flux convolutional layer not type-stable

In my spare time I am trying to implement YOLOv3 in Flux. I believe I have successfully defined the model, which leaves translating the pretrained YOLOv3 weights into Flux. However, I have noticed, using random weights and input, that the network is extremely slow on the CPU, even with small inputs. I attribute this to the fact that Flux's convolutional layers are not type-stable (Julia 1.4.1, Flux 0.10.5):

using Flux
using BenchmarkTools
# small image batch
imgbatch = rand(Float32, 416, 416, 3, 2)
# 3x3 kernel, 3 input channels => 32 output channels, default stride/pad/dilation
conv_layer = Conv((3,3), 3=>32)
@btime conv_layer($imgbatch);
# 33.423 ms (75 allocations: 154.31 MiB) on my machine
@code_warntype conv_layer(imgbatch)
# results in:
Variables
  c::Conv{2,4,typeof(identity),Array{Float32,4},Array{Float32,1}}
  x::Array{Float32,4}
  #102::Flux.var"#102#103"
  σ::typeof(identity)
  b::Array{Float32,4}
  cdims::DenseConvDims{2,_A,_B,_C,_D,_E,_F,_G} where _G where _F where _E where _D where _C where _B where _A

Body::Any
1 ─ %1  = Base.getproperty(c, :σ)::Core.Compiler.Const(identity, false)
│   %2  = Base.getproperty(c, :bias)::Array{Float32,1}
│   %3  = Core.tuple(%2)::Tuple{Array{Float32,1}}
│         (#102 = %new(Flux.:(var"#102#103")))
│   %5  = #102::Core.Compiler.Const(Flux.var"#102#103"(), false)
│   %6  = Base.getproperty(c, :stride)::Tuple{Int64,Int64}
│   %7  = Flux.map(%5, %6)::Core.Compiler.Const((1, 1), false)
│   %8  = Core.tuple(Flux.:(:), 1)::Core.Compiler.Const((Colon(), 1), false)
│   %9  = Core._apply_iterate(Base.iterate, Flux.reshape, %3, %7, %8)::Array{Float32,4}
│         (σ = %1)
│         (b = %9)
│   %12 = (:stride, :padding, :dilation)::Core.Compiler.Const((:stride, :padding, :dilation), false)
│   %13 = Core.apply_type(Core.NamedTuple, %12)::Core.Compiler.Const(NamedTuple{(:stride, :padding, :dilation),T} where T<:Tuple, false)
│   %14 = Base.getproperty(c, :stride)::Tuple{Int64,Int64}
│   %15 = Base.getproperty(c, :pad)::NTuple{4,Int64}
│   %16 = Base.getproperty(c, :dilation)::Tuple{Int64,Int64}
│   %17 = Core.tuple(%14, %15, %16)::Tuple{Tuple{Int64,Int64},NTuple{4,Int64},Tuple{Int64,Int64}}
│   %18 = (%13)(%17)::NamedTuple{(:stride, :padding, :dilation),Tuple{Tuple{Int64,Int64},NTuple{4,Int64},Tuple{Int64,Int64}}}
│   %19 = Core.kwfunc(Flux.DenseConvDims)::Core.Compiler.Const(Core.var"#Type##kw"(), false)
│   %20 = Base.getproperty(c, :weight)::Array{Float32,4}
│         (cdims = (%19)(%18, Flux.DenseConvDims, x, %20))
│   %22 = σ::Core.Compiler.Const(identity, false)
│   %23 = Base.getproperty(c, :weight)::Array{Float32,4}
│   %24 = Flux.conv(x, %23, cdims)::AbstractArray{yT,4} where yT
│   %25 = Base.broadcasted(Flux.:+, %24, b)::Any
│   %26 = Base.broadcasted(%22, %25)::Any
│   %27 = Base.materialize(%26)::Any
└──       return %27
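
If I am reading the output correctly, the instability enters at cdims: the type parameters of DenseConvDims depend on runtime field values of the layer, so the return type of Flux.conv (and everything downstream of it) is inferred as Any. As a quick experiment (just a sketch, stable_conv is my own name and not a Flux API), precomputing the DenseConvDims once and calling conv behind a function barrier seems to recover inference for the actual convolution:

using Flux
using BenchmarkTools

w = conv_layer.weight
b = reshape(conv_layer.bias, 1, 1, :, 1)
# built once; assumes the layer's default stride/pad/dilation,
# and its concrete type is only known at runtime
cdims = Flux.DenseConvDims(imgbatch, w)
# function barrier: inside the call, cdims has a concrete type,
# so Flux.conv specializes and infers
stable_conv(x, w, b, cdims) = Flux.conv(x, w, cdims) .+ b

@btime stable_conv($imgbatch, $w, $b, $cdims);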

I have already opened an issue on Flux's GitHub here, because right now it takes almost 4 seconds to run my YOLOv3 implementation on the CPU with rand(Float32, 416, 416, 3, 2) as input. Since convolutional layers are such a staple of deep learning, I think fixing this is quite important, which is why I am raising it here as well. Or maybe I am missing something? I am quite new to Julia.

Additionally, the memory requirements on the GPU are massive. Using Chain where possible and only saving outputs when they are needed by the shortcut and route layers, my 2 GB GPU (not a lot, I am aware) is already full during inference with CuArray(rand(Float32, 416, 416, 3, 2)) as input.
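
Roughly, my forward pass looks like this (a sketch with my own names; blocks and route_sources are not Flux API):

# blocks: vector of Flux layers/Chains in network order;
# route_sources: indices whose outputs a later route/shortcut layer consumes
function yolo_forward(blocks, route_sources, x)
    saved = Dict{Int,Any}()
    for (i, block) in enumerate(blocks)
        x = block(x)
        # keep an activation alive only if a later layer references it
        i in route_sources && (saved[i] = x)
    end
    return x, saved
end

In PyTorch I have an implementation that lets me actually train YOLOv3 on this same GPU, albeit with a rather small batch size. What are the ways of reducing the memory requirements in Flux?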