Fast CNN inference

Fabrice_Rosay · December 13, 2020, 1:31pm

Hi,
I’m currently using Knet and Flux for AlphaZero-like calculations. As the bottleneck is the many inferences needed, I was wondering whether it would be possible to wrap TensorRT along the lines of https://github.com/zerollzeng/tiny-tensorrt (the C++ part), and how hard that would be. If not, is it plausible that specific CUDA kernels for inference could bring some acceleration, perhaps using Float16?
Thanks.
To give an idea, 90% of the time is spent in the function below, and within it 90% of the time is spent on π,v=m(KnetArray(batch)), where m is a residual network. The typical input size is 8x8x8x1600 (Reversi, 200 games in parallel, batch size 8 for parallel MCTS) and inference time is around 150 ms for 96 layers, 10 blocks, on a GTX 1070 :(

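using Base.Threads: @threads

# Decode every game state into one slice of the batch, then run a single forward pass.
# `sizeInput`, `decoder`, `GameEnv` and `resnetwork` are defined elsewhere.
function (m::resnetwork)(x::Vector{GameEnv}, squashing=1f0)
    l = length(x)
    batch = zeros(Float32, (sizeInput..., l))

    @threads for k in 1:l
        @views decoder(x[k], batch[:, :, :, k])
    end
    π, v = m(KnetArray(batch))     # host-to-GPU copy and forward pass
    π = softmax(squashing .* π)    # scale the policy output before the softmax
    Array(π), Array(v)             # copy both outputs back to the CPU
end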

ToucheSir · December 13, 2020, 7:32pm

Have you tried just keeping everything on CPU? That input size isn’t terribly massive, so I wonder if the back-and-forth transfer is worth the latency.
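
For scale, an 8x8x8x1600 Float32 batch is 819,200 values, about 3.3 MB. One way to see where the time goes would be to time the batch fill and the model call separately. Below is a rough sketch of that idea, not a measurement of the real code: `fake_decoder!` and `fake_model` are hypothetical CPU stand-ins for `decoder` and `m(KnetArray(batch))`, which are not shown in the thread, and `SIZE_INPUT` is assumed from the sizes quoted above.

```julia
using Base.Threads: @threads

# Assumed per-position input shape, taken from the 8x8x8x1600 figure in the post.
const SIZE_INPUT = (8, 8, 8)

# Hypothetical stand-ins so the sketch runs without a GPU or the real network.
fake_decoder!(dst, game) = fill!(dst, Float32(game))   # stands in for decoder(x[k], slice)
fake_model(batch) = sum(batch; dims=(1, 2, 3))         # stands in for m(KnetArray(batch))

# Average wall-clock time of each stage over `reps` repetitions.
function time_stages(games; reps=10)
    l = length(games)
    batch = zeros(Float32, SIZE_INPUT..., l)
    t_fill = 0.0
    t_model = 0.0
    for _ in 1:reps
        # Stage 1: fill the batch on the CPU, one slice per game.
        t_fill += @elapsed begin
            @threads for k in 1:l
                fake_decoder!(view(batch, :, :, :, k), games[k])
            end
        end
        # Stage 2: the model call (transfer, forward pass and copy-back in the real code).
        t_model += @elapsed fake_model(batch)
    end
    return (fill = t_fill / reps, model = t_model / reps)
end

time_stages(collect(1:8); reps=1)            # warm-up run so compilation is not timed
timings = time_stages(collect(1:1600))
println(timings)
```

With the real network in place of `fake_model`, keeping the `Array(π)` and `Array(v)` copy-back inside the timed region forces the GPU work to finish before the clock stops, so the second number would cover transfer plus compute.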

Fabrice_Rosay · December 14, 2020, 7:50am

I tried; it is at least a hundred times slower. I don’t know of any reasonable implementation that does not use the GPU, at least for anything bigger than tic-tac-toe.

maleadt · December 14, 2020, 8:01am

Wrapping C++ is tricky, and TensorRT doesn’t seem to have a C API either, so that won’t be easy. Maybe you can PyCall the TensorRT Python bindings.
