How to efficiently evaluate a Flux.jl neural network millions of times on the GPU?

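You want to feed a whole batch into the network at once, so make your input a matrix such as `y = CUDA.ones(32, N)`. Each of the N columns is one input sample, and N is the number of inputs you want to process in parallel. This is the easiest way to parallelise execution, because Flux layers operate on the whole matrix in a single call. The output is a 31 × N matrix, with one column per input.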
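
Here is a minimal sketch, assuming Flux 0.13 or later (for the `Dense(in => out)` syntax) and a hypothetical two-layer model with 32 inputs and 31 outputs to match the shapes above. The model, the sizes, and the `evaluate_in_batches` helper are illustrative, not taken from the original question. The first part is the single batched forward pass described above. The helper splits a very large input into chunks so that millions of columns do not have to fit in GPU memory at once. `gpu` moves data to the GPU when CUDA is functional and otherwise leaves it on the CPU, so the sketch also runs without a GPU.

```julia
using Flux, CUDA

# Hypothetical model: 32 input features -> 31 outputs, matching the shapes above.
# `gpu` moves the parameters to the GPU if CUDA is functional, else leaves them on the CPU.
model = gpu(Chain(Dense(32 => 64, relu), Dense(64 => 31)))

# One batched forward pass: each of the N columns is one input sample.
N = 10_000
x = gpu(ones(Float32, 32, N))
out = model(x)                               # 31 × N, one column per input

# Hypothetical helper: evaluate a large CPU-side input matrix in GPU-sized chunks,
# so GPU memory use is bounded by `batchsize` columns rather than the whole input.
function evaluate_in_batches(model, X::AbstractMatrix; batchsize::Int = 100_000)
    n = size(X, 2)
    results = Vector{Matrix{Float32}}()
    for start in 1:batchsize:n
        stop = min(start + batchsize - 1, n)
        xb = gpu(X[:, start:stop])           # copy this chunk to the device
        push!(results, cpu(model(xb)))       # evaluate, then copy the outputs back
    end
    return reduce(hcat, results)             # 31 × n matrix on the CPU
end

X = rand(Float32, 32, 250_000)
Y = evaluate_in_batches(model, X; batchsize = 100_000)
```

The chunk size is a tuning knob: larger chunks keep the GPU busier per call, while smaller ones reduce peak memory. A reasonable approach is to pick the largest chunk that fits comfortably on your device.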
