Cheers. I have developed a semantic segmentation model in Flux with around 7M parameters. I have checked its inference performance for two-class segmentation with an input array of size (512, 512, 3, 1).
The results from BenchmarkTools.jl are shown in the table below, measured on the same PC; in the first row the GPU is disabled. What stands out is that the memory figure is in the GB range for the CPU case, while the GPU case is only in the KB range. Could it be that BenchmarkTools does not include GPU memory in that metric? And for the CPU case, does the figure mean the model is too expensive to run on limited devices such as IoT application processors?
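For concreteness, the benchmarks are essentially of this form (a minimal sketch; the model definition is omitted and `model` is just a placeholder for my network):

```julia
using Flux, BenchmarkTools, CUDA

x = rand(Float32, 512, 512, 3, 1)   # dummy input of the size mentioned above

# CPU case (the row where the GPU is disabled)
@benchmark $model($x)

# GPU case: move model and input to the device and synchronize,
# so the whole forward pass is timed rather than just the kernel launch
model_gpu = gpu(model)
x_gpu     = gpu(x)
@benchmark CUDA.@sync $model_gpu($x_gpu)
```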
As a side question: its speed on the GPU is 2-3x lower than that of equivalent models found elsewhere on GitHub. I have already applied many of the recommendations for reducing allocations, with little improvement. Any hints on where to look for further improvements are welcome.
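In case it helps with either question, my working assumption is that the BenchmarkTools memory estimate only counts allocations on the Julia (CPU) heap; for the device side, CUDA.jl has its own reporting, something like this (placeholder names again):

```julia
using CUDA

# `model_gpu` and `x_gpu` are placeholders for the network and input
# already moved to the GPU.

# Reports time plus both CPU and GPU allocations for one forward pass.
CUDA.@time model_gpu(x_gpu)

# Shows how much device memory is currently in use overall.
CUDA.memory_status()
```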
Maybe, maybe not. Equally if not more important than the total amount of memory allocated could be the peak memory the model holds at any one point in time.
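For example (a rough sketch; `model` stands in for your network on the CPU): the resident size of the parameters is what a constrained device actually has to hold, roughly 7e6 × 4 bytes ≈ 28 MB for Float32 weights, whereas the GB figure from BenchmarkTools is the total memory allocated and freed again over one forward pass.

```julia
using Flux

# `model` is a placeholder for your 7M-parameter network on the CPU.

# Resident size of the parameters (and other captured state):
# roughly 7e6 * 4 bytes ≈ 28 MB for Float32 weights.
Base.summarysize(model)

# Total bytes allocated by a single forward pass, temporaries included;
# this is what the BenchmarkTools memory estimate corresponds to.
x = rand(Float32, 512, 512, 3, 1)
stats = @timed model(x)
stats.bytes
```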
Side note: are you aware of the existence of Machine Learning - Julia Programming Language? Posts in too general a category can get lost, because many people only check specific categories regularly.