What's the status of image convolutions on CPU & GPU?

I’ve thought about this, but the output of deeper networks often has too many channels (e.g. a 32x32x2048 array), which means each per-pixel vector would be 2048 elements long. I don’t think it’s a good idea to fit that into a StaticVector.
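To put rough numbers on that, here is a quick back-of-the-envelope sketch (plain Python, just arithmetic). The Float32 element size and the "about 100 elements" threshold for small fixed-size vectors are my assumptions, not hard limits, and `per_pixel_vector_stats` is only a made-up helper name for illustration:

```python
# Rough size check for storing each pixel's channels as one fixed-size vector.
HEIGHT, WIDTH, CHANNELS = 32, 32, 2048  # feature-map shape from the example above
BYTES_PER_FLOAT32 = 4                   # assumption: activations stored as Float32
SMALL_VECTOR_LIMIT = 100                # assumption: rough rule of thumb, not a hard limit


def per_pixel_vector_stats(height, width, channels, bytes_per_elem=BYTES_PER_FLOAT32):
    """Return sizes for an image stored as height*width vectors of `channels` elements."""
    return {
        "vector_length": channels,
        "bytes_per_vector": channels * bytes_per_elem,
        "num_vectors": height * width,
        "total_bytes": height * width * channels * bytes_per_elem,
    }


stats = per_pixel_vector_stats(HEIGHT, WIDTH, CHANNELS)
print(stats)
print("within small-vector rule of thumb:", stats["vector_length"] <= SMALL_VECTOR_LIMIT)
```

Under those assumptions each pixel would carry an 8 KiB vector, which is far beyond the size range where a fixed-size, stack-friendly vector type usually makes sense.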

PS: For people who may not be familiar with deep learning networks, there is a visualization app called netscope where you can check the input and output dimensions of each layer. Popular networks are listed at the end of the page.