Sorry for the lack of context. I just wanted to avoid superfluous information (and I’m also a little bit worried that people will find what I’m trying to do too silly and just ignore me).
I’m writing a little swiss-army knife kinda library of mutation operations for neural networks and I’m thinking that one of the tools should be to change the shape of the filter kernels in convolutional layers.
The weights of a 2D conv-layer are typically structured as a 4D array, and in this case (Flux) the first two dimensions are the kernel sizes. So given an array of size `w,h,nout,nin` I want to change it into a new array of size `w',h',nout,nin`, where the mapping from `w,h` to `w',h'` can be decided through some user-defined policy, hopefully with a not too clunky API.
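To make that a bit more concrete, here is a rough sketch of the kind of API I have in mind. The name `select_kernel` and the policy signature (one kernel slice in, a `(rows, cols)` tuple of indices to keep out) are made up for illustration and are not part of Flux.

```julia
# Hypothetical helper, not a Flux function.
# `policy` takes one w×h kernel slice and returns (rows, cols), the indices to keep.
function select_kernel(policy, weights::AbstractArray{T,4}) where T
    n3, n4 = size(weights, 3), size(weights, 4)
    # Ask the policy about the first slice to learn the output kernel size.
    rows1, cols1 = policy(view(weights, :, :, 1, 1))
    newsize = (length(rows1), length(cols1))
    out = similar(weights, newsize..., n3, n4)
    for j in 1:n4, i in 1:n3
        kernel = view(weights, :, :, i, j)
        rows, cols = policy(kernel)
        picked = kernel[rows, cols]
        # Crash loudly if the policy is not consistent across slices.
        size(picked) == newsize ||
            throw(DimensionMismatch("policy selected $(size(picked)), expected $newsize"))
        out[:, :, i, j] = picked
    end
    return out
end

# Example: a fixed policy that keeps the lower-left 2×2 corner of every kernel.
weights = reshape(collect(1.0:36.0), 3, 3, 2, 2)
smaller = select_kernel(k -> (2:3, 1:2), weights)
```

Passing the policy as the first argument also allows `do`-block syntax if the policy grows beyond a one-liner.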
I guess this also answers @StefanKarpinski’s question about output having the same number of elements (yes, it is fine to crash if the policy does not always select the same number of elements).
I know it is probably a bad idea to try to do that to an already trained model, but I want to be able to play with evolving a population of models while they are training (which in itself might be a bad idea ofc), possibly even learning which operations “work” and which don’t.
I haven’t given much thought to what would be the least damaging way to shrink the kernel size, but I want to try something like starting from the edges and removing the row/col with the smallest abssum for each kernel. This might very well be worse than just picking the one edge with the lowest abssum across all kernels, as it will mess up the phasing between kernels, which probably means something to the activation, but I want to try it out.
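Roughly, the per-kernel variant could look like the sketch below. `trim_edges` is a made-up name, and trimming all the rows before looking at the columns (with ties dropping the first edge) is just a simplification I picked to keep it short, not a considered choice.

```julia
# Hypothetical per-kernel policy: drop whichever outer row (then column) has the
# smallest sum of absolute values until the kept region has size `target`.
function trim_edges(kernel::AbstractMatrix, target::NTuple{2,Int})
    r1, r2 = firstindex(kernel, 1), lastindex(kernel, 1)
    c1, c2 = firstindex(kernel, 2), lastindex(kernel, 2)
    abssum(rows, cols) = sum(abs, view(kernel, rows, cols))
    # Shrink rows from whichever end currently has the smaller abssum.
    while r2 - r1 + 1 > target[1]
        if abssum(r1, c1:c2) <= abssum(r2, c1:c2)
            r1 += 1
        else
            r2 -= 1
        end
    end
    # Then shrink columns, only looking at the rows that are still kept.
    while c2 - c1 + 1 > target[2]
        if abssum(r1:r2, c1) <= abssum(r1:r2, c2)
            c1 += 1
        else
            c2 -= 1
        end
    end
    return r1:r2, c1:c2
end

# Apply it independently to every kernel of a 5×5 layer, shrinking to 3×3.
weights = randn(Float32, 5, 5, 4, 3)
shrunk = similar(weights, 3, 3, 4, 3)
for I in CartesianIndices(axes(weights)[3:4])
    rows, cols = trim_edges(view(weights, :, :, I), (3, 3))
    shrunk[:, :, I] = weights[rows, cols, I]
end
```

Since the kept region is always a contiguous block of the requested size, every kernel ends up with the same shape even though the block may sit at a different offset in each one, which is exactly the phasing issue mentioned above.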
Anyways, given how many nice convenience things Julia has for working with arrays, I was first hoping that there might be some CartesianIndex magic or similar which could do this for me. If the answer given the above is to preallocate the array and write to it in a for loop I think I can work it out.
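The closest thing to a built-in I have found so far is `mapslices` from Base, which applies a function to every slice along the chosen dimensions and assembles the results. A minimal sketch, where `shrink_with_mapslices` and the `(rows, cols)` policy convention are my own made-up names; as far as I understand, `mapslices` should throw if the slices come back with different sizes:

```julia
# Hypothetical wrapper around Base.mapslices; `policy` maps a kernel slice to
# the (rows, cols) indices to keep.
function shrink_with_mapslices(policy, weights::AbstractArray{T,4}) where T
    return mapslices(weights; dims=(1, 2)) do kernel
        rows, cols = policy(kernel)
        kernel[rows, cols]
    end
end

weights = reshape(collect(1:36), 3, 3, 2, 2)
smaller = shrink_with_mapslices(k -> (2:3, 1:2), weights)
```

I would not expect this to be any faster than an explicit loop; it just saves the preallocation boilerplate.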
I’m also aware that conv-layers are seldom large enough for an inefficient algorithm to be a significant problem (in the context I’m talking about here). I’m just out to learn something (and maybe to avoid the perhaps unlikely embarrassment of someone pointing out a stupidly inefficient piece of code in my repo).