I am working on an implementation of an ML algorithm called SLIDE (sub-linear deep learning engine) from an academic paper. I’ve included a link to a gist of the code: https://gist.github.com/outlace/c81a15ab77c1c9cebb56003545ee5512

Basically, the algorithm tries to make deep learning scale sub-linearly with respect to the size of each layer's parameter matrix. Normally the matrix–vector product for a layer with an `n x m` weight matrix costs O(nm), but SLIDE proposes a node sub-sampling scheme: given an `n x m` matrix, where each row (or column, depending on how you set up the matmul) is a 'node' in the neural network, you sub-sample a small number of nodes (rows) `s`, where `s << n`, and do the multiplication with this much smaller `s x m` matrix. As you increase the size of the matrix, the sub-sampled number of rows `s` grows sub-linearly, so your overall processing time grows sub-linearly as you scale up the matrix.
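To make the idea concrete, here is a minimal sketch of the sub-sampled forward step (all names are mine, and I'm sampling indices uniformly for illustration; the paper selects nodes via LSH):

```julia
using Random

# Toy dimensions: n nodes in the layer, m inputs, s sampled nodes.
n, m, s = 10_000, 256, 50

W = randn(n, m)            # full parameter matrix: one row per node
x = randn(m)               # layer input

idx = randperm(n)[1:s]     # sample s node indices (stand-in for LSH selection)
h = @view(W[idx, :]) * x   # O(s*m) work instead of O(n*m)
```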

Ok, so to implement this I generate a list of indices of the nodes (rows) I want to use, then do the forward pass of my simple 2-layer fully connected neural network with these smaller matrices, by taking a view/slice of the original parameter matrices during training. But it appears that when Zygote/Flux does the backward pass to compute gradients, it uses the full `n x m` matrix rather than my view/slice of it, because the wall time grows linearly as I increase the `n` dimension. It should be roughly constant, since I am only sampling about 50 nodes regardless of the size of the original parameter matrix.
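For what it's worth, one workaround I've been considering (a sketch only, not sure it's the intended approach): Zygote's adjoint for indexing/views appears to accumulate into a gradient the size of the *full* array, so instead of differentiating through the slice, you could materialize the `s x m` slice, differentiate with respect to that small matrix directly, and scatter the update back into the full matrix by hand. The loss here is a toy stand-in:

```julia
using Zygote

n, m, s = 10_000, 256, 50
W = randn(n, m)
x = randn(m)
idx = rand(1:n, s)

Wsub = W[idx, :]                   # materialize the s x m slice (a copy)
loss(Ws) = sum(abs2, Ws * x)       # toy loss over the sub-sampled layer
gsub, = gradient(loss, Wsub)       # gradient is s x m, never n x m

W[idx, :] .-= 0.01 .* gsub         # manual SGD step on the sampled rows only
```

The downside is you step outside Flux's usual `params`/optimiser machinery and have to manage the scatter-update yourself, but the backward pass then only touches the sampled rows.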

Is there any way to get Zygote/Flux to do the backward pass using the view/slice so that I can reap the performance benefit? Or perhaps there's an issue with my implementation. Any help is greatly appreciated.