I have some code where I compute what essentially boils down to the following:
di = zeros(n,knn)
for i=1:n
di[i,:] = d[idx[i]]
end
Now the problem is that d is a tracked array, and hence it cannot be put inside di which is not tracked.
But how would I go about actually doing such an operation?
I seem to remember from Mike Innes's talk at JuliaCon that tracked arrays are no longer needed in the latest branch of FluxML and/or Zygote. I could very well be blathering.
What is d? Here's one possible interpretation. You can't mutate a tracked array, but you can construct one by indexing:
using Flux # assumes a Tracker-era Flux release, where param is available
d = param(rand(2,3)) # TrackedMatrix
idx = [1,3,2,1];
di2 = d[:,idx] # Tracked
di1 = zeros(2,4); # almost your code
for i=1:4
di1[:,i] = d[:,idx[i]].data
end
di1 # same numbers, but not tracked
di3 = d.data[:,idx] # ditto
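The same single-indexing idea can be applied to the row-wise layout in the original question. The sketch below uses a plain Vector as a stand-in for d and assumes idx is a vector of n index vectors of length knn each; the values and the name idxmat are hypothetical, chosen only for illustration. It only shows that one indexing call reproduces the loop; whether the result stays tracked depends on the AD library in use.

```julia
n, knn = 4, 3
d = collect(1.0:10.0)                                  # stand-in for the real d (plain Vector here)
idx = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 1, 2]]    # hypothetical neighbour indices

# Loop version from the question: writes into a preallocated array.
di_loop = zeros(n, knn)
for i in 1:n
    di_loop[i, :] = d[idx[i]]
end

# Non-mutating version: stack the index vectors into an n×knn integer matrix,
# then index once. Indexing a vector with an integer matrix returns a matrix
# of the same shape as the index matrix.
idxmat = permutedims(reduce(hcat, idx))
di_indexed = d[idxmat]

@assert di_indexed == di_loop
```

Because di_indexed is produced by one getindex call rather than by writes into a preallocated array, it follows the same pattern as di2 above.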
x is the input data (CIFAR-10 images, for instance)
y = F(x,theta) # F is the neural network, and y is the output of the network for input x
d is built from the output data, hence:
d=g(y)
and is used to create my regularization.
I need to be able to track all steps in the creation of the regularization, since this is one of the things we are testing in the research article I’m currently writing.
Regarding your suggestion for the tracking, mcabbott, I'm just worried about two things:
Will it preserve the chain of tracking? It is fundamental that the tracking goes all the way back to the data.
What exactly happens when you just overwrite a tracked parameter, as you are suggesting? Is the rand value in some way influential, or is it no longer tracked?
Yes, di2 is tracked. You can check it’s working by calling back!(di2[1,1]) and seeing that d.grad is nonzero. (Or by wrapping this up in a function and calling gradient.)
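For the "wrap it in a function and call gradient" route, here is a minimal sketch. It swaps in Zygote (mentioned earlier in the thread) in place of the Tracker-based back! call, so it assumes Zygote.jl is installed and uses a plain array for d. The name loss is just an illustrative choice.

```julia
using Zygote  # assumption: Zygote.jl is installed; this is not the Tracker API used above

d = rand(2, 3)            # plain array; Zygote does not need param(...)
idx = [1, 3, 2, 1]

# A scalar function of the indexed array, so there is something to differentiate.
loss(d) = sum(d[:, idx])

# gradient returns a tuple with one entry per argument.
grad = gradient(loss, d)[1]

@assert size(grad) == size(d)
@assert all(!iszero, grad)   # every column of d is used, so every entry gets a gradient
```

A gradient with no zero entries here plays the same role as a nonzero d.grad after back!: it shows that the indexing step is differentiated through.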
di1 is not tracked: it reads only the .data part, which is an ordinary array, although it should hold the same numbers as di2. Never write into the .data part of a tracked array; strange things will happen if you do. You have to find ways to work without writing into a fresh array with a loop, i.e. build the result like di2 instead. (Or else you have to write a gradient for this step yourself.)
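To give an idea of what "writing a gradient for this step yourself" involves, here is a rough sketch on plain arrays. The names gather and gather_pullback are hypothetical, and the code does not show how to register the rule with Tracker or Zygote, since that mechanism differs between libraries. It only shows the arithmetic: each output column's gradient is added back to the input column it was copied from.

```julia
# Forward step: gather columns of d according to idx (same as d[:, idx]).
gather(d, idx) = d[:, idx]

# Hand-written backward step: given Δ, the gradient of the loss with respect
# to the gathered output, return the gradient with respect to d.
# Output column k came from input column idx[k], so its gradient is added there;
# columns that appear several times in idx accumulate several contributions.
function gather_pullback(Δ, idx, ncols)
    gd = zeros(eltype(Δ), size(Δ, 1), ncols)
    for (k, j) in enumerate(idx)
        gd[:, j] .+= Δ[:, k]
    end
    return gd
end

d = [1.0 2.0 3.0; 4.0 5.0 6.0]
idx = [1, 3, 2, 1]
loss(d) = sum(abs2, gather(d, idx))

out = gather(d, idx)
Δ = 2 .* out                                  # derivative of sum(abs2, out) w.r.t. out
gd = gather_pullback(Δ, idx, size(d, 2))

# Finite-difference check of a single entry.
h = 1e-6
dp = copy(d); dp[1, 1] += h
fd = (loss(dp) - loss(d)) / h
@assert isapprox(fd, gd[1, 1]; rtol = 1e-4)
```

The loop inside gather_pullback does mutate an array, but that array is a fresh buffer of plain numbers inside the gradient rule, not a tracked value, which is why it is acceptable there.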
d=rand(...) is just a convenient way of making some numbers to try out. Really this should be your g(y).