# Computing null space of a Flux Chain

I have trained a Flux Chain with Dense layers. I would like to determine the null space (the space of input vectors for which the output is all zeros) of this Chain. I am looking for a Julian way to determine the projection matrix for this Chain. However, I am open to other approaches.

Thanks
Jagir

Hmm, interesting. I wonder how this is done in other languages.

So you are solving

f(x) = 0 where f is your neural network of dense layers.

So

act(M_n*...*act(M_3*act(M_2*act(M_1*x)))) = 0

I never thought this would be easy anywhere. I would run an optimization algorithm on

f(x)^2 and try to drive it to 0. Starting at random x I can find new points, but I don't know if it's possible to find a "space". Sounds very hard.

1 Like

Something like this works for me

```julia
using Flux, Optim

m = Chain(
    Dense(2, 8, relu),
    Dense(8, 8, relu),
    Dense(8, 8),
)

loss(x) = sum(abs2, m(x))

function make_null_sample(n)
    null_space_sample = Vector{Float32}[]

    for i in 1:n
        result = optimize(loss, rand(2))
        push!(null_space_sample, result.minimizer)
    end
    null_space_sample
end

@time null_space_sample = make_null_sample(1000)

null_space_sample_x = [x for (x, _) in null_space_sample]
null_space_sample_y = [y for (_, y) in null_space_sample]

using Plots

scatter(null_space_sample_x, null_space_sample_y)
```
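
One caveat worth noting: `optimize(loss, x0)` defaults to the derivative-free Nelder-Mead method, which only finds local minima, so some of the collected points may not be actual roots of the network. A minimal filtering step, reusing the `loss` defined above (the tolerance `1e-8` is an arbitrary choice for this sketch):

```julia
# Keep only the minimizers whose objective is numerically zero; the rest
# are local minima of loss, not actual points of the null set.
actual_roots = filter(x -> loss(x) < 1e-8, null_space_sample)
```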


Relu fixes 0, so the nullspace of the first layer embeds into the kernel of the larger chain. In addition you get anything the first layer maps to a completely negative vector, and anything that's in the nullspace of a subsequent layer. Since these layers are linear you can pull them back to the input. As a result, you can definitely compute the kernel analytically for simple networks like this, but note that it won't be a vector space.
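
To make the first part concrete, here is a minimal sketch. It assumes zero biases (Flux's default initialization; a trained network will generally not have them), so that a nullspace vector of the first weight matrix is annihilated by the whole chain:

```julia
using Flux, LinearAlgebra

# With zero biases: if W1 * x == 0 then relu(W1 * x) == 0, and every later
# layer maps 0 to 0, so x lies in the kernel of the whole chain.
m = Chain(Dense(3, 2, relu), Dense(2, 2))  # 2x3 weight => nontrivial nullspace

N = nullspace(m[1].weight)  # orthonormal basis of ker(W1)
x = N[:, 1]
@assert isapprox(m(x), zeros(Float32, 2); atol = 1f-5)
```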

OK, so it might be easier for the relu activation, but the OP did not specify an activation. So I wonder if the OP is after a general solution.

Yes, of course. It's not tractable analytically in general.

1 Like

Let \mathbf{F}(\mathbf{x}) be the neural network function with elements f_i(\mathbf{x}). If you want a "linear" subspace then first you need a point \mathbf{x}^* in that subspace where \mathbf{F}(\mathbf{x}^*) = \mathbf{0}. Then you need to find a set of vectors \Delta \mathbf{x}_j such that (\nabla_{\mathbf{x}} \mathbf{F}(\mathbf{x}^*))^T \Delta \mathbf{x}_j = \mathbf{0}, where \nabla_{\mathbf{x}} \mathbf{F}(\mathbf{x}^*) is the Jacobian of \mathbf{F} wrt \mathbf{x} at \mathbf{x}^*. This will give you the "linear nullspace" at \mathbf{x}^*. To make sure it is the nullspace at neighboring points also, a sufficient but not necessary condition is \mathbf{H}^i_{\mathbf{x} \mathbf{x}}(\mathbf{x}^*) \Delta \mathbf{x}_j = \mathbf{0} for each nullspace basis vector \Delta \mathbf{x}_j and each function f_i, where \mathbf{H}^i_{\mathbf{x} \mathbf{x}}(\mathbf{x}^*) is the Hessian of f_i wrt \mathbf{x} at \mathbf{x}^*. The relu function is piecewise linear so the Hessians will all be 0. So you only need a point and the nullspace of the Jacobian transpose.

Edit: the necessary condition can be written using a 3rd-order tensor reduction, but to avoid that I only used the sufficient condition above.
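
A minimal sketch of the Jacobian step, assuming Zygote for differentiation and reusing the chain `m` and the `null_space_sample` from the Optim example earlier in the thread:

```julia
using Zygote, LinearAlgebra

# xstar must be a root of the network, i.e. m(xstar) ≈ 0; here I take one
# of the points sampled earlier. Note that F maps R^2 -> R^8 for this chain,
# so a generic Jacobian has a trivial nullspace; nontrivial directions only
# appear when the outputs are degenerate at xstar.
xstar = null_space_sample[1]

J = Zygote.jacobian(m, xstar)[1]  # 8x2 Jacobian of F at xstar
Δx = nullspace(J)                 # columns satisfy J * Δx_j ≈ 0
```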

1 Like

Hi Xiaodai,
Thanks for the suggestion. I had this in mind, but as you mentioned, this is an expensive way to approximate the null space.

Hi Keno,
Thanks for sharing this observation. I am using swish, which does have this property. Either way, it is a nice approach for handling such problems.