Computing null space of a Flux Chain

I have trained a Flux Chain with Dense layers. I would like to determine the null space (the set of input vectors for which the output is the zero vector) of this Chain. I am looking for a Julia-like way to determine the projection matrix from this chain. However, I am open to other approaches to determine this.

Thanks
Jagir

Hmm, interesting. I wonder how this is done in other languages.

So you are solving

f(x) = 0 where f is your neural network of dense layers.

So

act(M_n*...*act(M_3*act(M_2*act(M_1*x)))) = 0

I never thought this would be easy anywhere. I would run an optimization algorithm on

f(x)^2

and try to get it to 0 by starting at random x. That way I can find new points, but I don't know if it's possible to find a "space". Sounds very hard.
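For reference, the weight matrices and activation in that nested expression can be read straight off a Flux Chain. A minimal made-up sketch (mlp, its sizes, and the activations below are arbitrary, just to make the composition explicit):

using Flux

mlp = Chain(Dense(3, 5, tanh), Dense(5, 2, tanh))   # toy chain, purely for illustration

x = randn(Float32, 3)

W1, b1, act1 = mlp[1].weight, mlp[1].bias, mlp[1].σ
W2, b2, act2 = mlp[2].weight, mlp[2].bias, mlp[2].σ

# act(M2 * act(M1 * x + b1) + b2) -- the function whose zero set is wanted
y = act2.(W2 * act1.(W1 * x .+ b1) .+ b2)

y ≈ mlp(x)    # true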


Something like this works for me

using Flux, Optim

m = Chain(
    Dense(2, 8, relu),
    Dense(8, 8, relu),
    Dense(8, 8)
)

# squared norm of the network output; it is zero exactly on the null set
loss(x) = sum(abs2, m(x))

function make_null_sample(n)
    null_space_sample = Vector{Float32}[]

    for i in 1:n
        # minimize the squared output norm from a random starting point (Nelder-Mead by default)
        result = optimize(loss, rand(Float32, 2))
        push!(null_space_sample, Optim.minimizer(result))
    end
    null_space_sample
end

@time null_space_sample = make_null_sample(1000)


null_space_sample_x = [x for (x, _) in null_space_sample]
null_space_sample_y = [y for (_, y) in null_space_sample]

using Plots

scatter(null_space_sample_x, null_space_sample_y)
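One possible refinement: Nelder-Mead can stall at a nonzero local minimum, so it may be worth keeping only the sampled points whose loss is numerically zero, for example:

# keep only the points where the loss actually reached (numerical) zero;
# the 1f-6 threshold is an arbitrary choice
true_zeros = filter(x -> loss(x) < 1f-6, null_space_sample)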

ReLU fixes 0, so the null space of the first layer embeds into the kernel of the larger chain. In addition, you get anything the first layer maps to an entirely negative vector, and anything that lands in the null space of the subsequent layers. Since these layers are linear, you can pull them back to the input. As a result, you can definitely compute the kernel analytically for simple networks like this, but note that it won't be a vector space.
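To illustrate the first part of that observation, here is a minimal sketch with a made-up chain m_relu, using bias = false so the layers are genuinely linear and the network maps 0 to 0 (with nonzero biases the affine offsets would need to be accounted for); the sizes are chosen so the first layer has a nontrivial null space:

using Flux, LinearAlgebra

# toy chain: wider input than first-layer output, so the first layer has a nontrivial null space
m_relu = Chain(
    Dense(8, 4, relu; bias = false),
    Dense(4, 4, relu; bias = false),
    Dense(4, 2; bias = false)
)

W1 = m_relu[1].weight        # 4×8 first-layer weight matrix
N  = nullspace(W1)           # 8×4 orthonormal basis of its null space

# each basis vector is sent to 0 by the first layer, and relu fixes 0,
# so the whole chain sends it (and its span) to 0 as well
maximum(norm(m_relu(Float32.(v))) for v in eachcol(N))   # ≈ 0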

OK, so it might be easier for a ReLU activation. But the OP did not specify an activation, so I wonder if the OP is after a general solution.

Yes, of course. It's not tractable analytically in general.


Let \mathbf{F}(\mathbf{x}) be the neural network function with elements f_i(\mathbf{x}). If you want a "linear" subspace, then first you need a point \mathbf{x}^* in that subspace where \mathbf{F}(\mathbf{x}^*) = \mathbf{0}. Then you need to find a set of vectors \Delta \mathbf{x}_j satisfying (\nabla_{\mathbf{x}} \mathbf{F}(\mathbf{x}^*))^T \, \Delta \mathbf{x}_j = \mathbf{0}, where \nabla_{\mathbf{x}} \mathbf{F}(\mathbf{x}^*) is the matrix whose columns are the gradients \nabla_{\mathbf{x}} f_i(\mathbf{x}^*), i.e. the transpose of the Jacobian of \mathbf{F} wrt \mathbf{x} at \mathbf{x}^*. This gives you the "linear nullspace" at \mathbf{x}^*. To make sure it is also the nullspace at neighboring points, a sufficient but not necessary condition is \mathbf{H}^i_{\mathbf{x} \mathbf{x}}(\mathbf{x}^*) \, \Delta \mathbf{x}_j = \mathbf{0} for each nullspace basis vector \Delta \mathbf{x}_j and each component function f_i, where \mathbf{H}^i_{\mathbf{x} \mathbf{x}}(\mathbf{x}^*) is the Hessian of f_i wrt \mathbf{x} at \mathbf{x}^*. The ReLU function is piecewise linear, so the Hessians are all zero; you then only need a point and the nullspace of the Jacobian at that point.

Edit: the necessary condition can be written using a 3rd order tensor reduction but to avoid that I only used the sufficient condition above.
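As a rough sketch of that first step (a point x^* and the nullspace of the Jacobian there), reusing m and null_space_sample from the code above; Zygote.jacobian is just one way to obtain the Jacobian, and for that 2-input chain the nullspace will often be trivial unless enough ReLU units are inactive at x^*:

using Zygote, LinearAlgebra   # reuses m and null_space_sample from the code above

x_star = Float32.(first(null_space_sample))   # a sampled point with m(x_star) ≈ 0

J, = Zygote.jacobian(m, x_star)   # 8×2 Jacobian of the chain at x_star
Δx = nullspace(J)                 # columns span the "linear nullspace" at x_star (may be empty)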


Hi Xiaodai
Thanks for the suggestion. I had this in mind, but as you have mentioned, it is an expensive way to approximate the null space.

Hi Keno
Thanks for sharing this observation. I am using swish, which also fixes 0 but is not piecewise linear, so the analytical computation does not carry over directly. Still, it is a nice approach for handling such problems.