Computing null space of a Flux Chain

I have trained a Flux Chain with Dense layers. I would like to determine the null space (the set of input vectors for which the output is the zero vector) of this Chain. I am looking for a Julia-like way to determine the projection matrix from this chain. However, I am open to other approaches to determine this.

Thanks
Jagir

Hmm, interesting. I wonder how this is done in other languages.

So you are solving

f(x) = 0 where f is your neural network of dense layers.

So

act(M_n*...*act(M_3*act(M_2*act(M_1*x)))) = 0

I never thought this would be easy anywhere. I would run an optimization algorithm on

f(x)^2

and try to get it to 0 by starting at random x. That way I can find new points, but I don't know if it's possible to find a "space". Sounds very hard.
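For reference, the weight matrices and activation in that nested expression can be read straight off a Flux Chain. A minimal made-up sketch (mlp, its sizes, and the activations below are arbitrary, just to make the composition explicit):

using Flux

mlp = Chain(Dense(3, 5, tanh), Dense(5, 2, tanh))   # toy chain, purely for illustration

x = randn(Float32, 3)

W1, b1, act1 = mlp[1].weight, mlp[1].bias, mlp[1].σ
W2, b2, act2 = mlp[2].weight, mlp[2].bias, mlp[2].σ

# act(M2 * act(M1 * x + b1) + b2) -- the function whose zero set is wanted
y = act2.(W2 * act1.(W1 * x .+ b1) .+ b2)

y ≈ mlp(x)    # true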


Something like this works for me

using Flux, Optim

m = Chain(
    Dense(2, 8, relu),
    Dense(8, 8, relu),
    Dense(8, 8)
)

# squared norm of the network output; it is zero exactly on the null set
loss(x) = sum(abs2, m(x))

function make_null_sample(n)
    null_space_sample = Vector{Float32}[]

    for i in 1:n
        # minimize the squared output norm from a random starting point (Nelder-Mead by default)
        result = optimize(loss, rand(Float32, 2))
        push!(null_space_sample, Optim.minimizer(result))
    end
    null_space_sample
end

@time null_space_sample = make_null_sample(1000)


null_space_sample_x = [x for (x, _) in null_space_sample]
null_space_sample_y = [y for (_, y) in null_space_sample]

using Plots

scatter(null_space_sample_x, null_space_sample_y)
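One possible refinement: Nelder-Mead can stall at a nonzero local minimum, so it may be worth keeping only the sampled points whose loss is numerically zero, for example:

# keep only the points where the loss actually reached (numerical) zero;
# the 1f-6 threshold is an arbitrary choice
true_zeros = filter(x -> loss(x) < 1f-6, null_space_sample)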

ReLU fixes 0, so the null space of the first layer embeds into the kernel of the larger chain. In addition, you get anything the first layer maps to an entirely negative vector, and anything that lands in the null space of the subsequent layers. Since these layers are linear, you can pull them back to the input. As a result, you can definitely compute the kernel analytically for simple networks like this, but note that it won't be a vector space.
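To illustrate the first part of that observation, here is a minimal sketch with a made-up chain m_relu, using bias = false so the layers are genuinely linear and the network maps 0 to 0 (with nonzero biases the affine offsets would need to be accounted for); the sizes are chosen so the first layer has a nontrivial null space:

using Flux, LinearAlgebra

# toy chain: wider input than first-layer output, so the first layer has a nontrivial null space
m_relu = Chain(
    Dense(8, 4, relu; bias = false),
    Dense(4, 4, relu; bias = false),
    Dense(4, 2; bias = false)
)

W1 = m_relu[1].weight        # 4×8 first-layer weight matrix
N  = nullspace(W1)           # 8×4 orthonormal basis of its null space

# each basis vector is sent to 0 by the first layer, and relu fixes 0,
# so the whole chain sends it (and its span) to 0 as well
maximum(norm(m_relu(Float32.(v))) for v in eachcol(N))   # ≈ 0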

OK, so it might be easier for a ReLU activation. But the OP did not specify an activation, so I wonder if the OP is after a general solution.

Yes, of course. It's not tractable analytically in general.


Let \mathbf{F}(\mathbf{x}) be the neural network function with elements f_i(\mathbf{x}). If you want a "linear" subspace, then first you need a point \mathbf{x}^* in that subspace where \mathbf{F}(\mathbf{x}^*) = \mathbf{0}. Then you need to find a set of vectors \Delta \mathbf{x}_j satisfying (\nabla_{\mathbf{x}} \mathbf{F}(\mathbf{x}^*))^T \, \Delta \mathbf{x}_j = \mathbf{0}, where \nabla_{\mathbf{x}} \mathbf{F}(\mathbf{x}^*) is the matrix whose columns are the gradients \nabla_{\mathbf{x}} f_i(\mathbf{x}^*), i.e. the transpose of the Jacobian of \mathbf{F} wrt \mathbf{x} at \mathbf{x}^*. This gives you the "linear nullspace" at \mathbf{x}^*. To make sure it is also the nullspace at neighboring points, a sufficient but not necessary condition is \mathbf{H}^i_{\mathbf{x} \mathbf{x}}(\mathbf{x}^*) \, \Delta \mathbf{x}_j = \mathbf{0} for each nullspace basis vector \Delta \mathbf{x}_j and each component function f_i, where \mathbf{H}^i_{\mathbf{x} \mathbf{x}}(\mathbf{x}^*) is the Hessian of f_i wrt \mathbf{x} at \mathbf{x}^*. The ReLU function is piecewise linear, so the Hessians are all zero; you then only need a point and the nullspace of the Jacobian at that point.

Edit: the necessary condition can be written using a 3rd order tensor reduction but to avoid that I only used the sufficient condition above.
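As a rough sketch of that first step (a point x^* and the nullspace of the Jacobian there), reusing m and null_space_sample from the code above; Zygote.jacobian is just one way to obtain the Jacobian, and for that 2-input chain the nullspace will often be trivial unless enough ReLU units are inactive at x^*:

using Zygote, LinearAlgebra   # reuses m and null_space_sample from the code above

x_star = Float32.(first(null_space_sample))   # a sampled point with m(x_star) ≈ 0

J, = Zygote.jacobian(m, x_star)   # 8×2 Jacobian of the chain at x_star
Δx = nullspace(J)                 # columns span the "linear nullspace" at x_star (may be empty)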


Hi Xiaodai
Thanks for the suggestion. I had this in mind, but as you have mentioned, it is an expensive way to approximate the null space.

Hi Keno
Thanks for sharing this observation. I am using swish, which also fixes 0 but is not piecewise linear, so the analytical computation does not carry over directly. Still, it is a nice approach for handling such problems.