Identify important subspaces of a function without squaring the conditioning number


I would like to find the important directions in the input and output spaces for a \mu-measurable function f: \mathbb{R}^n \xrightarrow{} \mathbb{R}^d.

One way to identify these directions is to look at the eigenvalue decomposition of the integrated inner and outer product of the Jacobian of the function f integrated over the distribution \mu:

C_x = \int (\nabla_x f(x))^\top (\nabla_x f(x)) d\mu(x) \in \mathbb{R}^{n,n} and C_y = \int (\nabla_x f(x)) (\nabla_x f(x))^\top d\mu(x) \in \mathbb{R}^{d,d}

In practice, we can perform a Monte Carlo appproximation of C_x, C_y given M samples x^i \sim \mu:

C_x \approx \frac{1}{M}\sum_{i=1}^M (\nabla_x f(x^i))^\top (\nabla_x f(x^i)) and similarly for C_y

Since C_x is positive definite, an eigendecomposition gives us:

C_x = V \Lambda_x V^T with V\in \mathbb{R}^{n,n}, where V an orthonormal basis for the input space. We can truncate this basis to identify the most important directions based on the decay of energy spectrum of \Lambda_x.

Similarly, we write C_y = U \Lambda_y U^T with U\in \mathbb{R}^{d,d}. U is an orthonormal basis for the output space.

However, this computation is not recommended because it squares the conditioning number. One could perform a SVD of \frac{1}{\sqrt{M}} \left[ \nabla f(x^1), \ldots, \nabla f(x^M) \right] and \frac{1}{\sqrt{M}} \left[ \nabla f(x^1)^\top, \ldots, \nabla f(x^M)^\top \right] and extract the left singulars to get U and V. However assembling these stacked matrices require a lot of storage.

Is there a better way to do this computation without assembling these large matrices? Are you aware of other algorithms that identify these important directions?


Consider using svdl from IterativeSolvers. As for avoiding forming the matrices, you can technically use a linear operator that defines multiplication and adjoint multiplication but the real question here is how to make these operations more efficient than simply forming the matrices and multiplying. In your case, the matrices have a special meaning. They are the Jacobians which means we can use tricks from automatic differentiation here to possibly improve things.

The operator \nabla f(x=x^1)^T v is essentially the gradient of f(x)^T v wrt x at x = x^1. Often, computing this will be more efficient than computing the full Jacobian. So consider using a reverse-mode AD package such as Zygote to define this operator. Even better, if you can derive a custom “adjoint rule” in ChainRules.jl, you can use this directly.

The second operator you need is \nabla f(x=x^1) v. This is the derivative of the vector valued function f(x(u)) wrt the scalar u where x(u) is specially defined such that x(u) = x^1 and dx/du is the vector v. This is kind of a hack to then use forward-mode automatic differentiation to compute the derivative of f wrt u which will be \nabla f(x=x^1) v. Again, you can define a special frule using ChainRules.jl to specify this operator efficiently or just use ForwardDiff2.jl to compute it for your function.

Let me know if this approach works for you :slight_smile:


Thank you for your valuable feedback. For now, I can only compute the Jacobian of f with finite differences because ForwardDiff does not support complex differentiation.

I will look for now at svdl

1 Like