Create non-linear structure in higher dimensional space

Suppose I’m creating a non-linear structure in a 20-dim space. Right now I have a vector

m = rand(1)
vector = [cos(m),sin(m),rand(1),rand(1),rand(1),rand(1),rand(1),rand(1),rand(1),rand(1),rand(1),rand(1),rand(1),rand(1),rand(1),rand(1),rand(1),rand(1),rand(1),rand(1)]

If I repeat it 85 times to get a 20×85 matrix, is this a 2-dim non-linear structure in 20-dim space?

For one, those two lines won’t run as written. rand(1) returns a 1-element vector, so the calls to cos(m) and sin(m) will fail with a MethodError; use rand() if you want a scalar.

And, if you just repeat that 85 times, you’ll only overwrite your vector 85 times.
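For reference, here is a minimal sketch of what a runnable version might look like. It assumes m is meant to be a scalar and that each of the 85 repetitions should draw fresh random numbers; the helper name make_point is made up for illustration.

```julia
# Hypothetical helper: one 20-element sample.
# rand() returns a scalar Float64, whereas rand(1) returns a 1-element Vector.
function make_point()
    m = rand()
    return vcat(cos(m), sin(m), rand(18))  # 2 entries tied to m, 18 free entries
end

# Collect 85 independent samples as the columns of a 20×85 matrix
X = reduce(hcat, [make_point() for _ in 1:85])
size(X)  # (20, 85)
```

Storing one sample per column avoids the overwriting problem, since every sample is kept.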

And, also, what’s a non-linear structure?

My current code is

So what’s the question? It’s an array.

I was trying to create a 2-dim non-linear structure in 20-dim space. So I used cos(m) and sin(m) here to indicate 2-dim structure, and 18 0’s to create 20-dim space. But I’m not sure if I’m correct.

Again, what’s a non-linear structure?


Like this from In-Depth: Manifold Learning | Python Data Science Handbook

I think you’re saying you’d like to create points on a 2 dimensional nonlinear manifold embedded in 20 dimensional euclidean space?

Each point will be a 20 dimensional vector, and you will have N of them, each representing a point on the manifold. You can collect them in a 20xN matrix, with each column being one point on the manifold.

A manifold is 2 dimensional if there are two coordinates that can define the points… like latitude and longitude on the surface of the earth…

It’s embedded in another space if it’s like the earth, a 2D surface of the earth but in a 3D world (or 4D if you include time)

If you create a vector like
[ f(t),g(t),…h(s),i(s)…]

then it’s defined by the values of t and s… that makes it 2 dimensional. but it’s embedded in N dimensional space because the vector has N entries.
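As a rough sketch of that idea in Julia: the component functions below are an arbitrary illustrative choice (they are not from this thread), picked only so that every entry depends on t or on s and nothing else.

```julia
# Illustrative component functions (an assumption, not from the thread):
# entries 1-10 depend only on t, entries 11-20 depend only on s.
point(t, s) = vcat([cos(k * t) for k in 1:10], [sin(k * s) for k in 1:10])

N = 85
# Each column is one 20-dimensional point, fully determined by its (t, s) pair
X = reduce(hcat, [point(2π * rand(), 2π * rand()) for _ in 1:N])
size(X)  # (20, 85)
```

The matrix has 20 rows, but every column was generated from just two numbers.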

Does that help?

Yes, I really appreciate it. If I get it correctly,

[ f(t),g(t),…h(s),i(s)…]

This is a 2-dim non-linear manifold in 20-space, since there are t and s (2 variables) and the vector has 20 entries. So in my example, I can get

[ cos(t),sin(s),rand()*18]

If I want 3-dim, just add a new entry like

tan(m)

Is that correct?

Well, yes. But many times the manifold spans the other dimensions too. In your example, the first two entries of the vector already give you all the information you need. But in the case of, say, a sphere, if you have its surface points in (x,y,z) coordinates, taking just (x,y) doesn’t tell you everything you need to know about the surface.

So it depends on what you want. In some sense your example is trivially embedded in 20 dimensions, but sometimes it’s nontrivial. for example

[sin(t),cos(s*t),sin(s)]

is two dimensional (there is a t, and an s that determines everything) but you can’t just look at the first two entries, or any two entries, to find out everything.
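A small check of the "first two entries" part of that claim, as a sketch: the function name surf and the chosen parameter values are just for illustration. At t = 0 the first two entries are the same for every s, while the third entry still varies.

```julia
surf(t, s) = [sin(t), cos(s * t), sin(s)]

a = surf(0.0, 0.5)
b = surf(0.0, 1.0)

a[1:2] == b[1:2]  # true: both are [0.0, 1.0]
a[3] == b[3]      # false: sin(0.5) differs from sin(1.0)
```

So two different points on the surface can look identical if you only keep the first two coordinates.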

That makes so much sense! In terms of linear or non-linear, is [cos(t), sin(t), cos(s), sin(s)] non linear?

those are nonlinear functions of t and s so yes the resulting points are not lying on some linear structure… For example if you double t you do not increase the values of the entries in your vector by a factor of 2.
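The doubling test can be tried directly. In this sketch the matrix A is an arbitrary made-up linear map, included only for comparison with the trigonometric one.

```julia
p(t, s) = [cos(t), sin(t), cos(s), sin(s)]

# Arbitrary 4×2 matrix (an assumption for illustration) defining a linear map
A = [1.0 0.0; 0.0 1.0; 2.0 -1.0; 0.5 3.0]
lin(t, s) = A * [t, s]

t, s = 0.3, 0.7
lin(2t, 2s) ≈ 2 .* lin(t, s)  # true: a linear map scales with its input
p(2t, 2s) ≈ 2 .* p(t, s)      # false: cos and sin do not
```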

Well, no. Your vector [ cos(t),sin(s),rand()*18] gives a 20-dimensional object. Let me try to explain why…

Consider a function from $\mathbb{R}^2 \to \mathbb{R}^{20}$. For the sake of argument let’s take a linear one, but you can easily replace it by something non-linear, e.g., some neural network:

params = randn(20, 2)
emb(x) = params * x  # Note: 2d-input and 20d-output

Now, two-dimensional points can be mapped into the 20-dimensional space by emb:

points2D = [randn(2) for i=1:10]  # 10 2d points
points20D = [emb(x) for x in points2D]  # points mapped into 20d

Note that even though each point of points20D is 20-dimensional, it is actually determined by two values alone, namely the original 2-dimensional point which was passed through emb. In this sense it is a 2-dimensional structure as it actually only has 2 degrees of freedom, i.e., independent degrees of variation.
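For the linear case, one way to see the two degrees of freedom numerically is the matrix rank. This is only a sketch and only works because emb is linear here; for a non-linear map the rank would not reveal the dimension.

```julia
using LinearAlgebra

params = randn(20, 2)
emb(x) = params * x
points2D = [randn(2) for i in 1:10]
points20D = [emb(x) for x in points2D]

# Stack the 20d points as columns of a 20×10 matrix
M = reduce(hcat, points20D)
rank(M)  # 2 (with probability 1 for random params and points)
```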

Now, when adding noise to that structure, i.e.,

noisy_points20D = [x .+ 0.1 * randn(20) for x in points20D]

we formally get a 20-dimensional structure as we now have 20 degrees of freedom. (When the noise is small compared to the variation induced by the original 2-dimensional points, we can still consider it close to a 2-dimensional structure though).
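The same can be checked with singular values. The sketch below uses 200 points instead of 10 (an arbitrary choice, so that the matrix can reach full rank 20) and the same noise level of 0.1.

```julia
using LinearAlgebra

params = randn(20, 2)
clean = params * randn(2, 200)           # 200 points on the 2d linear structure, as columns
noisy = clean .+ 0.1 .* randn(20, 200)   # same points with small noise in all 20 directions

rank(clean)         # 2
rank(noisy)         # 20: formally full-dimensional
σ = svdvals(noisy)  # singular values in decreasing order
σ[1:4]              # two large values, then a drop to the noise level
```

The two dominant singular values are what "close to a 2-dimensional structure" looks like in practice.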
In any case, your original vector

v = [cos(m), sin(m), rand(1), ..., rand(1)]

defines a 19-dimensional structure as only the first two dimensions obey a constraint, i.e., $v_1 = \cos(m)$, $v_2 = \sin(m)$ for $m \in \mathbb{R}$, so together they contribute a single degree of freedom. The other 18 dimensions are all independent and unconstrained, which gives 1 + 18 = 19.
Thus, in order to define a 3-dimensional structure you need a function taking 3-dimensional inputs to 20-dimensional outputs.
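A minimal sketch of such a function, swapping the linear map for a non-linear one. The weight matrix W and the tanh non-linearity are arbitrary choices for illustration, loosely resembling a single neural-network layer.

```julia
W = randn(20, 3)        # arbitrary weights (an assumption, not from the thread)
emb3(x) = tanh.(W * x)  # 3d input, 20d output, non-linear

points3D = [randn(3) for i in 1:10]
points20D = [emb3(x) for x in points3D]  # each point has 20 entries but only 3 degrees of freedom
```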

Disclaimer: The above explanation is supposed to give an intuition. I have deliberately avoided terminology such as embedding or manifold which require quite some mathematical background in order to define and understand properly.

Yes, thank you, I wanted to say something about this, but wasn’t really sure what to say, your explanation was excellent.

Also this is a really good point, because often we’re talking about noisy measurements that are close to a manifold not points exactly on a manifold.

For example randn((100,10)) will give 10 points in a 100 dimensional space, one per column. But the radius of the points from 0 will be somewhat close to constant:

julia> x = randn((100,10))
100×10 Matrix{Float64}:
  0.343178   -1.12535    -1.5343    …   0.446902    0.591551   -1.52208
  0.944187    0.172339   -0.463517     -0.0853313  -1.25919     1.20593
 -0.496066    3.04636    -0.996671     -1.32236    -0.713387    1.95042
  0.102373   -1.99624    -0.142952      0.8116      1.06179    -0.692066
  0.309483    1.96455    -1.64203      -0.121813   -0.591782    0.154822
 -0.0739672  -0.846094   -1.90576   …  -1.45947    -1.26501    -0.881332
  0.286922    2.82547    -1.25119      -0.824619    1.42905    -1.3745
 -0.384594    1.09333     0.371566      0.848218    0.397301   -0.322818
 -0.505867   -0.0398853  -0.591467     -1.12665    -0.177795   -1.41563
 -0.218978   -0.958869   -0.282         1.32257    -0.204739    0.922322
  ⋮                                 ⋱                          
 -1.03277     0.447392   -1.1539        0.240071    1.68531     0.566781
  1.11267     0.812487   -0.216495      0.993072   -0.465258    0.120498
  0.171836    0.0649757   0.317241      2.43816     0.605896    1.03793
 -0.225233   -0.897226    0.780585      1.53152    -0.0905737   0.327664
  0.0542443  -0.865843    0.375161  …  -0.96098     1.72734    -1.10211
  1.33527    -0.355094    0.800118     -0.381612   -1.1056     -1.13951
  1.24259     0.354543    1.21456       0.725418   -0.270171   -0.0556411
 -3.21898    -0.640622    1.32772       0.213248    0.227875    0.719808
  0.774443   -0.628743   -0.157023      1.69349    -0.67228    -1.44384

but the column norms are all close to each other (norm needs using LinearAlgebra):

julia> norms = [norm(x[:,i]) for i in 1:10]
10-element Vector{Float64}:
  9.760822263894887
 10.612339995275105
  9.565572033048669
  9.268107843558445
 10.599943674249502
 10.07075860226165
 10.889092205690844
 10.881029608634295
 10.58396886913258
  9.850831560258094

The points are all near a radius 10 “spherical” surface in 100 dimensions.
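The radius 10 is no accident: each coordinate has variance 1, so the expected squared norm is 100 and the typical radius is about sqrt(100) = 10. A quick sketch with more samples (1000 is an arbitrary choice):

```julia
using LinearAlgebra, Statistics

x = randn(100, 1000)                       # 1000 points in 100 dimensions, one per column
norms = [norm(col) for col in eachcol(x)]

mean(norms), std(norms)  # mean close to 10, spread well below 1
```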

If we want to know where the vectors are, we’ve got 99 problems but the radius ain’t one (sorry, couldn’t resist).

Yes, this should be a well known property of high-dimensional normal distributions. It seems more of a concentration of measure phenomenon though … not quite sure if or how that connects to the idea of lower-dimensional manifolds?

Basically the points are near a 99 dimensional manifold (if you give 99 angles, you know the radius approximately already). In many real world situations, the points don’t lie exactly on the lower dimensional manifold, but rather nearby. This is an example of that.

Ah yes, was thinking about manifolds with dimensions $\ll N$ instead of $N-1$. Nice example though, should be more widely known.

Yes, you run into these manifold issues a lot in sampling high dimensional probability distributions. In the example actually if you restrict to the first 50 dimensions, the same thing is true, the points are near a 49 dimensional spherical surface, with somewhat more wiggle in the radius. Same with the second 50 dimensions, so in some sense depending on how close is close, you can think of a 100 dimensional independent normal distribution as having considerably less than 100 dimensions.
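A sketch of the "somewhat more wiggle" claim, comparing the relative spread of the radius (standard deviation divided by mean) for the two halves and the full vector. The helper name relspread and the sample count are made up for illustration.

```julia
using LinearAlgebra, Statistics

x = randn(100, 5000)

# Hypothetical helper: relative spread of the column norms
function relspread(A)
    r = [norm(col) for col in eachcol(A)]
    return std(r) / mean(r)
end

relspread(x[1:50, :]), relspread(x[51:100, :]), relspread(x)
# the two 50-dimensional halves show a larger relative spread than the full 100 dimensions
```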

If you set up a non-independent gaussian, like a gaussian process, you’ll sample smooth functions. Each one might be easy to describe as a Fourier series with only 4 or 5 coordinates, so a 100 dimensional gaussian can easily have as few as 4-5 dimensions that “matter”.
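To make that concrete, here is a sketch assuming a squared-exponential covariance with length scale 0.3 on 100 grid points in [0, 1]. Both the kernel and the length scale are assumptions for illustration; how many dimensions "matter" depends on them.

```julia
using LinearAlgebra

ts = range(0, 1; length=100)
ℓ = 0.3  # assumed length scale; smaller values give wigglier functions
K = [exp(-(a - b)^2 / (2 * ℓ^2)) for a in ts, b in ts]  # 100×100 covariance matrix

E = eigen(Symmetric(K))
λ = max.(E.values, 0.0)  # clip tiny negative eigenvalues caused by round-off

# Draw one sample of the 100-dimensional Gaussian: a smooth function on the grid
f = E.vectors * (sqrt.(λ) .* randn(100))

# Number of eigen-directions needed to capture 99% of the total variance
frac = cumsum(sort(λ; rev=true)) ./ sum(λ)
k = findfirst(>=(0.99), frac)
```

With these settings k comes out as a small number, far below 100, which is the sense in which only a handful of dimensions matter.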