I am trying to replicate Figure 8 (page 18) and Algorithm 6.1 (page 17) from this introductory ML paper for mathematicians. For convenience, the figure is:

where the ten points are classified as category A (red circles) or category B (blue crosses).

Here is what my Flux model (complete MWE) looks like:

```
using Flux

# data points copied from the paper
x1 = [0.1, 0.3, 0.1, 0.6, 0.4, 0.6, 0.5, 0.9, 0.4, 0.7]
x2 = [0.1, 0.4, 0.5, 0.9, 0.2, 0.3, 0.6, 0.2, 0.4, 0.6]
y = [ones(1, 5) zeros(1, 5); zeros(1, 5) ones(1, 5)] # targets from the paper

# convert to the shapes Flux expects
xs = [[x1[i], x2[i]] for i = 1:10]
ys = [y[:, i] for i = 1:10]

# model with input in R^2 and output in R^2
m = Chain(Dense(2, 3, σ), Dense(3, 2, σ), Dense(2, 2, σ))

# apply the randomly initialized model to see how it classifies the points
modresults = m.(xs)
```

Printing `modresults` gives

```
10-element Vector{Vector{Float64}}:
 [0.6655142490442801, 0.5706965426383223]
 [0.666123956314254, 0.5707553740872817]
 [0.6660863149694488, 0.5707991135524418]
 [0.6670868379883004, 0.5708697378084514]
 [0.6659266662507914, 0.5706888975327645]
 [0.6662432487462897, 0.5706847722527106]
 [0.6665849850562766, 0.570792851971794]
 [0.6663244954298975, 0.5705298786503195]
 [0.6662136266692837, 0.5707463831017299]
 [0.6667586892722243, 0.5707714286413034]
```

I then classify a point as category A (`[1.0, 0.0]`) if F1(x) >= F2(x) and as category B (`[0.0, 1.0]`) otherwise, where F1 and F2 are the first and second components of the output:

```
class = map(r -> r[1] >= r[2] ? [1.0, 0.0] : [0.0, 1.0], modresults)
```

which gives

```
10-element Vector{Vector{Float64}}:
 [1.0, 0.0]
 [1.0, 0.0]
 [1.0, 0.0]
 [1.0, 0.0]
 [1.0, 0.0]
 [1.0, 0.0]
 [1.0, 0.0]
 [1.0, 0.0]
 [1.0, 0.0]
 [1.0, 0.0]
```

So obviously the randomly initialized model isn't doing a very good job. Let's train it:

```
datapts = zip(xs, ys) # collect the xs and ys for training purposes.
loss(x, y) = Flux.Losses.mse(m(x), y)
ps = params(m)
Flux.train!(loss, ps, datapts, Descent(0.01))
```
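As far as I understand, a single `Flux.train!` call makes only one pass over `datapts`, so I believe it would need to be repeated. Here is the self-contained sketch I have in mind (`nepochs` and `opt` are my own names, and the epoch count is a guess, not something from the paper):

```julia
using Flux

# same data and model as above
x1 = [0.1, 0.3, 0.1, 0.6, 0.4, 0.6, 0.5, 0.9, 0.4, 0.7]
x2 = [0.1, 0.4, 0.5, 0.9, 0.2, 0.3, 0.6, 0.2, 0.4, 0.6]
xs = [[x1[i], x2[i]] for i = 1:10]
y  = [ones(1, 5) zeros(1, 5); zeros(1, 5) ones(1, 5)]
ys = [y[:, i] for i = 1:10]

m = Chain(Dense(2, 3, σ), Dense(3, 2, σ), Dense(2, 2, σ))
loss(x, y) = Flux.Losses.mse(m(x), y)
ps  = Flux.params(m)
opt = Descent(0.01)

nepochs = 10_000  # my own guess; each train! call is one pass over the data
for epoch = 1:nepochs
    Flux.train!(loss, ps, zip(xs, ys), opt)
end
```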

But this doesn't really do much: applying the trained model to the input data barely changes the outputs.

```
# apply the trained model to the original data points
modresults = m.(xs)
class = map(r -> r[1] >= r[2] ? [1.0, 0.0] : [0.0, 1.0], modresults)
```

Now `modresults` is

```
10-element Vector{Vector{Float64}}:
 [0.49853694974810886, 0.5012385889724634]
 [0.49882430021793656, 0.5011796138586837]
 [0.4990297863696163, 0.5012587103720679]
 [0.49931308843249256, 0.5011119262471514]
 [0.4985347432659953, 0.5011273015533314]
 [0.4985718587162263, 0.5010484679398993]
 [0.49898766820104984, 0.5011212037652792]
 [0.49829956801599506, 0.500842015490126]
 [0.4987827364676438, 0.5011429337398265]
 [0.4989026390464357, 0.5010455716217703]
```

and `class` is

```
10-element Vector{Vector{Float64}}:
 [0.0, 1.0]
 [0.0, 1.0]
 [0.0, 1.0]
 [0.0, 1.0]
 [0.0, 1.0]
 [0.0, 1.0]
 [0.0, 1.0]
 [0.0, 1.0]
 [0.0, 1.0]
 [0.0, 1.0]
```

The correct answer should be that the first five points are labelled `[1, 0]` and the last five `[0, 1]`.
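A quick way to score the predictions against those targets (`expected` and `accuracy` are my own names, not from the paper); note that the all-`[0.0, 1.0]` output above matches exactly the second half of the targets, i.e. accuracy 0.5, no better than chance:

```julia
# targets from the paper: first five points are category A, last five category B
expected = [i <= 5 ? [1.0, 0.0] : [0.0, 1.0] for i in 1:10]

# fraction of predictions that match the targets
accuracy(class) = count(class .== expected) / length(expected)

accuracy(fill([0.0, 1.0], 10))  # the trained model's output above scores 0.5
```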

… what am I doing wrong?