Basic steps to build an ANN?

#1

Is there a simple example of how to build a first ANN? I have a basic understanding of ANNs. I’m not looking for impressive examples, so I’m not interested in handwriting recognition, image recognition, etc.

Instead: suppose I need to find an ANN mapping X to Y where X might be a matrix in \mathbb{R}^{N\times n_x} or a Julia vector of n_x elements, each a vector of N elements, and Y\in \mathbb{R}^{N\times n_y} or a similar Julia vector of vectors.

So, suppose I start with generating data:

N=100
X=[range(0,2pi,length=N), range(0,2pi,length=N)]
Y = [sin(X[1][I])*cos(X[2][I]) for I in 1:N]


I know that I can create a mapping by interpolation, standard regression, etc., and that in practice the data would have noise.

But suppose that I want to build an ANN.

• What are the steps I need to do to build the ANN for the above data?
• What are the steps I need to do to apply the ANN to new data?
• What if n_y>1, etc.?

OK – if I understand the above, I can start to play around with the packages.

[I’m asking because my employer is interested in machine learning, and I need an example that I can understand myself, in order to give a decent presentation. And I’d like to use Julia to do it instead of MATLAB, etc.]

#2

I really like Flux.jl and you might find it well-suited to your application. I would suggest looking at the Flux documentation, for example this section on building and training a very simple neural net: https://fluxml.ai/Flux.jl/stable/training/training/ and I would also suggest looking at the flux “model zoo” of example models, like this one: https://github.com/FluxML/model-zoo/blob/master/tutorials/60-minute-blitz.jl

#3

In the readme for the now-deprecated package Mocha.jl:

In particular, there are Knet.jl and Flux.jl for pure-Julia solutions, and MXNet.jl and Tensorflow.jl for wrapper to existing deep learning systems.

Have you looked at any of these options? This is not my area of expertise, so sorry that I can’t really be of more help.

#4

Thanks! I’m sure Flux is good. The examples are categorized into:

What I’m looking for is more rudimentary examples. I’ll give it a check, though.

#5

I’d probably prefer Knet or Flux, since these appear to be more Julia specific. But I’ll check around.
Thanks again.

#6

…I’ll take a look at the 60-minute blitz early next week.

#7

You may want to check out the JuliaAcademy courses.

By the way, I had to look up ANN - please don’t use abbreviations (or, rather, define them the first time that you use them).

#8

Thanks for the tip – I’ll take a look at the JuliaAcademy course. I do suspect that those courses are “advanced” from my perspective, i.e., using examples from image analysis, handwriting recognition, etc. – which goes beyond my first need. But I’ll take a look.

ANN… I thought that abbreviation was as common within machine learning as ODE within differential equations, but I’ll define abbreviations in the future.

#9

MNIST is often considered the “hello world” of today’s deep learning, and deep learning networks are pretty much the standard now.

While there’s not much literature on Julia and deep learning yet, you might want to look at Grokking Deep Learning by Andrew Trask. He only uses numpy, and those numpy examples will translate very directly to Julia.

#10

The courses on JuliaAcademy start out with the basics and get steadily more involved. I think (and hope) you’ll find that they are perfect for getting started.

Disclaimer: I contributed to developing these courses.

#11

I’m back from a conference, and will try to look into JuliaAcademy. I find three courses on machine learning: one advanced one using Flux, one intermediate on the math of machine learning, and one on Knet. Which one would you think is suitable for me? I’m not interested in “feature extraction” things initially.

(I.e., I’m not interested in image recognition, handwriting recognition, etc. at this stage, but rather in simple “least squares” mappings from real inputs to real outputs. I know this can be done by choosing c_i such that y = \sum_i c_i \phi_i(x) + e, where \phi_i(x) are chosen basis functions and e is some model error, and using simple linear algebra, but I’m interested in first understanding how this can be solved by “chaining” linear-combination + nonlinear output mapping layers in a machine learning tool.)
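
For reference, the basis-function least-squares fit mentioned above could be sketched as follows. This is only an illustration: the target sin(x) on [-\pi/2, \pi/2], the monomial basis \phi_i(x) = x^{i-1}, and the number of basis functions (4) are assumptions chosen for the example, not something fixed by the problem.

```julia
using LinearAlgebra

# Assumed example data: y = sin(x) sampled at 100 points
x = collect(range(-pi/2, pi/2, length=100))
y = sin.(x)

# Assumed basis functions φ_i(x) = x^(i-1), i = 1..4 (an illustrative choice)
Φ = [xi^(i - 1) for xi in x, i in 1:4]   # 100×4 design matrix

# Coefficients c minimizing ‖Φc − y‖² (backslash solves the least-squares problem)
c = Φ \ y

# Model error e = y − Φc and its largest magnitude
e = y - Φ * c
max_err = maximum(abs.(e))
```

Everything here is linear in the unknowns c, which is why a single linear solve is enough; a neural network layer such as \tanh(wx+b) is nonlinear in its parameters and needs an iterative optimizer instead.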

#12

Try the intermediate one on the math of machine learning. It aims to explain how and why neural networks in general work.

#13

OK… I’ve had time for checking Flux. I think I’ve made it work by “cheating” and looking at other threads, but 3 questions remain in my “example” for dummies…

a. My data are generated from y_\mathrm{d}=\sin(x_\mathrm{d}) with x_\mathrm{d}\in [-\frac{\pi}{2},\frac{\pi}{2}]. I generate N=100 data points using range. [Yes, I know this is stupidly simple, but it clarifies the understanding…]
b. My understanding is that Flux needs data in the following form (at least for the Dense layers…): data = [(xd,yd)] where \mathrm{xd} \in \mathbb{T}^{n_x \times N} and \mathrm{yd} \in \mathbb{T}^{n_y \times N} – here, I have used \mathbb{T} to denote the type — I use Float64.

I generate the data as follows:

# Packages
using Flux
using Plots; pyplot()
using Statistics
# Data
x_d = reshape(collect(range(-pi/2,pi/2,length=100)),1,100)
y_d = sin.(x_d)
plot(x_d',y_d')     # just to check
data = [(x_d,y_d)]
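
The same n_x \times N layout can be produced from the vector-of-vectors data in post #1. The following is a small sketch under that assumption; the names x_mat and y_mat are made up for the illustration.

```julia
# Data as generated in post #1: a vector of n_x = 2 ranges, each with N elements
N = 100
X = [range(0, 2pi, length=N), range(0, 2pi, length=N)]
Y = [sin(X[1][i]) * cos(X[2][i]) for i in 1:N]

# hcat gives an N×n_x matrix; permutedims turns it into n_x×N
x_mat = permutedims(hcat(X...))   # 2×100
# A plain vector of N outputs becomes a 1×N matrix (n_y = 1)
y_mat = reshape(Y, 1, N)          # 1×100

# One (input, output) pair, i.e. a single batch holding all samples
data = [(x_mat, y_mat)]
```

With n_y > 1, y_mat would simply have more rows, one per output.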


c. I want to start with a single layer, thus with n_x = 1 inputs and n_y = 1 outputs, and the \tanh nonlinear output mapping. In other words, y = \tanh(wx+b) which has 2 parameters (w,b). This should allow for an ok first example “for dummies”.
d. My impression is that with the basic layer Dense(nx,ny,sigma), where sigma is the function for the output nonlinearity, I can set up the problem as follows, including (i) the model mod, which is \tanh(wx+b), (ii) the parameter set par, which is (w,b) (par also keeps track of the model), (iii) a fitting function loss, which is least squares, (iv) the parameter optimization algorithm opt, and (v) updating the parameters one time:

# Set up Flux problem
mod = Dense(1,1,tanh)
par = params(mod)
loss(x, y) = mean((mod(x).-y).^2)
opt = ADAM(0.002, (0.99, 0.999))
# One update of parameters
Flux.train!(loss,par,data,opt)
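
To see what such an update amounts to, here is a plain-Julia sketch of the same model and loss without Flux. Note the assumptions: it uses ordinary gradient descent with a hand-derived gradient rather than ADAM, and the step size eta, the starting values w0, b0, and the function names are all invented for the illustration.

```julia
using Statistics

# Same data as above: y = sin(x) on [-π/2, π/2], stored as 1×N matrices
x = reshape(collect(range(-pi/2, pi/2, length=100)), 1, 100)
y = sin.(x)

# Model y = tanh(w*x + b) and mean-squared-error loss
model(w, b, x) = tanh.(w .* x .+ b)
loss(w, b) = mean((model(w, b, x) .- y).^2)

# Hand-derived gradient of the loss, using d/dz tanh(z) = 1 - tanh(z)^2
function grad(w, b)
    yhat = model(w, b, x)
    r = 2 .* (yhat .- y) .* (1 .- yhat.^2)   # derivative w.r.t. z = w*x + b, per sample
    return mean(r .* x), mean(r)             # (∂loss/∂w, ∂loss/∂b)
end

# Plain gradient descent (not ADAM): repeat "compute gradient, step downhill"
function fit(w, b; eta=0.1, steps=1000)
    for _ in 1:steps
        gw, gb = grad(w, b)
        w -= eta * gw
        b -= eta * gb
    end
    return w, b
end

w0, b0 = 0.5, 0.1          # arbitrary starting guess
w1, b1 = fit(w0, b0)
```

Comparing loss(w0, b0) with loss(w1, b1) shows how much the fit improved. Flux does the gradient computation automatically and ADAM adapts the step size, but the overall loop has the same shape.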


e. I can set up a sequence of (say, 1000) updates with command:

@Flux.epochs 1000 Flux.train!(loss,par,data,opt)


f. At any time, I can read the parameters par and check the fitting (loss) by commands:

par
loss(x_d,y_d)


3 remaining problems

1. Flux seems to generate Float32 data.
• Can I change this to using Float64 somehow?
2. Flux responds with tracked arrays, e.g.:
julia> typeof(mod(x_d))
TrackedArray{…,Array{Float32,2}}

• … so: how can I convert this to untracked arrays so that I can plot the model mapping:
plot(x_d,mod(x_d))


(which doesn’t work because mod(x_d) is a TrackedArray…)

3. Coming from outside of the Machine Learning community, the term Epoch sounds weird.
• Is an Epoch simply a major iteration in the parameter update scheme?

OK… answers to questions 1, 2, 3 would clarify basic use of Flux, and should make it possible for me to move on to more interesting problems. […including having data with noise, splitting data between training and validation sets, chaining layers, multivariable problems, etc., etc.]
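
On the third question: as far as I understand the usual convention, an epoch is one full pass over the training data. Since Flux.train! performs one update per element of data, and data = [(x_d,y_d)] holds a single batch, each epoch here is exactly one parameter update. The sketch below just counts updates to make the distinction concrete; count_updates and the batch size are hypothetical and not part of Flux.

```julia
# Count parameter updates when the samples are split into mini-batches.
# One epoch = one pass over all batches; one update happens per batch.
function count_updates(n_samples, batch_size, n_epochs)
    # Index ranges of the batches, e.g. 1:25, 26:50, ...
    batches = [i:min(i + batch_size - 1, n_samples) for i in 1:batch_size:n_samples]
    n_updates = 0
    for epoch in 1:n_epochs
        for idx in batches
            n_updates += 1   # a real training loop would update the parameters here
        end
    end
    return n_updates
end

updates_minibatch = count_updates(100, 25, 3)    # 4 batches per epoch
updates_fullbatch = count_updates(100, 100, 3)   # 1 batch per epoch
```

So with a single full batch, "epoch" and "iteration" coincide; with mini-batches, one epoch contains several updates.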

#14

@BLI I found Julia Academy to be fantastic! The self-paced video courses combined with Jupyter Notebooks are absolutely the epitome of didactic experiences!

#15

I will test it – after I’ve played around a little bit more.

#16

So… just by trial and error, I found that collect() changed a tracked array to an ordinary array. Don’t know if that works for arbitrary x_d, though. With a slight change of the function (shifting it along x_d so that a bias is necessary…):

# Packages
using Flux
using Plots; pyplot()
using Statistics
# Data
x_d = reshape(collect(range(0,pi,length=100)),1,100)
y_d = sin.(x_d.-pi/2)
data = [(x_d,y_d)]
# Set up Flux problem
mod = Dense(1,1,tanh)
# Initial fit
plot(x_d',y_d',label="data")
y_0 = Float64.(collect(mod(x_d)))
plot!(x_d',y_0',label="initial guess")
par = params(mod)
loss(x, y) = mean((mod(x).-y).^2)
opt = ADAM(0.002, (0.99, 0.999))
@Flux.epochs 3000 Flux.train!(loss,par,data,opt);
y_1k = Float64.(collect(mod(x_d)))
plot!(x_d',y_1k',label="fit @ 3000 epochs")


gives the plot:

Checking parameters and loss at 3000 epochs:

julia> par
Params([Float32[1.05148] (tracked), Float32[-1.64145] (tracked)])

julia> loss(x_d,y_d)
0.00265027057055282 (tracked)

A little bit more of playing around, but I guess I'll check out the JuliaAcademy soon.


Anyway, this is fun! And… just an initial test.

#17

You can also extract the data from a tracked array with the Flux.Tracker.data method:

julia> d = Dense(4, 2, σ)
Dense(4, 2, NNlib.σ)

julia> y = d(zeros(4))
Tracked 2-element Array{Float32,1}:
0.5f0
0.5f0

julia> Flux.Tracker.data(y)
2-element Array{Float32,1}:
0.5
0.5


As for using Float64 data, you should be able to construct a Dense layer out of any type of number you want:

julia> d = Dense(param(randn(Float64, 2, 4)), param(zeros(Float64, 2)), σ)
Dense(4, 2, NNlib.σ)

julia> d(zeros(4))
Tracked 2-element Array{Float64,1}:
0.5
0.5


although it’s worth mentioning that Float32 is generally preferred in ML because of better GPU support.
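
Going the other way, i.e. converting the data to Float32 to match the default layer, is a one-liner in plain Julia. A small sketch (the variable names are only for illustration) that also shows what happens when the two types are mixed:

```julia
# Float64 data, as produced by range/collect
x64 = reshape(collect(range(0, pi, length=100)), 1, 100)

# Element-wise conversion to Float32
x32 = Float32.(x64)

# Float32 needs half the memory of Float64
bytes64 = sizeof(x64)   # 800
bytes32 = sizeof(x32)   # 400

# Mixing types promotes the result to Float64
W32 = randn(Float32, 1, 1)       # stand-in for a Float32 weight matrix
mixed_type = eltype(W32 * x64)   # Float64
pure_type  = eltype(W32 * x32)   # Float32
```

This is ordinary Julia type promotion, independent of Flux.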

#18

Thanks, Robin, using data looks much better than my approach. Regarding GPU – my laptop has an NVIDIA 1050 card, but I’ll probably not use that – using it consumes quite a bit of battery life. Still, I would probably try to figure out how to use it – just to know it.

#19

Couple of other simple things…

• Is there a way to suppress the printing of info during @Flux.epochs 3000 Flux.train!(...)? This “messes” up the IJulia session… (Can I do it by callback?)
• The trick Flux.Tracker.data(y) works on data, but not on parameters. Is there a way to extract parameters so that I can use them to compare mappings — without writing down the values manually?
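
One general way to silence @info messages, assuming the printing comes from Julia’s standard logging system, is to run the code under a NullLogger from the Logging standard library. The function noisy_step! below is a made-up stand-in for a training step that logs; it is not a Flux function.

```julia
using Logging

# Hypothetical stand-in for one training step that prints an Info message
function noisy_step!(counter)
    @info "Epoch $(counter[] + 1)"
    counter[] += 1
    return nothing
end

counter = Ref(0)

# All log messages emitted inside this block are discarded
with_logger(NullLogger()) do
    for _ in 1:5
        noisy_step!(counter)
    end
end
```

The steps still run (counter[] ends at 5); only the log output is dropped.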

#20

… ah… What’s the advantage of using, say, @Flux.epochs 3000 Flux.train!(...) vs.

for i in 1:3000
Flux.train!(...)
end


In the latter case, I avoid the “Info” printing.
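
A plain loop also makes it easy to report only occasionally. Here is a sketch of that idea; run_epochs and report_every are invented names, and step stands for whatever performs one training pass (for example a closure around Flux.train!).

```julia
# Run `step` n_epochs times, logging only every `report_every` epochs
# (report_every = 0 disables logging entirely).
function run_epochs(step, n_epochs; report_every=0)
    reported = Int[]
    for epoch in 1:n_epochs
        step()
        if report_every > 0 && epoch % report_every == 0
            @info "Epoch $epoch"
            push!(reported, epoch)
        end
    end
    return reported
end

# Usage with a dummy step that only counts how often it was called
calls = Ref(0)
reported = run_epochs(() -> (calls[] += 1), 3000; report_every=1000)
```

Here the dummy step is called 3000 times, but only epochs 1000, 2000 and 3000 are logged.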