Basic steps to build an ANN?


There is a pure manual NN example on
Also, I’ve recently tweaked that example further to use matrix multiplications (which gave about a 10x speed increase). I will make it public soon.


Flux.@epochs is just a simple loop that prints the epoch each time through, so you’re not losing anything by writing that loop yourself.
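For reference, a hand-written version of that loop might look like the sketch below. It is plain Julia with no Flux dependency: `train_epoch!` and `run_epochs` are hypothetical names, and the counter only stands in for a real training pass (which in practice would be something like a call to `Flux.train!`).

```julia
# Hypothetical stand-in for one pass over the training data.
# Here it only bumps a counter so the sketch runs on its own.
function train_epoch!(state)
    state[] += 1
    return state
end

# Run n_epochs passes, logging the epoch number each time through.
function run_epochs(n_epochs, state)
    for epoch in 1:n_epochs
        @info "Epoch $epoch"
        train_epoch!(state)
    end
    return state
end

counter = Ref(0)
run_epochs(3, counter)
```

Writing the loop yourself also gives you a natural place to add per-epoch logic such as evaluating a test loss or stopping early.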


Thanks for the clarification!


Slight follow-up question. From the REPL, I get:

julia> using Flux

help?> Dense
search: Dense DenseArray DenseVector DenseMatrix DenseVecOrMat DimensionMismatch codeunits ncodeunits

  Dense(in::Integer, out::Integer, σ = identity)

  Creates a traditional Dense layer with parameters W and b.

  y = σ.(W * x .+ b)

  The input x must be a vector of length in, or a batch of vectors represented as an in × N matrix. The out y will be
  a vector or batch of length out.

It seems to me that calling:

julia> d = Dense(param(randn(Float64, 2, 4)), param(zeros(Float64, 2)), σ)
Dense(4, 2, NNlib.σ)

uses different/undocumented arguments? Or does param(randn(Float64, 2, 4)) insert the W matrix into the model, with the in dimension then read off from its size?


OK… a little more testing. It seems I can set the initial parameter values myself with:

julia> using Flux
julia> W = [1 2 3 4;1 -1 2 0]
julia> b = [1,2]
julia> Dense(param(W),param(b));

… if I want to. I’m not saying that this is very useful, but it is good to know that it is possible.
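One way to picture why both call styles work is a layer type that has a size-based convenience constructor alongside a constructor that takes the arrays directly. The sketch below is plain Julia, and `MyDense` is a hypothetical stand-in written for illustration, not Flux's actual source. It also leaves out `param` and tracking entirely.

```julia
# MyDense is a hypothetical stand-in, not Flux's actual definition.
struct MyDense{F,M,V}
    W::M   # weight matrix, out × in
    b::V   # bias vector, length out
    σ::F   # activation function
end

# Convenience constructor: build W and b from the sizes.
MyDense(in::Integer, out::Integer, σ = identity) =
    MyDense(randn(out, in), zeros(out), σ)

# Constructor taking the arrays directly; the activation defaults to identity.
MyDense(W::AbstractMatrix, b::AbstractVector) = MyDense(W, b, identity)

# Forward pass: y = σ.(W * x .+ b)
(d::MyDense)(x) = d.σ.(d.W * x .+ d.b)

W = [1 2 3 4; 1 -1 2 0]
b = [1, 2]
d = MyDense(W, b)
y = d([1, 1, 1, 1])
```

With a layout like this, the in and out dimensions are never stored separately. They can be read back from `size(d.W)`, which would be consistent with the REPL showing `Dense(4, 2, …)` for a 2 × 4 weight matrix.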

I tried to do the same for an RNN, but in that case, this doesn’t work…

  • Is that because the RNN doesn’t support this setting of initial parameters, or
  • Is it because the RNN has feedback from the output of the activation function, and needs an additional weight matrix W_f for this feedback?

I also tried to do:

julia> params(Dense(2,3))
Params([Float32[1.06203 0.2185; -0.234419 0.557728; -1.00303 -0.15665] (tracked), Float32[0.0, 0.0, 0.0] (tracked)])

which to me looks like a (tracked) array [W, b].

On the other hand, doing:

julia> params(RNN(2,3))

leads to what looks like a (tracked) array [W, W_f, b, b_f]. My guess stems from W being a 2 × 3 matrix, W_f being a 3 × 3 matrix, and the two remaining elements being 3-element vectors.

  • The second vector is a zero vector, i.e., b_f = [0,0,0].
  • I’m surprised there appear to be two bias vectors (b and b_f in my notation). I wouldn’t expect the feedback signal around the activation function to need its own bias. Isn’t b + b_f the effective bias into the activation function?
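A possible reading of those four arrays, sketched in plain Julia below, is that a simple recurrent cell needs only one bias plus a stored initial hidden state. As far as I recall, Flux's recurrent cell of this era kept the initial hidden state as a trainable array initialised to zeros, which would explain the all-zero fourth entry, but treat that as an assumption to check against your Flux version. `MyRNNCell`, `rnn_step`, and `h0` are hypothetical names, and the out × in weight layout is assumed to match the Dense output shown above.

```julia
# MyRNNCell is a hypothetical stand-in, not Flux's actual definition.
struct MyRNNCell{F}
    σ::F
    Wi::Matrix{Float64}   # input weights, out × in
    Wh::Matrix{Float64}   # feedback (hidden-to-hidden) weights, out × out
    b::Vector{Float64}    # the single bias vector
    h0::Vector{Float64}   # initial hidden state, starts as zeros
end

MyRNNCell(in::Integer, out::Integer, σ = tanh) =
    MyRNNCell(σ, randn(out, in), randn(out, out), zeros(out), zeros(out))

# One time step: compute the new hidden state from the previous state h and input x.
rnn_step(c::MyRNNCell, h, x) = c.σ.(c.Wi * x .+ c.Wh * h .+ c.b)

cell = MyRNNCell(2, 3)
h1 = rnn_step(cell, cell.h0, [1.0, -1.0])   # first step starts from h0
h2 = rnn_step(cell, h1, [0.5, 0.5])         # later steps feed the state back in
```

Under this reading there is only one bias inside the activation, and the extra 3-element vector is just where the recurrence starts.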