There is a pure manual NN example in the Stack Overflow question “Python with numpy faster than Julia in training neural network” (macos tag).
Also, I’ve recently tweaked that example further with matrix multiplications (got about a 10x speed increase); I will make it public soon.
Flux.@epochs is just a simple loop that prints the epoch each time through, so you’re not losing anything by writing that loop yourself.
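For concreteness, here is roughly what that hand-written loop could look like (a minimal sketch; train_step is a stand-in of mine, not something from the original post):

train_step() = nothing   # stand-in for your actual training code, e.g. Flux.train!(...)

# Roughly what `Flux.@epochs 10 train_step()` does:
for epoch in 1:10
    @info "Epoch $epoch"   # @epochs just logs the epoch each time through
    train_step()
end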
Thanks for the clarification!
Slight follow-up question. From the REPL, I get:
julia> using Flux
help?> Dense
search: Dense DenseArray DenseVector DenseMatrix DenseVecOrMat DimensionMismatch codeunits ncodeunits
Dense(in::Integer, out::Integer, σ = identity)
Creates a traditional Dense layer with parameters W and b.
y = σ.(W * x .+ b)
The input x must be a vector of length in, or a batch of vectors represented as an in × N matrix. The out y will be
a vector or batch of length out.
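To make the vector-versus-batch wording concrete, a small sketch of the documented shapes (the names and random data here are mine, just for illustration):

using Flux

d = Dense(4, 2, σ)          # in = 4, out = 2

x = randn(Float32, 4)       # one input: a vector of length `in`
y = d(x)                    # y = σ.(W * x .+ b), a vector of length `out`

X = randn(Float32, 4, 10)   # a batch of N = 10 inputs: an in × N matrix
Y = d(X)                    # an out × N matrix, here 2 × 10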
It seems to me that calling:
julia> d = Dense(param(randn(Float64, 2, 4)), param(zeros(Float64, 2)), σ)
Dense(4, 2, NNlib.σ)
uses different/undocumented arguments? Or does param(randn(Float64, 2, 4)) insert the W matrix into the model and report the in dimension?
OK… a little more testing. It seems I can choose the initial parameters myself:
julia> using Flux
julia> W = [1 2 3 4;1 -1 2 0]
julia> b = [1,2]
julia> Dense(param(W), param(b));
… if I want to. I’m not saying that this is very useful, but it is good to know the possibility exists.
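As a quick sanity check (my own sketch, using the same param-based construction as above), the layer really does use the supplied values:

using Flux

W = [1 2 3 4; 1 -1 2 0]
b = [1, 2]
d = Dense(param(W), param(b))   # activation defaults to identity

x = [1, 0, 0, 0]
d(x)                            # == W * x .+ b == [2, 3] (tracked)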
I tried to do the same for an RNN, but in that case, this doesn’t work…
- Is that because the RNN doesn’t support this setting of initial parameters, or
- Is it because the RNN has feedback from the output of the activation function, and needs an additional weight matrix W_f for this feedback?
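One way to probe this (my suggestion, not something from the docs above) is to inspect the recurrent cell directly: RNN(in, out) returns a Flux.Recur wrapping a cell, so the cell’s stored fields can be listed:

using Flux

m = RNN(2, 3)               # a Recur wrapping the actual RNN cell
fieldnames(typeof(m.cell))  # shows which weight/bias/state fields the cell carries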
I also tried to do:
julia> params(Dense(2,3))
Params([Float32[1.06203 0.2185; -0.234419 0.557728; -1.00303 -0.15665] (tracked), Float32[0.0, 0.0, 0.0] (tracked)])
which to me looks like a (tracked) array [W, b].
On the other hand, doing:
julia> params(RNN(2,3))
leads to what looks like a (tracked) array [W, W_f, b, b_f]; my guess stems from W being a 3×2 matrix, W_f being a 3×3 matrix, while the two additional elements are 3-element vectors.
- The second vector is a null vector, i.e., b_f = [0, 0, 0].
- I’m surprised that there appear to be two bias vectors (b and b_f in my syntax). I wouldn’t think that the feedback signal around the activation function would need a bias; isn’t b + b_f the bias into the activation function?
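For what it’s worth, the shape guess can be checked directly (my sketch; this assumes params returns the tracked arrays in construction order):

using Flux

for p in params(RNN(2, 3))
    println(size(p))
end
# Expected: (3, 2), (3, 3), (3,), (3,)
# i.e. one out × in matrix, one out × out matrix, and two length-out vectors,
# consistent with the [W, W_f, b, b_f] reading above.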