There is a pure manual NN example in the Stack Overflow question “Python with numpy faster than Julia in training neural network” (macos tag).
Also, I’ve recently tweaked that example further with matrix multiplications (got about a 10x speed increase); I will make it public soon.
Flux.@epochs is just a simple loop that prints the epoch each time through, so you’re not losing anything by writing that loop yourself.
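For concreteness, here is roughly what that hand-written loop could look like (a minimal sketch; train_step is a stand-in of mine, not something from the original post):

train_step() = nothing   # stand-in for your actual training code, e.g. Flux.train!(...)

# Roughly what `Flux.@epochs 10 train_step()` does:
for epoch in 1:10
    @info "Epoch $epoch"   # @epochs just logs the epoch each time through
    train_step()
end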
Thanks for the clarification!
Slight follow-up question. From the REPL, I get:
julia> using Flux
help?> Dense
search: Dense DenseArray DenseVector DenseMatrix DenseVecOrMat DimensionMismatch codeunits ncodeunits
Dense(in::Integer, out::Integer, σ = identity)
Creates a traditional Dense layer with parameters W and b.
y = σ.(W * x .+ b)
The input x must be a vector of length in, or a batch of vectors represented as an in × N matrix. The out y will be
a vector or batch of length out.
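To make the vector-versus-batch wording concrete, a small sketch of the documented shapes (the names and random data here are mine, just for illustration):

using Flux

d = Dense(4, 2, σ)          # in = 4, out = 2

x = randn(Float32, 4)       # one input: a vector of length `in`
y = d(x)                    # y = σ.(W * x .+ b), a vector of length `out`

X = randn(Float32, 4, 10)   # a batch of N = 10 inputs: an in × N matrix
Y = d(X)                    # an out × N matrix, here 2 × 10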
It seems to me that calling:
julia> d = Dense(param(randn(Float64, 2, 4)), param(zeros(Float64, 2)), σ)
Dense(4, 2, NNlib.σ)
uses different/undocumented arguments? Or does param(randn(Float64, 2, 4)) insert the W matrix into the model and report the in dimension?
OK… a little more testing. It seems I can choose the initial parameters myself:
julia> using Flux
julia> W = [1 2 3 4;1 -1 2 0]
julia> b = [1,2]
julia> Dense(param(W), param(b));
… if I want to. I’m not saying that this is very useful, but it is good to know the possibility exists.
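As a quick sanity check (my own sketch, using the same param-based construction as above), the layer really does use the supplied values:

using Flux

W = [1 2 3 4; 1 -1 2 0]
b = [1, 2]
d = Dense(param(W), param(b))   # activation defaults to identity

x = [1, 0, 0, 0]
d(x)                            # == W * x .+ b == [2, 3] (tracked)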
I tried to do the same for an RNN, but in that case, this doesn’t work…
- Is that because the RNN doesn’t support this setting of initial parameters, or
- Is it because the RNN has feedback from the output of the activation function, and needs an additional weight matrix W_f for this feedback?
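One way to probe this (my suggestion, not something from the docs above) is to inspect the recurrent cell directly: RNN(in, out) returns a Flux.Recur wrapping a cell, so the cell’s stored fields can be listed:

using Flux

m = RNN(2, 3)               # a Recur wrapping the actual RNN cell
fieldnames(typeof(m.cell))  # shows which weight/bias/state fields the cell carries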
I also tried to do:
julia> params(Dense(2,3))
Params([Float32[1.06203 0.2185; -0.234419 0.557728; -1.00303 -0.15665] (tracked), Float32[0.0, 0.0, 0.0] (tracked)])
which to me looks like a (tracked) array [W, b].
On the other hand, doing:
julia> params(RNN(2,3))
leads to what looks like a (tracked) array [W, W_f, b, b_f]; my guess stems from W being a 3×2 matrix, W_f being a 3×3 matrix, while the two additional elements are 3-element vectors.
- The second vector is a null vector, i.e., b_f = [0, 0, 0].
- I’m surprised that there appear to be two bias vectors (b and b_f in my syntax). I wouldn’t think that the feedback signal around the activation function would need a bias; isn’t b + b_f the bias into the activation function?
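For what it’s worth, the shape guess can be checked directly (my sketch; this assumes params returns the tracked arrays in construction order):

using Flux

for p in params(RNN(2, 3))
    println(size(p))
end
# Expected: (3, 2), (3, 3), (3,), (3,)
# i.e. one out × in matrix, one out × out matrix, and two length-out vectors,
# consistent with the [W, W_f, b, b_f] reading above.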