Flux newbie: simple Markov

Note: Flux newbie here.

I am trying to learn Flux, with the eventual goal of experimenting with different GRU-like models:
one-hot in, one-hot out, with cross-entropy loss. My background is NLP.

I started learning Flux by trying to set up the simplest possible Markov chain, where h0 is the prediction of the first observation and seq_length = 10. So:

A) input = x,
B) ht = \hat{x},
C) correct output = x.
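
To make the setup concrete: the simple non-NN estimate of such a chain is, as far as I understand, just counting first symbols and transitions. A minimal sketch in plain Julia (no Flux), assuming the symbols are coded as integers 1..K and that I have several observed sequences. The names `count_estimate`, `h0` and `W` are my own and do not come from any library:

```julia
# Count-based (maximum likelihood) estimate of a first-order Markov chain.
# seqs: vector of sequences, each a vector of symbols coded as integers 1..K.
function count_estimate(seqs, K)
    h0 = zeros(K)       # counts of the first symbol of each sequence
    W  = zeros(K, K)    # W[j, i] counts transitions i -> j
    for s in seqs
        h0[s[1]] += 1
        for t in 2:length(s)
            W[s[t], s[t-1]] += 1
        end
    end
    h0 ./= sum(h0)                  # normalize to a distribution
    colsums = sum(W, dims=1)
    W ./= max.(colsums, 1)          # columns sum to 1; unseen symbols stay all zero
    return h0, W
end

# Toy data, made up purely for illustration.
seqs = [[1, 2, 2, 1], [2, 1, 2, 2], [1, 1, 2, 1]]
h0, W = count_estimate(seqs, 2)
```

With one-hot inputs, `W * x` then just picks out the column of `W` for the previous symbol, which is the predicted distribution for the next one.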

My exercise: estimate h0 and Wxh (and check the result against the simple non-NN estimate). Since I am not very intelligent, I failed at this simple exercise, because RNN assumes there is a direct connection from the input signal to the output. And if I try to write some code of my own and wrap it in Recurrent(), it complains that initialstates() is not implemented, and then that the signature is wrong when I try to implement it. Sooo …

  1. Is there any more comprehensive introduction with examples anywhere?

  2. I can see that the source code in recurrent.jl is actually readable and has lots of comments. Would a better plan be to take a couple of days and reverse engineer that file to figure out what is going on?

  3. What is a good way to think about a prediction model? I mean, I could take x[1:end-1] as the input and x[2:end] as the output, but that is a little bit silly, since I do want a prediction of the first time point, both in my exercise and in the more complex models to come.
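
One option I can imagine for question 3 is to prepend an artificial start symbol, so that the input is the start symbol followed by x[1:end-1] and the target is all of x. A sketch in plain Julia, assuming integer-coded symbols 1..K and using K + 1 as a hypothetical start symbol. `shifted_pairs` and `onehot_matrix` are my own helper names, not Flux functions:

```julia
# Build (input, target) so that every time point, including the first, gets a prediction.
function shifted_pairs(x, K)
    start = K + 1                       # hypothetical start symbol, not a real observation
    input = vcat(start, x[1:end-1])     # what the model sees at each step
    target = copy(x)                    # what it should predict at each step
    return input, target
end

# One-hot encode integer symbols as the columns of an (nclasses x length) matrix.
function onehot_matrix(idx, nclasses)
    M = zeros(Int, nclasses, length(idx))
    for (t, i) in enumerate(idx)
        M[i, t] = 1
    end
    return M
end

# Toy sequence, made up purely for illustration.
x = [2, 1, 3, 3, 1]
input, target = shifted_pairs(x, 3)
X = onehot_matrix(input, 4)     # 4 x 5, the extra row is the start symbol
Y = onehot_matrix(target, 3)    # 3 x 5
```

If I understand it correctly, the weights attached to the start symbol would then play the role of h0, but I do not know whether this is the usual way to do it.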