I am trying to reimplement the forward pass of a GRU layer so that, after training in Julia, I can evaluate the RNN in other languages. So far I have had no luck, and I'm starting to run out of ideas. Has anyone run into a similar problem in the past and could give me a hint? Or has someone perhaps already implemented this?
I based my implementation on the documentation of Knet.RNN and Knet.rnnparam, but the results already differ after a single timestep. Here is my minimal testing code:
using Knet

# Define a recurrent layer in Knet
nX = 2  # Number of inputs
nH = 3  # Number of hidden states
knet_gru = RNN(nX, nH; rnnType = :gru, dataType = Float32)  # recurrent layer (gru) implementation in Knet
rnn_params = rnnparams(knet_gru)  # extract the params

# Define forward pass of gru again, using information of "@doc RNN" and "@doc rnnparam"
function my_gru(Wr, Wi, Wn, Rr, Ri, Rn, bWr, bWi, bWn, bRr, bRi, bRn, x, h_in)
    r = sigm.(Wr' * x .+ Rr' * h_in .+ bWr .+ bRr)         # reset gate
    i = sigm.(Wi' * x .+ Ri' * h_in .+ bWi .+ bRi)         # input gate
    n = tanh.(Wn' * x .+ r .* (Rn' * h_in .+ bRn) .+ bWn)  # new gate
    h_out = (1 .- i) .* n .+ i .* h_in
    return h_out
end
my_gru(W, x, h) = my_gru(W..., x, h)

# Compare both
x = randn(Float32, nX)
h = randn(Float32, nH)
knet_gru.h = h  # set starting state for Knet RNN
res1 = knet_gru(x)
res2 = my_gru(rnn_params, x, h)
print("Difference between Knet and my_gru: ")
println(res1 - res2)
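In case it helps with debugging, here is a small shape check I would run right after the comparison. It only uses Base Julia; my assumption (which may well be where my error is) is that rnnparams returns the six GRU weight matrices followed by the six bias vectors, with the matrices oriented so that they have to be transposed before multiplying, as in my_gru above:

# Print the shapes of everything involved, to verify how rnnparams
# orders and orients the GRU parameters (assumption: 6 matrices, then 6 biases)
for (k, p) in enumerate(rnn_params)
    println("param $k: ", size(p))
end
println("Knet output size: ", size(res1), ", my_gru output size: ", size(res2))

If the ordering or orientation does not match this assumption, the splat in my_gru(W, x, h) = my_gru(W..., x, h) would pair the wrong arrays with the wrong gates, which could explain the difference.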