```
Dense(512, 128, initW=(dims...) -> Flux.kaiming_uniform(dims...; gain=sqrt(1/3)), initb=(dims...) -> Flux.kaiming_uniform(dims...; gain=sqrt(1/3)))
```

This seems to initialize the weights the same way PyTorch does. I don’t know why PyTorch uses a gain of `sqrt(1/3)`, but that’s what the source seems to show.

See: pytorch/linear.py at master · pytorch/pytorch · GitHub
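For what it’s worth, my reading of the PyTorch source (this is my interpretation, not anything documented) is that `nn.Linear` calls `kaiming_uniform_(weight, a=math.sqrt(5))`, and the leaky-ReLU gain formula then produces exactly `sqrt(1/3)`:

```julia
# PyTorch's nn.Linear resets weights with kaiming_uniform_(weight, a=sqrt(5)).
# calculate_gain("leaky_relu", a) uses gain = sqrt(2 / (1 + a^2)), so:
a = sqrt(5)
gain = sqrt(2 / (1 + a^2))  # sqrt(2/6) == sqrt(1/3) ≈ 0.577
```

So `gain=sqrt(1/3)` falls out of the `a=sqrt(5)` default rather than being chosen directly.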

In the future you can just do:

```
Dense(512, 128, initW=Flux.kaiming_uniform(gain=sqrt(1/3)), initb=Flux.kaiming_uniform(gain=sqrt(1/3)))
```

That will compile now, but won’t work. You’ll have to wait for my Flux PR to be merged before this shorter form works: Fix layer init functions kwargs getting overwritten by DevJac · Pull Request #1499 · FluxML/Flux.jl · GitHub

Edit: **Actually, this post isn’t quite right.** The `initW` is correct, but there is no way to use `Flux.kaiming_uniform` to initialize the biases the same way as PyTorch, as far as I can tell.
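The reason, as I understand it, is that PyTorch draws the bias from `U(-1/sqrt(fan_in), 1/sqrt(fan_in))` using the *weight’s* fan-in, while an `initb` function only ever sees the bias vector’s own dims, so `kaiming_uniform` computes the wrong fan. A custom closure can work around that; this is a sketch assuming the same Flux version as above, and the helper name `pytorch_bias_init` is my own invention:

```julia
# PyTorch draws biases from U(-1/sqrt(fan_in), 1/sqrt(fan_in)), where
# fan_in is the layer's input size -- not derivable from the bias dims,
# so we pass it in explicitly and close over it.
pytorch_bias_init(fan_in) = (dims...) -> (rand(Float32, dims...) .- 0.5f0) .* (2f0 / sqrt(Float32(fan_in)))
```

Then `Dense(512, 128, initb=pytorch_bias_init(512))` should match PyTorch’s bias init, if I’ve read the source correctly.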