```
Dense(512, 128, initW=(dims...) -> Flux.kaiming_uniform(dims...; gain=sqrt(1/3)), initb=(dims...) -> Flux.kaiming_uniform(dims...; gain=sqrt(1/3)))
```

This seems to initialize the weights the same way PyTorch does. I don’t know why PyTorch uses a gain of `sqrt(1/3)`, but that’s what the source seems to show.

See: pytorch/linear.py at master · pytorch/pytorch · GitHub
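For what it’s worth, my reading of the PyTorch source (this is my interpretation, not anything documented) is that `nn.Linear` calls `kaiming_uniform_(weight, a=math.sqrt(5))`, and the leaky-ReLU gain formula then produces exactly `sqrt(1/3)`:

```julia
# PyTorch's nn.Linear resets weights with kaiming_uniform_(weight, a=sqrt(5)).
# calculate_gain("leaky_relu", a) uses gain = sqrt(2 / (1 + a^2)), so:
a = sqrt(5)
gain = sqrt(2 / (1 + a^2))  # sqrt(2/6) == sqrt(1/3) ≈ 0.577
```

So `gain=sqrt(1/3)` falls out of the `a=sqrt(5)` default rather than being chosen directly.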

In the future you can just do:

```
Dense(512, 128, initW=Flux.kaiming_uniform(gain=sqrt(1/3)), initb=Flux.kaiming_uniform(gain=sqrt(1/3)))
```

That will compile now, but won’t work. You’ll have to wait for my Flux PR to be merged before this shorter form works: Fix layer init functions kwargs getting overwritten by DevJac · Pull Request #1499 · FluxML/Flux.jl · GitHub

Edit: **Actually, this post isn’t quite right.** The `initW` is correct, but there is no way to use `Flux.kaiming_uniform` to initialize the biases the same way as PyTorch, as far as I can tell.
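The reason, as I understand it, is that PyTorch draws the bias from `U(-1/sqrt(fan_in), 1/sqrt(fan_in))` using the *weight’s* fan-in, while an `initb` function only ever sees the bias vector’s own dims, so `kaiming_uniform` computes the wrong fan. A custom closure can work around that; this is a sketch assuming the same Flux version as above, and the helper name `pytorch_bias_init` is my own invention:

```julia
# PyTorch draws biases from U(-1/sqrt(fan_in), 1/sqrt(fan_in)), where
# fan_in is the layer's input size -- not derivable from the bias dims,
# so we pass it in explicitly and close over it.
pytorch_bias_init(fan_in) = (dims...) -> (rand(Float32, dims...) .- 0.5f0) .* (2f0 / sqrt(Float32(fan_in)))
```

Then `Dense(512, 128, initb=pytorch_bias_init(512))` should match PyTorch’s bias init, if I’ve read the source correctly.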