Hello,
I just did the Coursera lectures on Deep Learning by Andrew Ng and want to implement what I learned in Julia (the lectures use Python), so I can check my understanding. There is a small neural net, and it takes a while to train, so I changed the numerical type from Float64 to Float32 and got roughly a 30% speed-up.
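For context, the type switch is just a parameter I pass around; a rough sketch (made-up names and sizes, not my actual net):

T = Float32                      # switch to Float64 / Float16 here
W = randn(T, 5, 10) .* T(0.01)   # weights of one layer
b = zeros(T, 5)                  # biases
X = rand(T, 10, 1000)            # a batch of inputs
Z = W * X .+ b                   # linear step, everything stays in T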
Great, I thought, and tried Float16. That didn’t go so well: the time went up by a factor of 100 compared to Float64.
Time. Not speed.
I tried to isolate the issue and found a penalty of about a factor of 10 for element-wise multiplication of matrices.
This is the code in my test file:
numType = Float16
A = rand(numType, 10000, 10000)
B = rand(numType, 10000, 10000)
C = Array{numType, 2}(undef, 10000, 10000)   # preallocate an uninitialized output array
@time C .= A .* B                            # element-wise multiply, written in place into C
and the results are below (no warm-up, I just start the file each time, so this is only a very crude test):
The first two starts are with Float32:
0.235396 seconds (46.17 k allocations: 2.444 MiB)
0.209013 seconds (46.17 k allocations: 2.444 MiB)
And two starts with Float16:
1.848572 seconds (46.60 k allocations: 2.468 MiB)
1.847379 seconds (46.60 k allocations: 2.468 MiB)
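I know @time on a fresh start includes compilation; for a less crude measurement I would probably use BenchmarkTools, something like:

using BenchmarkTools
@btime $C .= $A .* $B   # interpolate the globals so only the broadcast is timed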
I found in the docs that Float16 is “implemented in software”, but it is slower than I expected.
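The only workaround I can think of is to do the arithmetic in Float32 and only store the result as Float16, roughly:

C .= Float16.(Float32.(A) .* Float32.(B))   # widen per element, multiply, narrow back for storage

but that feels like it defeats the point of using Float16 in the first place, so I would rather understand what is going on.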
Am I doing something wrong?
Thank you, and kind regards, z.