You can get a speedup of roughly 15% without using any packages if you pass the function Log_1_plus_Exp as the first argument to sum. Because sum then applies the function while it reduces, no intermediate array of transformed values has to be stored, and the total allocated memory is cut in half. Using xt = transpose(x) is just for clarity; remember that transpose is lazy and does not copy the data. BTW, VML.jl isn't supported on Windows yet.
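function Log_Unnorm_Probs_B(x, W, Nh)
    # log(1 + exp(x)), returning x directly for large x where exp(x) dominates
    Log_1_plus_Exp(x) = x > 50 ? x : log(1+exp(x))
    b = view(W,:,1)          # first column of W
    ww = view(W,:,2:Nh+1)    # remaining Nh columns of W
    xt = transpose(x)        # lazy wrapper, no copy
    # sum(f, A, dims=2) applies f to each element during the row-wise reduction
    xt*b .+ sum(Log_1_plus_Exp, xt*ww, dims=2)
end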
NNv = 300
NNh = 600
Nsamples = 4096
WW = randn(NNv+1,NNh+1)
xxx = rand([0.0,1.0], NNv+1, Nsamples)
xxx[1,:] .= 1.0
using BenchmarkTools
@btime Log_Unnorm_Probs_B(xxx, WW, NNh)
71.201 ms (16 allocations: 18.84 MiB)
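To see the two points above in isolation, here is a minimal sketch on a small random matrix. The name softplus and the 4x3 size are only for illustration (they are not from the code above). It checks that passing the function to sum gives the same result as broadcasting first, and that transpose returns a wrapper around the original array rather than a copy.

```julia
using LinearAlgebra  # provides the Transpose wrapper type

# Hypothetical stand-in for Log_1_plus_Exp, same cutoff as above
softplus(x) = x > 50 ? x : log(1 + exp(x))

A = randn(4, 3)

# Broadcasting first materializes a temporary 4x3 array, then reduces it
s_broadcast = sum(softplus.(A), dims=2)

# Passing the function to sum applies it element by element during the
# reduction, so the temporary array of transformed values is not needed
s_fused = sum(softplus, A, dims=2)

same_result = s_broadcast ≈ s_fused

# transpose returns a lazy wrapper around the parent array, not a copy
At = transpose(A)
is_lazy = At isa Transpose
shares_memory = parent(At) === A
```

How much time and memory this saves will depend on the array sizes and the machine, so it is worth re-running @btime on your own data.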
Edit:
Thanks to @Amin_Yahyaabadi, Crown421/VML.jl now works nicely on Windows.