Is this function well optimized for speed?

You can get a speedup of roughly 15% without any extra packages by passing `Log_1_plus_Exp` as the first argument to `sum`; this also cuts the total allocated memory in half. The line `xt = transpose(x)` is only there for clarity. Remember that `transpose` is lazy, so it does not copy `x`. BTW, VML.jl isn't supported on Windows yet.
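Here is a minimal sketch of both points, using a small random matrix. The sizes and the helper names `softplus50`, `f_broadcast` and `f_fused` are illustrative assumptions, not part of the code below. `sum(f, A; dims=2)` applies `f` inside the reduction, so no temporary array holding `f.(A)` is allocated. `transpose` only wraps its argument without copying it.

```julia
using LinearAlgebra  # for the Transpose wrapper type

# Hypothetical stand-in for Log_1_plus_Exp, defined at top level for this demo
softplus50(x) = x > 50 ? x : log(1 + exp(x))

A = randn(200, 300)

# Broadcast first, then reduce: materializes a 200×300 temporary array
f_broadcast(A) = sum(softplus50.(A), dims=2)
# Pass the function to sum: softplus50 is applied during the reduction
f_fused(A) = sum(softplus50, A, dims=2)

s_broadcast = f_broadcast(A)
s_fused = f_fused(A)

# Same 200×1 result, up to floating-point summation order
println(isapprox(s_broadcast, s_fused))

# Both functions are already compiled, so @allocated measures only the work
a_broadcast = @allocated f_broadcast(A)
a_fused = @allocated f_fused(A)
println((a_broadcast, a_fused))  # exact numbers vary by machine and Julia version

# transpose is lazy: it returns a wrapper that shares memory with A
At = transpose(A)
println(At isa Transpose)
println(parent(At) === A)
```

The fused call should allocate little more than the 200×1 result, while the broadcast version also pays for the 200×300 temporary.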

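```julia
function Log_Unnorm_Probs_B(x, W, Nh)
    # log(1 + exp(x)); the x > 50 branch avoids overflow in exp(x),
    # and there log(1 + exp(x)) already equals x in Float64
    Log_1_plus_Exp(x) = x > 50 ? x : log(1+exp(x))
    b  = view(W,:,1)          # first column of W (no copy)
    ww = view(W,:,2:Nh+1)     # remaining Nh columns (no copy)
    xt = transpose(x)         # lazy transpose: Nsamples × size(x, 1)
    # Passing the function to sum applies it inside the reduction,
    # so no temporary array of Log_1_plus_Exp.(xt*ww) is allocated
    xt*b .+ sum(Log_1_plus_Exp, xt*ww, dims=2)
end

NNv = 300
NNh = 600
Nsamples = 4096
WW  = randn(NNv+1,NNh+1)
xxx = rand([0.0,1.0], NNv+1, Nsamples)
xxx[1,:] .= 1.0   # first row set to 1

using BenchmarkTools
@btime Log_Unnorm_Probs_B(xxx, WW, NNh)
#   71.201 ms (16 allocations: 18.84 MiB)
```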
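To double-check what the vectorized expression computes, here is a sketch that compares it with a plain loop over samples. For each column `x_j` of `x`, the result is `b'x_j + Σ_k log(1 + exp(w_k'x_j))`, where `b` is the first column of `W` and `w_k` are the remaining columns. The loop version `Log_Unnorm_Probs_loop` and the small sizes are hypothetical and are chosen only for this check.

```julia
using LinearAlgebra  # dot

function Log_Unnorm_Probs_B(x, W, Nh)
    Log_1_plus_Exp(x) = x > 50 ? x : log(1+exp(x))
    b  = view(W,:,1)
    ww = view(W,:,2:Nh+1)
    xt = transpose(x)
    xt*b .+ sum(Log_1_plus_Exp, xt*ww, dims=2)
end

# Hypothetical reference: same formula, one sample (column of x) at a time
function Log_Unnorm_Probs_loop(x, W, Nh)
    softplus(z) = z > 50 ? z : log(1 + exp(z))
    out = zeros(size(x, 2))
    for j in axes(x, 2)
        xj = view(x, :, j)
        acc = dot(view(W, :, 1), xj)                 # b'x_j
        for k in 2:Nh+1
            acc += softplus(dot(view(W, :, k), xj))  # Σ_k softplus(w_k'x_j)
        end
        out[j] = acc
    end
    return out
end

Nv, Nh, Ns = 5, 7, 11
W = randn(Nv + 1, Nh + 1)
x = rand([0.0, 1.0], Nv + 1, Ns)
x[1, :] .= 1.0

r_vec  = Log_Unnorm_Probs_B(x, W, Nh)     # Ns×1 matrix
r_loop = Log_Unnorm_Probs_loop(x, W, Nh)  # length-Ns vector
println(isapprox(vec(r_vec), r_loop))
```

`Log_Unnorm_Probs_B` returns an `Nsamples×1` matrix rather than a vector, because `sum(...; dims=2)` keeps the reduced dimension. Call `vec` on the result if a plain vector is needed.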

Edit:

Thanks to @Amin_Yahyaabadi, Crown421/VML.jl now works nicely on Windows.
