# Softmax using Tullio

I want to port my DIY softmax function

```julia
softmax(x) = exp.(x .- maximum(x)) / sum(exp.(x .- maximum(x)))
```

to a Tullio einsum so that it runs faster and gives fast symbolic gradients.

I split it into multiple lines, and it works:

```julia
using Tullio

function softmax_einsum(x)
    maxx = maximum(x)
    @tullio sumx := exp(x[i] - maxx) verbose=false
    @tullio ret[i] := exp(x[i] - maxx) / sumx verbose=false
end
```

It matches both the broadcast version and NNlib:

```
julia> t1 = [1.0,1,1,1]
4-element Vector{Float64}:
1.0
1.0
1.0
1.0

julia> softmax(t1)
4-element Vector{Float64}:
0.25
0.25
0.25
0.25

julia> NNlib.softmax(t1)
4-element Vector{Float64}:
0.25
0.25
0.25
0.25

julia> softmax_einsum(t1)
4-element Vector{Float64}:
0.25
0.25
0.25
0.25

```

But it gives a different Jacobian:

```
julia> t1 = [1.0,1,1,1]
4-element Vector{Float64}:
1.0
1.0
1.0
1.0

julia> Zygote.jacobian(NNlib.softmax, t1)
([0.1875 -0.0625 -0.0625 -0.0625; -0.0625 0.1875 -0.0625 -0.0625; -0.0625 -0.0625 0.1875 -0.0625; -0.0625 -0.0625 -0.0625 0.1875],)

julia> Zygote.jacobian(softmax, t1)
([0.1875 -0.0625 -0.0625 -0.0625; -0.0625 0.1875 -0.0625 -0.0625; -0.0625 -0.0625 0.1875 -0.0625; -0.0625 -0.0625 -0.0625 0.1875],)

julia> Zygote.jacobian(softmax_einsum, t1)
([0.25 0.0 0.0 0.0; 0.0 0.25 0.0 0.0; 0.0 0.0 0.25 0.0; 0.0 0.0 0.0 0.25],)
```
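For what it's worth, the diagonal 0.25 Jacobian is exactly what the chain rule gives if the scalars `maxx` and `sumx` are treated as constants during differentiation: only the direct `exp(x[i] - maxx)` term survives, so `∂ret[i]/∂x[j] = (exp(x[i] - maxx)/sumx) δ_ij = softmax(x)[i] δ_ij`. A quick sketch (`frozen_jacobian` is just an illustrative name, not anything in Tullio):

```julia
using LinearAlgebra

softmax(x) = exp.(x .- maximum(x)) / sum(exp.(x .- maximum(x)))

# Jacobian under the (wrong) assumption that maxx and sumx are constants:
# only the direct exp(x[i] - maxx) / sumx term survives, giving a diagonal matrix.
frozen_jacobian(x) = diagm(softmax(x))

frozen_jacobian([1.0, 1, 1, 1])  # 0.25 on the diagonal, like the softmax_einsum Jacobian
```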

What happened?

You are computing `exp(...)` twice.

Yeah, I'm not sure why you get the wrong result with Tullio, but you can improve your implementation regardless.

First, I don't think the maximum does much if you are looking at it symbolically; it has a numerical purpose (it keeps `exp` from overflowing on large inputs), but the answer should be the same without it.
And, as was mentioned above, you calculate `exp` of all the elements twice, which is a large waste.
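
Putting both suggestions together, here is a sketch that keeps the max subtraction for numerical stability but evaluates `exp` only once per element (`softmax_einsum2` is just an illustrative name):

```julia
using Tullio

# Sketch: same algorithm, but exp(x[i] - maxx) is computed once per element.
function softmax_einsum2(x)
    maxx = maximum(x)                  # kept for numerical stability
    @tullio ex[i] := exp(x[i] - maxx)
    ex ./ sum(ex)                      # normalize with plain broadcasting
end
```

Moving the normalization outside `@tullio` also means Zygote differentiates `sum` and `./` with its own rules, instead of relying on Tullio's derived gradient for the scalar reduction.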