Softmax applied to arrays?

Hi everybody,

I would like to repeatedly apply (a variation of) the softmax function to parts of an array.

I would like to know whether there are ways to leverage packages like Tullio or Yeppp, or other Julian features, to improve this.

For example, for each individual i there are J vectors x_{i,j} of K choice characteristics. The probability of choosing option j among 1,...,J is given by the softmax function:

s_{i,j}(\theta) = \frac{\exp(x_{i,j}^\prime \theta)}{1 + \sum_{h=1}^J \exp(x_{i,h}^\prime \theta)}

The probability of not choosing anything (j=0) is

s_{i,0}(\theta) = \frac{1}{1 + \sum_{h=1}^J \exp(x_{i,h}^\prime \theta)}

Let y_{i,j} = 1 if i purchases j and zero otherwise. The log-likelihood of observing a sample of N individuals is

\frac{1}{N}\sum_{i=1}^N \sum_{j=0}^J y_{i,j} \log\left(s_{i,j}(\theta)\right).
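In code, these formulas for a single individual look something like this (a minimal sketch; xi is a placeholder for that individual's K × J matrix of characteristics, and the max-shift is only for numerical stability):

v  = xi' * θ              # J utilities x_{i,j}'θ
u  = maximum(v)
e  = exp.(v .- u)         # shifted exponentials
d  = exp(-u) + sum(e)     # the "1 + Σ exp" denominator, shifted by u
s  = e ./ d               # s_{i,j}, j = 1,...,J
s0 = exp(-u) / d          # s_{i,0}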

An MWE where I try to apply the softmax! function repeatedly over the dataset is:

using TensorCore, BenchmarkTools, Tullio

# setup
N = 1_000_000   # observations
J = 4           # options (there's an extra "outside option")
K = 20          # product characteristics

# data
θ = randn(K)            # parameters
y = rand(0:10,J+1,N)    # purchases
x = rand(K,J,N)         # product characteristics

# softmax over a vector of utilities, with the outside option included in the denominator
function softmax!(
    r::AbstractVector,
    xθ::AbstractVector
)
    J = length(xθ)
    u = maximum(xθ)
    s = exp(-u)                 # the "1 +" term of the denominator, shifted by u
    @inbounds for j = 1:J
        s += (r[j] = exp(xθ[j] - u))
    end
    invs = inv(s)
    @inbounds for j = 1:J
        r[j] *= invs
    end
    return r
end

# log-likelihood function
function llk(
    y::AbstractMatrix,  # (J+1) × N
    x::AbstractArray,   # K × J × N
    θ::AbstractVector   # K-vector
)
    K,J,N = size(x)
    r  = zeros(J)
    ll = 0.0
    xθ = θ ⊡ x                                  # J × N matrix of utilities
    for n = 1:N
        softmax!(r, view(xθ, :, n))             # a view avoids allocating the slice
        for j = 1:J
            ll += y[j,n] * log(r[j])            # likelihood of choices 1,...,J
        end
        ll += y[J+1,n] * log(1.0 - sum(r))      # likelihood of the outside good
    end
    return ll/N
end

@btime llk($y,$x,$θ)

Is there any way to leverage Tullio to obtain the choice probabilities? I was thinking about something that involves the entire array…

# softmax over entire array using Tullio?
xθ = θ ⊡ x
@tullio num[j,n] := exp(xθ[j,n])
@tullio denom[n] := num[j,n]
@tullio shares[j,n] := num[j,n] / (1.0 + denom[n])
@tullio shares0[n] := 1.0 / (1.0 + denom[n])
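
A variant that keeps the max-shift from softmax! might look like this (untested sketch; m[n] is the column-wise maximum, and after shifting the 1 in the denominator becomes exp(-m[n])):

xθ = θ ⊡ x
@tullio (max) m[n] := xθ[j,n]                              # column-wise max for stability
@tullio num[j,n] := exp(xθ[j,n] - m[n])
@tullio denom[n] := num[j,n]
@tullio shares[j,n] := num[j,n] / (exp(-m[n]) + denom[n])
@tullio shares0[n] := exp(-m[n]) / (exp(-m[n]) + denom[n])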

Thank you all, the Julia community is very helpful!


BTW, how can I write centered, typeset equations here?

You may want NNlib.softmax, or more specifically something like dot(logsoftmax(θ ⊡ x; dims=1), @view y[2:end,:])/N. That’s not quite right, as you have this extra zeroth case; perhaps you can adapt it to keep the “denominator”. But working with solid arrays not slices, and directly working out the log, are likely to help.
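
Something along these lines might work (an untested sketch of that idea; here the outside option gets an explicit row of zero utilities, and y is reordered so that row comes first):

using NNlib, TensorCore, LinearAlgebra

xθ = θ ⊡ x                                   # J × N utilities
v  = vcat(zeros(1, size(xθ, 2)), xθ)         # row 1 is the outside option (utility 0)
y0 = vcat(y[end:end, :], y[1:end-1, :])      # put the outside-good counts first, matching v
ll = dot(logsoftmax(v; dims=1), y0) / size(xθ, 2)   # average log-likelihood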
