I want to do multiclass classification using a Bayesian Neural Network (BNN) in Turing.jl and Flux.jl. There is already a good implementation of binary classification with a BNN in Turing.jl, check it here. Hence, my goal is simply to extend this binary classification to the multiclass case. I raised this issue before in the TuringTutorials GitHub repository, check here, but there seems to be a problem with Flux's `softmax` function (a missing method).

Here's my objective: I want to model the `iris` dataset, classifying the three species: `setosa`, `versicolor`, and `virginica`. In this case the output layer is multiclass, which in a Bayesian context can be characterized by a Categorical distribution. Having said that, the Categorical distribution takes probabilities as input, so I added a softmax to the network to normalize the output values to between 0 and 1 (with all values summing to 1). Here is the complete code:

```
using Flux
using RDatasets
using Measures
using StatsPlots
using PlotThemes
using Turing
pyplot()
theme(:orange)
iris = dataset("datasets", "iris");
@df iris scatter(:SepalLength, :SepalWidth, group = :Species,
    xlabel = "Length", ylabel = "Width", markersize = 5,
    markeralpha = 0.75, markerstrokewidth = 0, linealpha = 0,
    m = (0.5, [:cross :hex :star7], 12),
    margin = 5mm)
inputs = convert(Matrix, iris[1:4]);  # 150×4 feature matrix
# Categorical is supported on 1:K, so encode the species as 1, 2, 3:
labels = map(x -> x == "setosa" ? 1 : x == "versicolor" ? 2 : 3, iris[end]);
function weights(theta::AbstractVector)
    W0 = reshape(theta[ 1:20], 5, 4); b0 = reshape(theta[21:25], 5)
    W1 = reshape(theta[26:45], 4, 5); b1 = reshape(theta[46:49], 4)
    W2 = reshape(theta[50:61], 3, 4); b2 = reshape(theta[62:64], 3)  # 64 parameters in total
    return W0, b0, W1, b1, W2, b2
end
function feedforward(inp::Array{Float64, 2}, theta::AbstractVector)
    W0, b0, W1, b1, W2, b2 = weights(theta)
    model = Chain(
        Dense(W0, b0, tanh),
        Dense(W1, b1, tanh),
        Dense(W2, b2, σ),
        xs -> exp.(xs) / sum(exp.(xs))
    )
    return model(inp)
end
alpha = 0.09;
sigma = sqrt(1.0 / alpha);
@model bayesnn(inp, lab) = begin
    theta ~ MvNormal(zeros(64), sigma .* ones(64))  # one entry per network parameter
    preds = feedforward(inp, theta)
    for i = 1:length(lab)
        lab[i] ~ Categorical(preds[:, i])
    end
end
steps = 5000
# HMC(n_iters, ϵ, τ): iterations, leapfrog step size, leapfrog step count
chain = sample(bayesnn(Array(inputs'), labels), HMC(steps, 0.05, 4));
```
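For context on the output layer: `Categorical` from Distributions.jl expects a probability vector and is supported on the integers `1:K`, which is why the species are encoded as 1, 2, 3 above. A minimal illustration (assuming only Distributions.jl):

```
using Distributions

p = [0.2, 0.3, 0.5]   # a valid probability vector: nonnegative entries summing to 1
d = Categorical(p)    # a distribution over 1:3
pdf(d, 2)             # 0.3
# Categorical([0.2, 0.3, 0.6]) throws ArgumentError: isprobvec(p) is not satisfied
```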

The line `xs -> exp.(xs) / sum(exp.(xs))` is simply the softmax function. But if I replace it with the Flux function, that is, replacing `xs -> exp.(xs) / sum(exp.(xs))` with `softmax`, and run the last line, `chain = sample(bayesnn(Array(inputs'), labels), HMC(steps, 0.05, 4));`, I get the following error:

```
ERROR: MethodError: no method matching softmax!(::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Turing.Core, Symbol("#f#24")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta},Tuple{:lab},getfield(Main, Symbol("###inner_function#571#13")){Array{Float64,2}},NamedTuple{(:lab,),Tuple{Array{Int64,1}}},NamedTuple{(:lab,),Tuple{Symbol}}},Turing.Sampler{HMC{Turing.Core.ForwardDiffAD{40},Any}}},Real},Float64,10},2}, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Turing.Core, Symbol("#f#24")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta},Tuple{:lab},getfield(Main, Symbol("###inner_function#571#13")){Array{Float64,2}},NamedTuple{(:lab,),Tuple{Array{Int64,1}}},NamedTuple{(:lab,),Tuple{Symbol}}},Turing.Sampler{HMC{Turing.Core.ForwardDiffAD{40},Any}}},Real},Float64,10},2})
Closest candidates are:
softmax!(::Any) at /Users/alahmadgaidasaad/.julia/packages/NNlib/x0XUf/src/softmax.jl:28
Stacktrace:
[1] softmax(::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Turing.Core, Symbol("#f#24")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta},Tuple{:lab},getfield(Main, Symbol("###inner_function#571#13")){Array{Float64,2}},NamedTuple{(:lab,),Tuple{Array{Int64,1}}},NamedTuple{(:lab,),Tuple{Symbol}}},Turing.Sampler{HMC{Turing.Core.ForwardDiffAD{40},Any}}},Real},Float64,10},2}) at /Users/alahmadgaidasaad/.julia/packages/NNlib/x0XUf/src/softmax.jl:29
[2] applychain(::Tuple{typeof(softmax)}, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Turing.Core, Symbol("#f#24")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta},Tuple{:lab},getfield(Main, Symbol("###inner_function#571#13")){Array{Float64,2}},NamedTuple{(:lab,),Tuple{Array{Int64,1}}},NamedTuple{(:lab,),Tuple{Symbol}}},Turing.Sampler{HMC{Turing.Core.ForwardDiffAD{40},Any}}},Real},Float64,10},2}) at /Users/alahmadgaidasaad/.julia/packages/Flux/8XpDt/src/layers/basic.jl:31 (repeats 4 times)
:
:
```
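As far as I can tell from the trace, the failure is in NNlib's in-place `softmax!`, which has no method for the `ForwardDiff.Dual` arrays that Turing's HMC pushes through the network. One way to sidestep the missing method might be a generic softmax written with plain broadcasting (a sketch; `colsoftmax` is my own name and I have not tested it inside Turing):

```
# Hypothetical AD-friendly softmax: works on any numeric array.
# The matrix method normalizes each column independently, so every
# column (one observation) sums to 1.
colsoftmax(xs::AbstractVector) = exp.(xs) ./ sum(exp.(xs))
colsoftmax(xs::AbstractMatrix) = exp.(xs) ./ sum(exp.(xs); dims = 1)
```

Note that the matrix method with `dims = 1` is not the same as `xs -> exp.(xs) / sum(exp.(xs))` applied to a matrix, where `sum` runs over all entries of the matrix at once.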

However, if I use the above code as is (not using `softmax`), I get the following error:

```
ERROR: ArgumentError: Categorical: the condition isprobvec(p) is not satisfied.
Stacktrace:
[1] macro expansion at /Users/alahmadgaidasaad/.julia/packages/Distributions/WHjOk/src/utils.jl:6 [inlined]
[2] Type at /Users/alahmadgaidasaad/.julia/packages/Distributions/WHjOk/src/univariate/discrete/categorical.jl:27 [inlined]
[3] Categorical(::Array{Float64,1}) at /Users/alahmadgaidasaad/.julia/packages/Distributions/WHjOk/src/univariate/discrete/categorical.jl:38
[4] macro expansion at /Users/alahmadgaidasaad/.julia/packages/Turing/sghu6/src/core/compiler.jl:43 [inlined]
:
:
```

Theoretically, the softmax defined by the line `xs -> exp.(xs) / sum(exp.(xs))` should already satisfy `isprobvec`, since the outputs of the network should sum to 1 with every value between 0 and 1. I tried to debug this by checking `isprobvec` with the following code:

```
theta = rand(MvNormal(zeros(64), sigma .* ones(64)), 1)
W0, b0, W1, b1, W2, b2 = weights(theta[:, 1])
model = Chain(
    Dense(W0, b0, tanh),
    Dense(W1, b1, tanh),
    Dense(W2, b2, σ),
    xs -> exp.(xs) / sum(exp.(xs))
)
model(inputs[1, :])
# 3-element Array{Float64,1}:
# 0.5650137463877715
# 0.22707964752865994
# 0.2079066060835685
```

Now if we check the sum and `isprobvec`, we have the following:

```
sum(model(inputs[1, :]))
# 0.9999999999999999
using Distributions: isprobvec   # isprobvec is an internal helper, not exported
isprobvec(model(inputs[1, :]))
# true
```

Though the sum is not exactly 1.0, that is as close as floating point gets, and indeed `isprobvec` confirms it is a valid probability vector. However, if I use the `softmax` function instead, we have:

```
theta = rand(MvNormal(zeros(64), sigma .* ones(64)), 1)
W0, b0, W1, b1, W2, b2 = weights(theta[:, 1])
model = Chain(
    Dense(W0, b0, tanh),
    Dense(W1, b1, tanh),
    Dense(W2, b2, σ),
    softmax
)
model(inputs[1, :])
# 3-element Array{Float64,1}:
# 0.5730746731083979
# 0.2157227988217102
# 0.21120252806989193
sum(model(inputs[1, :]))
# 1.0
isprobvec(model(inputs[1, :]))
# true
```

We get exactly 1.0 this time instead of 0.9999999999999999. I want to use `xs -> exp.(xs) / sum(exp.(xs))` instead of `softmax`, since then I don't have to bother with the missing method (see the error above), but I'm not sure why `isprobvec` is not satisfied for the Categorical distribution, when in fact it is already `true` in my initial debugging. Am I missing something? Thanks.
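For reference, my understanding of `isprobvec` is that it only requires nonnegative entries and a sum approximately equal to one up to floating-point tolerance, so both 1.0 and 0.9999999999999999 should pass; roughly (a paraphrase, not the actual Distributions source):

```
# Approximate behaviour of Distributions' internal isprobvec check:
# nonnegative entries and a total within floating-point tolerance of 1.
approx_isprobvec(p) = all(x -> x >= zero(x), p) && isapprox(sum(p), one(eltype(p)))
```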