I want to do multiclass classification using a Bayesian Neural Network (BNN) in Turing.jl and Flux.jl. There is already a good implementation of binary classification with a BNN in Turing.jl, check it here. Hence, my goal is simply to extend this binary classification to the multiclass case. I raised this issue before in the TuringTutorials GitHub repository, check here, but there seems to be a problem with Flux's softmax function (a missing method).
Here is my objective: I want to model the iris dataset, classifying the three species setosa, versicolor, and virginica. In this case the output layer is multiclass, and in a Bayesian context this can be characterized by a Categorical distribution. That said, the Categorical distribution takes probabilities as input, so I added a softmax at the end of the network to normalize the output values to be between 0 and 1 and to sum to 1. Here is the complete code:
using Flux
using RDatasets
using Measures
using StatsPlots
using PlotThemes
using Turing
pyplot()
theme(:orange)
iris = dataset("datasets", "iris");
@df iris scatter(:SepalLength, :SepalWidth, group = :Species,
    xlabel = "Length", ylabel = "Width", markersize = 5,
    markeralpha = 0.75, markerstrokewidth = 0, linealpha = 0,
    m = (0.5, [:cross :hex :star7], 12),
    margin = 5mm)
inputs = convert(Matrix, iris[1:4]);   # 150 × 4 matrix of features
# Categorical in Distributions.jl has support 1:K, so encode the species as 1, 2, 3
labels = map(x -> x == "setosa" ? 1 : x == "versicolor" ? 2 : 3, iris[end]);
function weights(theta::AbstractVector)
    W0 = reshape(theta[ 1:20], 5, 4); b0 = reshape(theta[21:25], 5)
    W1 = reshape(theta[26:45], 4, 5); b1 = reshape(theta[46:49], 4)
    W2 = reshape(theta[50:61], 3, 4); b2 = reshape(theta[62:64], 3)
    return W0, b0, W1, b1, W2, b2
end
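# Note: the network below is 4 → 5 → 4 → 3, so the flat parameter vector needs
# 5*4 + 5 + 4*5 + 4 + 3*4 + 3 = 64 entries in total.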
function feedforward(inp::Array{Float64, 2}, theta::AbstractVector)
    W0, b0, W1, b1, W2, b2 = weights(theta)
    model = Chain(
        Dense(W0, b0, tanh),
        Dense(W1, b1, tanh),
        Dense(W2, b2, σ),
        xs -> exp.(xs) / sum(exp.(xs))   # softmax written out by hand (instead of Flux's softmax)
    )
    return model(inp)
end
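# For reference: feedforward takes a 4 × N input matrix and returns a 3 × N output,
# one column per observation.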
alpha = 0.09;               # prior precision
sigma = sqrt(1.0 / alpha);  # prior standard deviation for the network parameters
@model bayesnn(inp, lab) = begin
    # Isotropic Gaussian prior over all 64 network parameters
    theta ~ MvNormal(zeros(64), sigma .* ones(64))
    # preds is a 3 × N matrix of network outputs, one column per observation
    preds = feedforward(inp, theta)
    for i = 1:length(lab)
        lab[i] ~ Categorical(preds[:, i])
    end
end
steps = 5000
chain = sample(bayesnn(Array(inputs'), labels), HMC(steps, 0.05, 4));
The line xs -> exp.(xs) / sum(exp.(xs)) is simply the softmax function, so I tried replacing it with Flux's softmax; the only change is the last stage of the Chain, as in the sketch below.
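To be concrete, this is what the body of feedforward looks like with that replacement (same layers as above, only the final normalization swapped):
model = Chain(
    Dense(W0, b0, tanh),
    Dense(W1, b1, tanh),
    Dense(W2, b2, σ),
    softmax   # Flux / NNlib softmax instead of the anonymous function
)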
When I then run the last line, chain = sample(bayesnn(Array(inputs'), labels), HMC(steps, 0.05, 4));, I get the following error:
ERROR: MethodError: no method matching softmax!(::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Turing.Core, Symbol("#f#24")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta},Tuple{:lab},getfield(Main, Symbol("###inner_function#571#13")){Array{Float64,2}},NamedTuple{(:lab,),Tuple{Array{Int64,1}}},NamedTuple{(:lab,),Tuple{Symbol}}},Turing.Sampler{HMC{Turing.Core.ForwardDiffAD{40},Any}}},Real},Float64,10},2}, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Turing.Core, Symbol("#f#24")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta},Tuple{:lab},getfield(Main, Symbol("###inner_function#571#13")){Array{Float64,2}},NamedTuple{(:lab,),Tuple{Array{Int64,1}}},NamedTuple{(:lab,),Tuple{Symbol}}},Turing.Sampler{HMC{Turing.Core.ForwardDiffAD{40},Any}}},Real},Float64,10},2})
Closest candidates are:
softmax!(::Any) at /Users/alahmadgaidasaad/.julia/packages/NNlib/x0XUf/src/softmax.jl:28
Stacktrace:
[1] softmax(::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Turing.Core, Symbol("#f#24")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta},Tuple{:lab},getfield(Main, Symbol("###inner_function#571#13")){Array{Float64,2}},NamedTuple{(:lab,),Tuple{Array{Int64,1}}},NamedTuple{(:lab,),Tuple{Symbol}}},Turing.Sampler{HMC{Turing.Core.ForwardDiffAD{40},Any}}},Real},Float64,10},2}) at /Users/alahmadgaidasaad/.julia/packages/NNlib/x0XUf/src/softmax.jl:29
[2] applychain(::Tuple{typeof(softmax)}, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(Turing.Core, Symbol("#f#24")){Turing.Core.VarReplay.VarInfo,Turing.Model{Tuple{:theta},Tuple{:lab},getfield(Main, Symbol("###inner_function#571#13")){Array{Float64,2}},NamedTuple{(:lab,),Tuple{Array{Int64,1}}},NamedTuple{(:lab,),Tuple{Symbol}}},Turing.Sampler{HMC{Turing.Core.ForwardDiffAD{40},Any}}},Real},Float64,10},2}) at /Users/alahmadgaidasaad/.julia/packages/Flux/8XpDt/src/layers/basic.jl:31 (repeats 4 times)
:
:
However, if I use the above code as it is (not using softmax), I get the following error:
ERROR: ArgumentError: Categorical: the condition isprobvec(p) is not satisfied.
Stacktrace:
[1] macro expansion at /Users/alahmadgaidasaad/.julia/packages/Distributions/WHjOk/src/utils.jl:6 [inlined]
[2] Type at /Users/alahmadgaidasaad/.julia/packages/Distributions/WHjOk/src/univariate/discrete/categorical.jl:27 [inlined]
[3] Categorical(::Array{Float64,1}) at /Users/alahmadgaidasaad/.julia/packages/Distributions/WHjOk/src/univariate/discrete/categorical.jl:38
[4] macro expansion at /Users/alahmadgaidasaad/.julia/packages/Turing/sghu6/src/core/compiler.jl:43 [inlined]
:
:
Theoretically, the softmax defined by the line xs -> exp.(xs) / sum(exp.(xs)) should already satisfy isprobvec, since the outputs of the network should sum to 1 and each value should lie between 0 and 1. I tried to debug this by checking isprobvec with the following code:
theta = rand(MvNormal(zeros(64), sigma .* ones(64)), 1)
W0, b0, W1, b1, W2, b2 = weights(theta[:, 1])
model = Chain(
    Dense(W0, b0, tanh),
    Dense(W1, b1, tanh),
    Dense(W2, b2, σ),
    xs -> exp.(xs) / sum(exp.(xs))
)
model(inputs[1, :])
# 3-element Array{Float64,1}:
# 0.5650137463877715
# 0.22707964752865994
# 0.2079066060835685
Now if we check the sum and isprobvec, we have the following:
sum(model(inputs[1, :]))
# 0.9999999999999999
isprobvec(model(inputs[1, :]))
# true
The sum is not exactly 1.0, but that is as close as floating-point arithmetic gets, and isprobvec already confirms it is a valid probability vector. However, if I use the softmax function instead, we have:
theta = rand(MvNormal(zeros(64), sigma .* ones(64)), 1)
W0, b0, W1, b1, W2, b2 = weights(theta[:, 1])
model = Chain(
    Dense(W0, b0, tanh),
    Dense(W1, b1, tanh),
    Dense(W2, b2, σ),
    softmax
)
model(inputs[1, :])
# 3-element Array{Float64,1}:
# 0.5730746731083979
# 0.2157227988217102
# 0.21120252806989193
sum(model(inputs[1, :]))
# 1.0
isprobvec(model(inputs[1, :]))
# true
Here we get exactly 1.0 instead of 0.9999999999999999. I want to use xs -> exp.(xs) / sum(exp.(xs)) instead of softmax so that I do not have to deal with the missing method (see the first error above), but I am not sure why isprobvec is not satisfied for the Categorical distribution when it evaluates to true in my initial debugging. Am I missing something? Thanks.
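For reference, here is a minimal standalone comparison of the two normalizations on a random vector, outside the network (illustrative only; x is just an arbitrary logit vector):
x = randn(3)
p_manual = exp.(x) / sum(exp.(x))   # the anonymous-function softmax
p_flux   = softmax(x)               # Flux / NNlib softmax
isprobvec(p_manual), isprobvec(p_flux)
# both checks return true in my runs, matching the results above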