This is a follow-up to a half-dozen posts in the past couple of weeks. I’ve had some amazing help from very generous members of this community; it’s allowed a newbie like me to make some progress!
The specific posts this builds on are listed below, but I think the whole is greater than the sum of its parts, so I wanted to share in one place the working code I’ve assembled, in case it’s useful or serves as example code for others who are new to Julia.
The code fits three different outputs and plots the data, the overlaid fits, and the loss as training runs. The final layer is a custom layer (“FlexLayer”) with a somewhat unusual feature: it can accept an array of activation functions, so that a different activation function is applied to each output. (This feature is used in the code to enforce a non-negativity constraint on one of the outputs.) It can also accept an ancillary 2D numeric array that can be used however one sees fit within the layer. (Here it is used to “un-normalize” each output, which is useful if the target had been scaled from some min/max range to [0,1].)
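As a quick preview of how the layer is used, here’s a minimal sketch (it assumes the definitions from the code block below have already been loaded; the layer sizes and the min/max range are made up purely for illustration):
# Per-output activations: softplus on the first output (keeps it non-negative), identity on the others.
# The 3x2 array un-normalizes the first output from [0,1] back to a made-up range of [-200, 800];
# the [0. 1.] rows leave the other two outputs unchanged.
last_layer = FlexLayer(10, 3, [softplus, identity, identity], [-200. 800.; 0. 1.; 0. 1.])
model = Chain(Dense(1, 10, tanh), Dense(10, 10, tanh), last_layer)
model(rand(1, 5)) # 3×5 output: one row per fitted quantity, one column per sample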
The full code is included in the block below. If you have any suggestions, comments, etc., I’m all ears; I still have a lot to learn. Following the code block, there’s a second block that runs the demo with different options (selected via the option variable).
using Distributions: Normal
using Tullio, ForwardDiff, Zygote
using Flux
using Flux: mse, throttle, Params # (Params comes from Zygote)
using Random: shuffle # shuffle lives in Random, not Flux
using Base.Iterators: partition
using Plots
##### DATA #####################
num_samples = 50
x_noise_std = 0.01
y_noise_std = 0.1
function generate_data()
x = reshape(range(0, stop=1.5π, length=num_samples), num_samples, 1)
y_noise = rand(Normal(0,y_noise_std), 3, num_samples)
# 3 outputs
y = hcat(
(sin.(x).^2 .-0.2 .+ y_noise[1,:]),
cos.(x) .+ y_noise[2,:],
cos.(2x).*sin.(x) .+ y_noise[3,:]
)
x = x'
y = y'
return x, y
end
X, Y = generate_data() # Training data: X has shape (1, num_samples), Y has shape (3, num_samples)
##### CALLBACK & PLOTS #################
LossLog = Array{Float64, 1}()
LossLog_T = Array{Int, 1}()
function evalcb()
global LossLog
global LossLog_T
loss_value = loss(X, Y)
push!(LossLog, loss_value)
push!(LossLog_T, length(LossLog))
if mod(length(LossLog_T),100)==1
live_plots()
end
end
function plot_with_fit(x,y,yfit,label)
return plot([x' x'], [y' yfit'], color=[:black :red], lw=[0 2], marker=[:circle :none],alpha=0.7,legend=false,ylabel=label)
end
function live_plots()
NumPlots = size(Y,1)
p_arr = Array{Plots.Plot{Plots.GRBackend},1}(undef, NumPlots)
for i in 1:NumPlots
p_arr[i] = plot_with_fit(X, Y[i:i,:], m(X)[i:i,:], "y"*string(i))
end
p_stack = plot(p_arr...,layout=(NumPlots,1))
p_loss = plot(LossLog_T, LossLog,yscale=:log10,legend=false,ylabel="Loss")
IJulia.clear_output(true) # assumes this is running in an IJulia (Jupyter) notebook
p = plot(p_stack, p_loss, layout=(1,2))
display(p)
end
##### AUXILIARY FUNCTIONS #################
"Applies each element of the array fs of functions to all columns
of the corresponding row.
See https://discourse.julialang.org/t/mutating-arrays-not-supported/42123/6?u=jmurray"
rowmap(fs, x) = @tullio y[r,c] := begin
f = getindex(fs, r)
@inbounds f(x[r,c])
end grad=Dual;
"Applies softplus to rows of x for which non-negativity is required, as specified
by whether the corresponding element of nonneg is > 0. (The number of elements in nonneg
should match the number of rows in x.)"
rowwise_nonneg(nonneg, x) = @tullio y[r,c] := begin
nn_r = getindex(nonneg, r)
@inbounds nn_r > 0. ? softplus(x[r,c]) : x[r,c]
end grad=Dual;
"Applies a row-dependent normalization to each element of x such that
x_norm == (x - min) / (max - min) falls between 0 and 1. The array m
has as many rows as x and stores in its 2 columns the min and max (in that
order) value to which each value in that row is to be normalized. (Columns
in excess of 2 are allowed but ignored.)"
rowwise_norm(m, x) = @tullio y[r,c] := begin
(x[r,c] - m[r,1])/(m[r,2] - m[r,1])
end grad=Dual;
"Applies a row-dependent un-normalization to each element of x.
Inverse of rowwise_norm(); see notes for rowwise_norm.
The matrix m consists of either 2 or 3 columns; the third column, if supplied, is
interpreted as a boolean telling us whether to clip negative values
to zero for the corresponding rows. Thus if we unnormalize after applying
a nonnegativity constraint via an activation function (softplus) we still
can retain nonnegativity."
function rowwise_unnorm(m, x)
if size(m,2)==2
@tullio y[r,c] := x[r,c]*(m[r,2] - m[r,1]) + m[r,1] grad=Dual;
else
@tullio y[r,c] := begin
val = x[r,c]*(m[r,2] - m[r,1]) + m[r,1]
m[r,3]>0 && val < 0. ? 0. : val
end grad=Dual;
end
end
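# A quick sanity check of the helpers above (hypothetical values, not used anywhere in training):
# a single row whose min (column 1) is 0.0 and whose max (column 2) is 10.0.
let mm = [0.0 10.0], v = [2.5 7.5]
    @assert rowwise_norm(mm, v) ≈ [0.25 0.75]
    @assert rowwise_unnorm(mm, rowwise_norm(mm, v)) ≈ v # norm/unnorm round trip
    @assert all(rowwise_nonneg([1.0], [-3.0 3.0]) .> 0) # softplus applied to the flagged row
end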
##### CUSTOM FLEXIBLE LAYER #########
mutable struct FlexLayer{F,S<:AbstractArray,T<:AbstractArray,fa<:Array{Function,1},num_a<:Array{Float64,2}}
W::S
b::T
σ::F
fcn_array::fa # array of functions; used for per-row activation functions
num_array::num_a # ancillary array for normalization or other purposes; must be batch-size independent!
# If output of layer has size (n_out, dataset_length), then to include a multiplier and
# offset for each output, num_array should be of size (n_out, 2). It can certainly be
# re-purposed, but as FlexLayer is written currently, num_array is used for un-normalizing
# each row and it should be of size (n_out, 2) or (n_out, 3), with the 1st and 2nd columns
# holding the min and max values to which the output should be unnormalized. A 3rd column,
# if present, should be 0 or 1 according to whether unnormalized values for that row should
# be clipped to zero if negative. (This helps retain any desired nonnegativity while unnormalizing.)
end
# Three useful "sub-classes" / special cases of FlexLayer:
#
# Apply different functions to each output, but do not do any unnormalization
#FlexLayer(W, b, fcn_array) = FlexLayer(W, b, fcn_array::Array{Function,1}, min_max::Array{Float64,2}(), identity)
FlexLayer(in::Integer, out::Integer, fcn_array::Array{Function,1}) = FlexLayer(in, out, fcn_array, Array{Float64}(undef, 0, 2), identity)
#
# Unnnormalize each row, without any activation function
#UnnormLayer(W, b, min_max) = FlexLayer(W, b, Array{Function,1}(), min_max::Array{Float64,2}, identity)
UnnormLayer(in::Integer, out::Integer, min_max::Array{Float64,2}) = FlexLayer(in, out, Array{Function,1}(), min_max, identity)
#
# Enforce non-negativity constraint at rows indicated by row_is_nonneg boolean array; applies softplus to those rows, identity elsewhere
# Using softplus keeps output non-negative without depressing fits to peaks (as sigmoid would do)
#NonnegLayer(W, b, row_is_nonneg::Array{Bool,1}) = FlexLayer(W, b, [ans ? softplus : identity for ans in row_is_nonneg]::Array{Function,1}, Array{Float64}(undef, 0, 2), identity)
NonnegLayer(in::Integer, out::Integer, row_is_nonneg::Array{Bool,1}) = FlexLayer(in, out, Function[ans ? softplus : identity for ans in row_is_nonneg], Array{Float64}(undef, 0, 2), identity)
function FlexLayer(in::Integer, out::Integer, fcn_array::Array{Function,1}, num_array::Array{Float64,2}, σ=identity)
return FlexLayer(randn(out, in), randn(out), σ, fcn_array, num_array)
end
Flux.@functor FlexLayer # register with Flux (parameter collection, gpu, etc.)
Flux.trainable(a::FlexLayer) = (a.W, a.b) # only W and b are trained; fcn_array and num_array are fixed metadata
"Custom dense layer having the option of different activations functions for each output.
This is especially useful for applying non-negativity constraints to certain outputs but
could be useful in other circumstances. Also included is the option to pass an ancillary 2D array,
which could be used for any purpose but is here used for per-row un-normalization of the output.
Note: If both are to be used, the relative order between un-normalization and applying activation
functions matters. If un-normalization is applied first, then the (possibly large) outputs will be
in the wings of the activation function and will 'kill the gradient', which is bad for training.
For this reason, activation functions are applied before unnormalization. However, un-normalization
will remove any non-negativity constraint enforced by the (softplus) activation function. This can
be corrected: rather than passing a min_max array with two columns, pass in one with three columns,
the third column being one or zero according, respectively, to whether negative values (after
unnormalization) should or should not be set to zero. See rowwise_unnorm()."
function (a::FlexLayer)(x::AbstractArray)
x_out = a.W * x .+ a.b
if length(a.fcn_array)==size(x_out, 1)
x_out = rowmap(a.fcn_array, x_out)
else
x_out = a.σ.(x_out)
end
if size(a.num_array,1)==size(x_out, 1)
x_out = rowwise_unnorm(a.num_array, x_out)
end
return x_out
end
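# A quick check of the custom layer itself (hypothetical sizes; the demo below builds its own model):
# NonnegLayer applies softplus to row 1 only, so that row of the output is always non-negative.
let fl = NonnegLayer(10, 3, [true, false, false]), xtest = randn(10, 7)
    @assert size(fl(xtest)) == (3, 7)
    @assert all(fl(xtest)[1, :] .>= 0)
end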
loss(x, y) = mse(m(x), y) # uses the global model m, which is (re)built in FlexLayerDemo below
# @treelike FlexLayer # @treelike is the pre-Flux-0.10 predecessor of @functor and isn't needed here (it's not used in the Flux definition of Dense either)
"Demonstration of FlexLayer and special cases.
scale_data: if true, first row has large values (-200 to 800); otherwise, between approx. -0.2 and 0.8.
use_unnorm: if true, unnormalize scaled data in first row.
enforce_nonneg: if true, enforce nonnegativity on first row."
function FlexLayerDemo(mode::Integer)
#scale_data::Bool, use_unnorm::Bool, enforce_nonneg::Bool
scale_data, use_unnorm, enforce_nonneg, act_fcn_demo = (mode & 2^i > 0 for i in 0:3) # decode the four option bits
global X, Y, m
global LossLog, LossLog_T
LossLog = Float64[]
LossLog_T = Int[]
X, Y = generate_data() # Training data: X has shape (1, num_samples), Y has shape (3, num_samples)
##### MODEL & TRAINING #####################
n = 10 # number of neurons in each hidden layer
act_f = tanh
if scale_data
Y[1,:] .*= 1000.
end
unnorm_min, unnorm_max = extrema(Y[1,:])
if act_fcn_demo
last_layer = FlexLayer(n, 3, [relu, sigmoid, tanh], [0. 1.; 0. 1.; 0. 1.])
else
if use_unnorm && enforce_nonneg
# Because the non-negativity-inducing activation function is applied BEFORE un-normalization
# (so as not to kill the gradient), un-normalization will remove the non-negativity constraint.
# But by using a min_max array with *three* columns, the third column indicating whether negative
# values should be set to 0, we recover non-negativity (though it does not go to zero smoothly)
last_layer = FlexLayer(n, 3, [softplus, identity, identity], [unnorm_min unnorm_max 1.; 0. 1. 0.; 0. 1. 0.])
elseif use_unnorm
last_layer = UnnormLayer(n, 3, [unnorm_min unnorm_max; 0. 1.; 0. 1.])
elseif enforce_nonneg
last_layer = NonnegLayer(n, 3, [true, false, false])
else
last_layer = Dense(n, 3)
end
end
m = Chain(Dense(size(X, 1), n, act_f), Dense(n, n, act_f), last_layer)
opt = ADAM()
println("Data Set Size: X: ", size(X), ", Y: ", size(Y))
batchsize = 5
NumEpochs = 2000
for epoch in 1:NumEpochs
dataset = [(X[:, i], Y[:, i]) for i in partition(shuffle(1:size(X, 2)), batchsize)] # create mini-batches
Flux.train!(loss, Flux.params(m), dataset, opt; cb=throttle(evalcb, 0.1))
end
# To use DataLoader instead, declare
# using Flux.Data: DataLoader
# and define
# train_loader = DataLoader(X, Y, batchsize=batchsize, shuffle=true, partial=false)
# then replace the two lines inside the epoch loop above with a single call
# (train! iterates over the loader itself, so no inner loop is needed):
# Flux.train!(loss, Flux.params(m), train_loader, opt, cb=throttle(evalcb, 0.1))
end
To try this out, run the code block below for different values of option
(read the comments for each option prior to running). It’s suggested that they be tried in order (0, 1, …, 5). (Note that a few options give intentionally bad results but are included to illustrate some effect.)
# A few cases of interest, described below. Set option to a value between 0 and 5. Instructive to proceed in order.
option = 0
scale_data, use_unnorm, enforce_nonneg, act_fcn_demo = false, false, false, false
if option==0
# ordinary, Dense layer
scale_data, use_unnorm, enforce_nonneg = false, false, false
elseif option==1 # INTENTIONALLY BAD
# Example of using different activation functions for each output. The choices AREN'T *good* choices,
# but you can see that a different function is applied to each output. In this example, the other options are ignored.
act_fcn_demo = true
elseif option==2
# Example of non-negativity constraint applied to first output.
scale_data, use_unnorm, enforce_nonneg = false, false, true
elseif option==3 # INTENTIONALLY BAD
# ordinary, Dense layer, with first-row data intentionally having values well outside range of activation
# functions; will NOT work well, so we progress to option 4...
scale_data, use_unnorm, enforce_nonneg = true, false, false
elseif option==4
# Unnorm layer, so that the output of the first row is un-normalized prior to calculating the loss with respect to
# the large-valued first-row data. (In practice, because the values between -200 and 800 give absolute errors
# much larger than the absolute errors from the outputs having values between -1 and 1, training will
# prefer to fit the first output at the expense of the others.)
scale_data, use_unnorm, enforce_nonneg = true, true, false
elseif option==5
# Same as option 4, but also constrain the first output to be non-negative. More discussion in code.
scale_data, use_unnorm, enforce_nonneg = true, true, true
end
mode = sum([(x ? 1 : 0)*2^(i-1) for (i, x) in enumerate([scale_data, use_unnorm, enforce_nonneg, act_fcn_demo])])
FlexLayerDemo(mode)
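(As a worked example of the encoding: option 5 sets scale_data, use_unnorm, and enforce_nonneg, so mode = 1 + 2 + 4 = 7; option 1 sets only act_fcn_demo, giving mode = 8.)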
The posts that were instrumental in building this up are listed below, along with the usernames of many who have helped. Any mistakes, inefficiencies, or exhibitions of bad coding style are entirely my own (and again, corrections are welcome; they’ll help me learn!)
Flux: Custom Layer (6/15/20)
Thanks to @LudiWin for the initial example and @contradict for finding my bug and pointing me to Flux.DataLoader
(Here I followed up with a simple working custom layer – one which enforces non-negativity for (all) outputs.)
Array of Plot objects (6/18/20)
For an n-output Flux model, one can plot each output (overlaid on the corresponding target), for arbitrary n.
Thanks to @rdeits and @DrPapa for the syntax correction.
Flux: Custom Training and Logging (6/18/20)
Thanks to @contradict and @oxinabox for getting the docs updated for the custom training example in Flux.
Row-wise function application (different functions) (6/25/20)
Thanks to @contradict for a one-liner approach to applying an array of functions, one to each row of an array, and to @mforets for a fast solution using view and map!.
Mutating arrays not supported (6/26/20)
A huge thanks to @mcabbott for his Tullio package and his extensive help, which enabled me to make changes to arrays inside a custom layer (including row-wise function application). This is what allowed my FlexLayer example to actually come together. His help with extending rowwise_unnorm to work with a non-negativity condition was also appreciated.
Finally, a big thanks goes to @rajnrao for getting me started in all this…and for much offline help.