Autograd in Flux

Hello All-
I need to compute the gradient of the output of a neural network w.r.t. its input. In my case the output is a vector and the input is also a vector. I was trying to use the gradient routine from the Flux package, but it was complaining that the output is not a scalar. I would appreciate it if you could help me out with this.

Thank you very much!

Raj

I guess this may help:

Hello!

It’s difficult to diagnose the issue here without an MWE; can you read through the instructions in PSA: make it easier to help you first?

Thanks! Sure.
I wrote the following routine:

function net_f(x, t, w, b; act=σ)
    u_nn = net_u(x, t, w, b; act=act) # u_nn is of size (25600, 1); pass `act` through
    ## x, t, w and b are initialized as parameters
    Tracker.back!(u_nn, ones(size(u_nn))) # the seed must match the shape of u_nn
    u_x = Tracker.grad(x)
    u_t = Tracker.grad(t)
    return u_x, u_t
end

I need to compute the gradient of u_nn w.r.t. t and x; it is like computing the partial derivative of the output of a neural network with respect to its input. With this implementation, I don’t know why I am getting the same values for u_x and u_t. I am not sure where I am making a mistake. I will appreciate your help.

Thanks!
Raj

This seems like you are using Flux with the Tracker.jl AD. Is this intentional? The current version of Flux uses the Zygote AD, which has a different syntax.
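For reference, a minimal sketch of the Zygote-style syntax on a recent Flux (here `f` is a stand-in scalar-valued function, not the poster's network):

```julia
using Flux  # recent Flux re-exports Zygote's `gradient`

W = rand(2, 3)
f(x) = sum(W * x)     # scalar-valued function of a vector input
x = rand(3)

# One call replaces the Tracker.back!/Tracker.grad dance;
# `gradient` returns a tuple, one entry per argument.
gx, = gradient(f, x)
```
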

Which version of Flux do you use?

@Tomas_Pevny Yes, I am using Tracker.jl with Flux.jl. I can update that to the latest version. Thanks!

Also, the Flux version is 0.9.0.

Any particular reason for using 0.9.0?

No, not at all. I can update it.

@Tomas_Pevny Any pointers on my question? I will really appreciate it. Thanks!

As the error complains, the function you take the gradient of must return a scalar (like an L2 loss, e.g.).

Did you have a look at the recent Flux documentation? I believe it’s well explained there.
And as mentioned above, without a MWE it’s really hard to help you :roll_eyes:
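Concretely, on Flux 0.9 / Tracker you can reduce the vector output to a scalar (e.g. with `sum`) before differentiating. A sketch with a stand-in function `net` (not the poster's actual network):

```julia
using Flux.Tracker  # the AD shipped with Flux 0.9

net(x) = 2 .* x .^ 2   # stand-in for a vector-valued network
x = rand(5)

# `Tracker.gradient` needs a scalar, so sum the outputs first.
# Here the result is 4x, since d/dx sum(2x^2) = 4x.
gx, = Tracker.gradient(x -> sum(net(x)), x)
```
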


Please, add MWE.


@roflmaostc @Tomas_Pevny Here is my entire codebase. Thank you for your help.

using Revise
using MAT
using Flux
using Flux.Tracker
using Distributions # for Truncated and Normal, used in xavier_init
using Random
Random.seed!(1234)

push!(LOAD_PATH, "/Users/eklavya/WORK_RAJ/ML_JL/pinn_julia/src")
using pinn_subroutine
fh = matopen("/Users/eklavya/WORK_RAJ/ML_JL/pinn_julia/data/burgers_shock.mat")
x=read(fh, "x")
t=read(fh, "t")
X=x'.* ones(length(t))
T=ones(length(x))' .* t
xu = collect(Iterators.flatten(X))
tu = collect(Iterators.flatten(T))
uexact = real(read(fh, "usol"))
layers = [2, 20, 20, 1]
w,b = initialize_nn(layers)
X = hcat(xu, tu)
X = X'
u_nn = net_u(xu,tu, w, b; act=σ)

u_x, u_t =  net_f(xu, tu, w, b; act=σ)

Subroutines

function net_f(x, t, w, b; act=σ)
    u_nn = net_u(x, t, w, b; act=act) # u_nn is of size (25600, 1); pass `act` through
    ## x, t, w and b are initialized as parameters
    Tracker.back!(u_nn, ones(size(u_nn))) # the seed must match the shape of u_nn
    u_x = Tracker.grad(x)
    u_t = Tracker.grad(t)
    return u_x, u_t
end

function xavier_init(l)
    in_dim = l[1]
    out_dim = l[2]
    xavier_stddev = sqrt(2.0/(in_dim + out_dim))
    dist = Truncated(Normal(0.0, xavier_stddev), -Inf, Inf)
    # Draw a fresh sample for every entry; sampling once and filling the
    # matrix by comprehension would replicate a single value everywhere.
    w = rand(dist, in_dim, out_dim)
    b = zeros(1, out_dim)
    return w, b
end

function initialize_nn(layer)
    w=Any[]
    b=Any[]
    for i in 1:length(layer)-1
        wl, bl= xavier_init([layer[i],layer[i+1]])
        push!(w, wl)
        push!(b, bl)
    end
    return w, b
end

function net_u(x, t, w, b; act=σ)
    X = hcat(x, t)
    for i in 1:length(w)-1
        Y = act.(X*w[i] .+ b[i])
        X=Y
    end
    Y = X*w[end] .+ b[end]
    return Y
end

As mentioned above, you can only take the gradient of a scalar function. Second, at the moment we prefer Zygote as the AD for Flux, so you might want to switch to it.
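With Zygote, the partials w.r.t. both inputs can be taken in one call once the vector output is reduced to a scalar. A hedged sketch, with a stand-in `net_u`-like function in place of the poster's network:

```julia
using Flux  # recent Flux re-exports Zygote's `gradient`

net_u(x, t) = x .* t .+ x .^ 2   # stand-in for the network
x = rand(4); t = rand(4)

# Sum the vector output to get a scalar, then differentiate
# w.r.t. both arguments at once.
u_x, u_t = gradient((x, t) -> sum(net_u(x, t)), x, t)
# u_x ≈ t .+ 2 .* x and u_t ≈ x, i.e. genuinely different partials
```
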

So what does net_f actually calculate?

It’s really hard to help here, because your problem is unclear (at least to me).

PS: By MWE we usually mean that one gets rid of unnecessary calculations and narrows the problem down to the core issue. I believe that many people won’t invest the time to dig through several functions (which even include loading a data file) solving a complex problem.

@roflmaostc Thank you! Yes, net_f computes the gradient of the output of the network with respect to its input. The input of the network is multidimensional and the output of the network is a vector, like the one offered by tf.gradient. I will prepare a brief working example and post it. Thank you for looking into it.
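For what it's worth, the tf.gradient-style seeded reverse pass maps onto Zygote's `pullback` (a sketch assuming a recent Zygote; `net` is a stand-in vector-valued function):

```julia
using Zygote

net(x) = x .^ 2   # stand-in vector-valued function
x = rand(5)

# Forward pass plus a vector-Jacobian product closure.
y, back = Zygote.pullback(net, x)
# Seed with ones, analogous to tf.gradient / Tracker.back!;
# for this stand-in, gx ≈ 2 .* x.
gx, = back(ones(length(y)))
```
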

Thanks!