Autograd in Flux

Hello All-
I need to compute the gradient of the output of a neural network w.r.t. its input. In my case the output is a vector and the input is also a vector. I was trying to use the gradient routine from the Flux package, but it was complaining that the output is not a scalar. I would appreciate it if you could help me out with this.

Thank you very much!

Raj

I guess this may help:

Hello!

It’s difficult to diagnose the issue here without an MWE; can you read through the instructions in PSA: make it easier to help you first?

Thanks! Sure.
I wrote the following routine:

function net_f(x, t, w, b; act=σ)
    u_nn = net_u(x, t, w, b; act=act) # u_nn is of size (25600, 1); pass `act` through
    ## x, t, w and b are initialized as parameters
    Tracker.back!(u_nn, ones(size(u_nn))) # the seed must match the shape of u_nn
    u_x = Tracker.grad(x)
    u_t = Tracker.grad(t)
    return u_x, u_t
end

I need to compute the gradient of u_nn w.r.t. t and x; it is like computing the partial derivative of the output of a neural network with respect to its input. With this implementation, I don’t know why I am getting the same values for u_x and u_t. I am not sure where I am making a mistake. I will appreciate your help.

Thanks!
Raj

This seems like you are using Flux with the Tracker.jl AD. Is this intentional? The current version of Flux uses the Zygote AD, which has a different syntax.
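For reference, a minimal sketch of the Zygote-style syntax on a recent Flux (here `f` is a stand-in scalar-valued function, not the poster's network):

```julia
using Flux  # recent Flux re-exports Zygote's `gradient`

W = rand(2, 3)
f(x) = sum(W * x)     # scalar-valued function of a vector input
x = rand(3)

# One call replaces the Tracker.back!/Tracker.grad dance;
# `gradient` returns a tuple, one entry per argument.
gx, = gradient(f, x)
```
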

Which version of Flux do you use?

@Tomas_Pevny Yes, I am using Tracker.jl with Flux.jl. I can update that to the latest version. Thanks!

Also, the Flux version is 0.9.0.

Any particular reason for using 0.9.0?

No, not at all. I can update it.

@Tomas_Pevny Any pointers on my question? I will really appreciate it. Thanks!

As the error complains, the function you take the gradient of must return a scalar (like an L2 loss, e.g.).

Did you have a look at the recent Flux documentation? I believe it’s well explained there.
And as mentioned above, without a MWE it’s really hard to help you :roll_eyes:
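Concretely, on Flux 0.9 / Tracker you can reduce the vector output to a scalar (e.g. with `sum`) before differentiating. A sketch with a stand-in function `net` (not the poster's actual network):

```julia
using Flux.Tracker  # the AD shipped with Flux 0.9

net(x) = 2 .* x .^ 2   # stand-in for a vector-valued network
x = rand(5)

# `Tracker.gradient` needs a scalar, so sum the outputs first.
# Here the result is 4x, since d/dx sum(2x^2) = 4x.
gx, = Tracker.gradient(x -> sum(net(x)), x)
```
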


Please, add MWE.


@roflmaostc @Tomas_Pevny Here is my entire codebase. Thank you for your help.

using Revise
using MAT
using Flux
using Flux.Tracker
using Distributions # for Truncated and Normal, used in xavier_init
using Random
Random.seed!(1234)

push!(LOAD_PATH, "/Users/eklavya/WORK_RAJ/ML_JL/pinn_julia/src")
using pinn_subroutine
fh = matopen("/Users/eklavya/WORK_RAJ/ML_JL/pinn_julia/data/burgers_shock.mat")
x=read(fh, "x")
t=read(fh, "t")
X=x'.* ones(length(t))
T=ones(length(x))' .* t
xu = collect(Iterators.flatten(X))
tu = collect(Iterators.flatten(T))
uexact = real(read(fh, "usol"))
layers = [2, 20, 20, 1]
w,b = initialize_nn(layers)
X = hcat(xu, tu)
X = X'
u_nn = net_u(xu,tu, w, b; act=σ)

u_x, u_t =  net_f(xu, tu, w, b; act=σ)

Subroutines

function net_f(x, t, w, b; act=σ)
    u_nn = net_u(x, t, w, b; act=act) # u_nn is of size (25600, 1); pass `act` through
    ## x, t, w and b are initialized as parameters
    Tracker.back!(u_nn, ones(size(u_nn))) # the seed must match the shape of u_nn
    u_x = Tracker.grad(x)
    u_t = Tracker.grad(t)
    return u_x, u_t
end

function xavier_init(l)
    in_dim = l[1]
    out_dim = l[2]
    xavier_stddev = sqrt(2.0/(in_dim + out_dim))
    dist = Truncated(Normal(0.0, xavier_stddev), -Inf, Inf)
    # Draw a fresh sample for every entry; sampling once and filling the
    # matrix by comprehension would replicate a single value everywhere.
    w = rand(dist, in_dim, out_dim)
    b = zeros(1, out_dim)
    return w, b
end

function initialize_nn(layer)
    w=Any[]
    b=Any[]
    for i in 1:length(layer)-1
        wl, bl= xavier_init([layer[i],layer[i+1]])
        push!(w, wl)
        push!(b, bl)
    end
    return w, b
end

function net_u(x, t, w, b; act=σ)
    X = hcat(x, t)
    for i in 1:length(w)-1
        Y = act.(X*w[i] .+ b[i])
        X=Y
    end
    Y = X*w[end] .+ b[end]
    return Y
end

As mentioned above, you can only take the gradient of a scalar function. Second, at the moment we prefer Zygote as the AD for Flux, so you might want to switch to it.
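With Zygote, the partials w.r.t. both inputs can be taken in one call once the vector output is reduced to a scalar. A hedged sketch, with a stand-in `net_u`-like function in place of the poster's network:

```julia
using Flux  # recent Flux re-exports Zygote's `gradient`

net_u(x, t) = x .* t .+ x .^ 2   # stand-in for the network
x = rand(4); t = rand(4)

# Sum the vector output to get a scalar, then differentiate
# w.r.t. both arguments at once.
u_x, u_t = gradient((x, t) -> sum(net_u(x, t)), x, t)
# u_x ≈ t .+ 2 .* x and u_t ≈ x, i.e. genuinely different partials
```
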

So what does net_f actually calculate?

It’s really hard to help here, because your problem is unclear (at least to me).

PS: By MWE we usually mean that one gets rid of unnecessary calculations and narrows the problem down to the core issue. I believe that many people won’t invest the time to dig through several functions (which even include loading a data file) solving a complex problem.

@roflmaostc Thank you! Yes, net_f computes the gradient of the output of the network with respect to its input. The input of the network is multidimensional and the output of the network is a vector, like the one offered by tf.gradient. I will prepare a brief working example and post it. Thank you for looking into it.
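For what it's worth, the tf.gradient-style seeded reverse pass maps onto Zygote's `pullback` (a sketch assuming a recent Zygote; `net` is a stand-in vector-valued function):

```julia
using Zygote

net(x) = x .^ 2   # stand-in vector-valued function
x = rand(5)

# Forward pass plus a vector-Jacobian product closure.
y, back = Zygote.pullback(net, x)
# Seed with ones, analogous to tf.gradient / Tracker.back!;
# for this stand-in, gx ≈ 2 .* x.
gx, = back(ones(length(y)))
```
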

Thanks!