Fitting normalizing flows with Bijectors.jl

Hi!

I’d like to fit a normalizing flow to some data using Bijectors.jl. I’m a bit lost because I couldn’t find any complete examples. This is what I managed to do so far:

using Bijectors
using Distributions   # for MvNormal
using KernelDensity
using StatsPlots
using ReverseDiff

function test_nf()
    # some data + plotting
    data = reduce(vcat, [(x, sin(10*x) + exp(-y)) for x in 0.0:0.001:1.0, y in 0.0:0.1:1.0])
    dens = kde((first.(data), last.(data)))
    plot(dens)

    # normalizing flow: standard normal base distribution pushed through 5 planar layers
    base_dist = MvNormal(zeros(2), ones(2))

    layers = reduce(∘, [
        PlanarLayer(2, ReverseDiff.track)
        for i in 1:5
    ])
    flow = transformed(base_dist, layers)

    # how to fit parameters?
end

The README of Bijectors.jl explains how to do a forward pass and how to obtain a gradient, but not how to update the weights. Could I maybe use Flux for optimization?

Could I maybe use Flux for optimization?

Yes :)

Bijectors.jl implements Functors.functor, which is what Flux.jl uses to collect trainable parameters, so you can follow the usual Flux approach to updating them.

As for the objective, it depends on what you want to do; in this case I'm guessing maximum likelihood is easiest, i.e. minimizing the negative log-likelihood of the data under the flow.
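Something along these lines could serve as a starting point (an untested sketch; the toy data, the five planar layers, the learning rate, and the number of steps are all placeholders I made up, and it assumes Flux's implicit-parameter interface with Zygote):

using Bijectors
using Distributions
using Flux
using Zygote

# toy data: draws from a correlated 2-D Gaussian, stored as a vector of 2-element vectors
data = [rand(MvNormal(zeros(2), [1.0 0.5; 0.5 1.0])) for _ in 1:1_000]

base_dist = MvNormal(zeros(2), ones(2))
flow = transformed(base_dist, reduce(∘, [PlanarLayer(2) for _ in 1:5]))

# maximum likelihood: minimize the negative log-likelihood of the data under the flow
nf_loss(batch) = -sum(x -> logpdf(flow, x), batch)

ps  = Flux.params(flow)   # parameter collection works because of Functors.functor
opt = ADAM(1e-2)
for step in 1:1_000
    grads = Zygote.gradient(() -> nf_loss(data), ps)
    Flux.Optimise.update!(opt, ps, grads)
end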

We should have a tutorial for this, but unfortunately we're still not quite done updating our tutorial pipeline, which needs to happen before we start adding more :confused:


OK, I tried writing something myself but it doesn’t seem to converge. Do you know what the problem might be? Here is my code:

using Bijectors
using Distributions   # for MvNormal
using KernelDensity
using StatsPlots
using ReverseDiff
using Flux
using Zygote          # Zygote.gradient is used below
using StatsBase

function test_nf()
    # some data + plotting
    data_tuples = reduce(vcat, [(x, sin(10*x) + exp(-y)) for x in 0.0:0.001:1.0, y in 0.0:0.1:1.0])
    data = map(x -> [x...], data_tuples)
    dens = kde(([d[1] for d in data], [d[2] for d in data]))
    plot(dens)

    # normalizing flow: standard normal base pushed through 32 planar layers
    base_dist = MvNormal(zeros(2), ones(2))

    layers = reduce(∘, [
        PlanarLayer(2)
        for i in 1:32
    ])
    flow = transformed(base_dist, layers)

    # negative log-likelihood of a minibatch under the flow
    function nf_loss(y)
        return -sum(map(x -> logpdf(flow, x), y))
    end

    # fitting
    opt = ADAM(0.1)

    # training loop
    for i in 1:10_000
        minibatch = sample(data, 32)
        # Calculate the gradients of the parameters
        # with respect to the loss function
        grads = Zygote.gradient(() -> nf_loss(minibatch), Flux.params(flow))

        # Update the parameters based on the chosen
        # optimiser (opt)
        Flux.Optimise.update!(opt, Flux.params(flow), grads)
    end

    # sample from the fitted flow and plot its density
    data_posterior = [rand(flow) for i in 1:10_000]
    dens_posterior = kde(([dp[1] for dp in data_posterior], [dp[2] for dp in data_posterior]))
    plot(dens_posterior)
end

I tweaked some parameters and gave it more time to fit, and it works :tada: . I'll try polishing it a bit and post the result here.


I'm not sure what to put here. It works, but ultimately this is a poor method, since gradient calculation is extremely slow due to the partitioning of parameters into lots of very small arrays. I'll keep trying with bijectors that have more parameters per input.
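Just to illustrate what I mean by lots of very small arrays (hypothetical counts; the exact parameter layout of PlanarLayer may differ between Bijectors versions):

using Bijectors
using Distributions
using Flux

flow = transformed(MvNormal(zeros(2), ones(2)),
                   reduce(∘, [PlanarLayer(2) for _ in 1:32]))

# Each planar layer carries its own tiny parameter arrays (w, u, b in 2-D),
# so the gradient and optimizer code ends up walking dozens of length-1 or
# length-2 arrays instead of a few large ones.
ps = Flux.params(flow)
@show length(ps)       # roughly 3 arrays per layer × 32 layers
@show sum(length, ps)  # only on the order of a hundred scalars in total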


I would appreciate it if you could post it here. :wink:

Best,
Honza

Well, I tried TensorFlow Probability, and an FFJORD-based architecture taken from one of their examples fit my test distribution within minutes. I'm really impressed, and they have decent documentation. Julia is much more flexible, but TensorFlow may be good enough for my application.