Fitting normalizing flows with Bijectors.jl

Hi!

I’d like to fit a normalizing flow to some data using Bijectors.jl. I’m a bit lost because I couldn’t find any complete examples. This is what I managed to do so far:

using Bijectors
using Distributions   # for MvNormal
using KernelDensity
using StatsPlots
using ReverseDiff

function test_nf()
    # some data + plotting
    data = reduce(vcat, [(x, sin(10*x) + exp(-y)) for x in 0.0:0.001:1.0, y in 0.0:0.1:1.0])
    dens = kde((first.(data), last.(data)))
    plot(dens)

    # normalizing flow: standard normal base distribution pushed through 5 planar layers
    base_dist = MvNormal(zeros(2), ones(2))

    layers = reduce(∘, [
        PlanarLayer(2, ReverseDiff.track)
        for i in 1:5
    ])
    flow = transformed(base_dist, layers)

    # how to fit parameters?
end

The README of Bijectors.jl explains how to do a forward pass and how to obtain a gradient, but not how to update the weights. Could I maybe use Flux for optimization?

Could I maybe use Flux for optimization?

Yes :)

Bijectors.jl implements Functors.functor, which is what Flux.jl uses to collect trainable parameters, so you can follow the usual Flux approach to updating them.

As for the objective, it depends on what you want to do; in this case I'm guessing maximum likelihood is easiest, i.e. minimizing the negative log-likelihood of the data under the flow.
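Something along these lines could serve as a starting point (an untested sketch; the toy data, the five planar layers, the learning rate, and the number of steps are all placeholders I made up, and it assumes Flux's implicit-parameter interface with Zygote):

using Bijectors
using Distributions
using Flux
using Zygote

# toy data: draws from a correlated 2-D Gaussian, stored as a vector of 2-element vectors
data = [rand(MvNormal(zeros(2), [1.0 0.5; 0.5 1.0])) for _ in 1:1_000]

base_dist = MvNormal(zeros(2), ones(2))
flow = transformed(base_dist, reduce(∘, [PlanarLayer(2) for _ in 1:5]))

# maximum likelihood: minimize the negative log-likelihood of the data under the flow
nf_loss(batch) = -sum(x -> logpdf(flow, x), batch)

ps  = Flux.params(flow)   # parameter collection works because of Functors.functor
opt = ADAM(1e-2)
for step in 1:1_000
    grads = Zygote.gradient(() -> nf_loss(data), ps)
    Flux.Optimise.update!(opt, ps, grads)
end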

We should have a tutorial for this, but unfortunately we're still not quite done updating our tutorial pipeline, which needs to happen before we start adding more :confused:


OK, I tried writing something myself but it doesn’t seem to converge. Do you know what the problem might be? Here is my code:

using Bijectors
using Distributions   # for MvNormal
using KernelDensity
using StatsPlots
using ReverseDiff
using Flux
using Zygote          # Zygote.gradient is used below
using StatsBase

function test_nf()
    # some data + plotting
    data_tuples = reduce(vcat, [(x, sin(10*x) + exp(-y)) for x in 0.0:0.001:1.0, y in 0.0:0.1:1.0])
    data = map(x -> [x...], data_tuples)
    dens = kde(([d[1] for d in data], [d[2] for d in data]))
    plot(dens)

    # normalizing flow: standard normal base pushed through 32 planar layers
    base_dist = MvNormal(zeros(2), ones(2))

    layers = reduce(∘, [
        PlanarLayer(2)
        for i in 1:32
    ])
    flow = transformed(base_dist, layers)

    # negative log-likelihood of a minibatch under the flow
    function nf_loss(y)
        return -sum(map(x -> logpdf(flow, x), y))
    end

    # fitting
    opt = ADAM(0.1)

    # training loop
    for i in 1:10_000
        minibatch = sample(data, 32)
        # Calculate the gradients of the parameters
        # with respect to the loss function
        grads = Zygote.gradient(() -> nf_loss(minibatch), Flux.params(flow))

        # Update the parameters based on the chosen
        # optimiser (opt)
        Flux.Optimise.update!(opt, Flux.params(flow), grads)
    end

    # sample from the fitted flow and plot its density
    data_posterior = [rand(flow) for i in 1:10_000]
    dens_posterior = kde(([dp[1] for dp in data_posterior], [dp[2] for dp in data_posterior]))
    plot(dens_posterior)
end

I tweaked some parameters and gave it more time to fit, and it works :tada: . I'll try polishing it a bit and post the result here.


I'm not sure what to put here. It works, but ultimately this is a poor method, since gradient calculation is extremely slow due to the partitioning of parameters into lots of very small arrays. I'll keep trying with bijectors that have more parameters per input.
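Just to illustrate what I mean by lots of very small arrays (hypothetical counts; the exact parameter layout of PlanarLayer may differ between Bijectors versions):

using Bijectors
using Distributions
using Flux

flow = transformed(MvNormal(zeros(2), ones(2)),
                   reduce(∘, [PlanarLayer(2) for _ in 1:32]))

# Each planar layer carries its own tiny parameter arrays (w, u, b in 2-D),
# so the gradient and optimizer code ends up walking dozens of length-1 or
# length-2 arrays instead of a few large ones.
ps = Flux.params(flow)
@show length(ps)       # roughly 3 arrays per layer × 32 layers
@show sum(length, ps)  # only on the order of a hundred scalars in total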


I would appreciate it if you could post it here. :wink:

Best,
Honza

Well, I tried TensorFlow Probability, and an FFJORD-based architecture taken from one of their examples fit my test distribution within minutes. I'm really impressed, and they have decent documentation. Julia is much more flexible, but TensorFlow may be good enough for my application.