Distance optimization over a matrix of observations


#1

Given a MxN matrix A with M observations and N features, as well as a non-linear distance function f, with observations being identical (f(A[i, :], A[j, :]) == 0.0 for each i,j). I would like to disturb the observations such that they have a desired target distance D to each other with regards to f. Additionally, I would like to only optimize values in A that are not zero to begin with.

I thought it would be nice to try JuMP for this, but I’m unsure about the correct specification. I started like this:

using JuMP, Ipopt

# dummy data with 2 observations
A = [0.0 0.0 0.2 0.8;
     0.0 0.4 0.0 0.6]
m = Model(solver=IpoptSolver(print_level=0))

M, N = size(A)

#target distance
D = 0.1

# set bounds for all variables, such that variables corresponding to zeroes in A
# have no wiggle room
minval = minimum(A[.!iszero.(A)])
maxval = maximum(A)

@variable(m, x[1:M, 1:N])
for i in 1:M, j in 1:N
    if iszero(A[i,j])
        lb = ub = 0.0
    else
        lb = minval
        ub = maxval
    end
    setlowerbound(x[i,j], lb)
    setupperbound(x[i,j], ub)
end

The objective would then ideally be something like

function obj(x, D=D)
    M = size(x, 1)
    # all pairwise comparisons between observations, w/o repetitions
    obj_val = 0.0
    for i1 in 1:M-1
        for i2 in i1+1:M
            obj_val += f(x[i1, :], x[i2, :]) - D
        end
    end
    obj_val
end

JuMP.register(m, :obj, 1, obj, autodiff=true)
@NLobjective(m, :Min, obj(x))

However, this fails with the following stacktrace:

as far as I can figure because user defined functions need scalars, i.e. cannot work with expressions including arrays of variables. What would be the canonical way to express this?


#2

This is not what you asked, but you probably do not want obj_val += f(x[i1, :], x[i2, :]) - D, but rather something like obj_val += abs(f(x[i1, :], x[i2, :]) - D) or obj_val += abs2(f(x[i1, :], x[i2, :]) - D).


#3

Good catch, the current code would fail for small distances. Thanks!


#4

Why not use Ipopt.jl directly?

Or if your problem is box constrained only, you can also try Optim.jl which has a first order box constrained solver.

And if you don’t want to define the derivatives yourself, you may find DiffEqDiffTools, ForwardDiff, or ReverseDiff useful. Optim lets you use the first 2 via a simple keyword argument. Check the docs for more details.


#5

I was having an admittedly brief look at Optim.jl before and didn’t see a clear way to optimize over matrices of variables, all examples I could find were optimizing 1D arrays of parameters (hence going for JuMP). But I can have a deeper look at this if you think that’s possible, or perhaps you know an example? If I understand the Ipopt.jl example right, it also seems to optimize a 1D vector of 4 parameters, unfortunately not quite what I need.


#6

You can think and code mostly in matrices but reshape it and from to a vector at the boundaries of optimization. Reshaping is cheap because the vector and the matrix will be sharing the same data so no copying happens. This does the trick but is a bit slow.

using SparseArrays

A = Array(sprand(20, 5, 0.1))

minval = minimum(A[.!iszero.(A)])
maxval = maximum(A)

M, N = size(A)
D = 0.1

X0 = copy(A)
lb = copy(A)
ub = copy(A)
for i in 1:M, j in 1:N
    if iszero(A[i,j])
        lb[i,j] = -sqrt(eps(0.0))
        ub[i,j] = sqrt(eps(0.0))
    else
        lb[i,j] = minval - sqrt(eps(0.0))
        ub[i,j] = maxval + sqrt(eps(0.0))
    end
end

using Distances

f(x1, x2) = evaluate(Euclidean(), x1, x2)

function obj(x_vec, D=D)
	global M, N
    X = reshape(x_vec, (M, N))
    # all pairwise comparisons between observations, w/o repetitions
    obj_val = 0.0
    for i1 in 1:M-1
        for i2 in i1+1:M
            @views obj_val += f(X[i1, :], X[i2, :]) - D
        end
    end
    obj_val
end

using Optim

result = optimize(obj, vec(lb), vec(ub), vec(X0), Fminbox(LBFGS()))
X = reshape(result.minimizer, (M, N))

f(X, A)

#7

I see, reshaping is the way to go. Thanks for this nice solution! I got rid of the performance-critical type instability in obj through a function barrier, added the previously mentioned abs and had to simplify the boundary conditions for now because it refused to reliably converge, but now I pretty much got what I need. I will play a bit more with the bounds, but may do without them after all.

Here is the final code for future reference, using the distance function I was interested in (BrayCurtis):

A = [0.0 0.0 0.2 0.8;
     0.0 0.4 0.0 0.6]

M, N = size(A)
minval = 0.0
maxval = 1.0

M, N = size(A)
D = 0.1

X0 = copy(A)
lb = copy(A)
ub = copy(A)
for i in 1:M, j in 1:N
    lb[i,j] = minval - sqrt(eps(0.0))
    ub[i,j] = maxval + sqrt(eps(0.0))
end

using Distances

f(x1, x2) = evaluate(BrayCurtis(), x1, x2)

function obj_sub(X, D)
    M = size(X, 1)
    obj_val = 0.0
    # all pairwise comparisons between observations, w/o repetitions
    for i1 in 1:M-1
        for i2 in i1+1:M
            @views obj_val += abs(f(X[i1, :], X[i2, :]) - D)
        end
    end
    obj_val
end

function obj(x_vec)
    global M, N, D
    X = reshape(x_vec, (M, N))
    obj_sub(X, D)
end

using Optim

@time result = optimize(obj, vec(lb), vec(ub), vec(X0), Fminbox(LBFGS()))
X = reshape(result.minimizer, (M, N))
obj(result.minimizer), f(X[1, :], X[2, :])