Convex.jl: slow in creating constraints?

Glad that I could help!

According to your description, Convex.jl needs a perfect hash function, which makes it more understandable why it shipped its own implementation of Base.hash(::Array{AbstractExpr}, ...).
However, I doubt the old implementation is a perfect hash (the one based on Base.hash(::AbstractArray, h::UInt) certainly isn’t).

@riccardomurri Base.hash(a::AbstractExpr, h::UInt) = h should(™) work if Convex does not implement Base.hash(::Array{AbstractExpr}, h::UInt), unless there is a different definition of hash for Array{<:AbstractExpr} in use. Take this with a grain of salt, as I don’t really know the Convex codebase.

Also note that hash(a::Array{AbstractExpr}, h::UInt) = rand(UInt) does not satisfy the basic requirement that equal elements have equal hash:

julia> struct MyType  # definition added here so the snippet is self-contained
           x::Int
       end

julia> Base.hash(x::MyType, h::UInt) = rand(UInt);

julia> Dict(MyType(1)=>i for i in 1:100)
Dict{MyType,Int64} with 66 entries:
  MyType(1) => 35
 (...)

julia> Base.hash(x::MyType, h::UInt) = h

julia> Dict(MyType(1)=>i for i in 1:100)
Dict{MyType,Int64} with 1 entry:
  MyType(1) => 100

@riccardomurri could you also benchmark

Base.hash(a::AbstractExpr, h::UInt) = hash((a.id_hash, a.head), h)
function Base.hash(a::Array{AbstractExpr}, h::UInt)
    h = xor(h, hashaa_seed, size(a))
    return reduce(xor, a, init=h)
end

(or maybe provide a MWE:)

btw, since AbstractExpr is well… abstract type :wink: wouldn’t Base.hash(a::AbstractExpr, h::UInt) = hash((typeof(a), a.id_hash, a.head), h) be more correct?

Also, I believe the other signature should read function Base.hash(a::Array{<:AbstractExpr}, h::UInt), since at the moment an array of a homogeneous AE<:AbstractExpr uses the Base definition for hashing arrays (which might or might not be a problem). @ericphanson?
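To illustrate the dispatch point, here is a toy sketch of mine (not Convex.jl code): because type parameters are invariant in Julia, an array whose element type is a concrete subtype does not hit a method written for Array{AbstractExpr}.

abstract type AbstractExpr end
struct ConcreteExpr <: AbstractExpr end

f(::Array{AbstractExpr}) = "Array{AbstractExpr} method"
f(::Array) = "generic Array fallback"

f(AbstractExpr[ConcreteExpr()])  # "Array{AbstractExpr} method"
f([ConcreteExpr()])              # "generic Array fallback": Vector{ConcreteExpr} is not an Array{AbstractExpr}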


Hello @abulak

Thanks for taking the time to write an extensive comment with suggestions!

But that is not a perfect hash function, is it? So even if it is faster, it will fail later, when Convex.jl builds the dictionary of expressions, or it will introduce bugs where different expressions are treated as one and the same because they hash equal?

Yes, I know. It was just to prove the point that the slowness comes from hash(a::Array{AbstractExpr}, h::UInt): if its definition is replaced with a fast one, running time drops significantly. (Also, in the very particular case of my code it would even be almost always correct since all subexpressions are distinct – but of course this does not apply in the general case.)

I had to edit your suggestion:

  • use length(a) instead of size(a) which is a tuple;
  • map x->x.id_hash over a because otherwise xor() is not defined for objects of type Convex.AbstractExpr

So I benchmarked this code:

function Base.hash(a::Array{AbstractExpr}, h::UInt)
    h = h ⊻ hashaa_seed ⊻ length(a)
    return reduce(⊻, map(x->x.id_hash, a), init=h)
end

Results are comparable with my last-but-one attempt (the one with reduce(..., map(...))). I will repeat the test tomorrow, since right now I’m on battery but I ran the other tests on AC power, and that may very well affect performance.

(or maybe provide a MWE:)

I’ll see if I can put together one tomorrow.

btw, since AbstractExpr is well… abstract type :wink: wouldn’t Base.hash(a::AbstractExpr, h::UInt) = hash((typeof(a), a.id_hash, a.head), h) be more correct?

My understanding is that a.head is a symbol and is unique to the type of atom being created so it carries the same info as typeof(a).

Yes, Base.hash(a::AbstractExpr, h::UInt) = h is by no means a perfect hash :stuck_out_tongue: Given what @ericphanson wrote, I don’t recommend using it.

Sorry for the non-runnable code; this is what happens when one codes in a browser :stuck_out_tongue:
I meant xor(h, hashaa_seed, size(a)...), but now I think it’s better to use size, which carries information on the dimensions. Maybe (since it’s a constant cost per hash) use xor(hashaa_seed, hash(size(a), h))?

The point of reduce was to make sure hash is non-allocating; I haven’t checked this, though.
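For what it’s worth, here is a quick way to check the allocation question (my addition): mapreduce fuses the mapping into the reduction, while reducing over map first materializes a temporary array.

using BenchmarkTools

v = rand(UInt, 10_000)

@btime reduce(⊻, map(x -> x + 1, $v); init=zero(UInt));  # allocates the mapped copy
@btime mapreduce(x -> x + 1, ⊻, $v; init=zero(UInt));    # 0 allocations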

EDIT:

abstract type AbstractExpr end

mutable struct MyBinaryExpr <: AbstractExpr
    head::Symbol
    children::Tuple{AbstractExpr, AbstractExpr}
    size::Tuple{Int, Int}
    id_hash::UInt64
end

mutable struct DummyExpr <: AbstractExpr end

MyBinaryExpr(head::Symbol) = MyBinaryExpr(head, (DummyExpr(), DummyExpr()), (rand(1:10), rand(1:10)), rand(UInt))

MyBinaryExpr() = MyBinaryExpr(Symbol(tempname()[end-5: end]))

import Base.hash
const hashaa_seed = 0x3ff1c2db0c02c500 # hash(AbstractExpr)

id_hash(x::MyBinaryExpr) = getfield(x, :id_hash)
head(x::MyBinaryExpr) = getfield(x, :head)

Base.hash(ex::AbstractExpr, h::UInt) = xor(id_hash(ex), hash(head(ex), h))::UInt

function Base.hash(A::Array{<:AbstractExpr}, h::UInt)
    h = xor(hashaa_seed, hash(size(A), h))
    return mapreduce(hash, xor, A, init=h)
end

using BenchmarkTools

exprs = [MyBinaryExpr() for _ in 1:10_000]
hash(exprs); @time hash(exprs)

@btime hash($(first(exprs)), $(rand(UInt)))
@btime hash($(exprs), $(rand(UInt)))

giving

  0.000227 seconds (5 allocations: 176 bytes)
7.557 ns (0 allocations: 0 bytes)
59.170 μs (0 allocations: 0 bytes)

Seems like this version fails some of Convex.jl’s tests (I pushed a commit with this version to the hashing branch, https://github.com/JuliaOpt/Convex.jl/pull/369, so once CI finishes you should be able to see that). I put in const hashaa_seed = hash(AbstractExpr) directly; could that cause a problem? By the way, what is the point of hashaa_seed? We have that in Convex’s current version too, but I don’t know what it’s for.

I guess hashaa_seed (by the way, what is the origin of the name?) was used to “mix” into the hash the information that we’re hashing AbstractExprs and not e.g. an array of (id_hash, head) tuples.
I think the more correct usage would be to mix it directly into the hash of a single AbstractExpr, e.g.

Base.hash(ex::AbstractExpr, h::UInt) = xor(hashaa_seed, id_hash(ex), hash(head(ex), h))::UInt

As to why tests fail, I have no idea :stuck_out_tongue: it’s this code that fails:

@add_problem sdp function sdp_kron_atom(handle_problem!, ::Val{test}, atol, rtol, ::Type{T}) where {T, test}
    id = eye(4)
    X = Semidefinite(4)
    W = kron(id, X)
    p = maximize(tr(W), tr(X) ≤ 1; numeric_type = T)

    if test
        @test vexity(p) == AffineVexity()
    end
    handle_problem!(p)
    if test
        @test p.optval ≈ 4 atol=atol rtol=rtol
    end
end

Do you have an idea how it interacts with the changes in hash?
By the way, what is the actual optimization problem here?

I’m not sure; git blame tracks it back to add custom hash function for expressions · jump-dev/Convex.jl@948b893 · GitHub which was all the way back in v0.0.5.

The problem is

using Convex, LinearAlgebra
id = I(4)
X = Semidefinite(4)
W = kron(id, X)
p = maximize(tr(W), tr(X) ≤ 1)

i.e. maximize \operatorname{tr}(I \otimes X) subject to X \succeq 0, \operatorname{tr}(X) \leq 1. Since \operatorname{tr}(I \otimes X) = \operatorname{tr}(I) \operatorname{tr}(X) = 4 \operatorname{tr}(X), the solution is trivially 4, but this problem tests that we implemented kron correctly (well, correctly enough that you get 4 here, at least).
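(As a quick numerical sanity check of that identity, not part of the original discussion:)

using LinearAlgebra
X = rand(4, 4)
tr(kron(I(4), X)) ≈ 4 * tr(X)  # true, since tr(A ⊗ B) = tr(A) * tr(B)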

The hashes are coming into play in how Convex represents the problem.

julia> Convex.MAXDEPTH[] = 20 # print more
20

julia> p
maximize
└─ sum (affine; real)
   └─ diag (affine; real)
      └─ transpose (affine; real)
         └─ hcat (affine; real)
            ├─ transpose (affine; real)
            │  └─ transpose (affine; real)
            │     └─ hcat (affine; real)
            │        ├─ transpose (affine; real)
            │        │  └─ transpose (affine; real)
            │        │     └─ hcat (affine; real)
            │        │        ├─ transpose (affine; real)
            │        │        │  └─ hcat (affine; real)
            │        │        │     ├─ hcat (affine; real)
            │        │        │     │  ├─ hcat (affine; real)
            │        │        │     │  │  ├─ * (affine; real)
            │        │        │     │  │  │  ├─ true
            │        │        │     │  │  │  └─ 4×4 real variable (id: 140…445)
            │        │        │     │  │  └─ * (affine; real)
            │        │        │     │  │     ├─ false
            │        │        │     │  │     └─ 4×4 real variable (id: 140…445)
            │        │        │     │  └─ * (affine; real)
            │        │        │     │     ├─ false
            │        │        │     │     └─ 4×4 real variable (id: 140…445)
            │        │        │     └─ * (affine; real)
            │        │        │        ├─ false
            │        │        │        └─ 4×4 real variable (id: 140…445)
            │        │        └─ transpose (affine; real)
            │        │           └─ hcat (affine; real)
            │        │              ├─ hcat (affine; real)
            │        │              │  ├─ hcat (affine; real)
            │        │              │  │  ├─ * (affine; real)
            │        │              │  │  │  ├─ false
            │        │              │  │  │  └─ 4×4 real variable (id: 140…445)
            │        │              │  │  └─ * (affine; real)
            │        │              │  │     ├─ true
            │        │              │  │     └─ 4×4 real variable (id: 140…445)
            │        │              │  └─ * (affine; real)
            │        │              │     ├─ false
            │        │              │     └─ 4×4 real variable (id: 140…445)
            │        │              └─ * (affine; real)
            │        │                 ├─ false
            │        │                 └─ 4×4 real variable (id: 140…445)
            │        └─ transpose (affine; real)
            │           └─ hcat (affine; real)
            │              ├─ hcat (affine; real)
            │              │  ├─ hcat (affine; real)
            │              │  │  ├─ * (affine; real)
            │              │  │  │  ├─ false
            │              │  │  │  └─ 4×4 real variable (id: 140…445)
            │              │  │  └─ * (affine; real)
            │              │  │     ├─ false
            │              │  │     └─ 4×4 real variable (id: 140…445)
            │              │  └─ * (affine; real)
            │              │     ├─ true
            │              │     └─ 4×4 real variable (id: 140…445)
            │              └─ * (affine; real)
            │                 ├─ false
            │                 └─ 4×4 real variable (id: 140…445)
            └─ transpose (affine; real)
               └─ hcat (affine; real)
                  ├─ hcat (affine; real)
                  │  ├─ hcat (affine; real)
                  │  │  ├─ * (affine; real)
                  │  │  │  ├─ false
                  │  │  │  └─ 4×4 real variable (id: 140…445)
                  │  │  └─ * (affine; real)
                  │  │     ├─ false
                  │  │     └─ 4×4 real variable (id: 140…445)
                  │  └─ * (affine; real)
                  │     ├─ false
                  │     └─ 4×4 real variable (id: 140…445)
                  └─ * (affine; real)
                     ├─ true
                     └─ 4×4 real variable (id: 140…445)
subject to
└─ <= constraint (affine)
   ├─ sum (affine; real)
   │  └─ diag (affine; real)
   │     └─ 4×4 real variable (id: 140…445)
   └─ 1

status: `solve!` not called yet

So this is the tree of atoms and their children representing the objective and the constraints. Each object in this tree can be represented as a “conic form template”, meaning as an affine expression of some variables (possibly vector-valued) subject to conic constraints of the form “an affine function of the variables is a member of a cone” for a given set of cones (SDP, SOCP, exp, non-negative, I believe). By recursively computing the conic form template for each child and substituting them in each other, you end up with a single such conic form, an affine function of variables subject to these affine conic constraints.

To save time, each time Convex goes to compute a conic form template, it checks whether one has already been computed within the same problem. E.g. in the tree, we see

 * (affine; real)
 ├─ false
 └─ 4×4 real variable (id: 140…445)

show up in several places, and its conic form template is only computed once. The way it does that is by saving these templates in a dictionary keyed by the tuple (a.head, a.id_hash), where the id_hashes are computed via hash (usually the hash of the children).
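Roughly, the memoization pattern looks like the sketch below. The names (FakeAtom, compute_template, cached_template!) are made up for illustration; this is not Convex.jl’s actual code.

struct FakeAtom
    head::Symbol
    id_hash::UInt
    children::Vector{FakeAtom}
end

# "Computing" a template recurses into the children, reusing cached results.
compute_template(a::FakeAtom, cache) =
    string(a.head, "(", join([cached_template!(cache, c) for c in a.children], ", "), ")")

function cached_template!(cache::Dict, a::FakeAtom)
    key = (a.head, a.id_hash)   # same keying idea as the dictionary described above
    get!(cache, key) do         # only compute templates for keys not seen before
        compute_template(a, cache)
    end
end

leaf  = FakeAtom(:variable, rand(UInt), FakeAtom[])
mul   = FakeAtom(:*, rand(UInt), [leaf])
cache = Dict{Tuple{Symbol,UInt},String}()
cached_template!(cache, FakeAtom(:hcat, rand(UInt), [mul, mul]))  # mul's template is computed only once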

So what’s going wrong? I think the line

Base.hash(ex::AbstractExpr, h::UInt) = xor(id_hash(ex), hash(head(ex), h))::UInt

is causing hash collisions. If I remove the line, the problem is solved correctly. Moreover, with this method, an HcatAtom of the form

hcat (affine; real)
├─ * (affine; real)
│  ├─ true
│  └─ 4×4 real variable (id: 145…035)
└─ * (affine; real)
   ├─ false
   └─ 4×4 real variable (id: 145…035)

and one of the form

hcat (affine; real)
├─ hcat (affine; real)
│  ├─ hcat (affine; real)
│  │  ├─ * (affine; real)
│  │  │  ├─ true
│  │  │  └─ 4×4 real variable (id: 145…035)
│  │  └─ * (affine; real)
│  │     ├─ false
│  │     └─ 4×4 real variable (id: 145…035)
│  └─ * (affine; real)
│     ├─ false
│     └─ 4×4 real variable (id: 145…035)
└─ * (affine; real)
   ├─ false
   └─ 4×4 real variable (id: 145…035)

end up with the same hash. In fact, there are 6 such collisions in this tree that aren’t present when that line is commented out. These two should definitely have different hashes because they have different children (children of different types, even).

As an aside, to find that, I used the script


using Convex, LinearAlgebra, AbstractTrees
id = I(4)
X = Semidefinite(4)
W = kron(id, X)
p = maximize(tr(W), tr(X) ≤ 1)

node_pairs = []
node_hashes = []

for (i, node1) in enumerate(AbstractTrees.PostOrderDFS(p))
    node1 isa Convex.AbstractExpr || continue
    for (j, node2) in enumerate(AbstractTrees.PostOrderDFS(p))
        node2 isa Convex.AbstractExpr || continue
        i < j || continue
        if node1.id_hash == node2.id_hash
            push!(node_pairs, (i, j))
            push!(node_hashes, node2.id_hash)
        end
    end
end

with and without that line commented out (separate Julia sessions; Revise wasn’t enough for some reason). Then I @show’d the node_pairs, copied them into the same session, setdiff’d them, and found six pairs that collided with the hash method present but did not collide without it.

Ok, so why are we getting collisions?

The problem seems to be a self-similarity kind of thing, plus the invertibility of xor. Looking at the two AbstractExprs that collided, note that there is a copy of the first inside the second. Let’s call the first node1 and the second node2, and call

* (affine; real)
├─ false
└─ 4×4 real variable (id: 145…035)

node3. Then node2 = hcat( hcat( node1, node3), node3 ). So node2.id_hash = hash( (hcat( node1, node3), node3) ), since for an HcatAtom it’s just the hash of the children (same for *)…

Ok, it’s getting late so I’m going to stop stepping through this, but I think the problem is basically that each of these id_hashes just lowers to the children; we do this 3 times, and since xor is its own inverse, xoring a term an even number of times cancels it, so after 3 times we get the same as after 1 time, which gives us node1.
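To make that concrete, here is a toy version of the collision (my illustration; it drops the head mixing and pretends each atom’s id_hash is just the xor of its children’s id_hashes):

id_node1 = rand(UInt)           # stands in for node1's id_hash
id_node3 = rand(UInt)           # stands in for node3's id_hash

id_inner = id_node1 ⊻ id_node3  # "hcat(node1, node3)"
id_node2 = id_inner ⊻ id_node3  # "hcat(hcat(node1, node3), node3)"

id_node2 == id_node1            # true: the two copies of id_node3 cancel out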

Without this hash method, we use the Base hash method which uses objectid so this nesting gives distinct things.

Hope that makes sense.

That makes me worry that the Base.hash(A::Array{<:AbstractExpr}, h::UInt) method could have problems too; we just don’t have exactly the right problem in the test set to trigger an issue. What do you think?


Oh wow, that’s a serious analysis :wink: Yes, we were probably too greedy to only xor stuff in Base.hash(a::AbstractExpr, h::UInt); there is a reason why

julia> hash(UInt(0), UInt(0))
0x77cfa1eef01bca90

(it uses a 3a + b map composed with a bit-mixing function :wink:)

this combination:

Base.hash(ex::AbstractExpr, h::UInt) = hash(id_hash(ex), hash(head(ex), h))::UInt

function Base.hash(A::Array{<:AbstractExpr}, h::UInt)
    return mapreduce(hash, xor, A, init=xor(hashaa_seed, hash(size(A), h)))
end

is slower (11.208 ns (0 allocations: 0 bytes) and 122.697 μs (0 allocations: 0 bytes)) but has the “added benefit” of tests passing :wink:


Hello all, here is a simplified version of my code for reproducing this slowness issue.

using Convex;

using SparseArrays: sprand;
using Dates: now;


# parameters controlling the size of the problem
const N = 10;
const r = 10;
const m = 2;


const s = 4*r + 1;
x = [Variable(m*s+2r, m*s+2r) for _ in 1:N];
y = zeros(s,s,N);
# for demoing, we can assume `y` is constant in time
for t = 1:N
    y[:, :, t] .= sprand(Float64, s, s, 0.15);
end

Q = zeros(2r+1,2r+1);
for i=1:(2r+1), j=1:(2r+1)
    d = (i-r-1)^2 + (j-r-1)^2
    f = r/3;
    Q[i,j] = exp(-4*log(2)*d/f^2)
end

function conv2d(Q, x)
    rows_Q, cols_Q = size(Q);
    rows_x, cols_x = size(x);
    i_min = rows_Q + 1;  # such that i-h>=1 when i=i_min and h=rows_Q
    j_min = cols_Q + 1;  # such that j-k>=1 when j=j_min and k=cols_Q
    i_max = rows_x + 1;  # such that i-h<=rows_x when i=i_max and h=1
    j_max = rows_x + 1;  # such that j-k<=rows_x when j=j_max and k=1
    return [
        reduce(+, (Q[h,k]*x[i-h, j-k]
                   for h in 1:rows_Q, k in 1:cols_Q))
        for i in i_min:i_max, j in j_min:j_max
    ];
end

± = [-1.0, +1.0];  # well, Julia does allow crazy variable names
totals = Array{Any}(undef, N);
for t in 2:N
    @info "Starting computation of constraint A[$(t)] at $(now()) ..."
    # 1. introduce sign perturbation `R`
    a1 = (rand(±, size(x[t])) .* x[t]) + x[1];
    # 2. 2D-convolution with PSF matrix `Q`
    a2 = conv2d(Q, a1);
    # 3. project into lo-res space
    a3 = [
        reduce(+, (a2[I,J]
                   for I in (1+m*(i-1)):m*i,
                   J in (1+m*(j-1)):m*j))
        for i in 1:s, j in 1:s
    ];
    # fetch the corresponding lo-res vector out of `y`
    a4 = a3 - y[:, :, t];
    # now add into the effective constraint
    totals[t] = reduce(+, (a4[i,j] * a4[i,j] for i in 1:s, j in 1:s));
end;  # for t

# build and return final constraint
total = reduce(+, totals);

Thanks for the code; I might get a chance to try it later. But just looking at it, it looks like totals[1] never gets assigned; is that an error? Also, I think sum will be faster than reduce(+, ...) (assuming it’s not a MethodError or something).

Hello @ericphanson

It’s an error: the for loop on line 44 should read for t in 1:N
(In the real code, t=1 is a special case but that does not matter here.)

I think sum will be faster than reduce(+, ...)

I had tried both and didn’t notice any significant speedup (but I wasn’t looking there either). I settled on reduce(+, ...) because in some cases I was getting some undefined methods with sum(). I am not sure what the difference between the two is supposed to be…


Hi @riccardomurri,

I think I have good news :slight_smile:. I was able to optimize your 2D convolution quite a bit, and now it seems pretty fast. It’s a bit unfortunate that, while naive loops are generally fast in Julia, things in Convex.jl are implemented in a vectorized way, and sometimes one needs to employ tricks to get good performance. Now with N=10, m=10, r=2, the code runs in 15 seconds on my laptop, whereas before it did not finish in 10 minutes. But of course, we aren’t even solving anything yet, so maybe the performance will still not be good enough. I also haven’t improved the hashing time yet; I’m actually a bit more interested in the long-term solution there of not needing special hashing in the first place, though if there is a good short-term fix, I’d take that too!

Step 1: Write a 2D convolution in terms of a 1D one

We will start with a numerical expression that doesn’t yet work with Convex.jl variables.

# From `test/definitions.jl` in Convex.jl
# a 1D convolution
function conv1D(h, x)
    m = length(h)
    n = length(x)
    zero_pad_x(i) = 1 <= i <= n ? x[i] : 0
    [ sum(h[j]*zero_pad_x(i-j+1) for j = 1:m) for i = 1:m+n-1  ]
end
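# Quick sanity check of `conv1D` (my addition, not from the original post):
# convolving [1, 2] with [1, 1, 1] by hand gives [1, 3, 3, 2].
@assert conv1D([1, 2], [1, 1, 1]) == [1, 3, 3, 2]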

# Perform a 2D convolution in terms of a 1D one
# Following https://sites.ualberta.ca/~mostafan/Files/Papers/md_convolution_TLE2009.pdf
function conv2D(X::AbstractMatrix, Y::AbstractMatrix)
    M, N = size(X)
    K, L = size(Y)
    Z_rows = M + K - 1
    Z_cols = N + L -1

    X_padded = zeros(Z_rows, Z_cols)
    X_padded[1:size(X, 1), 1:size(X,2)] .= X
    X_vector = vec(X_padded)[1: (N-1)*(M+K-1) + M]

    Y_padded = zeros(Z_rows, Z_cols)
    Y_padded[1:size(Y, 1), 1:size(Y,2)] .= Y
    Y_vector = vec(Y_padded)[1: ( L-1)*(M+K-1) + K]

    Z_vector = conv1D(X_vector, Y_vector)
    Z = reshape(Z_vector, Z_rows, Z_cols)
    Z[M:K, N:L]
end

Now, let’s check this works against your conv2d:


function _conv2D(Q, x)
    rows_Q, cols_Q = size(Q);
    rows_x, cols_x = size(x);
    i_min = rows_Q + 1;  # such that i-h>=1 when i=i_min and h=rows_Q
    j_min = cols_Q + 1;  # such that j-k>=1 when j=j_min and k=cols_Q
    i_max = rows_x + 1;  # such that i-h<=rows_x when i=i_max and h=1

    # this was corrected from `j_max = rows_x + 1`
    j_max = cols_x + 1;
    #

    return [
        reduce(+, (Q[h,k]*x[i-h, j-k]
                   for h in 1:rows_Q, k in 1:cols_Q))
        for i in i_min:i_max, j in j_min:j_max
    ];
end

using Test
@testset "conv2D implementation" begin
    for (M,N,K,L) in [ rand(2:100, 4) for _ = 1:5]
        Q = rand(M, N)
        x = rand(K, L)
        @test _conv2D(Q, x) ≈ conv2D(Q,x)
    end
end
Test Summary:         | Pass  Total
conv2D implementation |    5      5

Step 2: modify our 2D convolution so it can be used with Convex.jl variables

using Convex

"""
    Convex_conv2D(X::AbstractMatrix, Y::Convex.AbstractExpr) -> Z, constraints

Produces the 2D convolution `Z` of `X` and `Y`, along with a set
of constraints that must be added to the problem.
"""
function Convex_conv2D(X::AbstractMatrix, Y::Convex.AbstractExpr)
    M, N = size(X)
    K, L = size(Y)
    Z_rows = M + K - 1
    Z_cols = N + L -1


    X_padded = zeros(Z_rows, Z_cols)
    X_padded[1:size(X, 1), 1:size(X,2)] .= X
    X_vector = vec(X_padded)[1: (N-1)*(M+K-1) + M]

    Y_padded = Variable(Z_rows, Z_cols)
    constraints = [
        Y_padded[1:size(Y, 1), 1:size(Y,2)] == Y,
        Y_padded[size(Y,1)+1:end, :] == 0,
        Y_padded[1:size(Y,1), size(Y,2)+1:end] == 0,
    ]
    Y_vector = vec(Y_padded)[1: ( L-1)*(M+K-1) + K]

    Z_vector = Convex.conv(X_vector, Y_vector)
    Z = reshape(Z_vector, Z_rows, Z_cols)
    Z[M:K, N:L], constraints
end

Note our 2D convolution produces the expression, but also a vector of constraints we need to add to the problem later on.
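For example, here is a hedged sketch of how I would wire it into a problem (the sizes and objective are arbitrary placeholders, not from the original code):

X = rand(3, 3)                      # fixed kernel
Y = Variable(5, 5)                  # Convex.jl variable
Z, cs = Convex_conv2D(X, Y)
p = minimize(sum(Z), [cs; Y >= 0])  # remember to pass the extra constraints along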

Step 3: Optimization

We notice that the 1D convolution in Convex.jl’s source creates a sparse matrix based on the first argument of the convolution. Since this doesn’t change over the course of the loop, we can just create it once and reuse it. We can also create it much faster than is done in Convex.jl’s source.

using SparseArrays

# The following is modified from Convex.jl's source
"""
    conv1D_matrix(v::AbstractVector, n::Int) -> SparseMatrixCSC

Create a sparse matrix `A` such that if `x` has length `n`,
then we have `A * x ≈ conv1D(v, x)`.
"""
function conv1D_matrix(v::AbstractVector, n::Int)
    m = length(v)
    # It is much more efficient to construct sparse matrices
    # this way rather than starting from `spzeros` and indexing into it.
    # I should make a PR to Convex.jl to fix this there.
    I = Int[]
    J = Int[]
    V = eltype(v)[]
    for i = 1:n
        for (ind,j) = enumerate(i:m+i-1: i)
            push!(I, i)
            push!(J, j)
            push!(V, v[ind])
        end
    end
    A = sparse(I,J,V, m + n - 1, n)
    return A
end

# A helper to create the correct matrix that performs
# the 1D convolution
function make_A(Q, sz)
    M, N = size(Q)
    K, L = sz
    Z_rows = M+K-1
    Z_cols = N + L -1
    Qpad = zeros(Z_rows, Z_cols)
    Qpad[1:size(Q, 1), 1:size(Q,2)] .= Q
    Qv = vec(Qpad)[1: (N-1)*(M+K-1) + M]
    conv1D_matrix(Qv, (L-1)*(M+K-1) + K)
end

# Use the `A` matrix. Note we only depend on the
# size of the first argument now, not the value,
# since that is encoded in the `A` matrix.
function Convex_conv2D_matrix(X_size::Tuple, Y::Convex.AbstractExpr, A::AbstractMatrix)
    M, N = X_size
    K, L = size(Y)
    Z_rows = M + K - 1
    Z_cols = N + L -1


    Y_padded = Variable(Z_rows, Z_cols)
    constraints = [
        Y_padded[1:size(Y, 1), 1:size(Y,2)] == Y,
        Y_padded[size(Y,1)+1:end, :] == 0,
        Y_padded[1:size(Y,1), size(Y,2)+1:end] == 0,
    ]
    Y_vector = vec(Y_padded)[1: ( L-1)*(M+K-1) + K]

    Z_vector = A * Y_vector
    Z = reshape(Z_vector, Z_rows, Z_cols)
    Z[M:K, N:L], constraints
end

Step 4: put it all together

In the following, I’ve kept my TimerOutputs logging which I used originally to see that the convolution was the bottleneck.

using SparseArrays: sprand;
using Dates: now;
using TimerOutputs

function make_constraints(N, r, m)
    to = TimerOutput()

    s = 4*r + 1;
    x = [Variable(m*s+2r, m*s+2r) for _ in 1:N];
    y = zeros(s,s,N);
    # for demoing, we can assume `y` is constant in time
    for t = 1:N
        y[:, :, t] .= sprand(Float64, s, s, 0.15);
    end

    @timeit to "make Q" begin
        Q = zeros(2r+1,2r+1);
        for i=1:(2r+1), j=1:(2r+1)
            d = (i-r-1)^2 + (j-r-1)^2
            f = r/3;
            Q[i,j] = exp(-4*log(2)*d/f^2)
        end
    end
    @timeit to "make A" A = make_A(Q, (m*s+2r, m*s+2r))


    ± = [-1.0, +1.0];  # well, Julia does allow crazy variable names
    totals = Array{Any}(undef, N);
    constraints = Constraint[]
    extra_constraints = Constraint[]
    for t in 1:N
        @info "Starting computation of constraint A[$(t)] at $(now()) ..."
        # 1. introduce sign perturbation `R`
        @timeit to "a1" a1 = (rand(±, size(x[t])) .* x[t]) + x[1];

        @timeit to "a2" a2, cs = Convex_conv2D_matrix(size(Q), a1, A);
        append!(constraints, cs)

        # 3. project into lo-res space
        @timeit to "a3" a3 = [
            sum( (a2[I,J]
                    for I in (1+m*(i-1)):m*i,
                    J in (1+m*(j-1)):m*j))
            for i in 1:s, j in 1:s
        ];
        # fetch the corresponding lo-res vector out of `y`
        @timeit to "a4" a4 = a3 - y[:, :, t];
        # now add into the effective constraint
        @timeit to "totals" totals[t] = sum((a4[i,j] * a4[i,j] for i in 1:s, j in 1:s));
    end;  # for t

    # build and return final constraint
    @timeit to "total" total = sum(totals);
    print_timer(to)
    return total, constraints
end

Note that this produces total, like the original code, but also a vector of constraints that must be added to the problem.

Step 5: compare vs original

Lastly, we can compare versus the original version

function make_constraints_original(N, r, m)
    s = 4*r + 1;
    x = [Variable(m*s+2r, m*s+2r) for _ in 1:N];
    y = zeros(s,s,N);
    # for demoing, we can assume `y` is constant in time
    for t = 1:N
        y[:, :, t] .= sprand(Float64, s, s, 0.15);
    end

    Q = zeros(2r+1,2r+1);
    for i=1:(2r+1), j=1:(2r+1)
        d = (i-r-1)^2 + (j-r-1)^2
        f = r/3;
        Q[i,j] = exp(-4*log(2)*d/f^2)
    end

    function conv2d_original(Q, x)
        rows_Q, cols_Q = size(Q);
        rows_x, cols_x = size(x);
        i_min = rows_Q + 1;  # such that i-h>=1 when i=i_min and h=rows_Q
        j_min = cols_Q + 1;  # such that j-k>=1 when j=j_min and k=cols_Q
        i_max = rows_x + 1;  # such that i-h<=rows_x when i=i_max and h=1
        j_max = rows_x + 1;  # such that j-k<=rows_x when j=j_max and k=1
        return [
            reduce(+, (Q[h,k]*x[i-h, j-k]
                       for h in 1:rows_Q, k in 1:cols_Q))
            for i in i_min:i_max, j in j_min:j_max
        ];
    end

    ± = [-1.0, +1.0];  # well, Julia does allow crazy variable names
    totals = Array{Any}(undef, N);
    for t in 1:N
        @info "Starting computation of constraint A[$(t)] at $(now()) ..."
        # 1. introduce sign perturbation `R`
        a1 = (rand(±, size(x[t])) .* x[t]) + x[1];
        # 2. 2D-convolution with PSF matrix `Q`
        a2 = conv2d_original(Q, a1);
        # 3. project into lo-res space
        a3 = [
            reduce(+, (a2[I,J]
                       for I in (1+m*(i-1)):m*i,
                       J in (1+m*(j-1)):m*j))
            for i in 1:s, j in 1:s
        ];
        # fetch the corresponding lo-res vector out of `y`
        a4 = a3 - y[:, :, t];
        # now add into the effective constraint
        totals[t] = reduce(+, (a4[i,j] * a4[i,j] for i in 1:s, j in 1:s));
    end;  # for t

    # build and return final constraint
    total = reduce(+, totals);
end

Then

N = 2
r = 5
m = 2
@time make_constraints_original(N, r, m);
@time make_constraints(N, r, m);

gives me 17 seconds vs 2.5. More importantly, setting N=r=10, the second completes in 15 seconds.

Now, I hope I haven’t made any errors during this optimization. I tested my numerical 2D convolution, as shown above, but I don’t have a full optimization problem to compare the two more fully. So please test it (on small problems) against your original code and let me know if it works.

If it does work, it might be nice to add something like this to Convex.jl. The bit where there are extra constraints to carry around makes for a slightly awkward API, but luckily Pull Request #358 (`AbstractVariable` interface; allow variables to carry constraints, jump-dev/Convex.jl, GitHub) will solve that by allowing us to tell Z to carry its own constraints. This actually makes for some nice motivation for that feature.


@ericphanson many many thanks for this alternate implementation of convolution: with your code, running time goes down from ~14 hours to 11 minutes! Amazing!

(I am still using my modified hash code, but it’s probably not impacting much now.)

Thank you, thank you, thank you very much!

Awesome, I’m glad to hear it!

It looks to me like there is a typo in function conv1D_matrix: the line for (ind,j) = enumerate(i:m+i-1: i) always iterates only once (from i to i).

I have been using this implementation instead, which I also think is marginally faster:

"""
conv1D_matrix(h::AbstractVector, n::Int) -> SparseMatrixCSC

Create a sparse matrix `A` such that if `x` has length `n`,
then we have `A * x ≈ conv1d(h, x)`.
"""
function conv1D_matrix(h::AbstractVector, n::Int)
    m = length(h);

    # It is much more efficient to construct sparse matrices
    # this way rather than starting from `spzeros` and indexing into it.
    Is = Int[];
    Js = Int[];
    Vs = eltype(h)[];
    # build matrix by columns
    for j in 1:n
        append!(Is, j:(j+m-1));
        append!(Js, repeat([j], m));
        append!(Vs, h);
    end
    return sparse(Is, Js, Vs, m+n-1, n);
end  # make_conv1D_matrix
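# Quick check (my addition): with `conv1D` from earlier in the thread in scope,
# the matrix reproduces the 1D convolution.
h = rand(4); x = rand(7);
@assert conv1D_matrix(h, length(x)) * x ≈ conv1D(h, x)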

Oh! Drats, I hope that didn’t waste too much of your time; I should’ve tested that. I think the final `: i` is spurious there. Your solution looks better anyway, though! Thanks for letting me know.