I’m trying to solve a matrix factorization problem with alternating minimization. My data matrix has missing entries, so I cannot simply write my objective function as:
using Convex, SCS

# A is an n x m data matrix
n, m = size(A)
k = 2
X = Variable(n, k)
Y = Variable(k, m)
problem = minimize(sumsquares(A - X*Y), (X*Y) >= 0)
I have come up with two ways to address this issue, but one is very slow and uses too much memory, while the other gives NaN results in a lot of experiments. I describe both ways of writing the objective function below:
Approach 1
objective = 0
for o in observations
    # o is an index tuple (i, j) of an observed entry of A
    objective += square(A[o...] - (X*Y)[o...])
end
where observations is an Array of tuples, each tuple containing the indices of an entry of A that was actually observed, i.e., non-missing.
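For reference, a minimal sketch of how such an array can be built (assuming A is a Matrix{Union{Missing, Float64}}):
# collect the (row, col) index tuples of the non-missing entries of A
observations = [Tuple(idx) for idx in findall(!ismissing, A)]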
This approach uses all the RAM on my computer and is really slow, both when calling problem = minimize(objective, constraints) and every time solve!(problem, SCSSolver()) is called.
Approach 2
B = coalesce.(A, 0.0) # replace missing entries with 0.0 for .* to work
objective = sumsquares(M .* B - M .* (X*Y))
where M is a BitArray{2}, with 1s representing observed values and 0s the missing ones.
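A minimal sketch of how M can be built, again assuming A is a Matrix{Union{Missing, Float64}}:
# 1 where an entry of A was observed, 0 where it is missing
M = .!(ismissing.(A))   # BitArray{2}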
This approach is faster, I suppose because the DCP problem tree has significantly fewer branches. The issue is that running my program with this approach, on the same data and random seed, often gives me a Y matrix full of NaNs or an overall bad approximation.
Does anyone know a better way of writing this problem using Convex.jl?
Both are variables and their product is a matrix, but since I’m using alternating minimization, one is fixed while the other is optimized at each iteration step. My code is more or less like the example in the documentation: Fixing-and-freeing-variables
The difference is that I’m constraining X*Y to be positive, while in the example both have to be positive individually.
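The loop is roughly shaped like this (just a sketch; maxiter and the random initialization of Y are placeholders, and I’m leaving out the convergence check):
X = Variable(n, k)
Y = Variable(k, m)
fix!(Y, rand(k, m))   # start with Y frozen so X*Y is affine in X
problem = minimize(sumsquares(M .* B - M .* (X*Y)), (X*Y) >= 0)
for it in 1:maxiter
    solve!(problem, SCSSolver())   # optimize over X with Y fixed
    fix!(X); free!(Y)
    solve!(problem, SCSSolver())   # optimize over Y with X fixed
    fix!(Y); free!(X)
end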
I’m also using regularizers that I omitted; I thought that part was not relevant for the question.
Another way of stating my main question is: what is the optimal way of writing an objective function in Convex.jl when there is missing data, given that logical indexing is not supported?
Ok, thanks for clarifying. I’m not sure, I haven’t dealt with missing data before. If we find a good solution in this thread we could try to upstream it to Convex.jl to do the transformation automatically.
The scalar-indexing way of doing it in Approach 1 will be slow for Convex.jl, as you’ve seen; Approach 2 looks reasonable to me, and it also looks like it should be doing exactly the same thing as Approach 1, so I don’t really understand why it gives you worse results. It seems like the terms that are missing from the objective should just get set to zero in the optimization, due to your regularization and the constraints.
Does it help if you constrain them to be zero explicitly? E.g. (1 .- M) .* (X*Y) == 0? In other words, that constraint says that if an entry of the mask isn’t 1, the corresponding element of X*Y must be zero.
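I.e., something like this (a sketch, reusing your B and M from Approach 2):
# force the unobserved entries of X*Y to be exactly zero
constraints = [(X*Y) >= 0, (1 .- M) .* (X*Y) == 0]
problem = minimize(sumsquares(M .* B - M .* (X*Y)), constraints)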
I’m trying to achieve something like what is done in the LowRankModels package, but I need to constrain (X*Y) >= 0, which I believe is not currently possible with what they offer.
While the random seed is the same, I don’t know if both results should be identical here, since there are a lot of floating point operations involved…
Huh, those do look fairly different; e.g., the [5,1] entry looks weird in the second case (44 instead of like 80). Are you sure the mask is correct? Also, shouldn’t just B be enough in the objective instead of M .* B? (Since B is already the masked version of A). I guess B ≈ M .* B could be a test to check your mask.
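For instance, a quick sanity-check sketch:
# B should have no nonzeros outside the mask, and the mask
# should never mark a missing entry of A as observed
@assert B ≈ M .* B
@assert !any(M .& ismissing.(A))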
I think the main difference between the approaches is that in the first one the solver completely ignores, in the loss function, whatever is not in observations, while in the second those terms exist in the function but are always multiplied by 0. I don’t know if, with floats, this could affect things.
So ideally we would want something like the first approach but with the speed of the second. I tried logical indexing, but the package doesn’t support it.
I think that really shouldn’t affect things unless there are infinite terms. Internally, Convex.jl indexes by masking too, which is why it’s slow when you have to do it for each term in the matrix, instead of just once like in your Approach 2.
So I’m not really sure what’s wrong. Maybe with the large problems the solver isn’t converging to tolerance, or thinks the problem is infeasible, etc., causing the NaNs. That wouldn’t explain the difference in the small problem, though.
I would try to make the two solves as comparable as possible for the small problem, between Approaches 1 and 2. For example, I’d compare the results after just one iteration of your alternating procedure and try to figure out why they give different results; maybe see how many iterations the solver does in each case, etc. And double-check that the starting values are the same too.
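For instance, something like this (a sketch; problem1 and problem2 stand for the two formulations, and you may want to double-check the exact SCSSolver keyword names for your SCS version):
# same solver settings and verbose output for both formulations
solver = SCSSolver(verbose=1, max_iters=20000, eps=1e-6)
solve!(problem1, solver)
solve!(problem2, solver)
@show problem1.status problem2.status
@show problem1.optval problem2.optval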
Regarding your question before: the mask also considers the validation entries, so I really do need to do M .* B. I edited my previous reply to that quote because of that.
However, after implementing this in Matlab+CVX, Python+CVXPY, and Julia+Convex.jl, I think that having to write
norm(vec(M .* B - M .* (X*Y)), 2)
in Julia, versus
cp.Minimize(cp.norm(A[M] - cp.matmul(X, Y)[M]))
in Python, and versus
expression Z
Z = X*Y
minimize(norm(A(M) - Z(M), 'fro'))
in Matlab, has a significant negative impact on memory consumption in Julia. To the point that, for the same data, I can run the algorithm using Matlab or Python on a computer with 8GB of RAM, while being unable to do it using Julia on a 16GB computer.