In fact, this was not correct. Minimizing $\Vert Y - Z B X^T \Vert$ (or the equivalent expression with Kronecker products) penalizes every element of $Z B X^T$, including the off-diagonal ones, when in fact you only want to minimize the differences in the diagonal elements.
So, @mikmoore’s solution is actually not applicable here: it solves the wrong problem (my fault), and it is also too slow (because $\operatorname{diagm}(y)$ is too big here).
The iterative algorithm (above) as well as the direct algorithm (above) solve the correct problem, though.
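To spell out the difference, here is a sketch under a few assumptions of mine: that $Y = \operatorname{diagm}(y)$, that the norm is the Frobenius norm, and that $z_i^T$ and $x_i^T$ denote the $i$-th rows of $Z$ and $X$ (notation introduced here, not taken from the posts above). Since $(Z B X^T)_{ij} = z_i^T B x_j$, the incorrect objective expands to

$$
\Vert Y - Z B X^T \Vert_F^2 = \sum_i \left( y_i - z_i^T B x_i \right)^2 + \sum_{i \ne j} \left( z_i^T B x_j \right)^2
$$

whereas the intended problem keeps only the first sum:

$$
\min_B \sum_i \left( y_i - z_i^T B x_i \right)^2
$$

The second sum is what goes wrong: it pushes every cross term $z_i^T B x_j$ with $i \ne j$ toward zero, which the original fit never asked for.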