Covariance Optimization for Joint Estimation Using a UKF

Yes. You could learn the best-fitting correlation, or you could assume that the correlation you would get in a double integrator driven by continuous white noise holds, i.e. that Q has a block entry of
\sigma_i^2 \begin{pmatrix} \frac{T_s^3}{3} & \frac{T_s^2}{2} \\ \frac{T_s^2}{2} & T_s \end{pmatrix}
for each z_i,v_i component. Whether the \sigma_i are distinct for each component, or you use a single \sigma for all of them, depends on the specifics of your system.
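As a concrete sketch, the block above can be assembled into a block-diagonal Q, one 2x2 block per (z_i, v_i) pair. The function names `q_block` and `build_Q` are my own illustrative choices, not anything from the original discussion:

```python
import numpy as np

def q_block(sigma, Ts):
    """Discretized process-noise block for one (z_i, v_i) pair of a
    double integrator driven by continuous white noise of intensity sigma^2."""
    return sigma**2 * np.array([[Ts**3 / 3, Ts**2 / 2],
                                [Ts**2 / 2, Ts]])

def build_Q(sigmas, Ts):
    """Block-diagonal Q over all (z_i, v_i) pairs; `sigmas` can hold a
    distinct sigma_i per component or the same value repeated."""
    n = len(sigmas)
    Q = np.zeros((2 * n, 2 * n))
    for i, s in enumerate(sigmas):
        Q[2 * i:2 * i + 2, 2 * i:2 * i + 2] = q_block(s, Ts)
    return Q
```

The zero off-diagonal blocks encode the assumption that the noise inputs of different components are independent; only the position/velocity pair within each component is correlated.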

It depends on what you assume about the evolution of the parameters A, B, C. If they are constant but change at random times, or if they just drift slowly and randomly, there need not be any additional correlations. On the other hand, if they change in response to changes in z, v, then perhaps there "should be". I say "should be" because figuring out what value to assign such a correlation might be hard, and the performance gain from finding the right one might be small.
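For the "slow random drift, no additional correlations" case, a common modeling choice is to give the parameter states an independent random walk: a diagonal block \sigma_p^2 T_s I for the parameters, with zero cross-blocks to z, v. A minimal sketch, with the name `augment_Q` and the single shared \sigma_p being my own illustrative assumptions:

```python
import numpy as np

def augment_Q(Q_state, sigma_p, Ts, n_params):
    """Extend the (z, v) process-noise covariance with an independent
    random-walk block for the parameter states (entries of A, B, C).
    The zero off-diagonal blocks encode 'no additional correlation'."""
    n = Q_state.shape[0]
    Q = np.zeros((n + n_params, n + n_params))
    Q[:n, :n] = Q_state
    Q[n:, n:] = sigma_p**2 * Ts * np.eye(n_params)
    return Q
```

If you believed the parameters were coupled to z, v, you would instead fill those off-diagonal blocks with nonzero values, which is exactly the hard-to-tune correlation discussed above.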

Correct; the parameter in that case is the \sigma_i from above.

The blog post effectively uses number 1: the
\sigma^2 \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
is the noise intensity of a continuous-time process, which when discretized appears either as the expression above, or as the very similar-looking one in the blog post, obtained by instead assuming piecewise-constant noise inputs.
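The two discretizations are easy to compare numerically. Under the continuous white-noise assumption one gets the T_s^3/3 block from above; under the piecewise-constant assumption (noise held fixed over each sample) the standard result is \sigma^2 \Gamma \Gamma^T with \Gamma = (T_s^2/2, T_s)^T. A short sketch; the variable names are mine:

```python
import numpy as np

Ts, sigma = 0.1, 1.0

# Continuous white-noise input, exactly discretized:
Q_cont = sigma**2 * np.array([[Ts**3 / 3, Ts**2 / 2],
                              [Ts**2 / 2, Ts]])

# Piecewise-constant noise held over each sample interval:
Gamma = np.array([[Ts**2 / 2], [Ts]])
Q_pwc = sigma**2 * (Gamma @ Gamma.T)
```

Both matrices are symmetric positive semidefinite and agree to leading order for small T_s; the piecewise-constant version is rank one, while the continuous version has full rank.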