The problem with minimizing the sum of the residuals is that, by linearity, it’s numerically the same as
minimize(sum(model_intensity) - sum(data_obs))
So you’re effectively collapsing each image down to a single scalar (its total intensity) and then comparing those scalars, which discards all per-pixel information. You probably want to minimize the mean squared error (which corresponds to a Normal likelihood) or the mean absolute error (which corresponds to a Laplace likelihood) instead.
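A small sketch of the failure mode, assuming NumPy arrays (the images here are made-up 2×2 examples, not your data): two different images with the same total intensity give a sum-based loss of zero, while the MSE correctly flags the mismatch.

```python
import numpy as np

# Hypothetical example: a "model" image that is wrong pixel-by-pixel
# but happens to have the same total intensity as the data.
data_obs = np.array([[1.0, 3.0],
                     [2.0, 2.0]])
model_intensity = np.array([[3.0, 1.0],
                            [2.0, 2.0]])  # same sum (8.0), different pixels

# Sum-based loss collapses each image to one scalar first,
# so it cannot distinguish this model from a perfect fit.
sum_loss = abs(model_intensity.sum() - data_obs.sum())
print(sum_loss)  # 0.0

# MSE compares pixel-by-pixel and sees the error.
mse = np.mean((model_intensity - data_obs) ** 2)
print(mse)  # 2.0
```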
Looks correct to me. Up to an additive constant and a positive scale factor, MSE is the negative log-likelihood of a Normal distribution with fixed variance, and those transformations don’t move the minimizer. Your code says that each data_obs[i] is normally distributed with mean given by your model, which is a reasonable assumption.
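A quick numerical check of that equivalence, assuming σ = 1 and synthetic `data_obs` (both are illustrative choices, not from your setup): the Gaussian negative log-likelihood equals a constant plus (n / 2σ²) times the MSE, so minimizing one minimizes the other.

```python
import numpy as np

rng = np.random.default_rng(0)
data_obs = rng.normal(loc=5.0, scale=1.0, size=100)  # synthetic data
n = data_obs.size

def mse(mu):
    return np.mean((data_obs - mu) ** 2)

def gaussian_nll(mu, sigma=1.0):
    # Negative log-likelihood of Normal(mu, sigma^2) for all observations.
    return (0.5 * n * np.log(2 * np.pi * sigma**2)
            + np.sum((data_obs - mu) ** 2) / (2 * sigma**2))

# With sigma = 1: NLL(mu) = 0.5 * n * log(2*pi) + (n / 2) * MSE(mu).
const = 0.5 * n * np.log(2 * np.pi)
for mu in (4.0, 5.0, 6.0):
    assert np.isclose(gaussian_nll(mu), const + (n / 2) * mse(mu))
print("NLL and MSE agree up to an affine transform")
```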