To nitpick, ρ^2 * Wapp^-1 is not the same as Wapp \ ρ^2, but should be translated to ρ^2 / Wapp. Inside the determinant, it doesn’t matter much, though it leads to some floating point differences.
For performance (in addition to my doubts about the parallelization), there are many things that can be sped up, the most glaring one is perhaps
where the products J' * Si * J etc. are performed over and over, instead of doing them once outside the loop. Also slicing along rows is slower than along columns, and the slices also allocate unnecessary temporary arrays, and the entire matrices X11, etc. are zeroed out for each iteration, even though only one row has been touched.
So there’s good news: at least GHT should be possible to speed up a lot ![]()