Found the bug.
There was an extra 0.5 multiplication in two places which caused the CN code to deviate from the MOL code.
This line
rhs[1] = (Double64(1.0) + dt_half * L_T1_coeff_T1) * T[1] + (dt_half * L_T1_coeff_T2) * T[2] + dt_half * L_T1_const
should be
rhs[1] = (Double64(1.0) + dt_half * L_T1_coeff_T1) * T[1] + (dt_half * L_T1_coeff_T2) * T[2] + dt * L_T1_const
and similarly, this line
rhs[n_r] = (dt_half * L_Tn_coeff_Tm1) * T[n_r-1] + (Double64(1.0) + dt_half * L_Tn_coeff_Tn) * T[n_r] + dt_half * L_Tn_const
should be
rhs[n_r] = (dt_half * L_Tn_coeff_Tm1) * T[n_r-1] + (Double64(1.0) + dt_half * L_Tn_coeff_Tn) * T[n_r] + dt * L_Tn_const
The results are now identical to machine precision.