Test 40xN matmuls, where N is the batch size in the sampling, i.e. the number of samples. So, for example, 40x100.
There is no maximum order, though physical equations rarely go above 4, and that is roughly where most of the hard-coded extra optimizations stop. There is also no limit on the number of equations, and cost generally scales linearly with that. What you do pay for is larger layer size, where cost scales quadratically (or, if the number of samples grows along with it, think cubically), which is a fundamental limitation coming from the matmuls in the neural networks.
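A minimal sketch of the cost claim above, using NumPy (the width 40 and batch size 100 are just the example numbers from this thread, not fixed values): evaluating a dense layer of width 40 on N sample points at once is a single 40xN matmul, so per-layer cost is quadratic in the width and linear in the sample count.

```python
import numpy as np

width = 40   # hidden-layer width (the "40" in 40xN)
N = 100      # batch size = number of sampled points

# Applying one dense layer to a batch of N collocation points is
# one (width x width) @ (width x N) matmul.
W = np.random.rand(width, width)
X = np.random.rand(width, N)
Y = W @ X
print(Y.shape)  # (40, 100)

# Flop count per layer is ~ 2 * width^2 * N: quadratic in layer
# size, linear in the number of samples. Growing both together is
# where the cubic-feeling scaling comes from.
```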
GridTraining does not overcome the curse of dimensionality, and because it never hits random points it tends not to give great results between the grid points; it also converges more slowly than a quasi-random low-discrepancy sampler.
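To see the curse-of-dimensionality point concretely, here is a small sketch using SciPy's Sobol sequence as a stand-in low-discrepancy sampler (the dimension 6 and resolution 10 are illustrative choices, not anything from the library): a tensor grid's point count explodes as k^d, while a quasi-random sampler takes whatever sample count you ask for, independent of dimension.

```python
from scipy.stats import qmc

d = 6   # number of independent variables in the PDE (illustrative)
k = 10  # grid resolution per axis (illustrative)

# GridTraining-style tensor grid: point count blows up as k^d.
grid_points = k ** d
print(grid_points)  # 1000000

# A quasi-random low-discrepancy sampler (Sobol here) just takes
# the sample count directly; 1024 points covers [0, 1)^6 far more
# cheaply, and with no regular gaps between grid lines.
sampler = qmc.Sobol(d=d, scramble=True, seed=0)
samples = sampler.random(1024)  # 2^10 points, power of 2 for Sobol
print(samples.shape)  # (1024, 6)
```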