The base case is completely unrelated to the cache size. You want the base case to be just large enough that the recursion overhead is negligible in comparison, but much smaller than any cache.
It should therefore probably be `m * n * p <= something`, since the relevant factor is the cost of the base case, which scales like $\Theta(mnp)$. But since I was looking mostly at square matrices, it didn’t matter much exactly how we implemented the criterion, as long as we did a little tuning of the cutoff value.
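To make the criterion concrete, here is a minimal sketch of a recursive multiply that stops recursing once `m * n * p` drops to a cutoff. It is not the code discussed here: the names `add_matmul` and `matmul`, the default `cutoff=64`, and the split-the-largest-dimension rule are all assumptions chosen for illustration, and plain Python lists stand in for real matrices.

```python
def add_matmul(C, A, B, i0, m, k0, n, j0, p, cutoff=64):
    """Accumulate C[i0:i0+m, j0:j0+p] += A[i0:i0+m, k0:k0+n] * B[k0:k0+n, j0:j0+p].

    Hypothetical helper: A is m-by-n, B is n-by-p, C is m-by-p (as sub-blocks).
    """
    # Base case: the work is Theta(m*n*p), so the cutoff is on that product,
    # not on any single dimension. The max(...) check only guards against
    # infinite recursion if someone passes cutoff < 1.
    if m * n * p <= cutoff or max(m, n, p) <= 1:
        for i in range(i0, i0 + m):
            Ci, Ai = C[i], A[i]
            for k in range(k0, k0 + n):
                a, Bk = Ai[k], B[k]
                for j in range(j0, j0 + p):
                    Ci[j] += a * Bk[j]
        return

    # Recursive case: halve the largest of the three dimensions.
    if m >= n and m >= p:
        h = m // 2
        add_matmul(C, A, B, i0, h, k0, n, j0, p, cutoff)
        add_matmul(C, A, B, i0 + h, m - h, k0, n, j0, p, cutoff)
    elif n >= p:
        h = n // 2
        add_matmul(C, A, B, i0, m, k0, h, j0, p, cutoff)
        add_matmul(C, A, B, i0, m, k0 + h, n - h, j0, p, cutoff)
    else:
        h = p // 2
        add_matmul(C, A, B, i0, m, k0, n, j0, h, cutoff)
        add_matmul(C, A, B, i0, m, k0, n, j0 + h, p - h, cutoff)


def matmul(A, B, cutoff=64):
    """Return A * B for non-empty list-of-lists matrices (illustrative wrapper)."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(m)]
    add_matmul(C, A, B, 0, m, 0, n, 0, p, cutoff)
    return C
```

The result does not depend on the cutoff, only the running time does, so the value can be tuned by timing a few candidates on representative sizes. For square matrices a test such as `max(m, n, p) <= something` behaves almost the same, which fits the observation that the exact form of the criterion mattered little there.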