It depends on the kind of problem. Containers would help with the OS or software versions changing, as long as the performance characteristics of the underlying hardware/VM/OS don’t change too much. That would address problems like CI’s copy of gfortran suddenly changing to a version we don’t support, which has happened (and is a big pain) but isn’t that common. So containers would help a bit for Linux CI. On the other hand, I don’t know of any widely used container solutions that run Windows or macOS inside the container. Yes, you can run Linux containers on those OSes (with significant overhead), but that defeats the purpose of running CI jobs on those operating systems, which is to test that Julia works correctly in those environments. Maybe there’s something better to be done there. If so, suggestions are welcome.
We’ve also had many issues that containers would not have helped with. Services change VMs and our CI jobs suddenly don’t have enough memory, or don’t have enough real cores to finish before the time limit, or the time limit is simply reduced without warning. For a long time (possibly still) we were running on free-tier VMs shared with other open source projects even though we are a paying customer, because CI services aren’t designed to run paid CI on public GitHub repos. In other cases, it has been because we get an open source discount: we pay, but we pay a bit less, and because of that we get shitty free, shared VMs.
There was one case where some kernel configuration on the underlying machine was changed so that it could supposedly run more concurrent CI jobs, but the result for Julia’s CPU-intensive test suite was that we started timing out every time and almost never finishing CI. This caused CI to get restarted a lot, which, of course, uses more compute overall, not less. Jameson carefully diagnosed the issue (I’m not sure how; it was impressive sleuth work) and we reported it to their support staff, but they couldn’t/wouldn’t change the kernel setting. I don’t recall how this got resolved. Probably with us paying them more money. Yet, as this thread shows, paying more money does not seem to solve these problems for long.