libLLVM was compiled against a GCC configured without thread local storage on Windows. After updating to GCC 12, we found missing symbols (e.g. _ZSt14__once_functor) that libLLVM needs, as documented in the following pull request.
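For context on that symbol: `_ZSt14__once_functor` demangles to `std::__once_functor`, and as far as I understand it is what libstdc++ exports to support `std::call_once` when it is configured without TLS. Here is a minimal sketch of the kind of code that relies on that machinery. The helper name `count_once_initializations` is made up for illustration and is not taken from libLLVM.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: start `n_threads` threads that all race on a single
// std::call_once and return how many times the initializer actually ran.
int count_once_initializations(int n_threads) {
    std::once_flag flag;
    int init_calls = 0;  // only written inside call_once, so no extra locking is needed
    std::vector<std::thread> workers;
    for (int i = 0; i < n_threads; ++i) {
        workers.emplace_back([&flag, &init_calls] {
            std::call_once(flag, [&init_calls] { ++init_calls; });
        });
    }
    for (auto& w : workers) w.join();  // join makes init_calls safe to read below
    return init_calls;
}
```

On a working toolchain the initializer should run exactly once no matter how many threads are started. A binary built from code like this against a non-TLS libstdc++ would, I assume, be the kind that needs the missing symbol at load time.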
To restore those symbols, we had to explicitly disable thread local storage, in part due to an upstream change in GCC 12. However, explicitly disabling thread local storage in GCC 12 may have broken libgomp, or at least how other libraries (e.g. XGBoost) interact with libgomp. So far, substituting a libgomp built with thread local storage appears to resolve the issue for those packages.
One potential solution, still unproven and complicated to test, is to simply enable GCC’s thread local storage on Windows, which does have native support for thread local storage. This would involve, in part, rebuilding libLLVM against a GCC with thread local storage enabled.
Is there a reason we have disabled thread local storage for LLVM on Windows? (My guess is no. That’s just how CompilerSupportLibraries_jll was built in the past)
Could we enable thread local storage support on Windows and still have libLLVM function correctly?
To move in this direction, some additional work is needed to prove that thread local storage support is actually the issue, but it is the current lead suspect.
I don’t have answers to those questions; my best guess is: probably?
It sounds like we compiled libLLVM against a libstdc++ from GCC, and that baked in the fact that TLS was disabled. So yes, a “simple” recompilation might work?
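One cheap way to see what a given libstdc++ has "baked in" is to look at its configuration macros. The sketch below assumes that `_GLIBCXX_HAVE_TLS` is the macro libstdc++ defines when it was configured with TLS support. The function name `libstdcxx_tls_status` is hypothetical. It only reports on the headers of the compiler used to build it, not on an already-built libLLVM.

```cpp
#include <cstddef>  // any libstdc++ header pulls in its configuration macros

// Hypothetical helper: describe how the libstdc++ headers in use were configured.
const char* libstdcxx_tls_status() {
#if !defined(__GLIBCXX__)
    return "not libstdc++";
#elif defined(_GLIBCXX_HAVE_TLS)
    return "libstdc++ configured with TLS";
#else
    return "libstdc++ configured without TLS";
#endif
}
```

Printing the returned string from a small program built with each toolchain (the old GCC, GCC 12 with TLS disabled, GCC 12 with TLS enabled) would at least confirm which configuration each one exposes to code compiled against it.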
In the absence of understanding the precise issue, what do you think about the utility of bumping the version of the MinGW-w64 release from v7 (2019-11-10) to v11 (2023-04-29)?
I’m using IDA to do debugging, and it’s assigning the critical error to a different location: _ZNSt12__shared_ptrIN7xgboost10SparsePageELN9__gnu_cxx12_Lock_policyE2EEC2ISaIS1_EJEEESt19_Sp_make_shared_tagRKT_DpOT0_.isra.751+0xE6
Interestingly, this path goes through 00007FF92124323F libgcc_s_seh-1.dll libgcc_s_seh-1___emutls_get_address+1EF, again suggesting possible involvement of TLS.
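For anyone unfamiliar with why `__emutls_get_address` would show up: when GCC uses emulated TLS, accesses to `thread_local` variables are routed through that libgcc function instead of native TLS instructions. A minimal sketch of code that exercises this path follows. The names `tls_counter` and `sum_thread_local_counts` are made up for illustration.

```cpp
#include <thread>
#include <vector>

// Each thread gets its own copy of this counter, starting at 0.
thread_local int tls_counter = 0;

// Hypothetical helper: bump the thread-local counter `iterations` times in each
// of `n_threads` threads and return the sum of the per-thread totals.
long sum_thread_local_counts(int n_threads, int iterations) {
    std::vector<long> results(n_threads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&results, t, iterations] {
            for (int i = 0; i < iterations; ++i) ++tls_counter;
            results[t] = tls_counter;  // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();
    long total = 0;
    for (long r : results) total += r;
    return total;
}
```

If TLS works, every thread sees its own counter, so the total is `n_threads * iterations` and the calling thread's copy is left untouched. If the counts ever disagree, or the program crashes inside libgcc, that would point fairly directly at the TLS setup.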
We need someone to show whether the bug can be recreated in C++ using the libraries shipped with Julia. If we can show this is not specifically a Julia issue, we stand a better chance of getting help from upstream.
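As a possible starting point for such a reproducer, here is a rough sketch that mirrors the frame in the backtrace: several threads calling `std::make_shared` at once. `Page` is a hypothetical stand-in for `xgboost::SparsePage`, and `allocate_pages_concurrently` is a made-up name. I have no evidence yet that this triggers the crash. It would need to be built with the same MinGW toolchain and linked against the runtime libraries shipped with Julia to be meaningful.

```cpp
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for xgboost::SparsePage; the real class is much larger.
struct Page {
    std::vector<float> data;
    Page() : data(1024, 0.0f) {}
};

// Hypothetical helper: every thread allocates pages with std::make_shared,
// the standard-library entry point named in the backtrace.
// Returns the total number of pages that were created successfully.
std::size_t allocate_pages_concurrently(int n_threads, int pages_per_thread) {
    std::vector<std::size_t> created(n_threads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&created, t, pages_per_thread] {
            for (int i = 0; i < pages_per_thread; ++i) {
                auto page = std::make_shared<Page>();
                if (page && page->data.size() == 1024) ++created[t];
            }
        });
    }
    for (auto& w : workers) w.join();
    std::size_t total = 0;
    for (std::size_t c : created) total += c;
    return total;
}
```

On a healthy toolchain this simply returns `n_threads * pages_per_thread`. Since XGBoost goes through libgomp, swapping the `std::thread` workers for an OpenMP parallel loop would be a natural next variation if this version runs cleanly.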