Well for one, I’d recommend not benchmarking in global scope.
Performance critical code should be inside a function
Any code that is performance critical should be inside a function. Code inside functions tends to run much faster than top level code, due to how Julia’s compiler works.
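As a rough sketch of what that means for your benchmark (the function `sum_squares` and the input size are made up for illustration, not taken from your code), you could wrap the work in a function, call it once so compilation is out of the way, and only then time it:

```julia
# Hypothetical workload, only here to illustrate the pattern.
function sum_squares(xs)
    total = zero(eltype(xs))
    for x in xs
        total += x * x
    end
    return total
end

xs = rand(10^6)

sum_squares(xs)               # first call includes compilation, so don't time it
t = @elapsed sum_squares(xs)  # time a second, already compiled call
println("elapsed: ", t, " s")
```

Passing `xs` as an argument matters here: inside the function the compiler knows its concrete type, which it cannot assume for a non-constant global.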
Your code also seems to allocate a lot of (I think) unnecessary intermediate variables, and allocation is usually a little more expensive on Windows. As far as I know, that is partly because syscalls on Windows are a little more expensive. It’s hard to say though, since the timings are relatively close.
I can’t say for sure that this explains the difference you observe, but it’s very possible that allocation and the internal behavior of malloc on Windows make a difference here.
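To check whether allocations are a factor, you could compare an allocating version against one that avoids temporaries. This is only a sketch with made-up functions (`scaled_diff_alloc` and `scaled_diff_loop` are hypothetical, not your code), using `@allocated` from Base:

```julia
# Allocating version: each step creates a temporary array.
function scaled_diff_alloc(a, b)
    d = a - b      # temporary array
    s = d * 2.0    # another temporary array
    return sum(s)
end

# Same computation in a single loop, with no intermediate arrays.
function scaled_diff_loop(a, b)
    total = 0.0
    for i in eachindex(a, b)
        total += (a[i] - b[i]) * 2.0
    end
    return total
end

a = rand(1000)
b = rand(1000)

# Warm up both functions so compilation is not counted.
scaled_diff_alloc(a, b)
scaled_diff_loop(a, b)

println("bytes allocated (temporaries): ", @allocated scaled_diff_alloc(a, b))
println("bytes allocated (loop):        ", @allocated scaled_diff_loop(a, b))
```

If the gap between the two platforms shrinks once the temporaries are gone, that would point towards allocation cost rather than the computation itself.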