Why is `TaskLocalRNG` faster than `Xoshiro` with multiple threads?

They are basically the same. Also, I don't think `TaskLocalRNG` is actually faster: if you look at the minimum time, the explicit `Xoshiro` is faster.
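If you want to check the minimum times yourself, here is a rough sketch of how I would compare them. It is not the benchmark from the original post: the helper names (`fill_tasklocal!`, `fill_xoshiro!`, `min_time`), the array size, the seed, and the repetition count are all made up for illustration, and it assumes Julia 1.7 or later (where `Xoshiro` and `TaskLocalRNG` exist) and a non-empty output array.

```julia
using Random

# Each spawned task fills its chunk using its own task-local RNG.
function fill_tasklocal!(out::Vector{Float64}, nchunks::Int)
    chunks = Iterators.partition(eachindex(out), cld(length(out), nchunks))
    @sync for idx in chunks
        Threads.@spawn begin
            rng = TaskLocalRNG()
            for i in idx
                out[i] = rand(rng)
            end
        end
    end
    return out
end

# Each spawned task fills its chunk using an explicit Xoshiro instance.
# Seeding with `seed + k` is an arbitrary choice for this sketch.
function fill_xoshiro!(out::Vector{Float64}, nchunks::Int, seed::Int)
    chunks = Iterators.partition(eachindex(out), cld(length(out), nchunks))
    @sync for (k, idx) in enumerate(chunks)
        Threads.@spawn begin
            rng = Xoshiro(seed + k)
            for i in idx
                out[i] = rand(rng)
            end
        end
    end
    return out
end

# Minimum wall-clock time over `reps` runs; the minimum discards the
# first run (which includes compilation) and most scheduling noise.
min_time(f; reps = 20) = minimum(@elapsed(f()) for _ in 1:reps)

out = Vector{Float64}(undef, 1_000_000)
n = Threads.nthreads()

t_local = min_time(() -> fill_tasklocal!(out, n))
t_xoshiro = min_time(() -> fill_xoshiro!(out, n, 1234))

println("TaskLocalRNG min time: ", t_local, " s")
println("Xoshiro min time:      ", t_xoshiro, " s")
```

Start Julia with several threads (e.g. `julia -t 4`) before running this. The point is only to compare minimum times rather than means or medians; which one comes out ahead will depend on your machine and Julia version.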