Multithreading using FLoops for updating discrete distribution

Yeah, JuliaFolds APIs are meant to produce deterministic results by default if you use the same nthreads (or manually specify basesize). There are also options to weaken this if you need the last bit of performance.

The main “algorithm” is stupid-simple: https://github.com/JuliaFolds/Transducers.jl/blob/ed41897d3078f0ae622b271380e426b30d34ea8b/src/reduce.jl#L165-L186
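
To give a feel for why the result only depends on `basesize` and not on scheduling, here is a rough sketch of a divide-and-conquer reduce using nothing but `Base.Threads`. It is not the code behind the link: `dac_reduce` is a made-up name, and the default `basesize` of `length(xs) ÷ nthreads()` is my assumption about how such a default might be derived from `nthreads`.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical helper for illustration only; not part of any JuliaFolds package.
# Assumes `op` is associative and `xs` is non-empty.
function dac_reduce(op, xs::AbstractVector;
                    basesize::Integer = max(1, length(xs) ÷ nthreads()))
    basesize >= 1 || throw(ArgumentError("basesize must be at least 1"))
    # Small enough chunk: fold it sequentially, strictly left to right.
    if length(xs) <= basesize
        return foldl(op, xs)
    end
    # Otherwise split at the midpoint. The split points depend only on
    # length(xs) and basesize, never on which thread runs which task.
    mid = length(xs) ÷ 2
    lo = firstindex(xs)
    left = @spawn dac_reduce(op, view(xs, lo:lo+mid-1); basesize = basesize)
    right = dac_reduce(op, view(xs, lo+mid:lastindex(xs)); basesize = basesize)
    # Always combine as op(left, right), so the order of operations is fixed.
    return op(fetch(left), right)
end

xs = rand(10_000)
a = dac_reduce(+, xs; basesize = 1_000)
b = dac_reduce(+, xs; basesize = 1_000)
a === b  # same reduction tree, so the same floating-point result
```

Since the shape of the reduction tree is fixed by the input length and `basesize`, repeated runs combine the same chunks in the same order. Changing `basesize` (or, with the default above, the number of threads) changes the tree, so a floating-point sum can differ in the last bits.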