I need help with a combinations problem.

I have an array of the numbers 1 to 38 and I want to choose 11 of them, which gives 1203322288 possibilities. I want to do some calculation on every combination.
If I loop through all combinations one by one, it will take a very long time.
How can I speed up my process?
Multithreading or CUDA is fine, but I don’t know anything about either.

Thanks

How complicated is your calculation? Just looping through all combinations is not so bad:

julia> using SmallCollections, SmallCollections.Combinatorics, Chairmarks

julia> @b sum(minimum, subsets(38, 11))
2.203 s (without a warmup)
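To give a feeling for what such a loop does, here is a minimal sketch in plain Julia that enumerates all k-element subsets of 1:n as bitmasks using Gosper's hack. This is not how SmallCollections implements `subsets` (I have not checked that); the names `next_subset` and `sum_of_minima` are made up for illustration, and the sketch assumes 1 ≤ k ≤ n ≤ 63.

```julia
# Gosper's hack: the next larger integer with the same number of set bits.
# Assumes x != 0 (otherwise the division below would fail).
function next_subset(x::UInt64)
    c = x & -x                      # lowest set bit of x
    r = x + c                       # ripple the lowest block of ones upward
    return (((r ⊻ x) >> 2) ÷ c) | r
end

# Sum of the minimum element over all k-subsets of 1:n (1 ≤ k ≤ n ≤ 63).
function sum_of_minima(n::Int, k::Int)
    x = (UInt64(1) << k) - UInt64(1)    # smallest subset {1, …, k}
    limit = UInt64(1) << n              # first mask that uses an element > n
    total = 0
    while x < limit
        total += trailing_zeros(x) + 1  # bit i (0-based) stands for element i+1
        x = next_subset(x)
    end
    return total
end
```

For small inputs the result can be checked against the closed form `binomial(n + 1, k + 1)`, e.g. `sum_of_minima(10, 3) == binomial(11, 4)`.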

Maybe you can also just run on a sample to speed up your computation? Building on @matthias314's example, you could do this with

julia> using SmallCollections, SmallCollections.Combinatorics, StreamSampling

julia> n = 10000;

julia> iter = StreamSample{SmallCollections.SmallBitSet{UInt64}}(subsets(38, 11), n);

In this case, for example, the estimate closely approximates the exact result:

julia> sum(minimum, subsets(38, 11)) / (sum(minimum, iter) * binomial(38, 11) / n)
1.0014791076050782

This could be sensible if the work per element in your computation is more expensive than the cost of generating the combinations.
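The same idea can be sketched without any package, under different assumptions: instead of sampling from the stream of all subsets, the sketch below draws independent uniformly random k-subsets (with replacement between draws) by shuffling 1:n and taking the first k entries, then scales the sample mean by `binomial(n, k)`. The function name `estimate_total` is hypothetical, and this is a plain Monte Carlo estimate, not what StreamSampling does internally.

```julia
using Random

# Estimate the sum of f(c) over all k-subsets c of 1:n from `nsamples`
# independent random subsets. f receives a view of k distinct elements
# in random order.
function estimate_total(f, n::Int, k::Int, nsamples::Int;
                        rng = Random.default_rng())
    perm = collect(1:n)
    acc = 0.0
    for _ in 1:nsamples
        shuffle!(rng, perm)            # uniform random permutation of 1:n
        acc += f(view(perm, 1:k))      # its first k entries form a uniform k-subset
    end
    return binomial(n, k) * acc / nsamples
end
```

For example, `estimate_total(minimum, 38, 11, 100_000)` should land close to the exact value `binomial(39, 12)`; how close depends on the variance of your own function.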


EDIT: improved code, added another version using Atomic

Here are a naive version and two multi-threaded versions of a function that adds up the sums of the elements of every combination:

using SmallCollections, SmallCollections.Combinatorics
using Base.Threads
using Random: shuffle!

function f(n, m)
    s = 0
    for c in subsets(n, m)
        s += sum(c)
    end
    return s
end

function fthread1(n, m)
    s = 0
    l = m ÷ 2  # this seems to be the best choice
    rl = ReentrantLock()
    @threads for a in shuffle!(collect(subsets(SmallBitSet(m-l+1:n), l)))
        t = 0
        for b in subsets(minimum(a)-1, m-l)
            c = a ∪ b
            t += sum(c)
        end
        @lock rl s += t
    end
    return s
end

function fthread2(n, m)
    s = Atomic{Int}(0)
    l = m ÷ 2  # this seems to be the best choice
    @threads for a in shuffle!(collect(subsets(SmallBitSet(m-l+1:n), l)))
        t = 0
        for b in subsets(minimum(a)-1, m-l)
            c = a ∪ b
            t += sum(c)
        end
        atomic_add!(s, t)
    end
    return s[]
end

julia> using Chairmarks

julia> n, m = 38, 11;

julia> @b f($n, $m)
19.718 s (without a warmup)

julia> @b fthread1($n, $m)  # 12 threads
2.584 s (371356 allocs: 7.211 MiB, without a warmup)

julia> @b fthread2($n, $m)  # 12 threads
2.008 s (66 allocs: 1.543 MiB, without a warmup)
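A third way to combine the partial results, sketched here with Base only, is to let every task return its own partial sum and add them up with `fetch`, so that neither a lock nor an atomic is needed. The split used below (by the largest element of the combination) is a simplification chosen for brevity, not the split used in `fthread1`/`fthread2`; the names `partial_sum` and `sum_of_sums_spawn` are made up, and the sketch assumes 1 ≤ k ≤ n ≤ 63.

```julia
using Base.Threads: @spawn

# Gosper's hack: next larger bitmask with the same number of set bits (x != 0).
function next_subset(x::UInt64)
    c = x & -x
    r = x + c
    return (((r ⊻ x) >> 2) ÷ c) | r
end

# Sum of sum(c) over all k-subsets c of 1:n whose largest element is `top`.
function partial_sum(top::Int, k::Int)
    k == 1 && return top
    total = 0
    x = (UInt64(1) << (k - 1)) - UInt64(1)   # the other k-1 elements, as a bitmask
    limit = UInt64(1) << (top - 1)           # they must all be smaller than `top`
    while x < limit
        s = top
        y = x
        while y != 0
            s += trailing_zeros(y) + 1       # bit i (0-based) stands for element i+1
            y &= y - UInt64(1)               # clear the lowest set bit
        end
        total += s
        x = next_subset(x)
    end
    return total
end

# One task per possible largest element; each task returns its partial sum.
function sum_of_sums_spawn(n::Int, k::Int)
    tasks = [@spawn partial_sum(top, k) for top in k:n]
    return sum(fetch, tasks)
end
```

Note that this split is quite unbalanced: for n = 38 and k = 11 the task with `top = 38` alone covers 11/38 of all combinations, so the achievable speedup is limited. Splitting on several elements, as in the functions above with l = m ÷ 2, gives many more and much smaller work items. Julia has to be started with several threads (e.g. `julia -t 12`) for any of this to run in parallel.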

You can convert the sets (SmallBitSet) to vectors with collect or with SmallVector or FixedVector:

julia> s = SmallBitSet(1:4)
SmallBitSet{UInt64} with 4 elements:
  1
  2
  3
  4

julia> SmallVector{16,Int8}(s)
4-element SmallVector{16, Int8}:
 1
 2
 3
 4

julia> FixedVector{4,Int8}(s)  # length must match length of s
4-element FixedVector{4, Int8}:
 1
 2
 3
 4

Sampling an array of 38 floats, taking 11 random combinations of them and computing the square root of each product doesn’t take long on a decently provisioned machine. For only a few hundred combinations, multithreading does not help, because the threading overhead outweighs the work. In the results below, chunked threading starts to give a speed-up at around a thousand combinations and works very well beyond 10K, while the plain @threads version stays slower than the single-threaded one at every size tested, most likely because it constructs a new MersenneTwister on every iteration. So you have to consider the trade-offs between problem size and thread count.

Here’s a script and its results, using 32 threads:

julia> using Base.Threads, Random, Statistics, StatsBase, BenchmarkTools

julia> n = 38
38

julia> random_floats = rand(n)
38-element Vector{Float64}:
 0.1735745757945074
 0.32166161915780656
 0.25858546995315457
 0.16643864408566544
 0.5270150071089016
 0.48302213696845187
 0.3906633890864917
 ⋮
 0.7219827350406843
 0.3723466293304377
 0.030185550039208087
 0.07933388944734077
 0.6637579955379869
 0.49205817172341115

julia> println("Generated $n random floats")
Generated 38 random floats

julia> println("Using $(nthreads()) threads for parallel computation\n")

       # Original single-threaded approach
Using 32 threads for parallel computation


julia> function original_approach(floats::Vector{Float64}, num_combinations::Int)
           results = Vector{Tuple{Vector{Float64}, Float64, Float64}}()
           
           for i in 1:num_combinations
               combo_size = rand(1:length(floats))
               indices = sample(1:length(floats), combo_size, replace=false)
               combo = floats[indices]
               product = prod(combo)
               sqrt_result = product >= 0 ? sqrt(product) : NaN
               push!(results, (combo, product, sqrt_result))
           end
           
           return results
       end

       # Multithreaded approach
original_approach (generic function with 1 method)

julia> function multithreaded_approach(floats::Vector{Float64}, num_combinations::Int)
           # Pre-allocate results array
           results = Vector{Tuple{Vector{Float64}, Float64, Float64}}(undef, num_combinations)
           
           # Use @threads to parallelize the main loop
           @threads for i in 1:num_combinations
           # A fresh RNG is constructed on every iteration (not once per thread) to avoid shared state; this is costly
               local_rng = MersenneTwister(rand(UInt32))
               
               combo_size = rand(local_rng, 1:length(floats))
               indices = sample(local_rng, 1:length(floats), combo_size, replace=false)
               combo = floats[indices]
               product = prod(combo)
               sqrt_result = product >= 0 ? sqrt(product) : NaN
               
               results[i] = (combo, product, sqrt_result)
           end
           
           return results
       end

       # Chunk-based multithreaded approach (better for large workloads)
multithreaded_approach (generic function with 1 method)

julia> function chunked_multithreaded_approach(floats::Vector{Float64}, num_combinations::Int)
           # Chunk size per thread, rounded up so that the last few items are not skipped
           chunk_size = max(1, cld(num_combinations, nthreads()))
           
           # Pre-allocate results
           results = Vector{Tuple{Vector{Float64}, Float64, Float64}}(undef, num_combinations)
           
           @threads for thread_id in 1:nthreads()
               # Calculate range for this thread
               start_idx = (thread_id - 1) * chunk_size + 1
               end_idx = min(thread_id * chunk_size, num_combinations)
               
               if start_idx <= num_combinations
                   # Each thread gets its own RNG
                   local_rng = MersenneTwister(thread_id * 12345)
                   
                   for i in start_idx:end_idx
                       combo_size = rand(local_rng, 1:length(floats))
                       indices = sample(local_rng, 1:length(floats), combo_size, replace=false)
                       combo = floats[indices]
                       product = prod(combo)
                       sqrt_result = product >= 0 ? sqrt(product) : NaN
                       
                       results[i] = (combo, product, sqrt_result)
                   end
               end
           end
           
           return results
       end

       # Benchmark with different problem sizes
chunked_multithreaded_approach (generic function with 1 method)

julia> println("=== Performance Comparison: Single vs Multi-threaded ===")
=== Performance Comparison: Single vs Multi-threaded ===

julia> test_sizes = [11, 100, 1000, 10000, 100000]
5-element Vector{Int64}:
     11
    100
   1000
  10000
 100000

julia> for num_combinations in test_sizes
           println("\n--- Testing with $num_combinations combinations ---")
           
           # Warmup runs
           original_approach(random_floats, min(num_combinations, 10))
           multithreaded_approach(random_floats, min(num_combinations, 10))
           chunked_multithreaded_approach(random_floats, min(num_combinations, 10))
           
           println("Single-threaded:")
           single_time = @belapsed original_approach($random_floats, $num_combinations)
           println("  Time: $(round(single_time * 1000, digits=3)) ms")
           
           println("Multi-threaded (@threads):")
           multi_time = @belapsed multithreaded_approach($random_floats, $num_combinations)
           println("  Time: $(round(multi_time * 1000, digits=3)) ms")
           println("  Speedup: $(round(single_time/multi_time, digits=2))x")
           
           println("Chunked multi-threaded:")
           chunked_time = @belapsed chunked_multithreaded_approach($random_floats, $num_combinations)
           println("  Time: $(round(chunked_time * 1000, digits=3)) ms")
           println("  Speedup: $(round(single_time/chunked_time, digits=2))x")
           
           # Memory allocation comparison
           println("Memory allocations:")
           println("  Single: ", @allocated original_approach(random_floats, num_combinations), " bytes")
           println("  Multi: ", @allocated multithreaded_approach(random_floats, num_combinations), " bytes")
           println("  Chunked: ", @allocated chunked_multithreaded_approach(random_floats, num_combinations), " bytes")
       end

       # Detailed timing for the original request (11 combinations)

--- Testing with 11 combinations ---
Single-threaded:
  Time: 0.001 ms
Multi-threaded (@threads):
  Time: 0.036 ms
  Speedup: 0.03x
Chunked multi-threaded:
  Time: 0.036 ms
  Speedup: 0.03x
Memory allocations:
  Single: 10176
  Multi: 447024
  Chunked: 446752

--- Testing with 100 combinations ---
Single-threaded:
  Time: 0.01 ms
Multi-threaded (@threads):
  Time: 0.103 ms
  Speedup: 0.1x
Chunked multi-threaded:
  Time: 0.058 ms
  Speedup: 0.17x
Memory allocations:
  Single: 84416
  Multi: 3916096
  Chunked: 1316480

--- Testing with 1000 combinations ---
Single-threaded:
  Time: 0.109 ms
Multi-threaded (@threads):
  Time: 0.667 ms
  Speedup: 0.16x
Chunked multi-threaded:
  Time: 0.079 ms
  Speedup: 1.39x
Memory allocations:
  Single: 837744
  Multi: 38992896
  Chunked: 2012688

--- Testing with 10000 combinations ---
Single-threaded:
  Time: 1.143 ms
Multi-threaded (@threads):
  Time: 18.288 ms
  Speedup: 0.06x
Chunked multi-threaded:
  Time: 0.233 ms
  Speedup: 4.91x
Memory allocations:
  Single: 8595344
  Multi: 389888480
  Chunked: 9017024

--- Testing with 100000 combinations ---
Single-threaded:
  Time: 12.779 ms
Multi-threaded (@threads):
  Time: 206.039 ms
  Speedup: 0.06x
Chunked multi-threaded:
  Time: 1.191 ms
  Speedup: 10.73x
Memory allocations:
  Single: 81110064
  Multi: 3898453008
  Chunked: 78800624

julia> println("\n=== Detailed Timing for 11 Combinations ===")

=== Detailed Timing for 11 Combinations ===

julia> println("Single-threaded approach:")
Single-threaded approach:

julia> @btime single_results = original_approach($random_floats, 11)
  1.171 μs (56 allocations: 8.50 KiB)
11-element Vector{Tuple{Vector{Float64}, Float64, Float64}}:
 ([0.6349805161807821, 0.05529822631508752, 0.1735745757945074, 0.5802116426919226, 0.8120185311491867, 0.030185550039208087, 0.9354809757450431, 0.9786385378290976, 0.5270150071089016, 0.25858546995315457  …  0.32166161915780656, 0.07933388944734077, 0.09443144857141617, 0.8958377556815078, 0.6226472050312687, 0.2112283719951349, 0.3906633890864917, 0.5447580773835046, 0.02698475249996979, 0.6637579955379869], 3.743206062518859e-15, 6.118174615454237e-8)
 ([0.5447580773835046, 0.5802116426919226, 0.49205817172341115, 0.6349805161807821, 0.5044435708562829, 0.8230694654507019, 0.8120185311491867, 0.9786385378290976, 0.7219827350406843, 0.18342915487003242, 0.9354809757450431, 0.4339142195043123, 0.32166161915780656, 0.02698475249996979, 0.2112283719951349, 0.1735745757945074], 5.574351735350786e-7, 0.0007466158138795874)
 ([0.49205817172341115, 0.18342915487003242, 0.6349805161807821, 0.5447580773835046], 0.031221149706112963, 0.17669507550045915)
 ([0.9807576556964709, 0.1735745757945074, 0.02698475249996979, 0.9786385378290976, 0.07266057198023657, 0.9354809757450431, 0.48302213696845187, 0.07933388944734077, 0.9893116874624179, 0.25858546995315457, 0.35041553136394354, 0.5802116426919226, 0.8120185311491867, 0.837335454222923, 0.4339142195043123, 0.49205817172341115, 0.3906633890864917, 0.5044435708562829, 0.3723466293304377, 0.16643864408566544], 1.0798360509977635e-9, 3.2860858951003756e-5)
 ([0.48302213696845187, 0.02698475249996979, 0.5270150071089016, 0.35041553136394354, 0.6226472050312687, 0.3906633890864917, 0.0915850139744725, 0.25858546995315457, 0.18342915487003242, 0.05529822631508752  …  0.030185550039208087, 0.7219827350406843, 0.5802116426919226, 0.32166161915780656, 0.5447580773835046, 0.9354809757450431, 0.837335454222923, 0.6637579955379869, 0.802762551279973, 0.4339142195043123], 1.1643965416955636e-13, 3.4123255145070256e-7)
 ([0.9893116874624179, 0.8230694654507019, 0.9354809757450431, 0.2112283719951349, 0.802762551279973, 0.6637579955379869, 0.7219827350406843, 0.1735745757945074, 0.6349805161807821, 0.837335454222923  …  0.030185550039208087, 0.49205817172341115, 0.02698475249996979, 0.32166161915780656, 0.3475083889757653, 0.8958377556815078, 0.5447580773835046, 0.5270150071089016, 0.07933388944734077, 0.48302213696845187], 2.53748210800122e-11, 5.0373426605713654e-6)
 ([0.07266057198023657, 0.16643864408566544, 0.4339142195043123, 0.18342915487003242, 0.7219827350406843, 0.9893116874624179, 0.1735745757945074, 0.07933388944734077, 0.5447580773835046, 0.802762551279973  …  0.09443144857141617, 0.32166161915780656, 0.49205817172341115, 0.05529822631508752, 0.2112283719951349, 0.8958377556815078, 0.5044435708562829, 0.3723466293304377, 0.5270150071089016, 0.6226472050312687], 1.0396270891383442e-11, 3.2243248737345687e-6)
 ([0.4339142195043123, 0.5044435708562829, 0.07933388944734077, 0.9354809757450431], 0.01624464332715782, 0.12745447550854314)
 ([0.16643864408566544, 0.25858546995315457, 0.5044435708562829, 0.07266057198023657, 0.18342915487003242, 0.5270150071089016, 0.8230694654507019, 0.6349805161807821, 0.2112283719951349, 0.802762551279973, 0.8958377556815078, 0.3475083889757653, 0.6226472050312687], 2.619593972213026e-6, 0.0016185159783619766)
 ([0.02698475249996979, 0.8230694654507019, 0.16643864408566544, 0.9807576556964709, 0.25858546995315457, 0.3723466293304377, 0.9354809757450431, 0.5044435708562829, 0.8958377556815078, 0.18342915487003242], 2.7068714326916342e-5, 0.005202760260372983)
 ([0.8958377556815078, 0.837335454222923, 0.6226472050312687, 0.3723466293304377, 0.9807576556964709, 0.0915850139744725, 0.6599599239949302, 0.05529822631508752, 0.4339142195043123, 0.07266057198023657  …  0.6349805161807821, 0.9786385378290976, 0.2112283719951349, 0.030185550039208087, 0.18342915487003242, 0.7219827350406843, 0.9893116874624179, 0.48302213696845187, 0.1735745757945074, 0.5044435708562829], 1.4345990586807404e-18, 1.1977474937067247e-9)

julia> println("Multi-threaded approach:")
Multi-threaded approach:

julia> @btime multi_results = multithreaded_approach($random_floats, 11)
  32.084 μs (489 allocations: 432.30 KiB)
11-element Vector{Tuple{Vector{Float64}, Float64, Float64}}:
 ([0.05529822631508752, 0.9786385378290976, 0.07933388944734077, 0.6349805161807821, 0.5044435708562829, 0.9893116874624179, 0.48302213696845187, 0.837335454222923, 0.5270150071089016, 0.5802116426919226  …  0.18342915487003242, 0.3723466293304377, 0.030185550039208087, 0.02698475249996979, 0.6637579955379869, 0.3475083889757653, 0.4339142195043123, 0.07266057198023657, 0.09443144857141617, 0.32166161915780656], 5.355720320487991e-18, 2.314242925988538e-9)
 ([0.5270150071089016, 0.1735745757945074, 0.25858546995315457, 0.3906633890864917, 0.4339142195043123, 0.9354809757450431, 0.6599599239949302, 0.6226472050312687, 0.8230694654507019, 0.030185550039208087  …  0.49205817172341115, 0.3475083889757653, 0.07266057198023657, 0.8958377556815078, 0.9807576556964709, 0.0915850139744725, 0.6349805161807821, 0.32166161915780656, 0.09443144857141617, 0.5802116426919226], 1.3544033311393233e-13, 3.6802219106180587e-7)
 ([0.4339142195043123, 0.16643864408566544], 0.07222009434378755, 0.2687379659515707)
 ([0.8230694654507019, 0.6637579955379869, 0.07266057198023657, 0.07933388944734077, 0.8120185311491867, 0.02698475249996979, 0.49205817172341115, 0.4339142195043123, 0.5447580773835046, 0.802762551279973  …  0.5802116426919226, 0.8958377556815078, 0.5270150071089016, 0.0915850139744725, 0.05529822631508752, 0.030185550039208087, 0.2112283719951349, 0.35041553136394354, 0.9786385378290976, 0.9807576556964709], 4.339631388231082e-15, 6.587587865244062e-8)
 ([0.6599599239949302, 0.5802116426919226, 0.0915850139744725, 0.8230694654507019, 0.16643864408566544, 0.9807576556964709, 0.030185550039208087, 0.7219827350406843, 0.2112283719951349, 0.837335454222923, 0.4339142195043123, 0.3723466293304377, 0.05529822631508752, 0.07933388944734077, 0.8958377556815078, 0.02698475249996979], 3.1119132547311717e-10, 1.7640615790643964e-5)
 ([0.2112283719951349, 0.16643864408566544, 0.48302213696845187, 0.07933388944734077, 0.9354809757450431, 0.3475083889757653, 0.05529822631508752, 0.5270150071089016, 0.25858546995315457, 0.8230694654507019, 0.1735745757945074, 0.9786385378290976, 0.9807576556964709, 0.8958377556815078, 0.6599599239949302, 0.5447580773835046, 0.3906633890864917, 0.0915850139744725], 5.215009630990478e-9, 7.221502358228845e-5)
 ([0.3723466293304377, 0.5270150071089016, 0.8958377556815078, 0.4339142195043123, 0.16643864408566544, 0.5044435708562829, 0.802762551279973, 0.8120185311491867, 0.837335454222923, 0.6226472050312687, 0.35041553136394354, 0.02698475249996979, 0.32166161915780656, 0.8230694654507019, 0.5802116426919226], 3.1614691015289295e-6, 0.0017780520525364069)
 ([0.1735745757945074, 0.18342915487003242, 0.49205817172341115, 0.7219827350406843, 0.8120185311491867, 0.09443144857141617, 0.6637579955379869, 0.3475083889757653, 0.25858546995315457, 0.5270150071089016], 2.726355532217744e-5, 0.005221451457418468)
 ([0.6226472050312687, 0.09443144857141617, 0.5447580773835046, 0.8120185311491867, 0.2112283719951349, 0.9786385378290976, 0.0915850139744725, 0.25858546995315457, 0.05529822631508752, 0.3475083889757653, 0.1735745757945074, 0.837335454222923, 0.9354809757450431, 0.07266057198023657, 0.07933388944734077], 1.9177236307351358e-9, 4.3791821505106814e-5)
 ([0.837335454222923, 0.9807576556964709, 0.9354809757450431, 0.3906633890864917, 0.35041553136394354, 0.7219827350406843, 0.16643864408566544, 0.32166161915780656, 0.02698475249996979, 0.8120185311491867  …  0.9786385378290976, 0.8230694654507019, 0.25858546995315457, 0.0915850139744725, 0.05529822631508752, 0.9893116874624179, 0.6226472050312687, 0.030185550039208087, 0.4339142195043123, 0.8958377556815078], 5.420014269395063e-14, 2.328092409977547e-7)
 ([0.09443144857141617, 0.49205817172341115, 0.07933388944734077, 0.6349805161807821, 0.802762551279973, 0.8230694654507019, 0.02698475249996979, 0.5044435708562829, 0.16643864408566544, 0.6226472050312687, 0.8958377556815078, 0.9807576556964709, 0.6599599239949302, 0.7219827350406843, 0.6637579955379869, 0.2112283719951349, 0.07266057198023657, 0.5447580773835046, 0.35041553136394354, 0.3723466293304377], 6.613537357512376e-10, 2.5716798707289318e-5)

julia> println("Chunked multi-threaded approach:")
Chunked multi-threaded approach:

julia> @btime chunked_results = chunked_multithreaded_approach($random_floats, 11)

       # Run for actual results display
  31.292 μs (493 allocations: 436.25 KiB)
11-element Vector{Tuple{Vector{Float64}, Float64, Float64}}:
 ([0.07266057198023657, 0.16643864408566544], 0.012093527078879469, 0.10997057369532756)
 ([0.3475083889757653, 0.3906633890864917, 0.05529822631508752, 0.48302213696845187, 0.802762551279973, 0.25858546995315457, 0.6349805161807821, 0.5044435708562829, 0.9354809757450431, 0.6637579955379869  …  0.07933388944734077, 0.8230694654507019, 0.9807576556964709, 0.02698475249996979, 0.6226472050312687, 0.030185550039208087, 0.18342915487003242, 0.35041553136394354, 0.7219827350406843, 0.9786385378290976], 5.938517195012207e-12, 2.4369073012759854e-6)
 ([0.030185550039208087, 0.3475083889757653, 0.3723466293304377], 0.003905816302316442, 0.06249653032222223)
 ([0.5044435708562829, 0.07266057198023657, 0.32166161915780656, 0.48302213696845187, 0.18342915487003242, 0.4339142195043123, 0.3723466293304377, 0.8230694654507019, 0.2112283719951349, 0.35041553136394354, 0.6226472050312687, 0.05529822631508752, 0.5802116426919226, 0.8958377556815078, 0.5270150071089016, 0.9893116874624179, 0.802762551279973, 0.9807576556964709], 7.553416366493642e-8, 0.00027483479340312136)
 ([0.837335454222923, 0.09443144857141617, 0.6349805161807821, 0.5802116426919226, 0.2112283719951349, 0.48302213696845187, 0.5044435708562829, 0.35041553136394354, 0.9786385378290976, 0.49205817172341115  …  0.9354809757450431, 0.32166161915780656, 0.6226472050312687, 0.05529822631508752, 0.030185550039208087, 0.8120185311491867, 0.802762551279973, 0.07266057198023657, 0.18342915487003242, 0.6637579955379869], 1.9255830968611886e-12, 1.3876538101634674e-6)
 ([0.48302213696845187, 0.32166161915780656, 0.49205817172341115, 0.9893116874624179, 0.3475083889757653, 0.5447580773835046, 0.05529822631508752, 0.25858546995315457, 0.3723466293304377, 0.2112283719951349  …  0.6349805161807821, 0.5802116426919226, 0.16643864408566544, 0.35041553136394354, 0.8120185311491867, 0.1735745757945074, 0.6599599239949302, 0.18342915487003242, 0.6226472050312687, 0.9354809757450431], 1.2588114685511676e-15, 3.547973320856807e-8)
 ([0.5044435708562829, 0.3475083889757653, 0.32166161915780656, 0.8958377556815078, 0.6226472050312687, 0.2112283719951349, 0.9786385378290976, 0.6637579955379869, 0.8230694654507019, 0.7219827350406843  …  0.9354809757450431, 0.8120185311491867, 0.5447580773835046, 0.48302213696845187, 0.837335454222923, 0.0915850139744725, 0.09443144857141617, 0.07266057198023657, 0.49205817172341115, 0.4339142195043123], 2.275039330477925e-11, 4.769737236450164e-6)
 ([0.0915850139744725, 0.5270150071089016, 0.6226472050312687, 0.07266057198023657, 0.49205817172341115, 0.9807576556964709, 0.32166161915780656, 0.18342915487003242, 0.9786385378290976, 0.3475083889757653  …  0.2112283719951349, 0.802762551279973, 0.8230694654507019, 0.6599599239949302, 0.35041553136394354, 0.9354809757450431, 0.02698475249996979, 0.7219827350406843, 0.48302213696845187, 0.8120185311491867], 3.7770240410526037e-14, 1.9434567247697088e-7)
 ([0.8958377556815078, 0.4339142195043123, 0.9807576556964709, 0.48302213696845187, 0.35041553136394354, 0.16643864408566544, 0.18342915487003242, 0.5447580773835046, 0.25858546995315457, 0.3475083889757653  …  0.02698475249996979, 0.3906633890864917, 0.802762551279973, 0.5270150071089016, 0.030185550039208087, 0.9354809757450431, 0.8230694654507019, 0.2112283719951349, 0.5802116426919226, 0.9893116874624179], 5.446156247344711e-16, 2.3337001194122415e-8)
 ([0.49205817172341115, 0.9786385378290976, 0.8120185311491867, 0.030185550039208087, 0.35041553136394354, 0.1735745757945074, 0.18342915487003242, 0.9354809757450431, 0.3906633890864917, 0.0915850139744725  …  0.837335454222923, 0.5270150071089016, 0.9893116874624179, 0.8958377556815078, 0.5044435708562829, 0.32166161915780656, 0.4339142195043123, 0.6637579955379869, 0.7219827350406843, 0.6599599239949302], 6.788963632131178e-12, 2.6055639758277244e-6)
 ([0.1735745757945074, 0.030185550039208087, 0.18342915487003242, 0.35041553136394354, 0.3723466293304377, 0.5044435708562829, 0.5270150071089016, 0.9786385378290976], 3.262437742629436e-5, 0.00571177533051628)

julia> Random.seed!(42)
TaskLocalRNG()

julia> final_results = original_approach(random_floats, 11)
11-element Vector{Tuple{Vector{Float64}, Float64, Float64}}:
 ([0.9786385378290976, 0.9893116874624179, 0.07266057198023657, 0.5044435708562829, 0.09443144857141617, 0.5802116426919226, 0.25858546995315457, 0.837335454222923, 0.02698475249996979, 0.8230694654507019  …  0.9807576556964709, 0.802762551279973, 0.4339142195043123, 0.18342915487003242, 0.5447580773835046, 0.3475083889757653, 0.35041553136394354, 0.8958377556815078, 0.9354809757450431, 0.5270150071089016], 6.616351194345317e-10, 2.5722268940249647e-5)
 ([0.02698475249996979, 0.7219827350406843, 0.18342915487003242, 0.6599599239949302, 0.3475083889757653, 0.5447580773835046, 0.35041553136394354, 0.9893116874624179, 0.9807576556964709, 0.4339142195043123, 0.8230694654507019, 0.8958377556815078, 0.0915850139744725], 4.448079045892506e-6, 0.002109046952036039)
 ([0.6349805161807821, 0.9893116874624179, 0.7219827350406843, 0.09443144857141617, 0.3475083889757653, 0.49205817172341115, 0.9786385378290976], 0.007167060308582192, 0.08465849224137052)
 ([0.8230694654507019, 0.8120185311491867, 0.2112283719951349, 0.3475083889757653, 0.6599599239949302, 0.5802116426919226, 0.9786385378290976, 0.5447580773835046, 0.802762551279973, 0.3723466293304377  …  0.48302213696845187, 0.18342915487003242, 0.32166161915780656, 0.49205817172341115, 0.07266057198023657, 0.6226472050312687, 0.09443144857141617, 0.07933388944734077, 0.5044435708562829, 0.05529822631508752], 1.0214297693246253e-14, 1.0106580872503942e-7)
 ([0.8120185311491867, 0.6637579955379869, 0.9786385378290976, 0.49205817172341115], 0.2595460767113138, 0.5094566485102671)
 ([0.8230694654507019, 0.02698475249996979, 0.35041553136394354, 0.8120185311491867, 0.6226472050312687, 0.802762551279973, 0.1735745757945074, 0.8958377556815078, 0.32166161915780656, 0.48302213696845187  …  0.3906633890864917, 0.07266057198023657, 0.2112283719951349, 0.9807576556964709, 0.49205817172341115, 0.6349805161807821, 0.25858546995315457, 0.9893116874624179, 0.18342915487003242, 0.9786385378290976], 5.67586382332321e-17, 7.533832904520254e-9)
 ([0.18342915487003242, 0.6599599239949302, 0.3475083889757653, 0.2112283719951349, 0.25858546995315457, 0.02698475249996979, 0.8120185311491867, 0.48302213696845187, 0.07933388944734077, 0.802762551279973, 0.35041553136394354], 5.427354689559372e-7, 0.0007367058225342984)
 ([0.07933388944734077, 0.07266057198023657, 0.9786385378290976, 0.6599599239949302, 0.9807576556964709, 0.7219827350406843, 0.5447580773835046, 0.6637579955379869, 0.6226472050312687, 0.030185550039208087, 0.32166161915780656, 0.837335454222923, 0.9354809757450431, 0.16643864408566544], 7.513259513109988e-7, 0.0008667906040740167)
 ([0.5802116426919226, 0.9354809757450431, 0.5270150071089016, 0.6349805161807821, 0.837335454222923, 0.02698475249996979, 0.49205817172341115, 0.18342915487003242, 0.802762551279973, 0.8958377556815078], 0.0002663937189966142, 0.016321572197451267)
 ([0.3906633890864917, 0.07933388944734077, 0.25858546995315457, 0.05529822631508752, 0.48302213696845187, 0.0915850139744725, 0.837335454222923, 0.9807576556964709], 1.6100131300867773e-5, 0.004012496891072662)
 ([0.9807576556964709, 0.9893116874624179, 0.3475083889757653, 0.6637579955379869, 0.5044435708562829, 0.1735745757945074, 0.9354809757450431, 0.16643864408566544, 0.35041553136394354, 0.7219827350406843, 0.9786385378290976, 0.802762551279973, 0.8230694654507019, 0.3906633890864917, 0.02698475249996979, 0.32166161915780656, 0.030185550039208087], 5.108969340350865e-8, 0.0002260302931102569)

julia> println("\n=== Sample Results (11 combinations) ===")

=== Sample Results (11 combinations) ===

julia> for (i, (combo, product, sqrt_result)) in enumerate(final_results[1:3])
           println("Combination $i (size $(length(combo))):")
           println("  Elements: $(round.(combo[1:min(5, length(combo))], digits=3))$(length(combo) > 5 ? "..." : "")")
           println("  Product: $(round(product, digits=6))")
           println("  √Product: $(round(sqrt_result, digits=6))")
       end

       # Summary statistics
Combination 1 (size 24):
  Elements: [0.979, 0.989, 0.073, 0.504, 0.094]...
  Product: 0.0
  √Product: 2.6e-5
Combination 2 (size 13):
  Elements: [0.027, 0.722, 0.183, 0.66, 0.348]...
  Product: 4.0e-6
  √Product: 0.002109
Combination 3 (size 7):
  Elements: [0.635, 0.989, 0.722, 0.094, 0.348]...
  Product: 0.007167
  √Product: 0.084658

julia> valid_results = [r[3] for r in final_results if !isnan(r[3])]
11-element Vector{Float64}:
 2.5722268940249647e-5
 0.002109046952036039
 0.08465849224137052
 1.0106580872503942e-7
 0.5094566485102671
 7.533832904520254e-9
 0.0007367058225342984
 0.0008667906040740167
 0.016321572197451267
 0.004012496891072662
 0.0002260302931102569

julia> println("\n=== Summary Statistics ===")

=== Summary Statistics ===

julia> println("Valid combinations: $(length(valid_results))")
Valid combinations: 11

julia> if length(valid_results) > 0
           println("Mean √: $(round(mean(valid_results), digits=6))")
           println("Max √:  $(round(maximum(valid_results), digits=6))")
           println("Min √:  $(round(minimum(valid_results), digits=6))")
           
           sizes = [length(r[1]) for r in final_results]
           println("Combination size distribution:")
           for size in sort(unique(sizes))
               count = sum(sizes .== size)
               println("  Size $size: $count combinations")
           end
       end
Mean √: 0.056219
Max √:  0.509457
Min √:  0.0
Combination size distribution:
  Size 4: 1 combinations
  Size 7: 1 combinations
  Size 8: 1 combinations
  Size 10: 1 combinations
  Size 11: 1 combinations
  Size 13: 1 combinations
  Size 14: 1 combinations
  Size 17: 1 combinations
  Size 24: 1 combinations
  Size 28: 1 combinations
  Size 36: 1 combinations

julia> println("\n=== Threading Analysis ===")

=== Threading Analysis ===

julia> println("Key findings:")
Key findings:

julia> println("- For small sizes (11-100): Threading overhead may exceed benefits")
- For small sizes (11-100): Threading overhead may exceed benefits

julia> println("- For medium sizes (1000+): chunked threading starts to pay off; plain @threads was still slower here")
- For medium sizes (1000+): chunked threading starts to pay off; plain @threads was still slower here

julia> println("- For large sizes (10k+): Chunked approach often performs best")
- For large sizes (10k+): Chunked approach often performs best

julia> println("- Optimal approach depends on: problem size, thread count, and system")
- Optimal approach depends on: problem size, thread count, and system

julia> println("- Random number generation can be a bottleneck in parallel code")
- Random number generation can be a bottleneck in parallel code

julia> println("- Each thread needs its own RNG to avoid synchronization overhead")
- Each thread needs its own RNG to avoid synchronization overhead
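The "one RNG per task" point can also be sketched without constructing any RNG by hand. The sketch below assumes Julia 1.7 or later, where the default RNG is task-local, so calling `shuffle!` inside a spawned task does not share state with other tasks. It uses fixed-size combinations of k elements (as in the original question) rather than random sizes, assumes non-negative inputs, and the name `chunked_sqrt_products` is hypothetical.

```julia
using Random
using Base.Threads: @spawn, nthreads

# Compute sqrt(prod(combo)) for `count` random k-element combinations of
# `floats`, splitting the work into roughly `ntasks` contiguous chunks.
# Assumes all floats are non-negative and 1 ≤ k ≤ length(floats).
function chunked_sqrt_products(floats::Vector{Float64}, k::Int, count::Int;
                               ntasks::Int = nthreads())
    results = Vector{Float64}(undef, count)
    chunk = max(1, cld(count, ntasks))     # round up so that no index is skipped
    tasks = map(Iterators.partition(1:count, chunk)) do idxs
        @spawn begin
            perm = collect(eachindex(floats))   # one scratch buffer per task
            for i in idxs
                shuffle!(perm)                  # uses the task-local default RNG
                p = 1.0
                for j in 1:k
                    p *= floats[perm[j]]
                end
                results[i] = sqrt(p)            # each task writes only its own indices
            end
        end
    end
    foreach(wait, tasks)
    return results
end
```

A call such as `chunked_sqrt_products(rand(38), 11, 100_000)` returns one value per sampled combination. Because every task writes to a disjoint range of `results`, no lock is needed.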

Thank you so much. You are all great people.