I’m trying to speed up some code that takes subsets of columns from a data matrix and calculates a maximum score. The orginal code is complicated, so I’ve tried to simplfy it was much as possible here:

```
using Combinatorics
#Helper function to loop through pairs of data columns
allpairs(v) = Iterators.filter(i -> isless(i...), Iterators.product(v,v))
function maxScore(data)
bestScore = 0.0
for (i,j) in allpairs(axes(data,2))
if isodd(i+j)
currentScore = findscore(data,i,j)
if bestScore < currentScore
bestScore = currentScore
end
end
end
return bestScore
end
function findscore(data, i, j)
bestScoreSubset = 0.0
#Create a somewhat small subset
subsets = mod1(i+j,10)
for subset in powerset(1:subsets)
currentScoreSubset = score(data,subset,j)
if bestScoreSubset < currentScoreSubset
bestScoreSubset = currentScoreSubset
end
end
return bestScoreSubset
end
function score(data,subset,j)
if isempty(subset)
return 0.0
end
X = view(data,:,subset)
y = view(data,:,j)
b = X \ y
ŷ = X*b
return sum((yᵢ - ŷᵢ)^2 for (yᵢ, ŷᵢ) in zip(y,ŷ))
end
```

I tried using `ThreadsX`

to parallelize the `maxScore()`

function, but it actually made things slower.

## ThreadsX version of maxScore

```
using ThreadsX
function maxScore_parallel(data)
bestScore = ThreadsX.mapreduce(max, allpairs(axes(data,2))) do (i,j)
if isodd(i+j)
findscore(data,i,j)
else
0.0
end
end
return bestScore
end
```

Here are some benchmarks as well

## Benchmarks

```
julia> using Random, BenchmarkTools
julia> Threads.nthreads()
14
julia> Random.seed!(1);
julia> data = rand(100,20);
julia> @btime maxScore($data)
174.879 ms (612340 allocations: 990.52 MiB)
19.720119456710123
julia> @btime maxScore_parallel($data)
906.113 ms (622461 allocations: 990.90 MiB)
19.720119456710123
```

Any recommendations for improving parallelization would be much appreciated