I am trying to parallelize an algorithm which looks like this. I couldn’t express my code right in my previous post
The following algorithm takes an element from an odd column and adds it to the adjacent even coloumn.
using BenchmarkTools, Random, StatsBase, Distributions
using FLoops, FoldsThreads, ThreadsX
function test()
rows, coloumns = 990, 99990
#########Initializing Matrix A####################################
A = zeros(Int64, rows, coloumns)
########Filling only 75% of rows##################################
entries :: Vector{Int} = sample(1:floor(Int,0.75*rows), coloumns; replace=true)
for i in axes(A,2)
A[1:entries[i],i] .= shuffle(1:entries[i])
end
#########Declaring Dummy Variables####################################
dummy_A = deepcopy(A)
dummy_entries = deepcopy(entries)
# println(dummy_A')
@floop for i in 1:2:coloumns-1
index1 :: Int = rand(1:dummy_entries[i])
index2 :: Int = rand(1:dummy_entries[i+1])
temp :: Int = dummy_A[index1,i]
###############Removing an element from ith coloumn#########################
for j in index1:dummy_entries[i]
dummy_A[j,i] = dummy_A[j+1,i]
end
dummy_entries[i] = dummy_entries[i] - 1
###############Adding an element in i+1th coloumn#########################
for j in dummy_entries[i+1]:-1:index2
dummy_A[j+1,i+1] = dummy_A[j,i+1]
end
dummy_A[index2,i+1] = temp
dummy_entries[i+1] = dummy_entries[i+1] + 1
###############Updating only if condition satisfies#########################
if rand() <= 0.5
entries[i], entries[i+1] = dummy_entries[i], dummy_entries[i+1]
A[1:entries[i],i] .= dummy_A[1:entries[i],i]
A[1:entries[i+1],i+1] .= dummy_A[1:entries[i+1],i+1]
end
end
# println(dummy_A')
end
@benchmark begin
test()
end
Results for the Serial Version
BenchmarkTools.Trial: 6 samples with 1 evaluation.
Range (min … max): 784.253 ms … 877.906 ms ┊ GC (min … max): 2.06% … 10.24%
Time (median): 871.206 ms ┊ GC (median): 10.31%
Time (mean ± σ): 856.444 ms ± 36.009 ms ┊ GC (mean ± σ): 9.12% ± 3.41%
█ █ ██ ██
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁██▁██ ▁
784 ms Histogram: frequency by time 878 ms <
Memory estimate: 1.91 GiB, allocs estimate: 149748.
Results for Parallel Version with 10 threads
BenchmarkTools.Trial: 6 samples with 1 evaluation.
Range (min … max): 749.625 ms … 1.022 s ┊ GC (min … max): 2.24% … 25.19%
Time (median): 886.070 ms ┊ GC (median): 16.99%
Time (mean ± σ): 888.100 ms ± 114.955 ms ┊ GC (mean ± σ): 16.82% ± 10.66%
█ █ █ █ █ █
█▁▁▁▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁█▁▁▁▁▁█ ▁
750 ms Histogram: frequency by time 1.02 s <
Memory estimate: 1.91 GiB, allocs estimate: 149580.
Can someone please let me know where am I making a mistake…?