Hi,
I need to analyze a long vector and count the occurrences of certain letters. Because the vector will be very long, I would like to use parallel processing for this.
I can already do this with map(), which stores the spawned tasks, but I would like to compare its efficiency against a version that uses only a “for” loop.
To create a reproducible example, let’s calculate the occurrences of the numbers “1” and “2” in a vector.
using ChunkSplitters
array1 = repeat([1,2],1111)
chk = chunks(1:length(array1); n=Threads.nthreads(), split=:batch)
counts_total = Dict(ii => 0 for ii in [1,2])
tasks = map(chk) do inds
    Threads.@spawn begin
        sub_counts = Dict(ii => 0 for ii in [1,2])
        for val in array1[inds]
            sub_counts[val] += 1
        end
        sub_counts
    end
end
thread_sums = fetch.(tasks)
for sub in thread_sums
    merge!(+,counts_total,sub)
end
counts_total
In this case I get the correct total counts:
Dict{Int64, Int64} with 2 entries:
2 => 1111
1 => 1111
When I try to do the same thing with a plain for loop, I don’t know how to store the tasks, so I update the total counts Dict with merge! from inside each spawned task:
using ChunkSplitters
array1 = repeat([1,2],1111)
chk = chunks(1:length(array1); n=Threads.nthreads(), split=:batch)
counts_total = Dict(ii => 0 for ii in [1,2])
for inds in chk
    Threads.@spawn begin
        sub_counts = Dict(ii => 0 for ii in [1,2])
        for val in array1[inds]
            sub_counts[val] += 1
        end
        # as I can't store the tasks, I have to merge! into counts_total from here
        merge!(+,counts_total,sub_counts)
    end
end
counts_total
As a result, the total counts are wrong:
Dict{Int64, Int64} with 2 entries:
2 => 667
1 => 667
Is there a way to do this with just a for loop, without using map?
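One direction that might work is sketched here, under a few assumptions. The idea is to collect the tasks by hand with push! into a Vector{Task} and only merge after fetch, since Threads.@spawn returns a Task and fetch waits for it to finish. My guess is that the wrong totals come from reading counts_total before the spawned tasks are done, combined with several tasks calling merge! on the same Dict without synchronization. To keep the sketch self-contained, it swaps ChunkSplitters.chunks for Iterators.partition from Base; the names chunk_len and t are only illustrative.

```julia
array1 = repeat([1, 2], 1111)

# Base-only stand-in for ChunkSplitters.chunks (an assumption of this sketch):
# consecutive index ranges, roughly one per thread.
chunk_len = cld(length(array1), Threads.nthreads())
chk = Iterators.partition(1:length(array1), chunk_len)

counts_total = Dict(ii => 0 for ii in [1, 2])

# Collect the tasks by hand instead of relying on the return value of map.
tasks = Task[]
for inds in chk
    t = Threads.@spawn begin
        sub_counts = Dict(ii => 0 for ii in [1, 2])
        for val in view(array1, inds)   # view avoids copying the chunk
            sub_counts[val] += 1
        end
        sub_counts                      # this is the value fetch returns
    end
    push!(tasks, t)
end

# fetch blocks until each task has finished, so the merge runs serially
# on the main task and the shared Dict is never written concurrently.
for t in tasks
    merge!(+, counts_total, fetch(t))
end

counts_total
```

If merging inside the spawned tasks is preferred, wrapping the loop in @sync and guarding the merge! call with a ReentrantLock should presumably also give consistent totals, at the cost of some contention on the lock.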
Thank you very much,
my best