Hi, I am loading lots of gzipped JSON files, parsing them, and processing them further.
TL;DR: Why are a list comprehension and dot broadcasting noticeably slower than map when used inside tmap on multiple threads?
To make the processing faster I am using multiple threads via https://github.com/baggepinnen/ThreadTools.jl, but I am seeing a weird performance inconsistency that I just cannot explain.
Note: I am using an external process (7z) to gunzip the JSON, because it turned out to be faster than decompressing in-process with https://github.com/bicycle1885/TranscodingStreams.jl.
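For reference, an in-process variant would look roughly like the sketch below (using CodecZlib.jl, the gzip codec built on top of TranscodingStreams; this is not the exact code I benchmarked, and jsondir is my own helper that maps a sha to its file path without the extension):

using CodecZlib

# Read one gzipped JSON file into a String without spawning an external process.
function load_sha_inprocess(sha)
    open("$(jsondir(sha)).json.gz") do io
        read(GzipDecompressorStream(io), String)
    end
end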
I am running this benchmark on Julia 1.3.0, Windows 10, with an Intel Core i7-9850H CPU.
I am benchmarking two variants: only reading the gzipped JSON into a String, and reading plus parsing it.
using ThreadTools, BenchmarkTools, JSON

# jsondir(sha) is defined elsewhere and maps a sha to its file path without the ".json.gz" extension.
function load_shas(shas)
    [read(`7z e -so $(jsondir(sha)).json.gz`, String) for sha in shas]
end
function load_parse_shas(shas)
    [JSON.parse(read(`7z e -so $(jsondir(sha)).json.gz`, String)) for sha in shas]
end
load_parse_sha(sha) = JSON.parse(read(`7z e -so $(jsondir(sha)).json.gz`, String))
println("foreach read")
@btime foreach(sha->read(`7z e -so $(jsondir(sha)).json.gz`, String), shas[1:1000])
# 16.029 s (1086474 allocations: 2.37 GiB)
println("map read")
@btime map(sha->read(`7z e -so $(jsondir(sha)).json.gz`, String), shas[1:1000])
# 15.986 s (1084171 allocations: 2.37 GiB)
println("tmap function read")
@btime collect(Iterators.flatten(tmap(load_shas, Iterators.partition(shas[1:1000], 50))))
# 5.872 s (876144 allocations: 2.35 GiB)
println("tmap map read")
@btime collect(Iterators.flatten(tmap(x->map(sha->read(`7z e -so $(jsondir(sha)).json.gz`, String), x), Iterators.partition(shas[1:1000], 50))))
# 6.361 s (874188 allocations: 2.31 GiB)
println("tmap list comprehensions read")
@btime collect(Iterators.flatten(tmap(x->[read(`7z e -so $(jsondir(sha)).json.gz`, String) for sha in x], Iterators.partition(shas[1:1000], 50))))
# 6.289 s (877305 allocations: 2.37 GiB)
println("foreach read parse")
@btime foreach(sha->JSON.parse(read(`7z e -so $(jsondir(sha)).json.gz`, String)), shas[1:1000])
# 22.340 s (76891104 allocations: 7.38 GiB)
println("map read parse")
@btime map(sha->JSON.parse(read(`7z e -so $(jsondir(sha)).json.gz`, String)), shas[1:1000])
# 42.694 s (76889028 allocations: 7.37 GiB)
println("tmap function read parse")
@btime collect(Iterators.flatten(tmap(load_parse_shas, Iterators.partition(shas[1:1000], 50))))
# 19.709 s (76697131 allocations: 7.34 GiB)
println("tmap map read parse")
@btime collect(Iterators.flatten(tmap(x->map(sha->JSON.parse(read(`7z e -so $(jsondir(sha)).json.gz`, String)), x), Iterators.partition(shas[1:1000], 50))))
# 19.942 s (76695967 allocations: 7.31 GiB)
println("tmap list comprehensions read parse")
@btime collect(Iterators.flatten(tmap(x->[JSON.parse(read(`7z e -so $(jsondir(sha)).json.gz`, String)) for sha in x], Iterators.partition(shas[1:1000], 50))))
# 22.213 s (76689232 allocations: 7.29 GiB)
println("tmap function read_parse dot")
@btime collect(Iterators.flatten(tmap(x->load_parse_sha.(x), Iterators.partition(shas[1:1000], 50))))
# 22.235 s (76688233 allocations: 7.33 GiB)
And I cannot get my head around this: using map inside tmap takes ~19 s, while the list comprehension and the dot-broadcast versions take ~22 s, i.e. noticeably slower, even though they should be doing the same thing.
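To rule out the subprocess and GC noise, one way to isolate the map vs. comprehension/broadcast difference would be a pure-CPU version like the sketch below (work is just a placeholder for the real per-file load; Julia has to be started with multiple threads, e.g. JULIA_NUM_THREADS=6):

using ThreadTools, BenchmarkTools

work(x) = sum(sin, 1:10_000) + x                 # CPU-only stand-in for reading and parsing one file
blocks  = collect(Iterators.partition(1:1000, 50))

@btime tmap(b -> map(work, b), $blocks)          # map inside tmap
@btime tmap(b -> [work(x) for x in b], $blocks)  # list comprehension inside tmap
@btime tmap(b -> work.(b), $blocks)              # dot broadcast inside tmap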
I am partitioning into blocks of 50 because I later compute some aggregated statistics per block and then merge them, roughly as sketched below.
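(Sketch only; block_stats and merge_stats are hypothetical placeholders for my actual aggregation code.)

# Per-block aggregation followed by a merge over the blocks.
block_stats(parsed) = (n = length(parsed), total = sum(length, parsed))
merge_stats(a, b)   = (n = a.n + b.n, total = a.total + b.total)

blocks = Iterators.partition(shas[1:1000], 50)
stats  = reduce(merge_stats, tmap(b -> block_stats(load_parse_shas(b)), collect(blocks)))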
EDIT: my bad, because of a typo I sometimes used a different variable; after rerunning the code correctly, the results are a bit different from the original post.