tk3369
December 24, 2017, 12:30am
1
I happened to have no access to Internet this morning and didn’t remember the function that joins strings together. I’m a newbie, clearly
So I wrote my own:
concatenate(strArray::Vector{String}, separator=",") =
foldl((x, y) -> *(x, y, separator), "", strArray)[1:end-length(separator)]
I was surprised to find it several times faster than join() when I get home. Perhaps I don’t have the baggage to deal with IO?
julia> for j in 1:10 @time for i in 1:1000000 concatenate(["abc", "def"], ",") end end
0.227739 seconds (5.00 M allocations: 198.364 MiB, 14.12% gc time)
0.218257 seconds (5.00 M allocations: 198.364 MiB, 14.92% gc time)
0.225936 seconds (5.00 M allocations: 198.364 MiB, 15.09% gc time)
0.228916 seconds (5.00 M allocations: 198.364 MiB, 14.85% gc time)
0.230663 seconds (5.00 M allocations: 198.364 MiB, 14.67% gc time)
0.227495 seconds (5.00 M allocations: 198.364 MiB, 14.43% gc time)
0.224295 seconds (5.00 M allocations: 198.364 MiB, 14.72% gc time)
0.225909 seconds (5.00 M allocations: 198.364 MiB, 15.89% gc time)
0.230995 seconds (5.00 M allocations: 198.364 MiB, 16.17% gc time)
0.230334 seconds (5.00 M allocations: 198.364 MiB, 15.15% gc time)
julia> for j in 1:10 @time for i in 1:1000000 join(["abc", "def"], ",") end end
0.844441 seconds (8.00 M allocations: 350.952 MiB, 6.85% gc time)
0.867218 seconds (8.00 M allocations: 350.952 MiB, 6.81% gc time)
0.851981 seconds (8.00 M allocations: 350.952 MiB, 7.00% gc time)
0.839161 seconds (8.00 M allocations: 350.952 MiB, 7.29% gc time)
0.853017 seconds (8.00 M allocations: 350.952 MiB, 6.99% gc time)
0.845652 seconds (8.00 M allocations: 350.952 MiB, 7.04% gc time)
0.855530 seconds (8.00 M allocations: 350.952 MiB, 7.60% gc time)
0.853978 seconds (8.00 M allocations: 350.952 MiB, 7.33% gc time)
0.852078 seconds (8.00 M allocations: 350.952 MiB, 6.96% gc time)
0.868495 seconds (8.00 M allocations: 350.952 MiB, 7.02% gc time)
What if you make the number of strings to join very long? It looks like in your version you create a new temporary for each *
1 Like
Please read the performance tips in the manual: in particular, don’t benchmark in global scope – wrap everything in a function.
Microbenchmarks like this are best done using the BenchmarkTools.jl package.
1 Like
tk3369
December 24, 2017, 8:29am
4
Thanks @dpsanders . Results using BenchmarkTools.jl is shown below.
julia> @benchmark join(["abc", "def"], ",")
BenchmarkTools.Trial:
memory estimate: 368 bytes
allocs estimate: 8
--------------
minimum time: 685.204 ns (0.00% GC)
median time: 698.079 ns (0.00% GC)
mean time: 764.753 ns (4.85% GC)
maximum time: 18.352 μs (87.18% GC)
--------------
samples: 10000
evals/sample: 152
julia> @benchmark concatenate(["abc", "def"], ",")
BenchmarkTools.Trial:
memory estimate: 208 bytes
allocs estimate: 5
--------------
minimum time: 170.898 ns (0.00% GC)
median time: 175.441 ns (0.00% GC)
mean time: 204.251 ns (9.69% GC)
maximum time: 4.017 μs (85.34% GC)
--------------
samples: 10000
evals/sample: 746
Let’s try suggestion from @bramtayl . Wow… 19GiB for a single run! Definitely not scalable
julia> @benchmark join($ar, ",")
BenchmarkTools.Trial:
memory estimate: 514.67 KiB
allocs estimate: 23
--------------
minimum time: 6.261 ms (0.00% GC)
median time: 6.621 ms (0.00% GC)
mean time: 6.912 ms (0.49% GC)
maximum time: 11.861 ms (25.71% GC)
--------------
samples: 719
evals/sample: 1
julia> @benchmark concatenate($ar, ",")
BenchmarkTools.Trial:
memory estimate: 18.64 GiB
allocs estimate: 100002
--------------
minimum time: 14.543 s (15.87% GC)
median time: 14.543 s (15.87% GC)
mean time: 14.543 s (15.87% GC)
maximum time: 14.543 s (15.87% GC)
--------------
samples: 1
evals/sample: 1
Let’s not go crazy with so many strings. That’s probably not practical anyways. Let’s go with just 100 strings. It’s about the same.
julia> @benchmark join($ar, ",")
BenchmarkTools.Trial:
memory estimate: 1.48 KiB
allocs estimate: 13
--------------
minimum time: 7.053 μs (0.00% GC)
median time: 7.211 μs (0.00% GC)
mean time: 7.649 μs (1.05% GC)
maximum time: 436.433 μs (93.34% GC)
--------------
samples: 10000
evals/sample: 4
julia> @benchmark concatenate($ar, ",")
BenchmarkTools.Trial:
memory estimate: 24.30 KiB
allocs estimate: 102
--------------
minimum time: 6.107 μs (0.00% GC)
median time: 7.026 μs (0.00% GC)
mean time: 8.750 μs (14.18% GC)
maximum time: 340.156 μs (91.08% GC)
--------------
samples: 10000
evals/sample: 5
2 Likes
Dan
December 25, 2017, 9:26am
5
Going for even more optimization, try the following:
function fastjoin(a::Vector{String},sep::String)
if length(a) == 1
return a[1]::String
end
n = 0
seplen = sep.len
for str in a
n += str.len
end
n += seplen*(length(a)-1)
out = Base._string_n(n)
offs = 1
unsafe_copy!(pointer(out,offs), pointer(a[1]), a[1].len)
offs += a[1].len
@inbounds for i in 2:length(a)
unsafe_copy!(pointer(out,offs), pointer(sep), seplen)
offs += seplen
unsafe_copy!(pointer(out,offs), pointer(a[i]), a[i].len)
offs += a[i].len
end
return out
end
It gives a 5x improvement over join
on my machine, but it would be best for comparison to run:
@benchmark fastjoin($ar, ",")
on your machine.
Ah: Julia 0.6.2. Strings change in future versions.
1 Like