if don’t println, it costs 0.173 seconds
if ‘o’=>‘a’ and ‘a’=>‘o’, does it work? because Perl tr can do it.
tr/ao/oa/
‘The cat sat on the mat’
to
‘The cot sot an the mot’
Every character changes ( ‘a’ to ‘o’ and ‘o’ to ‘a’) independently against the original string.
No, in its current form it applies both replacements sequentially (as you can see above). Clearly, a simple loop would give you what you want. But I feel there really should just be a proper multi-pair version replace("hallo", 'a'=>'o', 'o'=>'a')
.
UPDATE: Oh, apparently I wanted this before as I gave https://github.com/JuliaLang/julia/pull/30457 a heart
If that, can simplify the syntax like the above jling’s function?
@Yan just try this yourself please, that’s why I have to split and join a string as an array of chars.
See also this discussion: How to replace several characters in a string using Julia - Stack Overflow
This appears to be fast:
function tr(str, reps...)
d = Dict(reps)
return(map(s -> get(d,s,s), str))
end
julia> mystring = "The cat sat on the mat"
"The cat sat on the mat"
julia> @btime perl_tr(mystring,"ao","oL")
1.406 μs (21 allocations: 1.45 KiB)
"The cot sot Ln the mot"
julia> @btime tr(mystring,'a'=>'o','o'=>'L')
951.000 ns (10 allocations: 768 bytes)
"The cot sot Ln the mot"
(as long as you only replace single characters defined with: 'c'
(and not strings with "str"
)
Good, the performance is improved.
It still costs 0.139851 and 0.091713 seconds for 100000 loops, but Perl costs only 0.015 seconds.
Use @btime
from BenchmarkTools.jl for benchmarking instead of @time
. (See my post above.)
Oh, only 13 ns using @btime test, but it seems unbelievable.
I test 1000000 loops again. Perl is obvious faster although it is 0.18 s using Perl times test
I am wondering if the @btime is not support “@btime begin… end”.
and I think the @time test is quite similar to the Perl times test.
Don’t worry about screen scrolling, I run them by using " julia new_trans.jl > temp.julia.txt"
If you are going to apply this replacement to many small strings (which is the only situation in which you actually care about the performance on small strings), then you would want to cache the dictionary between calls. That saves me 30% on my machine. I save an additional 20% by using an explicit loop and no dictionary.
julia> @btime tr($mystring,$('a'=>'o'),$('o'=>'L'))
925.393 ns (9 allocations: 736 bytes)
"The cot sot Ln the mot"
julia> d = Dict('a'=>'o','o'=>'L')
Dict{Char,Char} with 2 entries:
'a' => 'o'
'o' => 'L'
julia> tr(str, d::Dict) = map(s -> get(d,s,s), str)
tr (generic function with 2 methods)
julia> @btime tr($mystring,$d)
654.245 ns (5 allocations: 256 bytes)
"The cot sot Ln the mot"
julia> function mytr(s)
buf = IOBuffer(sizehint=sizeof(s))
for c in s
print(buf, c == 'a' ? 'o' : c == 'o' ? 'L' : c)
end
return String(take!(buf))
end
mytr (generic function with 1 method)
julia> @btime mytr($mystring)
451.426 ns (4 allocations: 240 bytes)
"The cot sot Ln the mot"
You can probably make things even faster if you take advantage of the fact that your replacements are ASCII, in which case you can look at individual bytes rather than decoding/comparing Unicode characters. But that might be excessive work for such a simple function. (Heck, you could go even further by grabbing multi-byte chunks and using SIMD instructions.)
Yes good idea, but the bottleneck Dict
remains in a generic usage. Instead of using Dict I tried out unrolling the replacement with the package Unrolled
, and voila:
julia> using Unrolled
julia> @unroll function decider(c,reps)
@unroll for x in reps
if x.first == c return x.second end
end
return c
end
decider_unrolled_expansion_ (generic function with 1 method)
julia> tr(str, reps...) = map(s -> decider(s,reps), str)
tr (generic function with 1 method)
julia> @btime tr("The cat sat on the mat",'a'=>'o','o'=>'L')
271.617 ns (4 allocations: 240 bytes)
"The cot sot Ln the mot"
The speed is no getting towards Perl. I’m not sure if the Perl version allocates memory in each iteration…
Perl uses reference counting for memory management, and it is probably more efficient for allocating and then discarding many small strings like you are doing here. (Indeed, working with lots of small strings is probably the main situation that Perl’s memory management is optimized for.) (Julia uses mark-and-sweep garbage collection, because reference counting imposes other tradeoffs for compiled and multithreaded code.)
The real question here: in what circumstance is the tr
operation on small strings performance-critical? In a given realistic setting, there is usually a way to avoid doing lots of re-allocations in your inner loop…
Well, if anything, allocating a lot of temporary object is what tracing GC is good for… Obviously you can add optimization to reference counting to deal with this but the most naive form will get hurt a lot by repeated ref count operation and repeated freeing.
Possibly more importantly, perl’s strings are mutable, so a tr
operation can happen in-place.