Hi!
I’ve noticed that `length(::String)` isn’t a cheap operation in Julia, since it has to iterate over the whole string. In fact, it’s more expensive than Python’s `len(str)`, even for Unicode strings. Also, `length(::String)` is slower in Julia 0.7 than in Julia 0.6:
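The reason `length` has to scan is that a Julia `String` is stored as UTF-8, where a character takes between 1 and 4 bytes, so character boundaries are not at fixed offsets. A small illustration, assuming Julia 0.7 or later (where `ncodeunits` is available):

```julia
word = "ñandú"

ncodeunits(word)           # 7: "ñ" and "ú" take 2 bytes each, the others 1
length(word)               # 5 characters
collect(eachindex(word))   # [1, 3, 4, 5, 6]: character start positions are irregular
```

Counting characters therefore means decoding the bytes, while the byte count (`ncodeunits`) is known directly.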
Julia 0.6.4
```julia
julia> const unicode_string = repeat("ñ", 100_000);
julia> length(unicode_string)
100000
julia> quantile([ @elapsed length(unicode_string) for i in 1:100 ], [0.25, 0.5, 0.75])
3-element Array{Float64,1}:
0.00015227
0.000152272
0.000152298
```
Julia 0.7.0-beta2.81
```julia
julia> const unicode_string = repeat("ñ", 100_000);
julia> length(unicode_string)
100000
julia> using Statistics
julia> quantile([ @elapsed length(unicode_string) for i in 1:100 ], [0.25, 0.5, 0.75])
3-element Array{Float64,1}:
0.00046879075
0.000469071
0.00050228975
```
Julia 0.7.0-beta2.126
```julia
julia> const unicode_string = repeat("ñ", 100_000);
julia> length(unicode_string)
100000
julia> using Statistics
julia> quantile([ @elapsed length(unicode_string) for i in 1:100 ], [0.25, 0.5, 0.75])
3-element Array{Float64,1}:
0.00051588
0.000515881
0.000515884
```
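The same quartiles can be collected with a small function that makes one warm-up call before timing, so that any compilation triggered by the first call is not counted. A minimal sketch for Julia 0.7 or later; `time_quartiles` is a hypothetical helper name, not part of any library:

```julia
using Statistics  # provides `quantile` in Julia 0.7 and later

# Hypothetical helper: time `f(x)` `n` times and return the 25th, 50th
# and 75th percentiles of the elapsed times, in seconds.
function time_quartiles(f, x; n::Integer = 100)
    f(x)  # warm-up call so compilation is not part of the measurement
    times = [@elapsed(f(x)) for _ in 1:n]
    return quantile(times, [0.25, 0.5, 0.75])
end

time_quartiles(length, repeat("ñ", 100_000))
```

For serious measurements a dedicated benchmarking package is the usual choice; this sketch only mirrors the approach used in the sessions above.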
Python 2.7.12
```
In [1]: unicode_string = u'ñ' * 100000
In [2]: %time len(unicode_string)
CPU times: user 6 µs, sys: 1 µs, total: 7 µs
Wall time: 26 µs
Out[2]: 100000
```
Python 3.5.2
```
In [1]: unicode_string = 'ñ' * 100000
In [2]: %time len(unicode_string)
CPU times: user 7 µs, sys: 1 µs, total: 8 µs
Wall time: 13.6 µs
Out[2]: 100000
```
So Python 3.5 took about 13.6 µs (0.0000136 seconds, wall time) while the latest Julia build took about 516 µs (0.000516 seconds, median).
Sometimes I use `length(::String)` to decide what to do with a parsed line from `eachline(...)`.
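When the decision looks at the length more than once, computing it a single time per line at least avoids repeating the full scan. A minimal sketch, with an `IOBuffer` standing in for the real input; `classify_lines` and its `:empty`/`:short`/`:long` labels are hypothetical:

```julia
# Hypothetical classification rule: the character count of each line
# is computed once and then reused by every branch.
function classify_lines(io::IO)
    labels = Symbol[]
    for line in eachline(io)
        n = length(line)  # one full scan of the line
        if n == 0
            push!(labels, :empty)
        elseif n <= 3
            push!(labels, :short)
        else
            push!(labels, :long)
        end
    end
    return labels
end

classify_lines(IOBuffer("ñañ\n\nñandú\n"))  # [:short, :empty, :long]
```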
In the particular case of testing for an empty string, `isempty(::String)` is faster than `length(::String) == 0`:
```julia
julia> quantile([ @elapsed length(unicode_string) == 0 for i in 1:100 ], [0.25, 0.5, 0.75])
3-element Array{Float64,1}:
0.000515878
0.00051588
0.0005159
julia> quantile([ @elapsed isempty(unicode_string) for i in 1:100 ], [0.25, 0.5, 0.75])
3-element Array{Float64,1}:
4.1e-8
4.1e-8
4.2e-8
```
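The gap is presumably because emptiness only depends on the number of code units (bytes, for a `String`), which is stored with the string, whereas `length` decodes every character. A small illustration, assuming Julia 0.7 or later:

```julia
big_string = repeat("ñ", 100_000)

# Both checks only consult the code-unit count, so neither decodes
# the 100_000 characters.
isempty(big_string)            # false
ncodeunits(big_string) == 0    # false, same answer from the byte count

ncodeunits(big_string)         # 200000, since "ñ" is 2 bytes in UTF-8
```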
But cases like `length(::String) <= 3` are trickier.
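One way to handle a small bound is to stop counting as soon as the limit is exceeded, and to skip counting entirely when the byte count already settles the question. A minimal sketch; `length_at_most` is a hypothetical helper, not a Base function:

```julia
# Hypothetical helper: is `length(s) <= n`, without counting every character?
function length_at_most(s::AbstractString, n::Integer)
    n < 0 && return false
    # Every character occupies at least one code unit, so a string with
    # at most n code units cannot have more than n characters.
    ncodeunits(s) <= n && return true
    nchars = 0
    for _ in s
        nchars += 1
        nchars > n && return false  # stop as soon as the limit is exceeded
    end
    return true
end

long_line = repeat("ñ", 100_000)
length_at_most(long_line, 3)  # false, after looking at only 4 characters
length_at_most("ñañ", 3)      # true
```

The cost is then bounded by `n + 1` characters instead of the whole string, which matters for long lines with a small threshold.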
Cheers,