I would like to iterate over all binary strings of length n, for some smallish integer n. Is there an idiomatic Julia way to do that?
for i = 0:2^(n+1)-1
s = bitstring(i)
println(s[length(s)-n:end])
end
There might be a nicer way to get the last n
characters of a string, but this was quick and dirty
You don’t need to interpolate here. Just do
println(s[end-n:end])
Don’t you need to start from 0?
function bb(n)
for i = 0:2^n-1
s = bitstring(i)
println(s[end-n+1:end])
end
end
and then
julia> bb(3)
000
001
010
011
100
101
110
111
last(somestring, n)
Thank you for this. I wonder if there is a lighter weight solution if you need to iterate over a large range (say n = 30), quickly.
You may try to implement iterator which goes through Gray code, I think it’s as fast and lightweight as it can be. There is no dedicated package, but here is snippet from rosetta code: RosettaCodeData/gray-code.julia at master · acmeism/RosettaCodeData · GitHub
That would iterate over integers not strings, right? Or were you suggesting I write a new version which iterates over strings?
Sorry, my comment can be misleading. Regarding your question, it actually depends on what type of object you are working with. If it’s String
than I do not think that it is possible to do anything faster than solutions shown above (increase some integer with the following conversion to String). If on the other hand it’s possible to use BitArray
or Vector{UInt}
than you can write your own implementation of the Gray Code and mutate corresponding vector inplace.
Why do you need to use strings? What are you trying to do?
It’s never going to be super fast if you use String
, because you need to allocate a new string on each iteration.
I guess if I update the strings in Gray code order then I only need to change one entry at a time which could be very fast.
I need strings as I am computing various distances between them. For example the Levenshtein distance.
String
is immutable in Julia, so it can’t be updated in-place. Of course, you could define your own string type, but a Vector{Bool}
would be a much more natural choice here than a string.
It would be easy (and probably more efficient) to implement these distance metrics using Vector{Bool}
.
As a side note, you still may use package like StringDistances
with your own type of String, due to the multiple dispatch, for example
using StringDistances
struct MyString <: AbstractString
v::Vector{Bool}
end
Base.:ncodeunits(s::MyString) = length(s.v)
Base.:isvalid(s::MyString, i::Int) = true
Base.:iterate(s::MyString, i::Int) = iterate(s.v, i)
s1 = MyString([true, true, false])
s2 = MyString([false, false, false])
evaluate(Levenshtein(), s1, s2) # 2
Is this right? I’m not familiar with this functionality, but according to the docs:
If
isvalid(s, i)
is true thens[i]
will return the character whose encoding starts at that index.
It depends on what we should consider character in this case. If character is Bool
, than this definition is correct. If character is group of 8 Bool
values, than things will be more complicated of course. I suppose then instead of Vector{Bool}
one should use Vector{UInt8}
.
I don’t get it. Why isn’t it
Base.isvalid(s::MyString, i::Int) = 1 <= i <= length(s.v)
for example?
Ah, yes, that’s better of course.
My definition works, because length
check basically happens inside iterate
(and this function is mainly used in StringDistances
algorithms), but in other cases it will fail.