Is there an idiomatic way to iterate over all binary strings of a given length?

I would like to iterate over all binary strings of length n, for some smallish integer n. Is there an idiomatic Julia way to do that?

1 Like
for i = 0:2^(n+1)-1
       s = bitstring(i)
       println(s[length(s)-n:end])
end

There might be a nicer way to get the last n characters of a string, but this was quick and dirty :slight_smile:

4 Likes

You don’t need to interpolate here. Just do

println(s[end-n:end]) 
3 Likes

Don’t you need to start from 0?

function bb(n)
    for i = 0:2^n-1
       s = bitstring(i)
       println(s[end-n+1:end])
    end
end

and then

julia> bb(3)
000
001
010
011
100
101
110
111
2 Likes

last(somestring, n)

4 Likes

Thank you for this. I wonder if there is a lighter weight solution if you need to iterate over a large range (say n = 30), quickly.

You may try to implement iterator which goes through Gray code, I think it’s as fast and lightweight as it can be. There is no dedicated package, but here is snippet from rosetta code: RosettaCodeData/gray-code.julia at master · acmeism/RosettaCodeData · GitHub

1 Like

That would iterate over integers not strings, right? Or were you suggesting I write a new version which iterates over strings?

Sorry, my comment can be misleading. Regarding your question, it actually depends on what type of object you are working with. If it’s String than I do not think that it is possible to do anything faster than solutions shown above (increase some integer with the following conversion to String). If on the other hand it’s possible to use BitArray or Vector{UInt} than you can write your own implementation of the Gray Code and mutate corresponding vector inplace.

Why do you need to use strings? What are you trying to do?

It’s never going to be super fast if you use String, because you need to allocate a new string on each iteration.

1 Like

I guess if I update the strings in Gray code order then I only need to change one entry at a time which could be very fast.

I need strings as I am computing various distances between them. For example the Levenshtein distance.

String is immutable in Julia, so it can’t be updated in-place. Of course, you could define your own string type, but a Vector{Bool} would be a much more natural choice here than a string.

It would be easy (and probably more efficient) to implement these distance metrics using Vector{Bool}.

As a side note, you still may use package like StringDistances with your own type of String, due to the multiple dispatch, for example

using StringDistances

struct MyString <: AbstractString
    v::Vector{Bool}
end

Base.:ncodeunits(s::MyString) = length(s.v)
Base.:isvalid(s::MyString, i::Int) = true
Base.:iterate(s::MyString, i::Int) = iterate(s.v, i)

s1 = MyString([true, true, false])
s2 = MyString([false, false, false])
evaluate(Levenshtein(), s1, s2) # 2
3 Likes

Is this right? I’m not familiar with this functionality, but according to the docs:

If isvalid(s, i) is true then s[i] will return the character whose encoding starts at that index.

1 Like

It depends on what we should consider character in this case. If character is Bool, than this definition is correct. If character is group of 8 Bool values, than things will be more complicated of course. I suppose then instead of Vector{Bool} one should use Vector{UInt8}.

I don’t get it. Why isn’t it

Base.isvalid(s::MyString, i::Int) = 1 <= i <= length(s.v)

for example?

Ah, yes, that’s better of course.
My definition works, because length check basically happens inside iterate (and this function is mainly used in StringDistances algorithms), but in other cases it will fail.