Iterating over a String changes its structure to an array of Characters

I have a list of strings that I’m attempting to compare against a single reference string. I have a function that compares two strings character by character, counting the positions at which they differ.

function compareStrings(sequence₁, sequence₂)
    count = 0;
    for i in 1:length(sequence₁)
        if sequence₁[i] != sequence₂[i]
            count+=1;
        end
    end
    return count
end

I then pass the strings like:

test_array = [];

for i in unique_string_list
    if compareStrings(WT,i) == 2
        append!(test_array,i)
    end
end

I’m trying to build up an array of strings that have 2 differences. But when I attempt to append to the array, I just get a list of all the characters in the string.

It seems like the iteration through the string is changing the string into a list of characters and I have been unable to get the string back to a state that allows me to append it to the array.

Any thoughts on this behavior?

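The append! function appends one collection to another. A string is itself an iterable collection of characters, so append!(test_array, i) adds each character separately. You want the push! function, which adds its argument as a single element. Note that this deviates from Python, but Python is the outlier here: Julia uses the standard meanings of these functions in most programming languages (Lisp, Perl, Ruby, etc.).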
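A minimal sketch of the difference, using made-up variable names (`as_element`, `as_chars`) that are not from the original post:

```julia
# push! adds its argument as one element of the array.
as_element = String[]
push!(as_element, "abc")

# append! iterates its argument and adds every item it yields;
# iterating a String yields its characters.
as_chars = []
append!(as_chars, "abc")

println(as_element)  # a 1-element array holding the string "abc"
println(as_chars)    # a 3-element array holding 'a', 'b', 'c'
```

Replacing append! with push! in the loop from the question should therefore collect whole strings.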


Thank you. Easy as pie!

Your compareStrings function will only work for ASCII strings, since string indices are not consecutive in general. It can throw an error for strings containing other Unicode characters. (You need to use nextind to increment the index.) The easiest solution is probably to use zip:

compare_strings(s1, s2) = count(((a,b),) -> a != b, zip(s1, s2))

Note that, similar to your original function, if s1 and s2 have different lengths this function will only look at the characters they have in common. That is, it only looks at min(length(s1), length(s2)) characters. You could add abs(length(s1) - length(s2)) to the result if you want to count differences in length.
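The two variants mentioned above could be sketched roughly as follows. The function names `count_differences` and `count_mismatches_nextind` are hypothetical, chosen only for this illustration:

```julia
# zip-based count, plus the difference in length (in characters).
function count_differences(s1::AbstractString, s2::AbstractString)
    mismatches = count(((a, b),) -> a != b, zip(s1, s2))
    return mismatches + abs(length(s1) - length(s2))
end

# Index-based count that steps with nextind, so it also works when
# characters occupy more than one code unit. Like the zip version,
# it stops at the end of the shorter string.
function count_mismatches_nextind(s1::AbstractString, s2::AbstractString)
    n = 0
    i, j = firstindex(s1), firstindex(s2)
    while i <= lastindex(s1) && j <= lastindex(s2)
        if s1[i] != s2[j]
            n += 1
        end
        i = nextind(s1, i)  # next valid index in s1
        j = nextind(s2, j)  # next valid index in s2
    end
    return n
end

count_differences("abc", "abcde")          # 2 (only the length differs)
count_mismatches_nextind("αβγ", "αxγ")     # 1
```

Each string keeps its own index here because the same character position can sit at different indices in the two strings.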


Did not even think about zip. Thanks for the suggestion.

I’m curious about the syntax ((a,b),). I can’t find in the documentation what that extra comma does to the tuple (a,b).

JB

The syntax (x,) denotes a tuple with the single element x, so ((a,b),) is a 1-tuple having (a,b) as its single element.
In the context of an anonymous function expression, the arguments are given in a tuple, so (a,b)->a!=b would denote a function with two arguments (named a and b). ((a,b),)->a!=b is a 1-argument function where the first argument is destructured as (a,b) (https://docs.julialang.org/en/v1/manual/functions/#Argument-destructuring).
Incidentally, you can also use splat to get an equivalent 1-argument function:

count(Base.splat(!=), zip(s1, s2))
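
A small sketch of the three forms side by side, with illustrative names (`single`, `two_args`, `one_arg`, `char_pairs`) that do not appear in the original posts:

```julia
single = (1,)                     # a tuple with exactly one element

two_args = (a, b) -> a != b       # takes two separate arguments
one_arg  = ((a, b),) -> a != b    # takes one argument and destructures it

char_pairs = zip("abc", "abd")    # yields ('a','a'), ('b','b'), ('c','d')

n_destructured = count(one_arg, char_pairs)          # 1
n_splat        = count(Base.splat(!=), char_pairs)   # 1

# two_args must be called with two arguments, e.g. two_args('c', 'd');
# passing it a single tuple raises a MethodError, which is why count
# over zip needs the destructuring or splat form.
```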