I’m looking for a more elegant, and perhaps faster, way to create a dictionary mapping position => differing character between two string sequences. This is what I’m currently using:
```julia
function mutationpositions(wildtype, variant)
    dict = Dict{Int,Char}()
    for i ∈ 1:length(wildtype)
        if wildtype[i] != variant[i]
            push!(dict, i => variant[i])
        end
    end
    return dict
end
```
I thought I could use `zip`, something like this:

```julia
count(((a,b),) -> a != b, zip(sequence₁,sequence₂))
```

but I can’t figure out how to capture the position.
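One way to capture the position is to wrap `enumerate` around the `zip`, which yields `(position, (a, b))` tuples that can be destructured directly in a comprehension. The two sequences below are made-up examples, and the position is the character count, which matches the string index only for ASCII data such as plain DNA letters.

```julia
wildtype = "ACGTACGT"   # made-up example sequences
variant  = "ACCTACGA"

# enumerate(zip(...)) yields (i, (a, b)): position i and the pair of characters there
positions = [i for (i, (a, b)) in enumerate(zip(wildtype, variant)) if a != b]
# positions == [3, 8]

# the same pattern can keep the variant character alongside its position
changes = [(i, b) for (i, (a, b)) in enumerate(zip(wildtype, variant)) if a != b]
# changes == [(3, 'C'), (8, 'A')]
```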
I don't know whether this is faster, and it assumes that `wildtype` and `variant` have the same length (`zip` stops at the end of the shorter one).
Perhaps you'll like this, though it doesn't create a Dict:
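Building on the same `enumerate(zip(...))` idea, a `Dict` can be constructed directly from a generator of `position => character` pairs. This is a sketch that keeps the original function name and its `Dict{Int,Char}` return type; the length check is an added assumption, included because `zip` silently stops at the shorter sequence.

```julia
function mutationpositions(wildtype::AbstractString, variant::AbstractString)
    # zip would silently truncate to the shorter input, so reject mismatched lengths
    length(wildtype) == length(variant) ||
        throw(ArgumentError("sequences must have the same length"))
    # keep only positions where the characters differ, mapping position => variant char
    return Dict{Int,Char}(i => b for (i, (a, b)) in enumerate(zip(wildtype, variant)) if a != b)
end

mutationpositions("ACGTACGT", "ACCTACGA")   # Dict(3 => 'C', 8 => 'A')
```

Like the original loop, this is intended for ASCII input: `enumerate` counts characters, while `wildtype[i]` indexes code units, and the two agree only when every character is a single byte.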
It’s generally not recommended to work with genomic data as strings. Strings are an inefficient representation: a DNA nucleotide needs only two bits, while a `String` spends at least one byte per character and has to handle the complexity of potentially holding arbitrary Unicode data, which cannot occur in DNA. Consider using the BioJulia packages designed for genetic data:
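To make the two-bits-per-nucleotide point concrete, here is a toy sketch in plain Julia. The `pack2bit` function and its A/C/G/T → 00/01/10/11 code are hypothetical illustrations, not BioJulia's API, and the sketch handles only uppercase `ACGT`.

```julia
# Hypothetical 2-bit code for the four unambiguous nucleotides (illustrative only)
const NUC2BIT = Dict('A' => 0x00, 'C' => 0x01, 'G' => 0x02, 'T' => 0x03)

# Pack an ACGT string into UInt64 words, 32 nucleotides (64 bits) per word.
function pack2bit(seq::AbstractString)
    out = zeros(UInt64, cld(length(seq), 32))
    for (i, c) in enumerate(seq)
        word  = (i - 1) ÷ 32 + 1                  # which UInt64 holds nucleotide i
        shift = 2 * ((i - 1) % 32)                # bit offset inside that word
        out[word] |= UInt64(NUC2BIT[c]) << shift  # throws KeyError for non-ACGT
    end
    return out
end

seq = "ACGT"^25              # 100 nucleotides
sizeof(seq)                  # 100 bytes as a String
sizeof(pack2bit(seq))        # 32 bytes (4 UInt64 words) when packed
```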
Thanks for the info. I’ve been planning to check out the BioJulia packages, but I already have a code base written that I’ll have to refactor. This question was also as much about learning coding alternatives. I didn't even think to wrap `enumerate` around `zip`; that’s the idea I was looking for.
Thanks again; this Discourse community is the best I’ve worked with. Very helpful.