Pairwise Alignment Score function: From Python to Julia. Suggestions

I wanted a function to compute the Pairwise Alignment Score in Julia, but didn’t find any in the BioJulia repository. I submitted an issue, asking for some guidance, but no response till now. So, after waiting a long time, I resorted to porting my Python implementation into Julia. Here’s what I got:

Python Implementation

def score_pairwise(self, seq1, seq2, matrix, gap=True):
        """Pairwise score generator.

        Keyword arguments:
        seq1 -- first alignment
        seq2 -- second alignment
        matrix -- substitution matrix"""
        for A, B in zip(seq1, seq2):
            diag = ('-' == A) or ('-' == B)
            yield (self.gap_extend_score if gap else self.gap_open_score) if diag else matrix[(A, B)]
            gap = diag

Julia port

function score(S1, S2; matrix=BLOSUM90, gop=-1, gep=-1)
	score = 0
	prev_gap = false
	for (A, B) in zip(S1, S2)
		diag = ('-' == A) || ('-' == B)
		score += (diag) ? (prev_gap ? gep : gop) : matrix[A, B]
		prev_gap = diag
	end
	score
end

Disclaimer: I’m pretty new to Julia and I feel this isn’t the best implementation possible and we could improve this a lot. Any suggestions would be welcome :slight_smile:

Problem

The result I’m getting from this method is a little bit off from the one I get from the BioJulia package.


I’ve tried it on a number of sequences, and every time it’s a little bit off, sometimes more than BioJulia score and sometimes less.

Can anyone help me in figuring out the problem? I would be very thankful.

2 Likes

Hey and welcome to the community :wave:
In general, it’s a good idea to include some data-generating code so that people can copy-paste your code into a repl and have it run. That way it’s much easier to help you.

3 Likes

Sure thing.

Imports required

using BioSequences
using BioAlignments

Score using BioJulia

function score_bio(S1, S2; matrix=BLOSUM90, gop=-3, gep=-1, asterisk_penalty=missing)
	scoremodel = AffineGapScoreModel(matrix, gap_open=gop, gap_extend=gep)
	res = pairalign(GlobalAlignment(), S1, S2, scoremodel)
	res
end

score_bio("ACCTAACTAAC", "AACTTTACAATAC")

This returns:

BioAlignments.PairwiseAlignmentResult{Int64,String,String}:
  score: 44
  seq:  1 ACCTA-ACTA-AC 11
          | ||  || | ||
  ref:  1 AACTTTACAATAC 13

The problem with using this method is that if my sequences contain '-' then it won’t run and results in an exception.
I am getting the alignments from another manual method and here, I just want to calculate the score of the generated alignments. Because the generated alignments have '-' in them, I can’t use this method to get the score, even using scoreonly=true parameter doesn’t help.

So I coded my own scoring function, which is:

function pairwise_score(S1, S2; matrix=BLOSUM90, gop=-3, gep=-1)
	score = 0
	prev_gap = false
	for (A, B) in zip(S1, S2)
		diag = ('-' == A) || ('-' == B)
		score += (diag) ? (prev_gap ? gep : gop) : matrix[A, B]
		prev_gap = diag
	end
	score
end

To test its accuracy, I passed it the alignments that were generated from the above function, score_bio:

pairwise_score("ACCTA-ACTA-AC", "AACTTTACAATAC")

This gave the score:

46

while the score_bio function was returning a score of 44 for the same alignment.

I tried different input sequences, but the result is always the same, i.e. my output is off by a couple of numbers.

Hopefully, you’ll be able to reproduce the problem now. :slight_smile:
I’d also appreciate suggestions on how you think the Julia implementation of the above python code could be improved further (made succint, or more Julian-way :roll_eyes:). Thanks :slightly_smiling_face:

2 Likes

I think your code is correct, and that the difference is that BioAlignments adds up the opening and extension gap penalties when you have a new gap (here you have two gaps so you end up with -2 of difference) :

https://github.com/BioJulia/BioAlignments.jl/blob/8f1bdca96013b855665458c1d2ddbd9614d46ba6/src/pairwise/algorithms/needleman_wunsch.jl#L124

I don’t know if that’s a bug in BioAlignments or if that’s an alternative definition. But to me the opening and extension penalties sounds like they should be exclusive. If I’m correct running your score with gop=-4 should give you the same score as BioAlignments.

Edit : not a bug, the docs explain the scoring model here :

https://biojulia.net/BioAlignments.jl/latest/pairalign.html#Alignment-types-and-scoring-models-1

You can either adopt their definition or rescale you gap opening penalty.

3 Likes