Pairwise Alignment Score function: From Python to Julia. Suggestions

aadam · October 10, 2020, 6:07am

I wanted a function to compute the Pairwise Alignment Score in Julia, but didn’t find any in the BioJulia repository. I submitted an issue, asking for some guidance, but no response till now. So, after waiting a long time, I resorted to porting my Python implementation into Julia. Here’s what I got:

Python Implementation

def score_pairwise(self, seq1, seq2, matrix, gap=True):
        """Pairwise score generator.

        Keyword arguments:
        seq1 -- first alignment
        seq2 -- second alignment
        matrix -- substitution matrix"""
        for A, B in zip(seq1, seq2):
            diag = ('-' == A) or ('-' == B)
            yield (self.gap_extend_score if gap else self.gap_open_score) if diag else matrix[(A, B)]
            gap = diag

Julia port

function score(S1, S2; matrix=BLOSUM90, gop=-1, gep=-1)
	score = 0
	prev_gap = false
	for (A, B) in zip(S1, S2)
		diag = ('-' == A) || ('-' == B)
		score += (diag) ? (prev_gap ? gep : gop) : matrix[A, B]
		prev_gap = diag
	end
	score
end

Disclaimer: I’m pretty new to Julia and I feel this isn’t the best implementation possible and we could improve this a lot. Any suggestions would be welcome

Problem

The result I’m getting from this method is a little bit off from the one I get from the BioJulia package.

I’ve tried it on a number of sequences, and every time it’s a little bit off, sometimes more than BioJulia score and sometimes less.

Can anyone help me in figuring out the problem? I would be very thankful.

baggepinnen · October 10, 2020, 7:40am

Hey and welcome to the community
In general, it’s a good idea to include some data-generating code so that people can copy-paste your code into a repl and have it run. That way it’s much easier to help you.

aadam · October 10, 2020, 9:32am

Sure thing.

Imports required

using BioSequences
using BioAlignments

Score using BioJulia

function score_bio(S1, S2; matrix=BLOSUM90, gop=-3, gep=-1, asterisk_penalty=missing)
	scoremodel = AffineGapScoreModel(matrix, gap_open=gop, gap_extend=gep)
	res = pairalign(GlobalAlignment(), S1, S2, scoremodel)
	res
end

score_bio("ACCTAACTAAC", "AACTTTACAATAC")

This returns:

BioAlignments.PairwiseAlignmentResult{Int64,String,String}:
  score: 44
  seq:  1 ACCTA-ACTA-AC 11
          | ||  || | ||
  ref:  1 AACTTTACAATAC 13

The problem with using this method is that if my sequences contain '-' then it won’t run and results in an exception.
I am getting the alignments from another manual method and here, I just want to calculate the score of the generated alignments. Because the generated alignments have '-' in them, I can’t use this method to get the score, even using scoreonly=true parameter doesn’t help.

So I coded my own scoring function, which is:

function pairwise_score(S1, S2; matrix=BLOSUM90, gop=-3, gep=-1)
	score = 0
	prev_gap = false
	for (A, B) in zip(S1, S2)
		diag = ('-' == A) || ('-' == B)
		score += (diag) ? (prev_gap ? gep : gop) : matrix[A, B]
		prev_gap = diag
	end
	score
end

To test its accuracy, I passed it the alignments that were generated from the above function, score_bio:

pairwise_score("ACCTA-ACTA-AC", "AACTTTACAATAC")

This gave the score:

while the score_bio function was returning a score of 44 for the same alignment.

I tried different input sequences, but the result is always the same, i.e. my output is off by a couple of numbers.

Hopefully, you’ll be able to reproduce the problem now.
I’d also appreciate suggestions on how you think the Julia implementation of the above python code could be improved further (made succint, or more Julian-way ). Thanks

jonathanBieler · October 10, 2020, 1:23pm

I think your code is correct, and that the difference is that BioAlignments adds up the opening and extension gap penalties when you have a new gap (here you have two gaps so you end up with -2 of difference) :

https://github.com/BioJulia/BioAlignments.jl/blob/8f1bdca96013b855665458c1d2ddbd9614d46ba6/src/pairwise/algorithms/needleman_wunsch.jl#L124

I don’t know if that’s a bug in BioAlignments or if that’s an alternative definition. But to me the opening and extension penalties sounds like they should be exclusive. If I’m correct running your score with gop=-4 should give you the same score as BioAlignments.

Edit : not a bug, the docs explain the scoring model here :

https://biojulia.net/BioAlignments.jl/latest/pairalign.html#Alignment-types-and-scoring-models-1

You can either adopt their definition or rescale you gap opening penalty.

Topic		Replies	Views
Extract start and end positions from a PairwiseAlignment Biology, Health, and Medicine question , biology	6	688	March 10, 2022
How to retrieve indices from bioalignments? General Usage biology	1	569	September 21, 2020
mrtkp9993/Bioinformatics.jl Community	3	757	July 29, 2018
[ANN] PairwiseMappingFormat.jl Biology, Health, and Medicine	0	99	October 24, 2024
Extract alignments from BAM and associated FASTA reference genome Biology, Health, and Medicine question	2	1279	October 28, 2019