Extract start and end positions from a PairwiseAlignment

Hello, I am dealing with PairwiseAlignments of the form:

PairwiseAlignment{LongDNASeq, LongDNASeq}:
seq: 2 TGTCTTTCGCTGCTGAGGGTAGA 24
| | | | | | | | | | | | | | | | | | | | | |
ref: 307 TGTCTTTCGCTGCTGAGGGT-GA 328

I just want to extract the start and end coordinates for the sequence (2, 24) and reference (307, 328), but this is surprising challenging. As far as I can tell, the current API for BioAlignments.jl does not cover the case of extracting locations. Although if I have one pair, then I can trivially extract the other pair using the seq2ref and ref2seq functions.

Does anybody know how I can extract the at least one of the seq and ref pairs, like (2, 24) and (307, 328) in the above example, from a PairwiseAlignment? Thanks in advance!

1 Like

OK. I wrote the following one-line functions to solve my own problem. Hopefully helpful for others as well:

function GetPairwiseAlignmentSeqStart(aln::PairwiseAlignment)
return aln.a.aln.anchors[1].seqpos + 1
end

function GetPairwiseAlignmentRefStart(aln::PairwiseAlignment)
return aln.a.aln.anchors[1].refpos + 1
end

function GetPairwiseAlignmentSeqEnd(aln::PairwiseAlignment)
return aln.a.aln.anchors[end].seqpos
end

function GetPairwiseAlignmentRefEnd(aln::PairwiseAlignment)
return aln.a.aln.anchors[end].refpos
end

I don’t know that package, or the field, but it looks like you are reaching deep into the internals of the type. That is normally not recommended practice in Julia.

If the internal fields (several layers deep) are not part of the API, you risk that your function suddenly stops working (or gives wrong answers) after a package update.

1 Like

I recall having to do something similar in BioStructures: https://github.com/BioJulia/BioStructures.jl/blob/master/src/spatial.jl#L155-L157. It would be nice if there was a better way.

I agree. To my knowledge-- and as you can tell, I dug into the internals of this package-- the current API does not allow the client to get this information at all. If I were more adept, I would submit a pull request to update the API to have some getters like this. I feel that I would probably create more work for the maintainers if I tried to make a “contribution”.

In the meantime, this solution works for me, and it should work for others too. I hope it can help more adept contributors to BioJulia to better understand clients’ use cases as they improve the API over time :pray:t6:

I think you could open an issue, explaining the use case. That could be helpful, and doesn’t require much expertise to do.

1 Like

I’ll do that, thanks.