Makie - Multi Sequence Alignment Scatterplot

I'm trying to visualize a multiple sequence alignment with Makie's scatter plot.
My first idea was to overlay a heatmap with a scatter plot, but I've found that overlapping markers (either the :rect symbol or a Unicode full block like this █), set to specific colors with an alpha value of 0.5, kind of works.
The problem appears when the size of the matrix changes.
I think the code and images explain best what I want and what doesn't work quite yet.

using CairoMakie
msa_matrix = [
	'-' 'G' 'G' '-' 'C' 'T' 'T' 'G' 'C' 'T' 'T' 'A' 'T' 'T' 'G' 'T' '-' '-' 'G' 'T'
	'-' 'G' 'G' 'C' 'T' 'T' 'G' 'C' 'T' 'T' 'A' 'T' 'T' 'G' 'T' 'G' 'T' 'G' 'G' 'T'
	'C' 'G' 'G' '-' 'T' 'T' 'G' 'C' 'T' 'T' 'A' 'T' 'T' 'G' 'T' 'G' '-' '-' '-' 'T'
	'C' 'G' 'G' '-' 'T' 'T' 'G' 'C' 'T' 'T' 'A' 'T' 'T' 'G' 'T' 'G' '-' '-' 'T' '-'
]
# (x, y) = (column, row) position of every character, in column-major order
index_msa = [(i, j) for i in 1:size(msa_matrix, 2) for j in 1:size(msa_matrix, 1)]
begin
	color_dict = Dict(
		'A' => :yellow,
		'C' => :blue,
		'G' => :green,
		'T' => :red,
		'-' => :magenta,
	)
	
	f = Figure(resolution = (1200, 1200))
	ax = Axis(
		f[1, 1],
		xgridvisible = false,
		ygridvisible = false,
		xautolimitmargin = (0.06, 0.06),
		yautolimitmargin = (0.385f0, 0.385f0),
		xticks = 1:size(msa_matrix)[2],
		xticklabelsize = 7.0f0,
		yticks = (1:size(msa_matrix)[1], ["a", "b", "c", "d"]),
		yticksvisible = false,
		yreversed = true,
	)
	hidespines!(ax)
	# ax.aspect = AxisAspect(7)
	# colsize!(f.layout, 1, Auto(1))
	rowsize!(f.layout, 1, Aspect(1, 0.09))
	# colsize!(f.layout, 1, Aspect(1, 0.5))
	# ax.aspect = DataAspect()
	ax.autolimitaspect = 1.5
	# tightlimits!(ax)
	# tight_yticklabel_spacing!(ax)
	resize_to_layout!(f)
	markersize = 17
	for (pos, marker) in zip(index_msa, [msa_matrix...])
		scatter!(pos, marker = marker, color=:black, markersize = markersize)
		scatter!(pos, marker = '█', color=(color_dict[marker],0.5), markersize = markersize)
	end
	# save("../figures/test.pdf", f)
	f
end
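For reference, the `index_msa` comprehension pairs each character with an (x, y) = (column, row) position. Because the `i` loop is the outer one, the row index varies fastest, which matches Julia's column-major order when the matrix is splatted with `[msa_matrix...]`. A small plain-Julia check on a hypothetical 2×3 matrix (no Makie needed) shows that this ordering equals `CartesianIndices` with each (row, column) index reversed. This is only a sketch of the indexing, not part of the original code:

```julia
# Small hypothetical alignment: 2 sequences (rows) x 3 positions (columns)
m = ['A' 'C' '-'
     'G' 'T' 'T']

# Positions as built in the question: x = column index, y = row index.
# The `i` loop is outer, so `j` (the row) varies fastest.
index_msa = [(i, j) for i in 1:size(m, 2) for j in 1:size(m, 1)]

# Same positions from CartesianIndices, which also iterates column-major;
# reverse each (row, col) tuple to get (col, row) = (x, y).
from_cartesian = vec([reverse(Tuple(I)) for I in CartesianIndices(m)])

# Pair each character with its position; vec(m) is column-major like [m...]
placed = collect(zip(index_msa, vec(m)))
```

So the original loop does place every letter at the right cell; the squishing comes from the axis and figure sizing, not from the indexing.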

This produces the following output:


I could probably include the y tick labels in the first column of msa_matrix to make them stick to the beginning of the sequences, but that seems kind of hacky.
Also, with bigger matrices the marker boxes start to overlap, and all in all it doesn't seem clean with all these small hard-coded adjustments.
(It seems I can only attach one media example, as I'm a new user.)
Any help or other ideas would be greatly appreciated 🙂


This is the squished output as the matrix becomes bigger:


I want the sequences to be aligned, with the letters equidistant from each other and each letter having its own background color (like in the provided images).
The distances between the letters should not change with different matrix sizes.

Edit: I've also kept the commented-out lines to show the options I've experimented with so far.

Because you want your figure's size to depend on the amount of data being shown, without any "squishing" of the cells, your nucleotide cells need a fixed size. So it's best to start from there: pick a cell size, then calculate the axis size needed to contain the data in unsquished form. The function resize_to_layout! can then pick up this axis size and compute the figure size so there's no extra whitespace. Note that autolimitaspect and aspect behave differently; see the Makie tutorial "Aspect ratio and size control" for reference.
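The sizing arithmetic is just the alignment size times the cell size. A minimal plain-Julia sketch, with `axis_pixels` as a hypothetical helper name and the 20×20 px cell used in the code that follows, shows that the axis grows with the data while each cell keeps the same pixel size:

```julia
# Fixed pixel size of one nucleotide cell (x, y)
cellsize = (20, 20)

# Hypothetical helper: axis size in pixels for `npos` alignment positions (x)
# and `nseq` sequences (y), one fixed-size cell each
axis_pixels(npos, nseq, cell) = (npos, nseq) .* cell

small = axis_pixels(20, 4, cellsize)    # the 20-position, 4-sequence example
large = axis_pixels(80, 12, cellsize)   # a hypothetical larger alignment
```

Here `small` is `(400, 80)` and `large` is `(1600, 240)`; dividing either by its alignment size gives back `(20, 20)`, which is why the letters stay equidistant.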

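using CairoMakie

# Transpose so the first dimension is the alignment position (x)
# and the second is the sequence (y)
msa_matrix = permutedims([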
    '-' 'G' 'G' '-' 'C' 'T' 'T' 'G' 'C' 'T' 'T' 'A' 'T' 'T' 'G' 'T' '-' '-' 'G' 'T'
    '-' 'G' 'G' 'C' 'T' 'T' 'G' 'C' 'T' 'T' 'A' 'T' 'T' 'G' 'T' 'G' 'T' 'G' 'G' 'T'
    'C' 'G' 'G' '-' 'T' 'T' 'G' 'C' 'T' 'T' 'A' 'T' 'T' 'G' 'T' 'G' '-' '-' '-' 'T'
    'C' 'G' 'G' '-' 'T' 'T' 'G' 'C' 'T' 'T' 'A' 'T' 'T' 'G' 'T' 'G' '-' '-' 'T' '-'
])

levels = Dict(
    'A' => 1,
    'C' => 2,
    'G' => 3,
    'T' => 4,
    '-' => 5,
)

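# One (color, alpha) pair per level, in the same order as `levels`
colormap = tuple.([:yellow, :blue, :green, :red, :magenta], 0.5)

# Fixed pixel size of one nucleotide cell (x, y)
cellsize = (20, 20)

f = Figure(fontsize = 12, backgroundcolor = :gray95)

# Axis size grows with the data, so the cells are never squished
axis_size = size(msa_matrix) .* cellsize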

ax = Axis(
    f[1, 1],
    xgridvisible = false,
    ygridvisible = false,
    xticks = 1:size(msa_matrix, 1),
    yticks = (1:size(msa_matrix, 2), ["a", "b", "c", "d"]),
    yticksvisible = false,
    yreversed = true,
    width = axis_size[1],
    height = axis_size[2],
)

hidespines!(ax)

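# Color each cell by nucleotide; pinning colorrange keeps the level-to-color
# mapping fixed even if some symbol is missing from the data
heatmap!(ax, [levels[m] for m in msa_matrix], colormap = colormap, colorrange = (1, 5))
# Draw one letter centered in each cell
text!(ax, vec(string.(msa_matrix)),
    position = vec(Point2f.(Tuple.(CartesianIndices(msa_matrix)))),
    align = (:center, :center),
    textsize = 12, # this attribute is called `fontsize` in newer Makie versions
)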

resize_to_layout!(f)

f
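Why pinning `colorrange = (1, 5)` on the heatmap is worth it: each level is mapped to a position in the colormap. Assuming Makie spreads a plain list of colors evenly over `colorrange`, and takes the range from the data's extrema when `colorrange` is not given, the mapping can be sketched in plain Julia. `normpos` is a hypothetical helper that reproduces this arithmetic, not a Makie function:

```julia
levels = Dict('A' => 1, 'C' => 2, 'G' => 3, 'T' => 4, '-' => 5)

# Normalized position (0 to 1) of a level within a color range (lo, hi)
normpos(level, (lo, hi)) = (level - lo) / (hi - lo)

# With the range pinned to (1, 5), each symbol lands on its own color stop
stops = Dict(c => normpos(l, (1, 5)) for (c, l) in levels)

# If the range were inferred from data lacking 'A' and '-' (levels 2 to 4),
# 'C' would move to the first stop and 'T' to the last one
shifted = Dict(c => normpos(levels[c], (2, 4)) for c in ('C', 'G', 'T'))
```

In the pinned case `'C'` sits at 0.25 (the second color, blue); in the inferred case it would sit at 0.0 and pick up the first color (yellow) instead.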

And if I increase the size of the dataset:

The apparent text size difference is just due to scaling on this website.
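To try the layout on a bigger alignment without real data, one option is to generate a random one. This is a sketch with a hypothetical `random_msa` helper and random symbols, not the dataset from the screenshot. Note that the hard-coded `["a", "b", "c", "d"]` tick labels then need one entry per sequence:

```julia
using Random

# Hypothetical helper: random alignment with `nseq` sequences (rows) and
# `npos` positions (columns), drawn from the same five symbols
random_msa(rng, nseq, npos) = rand(rng, ['A', 'C', 'G', 'T', '-'], nseq, npos)

rng = MersenneTwister(42)
msa_big = random_msa(rng, 12, 80)

# The plotting code expects positions along the first dimension
msa_big_plot = permutedims(msa_big)

# One tick label per sequence ("a", "b", ...); only valid up to 26 sequences
labels = string.('a' .+ (0:size(msa_big_plot, 2) - 1))
```

Using `msa_big_plot` as `msa_matrix` and `(1:length(labels), labels)` as `yticks` should keep the same 20×20 px cells, with only the axis and figure growing.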


This is amazing! Thank you 🙏 🙇‍♂️
