Edit: an equivalent blog post is available here, as suggested by colleagues:
I was about to send an e-mail to my students with a series of tips to produce good looking plots with Julia, and decided to post the tips here instead. I hope this is useful for more people, and please let me know of any other tips, nice examples, and possible corrections.
To exemplify, I will describe how to produce this figure, which was published recently [LINK], and contains many details in its construction which are worth mentioning:
To start, I am using Plots with the GR backend (the default option), loaded with:
using Plots
I will also use the following packages:
using LaTeXStrings
using Statistics
using ColorSchemes
And I will use one function from an in-house package of ours to build a density function from data (other options probably exist):
using M3GTools # from https://github.com/mcubeg/M3GTools
Initially, the layout of the plot is set using
plot(layout=(2,2))
meaning two rows and two columns. I start by defining a variable called sp (for subplot), which sets the subplot in which the following commands will operate:
sp=1
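As a minimal sketch of how the layout and the subplot keyword work together (the random data and the explicit plot handle p are only for illustration, they are not part of the original figure):

```julia
using Plots

# Create an empty figure with two rows and two columns
p = plot(layout=(2, 2))

# Draw one placeholder curve in each panel; `subplot` selects the target panel
for sp in 1:4
    plot!(p, 1:10, rand(10), subplot=sp, label="panel $sp")
end
```

Subplots are numbered row by row, so sp=1 is the top-left panel and sp=4 the bottom-right one.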
Subplot 1 contains data for a series of labels (1G6X to 1BXO) which are colored sequentially. This was done as follows. The list of labels is defined with
names = [ "1AMM", "1ARB", "1ATG", "1B0B", "1BXO", "1C52", "1C75", "1D06", "1D4T", "1EW4", "1FK5", "1G67", "1G6X", "1G8A", "1GCI" ]
To plot the data associated with each label with a different color, I used:
for i in 1:length(names)
    c = get(ColorSchemes.rainbow, i/length(names))
    plot!(subplot=sp, x, y[i,:], linewidth=2, label=names[i], color=c)
end
(I am assuming that the x data is the same for all curves and is stored in a vector x of length ndata, and that the plotted data is stored in an array y of size (length(names), ndata).)
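To make the color selection more explicit, here is a small sketch of how the positions passed to get map to colors. The shortened label list and the variable names are only for illustration. Note that i/n runs from 1/n to 1, so the first color of the scheme is not used; spreading the positions over the full interval is a possible alternative.

```julia
using ColorSchemes

labels = ["1AMM", "1ARB", "1ATG", "1B0B", "1BXO"]  # shortened list, for illustration
n = length(labels)

# Positions used in the loop above: i/n runs from 1/n to 1
positions = [i / n for i in 1:n]

# Alternative: spread the positions over the full [0, 1] interval
full_positions = collect(range(0, 1, length=n))

# get(scheme, t) returns the color of the scheme at position t in [0, 1]
colors = [get(ColorSchemes.rainbow, t) for t in positions]
```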
One of the limitations of GR as a plotting backend is its handling of special characters. To define the axis labels we therefore use LaTeXStrings and, furthermore, change the font of the text so that it does not differ much from the standard font of the tick labels and legend:
plot!(xlabel=L"\textrm{\sffamily Contact Distance Threshold / \AA}",subplot=sp)
plot!(ylabel=L"\textrm{\sffamily Probability of~}n\leq n_{XL\cap DCA}",subplot=sp)
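The \textrm wrappers are needed because, as far as I understand the LaTeXStrings documentation, a string written as L"..." that contains no unescaped $ is automatically wrapped in $...$, so the whole label is typeset in math mode. A quick way to check what is actually sent to the backend:

```julia
using LaTeXStrings

label = L"\textrm{\sffamily Count}"

# Printing the label shows the dollar signs added by the L"..." macro
println(label)
```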
The interesting features of the second plot are the overlapping bars, and the custom labels on the x-axis and their angle.
The labels on the x-axis are defined in a vector (here, amino acid residue types):
restypes = [ "ALA", "ARG", "ASN", "ASP", "CYS", "GLU", "GLN", "GLY", "HIS", "ILE", "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL" ]
Start with
sp=2
to change where the next commands will operate.
The plot contains two sets of data (red and blue), which we plot using bar!. First the red data, labeled DCAs. We use alpha=0.5 so that the red color becomes softer and the two sets remain visible where they overlap:
bar!(dca_data,alpha=0.5,label="DCAs",color="red",subplot=sp)
The second set of data, “XLs”, will be blue and will overlap the red data. We also use this call to bar! to define the xticks with custom labels, and the rotation of the labels:
bar!(xl_data,alpha=0.5,xrotation=60,label="XLs",xticks=(1:1:20,restypes),color="blue",subplot=sp)
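A self-contained sketch of the same idea, with made-up counts and a shortened list of residue types (the variable names dca_counts and xl_counts are only for illustration). The xticks keyword receives a tuple with the tick positions and the corresponding labels:

```julia
using Plots

restypes = ["ALA", "ARG", "ASN", "ASP", "CYS"]  # shortened list, for illustration
dca_counts = [12, 7, 9, 15, 3]                   # made-up counts
xl_counts = [5, 14, 6, 8, 2]                     # made-up counts

# First set of bars, semi-transparent so the second set shows through
p = bar(dca_counts, alpha=0.5, label="DCAs", color="red")

# Second set drawn on top, with custom tick labels rotated by 60 degrees
bar!(p, xl_counts, alpha=0.5, label="XLs", color="blue",
     xticks=(1:length(restypes), restypes), xrotation=60)
```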
Finally, we set the labels of the axes, again using LaTeX and changing fonts:
bar!(xlabel=L"\textrm{\sffamily Residue Type}",ylabel=L"\textrm{\sffamily Count}",subplot=sp)
The peculiarity of the third plot (sp=3, bottom left) is that we have two data sets defined in different ranges, but we want to plot bars with the same width for both sets. This requires a “trick”.
Initially, we tested different numbers of bins for one of the sets until we liked the result. We found that for the blue set 40 bins looked nice:
histogram!(xl_data,bins=40,label="XLs",alpha=1.0,color="blue",subplot=sp)
Now we need to adjust the number of bins of the other set such that both have the same width. We find out the bin width by computing the range of the “XL” (blue) set above, and dividing it by 40:
xl_bin = ( maximum(xl_data) - minimum(xl_data) ) / 40
The number of bins of the other (DCA, red) set is therefore computed from the maximum and minimum values of that set and the bin width:
ndcabins = round(Int64,( maximum(dca_data) - minimum(dca_data) ) / xl_bin)
And this number of bins is used to plot the bars of the red set:
histogram!(dca_data,bins=ndcabins,label="DCAs",alpha=0.5,color="red",subplot=sp)
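The arithmetic of this trick can be checked with a small sketch using made-up data. As far as I understand the Plots documentation, an integer passed to bins is only an approximate target, while a vector of bin edges is used as given, so a possible alternative (an assumption on my side, not what was done for the figure) is to build one set of edges and pass it to both histogram! calls:

```julia
# Made-up stand-ins for the two data sets
xl_data = [2.0, 5.5, 9.0, 14.0, 22.0]
dca_data = [1.0, 4.0, 7.5, 30.0, 41.0]

# Bin width of the first set when its range is split into 40 bins
nbins_xl = 40
bin_width = (maximum(xl_data) - minimum(xl_data)) / nbins_xl

# Number of bins that gives the second set (about) the same bin width
ndcabins = round(Int, (maximum(dca_data) - minimum(dca_data)) / bin_width)

# Alternative: explicit edges covering both sets, with exactly that width
lo = min(minimum(xl_data), minimum(dca_data))
hi = max(maximum(xl_data), maximum(dca_data))
edges = lo:bin_width:(hi + bin_width)

# The edges could then be used as, for example:
# histogram!(xl_data, bins=edges, subplot=sp)
# histogram!(dca_data, bins=edges, subplot=sp)
```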
In this plot we also draw some dots marking the mean of each distribution, which we did with:
m1 = mean(dca_data)
scatter!([m1,m1,m1,m1],[100,104,108,112],label="",color="red",linewidth=3,linestyle=:dot,subplot=sp,markersize=3)
(the y-positions of the dots were set by hand). And, of course, we use LaTeX to set the axis labels again:
histogram!(xlabel=L"\textrm{\sffamily C}\alpha\textrm{\sffamily~Euclidean Distance} / \textrm{\sffamily~\AA}",subplot=sp)
histogram!(ylabel=L"\textrm{\sffamily Count}",subplot=sp)
The fourth plot (sp=4, bottom right) is similar to the third, but it contains a density function (instead of bars) for one of the data sets (“All contacts”, green). This density function was computed with our own function:
x, y = M3GTools.density(all_max_contact_surfdist,step=1.0,vmin=1.0)
and plotted with:
plot!(x,y,subplot=sp,label="All contacts",linewidth=2,color="green",alpha=0.8)
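For readers without M3GTools, a rough stand-in can be written in a few lines. The function below is hypothetical and is NOT the M3GTools implementation: it is just a normalized histogram evaluated at the bin centers, and I am assuming that step is the bin width and vmin the lower limit of the first bin.

```julia
# Hypothetical helper, not the M3GTools implementation: a normalized histogram
# evaluated at the bin centers, with bins of width `step` starting at `vmin`.
function simple_density(data; step=1.0, vmin=minimum(data))
    vmax = maximum(data)
    nbins = max(1, ceil(Int, (vmax - vmin) / step))
    counts = zeros(Int, nbins)
    for v in data
        v < vmin && continue  # ignore values below the first bin
        ibin = min(nbins, floor(Int, (v - vmin) / step) + 1)
        counts[ibin] += 1
    end
    x = [vmin + (i - 0.5) * step for i in 1:nbins]  # bin centers
    y = counts ./ (sum(counts) * step)              # area under the curve is 1
    return x, y
end

data = [1.2, 1.7, 2.3, 2.9, 3.1, 3.4, 4.8]  # made-up values
x, y = simple_density(data, step=1.0, vmin=1.0)
```

The resulting x and y vectors can be passed to plot! in the same way as above. The sketch assumes that at least one value is greater than or equal to vmin.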
We also added the figure labels A, B, C, D. This was done with the annotate! function. The trick here is to add these annotations to the last plot, so that they stay above every other plot element:
fontsize=48
annotate!( -1.8-16.5, 500, text("A", :left, fontsize), subplot=4)
annotate!( -1.8, 500, text("B", :left, fontsize), subplot=4)
annotate!( -1.8-16.5, 200, text("C", :left, fontsize), subplot=4)
annotate!( -1.8, 200, text("D", :left, fontsize), subplot=4)
(the positions were set by hand, but they are quite easy to align because we need only two positions in x and two positions in y).
Last but not least, we save the figure in PDF format (saving it to PNG directly does not provide the same result, at least in my experience):
plot!(size=(750,750))
savefig("./all.pdf")
PDF is a vector graphics format, so the size does not define the resolution. Here size=(750,750) sets the overall size of the plot relative to the font sizes. Thus, this size is adjusted until the fonts look right for the final desired plot size in print.
If required (and I usually do this), I open the final PDF in GIMP, convert it to a bitmap at 300 dpi, and save it as TIFF or PNG, depending on what I want to do with the figure later.
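When choosing the bitmap size, the only arithmetic involved is that the number of pixels is the printed width times the resolution. A tiny sketch, where the 7 inch print width is a hypothetical value and not the size used for the published figure:

```julia
dpi = 300             # target resolution of the bitmap
print_width_in = 7.0  # hypothetical width of the figure in print, in inches

# Number of pixels needed along the width at that resolution
width_px = round(Int, print_width_in * dpi)
println(width_px)
```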