Tips to create beautiful, publication-quality plots

Edit: an equivalent blog post is available here, as suggested by colleagues:

https://nextjournal.com/leandromartinez98/tips-to-create-beautiful-publication-quality-plots-in-julia


I was about to send an e-mail to my students with a series of tips to produce good looking plots with Julia, and decided to post the tips here instead. I hope this is useful for more people, and please let me know of any other tips, nice examples, and possible corrections.

To exemplify, I will describe how to produce this figure, which was published recently [LINK], and contains many details in its construction which are worth mentioning:

To start with, I am using Plots with GR (the default option), with

using Plots

I will also use the following packages:

using LaTeXStrings
using Statistics
using ColorSchemes

And I will use one function from a in-house package we have to build one density function from data (probably other options exist):

using M3GTools # from https://github.com/mcubeg/M3GTools

Initially, the layout of the plot is set using

plot(layout=(2,2))

meaning two rows and two columns. I start defining a variable, called sp (for subplot), which will define in which subplot the following commands will operate:

sp=1

Subplot 1 contains data for a series of labels (1G6X to 1BXO) which are colored sequentially. This was done as follows. The list of labels is defined with

names = [ "1AMM", "1ARB", "1ATG", "1B0B", "1BXO", "1C52", "1C75", "1D06", "1D4T", "1EW4", "1FK5", "1G67", "1G6X", "1G8A", "1GCI" ]  

To plot the data associated with each label with a different color, I used:

  for i in 1:length(names)
    c = get(ColorSchemes.rainbow,i./length(names))
    plot!(subplot=sp,x,y[i,:],linewidth=2,label=names[i],color=c)
  end 

(I am assuming that in x the data is the same for all plots, and is stored in vector x[ndata], and the plotted data in y is in an array y of size y[length(names),ndata].

One of the limitations of GR as plotting back-end is the managing of special characters. To define the labels of the axes, therefore, we use LaTeXStrings and, furthermore, we change the font of the text such that it is not that different from the standard font of the tick labels and legend:

plot!(xlabel=L"\textrm{\sffamily Contact Distance Threshold / \AA}",subplot=sp)
plot!(ylabel=L"\textrm{\sffamily Probability of~}n\leq n_{XL\cap DCA}",subplot=sp)   

The interesting features of the second plot are the overlapping bars, and the variable labels in the x axis and their angle.

The labels in the x-axis are defined in a vector (here, amino acid residue types):

restypes = [ "ALA", "ARG", "ASN", "ASP", "CYS", "GLU", "GLN", "GLY", "HIS", "ILE", "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL" ] 

Start with

sp=2

to change where the next commands will operate.

The plot contains two sets of data (red and blue), which we plot using bar!. First the red data, labeled DCAs. We use alpha=0.5 so that the red color becomes more soft:

bar!(dca_data,alpha=0.5,label="DCAs",color="red",subplot=sp)

The second set of data, “XLs”, will be blue and will overlap the red data. We also used this call to bar! to define the xticks with custom labels, and the rotation of the labels:

bar!(xl_data,alpha=0.5,xrotation=60,label="XLs",xticks=(1:1:20,restypes),color="blue",subplot=sp)

Finally, we set the labels of the axes, also using Latex and changing fonts:

bar!(xlabel=L"\textrm{\sffamily Residue Type}",ylabel=L"\textrm{\sffamily Count}",subplot=sp)  

The peculiarity of the third plot (sp=3) (bottom left) is that we have two data sets defined in different ranges, but we want to plot bars with the same width for both sets. This requires a “trick”.

Initially, we tested some different number of bins for one of the sets until we liked the result. We found that for the blue set 40 bins were nice:

histogram!(xl_data,bins=40,label="XLs",alpha=1.0,color="blue",subplot=sp)

Now we need to adjust the number of bins of the other set such that both have the same width. We find out the bin width by computing the range of the “XL” (blue) set above, and dividing it by 40:

xl_bin = ( maximum(xl_data) - minimum(xl_data) ) / 40   

The number of bins of the other (DCA - red) set, will be, therefore, computed from the maximum and minimum values of this set and the bin width:

ndcabins = round(Int64,( maximum(all_dca) - minimum(all_dca) ) / xl_bin)

And this number of bins is used to plot the bars of the red set:

histogram!(dca_data,bins=ndcabins,label="DCAs",alpha=0.5,color="red",subplot=sp)

In this plot we also plot some dots indicating the mean of each distribution, something that we did with:

m1 = mean(dca_data)
scatter!([m1,m1,m1,m1],[100,104,108,112],label="",color="red",linewidth=3,linestyle=:dot,subplot=sp,markersize=3) 

(the y-positions of the dots were set by hand). And, of course, we use Latex to set the axis labels again:

histogram!(xlabel=L"\textrm{\sffamily C}\alpha\textrm{\sffamily~Euclidean Distance} / \textrm{\sffamily~\AA}",subplot=sp)
histogram!(ylabel=L"\textrm{\sffamily Count}",subplot=sp)  

The fourth plot (sp=4, bottom right) is similar to the third, but it contains a density function (instead of the bars) for one of the data sets (“All contacts” - green). This density function was computed using our own function, using:

x, y = M3GTools.density(all_max_contact_surfdist,step=1.0,vmin=1.0)

and plotted with:

plot!(x,y,subplot=sp,label="All contacts",linewidth=2,color="green",alpha=0.8)

We also added the figure labels A, B, C, D. This was done with the annotate option. The trick here is to add these annotations to the last plot, such that they stay above every other plot element:

fontsize=48
annotate!( -1.8-16.5,  500, text("A", :left, fontsize), subplot=4)
annotate!( -1.8,       500, text("B", :left, fontsize), subplot=4)
annotate!( -1.8-16.5,  200, text("C", :left, fontsize), subplot=4)
annotate!( -1.8,       200, text("D", :left, fontsize), subplot=4)  

(the positions were set by hand, but they are quite easy to align because we need only two positions in x and two positions in y).

Last but not least, we save the figure in PDF format (saving it to PNG directly does not provide the same result, at least in my experience):

plot!(size=(750,750))
savefig("./all.pdf")  

PDF is a vector graphic format, so that the size does not define the resolution. The size=(750,750) is used to define the overall size of the plot in what concerns the relative font sizes. Thus, this size is adjusted until the font sizes are nice taking into account the final desired plot size in print.

If required (and I do that), I open this final plot in GIMP, converting it to a bitmap with 300dpi resolution, and save it to TIFF or PNG depending on what I want to do with the figure later.

38 Likes

This is great work and will help lots of people creating nice plots.

One thing I have to complain is about colors. Plot A will be problematic for color blind (red-green) people. See e.g. Picking a colour scale for scientific graphics | Better Figures

I think you already know about this and it is more about technical aspects on how to generate plots in general. But I thought I say it anyways for others who are not yet aware of this.

4 Likes

Thanks for sharing.
Looks very nice.
Are the other figures (1 and 3) of the paper also generated with Julia?

A second example [LINK]. This example is interesting because we have added non-linear fits to scatter plots, and there are some tricks to get the same colors for specific sets of data in different plots and annotations.

The example figure is this one:

Here, we use the following packages:

using Plots
using DelimitedFiles
using LsqFit
using LaTeXStrings

We used the DelimitedFiles package to read the data, with

file = "./data/data.dat"
data = readdlm(file,comments=true,comment_char='#') 
time = data[:,1] # time in the first column
hbonds = data[:,3] # data in the third column

The layout is the same as that of the first example plot(layout=(2,2)), and I will focus in the new features used only. Subplots 1 and 2 (upper ones), are bar plots which contain error bars:

labels=["BCL as acceptor","BCL as donor"]
bar!(labels,y,label="",subplot=sp,color=[1,2],yerr=yerr,ylim=[0,ymax])
plot!(ylabel=L"\textrm{\sffamily Urea H-bonds}",subplot=sp)
plot!(title="Deprotonated",subplot=sp)

Note that “ymax” was adjusted, so that in this case it is the same in both plots, for comparison.
The error bars are added with yerr, and the labels of the x-axis were defined with the labels vector, defined before the plot.

We will perform exponential fits to some of our data to produce the plots “C” and “D”. We define the model here (it will be used by the LsqFit package):

# Exponential fit model
@. model(x,p) = exp(-x/p[1])
p0 = [ 0.5 ] # initial guess

For each data set, the fit is performed with

We will perform exponential fits to some of our data to produce the plots “C” and “D”. We define the model here (it will be used by the LsqFit package):

fit = curve_fit(model,times,lifetime,p0)

(times and lifetime are the vector containing the actual x and y data).

And the final characteristic time is, in this case, the first element of the array that is retrieved by the coef function of LsqFit, given the fit result:

tau = coef(fit)[1]  

Using the parameter from the fit, we can generate data to plot a line corresponding to the model. The trick here is to the use the collect function to generate a x vector, and then the model already defined to obtain the y data given the parameters:

x = collect(0:0.01:10)
y = model(x,[tau])

The fit will be plotted as a line, accompanied by the scatter of the actual data:

idata=1
plot!(x,y,linewidth=2,subplot=sp,label="BCL as acceptor",color=idata)
scatter!(times,lifetime,label="",color=idata,subplot=sp)

Note the color definition idata=1. This will guarantee that the the two data sets are ploted with the same color. Now we want to write an annotation with that same color. This is tricky, and is done with:

color=get_color_palette(:auto, plot_color(:white), 5)[idata]

(I don’t even understand the details of this command, but it works). It will retrieve the color in the current colorscale associated with the index idata. With this it is possible to write annotations with the desired colors, but again some tricks are required. We need to parse the string using raw and latextrings, to use the text option of the annotate function and change the color of the text:

note=raw"\large\textrm{\sffamily "*"$tau_avg"*raw"} \pm \textrm{\sffamily "*"$tau_std"*raw"}"
annotate!( 0.0,  0.04, text(latexstring(note), :left, 7, color=color),subplot=sp) 

(the complication here with the raw function is only because we want to use the Latex fonts and the \pm symbol in those annotations).

9 Likes

Very nice remark. I am not completely aware of that.

1 Like

No, those figures were done by students and, if I am not mistaken, the first plot was done using gnuplot and the third in R.

Nevertheless, beautiful contact maps can be done with the heatmap function in Julia.

nice! this could have a lot of reach in the form of a Julia blog post :smile:

2 Likes

Is that a suggestion about which I can do anything?

For what i know, there always a need to blog posts about julia, as a form of outreach, Discurse has the disadvantage of having a hard discoverability, but a blog doesn’t have that inconvenient. Also, there is a page, Julia Bloggers, that help reach a larger portion of people that doesn’t have necessarily an account on Discourse.
For example, the rust community directy calls for community blog posts, julia as a language can benefit a lot from the outreach generated from things like this :smiley:

5 Likes

I second the suggestion to make this a blog. Discourse will likely have comments interspersed with your helpful plotting tips, so it may be best to make it a full blogpost elsewhere.

2 Likes

Nextjournal would be a good place for a blog like this. You could show examples.

6 Likes

Nice. I created an equivalent blog post here:

Yet without actual running examples, but perhaps another day. I will sent it to Julia Bloggers.

14 Likes

Nice examples!

Should using Statstics be using Statistics?

1 Like

fixed

1 Like

Do you by chance know how to make serif axes ticks so that it displays correctly in scientific notation. This was my biggest problem with GR. And just changing fonts in general.

You can use Latex for the ticks with, for example:

x=rand(10) ; y=rand(10)
plot(x,y)
xlabels=[L"10^{-3}",L"10^{-2}",L"10^{-1}"]
plot!(xticks=(0:0.5:1,xlabels))

You might change the font to serif using:

xlabels=[L"\textrm{\sffamily 10}^{\textrm{\sffamily -3}}",L"10^{-2}",L"10^{-1}"]

(here I changed the first one). Or, if you want the exponent to be smaller, using

 xlabels=[L"\textrm{\sffamily 10}^{\textrm{\sffamily\tiny -3}}",L"10^{-2}",L"10^{-1}"]

Of course this is quite unpractical and probably a function should be written for converting
the tick labels into that format. The result is:

(again, I changed to serif only the first tick label).

(also, in my experience always save the figure to PDF an convert it other formats using GIMP or other software)

2 Likes

Adding to the previous answer, I wrote the function (because I will use it myself, for sure). This is a working example:


using LaTeXStrings
using Formatting

# First parameter: number, second parameter: number of decimal places
# optional font parameter: if anything else than "sf", will be default latex font

function latex_sci_not( x , ndec; font="sf" )
  xchar = strip(Formatting.sprintf1("%17.$(ndec)e",x))
  data = split(xchar,"e")
  inonzero = findfirst( i -> i != '0', data[2][2:length(data[2])])
  if font == "sf"
    f = "\\textrm{\\sffamily "
    fe = "\\textrm{\\sffamily\\scriptsize "
  else
    f = "{"
    fe = "{"
  end
  if inonzero == nothing
    string = latexstring("$f$(data[1])}")
  else
    if data[2][1] == '-'
      string = latexstring("$f$(data[1])}\\times $f 10}^{$fe$(data[2][1])$(data[2][inonzero+1:length(data[2])])}}")
    else
      string = latexstring("$f$(data[1])}\\times $f 10}^{$fe$(data[2][inonzero+1:length(data[2])])}}")
    end
  end
  return string
end

x = rand(10) ; y = rand(10) ;

ticks = collect(0:0.2:1)
ticklabels = [ latex_sci_not(x,2) for x in ticks ]
plot(x,y,xticks=(ticks,ticklabels))
plot!(size=(300,300))
savefig("teste.pdf")

Which produces:

1 Like

Thank you for sharing this. I just wish there was a simpler way to achieve this. In my job, units usually require sci. notation, log scales pretty often. Here, for example, you have a problem with 0.0 on X axis. The spacing of “-” in the exponent of 10s weird. Typically, “x10^{-1}” is typed off the axes, to mean that everything on this axis is to be multiplied by “x10^{-1}”. I wish Plots.jl would support common scientific needs.

Right now, it feels like both Plots.jl and Makie.jl are made by developers just to showcase their products. I feel that common needs of a scientist is to have very fast yet beatiful plotting system with minimal config from users side. Things like like exponents off the axis (and axes offsets) should be naturally taken care of by the plotting environment.

Currently, PyPlot achieves offsetting exponents and serves well. It is, yet the best plotting system Julia has. GR is immature to plot consistently. Gnuplot suffers from offsets and exponents too. I just wish someone to meet these common needs. Here is what I mean in an example:

using PyPlot
using Plots
using Gaston
gr()
x = collect(@. (10^-5 * (1:10)) + 1)
y = collect(@. (10^-13 * (1:10)) + 10);
PyPlot.plot(x, y)

image

Plots.plot(x,y)

Just exits with error

┌ Warning: No strict ticks found
â”” @ PlotUtils ~/.julia/packages/PlotUtils/EybJR/src/ticks.jl:168
┌ Warning: No strict ticks found
â”” @ PlotUtils ~/.julia/packages/PlotUtils/EybJR/src/ticks.jl:168
┌ Warning: No strict ticks found
â”” @ PlotUtils ~/.julia/packages/PlotUtils/EybJR/src/ticks.jl:168
┌ Warning: No strict ticks found
â”” @ PlotUtils ~/.julia/packages/PlotUtils/EybJR/src/ticks.jl:168
GKS: Rectangle definition is invalid in routine SET_WINDOW
GKS: Rectangle definition is invalid in routine SET_WINDOW
┌ Warning: No strict ticks found
â”” @ PlotUtils ~/.julia/packages/PlotUtils/EybJR/src/ticks.jl:168
┌ Warning: No strict ticks found
â”” @ PlotUtils ~/.julia/packages/PlotUtils/EybJR/src/ticks.jl:168
┌ Warning: No strict ticks found
â”” @ PlotUtils ~/.julia/packages/PlotUtils/EybJR/src/ticks.jl:168
┌ Warning: No strict ticks found
â”” @ PlotUtils ~/.julia/packages/PlotUtils/EybJR/src/ticks.jl:168
GKS: Rectangle definition is invalid in routine SET_WINDOW
GKS: Rectangle definition is invalid in routine SET_WINDOW

Gnuplot

Gaston.plot(x,y)

image

Plots pyplot:

pyplot()
Plots.plot(x, y)

image

┌ Warning: No strict ticks found
â”” @ PlotUtils /home/zhanibek/.julia/packages/PlotUtils/EybJR/src/ticks.jl:168
┌ Warning: No strict ticks found
â”” @ PlotUtils /home/zhanibek/.julia/packages/PlotUtils/EybJR/src/ticks.jl:168

Errors for some mysterious reasons.

Take this is with a grain of salt, but this is a typical scenario for a scientist, and s/he will be very frustrated to see that a common user case just straight up fails miserably. It could be my buggy system and installation, but it would happen to some people, and that is a great discouragement to use Julia on a constant basis. Plots is just immature. We need to focus on this alongside “time-to-first-plot”, it does not matter how fast julia gets booted if it fails to plot a simple line.

I love Julia for its speed and problems it solves, it really changed the way I do my work. But Julia has no:

  • No mature (consistently working) plotting system. Plots tries to be jack of all, but just ends up being bad at some use case. PyPlot is the only thing what would work for the most general case
  • Too many plotting unpolished plotting packages: PGFPlotsX, Gaston, Makie, Gadfly, Winston …

For me, the only way to plot in Julia has been PyPlot with seaborn, which produces fairly good-looking plots, without any hassle, with pdf output it serves as almost “Publication Quality”, fonts look consistent and equally spaced (although offsets are not latex formatted, but at least it has it)

image

I have been looking to get Plots to workout, but I always come to find some unexpected behaviours. Could someone point me toward a better solution for plotting? I really wish I am wrong about my opinion about Plots, please help me to just find a better solution.

My system:

              _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.0.5 (2019-09-09)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/  

    Status `~/.julia/environments/v1.0/Project.toml`
  [537997a7] AbstractPlotting v0.9.10
  [6e4b80f9] BenchmarkTools v0.4.3
  [336ed68f] CSV v0.5.16
  [717857b8] DSP v0.6.1
  [a93c6f00] DataFrames v0.19.4
  [0c46a032] DifferentialEquations v6.9.0
  [31c24e10] Distributions v0.21.7
  [7a1cc6ca] FFTW v1.1.0 #master (https://github.com/JuliaMath/FFTW.jl.git)
  [4af5e9cd] Findpeaks v0.0.0 #master (https://github.com/tungli/Findpeaks.jl)
  [59287772] Formatting v0.4.1
  [f7f18e0c] GLFW v3.1.0
  [4b11ee91] Gaston v0.10.0 #master (https://github.com/mbaz/Gaston.jl.git)
  [7073ff75] IJulia v1.20.2
  [a98d9a8b] Interpolations v0.12.5
  [2fda8390] LsqFit v0.8.1
  [47be7bcc] ORCA v0.3.0 #master (https://github.com/sglyon/ORCA.jl.git)
  [429524aa] Optim v0.19.4
  [d96e819e] Parameters v0.12.0
  [5ad8b20f] PhysicalConstants v0.2.0
  [995b91a9] PlotUtils v0.6.1
  [f0f68f2c] PlotlyJS v0.13.0 #master (https://github.com/sglyon/PlotlyJS.jl.git)
  [91a5bcdd] Plots v0.27.0 #master (https://github.com/JuliaPlots/Plots.jl.git)
  [438e738f] PyCall v1.91.2
  [d330b81b] PyPlot v2.8.2
  [295af30f] Revise v2.3.1
  [79098fc4] Rmath v0.6.0
  [6038ab10] Rotations v0.12.0 #master (https://github.com/FugroRoames/Rotations.jl)
  [90137ffa] StaticArrays v0.11.1
  [2913bbd2] StatsBase v0.32.0
  [f3b207a7] StatsPlots v0.12.0
  [a759f4b9] TimerOutputs v0.5.3
  [37b6cedf] Traceur v0.3.0
  [37e2e46d] LinearAlgebra 
  [10745b16] Statistics 
2 Likes

FWIW I can reproduce this in Plots currently, however I’m not sure this kind of tone is very helpful in encouraging people who have put in quite a bit of effort to get Plots and other packages to the place they are now to invest more of their time in solving your problems.

It appears that PyPlot is doing something automagically here for certain values of x- and y-ticks, have you opened an issue on the Plots repo to see whether somethingn like this could be implemented?

3 Likes

Please consider the possibility that you are extremely misinformed about this. I am not even sure what you consider the “products” here.

PGFPlotsX is just a very thin layer on the LaTeX library pgfplots, which is very mature and polished.

6 Likes