How to color points in scatter plot by value?

I have a list of 3-dimensional points. I want to plot them in a plane scatter plot (using the first two-dimensions), and then color each point according to the third value.

How can I do this?

Gadfly examples here:
http://gadflyjl.org/latest/gallery/scales.html

With VegaLite.jl:

using VegaLite, DataFrames

df = DataFrame(x=randn(100), y=randn(100), z=randn(100))

df |> @vlplot(:point, x=:x, y=:y, color=:z)

You can customize the color scale with any of the pre-defined color schemes:

df |> @vlplot(:point, x=:x, y=:y, color={:z, scale={scheme=:plasma}})

You can also go entirely custom by specifying a custom piecewise scale:

df |>
@vlplot(
  :point,
  x=:x,
  y=:y,
  color={
    :z,
    scale={
      domain=[-3, -1, 1, 3], 
      range=[:red, :blue, :green, :yellow]
    }
  }
)

And yes, I am aware that my custom color scheme example is probably a strong argument to go with the pre-defined schemes :wink:

PGFPlotsX.jl example:

using PGFPlotsX

@pgf Plot(
    {only_marks, scatter, scatter_src = "explicit"},
    Table(
        {x = "x", y = "y", meta = "col"}, 
         x = randn(100), y = randn(100), col = randn(100)
    )
)

How can I set xlim and ylim with VegaLite? Without defining those, the plot is totally off.
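
One way to do this, as a sketch: Vega-Lite takes axis limits through the domain of each positional scale, and VegaLite.jl accepts that with the same brace syntax used for the color scale above. The limits [-3, 3] are an arbitrary assumption for standard-normal data, and clip=true on the mark is assumed to be what hides points that fall outside the limits.

```julia
using VegaLite, DataFrames

df = DataFrame(x=randn(100), y=randn(100), z=randn(100))

# Fix both axes to [-3, 3] (assumed limits); clip=true drops marks outside the axes.
p = df |> @vlplot(
    mark={:point, clip=true},
    x={:x, scale={domain=[-3, 3]}},
    y={:y, scale={domain=[-3, 3]}},
    color=:z
)
```

Without an explicit domain, Vega-Lite picks the axis range from the data, which may be why the plot looks off for some inputs.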

Came across this question; just an update:

using Plots
scatter([1,2,3],[4,5,6],color=["red","blue","black"],legend=nothing)

Congrats on the beautiful wrapper for PGFPlots that you have been developing. For high-quality plotting in the Julia environment, it seems second to none.

I have a similar problem to the one posted by @e3c6, but I want the third dimension to be mapped to SIZE rather than COLOR in the plane scatter plot. The problem is how to size points in a scatter plot by the value of a third variable. If it makes the solution easier, we can have both color and size, but the size is crucial.

I am not an expert on LaTeX but have adapted your wrapper to some specific needs of mine. For example, I have been able to plot with two y-axes (and the quality of the output is superb because we can use all the available tricks from LaTeX), create vertical spans (like the vspan function in Plots.jl), or arbitrary shapes (like the Shape function in Plots.jl). But unfortunately, I have not been successful in mapping a third variable to the marker SIZE in a scatter plot.

I checked the entire PGFPlots documentation (latest version, Version 1.17 – 2020/02/29), in particular section “4.5.12 Scatter Plots” and section “4.17.2 Changing the Appearance of Individual Coordinates”, and I also checked internet forums like here, here and here.

I tried with both raw LaTeX code and code adapted to the PGFPlotsX wrapper, without success.

Some help would be appreciated; my clumsy piece of code follows below. The errors that come out depend on whether raw code is used or not. This may be easy, but I could not figure it out. Thanks.

using PGFPlotsX
using LaTeXStrings

push!(PGFPlotsX.CUSTOM_PREAMBLE, 
        raw"\pgfplotsset{tick label style = {font = {\fontsize{12 pt}{12 pt}\selectfont}},
                         label style = {font = {\fontsize{12 pt}{12 pt}\selectfont}},
                         legend style = {font = {\fontsize{12 pt}{12 pt}\selectfont}},
                         title style = {font = {\fontsize{12 pt}{12 pt}\selectfont}},
                        }"
        )

@pgf a = Axis({
              height = "13cm",
              width = "15cm",
              colorbar,
              "colormap/jet",
              #grid = "major",
              xlabel = L"x",
              ylabel = L"y",
              title = "A Scatter Plot: size as a marker to a third dimension", 
               },
    
    
Plot(
    {only_marks, scatter, scatter_src = "explicit"},
         raw"\visualization depends on = {5*z \as \perpointmarksize},
         scatter/@pre marker code/.append style={/tikz/mark size=\perpointmarksize}",

Table(
        {x = "x", y = "y", meta = "z"}, 
         x = randn(50), y = randn(50), z = randn(50),
        ))
)


#raw"\visualization depends on = {5*z \as \perpointmarksize},
#      scatter/@pre marker code/.append style={/tikz/mark size=\perpointmarksize}",

#visualization_depends_on = "{5*z \as \perpointmarksize},
#        scatter/@pre marker code/.append style={/tikz/mark size=\perpointmarksize}",

Actually, on page 407, section 4.25 of the documentation of PGFPlots (Version 1.17 – 2020/02/29), there is detailed information about the introduction of the size of a third variable into a 2D scatter plot. It comes with the title:

/pgfplots/visualization depends on=〈expression〉\as〈\macro〉 (initially empty)
/pgfplots/visualization depends on=value 〈expression〉\as〈\macro〉 (initially empty)

Although the detailed explanation centers on a 3D case (4D with the size of the third variable), the example provided is a univariate one (see figure below). I can easily replicate the figure in pure LaTeX, but I could not reproduce this simple univariate case inside PGFPlotsX. So far it is the only case where I have failed to apply PGFPlots functionality within the PGFPlotsX package. Do I need to pass some commands to the preamble? Thanks a lot.

In Plots.jl there is the option of using marker_z.

using Plots
x= rand(10)
y= rand(10)
z= rand(10)
scatter(x,y,marker_z=z)

Thanks, but this is not what I need. I can do this both in Plots and in PGFPlotsX. Your suggestion solves the problem initially posted by @e3c6: a 2D scatter plot (x,y), with the third variable/array conveyed by coloring the points according to its values. So you are adding one further dimension to the first two.

I need to convey this new dimension through the size of the points, not their color. That may look trivial, but if I want to size different countries by population, or by their GDP levels, it makes a huge difference because some will appear as tiny points. As we can see in the figure above, from the PGFPlots documentation, each point’s size is easily visible.

Thanks a lot, anyway.

This problem I am raising must be easy to solve, and in some fields the output I’m looking for is frequently used. For example, with Plotly (Python), the piece of code that does a similar job is as simple as this:

import plotly.graph_objects as go
import numpy as np
np.random.seed(1)

N = 100
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
sz = np.random.rand(N) * 30

fig = go.Figure()
fig.add_trace(go.Scatter(
    x=x,
    y=y,
    mode="markers",
    marker=go.scatter.Marker(
        size=sz,
        color=colors,
        opacity=0.6,
        colorscale="Viridis"
    )
))

The output looks like this:

The crucial part of the code looks very similar to the code in PGFPlotsX.

You can also pass the size of the markers via the markersize argument.
Other options like markershape, markerstrokewidth, … are listed here: Series Attributes · Plots
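
A minimal sketch of that approach, assuming the third variable can be negative or span a wide range (for example randn output or population counts): rescale it linearly onto a positive size range before passing it to markersize. The helper name rescale_sizes and the range 2 to 12 are made up for illustration and are not part of Plots.

```julia
using Plots

# Hypothetical helper (not part of Plots): map values linearly onto [smin, smax]
# so that every marker gets a positive, visible size.
function rescale_sizes(v; smin = 2.0, smax = 12.0)
    lo, hi = extrema(v)
    # All values equal: fall back to a single mid-range size.
    hi == lo && return fill((smin + smax) / 2, length(v))
    return smin .+ (smax - smin) .* (v .- lo) ./ (hi - lo)
end

x, y, z = randn(200), randn(200), randn(200)

# Color and size both follow z; drop marker_z to keep only the size mapping.
scatter(x, y, marker_z = z, markersize = rescale_sizes(z), legend = false)
```

The smallest value of z gets size smin and the largest gets smax, so no point ends up with a zero or negative marker size.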

Thanks a lot for your help. In Plots, it is as easy as this:


using Plots
x= randn(200)
y= randn(200)
z= randn(200)
scatter(x,y, marker_z = z, markersize = 5*z,  color = :jet)

@pgf Plot({ scatter, scatter_src = "y", samples = 40,
            visualization_depends_on = raw"{5*cos(deg(x)) \as \perpointmarksize}",
            "scatter/@pre marker code/.append style" = raw"{/tikz/mark size=\perpointmarksize}" },
          Expression("sin(deg(x))"))

works fine for me.

@Tamas_Papp thanks a lot for your help. I tried many ways, but not the right one: inserting the raw command in the right places. Your code works fine for me as well for this univariate case.

But following @kristoffer.carlsson's code above for 3D, I tried to apply it to this particular case and I get an error. The 3D code works fine for me as well; the only problem is that I still do not know how to map a third variable to marker size in a scatter plot. I can do it with Plots, but it would be good to see how this works in PGFPlotsX as well. It is used a lot in many fields. It is no coincidence that this particular type of plot is the first one we see when we visit the Plotly (Python) website.

My code, following @kristoffer.carlsson's code and your contribution, looks like this:

using PGFPlotsX
using LaTeXStrings

push!(PGFPlotsX.CUSTOM_PREAMBLE, 
        raw"\pgfplotsset{tick label style = {font = {\fontsize{12 pt}{12 pt}\selectfont}},
                         label style = {font = {\fontsize{12 pt}{12 pt}\selectfont}},
                         legend style = {font = {\fontsize{12 pt}{12 pt}\selectfont}},
                         title style = {font = {\fontsize{12 pt}{12 pt}\selectfont}},
                        }"
        )

@pgf a = Axis({
              height = "13cm",
              width = "15cm",
              colorbar,
              "colormap/jet",
              #grid = "major",
              xlabel = L"x",
              ylabel = L"y",
              title = "A Scatter Plot: size as a marker to a third dimension", 
               },
    
    
Plot(
    { only_marks, scatter, scatter_src = "explicit",
            visualization_depends_on = raw"{8*z \as \perpointmarksize}",
            "scatter/@pre marker code/.append style" = raw"{/tikz/mark size=\perpointmarksize}" 
    },
           
Table(
        {x = "x", y = "y", meta = "z"}, 
         x = randn(50), y = randn(50), z = randn(50)
    ))
)

What is my mistake? Thanks.

I am not familiar enough with visualization depends on, but if you have an idea of what you want the emitted LaTeX code to look like, that would make it easier to help. I usually search StackOverflow for a LaTeX solution and then make PGFPlotsX emit it.
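
A guess at the mistake, offered as an untested sketch rather than a confirmed fix: in a 2D axis, z inside the visualization depends on expression is probably not the table column named z, whereas pgfplots' \thisrow{z} does read the current row of that column. The abs(...) and the offset of 1 are assumptions added here so that randn values never produce a negative mark size.

```julia
using PGFPlotsX

x, y, z = randn(50), randn(50), randn(50)

p = @pgf Plot(
    {
        only_marks,
        scatter,
        scatter_src = "explicit",
        # \thisrow{z} refers to the table column "z" for the current point;
        # abs(...) plus an offset keeps the mark size positive (assumption).
        visualization_depends_on = raw"{1 + 3*abs(\thisrow{z}) \as \perpointmarksize}",
        "scatter/@pre marker code/.append style" = raw"{/tikz/mark size=\perpointmarksize}"
    },
    Table({x = "x", y = "y", meta = "z"}, x = x, y = y, z = z)
)

# Inspect the LaTeX that PGFPlotsX emits before compiling it.
print_tex(p)
```

Comparing the printed LaTeX with a StackOverflow answer that is known to work should show quickly whether the options come out as intended.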

Equally (or more) simple, with GMT.jl
https://www.generic-mapping-tools.org/GMT.jl/latest/gallery/scripts_agu/scatter_cart/

Of the three hyperlinks above, the first and the third deal directly with my concerns. The only difference is that they base their plots on a table data set built into their LaTeX code, while our data is generated outside the LaTeX code.

Interestingly, the piece of code that @kristoffer.carlsson provided above can easily render a 3D scatter plot that displays 4 dimensions by using color.

The code is just a repetition of @kristoffer.carlsson's code, with one more variable passed into the Table function:

using PGFPlotsX
using LaTeXStrings

push!(PGFPlotsX.CUSTOM_PREAMBLE, 
        raw"\pgfplotsset{tick label style = {font = {\fontsize{12 pt}{12 pt}\selectfont}},
                         label style = {font = {\fontsize{12 pt}{12 pt}\selectfont}},
                         legend style = {font = {\fontsize{12 pt}{12 pt}\selectfont}},
                         title style = {font = {\fontsize{12 pt}{12 pt}\selectfont}},
                        }"
      )

@pgf a = Axis({
              height = "13cm",
              width = "15cm",
              colorbar,
              "colormap/jet",
              grid = "major",
              xlabel = L"MB",
              ylabel = L"CPI",
              zlabel= L"RIT",
              ztick_distance ="4", # set the distance between ticks
              title = "A Scatter Plot: color as a marker to a fourth dimension", 
               },
Plot3(
    {only_marks, scatter, scatter_src = "explicit"},
    Table(
        {x = "x", y = "y", z= "z", meta = "y"}, 
         x = MB, y = CPI, z = RIT, 
    ))
)
#pgfsave("Scatter_4D.pdf", a)

If you manage to add “size” as another dimension, a simple 3D scatter plot can easily display 5 dimensions of a given problem. Moreover, the constant meta can be costlessly switched across different variables.
