Format x-labels using VegaLite

I have wasted a lot of time unsuccessfully trying to find out how to format the x-axis labels in my plot so that a year is shown as

2016

instead of

2,016

The data frame df is

	pubyear	za_count	world_count	za_percentage
1	2005	7643	1736590	0.440115
2	2006	8713	1828751	0.476445
3	2007	10071	1986892	0.506872
4	2008	10971	2118454	0.517878
5	2009	12141	2260698	0.537047
6	2010	12904	2300998	0.5608
7	2011	14684	2424086	0.605754
8	2012	16819	2545252	0.660799
9	2013	17541	2651360	0.661585
10	2014	19145	2763027	0.692899
11	2015	20688	2838265	0.728896
12	2016	22794	2928999	0.778218
13	2017	22155	2742119	0.807952

and I create the plot with the following command (in Julia 0.6.4):

df |> @vlplot(:bar, x=:pubyear, y = :za_percentage, points = true)

which produces x-axis labels formatted as

2,004   2,008   2,012   2,016

How do I do it?


My guess is that you can get what you want by changing the type of :pubyear to String:

df[:pubyear]=string.(df[:pubyear])

If I am wrong, please consider reformatting your question according to

Thanks, that helped. But I thought it should not be necessary to change the type of a whole column when a simple format function should be able to do the trick.


There may be other (and better) solutions, but without a complete code example it is hard to guess which one is most suitable. Currently, this is what I would have to do: (1) create something similar to your description of the problem, (2) search for a good solution, and (3) even then I could still be wrong, because I don't know your code. Because of (3), I don't start with (1) :grin:

I gave you all the code I thought you needed. I did not include an image of the graph, because I don't know how to do that in this interface.

I have a DataFrame and want to plot it. I even gave you the content of the DataFrame and the exact code I used to plot it.

Here it is (again) with a modified command for the plot and one extra item: the description of the DataFrame (df):

df

	pubyear	za_count	world_count	za_percentage
1	2005	7643	1736590	0.440115
2	2006	8713	1828751	0.476445
3	2007	10071	1986892	0.506872
4	2008	10971	2118454	0.517878
5	2009	12141	2260698	0.537047
6	2010	12904	2300998	0.5608
7	2011	14684	2424086	0.605754
8	2012	16819	2545252	0.660799
9	2013	17541	2651360	0.661585
10	2014	19145	2763027	0.692899
11	2015	20688	2838265	0.728896
12	2016	22794	2928999	0.778218
13	2017	22155	2742119	0.807952


describe(df)

	variable	mean	min	median	max	nunique	nmissing	eltype
1	pubyear	2011.0	2005	2011.0	2017		0	Int32
2	za_count	15097.6	7643	14684.0	22794		0	Int64
3	world_count	2.39427e6	1736590	2.42409e6	2928999		0	Int64
4	za_percentage	0.613482	0.440115	0.605754	0.807952		0	Float64


df |> @vlplot(:bar,title = "ZA percentage of World Output",
            x={:pubyear, axis={title="Year of Publication"}}, 
            y = {:za_percentage, axis={title="ZA as percentage of world output"}})

What more do you need? How did I construct the DataFrame? With an SQL query.

This code reproduces your issue:

julia> using DataFrames
julia> using VegaLite
julia> df=DataFrame(pubyear=[2005,2006,2007],za_count=[7643,8713,10071],za_percentage=[0.440115,0.476445,0.506872])
3×3 DataFrame
│ Row │ pubyear │ za_count │ za_percentage │
├─────┼─────────┼──────────┼───────────────┤
│ 1   │ 2005    │ 7643     │ 0.440115      │
│ 2   │ 2006    │ 8713     │ 0.476445      │
│ 3   │ 2007    │ 10071    │ 0.506872      │
julia> df |> @vlplot(:bar, x=:pubyear, y = :za_percentage, points = true)

[Image: resulting plot]

As you can see, because pubyear has a numeric type, VegaLite.jl treats the x-axis as quantitative. That produces intermediate ticks at 2005.5 and 2006.5, and the default number format adds a thousands separator (2,016).
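Where do ticks like 2005.5 come from? Vega builds on d3, whose linear-scale ticks use "nice" steps of 1, 2 or 5 times a power of ten. The following is a rough Julia port of d3-array's tickStep logic, not Vega's actual code; the function names are hypothetical, and it assumes a requested tick count of 5 (the count Vega really asks for depends on the plot width). It shows why a two-year range ends up with half-year ticks:

```julia
# Rough port of d3-array's tickStep: choose a "nice" step of 1, 2 or 5 × 10^k.
function nice_step(start::Real, stop::Real, count::Integer)
    step0 = abs(stop - start) / max(0, count)  # raw step for `count` intervals
    step1 = 10.0^floor(log10(step0))           # power of ten at or below step0
    err = step0 / step1
    if err >= sqrt(50)
        step1 *= 10
    elseif err >= sqrt(10)
        step1 *= 5
    elseif err >= sqrt(2)
        step1 *= 2
    end
    return stop < start ? -step1 : step1
end

# Tick positions: multiples of the step that fall inside [start, stop].
function nice_ticks(start::Real, stop::Real, count::Integer)
    step = nice_step(start, stop, count)
    first_tick = ceil(start / step) * step
    last_tick = floor(stop / step) * step
    return collect(first_tick:step:last_tick)
end

# Assumed tick count of 5 for the three-year example data (2005 to 2007).
ticks = nice_ticks(2005, 2007, 5)
println(ticks)  # [2005.0, 2005.5, 2006.0, 2006.5, 2007.0]
```

An ordinal x encoding sidesteps this entirely, because each distinct year becomes its own category instead of a point on a continuous scale.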

My straightforward, easy-to-understand solution would be to build the labels as strings:

julia> df[:xlabels]=[ @sprintf("Year %d",year) for year in df[:pubyear] ]
3-element Array{String,1}:
 "Year 2005"
 "Year 2006"
 "Year 2007"

julia> df |> @vlplot(:bar, x=:xlabels, y = :za_percentage, points = true)

[Image: resulting plot]
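A note for newer Julia versions: since Julia 0.7, @sprintf is provided by the Printf standard library, so the comprehension above needs `using Printf` first. A minimal sketch, with a plain vector standing in for the pubyear column, that also shows ordinary string interpolation producing the same labels:

```julia
using Printf  # Julia 0.7+: @sprintf lives in the Printf standard library

years = [2005, 2006, 2007]  # stands in for the pubyear column

# Label strings built with @sprintf ...
labels = [@sprintf("Year %d", y) for y in years]

# ... or, equivalently, with string interpolation (no Printf needed)
labels_interp = ["Year $y" for y in years]

println(labels)  # ["Year 2005", "Year 2006", "Year 2007"]
```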

The Vega-Lite style solution would be this:

julia> df |> @vlplot(:bar, y = :za_percentage, points = true,
       encoding={x={field="pubyear",typ="ordinal",axis={title="Year",format="d"}}})

[Image: resulting plot]
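Here I have written out the x encoding explicitly, following the structure of the Vega-Lite JSON syntax: typ="ordinal" treats each year as a discrete category, so there are no fractional ticks, and format="d" requests integer labels without a thousands separator.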
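Vega-Lite's axis format strings follow d3-format, where "d" means integer notation and a leading "," adds grouping separators. As a rough illustration only, here is a simplified emulation of those two specifiers in plain Julia (hypothetical helper names, not d3's code; d3 rounds halves differently from Julia's round):

```julia
# Simplified emulation of d3-format's "d" specifier:
# round to an integer and print it without a thousands separator.
format_d(x::Real) = string(round(Int, x))

# Simplified emulation of ",d": the same, but with "," between groups of three digits.
function format_comma_d(x::Real)
    n = round(Int, x)
    s = string(abs(n))
    groups = String[]
    i = lastindex(s)
    while i >= 1
        pushfirst!(groups, s[max(1, i - 2):i])  # take up to three digits from the right
        i -= 3
    end
    return (n < 0 ? "-" : "") * join(groups, ",")
end

println(format_comma_d(2016))  # 2,016
println(format_d(2016))        # 2016
```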

You can read about how to translate Vega-Lite JSON into a VegaLite.jl julia command:
http://fredo-dedup.github.io/VegaLite.jl/stable/userguide/vlspec.html

The Vega-Lite axis syntax is documented here:

And information on the format strings can be found here:

When you create a plot with VegaLite.jl, the rendered view includes a "View Source" link that shows the generated Vega-Lite JSON. For the last plot above, it looks like this:

{
  "points": true,
  "encoding": {
    "x": {
      "axis": {"format": "d", "title": "Year"},
      "field": "pubyear",
      "type": "ordinal"
    },
    "y": {"field": "za_percentage", "type": "quantitative"}
  },
  "data": {
    "values": [
      {
        "pubyear": 2005,
        "za_percentage": 0.440115,
        "za_count": 7643,
        "xlabels": "Year 2005"
      },
      {
        "pubyear": 2006,
        "za_percentage": 0.476445,
        "za_count": 8713,
        "xlabels": "Year 2006"
      },
      {
        "pubyear": 2007,
        "za_percentage": 0.506872,
        "za_count": 10071,
        "xlabels": "Year 2007"
      }
    ]
  },
  "mark": "bar"
}

Inspecting this JSON can help you find the proper solution.

I hope this helps. As for your question about what more we need: code that is easy to copy and paste into the REPL. That takes away 50% of the work for people who try to help (no offence meant).


Thanks for a thorough answer.

And apologies for making you work harder to provide it.
