PGFPlotsX.jl - xtick distance/range/count for histograms?

This is more a general question: I have to deal with lots of histograms and I have problems setting the xticks when the option ybar interval is set, since xtick distance = {NUMBER} seems to be ignored by pgfplots.

To be more specific:

using PGFPlotsX
using StatsBase

figure = @pgf Axis(
    {
        "ybar interval",
        xmajorgrids = false,
        xtick_distance="{10}",
    },
    Plot(Table(fit(Histogram, rand(-100:100, 1000), -50:50)))
)

produces
[image: hist1]

Without restricting the range, it's still ignored, but at least it's readable:

figure = @pgf Axis(
    {
        "ybar interval",
        xmajorgrids = false,
        xtick_distance="{10}",
    },
    Plot(Table(fit(Histogram, rand(-100:100, 1000))))#, -50:50)))
)

[image: hist2]
By contrast, it works when I don't use the ybar interval option (though of course I then get a line plot instead).

figure = @pgf Axis(
    {
        #"ybar interval",
        xmajorgrids = false,
        xtick_distance="{10}",
    },
    Plot(Table(fit(Histogram, rand(-100:100, 1000))))#, -50:50)))
)

[image: hist3]

Does anyone know how to do this without hardcoding the whole range with, e.g., xtick={...}? I'd like to set either the distance between ticks or the number of ticks.

Maybe try passing a LinRange / Range object?
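Following up on the LinRange / Range idea: a small Base-only illustration of how a Julia range can describe the same tick positions either by distance or by count. The variable names are made up for this example, and whether PGFPlotsX prints a LinRange for xtick the same way it prints a StepRange such as -50:25:50 is an assumption (both are AbstractVectors), not something shown in this thread.

```julia
# The same 11 tick positions, described two ways (names are illustrative)
by_distance = -50:10:50                    # by distance: every 10 from -50 to 50
by_count = range(-50, 50; length = 11)     # by count: 11 evenly spaced Float64 values
by_linrange = LinRange(-50, 50, 11)        # by count, as a LinRange

# Each description implies the other
n_ticks = length(by_distance)              # 11
distance = step(by_count)                  # 10.0

# LinRange values are computed by interpolation, so compare approximately
println(collect(by_distance) ≈ collect(by_linrange))   # true
```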

I don’t know how to do this without hardcoding xtick, but I also don’t really see a problem with this approach, e.g.

figure = @pgf Axis(
    {
        "ybar interval",
        xmajorgrids = false,
        x_tick_label_as_interval = false,  # put labels at the ticks, not between them
        xtick = -50:25:50                  # explicit tick positions
    },
    Plot(Table(fit(Histogram, rand(-100:100, 1000), -50:50)))
)

Is this a no-go for you?


Sorry @jbrea, I lost track of this thread. Yes, it's an alternative I can live with…


Still, I am confused why such basic things don't work, so I guess I'm not grasping how to do it correctly.

Given a 1D dataset, I'd like to simply plot a histogram and define the xtick distance by a single value; instead, I have to specify a range:

using PGFPlotsX, StatsBase

data = randn(1000) .* 10

@pgf Axis(
    {
        "ybar interval",
        xtick=-50:10:50,
        #xtick_distance="{10}",   # this has no effect whatsoever
        xmajorgrids = false,
        x_tick_label_as_interval=false
    },
    Plot(Table(StatsBase.fit(Histogram, data, nbins=100)))
)

This means that for every histogram I plot, I need to come up with a range at least as large as the data span and pass it as xtick (here -50:10:50), while the only relevant number in this case is 10. I'd prefer to pass just the tick distance (10) somehow.
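One way to keep 10 as the only number to choose is to derive the range from the data before building the Axis. The helper below, tick_range, is hypothetical (it is not part of PGFPlotsX or StatsBase); it is a minimal sketch that rounds the data extremes outward to multiples of the requested distance.

```julia
"""
    tick_range(data, distance)

Hypothetical helper (not a PGFPlotsX function): ticks spaced `distance`
apart that cover all of `data`, with both ends rounded outward to
multiples of `distance`.
"""
function tick_range(data, distance)
    lo, hi = extrema(data)
    first_tick = floor(lo / distance) * distance   # largest multiple <= minimum
    last_tick = ceil(hi / distance) * distance     # smallest multiple >= maximum
    return first_tick:distance:last_tick
end

ticks = tick_range([-43.2, 7.0, 38.5], 10)   # -50.0:10.0:40.0
```

In the snippet above, xtick=-50:10:50 would then become xtick=tick_range(data, 10), still together with x_tick_label_as_interval=false.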


With Plots.jl the defaults were already pretty good:

[image: Plots.jl histogram]

AFAIK it is a feature of ybar interval to add bin labels to every bin. You can fiddle around with the styling, but IMO the simplest workaround is the one you found.

We could include a utility like R's pretty to compute something like -50:10:50 from a bunch of random points. I wonder if some package has this already?
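To make the pretty idea concrete, here is a rough sketch of a count-based variant: pick a "nice" distance (1, 2, 5 or 10 times a power of ten) so that roughly n intervals cover the data, then round the ends outward. The names nice_step and pretty_ticks are hypothetical, and this simplified rule is not R's actual pretty algorithm.

```julia
"""
    nice_step(lo, hi, n)

Hypothetical helper, loosely in the spirit of R's `pretty` (not its exact
algorithm): a tick distance of 1, 2, 5 or 10 times a power of ten such
that about `n` intervals cover [lo, hi].
"""
function nice_step(lo, hi, n)
    hi > lo || throw(ArgumentError("need hi > lo"))
    raw = (hi - lo) / n                    # distance if ticks were spread evenly
    magnitude = 10.0^floor(log10(raw))     # largest power of ten not above raw
    for factor in (1, 2, 5)
        factor * magnitude >= raw && return factor * magnitude
    end
    return 10 * magnitude                  # fall back to the next power of ten
end

"""
    pretty_ticks(data, n)

Hypothetical helper: ticks at multiples of `nice_step`, rounded outward so
that they cover all of `data`.
"""
function pretty_ticks(data, n)
    lo, hi = extrema(data)
    d = nice_step(lo, hi, n)
    return (floor(lo / d) * d):d:(ceil(hi / d) * d)
end

pretty_ticks([-43.2, 7.0, 38.5], 10)   # -50.0:10.0:40.0
```

Asking for about n intervals is the "number of ticks" option from the original question, while tick_range-style rounding with a fixed distance is the "distance" option; the nice-number rounding keeps the labels readable.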

Ah, I understand; I just hadn't encountered this situation with ybar before. Indeed, with ybar it works fine.

pretty sounds useful, but I'm not sure where to feed it in, especially when overlaying multiple histograms… If I understand the mechanics correctly, the data need to be analysed before creating the Axis options. Or can the options be modified afterwards?
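For the overlay case, one option is to analyse all datasets up front and derive both the shared bin edges and the ticks from their combined span. The helper shared_edges_and_ticks and its binwidth default are hypothetical, a minimal sketch rather than anything PGFPlotsX provides.

```julia
"""
    shared_edges_and_ticks(datasets, distance; binwidth = distance / 10)

Hypothetical preprocessing helper: look at *all* datasets that will be
overlaid and return common histogram bin edges and x ticks, both aligned
to multiples of `distance`.
"""
function shared_edges_and_ticks(datasets, distance; binwidth = distance / 10)
    lo = minimum(minimum, datasets)               # smallest value over all datasets
    hi = maximum(maximum, datasets)               # largest value over all datasets
    first_tick = floor(lo / distance) * distance  # round outward to multiples
    last_tick = ceil(hi / distance) * distance
    edges = first_tick:binwidth:last_tick         # same bins for every histogram
    ticks = first_tick:distance:last_tick         # ticks for the shared x axis
    return edges, ticks
end

edges, ticks = shared_edges_and_ticks([[-12.0, 3.0], [25.5, -7.0]], 10)
# ticks == -20.0:10.0:30.0, edges == -20.0:1.0:30.0
```

Each dataset could then be binned with fit(Histogram, d, edges), as in the first snippet, and ticks passed as xtick, so every overlaid histogram shares the same bins and axis.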

Anyway, I can live with the solution above, so if I'm the lone wolf here, we can just forget about it ;)

Yes, exactly.

But preprocessing is important with PGFPlotsX anyway, since overplotting is very costly.


Also, with PGFPlotsX you are part of a wolf pack whose members are kind of obsessed with how their plots look, to an extent that borders on unhealthy, so suggestions, feature requests and PRs are very welcome.

I have often needed an equivalent of pretty. We should provide more tools for preprocessing, but I am not yet sure whether they should live in PGFPlotsX. PlotUtils.jl has some code for this, but I need to understand it better.


PlotUtils is intended as a low-dependency library with useful utilities to share across plotting packages: AFAIU both Makie and Plots use it. I think just:

using PlotUtils
# both bounds as Float64; returns a tuple whose first element is the vector of tick positions
optimize_ticks(0.3, 12.0, k_min = 5, k_max = 9)

should give a sensible outcome. k_min and k_max are the minimum and maximum number of ticks allowed. How many ticks the algorithm picks is hard to predict, but you can constrain it between these two bounds (I think Plots uses 4 and 8, for example). The two values need to be sufficiently far apart for the algorithm to find "strict ticks", i.e. good ticks that do not extend beyond the data. There is some discussion here.
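A sketch of how this could plug into the histogram case, assuming PlotUtils.jl is installed and that optimize_ticks returns a tuple whose first element holds the tick positions (that is how it is used above; the 4 and 8 bounds follow the guess about Plots' defaults):

```julia
using PlotUtils   # assumes PlotUtils.jl is installed

data = randn(1000) .* 10
lo, hi = extrema(data)   # both Float64, matching optimize_ticks(x_min::T, x_max::T)

# Assumption: the first element of the returned tuple is the vector of ticks
ticks = first(optimize_ticks(lo, hi; k_min = 4, k_max = 8))
```

The resulting ticks could then be passed as xtick = ticks in the Axis options, together with x_tick_label_as_interval = false, so only the allowed tick count has to be chosen rather than a hardcoded range.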
