Vegalite bar chart strange behaviour

Hello, I have a single-column dataframe containing 619245 data points. I want to make a bar chart to check the distribution. Here is my code:

df_BQBZ |>
@vlplot(:bar, height=250, width=500, x={:BQBZ, steps=0.3}, y="count()")

And here is the resulting plot. Something is off: if you look at the numbers on the y axis, they should be in the hundreds of thousands, but the axis tops out at 1600. Also, this code regularly crashes Pluto. I have 24 threads and 70 GB of RAM, so it shouldn’t be a hardware issue.

[Image: weird]

Can someone help me debug my plot?
Thank you.

Maybe try with a subset of the data to see whether the counts are correct for different numbers of input values?

It actually worked once: the first time, it produced this correct plot, and now I don’t know why I can’t reproduce it.
[Image: visualization]

Ok, I can reproduce the correct plot with this simple code

df_BQBZ|>
@vlplot(
:bar,
x={:BQBZ,bin=true},
y="count()"
)

But as soon as I try to change the “step” value, I get the plot from the original message. Any ideas? I would love a step of 0.2.

I ended up making a density plot, because I need to progress

df_BQBZ|>
@vlplot(
:area,
transform=[
{density="BQBZ",bandwidth=0.2}
],
x={"value:q", title="BGBZ values"},
y="density:q"
)

But I would still love to know how to change the step of the binning in the histogram.

The Vega-Lite bin docs show the various ways to set the binning parameters.

The VegaLite Tutorial has an example that uses the Vega.printrepr() function to generate the @vlplot-style version of a JSON specification, such as those on the Vega-Lite site.

As mentioned by @devel-chm, the Vega-Lite docs have detailed documentation on customizing the binning. In particular, if you want to fix the bin width, you can do it with the step property of the bin parameters.

In VegaLite.jl, you probably need to write x={:BQBZ, bin={step=0.2}}.
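For reference, here is a minimal sketch of the full call with the step placed inside bin. The random df_BQBZ below is a hypothetical stand-in for the real data, and the name histogram is only for illustration. As far as I understand, Vega-Lite field definitions have no steps property, so the first attempt presumably ran without any binning and counted each distinct value separately, which would explain the small numbers on the y axis.

```julia
using DataFrames, VegaLite

# Hypothetical stand-in for the real data: one numeric column named BQBZ.
df_BQBZ = DataFrame(BQBZ = randn(10_000))

histogram = df_BQBZ |>
@vlplot(
    :bar,
    height=250,
    width=500,
    x={:BQBZ, bin={step=0.2}},  # the fixed bin width goes inside `bin`
    y="count()"                 # number of rows falling into each bin
)
```

Changing 0.2 to another value should change the bin width accordingly.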

In Deneb.jl, you can have a look at this example and replace maxbins=50 with step=0.2.

As for the crashes in Pluto: is your df_BQBZ dataset very large? VegaLite charts store the whole dataset in their specification, so several plots with fairly large data can easily result in slow responsiveness (and possibly crashes?). You can see more about that in Altair’s documentation. A common approach to solve that is to pre-aggregate the data (e.g. by binning) and filter it to keep only the columns that the figure needs.
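A rough sketch of that pre-aggregation idea, assuming DataFrames.jl is available: the bins are counted in Julia first, so only one row per bin ends up in the chart specification. The random df_BQBZ is a hypothetical stand-in, and the names bin_width, bin_index, bin_start, bin_end and count are made up for this example. The plot uses Vega-Lite's pre-binned mode (binned=true) with x2 marking the right edge of each bar.

```julia
using DataFrames, VegaLite

# Hypothetical stand-in for the real data: one numeric column named BQBZ.
df_BQBZ = DataFrame(BQBZ = randn(619_245))

bin_width = 0.2

# Integer index of the bin each value falls into (bins are [k*w, (k+1)*w)).
bin_index = floor.(Int, df_BQBZ.BQBZ ./ bin_width)

# One row per bin with the number of values it contains.
binned = combine(groupby(DataFrame(bin_index = bin_index), :bin_index), nrow => :count)
binned.bin_start = binned.bin_index .* bin_width
binned.bin_end = binned.bin_start .+ bin_width
sort!(binned, :bin_start)

# Only the small aggregated table is embedded in the chart specification.
select(binned, :bin_start, :bin_end, :count) |>
@vlplot(
    :bar,
    x={"bin_start:q", bin={binned=true, step=0.2}, title="BQBZ"},
    x2="bin_end:q",
    y="count:q"
)
```

With a step of 0.2 the aggregated table has a few dozen rows instead of hundreds of thousands, which should make the notebook much lighter.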

Thanks, your solution works beautifully. I still sometimes have issues with the Julia syntax, as I am not coming from an IT background. The data sets are relatively large, on the order of a million data points. I switched to a Jupyter notebook, and there was no problem.

Thank you for mentioning Deneb.jl; I did not know about it.