VegaLite.jl: levels and DataFrames

I was reading about levels for categorical columns in DataFrames. How exactly can they be useful with VegaLite.jl?

I read that they are useful for, e.g., display purposes. However, when I tried to plot some data with this in mind (some months ago), the plot did not follow any automatic ordering. Maybe I was simply missing something; if so, please explain.

For example, I want to make a bar plot of counts/frequencies, and some variable/column has three unique values. Regardless of the counts, I want a certain value first (e.g. at the top of the Y axis), another second, and the last one third.
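One explicit way to get such a fixed order, sketched under the assumption that VegaLite.jl's `@vlplot` passes a `sort` array straight through to the Vega-Lite encoding (where an array of values defines a custom category order). The DataFrame, the column `answer`, and `answer_order` are hypothetical. With Vega-Lite's default band scale on a nominal y axis, the first entry of the array should end up at the top.

```julia
using VegaLite, DataFrames

# Hypothetical example data: one row per observation
df = DataFrame(answer = ["no", "yes", "maybe", "yes", "no", "yes", "no"])

# The order we want on the y axis, top to bottom, regardless of the counts
answer_order = ["maybe", "yes", "no"]

p = df |> @vlplot(
    :bar,
    x = "count()",                                        # bar length = rows per category
    y = {:answer, type = :nominal, sort = answer_order}   # explicit category order
)
```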

I assumed that VegaLite.jl (and any other plotting package) would then recognize the order I set and plot accordingly. Am I wrong?

Can you link to the documentation on DataFrame levels? I need to read about it first to see what you see regarding its use in VegaLite.jl.

I was reading the DataFrames manual; the section is titled "Categorical Data".
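For reference, a minimal sketch of what "levels" means in that section, using CategoricalArrays.jl directly: a categorical vector stores its possible values (levels) in a separate, reorderable list, and `levels!` changes that order without touching the data. The variable `c` is a hypothetical example.

```julia
using CategoricalArrays

# A categorical vector; by default its levels are the sorted unique values
c = categorical(["C1", "C1", "C2", "C2", "C1"])
levels(c)              # ["C1", "C2"]

# Reorder the levels; the stored values themselves are unchanged
levels!(c, ["C2", "C1"])
levels(c)              # ["C2", "C1"]
```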

In fact, if I now convert a DataFrame string column to categorical, VegaLite.jl refuses to plot it. (If I am not mistaken, this was not the case before.) It throws a MethodError about an ambiguity and a conversion.

OP, you should post an MWE (minimal working example) and the full error message so that we can help you.

Yes, I would, but I can't at the moment. I thought that giving the key words of the error would be the next best thing.

I think it can be done, but assuming that it happens automagically is wrong (for now), at least in my opinion. I generally dislike automatisms like this; I prefer to be explicit about what I want in a plot and how it should look. But for VegaLite.jl, it is not my decision.

That said, categorical columns can be used just like normal columns in the DataFrame:

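using VegaLite, DataFrames, CategoricalArrays

data = DataFrame(
    a=["A","B","C","D","E","F","G","H","I"],
    b=[28,55,43,91,81,53,19,87,52],
    c=["C1","C1","C2","C2","C1","C1","C2","C2","C2"]
)

# Convert column :c to a CategoricalArray
# (older DataFrames versions offered `categorical!(data, :c)` for this)
data.c = categorical(data.c)

# One bar per value of :a (ordinal axis), height from :b, colored by :c
data |> @vlplot(
    :bar,
    x="a:o",
    y=:b,
    color=:c
)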

The bar chart you are aiming for seems to be the grouped bar chart from your other (older) thread.

For what it's worth: the errors occur only when I convert an existing String column to a CategoricalString column. If I load a DataFrame from RDatasets, for example, they do not occur at all and everything works normally.
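Since the errors appear only after converting a String column to categorical, one possible workaround is to plot a copy in which that column is turned back into plain Strings. A minimal sketch based on the example data above; `plotdata` is a hypothetical name, and `transform`/`ByRow` assume DataFrames 0.21 or later.

```julia
using VegaLite, DataFrames, CategoricalArrays

data = DataFrame(a = ["A", "B", "C"], b = [28, 55, 43], c = ["C1", "C2", "C1"])
data.c = categorical(data.c)

# Plot a copy in which :c holds plain Strings again, so the plotting code
# never sees CategoricalValue elements; the original `data` keeps its levels
plotdata = transform(data, :c => ByRow(string) => :c)

p = plotdata |> @vlplot(:bar, x = "a:o", y = :b, color = :c)
```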

Thanks.

No, not necessarily a grouped one. In this thread I was wondering about the visual ordering of the categories of a variable/column in the plot, thinking that using categorical columns, levels, and so on would pass this information to a plotting package (e.g. VegaLite.jl), which in turn would plot according to this predefined order.
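If the plotting package does not read the levels by itself, they can still be handed over explicitly as the custom `sort` order of an encoding channel. A minimal sketch, assuming `@vlplot` passes `sort` through to Vega-Lite unchanged; the data and the names `answer`, `answer_order`, and `plotdata` are hypothetical, and the conversion to plain Strings is only there in case the installed VegaLite.jl version rejects categorical columns as reported in this thread.

```julia
using VegaLite, DataFrames, CategoricalArrays

df = DataFrame(answer = categorical(["no", "yes", "maybe", "yes", "no", "yes"]))
levels!(df.answer, ["yes", "no", "maybe"])   # the order we want in the plot

# Save the level order, then plot a plain-String copy of the column
answer_order = levels(df.answer)
plotdata = transform(df, :answer => ByRow(string) => :answer)

p = plotdata |> @vlplot(
    :bar,
    x = "count()",
    y = {:answer, type = :nominal, sort = answer_order}   # order taken from the levels
)
```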

I'm not sure I understand you. Are you saying that what I described in the previous paragraph is the case, or that it is not? (I am somewhat tired right now and have an almost pounding headache, so that might be part of the problem 🙂)

My guess is that it currently does not work like this. I say "guess" because I didn't check the code; since it also depends on the Vega/Vega-Lite code, I simply cannot check all of it in a timely manner. In any case, I wouldn't rely on the order of categorical data. If the order of categories is important, I would make it explicit in the VegaLite spec, e.g. in the transform part or wherever it belongs.
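A sketch of the transform-based variant, assuming VegaLite.jl passes the `transform` and `sort` keywords through to the Vega-Lite spec unchanged. A `calculate` transform uses the Vega expression function `indexof` to add a hypothetical numeric `order` field, and the y channel is sorted by that field. Because the chart aggregates rows (`count()`), the sort field needs an aggregate op; `:min` simply recovers each category's rank. The data and column names are hypothetical.

```julia
using VegaLite, DataFrames

df = DataFrame(answer = ["no", "yes", "maybe", "yes", "no", "yes"])

p = df |> @vlplot(
    :bar,
    # Rank each row by the position of its value in the desired order (0-based)
    transform = [{calculate = "indexof(['yes', 'no', 'maybe'], datum.answer)", as = :order}],
    x = "count()",
    # Sort categories by their rank; op = :min aggregates the rank per category
    y = {:answer, type = :nominal, sort = {field = :order, op = :min}}
)
```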

On the other hand, it could be made to work like this. I wouldn't appreciate that, but it is not up to me to decide how it should be in the future.

Indeed, it is not. I just checked it myself, simply by taking an existing DataFrame and making changes as I went. If this little experiment is conclusive, it is not supported.

Though you are against it, it would be a nice option, wouldn’t it? Perhaps a simple toggle somewhere would do the trick.

And, as you say, the decision is up to someone else.