Graphical Exploratory Data Analysis in Julia with VegaLite/Gadfly/Plots

That does make some sense…

I like the QuickVega idea. As long as the pieces are composable, and + can convert them to layers and add them to a plot, that seems to get us more or less all the way there. Where a simple VegaLite macro exists, they can compile directly to something like your @vlplot({:histogram, color=:lightblue}). Where they require custom calculation within Julia, they’re more complicated, but there’s still a clearly defined interface for writing your own plotting. There are always going to be people who want to create rather complicated plots: things that, with one command, automatically visualize, say, the mixing of multiple chains of an MCMC run. We can’t just rely on VegaLite to do that for us :wink:

In fact, how is the QuickVega idea really different from a library of “geoms” in R: geom_histogram(), geom_density(), geom_myfavoriteplot(), etc.?

Obviously you’d like it to be easy to make a whole plot simply by calling histogram(df.x), but as long as you can also do ... + histogram(df.x) to add a histogram layer to something else, it seems like it gives you the best of both worlds.
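To make the composability point concrete, here is a minimal sketch in plain Julia of how + could promote "quick" layers into a layered plot. Everything in it is hypothetical: Layer, LayeredPlot, quickhistogram and quickdensity are made-up names for illustration and are not part of VegaLite.jl, Gadfly, or any existing package.

```julia
# Hypothetical sketch: none of these types or functions exist in VegaLite.jl or Gadfly.

# One plot layer: a mark type plus free-form options.
struct Layer
    mark::Symbol
    options::Dict{Symbol,Any}
end

# A plot is just an ordered collection of layers.
struct LayeredPlot
    layers::Vector{Layer}
end

# "Quick" constructors in the spirit of geom_histogram(); the names are made up.
quickhistogram(; options...) = Layer(:histogram, Dict{Symbol,Any}(options))
quickdensity(; options...) = Layer(:density, Dict{Symbol,Any}(options))

# `+` turns two layers into a plot, and appends a layer to an existing plot.
Base.:+(a::Layer, b::Layer) = LayeredPlot([a, b])
Base.:+(p::LayeredPlot, l::Layer) = LayeredPlot(vcat(p.layers, l))

p = quickhistogram(color=:lightblue) + quickdensity() + quickhistogram(color=:red)
```

A real implementation would also need a step that compiles each layer either to a VegaLite spec or to a custom Julia calculation, which is the part this sketch leaves out.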

Hi @davidanthoff, I’m writing my next tutorial using VegaLite. The first thing I’m trying to do is create a panel of 3 graphs, but unlike the examples in your documentation, these are just “horizontally stacked”, not faceted on some column in the data.

It’s clear that I can return a column vector of VegaLite plots from my function and get a stack of three separate plots… How can I do the same thing, but horizontally?

EDIT: another issue… I’ve been banging my head on my desk trying to get points and a regression line or loess line… I thought it was my own fault, but running the example here produces a blank page:
https://www.queryverse.org/VegaLite.jl/stable/examples/examples_advancedcalculations/#Loess-Regression-1

using VegaLite, VegaDatasets

dataset("movies") |>
@vlplot(
layer=[{
    mark={:point,filled=true},
    x="Rotten_Tomatoes_Rating:q",
    y="IMDB_Rating:q"
},
{
    transform=[
        {
            loess="IMDB_Rating",
            on="Rotten_Tomatoes_Rating"
        }
    ],
    mark={:line,color="firebrick"},
    x="Rotten_Tomatoes_Rating:q",
    y="IMDB_Rating:q"
}]
)

This happens in Firefox 76.0.1 on Linux, and also in Chromium 81.0.4044.92.

EDIT2: Turns out I had ancient versions of Queryverse due to a version conflict with Gadfly. I uninstalled Gadfly and updated Queryverse… let’s see what happens.

EDIT3: Ok, at least the example works with current versions of VegaLite…

EDIT4: But now, if I have two plots that work, doing [plota; plotb] gives me two stacked plots that show the box and the axis labels but are otherwise blank:
image

and I see I can “hcat” vlspecs together, but I again get blank boxes:

image

Just so it’s clear, both of the individual plots are fine, and look more or less like this:

image

Can you post the code you are using to create the stacked plots?

I figured out a way around it, and didn’t check in the intermediate broken bits… so I don’t have the non-working code anymore :frowning:

The essential fix was to create the vlspecs, hcat them together, and then pipe the data into the combined spec.

Here’s the code from my working notebook:


using Query, VegaLite, DataFrames

# Build a row of three panels (testing, cases, deaths) for a single state.
function plotstate(df, state)
    # The +0.1 in the denominator avoids dividing by zero on days with no tests.
    dfstate = df |> @filter(_.state == state) |>
        @mutate(testpct = _.positiveIncrease / (_.totalTestResultsIncrease + 0.1)) |>
        @select(:thedate, :state, :testpct, :positiveIncrease, :deathIncrease) |> DataFrame

    # Each panel is an empty layered spec plus a point layer and a loess line layer.
    testing = @vlplot(width=300, layer=[], title="Testing in $state") +
        @vlplot(:point, x=:thedate, y={:testpct, axis={title="Percentage Positive"}}) +
        @vlplot(transform=[{loess=:testpct, on=:thedate}], mark=:line, x=:thedate, y=:testpct)

    cases = @vlplot(width=300, layer=[], title="Cases in $state") +
        @vlplot(mark={:point, filled=true},
                x=:thedate, y={:positiveIncrease, axis={title="Cases Per Day"}}) +
        @vlplot(transform=[{loess=:positiveIncrease, on=:thedate, bandwidth=0.2}],
                mark=:line, x=:thedate, y=:positiveIncrease)

    deaths = @vlplot(width=300, layer=[], title="Deaths in $state") +
        @vlplot(:point, x=:thedate, y=:deathIncrease) +
        @vlplot(transform=[{loess=:deathIncrease, on=:thedate, bandwidth=0.2}],
                mark=:line, x=:thedate, y={:deathIncrease, axis={title="Deaths Per Day"}})

    # Concatenate the specs first, then pipe the data into the combined spec.
    return dfstate |> hcat(testing, cases, deaths)
end

The version that wasn’t working piped the data into each individual plot and then tried to hcat them together.
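For anyone who wants the pattern without the COVID data, here is a stripped-down sketch of the same idea: build the specs without data, concatenate them, and pipe the data in once at the end. It assumes the "cars" dataset from VegaDatasets and a VegaLite.jl version where hcat on specs works as it did for me above; the variable names scatter, counts and panel are only for illustration.

```julia
using VegaLite, VegaDatasets

# Two specs with no data attached yet.
scatter = @vlplot(:point, x=:Horsepower, y=:Miles_per_Gallon, width=300)
counts = @vlplot(:bar, x=:Origin, y="count()", width=300)

# [a b] is Julia syntax for hcat(a, b); [a; b] would call vcat for a vertical stack.
# The data is piped into the combined spec rather than into each panel.
panel = dataset("cars") |> [scatter counts]
```

I have not pinned down why the per-plot data version failed for me, so treat this as the arrangement that worked rather than as the only one that should.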

The whole code is in the repo where I was trying to figure out Binder: dlakelan/JuliaDataTutorials (Tutorials For Data Analysis in Julia). Check out the COVID-tracking.jmd file, or the ipynb.

Hm, weird, I think it should actually also work if you give every plot its own data… But glad it worked :slight_smile:

Yeah, strangely when I tried to do a MWE it did work fine when I gave each plot its own data… so I couldn’t even reproduce the problem in a small example.

There’s also something to do with versions of Queryverse that might have changed. I wrote some notebooks that used Gadfly, but later found that Gadfly was holding back a lot of packages, so Queryverse was at a really old version.

If I add Gadfly and then add Queryverse, I get Queryverse 0.5.0

If I put Gadfly on #master then update, I get VegaLite 1.0.0 and Queryverse 0.3.1

This seems to be a showstopper for Gadfly for me.

Removing Gadfly, I get Queryverse 0.5.0 and VegaLite 2.1.3.

Gadfly#master (soon to be v1.3) is compatible with CategoricalArrays 0.8 and DataFrames 0.21, but Queryverse is not yet. What’s the plan for Queryverse?

Ah, I had somehow missed the CompatHelper PR! New Queryverse tag that is compatible with DataFrames 0.21 is here.

Hi @dlakelan, I started using Julia a few months back, though I have been watching it since its birth. Like you, I use Julia mostly for data analysis, munging, model fitting and visualization. Before Julia I used R and Gnuplot. Thanks to @Mattriks and @davidanthoff, who have been very helpful with Gadfly and VegaLite. I have also explored other visualization packages, e.g. Makie.jl, Gnuplot.jl and GR.jl. Gadfly’s documentation is excellent. Most of my plots are done in Gadfly and some in Makie. Makie, like Julia, is a new kid on the block with a promising future. I hope this does not confuse you or offend anybody.

Hmm… Without Gadfly I get the new Queryverse@0.6.0 but with Gadfly#master I get it downgrading back to Queryverse@0.3.1 and VegaLite@1.0.0

Trying to pull Queryverse@0.6.0 I get:

ERROR: Unsatisfiable requirements detected for package DataFrames [a93c6f00]:
 DataFrames [a93c6f00] log:
 ├─possible versions are: [0.11.7, 0.12.0, 0.13.0-0.13.1, 0.14.0-0.14.1, 0.15.0-0.15.2, 0.16.0, 0.17.0-0.17.1, 0.18.0-0.18.4, 0.19.0-0.19.4, 0.20.0-0.20.2, 0.21.0-0.21.2] or uninstalled
 ├─restricted to versions * by an explicit requirement, leaving only versions [0.11.7, 0.12.0, 0.13.0-0.13.1, 0.14.0-0.14.1, 0.15.0-0.15.2, 0.16.0, 0.17.0-0.17.1, 0.18.0-0.18.4, 0.19.0-0.19.4, 0.20.0-0.20.2, 0.21.0-0.21.2]
 ├─restricted by compatibility requirements with Queryverse [612083be] to versions: 0.20.0-0.20.2
 │ └─Queryverse [612083be] log:
 │   ├─possible versions are: [0.1.0, 0.2.0, 0.3.0-0.3.1, 0.5.0, 0.6.0] or uninstalled
 │   └─restricted to versions 0.6.0 by an explicit requirement, leaving only versions 0.6.0
 └─restricted by compatibility requirements with CategoricalArrays [324d7699] to versions: 0.21.0-0.21.2 or uninstalled — no versions left
   └─CategoricalArrays [324d7699] log:
     ├─possible versions are: [0.3.11, 0.3.13-0.3.14, 0.4.0, 0.5.0-0.5.5, 0.6.0, 0.7.0-0.7.7, 0.8.0-0.8.1] or uninstalled
     └─restricted to versions 0.8 by Gadfly [c91e804a], leaving only versions 0.8.0-0.8.1
       └─Gadfly [c91e804a] log:
         ├─possible versions are: 1.3.0 or uninstalled
         └─Gadfly [c91e804a] is fixed to version 1.3.0

It looks like Queryverse@0.6.0 isn’t quite compatible with DataFrames 0.21.x

├─restricted by compatibility requirements with Queryverse [612083be] to versions: 0.20.0-0.20.2

For now I’m uninstalling Gadfly, and re-upping Queryverse.

@Mattriks, @davidanthoff

With Gadfly#master, I’m using DataFrames 0.21.x :grinning:

Yes, it looks like although @davidanthoff said that Queryverse 0.6.0 would be compatible with DataFrames 0.21.0, Pkg doesn’t think it is… and wants only 0.20.0-0.20.2.

In a clean environment, try

pkg> add Gadfly
pkg> add Queryverse#master

I think that’s compatible with DataFrames 0.21.x
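The same experiment can be scripted, which makes it easier to repeat. This is only a sketch of the two pkg> commands above using the Pkg API; the temporary directory is my own addition so that the test cannot disturb your default environment, and the versions that Pkg.status() reports will depend on the registry at the time you run it.

```julia
using Pkg

# Throwaway environment, so the default environment is left untouched.
Pkg.activate(mktempdir())

# Equivalent of `pkg> add Gadfly` and `pkg> add Queryverse#master`.
Pkg.add("Gadfly")
Pkg.add(PackageSpec(name="Queryverse", rev="master"))

# Show which versions the resolver actually picked, including DataFrames.
Pkg.status()
```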

I had a mixup in the registration of Queryverse 0.6.0. A new version is pending tagging (New version: Queryverse v0.6.1 by JuliaRegistrator, Pull Request #16111, JuliaRegistries/General); hopefully that one will properly be compatible with the new DataFrames. Sorry for the chaos here.

No worries, thanks for being responsive and helpful!
I will now go on to make some more tutorials without fear that I can’t use all the various packages together at once. It looks like I can get Gadfly, Queryverse, and Turing all to work together in current versions, which is great!
