I ran into a problem when using VegaLite (+Jupyter) on larger datasets (80k+ rows, 6 variables). When rendering a scatterplot inline in a notebook, Jupyter hangs when saving. This is an MWE:
using DataFrames, VegaLite

df = DataFrame([Symbol("var$i") => rand(1:100, 80_000) for i in 1:6]...)
df |> @vlplot(
    mark={typ=:point, filled=true},
    x=:var1,
    y=:var2,
    size={value=1}
)
Presumably this is due to the overhead of saving the whole Vega-Lite specification (which, I think, includes the whole dataset). My question: since I only require interactivity in specific cases, is it possible to make VegaLite.jl render the output directly as PNG or PDF to circumvent the overhead of the Vega-Lite spec?
I don’t know how to do that, but I found that the Jupyter notebook often contains three different pieces of data for VegaLite (the raw data, the PNG and the SVG) and wondered whether a keyword argument could be added to select only one of these.
Yes, that is a good idea. Maybe one way to do this would be to pipe it into a PNG type, so something like df |> @vlplot(...) |> PNGImage works… Then what would be displayed is PNGImage, and not the actual plot object itself. Could you open an issue about this on the VegaLite.jl repo, and we can discuss some options there?
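To make the wrapper idea concrete, here is a minimal sketch of how such a type could work using only Julia's Base display machinery. The names `PNGOnly` and `FakePlot` are hypothetical and are not part of VegaLite.jl; `FakePlot` merely stands in for a plot object that can render itself as several MIME types. The assumption is that the notebook front end asks `showable` which MIME types an object supports, so a wrapper that only defines the PNG `show` method would only be stored as PNG (plus the plain-text fallback that every Julia object has).

```julia
# Hypothetical wrapper: exposes only the PNG representation of whatever it wraps.
struct PNGOnly{T}
    content::T
end

# Forward PNG rendering to the wrapped object; no other rich MIME type is defined.
Base.show(io::IO, m::MIME"image/png", w::PNGOnly) = show(io, m, w.content)

# Stand-in for a plot object that supports several MIME types (not a VegaLite type).
struct FakePlot end
Base.show(io::IO, ::MIME"image/png", ::FakePlot) = print(io, "png-bytes")
Base.show(io::IO, ::MIME"image/svg+xml", ::FakePlot) = print(io, "<svg/>")

wrapped = FakePlot() |> PNGOnly

# The wrapper advertises PNG but hides the SVG representation.
png_ok = showable(MIME("image/png"), wrapped)
svg_ok = showable(MIME("image/svg+xml"), wrapped)
png_text = sprint(show, MIME("image/png"), wrapped)
```

With a real plot, the same pattern would presumably read `df |> @vlplot(...) |> PNGOnly`, assuming the plot object defines a `show` method for `image/png`.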
In general, the story around large datasets is not great right now for VegaLite.jl, because we also use an incredibly inefficient JSON representation of the data to hand it off to the Vega engine. My plan is to transition that over to Arrow once the rewrite of Arrow.jl is done. That might help a lot with non-Jupyter situations, but it is actually not clear to me how we might be able to solve the large data problem in Jupyter itself…
Ah, that is nice! I think it would be great if you were to register that, seems a much cleaner solution to have this in its own package, and then I could get rid of MimeWrapper in VegaLite.jl, which always felt strange there.