Interactive plot, acting as a range slider tool for another plot?

How can I get in Julia an interactive plot with range slider for another plot?
Something like this (but not necessarily with the same data on both plots):
https://bokeh.pydata.org/en/latest/docs/gallery/range_tool.html

2 Likes

I didn’t even know Julia could take mouse input.

Yes, it is possible to take mouse input in Julia. I suspect Makie is a good candidate for implementing something like this. The interactivity docs are here (they seem slightly outdated, as they point to the Reactive package for the underlying logic, whereas Observables is used now, but I imagine things should work with minimal changes). I’m not sure whether a range_tool has been implemented, though; chances are you will need to implement the logic yourself (that is, make the larger plot update automatically as soon as the rectangle’s extrema change), which can be done with the `lift` function.
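To make that concrete, here is a minimal, untested sketch of the idea in Makie, assuming GLMakie’s `IntervalSlider` widget and the Observables `on` function; the axis names and layout are my own, and this uses a slider rather than a draggable rectangle on the overview plot:

```julia
# Hypothetical sketch (assumes GLMakie; not a tested range_tool implementation).
using GLMakie

x = 1:1000
y = cumsum(randn(1000))

fig = Figure()
ax_detail = Axis(fig[1, 1], title = "Detail")   # large plot, zoomed by the slider
ax_overview = Axis(fig[2, 1], height = 80)      # overview of the full signal
lines!(ax_detail, x, y)
lines!(ax_overview, x, y)

# The slider interval is an Observable; `on` reruns the body whenever it changes.
slider = IntervalSlider(fig[3, 1], range = x)
on(slider.interval) do (lo, hi)
    xlims!(ax_detail, lo, hi)
end

fig
```

With `lift` instead of `on`, one could also derive a sliced view of the data from the interval and plot only that, which is closer to the dynamic-loading idea discussed later in this thread.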

1 Like

I did this exact thing once. It can be done using VegaLite.jl

using DataFrames, VegaLite

function PlotWithFocus(A::DataFrame)
    A |> [
        @vlplot(mark={typ=:line,interpolate="linear"},x={field=:Time,typ=:quantitative,scale={domain={selection=:brush}},axis={title=""}}, y=:Displacement, width=600,height=400,title="Title",background="white");
        @vlplot(selection={brush={typ=:interval,encodings=["x"]}},mark={typ=:line,interpolate="linear"},x={field=:Time,typ=:quantitative}, y=:Displacement, width=600,height=80,background="white")
    ]
end

A=DataFrame(Time = collect(1:100),Displacement=[rand() for _ in 1:100])

PlotWithFocus(A)

Here I’ve used the same data for both but that wouldn’t be necessary.

5 Likes

Very nice! I’ve never really done much with the interactive features in vega-lite, so it’s great to see that it all actually works via our Julia wrapper :slight_smile:

1 Like

Thank you, this is a really nice example! I’ve made a more complex example with multiple scales: [visualization]

Source code
using DataFrames, VegaLite

function PlotWithFocus(A::DataFrame)
    A |> [
        @vlplot(
            mark={typ=:line,interpolate="step-after"},
            x={
                field=:Time,
                typ=:quantitative,
                scale={domain={selection=:brush1}},
                axis={title=""}
            },
            y=:Trend1,
            width=600,
            height=160,
            title="Title",
            background="white"
        );
        @vlplot(
            selection={brush1={typ=:interval,encodings=["x"]}},
            mark={typ=:area},
            x={
                field=:Time,
                typ=:quantitative,
                scale={domain={selection=:brush2}},
                axis={title=""}
            },
            y=:Trend2,
            width=600,
            height=80,
            background="white"
        );
        @vlplot(
            selection={brush2={typ=:interval,encodings=["x"]}},
            mark={typ=:line},
            x={field=:Time,typ=:quantitative},
            y=:Trend3,
            width=600,
            height=80,
            background="white"
        )
    ]
end

N = 2000

function some_data!(X, cX)
    # Turn the raw noise in X into a smoothed signal (in place) and write its
    # running sum into cX.
    x0 = 0.0
    s0 = 0.0
    for i in eachindex(X)
        x0 += (X[i] - x0) * 0.05   # exponential smoothing of the raw samples
        X[i] = x0
        s0 += x0                   # cumulative sum of the smoothed signal
        cX[i] = s0
    end
end

X = randn(Float32, N)
cX = similar(X)
some_data!(X, cX)


A = DataFrame(
    Time = collect(1:N),
    Trend1 = X,
    Trend2 = X,
    Trend3 = cX,
)

PlotWithFocus(A)

But the main reason to do multi-scale is to handle large amounts of data at different levels of detail, both for better performance and better visual representation: a simplified representation (fewer points) can be plotted over the full range, while the detailed data (more points) is only plotted within the small selected range.

In this context, I don’t understand:

  1. How can I plot several datasets of different lengths? Or, more generally, how can I bind plots to different data sources (and probably not only the DataFrame type)?
  2. How can I plot large amounts of data efficiently, so that it is dynamically loaded for the required range when the selection changes? Or, more generally, how can I bind VegaLite events to custom Julia functions, e.g. for selecting a data range from a given data source? With the current solution, interacting with 10k points is already laggy.
3 Likes

For questions in general, the VegaLite.jl docs are very helpful, as is the VegaLite documentation (examples from there require a little translation, but it’s not hard). Here is where data import is discussed in the context of VegaLite.jl; it describes ingesting CSV, JSON, and a few other file types in addition to the DataFrame support. More detail is provided here in the VegaLite documentation.

I’m not sure what you mean by “datasets with different length”. Since they are being plotted together they must share some sort of axis. If one is missing values (for example, two time histories where one contains dates the other doesn’t), then perhaps it could be padded? In any case, the transformations provided in VegaLite are very important if one is going to use it seriously, particularly the Filter Transform, which can be used to reduce and alter data from various datasets.

As for the large-data problem, I have had the same issue. It can be tackled to some degree by using Filter to select only every 50th point or so, since all the points cannot be represented on screen anyway. For very large datasets, however, I have found that other plotting solutions may be needed (I ran into this when I threw about 2 million points at VegaLite). Others may have suggestions on how to squeeze more performance out of VegaLite, or alternative plotting options such as the Makie approach mentioned previously.
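The “every 50th point” thinning can also be done on the Julia side before the data ever reaches vega-lite. A minimal sketch (`thin` is a hypothetical helper; for a DataFrame `A` the same idea is `A[1:50:end, :]`):

```julia
# Thin a signal to every `step`-th sample before plotting (Base Julia only).
thin(v::AbstractVector, step::Integer) = v[1:step:end]

x = collect(1:1000)
length(thin(x, 50))  # 20 points instead of 1000
```

This is lossy (spikes between retained samples disappear), which is why min/max-style decimation, discussed later in this thread, is often preferable for signals.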

It’s also worth noting that VegaLite is “Lite” for a reason: it’s a simplified interface to the Vega plotting system. Much more complicated data ingestion and transformation is possible with the full Vega system. I believe there has been discussion of a Vega.jl package or similar, but I don’t know much about it, and the number-of-points issue would likely remain.

2 Likes

Simply, I have two datasets with different length on the same x-axis:

using DataFrames, VegaLite

function PlotDifferentData(A::DataFrame,B::DataFrame)
    [
        @vlplot(mark={typ=:line,interpolate="linear"},data=A,x={field=:xTime,typ=:quantitative,scale={domain={selection=:brush}},axis={title=""}}, y=:yTrend, width=600,height=400,title="Title",background="white");
        @vlplot(selection={brush={typ=:interval,encodings=["x"]}},data=B,mark={typ=:point},x={field=:xTime,typ=:quantitative}, y=:yEvents, width=600,height=80,background="white")
    ]
end

dataset1 = DataFrame(xTime = collect(1:100), yTrend = randn(Float32, 100))
dataset2 = DataFrame(xTime = [10, 35, 60, 94], yEvents = [1, 2, 1, 1])

PlotDifferentData(dataset1,dataset2)

It also turns out that VegaLite.jl handles missing values just fine, so if you want data sources of different lengths on the same axes, you can simply pad the shorter one with `missing`.
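The padding idea can be sketched in plain Julia (`pad_missing` is a hypothetical helper; with DataFrames one would more likely `outerjoin` on the shared axis column):

```julia
# Pad a vector with `missing` up to length n (Base Julia only).
function pad_missing(v::AbstractVector, n::Integer)
    n <= length(v) && return collect(v)
    vcat(collect(v), fill(missing, n - length(v)))
end

y1 = randn(100)
y2 = [1, 2, 1, 1]
y2p = pad_missing(y2, length(y1))  # length 100; the last 96 entries are missing
```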

2 Likes

In my mind there are two distinct issues currently with large datasets:

  1. How do you get them into vega-lite? Right now, if you start with a very large dataset in Julia and plot it with VegaLite.jl, the dataset will essentially be sent in a JSON format to vega-lite (the JavaScript part). That can’t be a very efficient way :wink: I think in the short run, one way around that might be to save the data as a CSV file and then specify the file as the data source in the vega-lite spec. I’m not really sure that is faster, but it might be. Medium/long term, I think something based on Add Arrow/Feather reader · Issue #1300 · vega/vega · GitHub will probably resolve this issue.
  2. The second issue is whether vega-lite can deal with very large datasets once the transfer problem has been solved. I think right now the JavaScript part starts to choke with very large datasets (although 10,000 seems like a number it should still handle fairly easily). I’m not super familiar with the plans of the vega-lite team, but I know this scenario is very much on their radar. I think GitHub - vega/falcon: Brushing and linking for big data is a first attempt to tackle this. I have no idea how exactly that work might interact with the Julia story in the future, but I would expect we can sort that out somehow eventually.
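The CSV route in point 1 might look roughly like this. This is a hedged sketch: it assumes CSV.jl is available and that VegaLite.jl passes a `data={url=...}` spec through to vega-lite unchanged; `"big.csv"` is an illustrative file name.

```julia
# Untested sketch: serve the data to vega-lite from a file instead of
# embedding it as JSON in the spec.
using CSV, DataFrames, VegaLite

A = DataFrame(Time = 1:10_000, Trend = cumsum(randn(10_000)))
CSV.write("big.csv", A)

# Point the spec at the file rather than serializing the DataFrame:
@vlplot(
    data = {url = "big.csv"},
    mark = :line,
    x = "Time:q",
    y = "Trend:q",
    width = 600
)
```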
2 Likes

Conceptually, almost all useful plots of large datasets are in fact plots of very small datasets that result from a transformation (binning, quantiles, various combinations of these). Otherwise overplotting makes plots very hard to read.

Making the transformations interactive is very nice, but that results in an application that does much more than plotting. Of course it is nice to leverage something already available, but IMO Julia will soon be at the point that applications like this can be written natively, using some web-based UI, plotting packages, and existing data table management libraries. For very large data, probably the weakest link at the moment is the latter (if the data does not fit in memory).

CSV can’t be very efficient either: it’s plain text, just like JSON. For my personal needs I store data in plain binary files in a column format, with names (and other metadata) defined in a separate header file.
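That column-binary scheme can be sketched with nothing but Base Julia (`write_column`/`read_column` are hypothetical helpers; the separate metadata file is omitted here):

```julia
# Minimal sketch of a binary column store: each column is a raw Float32 file;
# names and element types would live in a separate metadata/header file.
function write_column(path::AbstractString, col::Vector{Float32})
    open(io -> write(io, col), path, "w")
end

function read_column(path::AbstractString)
    n = filesize(path) ÷ sizeof(Float32)
    read!(path, Vector{Float32}(undef, n))
end

col = rand(Float32, 1000)
write_column("trend1.bin", col)
read_column("trend1.bin") == col  # lossless round trip
```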

Actually, I would not tie the plot performance problem to large datasets specifically, because the memory limits for array storage are far beyond the limits of the plotting system. In other words, even relatively small datasets (100k points of Float32, ~390 KiB) can lag on interaction if the library uses SVG or Canvas instead of WebGL, or if there is no effective caching of data points beyond the visible range. Such a library can still be used for interaction with small datasets, or for generating static plots with no interaction.

In Matlab, I’ve made a primitive plot-caching mechanism that binds chunked file reading to plotting a small range of a long-term signal. When the visible range changes, it reads that range plus some additional margin at the edges, so the file does not need to be re-read on every small range update, and big amounts of data can be loaded dynamically as the range changes.
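The margin-padded chunk reading described above might look like this in Julia (hypothetical helpers, assuming the raw Float32 column files sketched earlier in the thread):

```julia
# When the visible range changes, read a wider window so small pans don't
# trigger a new file read (Base Julia only).
function padded_range(lo::Int, hi::Int, n::Int; margin::Int = (hi - lo) ÷ 2)
    max(1, lo - margin), min(n, hi + margin)
end

# Read only elements clo:chi of a raw Float32 column file.
function read_chunk(path::AbstractString, clo::Int, chi::Int)
    open(path) do io
        seek(io, (clo - 1) * sizeof(Float32))
        read!(io, Vector{Float32}(undef, chi - clo + 1))
    end
end

clo, chi = padded_range(400, 600, 100_000)  # (300, 700)
```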

Large Datasets

FYI. The InspectDR plotting tool was designed to deal with large datasets (ex: 2 GB+):
→ GitHub - ma-laforge/InspectDR.jl: Fast, interactive Julia/GTK+ plots (+Smith charts +Gtk widget +Cairo-only images)

InspectDR uses “F1-acceleration” to draw line “glitches” that display the min/max values of a dataset when they are too close to be drawn on different “plot pixels”.

Caveat: F1-acceleration is only active when you plot with lines. This filter generates confusing/misleading plots if you are plotting with “symbols” instead of lines. However, the behaviour can be overridden if you really want to apply F1-acceleration with symbols.
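The min/max idea behind F1-acceleration can be sketched in plain Julia. This is an illustration of the technique, not InspectDR’s actual implementation:

```julia
# Min/max decimation: each pixel-wide bucket keeps its extrema, so spikes
# survive downsampling (unlike naive "every Nth point" thinning).
function minmax_decimate(y::AbstractVector, nbuckets::Integer)
    out = similar(y, 0)
    for b in Iterators.partition(y, cld(length(y), nbuckets))
        lo, hi = extrema(b)
        append!(out, (lo, hi))
    end
    out
end

y = randn(1_000_000)
length(minmax_decimate(y, 800))  # 1600 points instead of a million
```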

The GUI is built on Gtk, and I must say it has pretty good interactivity. For example, you can use the mouse to perform a box zoom, and you can even add “delta” markers to measure slopes, etc.

Unfortunately, it does not have that cool capability of using one plot as a slider tool for another, though. That is truly a great feature that VegaLite.jl has there!

4 Likes

This topic came up on the vega slack channel recently. My understanding is that they actually plan a design where the JavaScript part can call back into whatever host language one is using to do the data transformations there, in reaction to interactions. I think the prototype they have is based on SQL, but I believe they plan to also add support on the Python side for pandas and ibis. A natural story for Julia, then, would be to use Query.jl in the same way.

In any case, I’m in touch with the vega-lite group, and we’ll probably have a meeting soon to discuss those kinds of things. I still wouldn’t expect anything anytime soon, but folks are thinking about this issue :slight_smile:

1 Like

It sounds like Vega is trying to cover functionality similar to Bokeh plus datashader. But maybe it is sufficiently different in that it uses a grammar-of-graphics approach and focuses on statistical visualization?

By the way, I think writing a Bokeh backend in Julia is another approach to interactive plotting for “big data.” I think Julia is a great fit for implementing something like datashader.

I’m the project lead for Bokeh. I’d just like to reiterate that I think Bokeh and Julia would make a fantastic combination. I don’t have the experience to design or implement Julia APIs myself, but I’d be more than happy to help/collaborate with anyone on the Julia side that wanted to work on Julia bindings for BokehJS (basically: generate the right pile of JSON to drive BokehJS)

5 Likes

Yes, a Bokeh package would be great! I won’t have time to do that, but if someone is picking up that project, please feel free to reach out how to handle things like MIME types, integration with ElectronDisplay.jl and VS Code, saving in different formats etc. I think I figured out good solutions for almost all of these for VegaLite.jl, and I assume many of those tactics could easily be reused for a bokeh package. And I had a lot of false starts on that front, so if I can help someone else avoid making those, I’d love to do so :slight_smile:

3 Likes

Some of the questions, that should be clarified:

  1. How is the data source connection done? Conversion to plain text (such as JSON) is inefficient; one wants to pass AbstractArray or SubArray types directly to the plotting system.
  2. What about filters and other callbacks? Filtering functions are usually very domain-specific and should be programmed separately and bound as callbacks, just like GUI interactions (not just by declaring the standard filtering options in VegaLite).

How is the data source connection done? Conversion to plain text (such as JSON) is inefficient; one wants to pass AbstractArray or SubArray types directly to the plotting system.

The Python Bokeh server has a websocket protocol that can send arrays corresponding to JS typed-array types directly, without any encoding. A small subset of the Bokeh server capability could be implemented in Julia to send data in this efficient manner. If you are generating standalone output (i.e. a static HTML document), then this is obviously not an option, regardless. A base64 encoding can be used in that case, which can be more performant than a JSON encoding, especially for multi-dimensional arrays.
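For the standalone-output case, the base64 route is easy to sketch with Julia’s Base64 stdlib (the JS-side decoding of the string into an `ArrayBuffer` viewed as a `Float32Array` is assumed, not shown):

```julia
# Base64-encode a Float32 column for embedding in JSON/HTML.
using Base64

X = rand(Float32, 1000)
encoded = base64encode(collect(reinterpret(UInt8, X)))

# Round trip back to Float32 to show the encoding is lossless:
decoded = collect(reinterpret(Float32, base64decode(encoded)))
decoded == X  # true: the encoding is lossless
```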

Filters and other callbacks?

Bokeh supports defining custom JS callbacks for any kind of output, in response to any property change or update. The Python Bokeh server allows real Python callbacks to be defined in response to this same set of events. Supporting Julia callbacks would mean implementing more of the Bokeh server in Julia directly, which is not a trivial task, but also not an impossible one.