Directly support HoloViews-style inspect operations #1126

jbednar · 2022-09-26T16:58:59Z

When Datashader renders a large dataset, a human being is usually able to see patterns and interesting datapoints that merit further investigation. Unfortunately, the rendered image does not provide any easy means of doing so, as the original datapoints have all been reduced to pixels (or more accurately, to scalar accumulated values in bins of a 2d histogram). To support investigation of interesting features, HoloViews implements a series of "inspect" operations that query the original dataset after a selection or hover event on the rasterized data. For example, inspect_points in https://examples.pyviz.org/ship_traffic will query the original dataset to show hover and other information about the original datapoints being visualized. However, going back to the original dataset is quite slow, because it requires traversing either the entire dataset or (for a spatially indexed data structure) at least a chunk of the dataset. That makes the interface unpleasant and awkward, and rules out certain types of interactivity.

Datashader can collect multiple aggregations on a single pass through the data, so I suggest that we support an accumulation mode that gathers datapoint indexes rather than datapoints, so that hover and drilldown information can be supported instantaneously. Of course, arbitrarily many datapoints can be aggregated into a single pixel, while any practical aggregation can only accumulate a fixed number of indexes per pixel. Still, that's already how the inspect_ operations work; they discard all but a configurable number of results, which is fine for linking to one or two examples per pixel, and allows single-datapoint precision with enough zooming in. By default I'd suggest accumulating the index of the minimum and the maximum value per pixel, but even just keeping the first or last datapoint for that pixel would be useful.

If we keep at least three datapoints per pixel (e.g. min, max, and one other) we'd be able to distinguish between complete and incomplete inspection data for that pixel (i.e. are these the only points? Yes, if there are 2 or fewer; unclear otherwise). Seems to me that we should be able to have a fully responsive, fully inspectable rendering of a dataset at low computational and memory cost using this method.
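As a rough illustration of the proposal (a pure-Python sketch, not Datashader code), the single pass could track, for each pixel, a count plus the row index of the minimum and maximum value seen so far. The function name aggregate_indexes, the -1 sentinel for empty pixels, and the sample data are all made up for this example.

```python
# Illustrative sketch only: a pure-Python stand-in for a per-pixel index aggregation.
def aggregate_indexes(xs, ys, values, nx, ny, x_range, y_range):
    """Bin points into an ny-by-nx grid in one pass, tracking the count and
    the row index of the minimum and maximum value seen in each pixel."""
    x0, x1 = x_range
    y0, y1 = y_range
    count = [[0] * nx for _ in range(ny)]
    min_idx = [[-1] * nx for _ in range(ny)]  # -1 means "no datapoint here"
    max_idx = [[-1] * nx for _ in range(ny)]
    for i, (x, y, v) in enumerate(zip(xs, ys, values)):
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the canvas
        # Map the coordinate to a bin; clamp so the upper edge lands in the last bin.
        col = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        row = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        count[row][col] += 1
        if min_idx[row][col] < 0 or v < values[min_idx[row][col]]:
            min_idx[row][col] = i
        if max_idx[row][col] < 0 or v > values[max_idx[row][col]]:
            max_idx[row][col] = i
    return count, min_idx, max_idx


# Made-up sample data: three points share the lower-left pixel.
xs = [0.1, 0.2, 0.9, 0.15]
ys = [0.1, 0.2, 0.9, 0.05]
values = [5.0, 1.0, 3.0, 9.0]
count, min_idx, max_idx = aggregate_indexes(
    xs, ys, values, nx=2, ny=2, x_range=(0.0, 1.0), y_range=(0.0, 1.0)
)
```

In this sketch a hover over the lower-left pixel can immediately report rows 1 and 3 (the minimum and maximum) without revisiting the dataset, and the count of 3 signals that at least one other datapoint in that pixel is not shown.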


jlstevens · 2022-09-28T13:01:46Z

After some discussion, I think we agreed that a first and last index (per bin) aggregate makes sense and that a where aggregator (e.g. 'show the ship with the highest tonnage value that contributed to this pixel') would be nice too.

The only other point that I think is important is that you need to know the count, because that gives you the context to know whether the 'first' or 'last' index is unique or just an arbitrary sample of the datapoints that contributed to the pixel value (ignoring the possibility that it is meaningful due to sorting, e.g. by time).

jbednar · 2022-09-28T13:33:19Z

@philippjfr suggests implementing a where aggregator that returns some column's value given an aggregator that's applied to some other column, e.g. where(max('value'), 'index'). That way a user can define which samples are kept.

I believe this syntax could support an n argument, retaining the top n values (e.g. the n largest value datapoints encountered). It will be important to clearly indicate in the documentation the conditions under which it's just n arbitrary samples compared to the top n along a well-defined measure. The default for plotting purposes would probably need to be arbitrary since counts are plotted by default and counts don't establish any ordering between datapoints. In that case a single datapoint is probably the most reasonable default (one exemplar per pixel), i.e. a default like where(..., 'index', n=1).

ianthomas23 · 2022-10-18T10:04:02Z

Some of this can already be done in datashader, e.g.

import datashader as ds
import pandas as pd

df = pd.DataFrame(dict(x=[0, 1, 0, 1, 0], y=[0, 0, 1, 1, 0], myindex=[4, 5, 6, 7, 8]))
canvas = ds.Canvas(3, 3)
agg = canvas.line(
    df, "x", "y",
    agg=ds.summary(count=ds.count(), first=ds.first("myindex"), last=ds.last("myindex")),
)

which produces

<xarray.Dataset>
Dimensions:  (x: 3, y: 3)
Coordinates:
  * x        (x) float64 0.1667 0.5 0.8333
  * y        (y) float64 0.1667 0.5 0.8333
Data variables:
    count    (y, x) uint32 2 1 1 0 2 0 1 1 1
    first    (y, x) float64 4.0 4.0 4.0 nan 5.0 nan 5.0 6.0 6.0
    last     (y, x) float64 7.0 4.0 4.0 nan 7.0 nan 5.0 6.0 6.0

and you can read individual variables using agg['first'] or similar. Note that I have manually added the myindex column to the DataFrame, and ds.first and ds.last always return floats.
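To show how a consumer such as HoloViews might interpret these three variables together, here is a small pure-Python sketch. The values are copied from the Dataset printed above in row-major (y, x) order; the helper name inspect_pixel and the status labels are hypothetical, and the "complete" rule assumes each count corresponds to one contributing row, which is clearest for point data.

```python
import math

# Values copied from the Dataset printed above, flattened in (y, x) order.
count = [2, 1, 1, 0, 2, 0, 1, 1, 1]
first = [4.0, 4.0, 4.0, math.nan, 5.0, math.nan, 5.0, 6.0, 6.0]
last = [7.0, 4.0, 4.0, math.nan, 7.0, math.nan, 5.0, 6.0, 6.0]


def inspect_pixel(n, first_value, last_value):
    """Hypothetical helper: return (status, ids) for one pixel.

    status is "empty", "complete" (first/last cover every contributor)
    or "partial" (other contributors exist that were not recorded).
    """
    if n == 0 or math.isnan(first_value):
        return "empty", []
    # ds.first and ds.last return floats, so cast back to integer ids.
    ids = sorted({int(first_value), int(last_value)})
    status = "complete" if n <= 2 else "partial"
    return status, ids


results = [inspect_pixel(n, f, l) for n, f, l in zip(count, first, last)]
```

The float-to-int cast is only safe while the index values are small enough to be represented exactly as floats.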

Longer term ideas like where(max('value'), 'myindex') require some infrastructure changes because that needs two reductions to interact on a per-pixel basis which currently is not supported; all current reductions are independent.

Eventually that could lead to where(max_n('value', n=3), 'myindex'). We would first need max_n as a standalone reduction that needs to write to a 3D array of shape (ny, nx, n); this also needs infrastructure changes.
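A minimal pure-Python sketch of that idea follows, assuming nested lists as a stand-in for the 3D array of shape (ny, nx, n). It is not the Datashader implementation: the function name max_n_where, the NaN and -1 padding for unused slots, and the sample data are all assumptions made for illustration. It keeps the n largest values per pixel in descending order and updates a parallel array of lookup values in lockstep, which is the per-pixel interaction between two reductions described above.

```python
import math


def max_n_where(pixels, values, lookup, ny, nx, n):
    """Keep the n largest values per pixel, plus the matching lookup values.

    pixels: one (row, col) pair per datapoint, already binned.
    Returns two nested lists of shape (ny, nx, n), sorted by descending value.
    """
    top_vals = [[[math.nan] * n for _ in range(nx)] for _ in range(ny)]
    top_ids = [[[-1] * n for _ in range(nx)] for _ in range(ny)]
    for (row, col), v, ident in zip(pixels, values, lookup):
        vals, ids = top_vals[row][col], top_ids[row][col]
        for k in range(n):
            if math.isnan(vals[k]) or v > vals[k]:
                # Insert at slot k and drop the last slot so the length stays n.
                vals.insert(k, v)
                vals.pop()
                ids.insert(k, ident)
                ids.pop()
                break
    return top_vals, top_ids


# Made-up data: four datapoints in pixel (0, 0), one in pixel (1, 1).
pixels = [(0, 0), (0, 0), (0, 0), (0, 0), (1, 1)]
values = [2.0, 7.0, 5.0, 9.0, 1.0]
myindex = [4, 5, 6, 7, 8]
top_vals, top_ids = max_n_where(pixels, values, myindex, ny=2, nx=2, n=3)
```

Pixels with fewer than n contributors keep their padding in the trailing slots, so a consumer can tell how many of the n results are real.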

I am hoping that the example above is sufficient to start implementing support for this in holoviews. That should give me time to work on a refactor of the canvas/reduction code in datashader to make adding the new reductions much easier.

ianthomas23 · 2022-10-25T10:18:16Z

Possible API for where reduction:

where(selector: Reduction, lookup: str | None = None)

(although I have just made up the names selector and lookup and they can easily change).

If the user specifies a string name for lookup, it must be the name of a column that already exists in the DataFrame; the values of that column are what is returned to the user, chosen according to the selector. If lookup is None then Datashader uses the index of the row in the DataFrame instead.
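The intended semantics for a single pixel can be sketched in pure Python as below. This is only an illustration of the proposed behaviour, not Datashader's implementation: the helper where_for_pixel, its parameters, and the ship records are made up, echoing the "ship with the highest tonnage" example earlier in the thread.

```python
def where_for_pixel(rows, selector_column, pick=max, lookup=None):
    """Hypothetical helper mimicking where(selector, lookup) for one pixel.

    rows: list of (row_index, record) pairs that fell into the pixel.
    pick: max or min, standing in for the selector reduction.
    Returns the lookup column's value for the selected row, or the row
    index itself when lookup is None. Empty pixels return None.
    """
    if not rows:
        return None
    row_index, record = pick(rows, key=lambda item: item[1][selector_column])
    return row_index if lookup is None else record[lookup]


# Made-up records for the datapoints that contributed to one pixel.
rows = [
    (0, {"tonnage": 120, "name": "A"}),
    (1, {"tonnage": 900, "name": "B"}),
    (2, {"tonnage": 300, "name": "C"}),
]
heaviest_row = where_for_pixel(rows, "tonnage")                  # row index
heaviest_name = where_for_pixel(rows, "tonnage", lookup="name")  # column value
lightest_row = where_for_pixel(rows, "tonnage", pick=min)
```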

jbednar · 2023-04-24T13:51:38Z

@hoxbro, @jlstevens, @ianthomas23, @mattpap, thanks for your recent work making this closer to reality! Can you please chime in here with the remaining tasks involved? What I am aware of:

Ian: first/last on Datashader using Dask (datashader#1182)
Mateusz: Small issue with Bokeh hover? (Need a new issue.)
Jean-Luc: custom hovertool support in HoloViews (Need an issue)
All: Work out good defaults for HoloViews that give hover information approximating what the pre-datashaded Bokeh plot would include.

jlstevens · 2023-04-24T13:54:58Z

I think that is a good summary of what is needed.

For the Bokeh hover tool, my understanding was that the necessary changes would be fairly straightforward to implement but that some API changes/additions are also needed. @mattpap can correct me if I am wrong!

ianthomas23 · 2023-04-24T14:31:53Z

Datashader: what you have at the moment is support for max, max_n, min and min_n reductions on CPU, GPU and dask, on their own and within a where reduction. Needed are:

first and last need dask and GPU support (this is the issue you were looking for: Dask and CUDA support for first and last reductions #1182).
first_n and last_n need dask and GPU support.
The above 4 reductions need to be supported within a where reduction.

In HoloViews I don't think there is built-in support for calling Bokeh's categorical colormapping or Datashader's where reduction yet, but this probably needs @hoxbro to confirm?

mattpap · 2023-04-24T15:06:49Z

> For the Bokeh hover tool, my understanding was that the necessary changes would be fairly straightforward to implement but that some API changes/additions are also needed. @mattpap can correct me if I am wrong!

If this is what we discussed last week, then it requires some changes to make referencing custom formatter more robust (and hopefully deprecate HoverTool.formatters).

jbednar · 2023-04-25T00:29:17Z

Ok, please open the appropriate issues and then link back here! Thanks.

mattpap · 2023-04-25T10:25:42Z

I actually found a way to work around limitations related to referencing custom formatters. Consider this example (based on bokeh's examples/plotting/customjs_hover.py):

from bokeh.models import CustomJSHover, HoverTool
from bokeh.plotting import figure, show

# range bounds supplied in web mercator coordinates
p = figure(
    x_range=(-2000000, 6000000), y_range=(-1000000, 7000000),
    x_axis_type="mercator", y_axis_type="mercator",
)
p.add_tile("CartoDB Positron")

p.circle(x=[0, 2000000, 4000000], y=[4000000, 2000000, 0], size=30)

formatter = CustomJSHover(code="""
    const projections = Bokeh.require("core/util/projections")
    const {x, y} = special_vars
    const coords = projections.wgs84_mercator.invert(x, y)
    const dim = format == "x" ? 0 : 1
    return coords[dim].toFixed(2)
""")

p.add_tools(HoverTool(
    tooltips=[
        ("lon", "$x{x}"),
        ("lat", "$y{y}"),
    ],
    formatters={
        "$x": formatter,
        "$y": formatter,
    },
))

show(p)

The contents of {} can be anything except empty, and custom has no intrinsic meaning (in fact it's not referenced in the implementation at all), so you can use the format name to enumerate possible implementations of a custom formatter. This translates nicely to the example @hoxbro sent me. Note that I would consider this a bit of an abuse of the API.

hoxbro · 2023-04-25T13:05:11Z

Thank you @mattpap. Got it to work with your example.

I assume you would still want to make custom formatters more robust?

philippjfr · 2024-09-26T16:42:20Z

@jbednar I'd say we close this. We have other issues to actually leverage the new aggregates for inspection purposes in the other repos and afaik where along with <agg>_n covers everything we need out of datashader.

jbednar added this to the wishlist milestone Sep 26, 2022

jbednar assigned ianthomas23 Sep 26, 2022

ianthomas23 modified the milestones: wishlist, v0.14.3 Oct 24, 2022

ianthomas23 modified the milestones: v0.14.3, v0.14.x Nov 17, 2022

ianthomas23 mentioned this issue Dec 16, 2022

Add new where reduction #1155

Merged

ianthomas23 modified the milestones: v0.14.x, v0.14.4 Jan 19, 2023

jbednar mentioned this issue Jan 20, 2023

index based selection fails on rasterized plots holoviz/holoviews#5596

Open

ianthomas23 mentioned this issue Feb 16, 2023

first_n, last_n, max_n and min_n reductions #1184

Merged

This was referenced May 2, 2023

Dask and CUDA support for first_n and last_n reductions #1207

Closed

Categorical support for where and <whatever>_n reductions #1210

Closed

ianthomas23 modified the milestones: v0.14.4, v0.14.5 May 16, 2023

ianthomas23 modified the milestones: v0.14.5, v0.15.x Jun 6, 2023

ianthomas23 removed their assignment Oct 24, 2023

jbednar added this to Datashader Inspections Sep 11, 2024

ambrannen moved this to Todo in Datashader Inspections Oct 29, 2024
