Directly support HoloViews-style inspect operations #1126
After some discussion, I think we agreed that a The only other point that I think is important is that you need to know the count because that can give you the context to know whether the 'first' or 'last' index is unique or a random sample that contributed to the pixel value (ignoring the possibility that it is meaningful due to sorting e.g. by time)
@philippjfr suggests implementing a I believe this syntax could support an
Some of this can already be done in datashader, e.g.
import datashader as ds
import pandas as pd
df = pd.DataFrame(dict(x=[0, 1, 0, 1, 0], y=[0, 0, 1, 1, 0], myindex=[4, 5, 6, 7, 8]))
canvas = ds.Canvas(3, 3)
agg = canvas.line(
    df, "x", "y",
    agg=ds.summary(count=ds.count(), first=ds.first("myindex"), last=ds.last("myindex")),
)
which produces
and you can read individual variables using Longer term ideas like Eventually that could lead to I am hoping that the example above is sufficient to start implementing support for this in holoviews. That should give me time to work on a refactor of the canvas/reduction code in datashader to make adding the new reductions much easier.
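To make the consumer side concrete, here is a minimal sketch of how a hover handler could turn per-pixel count/first/last values into rows without scanning the dataset. It is plain Python, not datashader or holoviews code: the grids are written by hand for the five rows above binned as points onto a 2x2 grid (not the output of the line example), and `rows_by_index` and `inspect_pixel` are hypothetical names.

```python
# Rows from the example above, keyed by their "myindex" value (hypothetical helper table).
rows_by_index = {
    4: {"x": 0, "y": 0},
    5: {"x": 1, "y": 0},
    6: {"x": 0, "y": 1},
    7: {"x": 1, "y": 1},
    8: {"x": 0, "y": 0},
}

# Hand-written per-pixel aggregates, indexed as grid[y][x].
# Assumption: the rows were binned as points, so pixel (0, 0) holds rows 4 and 8.
count = [[2, 1], [1, 1]]
first = [[4, 5], [6, 7]]
last = [[8, 5], [6, 7]]


def inspect_pixel(px, py):
    """Return the count and the stored sample rows for one pixel."""
    indexes = []
    for idx in (first[py][px], last[py][px]):
        # first and last coincide when only one row hit the pixel
        if idx is not None and idx not in indexes:
            indexes.append(idx)
    return {
        "count": count[py][px],
        "rows": [dict(rows_by_index[i], myindex=i) for i in indexes],
    }


hover = inspect_pixel(0, 0)
```

Each lookup is a couple of dictionary reads, so its cost does not depend on the size of the dataset, only on the number of indexes stored per pixel.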
Possible API for where(selector: Reduction, lookup: str | None = None) (although I have just made up the names If the user specifies a string name for lookup then it is the name of the column that must already be in the DataFrame and are the values returned to the user based on the selector. If lookup is
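A rough sketch of the semantics described above, for the case where lookup names a column and the selector is a max over another column. This is plain Python for illustration only: `where_max`, `pixel_of`, and the sample columns `speed` and `name` are made up here and are not part of datashader.

```python
def where_max(rows, pixel_of, selector_column, lookup):
    """Per pixel, return the `lookup` value of the row with the largest `selector_column`."""
    best = {}  # pixel -> (selector value, lookup value) of the winning row so far
    for row in rows:
        pixel = pixel_of(row)
        value = row[selector_column]
        # Strict comparison: on ties the earliest row keeps the pixel.
        if pixel not in best or value > best[pixel][0]:
            best[pixel] = (value, row[lookup])
    return {pixel: looked_up for pixel, (_, looked_up) in best.items()}


rows = [
    {"x": 0, "y": 0, "speed": 3.0, "name": "a"},
    {"x": 0, "y": 0, "speed": 9.0, "name": "b"},
    {"x": 1, "y": 0, "speed": 5.0, "name": "c"},
]

# Each row's pixel is taken to be its (x, y) pair in this toy setup.
result = where_max(rows, lambda r: (r["x"], r["y"]), "speed", "name")
```

Here pixel (0, 0) reports "b" because that row has the largest speed in the pixel; the selector decides which row wins and the lookup column decides what is returned for it.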
@hoxbro , @jlstevens , @ianthomas23, @mattpap , thanks for your recent work making this closer to reality! Can you please chime in here with the remaining tasks involved? What I am aware of:
I think that is a good summary of what is needed. For the Bokeh hover tool, my understanding was that the necessary changes would be fairly straightforward to implement but that some API changes/additions are also needed. @mattpap can correct me if I am wrong!
Datashader: what you have at the moment is support for
In HoloViews I don't think there is built-in support for calling Bokeh's categorical colormapping or Datashader's
If this is what we discussed last week, then it requires some changes to make referencing custom formatter more robust (and hopefully deprecate
Ok, please open the appropriate issues and then link back here! Thanks.
I actually found a way to work around limitations related to referencing custom formatters. Consider this example (based on bokeh's
from bokeh.models import CustomJSHover, HoverTool
from bokeh.plotting import figure, show
# range bounds supplied in web mercator coordinates
p = figure(
    x_range=(-2000000, 6000000), y_range=(-1000000, 7000000),
    x_axis_type="mercator", y_axis_type="mercator",
)
p.add_tile("CartoDB Positron")
p.circle(x=[0, 2000000, 4000000], y=[4000000, 2000000, 0], size=30)
formatter = CustomJSHover(code="""
    const projections = Bokeh.require("core/util/projections")
    const {x, y} = special_vars
    const coords = projections.wgs84_mercator.invert(x, y)
    const dim = format == "x" ? 0 : 1
    return coords[dim].toFixed(2)
""")
p.add_tools(HoverTool(
    tooltips=[
        ("lon", "$x{x}"),
        ("lat", "$y{y}"),
    ],
    formatters={
        "$x": formatter,
        "$y": formatter,
    },
))
show(p)
Given that the contents of
Thank you @mattpap. Got it to work with your example. I assume you would still want to make custom formatters more robust?
@jbednar I'd say we close this. We have other issues to actually leverage the new aggregates for inspection purposes in the other repos and afaik
When Datashader renders a large dataset, a human being is usually able to see patterns and interesting datapoints that merit further investigation. Unfortunately, the rendered image does not provide any easy means of doing so, as the original datapoints have all been reduced to pixels (or more accurately, to scalar accumulated values in bins of a 2D histogram). To support investigation of interesting features, HoloViews implements a series of "inspect" operations that query the original dataset after a selection or hover event on the rasterized data. E.g. inspect_points in https://examples.pyviz.org/ship_traffic will query the original dataset to show hover and other information about the original datapoints being visualized. However, going back to the original dataset is quite slow, because it requires traversing either the entire dataset or (for a spatially indexed data structure) at least a chunk of the dataset, which makes the interface unpleasant and awkward and rules out certain types of interactivity.
Datashader can collect multiple aggregations in a single pass through the data, so I suggest that we support an accumulation mode that gathers datapoint indexes rather than datapoints, so that hover and drilldown information can be supported instantaneously. Of course, arbitrarily many datapoints can be aggregated into a single pixel, while any practical aggregation can only accumulate a fixed number of indexes per pixel. Still, that's already how the inspect_ operations work; they discard all but a configurable number of results, which is fine for linking to one or two examples per pixel, and allows single-datapoint precision with enough zooming in. By default I'd suggest accumulating the index of the minimum and the maximum value per pixel, but even just keeping the first or last datapoint for that pixel would be useful.
If we keep at least three datapoints per pixel (e.g. min, max, and one other) we'd be able to distinguish between complete and incomplete inspection data for that pixel (i.e. are these the only points? Yes, if there are 2 or fewer; unclear otherwise). It seems to me that we should be able to have a fully responsive, fully inspectable rendering of a dataset at low computational and memory cost using this method.
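As a rough illustration of the accumulation mode proposed here, the following plain-Python sketch gathers, in one pass, a count plus the index of the minimum and maximum value for every pixel. It is not datashader code: `accumulate` and `is_complete` are hypothetical names, the points are assumed to be already mapped to integer pixel coordinates, and a per-pixel count stands in for the "one other" stored datapoint as the way to tell complete from incomplete data.

```python
def accumulate(points, width, height):
    """Single pass over (px, py, value) tuples; returns count, min-index and max-index grids."""
    points = list(points)
    count = [[0] * width for _ in range(height)]
    min_idx = [[None] * width for _ in range(height)]
    max_idx = [[None] * width for _ in range(height)]
    for idx, (px, py, value) in enumerate(points):
        count[py][px] += 1
        lo = min_idx[py][px]
        if lo is None or value < points[lo][2]:
            min_idx[py][px] = idx  # new smallest value seen in this pixel
        hi = max_idx[py][px]
        if hi is None or value > points[hi][2]:
            max_idx[py][px] = idx  # new largest value seen in this pixel
    return count, min_idx, max_idx


def is_complete(count, min_idx, max_idx, px, py):
    """True when the stored indexes account for every point that fell in the pixel."""
    stored = {min_idx[py][px], max_idx[py][px]} - {None}
    return len(stored) == count[py][px]


# Toy data: three points land in pixel (0, 0), one in pixel (1, 1).
points = [(0, 0, 5.0), (0, 0, 1.0), (0, 0, 9.0), (1, 1, 2.0)]
count, min_idx, max_idx = accumulate(points, width=2, height=2)
```

Memory use is fixed at three small grids regardless of dataset size, and comparing the number of distinct stored indexes with the count also covers the case where two points in a pixel share the same value.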