add a validate=False option for graph_objects and px figures #1812

Open
michaelbabyn opened this issue Oct 10, 2019 · 16 comments
Labels
feature something new P3 backlog

Comments

@michaelbabyn (Contributor)

There's already an issue outlining the effect that graph_objects validation has on plot-generation time. Users can bypass this performance hit by replacing graph_objects with plain dicts and displaying the plot with plotly.offline.iplot(fig, validate=False), or, if they are creating graphs in Dash, they can forgo the plotly.py library altogether and pass a dict as their Graph component's figure argument.

This workaround can greatly improve the performance of Dash apps, but it means that Dash users with expensive graphs have to choose between px/plotly.py's update methods and optimally fast code.

I wonder if a way to turn off validation, especially in Dash apps, would help Dash users get the best of both worlds.
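A minimal sketch of the dict-based workaround described above (the trace values are placeholders):

```python
# Build the figure as a plain dict instead of via plotly.graph_objects,
# so no Python-side validation runs while constructing it.
fig = {
    "data": [{"type": "scattergl", "x": [1, 2, 3], "y": [4, 5, 6]}],
    "layout": {"title": {"text": "no validation overhead"}},
}

# In a notebook: plotly.offline.iplot(fig, validate=False)
# In Dash:       dcc.Graph(figure=fig)  # plain dicts are accepted as-is
```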

cc @matthewchan15

@emmanuelle (Contributor)

To be checked: can we do this and still keep the magical underscore methods?

Also possible: a half-way point where we disable validation of the data arrays only.

Note that import time is also a big part of the lag when developing.

@parksj10

Any update on this? You certainly have my +1. I'm using large datasets with Datashader and validation is taking seconds. I'll likely have to retrofit my code with the dict workaround :(

@nicolaskruchten (Contributor)

@parksj10 can you confirm you're seeing performance issues with plotly 4.7 or higher? We made a number of performance improvements in 4.7, so I just want to make sure :)

@parksj10

@nicolaskruchten I'm running plotly 4.8.1. I've attached a cProfile screenshot below; you can see that half the figure-generation time is spent validating. In case you're interested, I've also attached the cProfile .dat file. Let me know if I can do anything else to help or provide other information. I think it would be rather difficult to create a low-complexity working example from my app, but perhaps @michaelbabyn's examples could be useful in this regard.

[Screenshot, 2020-06-25: cProfile output showing roughly half of figure-generation time spent in validation]

temp.dat.zip
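For reference, a profiling run like the one in the screenshot can be reproduced along these lines. The profiled function here is a stand-in for the app's actual figure construction, not the original code:

```python
import cProfile
import io
import pstats

def build_figure():
    # Stand-in for the app's go.Figure(...) construction; substitute your
    # own figure-building call here to get the real validation breakdown.
    return {"data": [{"x": list(range(100_000)), "y": list(range(100_000))}]}

profiler = cProfile.Profile()
profiler.enable()
build_figure()
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
print(buf.getvalue())  # look for validator frames near the top
```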

@nicolaskruchten (Contributor)

Thanks! This is something we should fix, and we’d appreciate any help :)

@ndrezn (Member) commented Dec 5, 2022

I'm running into this a few years later 🙂. This causes major issues when working with, e.g., choropleth maps with large GeoJSON files, where you end up with giant JSON blobs that certainly do not need to be validated.

I imagine this is a pretty common issue for folks working with charts with many points, and I had no idea this was even a thing until today. It'd be great at least to document this behaviour or raise awareness of it until it's possible to disable validation. Maybe even on https://plotly.com/python/webgl-vs-svg/?

@alexcjohnson (Collaborator)

I like the idea of a three-level approach: full validation (current behavior), top-level validation (don’t dig into data arrays or nested objects like GeoJSON), and no validation.
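A toy sketch of what those three levels could look like. The schema, function name, and `level` values are illustrative only, not an actual plotly.py API:

```python
# Toy schema; real plotly validation is driven by generated validator classes.
KNOWN_TRACE_KEYS = {"type", "x", "y", "z", "geojson", "locations"}

def validate_figure(fig, level="full"):
    """level: 'full' (keys + values), 'top' (keys only), or 'none'."""
    if level == "none":
        return fig  # trust the caller entirely
    for trace in fig.get("data", []):
        unknown = set(trace) - KNOWN_TRACE_KEYS
        if unknown:
            raise ValueError(f"unknown trace properties: {sorted(unknown)}")
        if level == "full":
            # The expensive part: walk every element of the data arrays.
            for value in list(trace.get("x", [])) + list(trace.get("y", [])):
                if not isinstance(value, (int, float, str)):
                    raise ValueError(f"bad data value: {value!r}")
    return fig

fig = {"data": [{"type": "scatter", "x": [1, 2], "y": [3, 4]}]}
validate_figure(fig, level="top")  # cheap: never touches the arrays
```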

@ndrezn (Member) commented Dec 6, 2022

(I also want to note that I'm seeing roughly 1 second of validation time per MB of object. With GeoJSON we often see blobs of 60 MB+, which just destroys your app performance.)

Having the top-level validation option seems perfect!

@nicolaskruchten (Contributor)

So, independently of the validation issue: if the GeoJSONs are static, you should always load them from assets in a Dash app, for caching purposes. Basically, pass in the URL rather than the GeoJSON blob.
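A sketch of that pattern. The file name and values are made up; this works because the `geojson` attribute accepts a URL string, which plotly.js fetches in the browser where it can be cached:

```python
# Reference the static GeoJSON by URL (served from the Dash assets/ folder)
# instead of embedding the parsed blob in the figure itself.
fig = {
    "data": [{
        "type": "choropleth",
        "geojson": "/assets/counties.geojson",  # hypothetical file name
        "locations": ["01001", "01003"],
        "z": [12.5, 7.2],
    }],
    "layout": {"geo": {"fitbounds": "locations"}},
}
```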

@nicolaskruchten (Contributor)

> Having the top-level validation option seems perfect!

Yes, of course, although the last time we tried, we were unable to make it work :)

@ndrezn (Member) commented Dec 6, 2022

@nicolaskruchten -- yes, I'm mostly able to get around this issue by using OperatorTransform from dash-extensions, combined with defining my Dash apps with objects. Serving from assets/ would make it even better, though. Great idea.

My main concern is that none of this is intuitive. It's also not intuitive that you can boost the performance of figures with many points in Dash apps just by switching how they are defined, which is why it'd be great to at least see this behaviour documented.

@ndrezn (Member) commented Dec 9, 2022

(cc @red-patience / @LiamConnors on that last point maybe)

@hannahker

Throwing my support behind this one! Even if it takes some time to add a validate=False param, in the meantime it would be really helpful to have documentation alerting people that validation can be a bottleneck in chart performance and that you can work around it by creating the dict directly.

Both this trick and passing data as a static asset URL have massively improved the performance of my graphs, and I wouldn't have known to do either of these things if I hadn't been pointed to this issue.

cc @red-patience

@bmaranville (Contributor) commented Mar 13, 2023

I think I have a related issue affecting subplots.make_subplots, where the execution time increases non-linearly with the number of plots: a 20x20 grid of plots takes 14 seconds, and a 21x21 grid takes 18 seconds, for example. This is for an empty figure created with make_subplots, e.g.

```python
from plotly.subplots import make_subplots

%time fig = make_subplots(rows=20, cols=20)
```

From profiling, the vast majority of the time is spent in the _ret function of basedatatypes.py, and all of the time in that function is spent in find_closest_string. I think this is because an error message for a missing key is pre-calculated, which is related to validation. From what I can see in the profiling, there would be a >90% speedup if validation could be disabled.

EDIT: I think I will make a new issue for this: see #4100

@nicolaskruchten (Contributor)

Thanks for that profiling! We could probably speed things up by only computing error strings when we know there's an error...
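A sketch of that idea: compute the did-you-mean suggestion only on the failure path, so the fast path does no string work at all. The names here are illustrative, not plotly.py internals:

```python
import difflib

def get_prop(props, key):
    try:
        return props[key]  # fast path: no error-string work at all
    except KeyError:
        # Only now pay for the closest-match suggestion.
        matches = difflib.get_close_matches(key, props.keys(), n=1)
        hint = f" Did you mean {matches[0]!r}?" if matches else ""
        raise KeyError(f"invalid property {key!r}.{hint}") from None

props = {"xaxis": 1, "yaxis": 2}
print(get_prop(props, "xaxis"))  # → 1
```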

@gvwilson gvwilson self-assigned this May 23, 2024
@gvwilson gvwilson removed their assignment Aug 2, 2024
@gvwilson gvwilson added feature something new P3 backlog and removed enhancement labels Aug 12, 2024
@gvwilson gvwilson changed the title A validate=False an option for graph_objects and px Figures? add a validate=False option for graph_objects and px figures Aug 12, 2024
9 participants