Threaded planet computation (and a few more speedups) #3847
Test on small notebook: 30fps (V24.2) --> 45fps. Quite a gain!
Hmm... please run via CLI:
Yes, right, I have just found that again this morning, and it works.
Ideally, Qt should provide this via their CMake config. I'm not sure what I'm doing wrong.
RaspberryOS 64bit (10k+ solar system bodies, Qt6-based builds):
Do the markers enabled by the new checkbox represent all the objects added as you described here? I can't reproduce any serious reduction of performance with 0 extra threads (Core i5-8265U with built-in GPU), and for some reason I only get the green markers but not the white ones closer to the center. What do these white markers represent?
The thin white ones are the overwhelming number of "ordinary" asteroids, to be expected mostly in the main belt between Mars and Jupiter, and the Jupiter Trojans. Pure white is for the few dwarf planets. SDOs and Sednoids are green. Someone might elaborate on the actual differences, and propose better colors. Did you load 15,000 objects or more to put a little stress on the SSO computation? (The logfile will likely show many complaints, but will give the number of loaded objects.) Maybe your CPU can go very fast with 1 core, so the effect would cancel out?
I followed your list of things to add, and now I get this in the log:
So apparently, I've missed something. How can I check what's missing?
My final solar system had the first 10,000, the second 10,000, plus the "interesting" objects, in total about 27,000 objects. Indeed the loading was terribly slow, and during this time the program seemed to be unresponsive, but it was finished an

Much of the overall speed gain in this (renewed) branch was found from studying profiling diagrams. I am new to this field, but was able to track down and remove a few expensive operations. As a general pattern, if you need GETSTELMODULE in a frequently run method (update, draw, ...), make it static. If you often need to retrieve strings and there is a chance they have not changed, cache them if possible. These fixes helped a lot.

Comparing profile runs with 0 and with 4 extra threads, it seems that much of the gain from running extra threads is indeed lost in the synchronizing waits, which only show up when those threads are running. If somebody has an idea what to do here, it would clearly help. Still, I consistently see improved framerate with just a few extra threads.

However, now I think (also after talking to colleagues) that the frequent alloc/free in the many short-lived StelPainter instances may be what currently slows it down most when the object count is large. If we want a fluid SolarSystem with many objects, we should do something about the actual drawing, re-use StelPainter's internal buffers or whatever. Planet::draw() may need to be rearranged. Maybe using a "modern" (float) Z buffer can be an option; this would also fix the orbit overdrawing.
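The "make it static" advice above can be sketched in isolation. This is only an illustration: `Registry`, `Module`, `perFrameLookup()` and `cachedLookup()` are hypothetical stand-ins, not Stellarium's API, and the pattern assumes the looked-up module lives for the whole program run.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a module; not Stellarium's actual class.
struct Module { std::string name; };

// Hypothetical registry whose find() plays the role of an expensive lookup.
class Registry {
public:
    static Registry& instance() { static Registry r; return r; }
    Module* find(const std::string& name) {
        ++lookups; // count how often the lookup actually runs
        auto it = modules.find(name);
        return it == modules.end() ? nullptr : &it->second;
    }
    int lookups = 0;
private:
    Registry() { modules.emplace("SolarSystem", Module{"SolarSystem"}); }
    std::unordered_map<std::string, Module> modules;
};

// Slow pattern: the lookup is repeated on every call (e.g. every frame).
Module* perFrameLookup() {
    return Registry::instance().find("SolarSystem");
}

// Cached pattern: a function-local static is initialised on the first call only.
Module* cachedLookup() {
    static Module* const cached = Registry::instance().find("SolarSystem");
    return cached;
}
```

Calling `perFrameLookup()` a thousand times performs a thousand lookups, while `cachedLookup()` performs one. The same idea applies to cached strings, provided there is a way to invalidate the cache when the underlying value changes.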
OK, I've redone the import of the 10000 objects, now it's saved and loads on subsequent runs too, yielding 16860 Solar System objects. My observations with different number of threads are as follows:
So, something seems to be not quite working as expected. This is on Ubuntu 20.04 amd64, 4-core*HT = 8 hyperthreads Intel Core i5-8265U 1.60GHz, location set to the Solar System Observer, markers enabled, atmosphere and landscape disabled, orbits constantly displayed as in the screenshot in the OP.
Interesting. Now I've turned off everything: atmosphere, landscape, Milky Way, stars, planets, cardinal direction markers, and here's the result:
This looks suspicious. I have thousands of objects that are recomputed on every frame, and still I don't get even 300% CPU usage on an 8-hyperthread CPU. Something seems not to be parallelized correctly. In particular, if all the objects are in the Solar System, I guess there should be some way to parallelize this much better. I'll try looking later into what's going on and maybe how to improve this.
- also clean up docs
- also improve setting thread number
- also reactivate parallel sort
Use config.ini:[devel]/compute_positions_algorithm (default: 2):
- 0 = old single-threaded 3-loop solution
- 1 = first QtConcurrent::blockingMap() 3-loop solution
- 2 = strided manual multithreading 3-loop solution
- 3 = blocked manual multithreading 1-loop solution

Solution 3 would be fastest, but has problems for moons, whose position depends on the parent planet's position, which may not have been computed yet.
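For readers unfamiliar with the terms, the difference between "strided" and "blocked" work division can be sketched as follows. This is a minimal illustration using `std::thread`, not the code in this branch: `Body`, `computePosition()`, `runOnWorkers()`, `computeStrided()` and `computeBlocked()` are hypothetical names, and the placeholder computation has no parent/moon dependency.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a Solar System object.
struct Body { double position = 0.0; };

// Placeholder for the real per-object position computation.
void computePosition(Body& body, double time) { body.position = 2.0 * time; }

// Run work(k) for k = 0..workers-1; worker 0 runs on the calling thread,
// the others on extra threads that are joined before returning.
template <typename Work>
void runOnWorkers(std::size_t workers, Work work) {
    assert(workers >= 1);
    std::vector<std::thread> extra;
    for (std::size_t k = 1; k < workers; ++k) extra.emplace_back(work, k);
    work(0);
    for (std::thread& t : extra) t.join();
}

// Strided: worker k handles indices k, k + workers, k + 2 * workers, ...
void computeStrided(std::vector<Body>& bodies, double time, std::size_t extraThreads) {
    const std::size_t workers = extraThreads + 1;
    runOnWorkers(workers, [&bodies, time, workers](std::size_t k) {
        for (std::size_t i = k; i < bodies.size(); i += workers)
            computePosition(bodies[i], time);
    });
}

// Blocked: worker k handles one contiguous chunk of indices.
void computeBlocked(std::vector<Body>& bodies, double time, std::size_t extraThreads) {
    const std::size_t workers = extraThreads + 1;
    const std::size_t chunk = (bodies.size() + workers - 1) / workers; // rounded up
    runOnWorkers(workers, [&bodies, time, chunk](std::size_t k) {
        const std::size_t begin = std::min(k * chunk, bodies.size());
        const std::size_t end = std::min(begin + chunk, bodies.size());
        for (std::size_t i = begin; i < end; ++i)
            computePosition(bodies[i], time);
    });
}
```

With `extraThreads = 0` both functions run entirely on the calling thread. In either scheme an object whose position depends on another object's result needs that result to be ready first, which is the ordering problem mentioned above for moons.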
ffe6252 to 9c8a757
No idea why, but this rebase -i was difficult. Commit "fix crash" should have gone just after @10110111's last commit, but it wouldn't go. We can leave it as-is as documentation that there was a problem.
Great work!
Maybe this shouldn't go into 24.3 so late with this potential for breakages.
I fear this patch should be postponed to version 24.4
I had not remembered that Ruslan had integrated the final call into his loop. Fixed that in the conservative paths. Now I can run RemoteSync'ed V24.2 and this. 135 vs 73 fps. As far as I can see, results with setting "2" are identical.
Hello @gzotti! Please check the fresh version (development snapshot) of Stellarium:
Hello @gzotti! Please check the latest stable version of Stellarium:
Description
This is a revamped version of #3794, which was too experimental and is hereby superseded. (The attempt to use std::transform_reduce() proved useless.)
So far, all SolarSystem object (SSO) computations were run on the main thread. This branch allows splitting the SSO position loop across several threads. Likewise, the far-to-near sorting can be split. This is useful when there are thousands of objects. Given the complexity of the drawing commands, there is not much use for more than 4 additional worker threads, even for 25,000 objects. The additional threads can be configured or even suppressed in the GUI.
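One way a far-to-near sort can be split is to sort two halves concurrently and then merge them. The sketch below shows that general technique only; it is an assumption about the approach, not the implementation in this branch, and `Body`, `fartherFirst()` and `sortFarToNear()` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in: only the distance from the observer matters here.
struct Body { double distance; };

// Ordering for far-to-near drawing: larger distance comes first.
inline bool fartherFirst(const Body& a, const Body& b) {
    return a.distance > b.distance;
}

// Sort far-to-near. With useExtraThread, each half is sorted on its own
// thread and the two sorted halves are merged afterwards.
void sortFarToNear(std::vector<Body>& bodies, bool useExtraThread) {
    if (!useExtraThread) {
        std::sort(bodies.begin(), bodies.end(), fartherFirst);
        return;
    }
    const auto mid = bodies.begin() + bodies.size() / 2;
    std::thread helper([&bodies, mid] {
        std::sort(bodies.begin(), mid, fartherFirst); // first half on the helper thread
    });
    std::sort(mid, bodies.end(), fartherFirst);       // second half on the calling thread
    helper.join();
    std::inplace_merge(bodies.begin(), mid, bodies.end(), fartherFirst);
}
```

The two halves do not overlap, so the concurrent sorts need no locking; the merge runs only after the helper thread has been joined.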
Further, using the Intel VTune profiler, I have identified a series of sometimes surprising inefficiencies addressed in the later commits of this PR.
With 27,701 SSOs and some default settings (Preetham atmosphere, a few gridlines, stars, landscape), this improves the framerate from about 6 fps (V24.2) to about 18 fps.
A new feature made possible by this is the display of symbolic markers for minor bodies. These are plotted regardless of the accuracy of the orbital elements, and just provide an impression of the distribution of the minor bodies. Color coding is applied per pType. The colors are currently hardcoded. However, drawing that many objects is slow, despite adding code similar to StarMgr for mass drawing. I hope another developer can optimize StelPainter so it can be re-used instead of frequently allocating and deleting buffers.
Fixes #1427 (not intended, just happened...)
Screenshots (if appropriate):
Type of change
How Has This Been Tested?
Load many SSOs with the SSEditor: at least the first 10,000 asteroids and the 1000comets.ini file. Then find the new options in the View/SSO and Config/Tools tabs.
Test Configuration:
Will test on smaller notebook, older Win10 PC and RasPi4.
Checklist: