Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add loop line merger #1083

Open
wants to merge 45 commits into
base: main
Choose a base branch
from
Open

Add loop line merger #1083

wants to merge 45 commits into from

Conversation

wipfli
Copy link
Contributor

@wipfli wipfli commented Nov 2, 2024

This pull request adds a LoopLineMerger class which is a replacement for JTS LineMerger . The new class takes loops in the LineString graph into account. Loops that are shorter than a minLoopLength will be cut open and only the shortest edge will be kept.
Here is a bit of motivation: https://oliverwipfli.ch/improving-linestring-merging-in-planetiler-2024-10-30/
Questions from my side are:

  • How should we expose the new parameter minLoopLength? At the moment I just set it to lengthLimit from FeatureMerge.mergeLineStrings. Should we maybe add an optional loopLengthLimitCalculator?
  • LoopLineMerger is slower than JTS LineMerger. I guess it is because they are doing less (no loops taken into account) but also they are doing things more efficiently. Should I investigate in subclassing JTS LineMergeGraph instead of building my own data structure plus helper functions for the graph?

Copy link

github-actions bot commented Nov 2, 2024

This Branch 0c2f3c7 Base b59c63e
0:01:06 DEB [archive] - Tile stats:
0:01:06 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (158k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:86k)
2. 9/154/190 (144k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
5. 14/4941/6092 (114k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:66k)
6. 14/4941/6093 (113k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (100k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (99k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (97k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (95k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:06 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  151   336   412   548   872   366   487   619   899  1.6k    2k  7.2k  6.4k  5.8k  4.5k  7.2k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   441   441   441   640   714    1k  1.6k  3.1k  5.7k  3.3k  1.7k   803   948  5.7k
            landuse    0     0     0     0   549   695  1.6k  6.8k   17k   44k   59k   50k   38k   19k   12k   59k
     transportation    0     0     0     0   311   781  1.2k    4k  5.7k   17k   13k   17k   62k   47k   33k   62k
           waterway    0     0     0     0   112   119     0     0     0  3.1k  2.3k    2k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.2k  4.2k  9.7k   18k   13k  8.2k  3.7k  3.4k  4.4k   18k
transportation_name    0     0     0     0     0     0   287   364  1.1k  1.8k  5.5k  4.7k  3.9k  3.6k   18k   18k
          landcover    0     0     0     0     0     0     0  9.6k   29k   85k   72k   81k   53k   30k   25k   85k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   666   328   273   221   221   666
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.8k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   506   503   86k   86k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.5k  3.7k    6k   20k   40k   82k  195k  182k  135k  113k  128k  249k  249k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k  4.8k   14k   28k   59k  144k  136k   99k   83k   92k  158k  158k
0:01:06 DEB [archive] -    Max tile: 249k (gzipped: 158k)
0:01:06 DEB [archive] -    Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:06 DEB [archive] -     # tiles: 4,115,029
0:01:06 DEB [archive] -  # features: 5,517,101
0:01:06 INF [archive] - Finished in 20s cpu:1m12s avg:3.6
0:01:06 INF [archive] -   read    1x(3% 0.5s wait:18s done:1s)
0:01:06 INF [archive] -   encode  4x(55% 11s wait:2s done:1s)
0:01:06 INF [archive] -   write   1x(22% 4s wait:13s)
0:01:06 INF [archive] - Finished in 1m6s cpu:3m35s gc:1s avg:3.3
0:01:06 INF [archive] - FINISHED!
0:01:06 INF [archive] - 
0:01:06 INF [archive] - ----------------------------------------
0:01:06 INF [archive] - data errors:
0:01:06 INF [archive] - 	render_snap_fix_input	16,669
0:01:06 INF [archive] - 	osm_multipolygon_missing_way	360
0:01:06 INF [archive] - 	osm_boundary_missing_way	73
0:01:06 INF [archive] - 	merge_snap_fix_input	12
0:01:06 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:06 INF [archive] - 	render_snap_fix_input2	1
0:01:06 INF [archive] - 	omt_fix_water_before_ne_intersect	1
0:01:06 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	1
0:01:06 INF [archive] - 	feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix	1
0:01:06 INF [archive] - ----------------------------------------
0:01:06 INF [archive] - 	overall          1m6s cpu:3m35s gc:1s avg:3.3
0:01:06 INF [archive] - 	lake_centerlines 3s cpu:6s avg:2
0:01:06 INF [archive] - 	  read     1x(16% 0.5s done:2s)
0:01:06 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:06 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:06 INF [archive] - 	water_polygons   15s cpu:42s avg:2.8
0:01:06 INF [archive] - 	  read     1x(41% 6s done:7s)
0:01:06 INF [archive] - 	  process  4x(27% 4s wait:4s done:5s)
0:01:06 INF [archive] - 	  write    1x(4% 0.5s wait:10s done:5s)
0:01:06 INF [archive] - 	natural_earth    6s cpu:12s avg:2
0:01:06 INF [archive] - 	  read     1x(95% 6s)
0:01:06 INF [archive] - 	  process  4x(13% 0.8s wait:6s)
0:01:06 INF [archive] - 	  write    1x(0% 0s wait:6s)
0:01:06 INF [archive] - 	osm_pass1        2s cpu:6s avg:3.1
0:01:06 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:06 INF [archive] - 	  parse    4x(31% 0.6s)
0:01:06 INF [archive] - 	  process  1x(73% 1s)
0:01:06 INF [archive] - 	osm_pass2        18s cpu:1m11s avg:3.9
0:01:06 INF [archive] - 	  read     1x(0% 0s wait:10s done:8s)
0:01:06 INF [archive] - 	  process  4x(75% 14s)
0:01:06 INF [archive] - 	  write    1x(2% 0.4s wait:18s)
0:01:06 INF [archive] - 	ne_lakes         0s cpu:0s avg:9.6
0:01:06 INF [archive] - 	boundaries       0s cpu:0s avg:1.2
0:01:06 INF [archive] - 	agg_stop         0s cpu:0s avg:21.4
0:01:06 INF [archive] - 	sort             1s cpu:3s avg:2.6
0:01:06 INF [archive] - 	  worker  1x(52% 0.7s)
0:01:06 INF [archive] - 	archive          20s cpu:1m12s avg:3.6
0:01:06 INF [archive] - 	  read    1x(3% 0.5s wait:18s done:1s)
0:01:06 INF [archive] - 	  encode  4x(55% 11s wait:2s done:1s)
0:01:06 INF [archive] - 	  write   1x(22% 4s wait:13s)
0:01:06 INF [archive] - ----------------------------------------
0:01:06 INF [archive] - 	archive	108MB
0:01:06 INF [archive] - 	features	284MB
-rw-r--r-- 1 runner docker 86M Nov 18 12:08 run.jar
0:01:05 DEB [archive] - Tile stats:
0:01:05 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (160k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:86k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (137k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
5. 14/4941/6092 (114k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:66k)
6. 14/4941/6093 (113k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (100k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (99k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (97k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (95k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:05 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  155   375   444   584   939   371   467   587   823  1.7k  2.1k  7.2k  6.4k  5.8k  4.5k  7.2k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   441   441   441   640   714    1k  1.6k  3.1k  5.7k  3.3k  1.7k   803   948  5.7k
            landuse    0     0     0     0   549   695  1.6k  6.8k   17k   44k   59k   50k   38k   19k   12k   59k
     transportation    0     0     0     0   370   905  1.3k    6k  8.1k   24k   17k   19k   65k   48k   36k   65k
           waterway    0     0     0     0   112   119     0     0     0  3.2k  2.3k  2.1k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.2k  4.2k  9.7k   18k   13k  8.2k  3.7k  3.4k  4.4k   18k
transportation_name    0     0     0     0     0     0   369   464  1.2k  1.8k  5.5k  4.7k  3.9k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0  9.6k   29k   85k   72k   81k   53k   30k   25k   85k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   666   328   273   221   221   666
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.8k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   506   503   86k   86k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.5k  3.8k  6.2k   20k   42k   85k  203k  185k  135k  114k  129k  252k  252k
            gzipped 6.2k  3.6k  7.1k  5.2k  3.1k    5k   14k   30k   60k  149k  138k   99k   83k   92k  160k  160k
0:01:05 DEB [archive] -    Max tile: 252k (gzipped: 160k)
0:01:05 DEB [archive] -    Avg tile: 5.5k (gzipped: 4.1k) using weighted average based on OSM traffic
0:01:05 DEB [archive] -     # tiles: 4,115,029
0:01:05 DEB [archive] -  # features: 5,517,101
0:01:05 INF [archive] - Finished in 19s cpu:1m10s avg:3.7
0:01:05 INF [archive] -   read    1x(3% 0.5s wait:17s done:1s)
0:01:05 INF [archive] -   encode  4x(56% 11s wait:2s done:1s)
0:01:05 INF [archive] -   write   1x(22% 4s wait:12s done:1s)
0:01:05 INF [archive] - Finished in 1m5s cpu:3m35s gc:1s avg:3.3
0:01:05 INF [archive] - FINISHED!
0:01:05 INF [archive] - 
0:01:05 INF [archive] - ----------------------------------------
0:01:05 INF [archive] - data errors:
0:01:05 INF [archive] - 	render_snap_fix_input	16,669
0:01:05 INF [archive] - 	osm_multipolygon_missing_way	360
0:01:05 INF [archive] - 	osm_boundary_missing_way	73
0:01:05 INF [archive] - 	merge_snap_fix_input	12
0:01:05 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:05 INF [archive] - 	render_snap_fix_input2	1
0:01:05 INF [archive] - 	omt_fix_water_before_ne_intersect	1
0:01:05 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	1
0:01:05 INF [archive] - 	feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix	1
0:01:05 INF [archive] - ----------------------------------------
0:01:05 INF [archive] - 	overall          1m5s cpu:3m35s gc:1s avg:3.3
0:01:05 INF [archive] - 	lake_centerlines 2s cpu:5s avg:2.2
0:01:05 INF [archive] - 	  read     1x(21% 0.5s done:2s)
0:01:05 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:05 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:05 INF [archive] - 	water_polygons   15s cpu:42s avg:2.8
0:01:05 INF [archive] - 	  read     1x(41% 6s done:7s)
0:01:05 INF [archive] - 	  process  4x(27% 4s wait:4s done:5s)
0:01:05 INF [archive] - 	  write    1x(4% 0.6s wait:9s done:5s)
0:01:05 INF [archive] - 	natural_earth    7s cpu:13s avg:2
0:01:05 INF [archive] - 	  read     1x(95% 6s)
0:01:05 INF [archive] - 	  process  4x(13% 0.8s wait:6s)
0:01:05 INF [archive] - 	  write    1x(0% 0s wait:6s)
0:01:05 INF [archive] - 	osm_pass1        2s cpu:6s avg:3.2
0:01:05 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:05 INF [archive] - 	  parse    4x(35% 0.7s)
0:01:05 INF [archive] - 	  process  1x(67% 1s)
0:01:05 INF [archive] - 	osm_pass2        19s cpu:1m14s avg:4
0:01:05 INF [archive] - 	  read     1x(0% 0s wait:11s done:8s)
0:01:05 INF [archive] - 	  process  4x(75% 14s)
0:01:05 INF [archive] - 	  write    1x(2% 0.4s wait:18s)
0:01:05 INF [archive] - 	ne_lakes         0s cpu:0s avg:0
0:01:05 INF [archive] - 	boundaries       0s cpu:0s avg:2.8
0:01:05 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:05 INF [archive] - 	sort             1s cpu:3s avg:2.6
0:01:05 INF [archive] - 	  worker  1x(52% 0.7s)
0:01:05 INF [archive] - 	archive          19s cpu:1m10s avg:3.7
0:01:05 INF [archive] - 	  read    1x(3% 0.5s wait:17s done:1s)
0:01:05 INF [archive] - 	  encode  4x(56% 11s wait:2s done:1s)
0:01:05 INF [archive] - 	  write   1x(22% 4s wait:12s done:1s)
0:01:05 INF [archive] - ----------------------------------------
0:01:05 INF [archive] - 	archive	108MB
0:01:05 INF [archive] - 	features	284MB
-rw-r--r-- 1 runner docker 86M Nov 18 12:09 run.jar

Full logs: https://github.com/onthegomap/planetiler/actions/runs/11892070532

@wipfli
Copy link
Contributor Author

wipfli commented Nov 2, 2024

I added some tests for the public functions. I don't really know how private functions are tested in Java. Should I for example write specific tests for the private function findAllPaths?

@msbarry
Copy link
Contributor

msbarry commented Nov 2, 2024

It really only needs tests for the public api. If there's an internal method you wanted to make sure works the way you expected you can make it package protected (no modifier) so tests can use it since they're in the same package.

@msbarry
Copy link
Contributor

msbarry commented Nov 2, 2024

How should we expose the new parameter minLoopLength? At the moment I just set it to lengthLimit from FeatureMerge.mergeLineStrings. Should we maybe add an optional loopLengthLimitCalculator?

I think something like this would make sense for an API to control those parameters:

var merger = new LoopLineMerger()
  .setMinLength(0.1)
  .setLoopMinLength(0.1)
  .setPrecision(new PrecisionModel(16)); // to control rounding
merger.add(lineStrings);
var merged = merger.getMergedLineStrings();

@msbarry
Copy link
Contributor

msbarry commented Nov 2, 2024

LoopLineMerger is slower than JTS LineMerger. I guess it is because they are doing less (no loops taken into account) but also they are doing things more efficiently.

I can help look into the performance, I think there will be some low hanging fruit to make it fast. JTS line merge also only considers full linestrings, it doesn't break them up into segments so this one will always be a little slower. The first thing that jumps out at me is creating full LineStrings and reversing/comparing them while building up the lists. I think we might want to have the intermediate code work with lists of coordinates that can be concatenated efficiently, then construct the LineStrings at the very end.

@wipfli
Copy link
Contributor Author

wipfli commented Nov 2, 2024

Great, thanks for the feedback! I added some more tests and marked the methods that I want to test separately as protected.

@msbarry
Copy link
Contributor

msbarry commented Nov 3, 2024

I added a BenchmarkLineMerge class we can use to compare performance of JTS line merge vs. this new utility.

@wipfli
Copy link
Contributor Author

wipfli commented Nov 3, 2024

Thanks for adding the benchmarks. I changed the api just now. Let me have a look at the benchmark...

@msbarry
Copy link
Contributor

msbarry commented Nov 3, 2024

Pushed a few updates to the benchmark, basically it creates between 10 and 1000 line segments in the range from (0,0) to (100, 100) with between 10 and 1000 intermediate points then uses both line merge. I disabled the 1000 lines with 1000 points because it currently does not finish with loop line merger 😬

Let me know if you can think of more realistic scenarios?

@wipfli
Copy link
Contributor Author

wipfli commented Nov 3, 2024

I had a look at what is given to line merging if one wants to render highway=primary at z6 for Switzerland. The result is that you get many short linestrings:

groupedFeatures.size  50043
0 <= points per linestring < 5 is 52.04124452970446 percent
5 <= points per linestring < 10 is 26.543172871330654 percent
10 <= points per linestring < 15 is 8.292868133405271 percent
15 <= points per linestring < 20 is 4.248346422077014 percent
20 <= points per linestring < 25 is 3.6428671342645327 percent
25 <= points per linestring < 30 is 1.6805547229382731 percent
30 <= points per linestring < 35 is 0.8912335391563256 percent
35 <= points per linestring < 40 is 0.6374517914593449 percent
40 <= points per linestring < 45 is 0.44961333253402075 percent
45 <= points per linestring < 50 is 0.30174050316727613 percent
50 <= points per linestring < 55 is 0.24978518474112263 percent
55 <= points per linestring < 60 is 0.14587454788881563 percent
60 <= points per linestring < 65 is 0.10590891833023598 percent
65 <= points per linestring < 70 is 0.08992266650680415 percent
70 <= points per linestring < 75 is 0.10191235537437802 percent
75 <= points per linestring < 80 is 0.07193813320544332 percent
80 <= points per linestring < 85 is 0.06394500729372739 percent
85 <= points per linestring < 90 is 0.055951881382011466 percent
90 <= points per linestring < 95 is 0.03397078512479268 percent

@wipfli
Copy link
Contributor Author

wipfli commented Nov 3, 2024

I guess these statistics reflect that in OpenStreetMap a typical Way has only a handful of Nodes.

Regarding the length of the linestrings, actually almost half of them are too short to occupy distinct start and end points on the 1/16 grid:

0.0 <= linestring length < 0.03125 is 45.149171712327394 percent
0.03125 <= linestring length < 0.0625 is 21.459544791479328 percent
0.0625 <= linestring length < 0.09375 is 9.12015666526787 percent
0.09375 <= linestring length < 0.125 is 5.285454509122155 percent
0.125 <= linestring length < 0.15625 is 3.6388705713086744 percent
0.15625 <= linestring length < 0.1875 is 2.5937693583518175 percent
0.1875 <= linestring length < 0.21875 is 2.036248826009632 percent
0.21875 <= linestring length < 0.25 is 1.5606578342625343 percent
0.25 <= linestring length < 0.28125 is 1.3768159382930678 percent
0.28125 <= linestring length < 0.3125 is 1.0490977759127151 percent

The bins here are 0.5 * 1/16 wide. That means that the linestrings in the first bracket will snap to the same point on the grid for start and end node.

@msbarry
Copy link
Contributor

msbarry commented Nov 3, 2024

Cool, I think we probably want to benchmark a few "average" cases (in terms of num lines/num points) and also a worst-case to make sure it doesn't hang. Feel free to update the benchmark test data generation to be more realistic!

@wipfli
Copy link
Contributor Author

wipfli commented Nov 3, 2024

That is the Swiss highway=primary network. It has 50k+ input linestrings.
image
Should I just feed those to the benchmarks?

@wipfli
Copy link
Contributor Author

wipfli commented Nov 3, 2024

Thanks for adding the benchmarks @msbarry! The numbers on my machine are currently:
image

@msbarry
Copy link
Contributor

msbarry commented Nov 3, 2024

Some stats when I run over the planet using openmaptiles profile:

  • num lines to merge: average=3.6 max=239777
  • points per line: average=6.8 max=1759
  • num points in all lines to merge: average=24.6 max=480642

so for worst case we might want a case that's on the order of 500k line segments each with 2 points, and a case that's 200 lines with 5000 points each. For average case, maybe 50x2 10x10 and 2x50?

@wipfli
Copy link
Contributor Author

wipfli commented Nov 3, 2024

These number sound perfect. Thanks for investigating

@msbarry
Copy link
Contributor

msbarry commented Nov 4, 2024

Oh actually I'll try to grab those actual largest input geometries so we can just test against the real thing.

@wipfli
Copy link
Contributor Author

wipfli commented Nov 9, 2024

Thanks for implementing A*. I looks like it works on my test road networks. Here are the benchmark results from my laptop. Nice that the loop(20) finishes now!

image

The code looks good to me overall. What I am not quite sure is the heuristics part. We are sure that this will always find the shortest path, right?

@wipfli
Copy link
Contributor Author

wipfli commented Nov 9, 2024

How should we handle these sort of junctions:

Screenshot_20241109_233959

This is after loop removal but before min length line dropping. After min length the connecting segment is gone:

Screenshot_20241109_233951

When we do simplification, the separation between the two connected lines will increase further.

It feels like when we have situations like "short edge" connects to 2 "long edges" on both ends we might want to not drop the short edge. Or actually we can remove the short edge, but keep the connecting lines unmerged such that we have endpoints which will keep the lines there after douglas peuker. What do you think?

@wipfli
Copy link
Contributor Author

wipfli commented Nov 9, 2024

By the way another nice test case is I95 south of Washington. It has 3 carriage ways that sometimes are quite far separated... https://www.openstreetmap.org/#map=15/38.43446/-77.40842

@msbarry
Copy link
Contributor

msbarry commented Nov 10, 2024

The code looks good to me overall. What I am not quite sure is the heuristics part. We are sure that this will always find the shortest path, right?

A* uses the heuristic to do a "best-first" search from A to B instead of breadth-first. The "best" path is determined using the current path length + the heuristic (straight line distance from the end of the path to B). As long as the heuristic is the minimum possible cost from the end node to B, it's still guaranteed to produce the shortest path. There might be a better name to give it in the code though...

How should we handle these sort of junctions:

I think that's very similar to the same case that I'm concerned about trying to make transportation_name segments as long as possible (and the failing test I added testMergeCarriagewaysWithOneSplitLongerThanLoopMinLength) before min-length dropping.

Instead of splitting segments at 3+ way intersections, we need to pick 2 segments to connect to make a longer one. There might be some heuristic we could use when looking at each intersection, but another idea I was thinking about was to find the "longest shortest path between 2 points" through the graph (I think this is called the "diameter" of the graph), emit that as a linestring, removing it from each of the nodes, then repeat until the remaining lines are shorter than min length. I'm not sure if there's an efficient way to do that though...

@wipfli
Copy link
Contributor Author

wipfli commented Nov 10, 2024

Currently we do this:

  • break loops
  • merge
  • remove all short stubs
  • merge
  • remove all short edges
  • merge

With this approach you are right that if minLength > loopMinLength, some loops will be dropped in the "remove all short edges" step. I see this as expected behavior and would recommend to set minLength < loopMinLength.

Regarding your proposal to decide at 3-way junctions which two edges to merge I think that is a great idea overall because it leads to longer connected linestrings. But what I can see as a problem is that line simplification will lead to a visual gap. Something like this:

image

Maybe a simple solution to this would be to flip the order, i.e. simply -> merge 3-way junctions for longest possible strings. What do you think?

@wipfli
Copy link
Contributor Author

wipfli commented Nov 10, 2024

Here is a nice crazy 3-junction group of highway=primary roughly at https://www.openstreetmap.org/#map=12/47.5090/8.7469

image

Another approach could be to merge pairs of lines at 3-way+ junctions that have a mutual angle closest to 180 degrees. I.e., lines that might be the logical continuation of each other. This would help with label placement in MapLibre because there you want flat angles too... And it would be easy to implement since it happens all locally at the nodes.

@wipfli
Copy link
Contributor Author

wipfli commented Nov 10, 2024

I made a separate pull request for angle merging and progressive stub removal in wipfli#18. Let me know if you think these are good directions...

@msbarry
Copy link
Contributor

msbarry commented Nov 10, 2024

The angle heuristic sounds like a promising way to break up 3+ way intersections, will be curious to test it on transportation_name layer.

Re: simplification we could simplify individual edge geometries without touching the endpoints as a part of line merging process. That would avoid breaking up connections. The only problem is that because we group by attributes first then merge lines with the same attribute so that could break connections that should exist between lines with different tags...

@wipfli
Copy link
Contributor Author

wipfli commented Nov 10, 2024

The only problem is that because we group by attributes first then merge lines with the same attribute so that could break connections that should exist between lines with different tags...

I have seen this behavior for example here, where I have two layers. One for highway=primary and one for highway=secondary.

Before simplification it looks like this (https://www.openstreetmap.org/#map=11/47.1629/8.0278):

Screenshot_20241110_223349

And after simplification like this:

Screenshot_20241110_223431

Red lines are nicely connected and purple lines are nicely connected, but between them things are misaligned after simplification.

I wonder if we should introduce something like a set of points that cannot be moved by line simplification?

@msbarry
Copy link
Contributor

msbarry commented Nov 11, 2024

I made a separate pull request for angle merging and progressive stub removal in wipfli#18. Let me know if you think these are good directions...

Nice! The overall approach looks good, made a couple of comments we can either address in that PR or over here after you want to merge it first.

I tried it out on transportation_name layer (I had to change FeatureMerge to set setLoopMinLength and minLength to lengthLimit) and the results are looking closer to JTS line merging (except that JTS contains duplicate lines for each lane of the divided highway and loop line merger collapses them to 1 line 👍 )

image image

@msbarry
Copy link
Contributor

msbarry commented Nov 11, 2024

For simplification, I'm trying to figure out what the rule should be for which points we would be required to keep? Within a graph of lines we are merging, it's straightforward because once we've merged all nodes with degree=2 we can simplify each edge individually before the final merge that combines at nodes based on angles. But to preserve connections across multiple graphs it seems like we'd need to preserve any point that also exists in another graph of lines to merge.

I'm almost wondering if we need to make the line merger aware of all sets of graphs it needs to merge for it to handle this...

@wipfli
Copy link
Contributor Author

wipfli commented Nov 11, 2024

It might actually prove useful outside of the problem of line merging to have a full graph of the road network somewhere available with edges having all the attributes attached. An interesting problem is the automatic selection of roads and there is quite some literature about it. @brawer has found these two nice papers which we are reading at the moment:

To implement those ideas in Planetiler it would be super useful to have the full network stored as a graph. I am optimistic that results could look quite good even if we do this at the tile level.

* Progressive stub removal, angle merge after simplification

* Review fixes
@msbarry
Copy link
Contributor

msbarry commented Nov 12, 2024

It wouldn't be too hard to make that change, we would basically need to pull the loop line merger out one level in FeatureMerge loop and move the "group by tags" logic into LoopLineMerge, just want to figure out how much we should try to tackle in this initial PR.

Those papers seem pretty promising, and give some solutions I had started thinking about related to graph diameter, etc. Since it appears there is an endless amount of tweaking we can do here, what do you think the goal for this initial PR should be? On my end no worse performance or output quality on transportation and transportation_name layers, and any cases where it does better than JTS is a bonus.

Owning the logic in planetiler opens up the door to fixing long-standing issues like properly merging one-way roads, and testing out ideas suggested in those papers later.

Comment on lines 220 to 224
merger = new LoopLineMerger();
for (var outputSegment : outputSegments) {
merger.add(outputSegment);
}
outputSegments = merger.getMergedByAngle();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's pull merge by angle into loop line merger as an (optional?) step that it runs before returning the results so we don't need to invoke it separately

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

// result.add(feature1.copyWithNewGeometry((LineString) merged).copyWithExtraAttrs(attrs));
// attrs.put("end", "yes");
// result.add(feature1.copyWithNewGeometry(((LineString) merged).getEndPoint()).copyWithExtraAttrs(attrs));

// re-simplify since some endpoints of merged segments may be unnecessary
if (line.getNumPoints() > 2 && tolerance >= 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should tackle moving line simplification into LoopLineMerger in this initial PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we need to do angle merging after simplification to get connected lines through junctions like this

\        /
 x ---- x
/        \

}

private void mergeTwoEdges(Edge a, Edge b) {
assert a.to == b.from;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assertion is failing in tests, I think it should be assert a.from == b.from; ?

Comment on lines 229 to 231
// minLength = 10 * 0.0625;
// removeShortEdges();
// merge();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need this, otherwise line merger is emitting linestrings shorter than min length which makes the tests fail. If we do angle merging as a part of this process, should we do it before/beetween any of these min length removals?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the order should be

  • break loops
  • drop stubs
  • simplify
  • merge by angle
  • drop by minlength

so simplify should be inside looplinemerger.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK great! I can add a simplifyEdges() method

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops I see you added that already! Updated it to use a specialized version of DouglasPeuckerSimplifier that works on List<Coordinate> to avoid the extra transformation.

Copy link
Contributor

@msbarry msbarry Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the dropping stubs step, at least for transportation_name layer... I tried the latest version on Massachusetts and it's looking like we have shorter line segments than old JTS version

image image

But if I remove the drop stubs step then JTS and loop liner merger output look pretty close

image image

return result;
}

public List<LineString> getMergedByAngle() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove this public API and make it a configurable part of regular merging.

Comment on lines 272 to 274
if (maxLengthToTrack <= 0) {
return Double.POSITIVE_INFINITY;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was an optimization, when things settle a bit more you can try the benchmarks with and without the short-circuiting and if it doesn't make any difference than just remove it so this just computes the full length of the segment.

Comment on lines +381 to +385
if (!getEdges().isEmpty() && !end.getEdges().isEmpty()) {
Coordinate a = getEdges().getFirst().coordinates.getFirst();
Coordinate b = end.getEdges().getFirst().coordinates.getFirst();
return a.distance(b);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might make sense to just put coordinate on each node so you don't need to look at the edges to get its location.

@wipfli
Copy link
Contributor Author

wipfli commented Nov 15, 2024

What do you think the goal for this initial PR should be? On my end no worse performance or output quality on transportation and transportation_name layers, and any cases where it does better than JTS is a bonus.

My initial motivation for this pull request was to be able to render highway=primary at zoom 6 in Switzerland in the Protomaps Basemap with a good amount of line simplification. With the JTS line merger that did not really work, lines looked fragmented and short features were dropped because they were doing loops.

So I agree with your goals, and suggest we worry about things like road selection methods or alignement between groupedFeatures at a later point in time.

@wipfli
Copy link
Contributor Author

wipfli commented Nov 16, 2024

Thanks for your review! I think I have the code roughly in the state I want for the highway=primary z6 data I used as a playground. As a next step I will look into the tests and also work on the remaining review comments.

Comment on lines +190 to +196
// TODO remove debug features comment
// Map<String, Object> attrs = new HashMap<>();
// attrs.put("idx", i++);
// result.add(feature1.copyWithNewGeometry(line.getStartPoint()).copyWithExtraAttrs(attrs));
// result.add(feature1.copyWithNewGeometry(line).copyWithExtraAttrs(attrs));
// attrs.put("end", "yes");
// result.add(feature1.copyWithNewGeometry(line.getEndPoint()).copyWithExtraAttrs(attrs));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove commented code now?

Comment on lines +209 to +214
// i = 0;
// for (var outputSegment : outputSegments) {
// Map<String, Object> attrs = new HashMap<>();
// attrs.put("idx", ++i);
// result.add(feature1.copyWithNewGeometry(outputSegment).copyWithExtraAttrs(attrs));
// }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here?

import org.locationtech.jts.geom.PrecisionModel;


public class LoopLineMerger {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, can we just add javadoc to the class explaining the high level idea of what this line merger does, and each public method explaining what the parameter controls?

return result;
}

public static List<VectorTile.Feature> mergeLineStringsAllParams(List<VectorTile.Feature> features,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't need this extra method any more. FeatureMerge should eventually have a builder API like LoopLineMerge has, but until then should keep the main entry point opinionated about what defaults it uses.

return null;
}

private Edge mergeTwoEdges(Node node, Edge a, Edge b) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about this small tweak to make a best-effort attempt to preserve line segment directions from the input?

Suggested change
private Edge mergeTwoEdges(Node node, Edge a, Edge b) {
private Edge mergeTwoEdges(Node node, Edge edge1, Edge edge2) {
// attempt to preserve segment directions from the original line
// when: A << N -- B then output C reversed from B to A
// when: A >> N -- B then output C from A to B
Edge a = edge1.main ? edge2 : edge1;
Edge b = edge1.main ? edge1 : edge2;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants