Support PBF to Atlas creation irrespective of PBF size #147

MikeGost · 2018-06-22T17:58:01Z

The current OSM PBF to Atlas flow is optimized for sharded PBF files. The process is very memory intensive and results in OutOfMemoryError exceptions for large PBF files. There needs to be a way to support PBF files of any size. Here is one possible option:

Given a PBF file location and a sharding tree, shard the PBF file, produce an Atlas for each shard, and output either the sharded Atlas files or a single Atlas file (multi-atlas the sharded atlases and clone the result into a PackedAtlas).
If no sharding tree is provided, fall back to flat sharding at a given slippy tile zoom level, then follow the same output strategy as outlined above.
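
As a rough illustration of the flat-sharding fallback, the sketch below enumerates the slippy tiles that cover a bounding box at one zoom level, using the standard OSM slippy map tile formula. The function names `tile_for` and `tiles_covering` are hypothetical and are not part of the Atlas API; each returned tile would correspond to one shard.

```python
import math
from typing import List, Tuple


def tile_for(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) slippy tile containing a WGS84 point at a zoom level."""
    n = 2 ** zoom
    # Web Mercator is only defined up to roughly +/-85.0511 degrees latitude.
    lat = max(min(lat, 85.0511), -85.0511)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that edge values such as lon = 180 still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_covering(west: float, south: float, east: float, north: float,
                   zoom: int) -> List[Tuple[int, int]]:
    """List every tile at `zoom` that intersects the bounding box (hypothetical helper)."""
    x_min, y_min = tile_for(west, north, zoom)  # tile y grows southward
    x_max, y_max = tile_for(east, south, zoom)
    return [(x, y)
            for y in range(y_min, y_max + 1)
            for x in range(x_min, x_max + 1)]
```

A higher zoom level gives more, smaller shards, which lowers the memory needed per shard at the cost of more boundary crossings.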

This is loosely related to issue #88. An example of a reported use-case can be found here.


flowrean · 2019-05-23T15:24:54Z

Can you give a small example of how you would shard an OSM PBF file?
Is there any documentation detailing working with shards in Atlas? I can only find this README about sharding, but it does not include a code example.

flowrean · 2019-05-31T15:10:40Z

Trying to follow your outline above, I have used osmosis to shard a larger OSM PBF file, using the completeWays option. But when I multi-atlas the sharded atlases and clone into a PackedAtlas, this is again very memory intensive and takes a very long time.
Is there any way around this or am I doing something wrong?
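
For reference, the Osmosis sharding step described above could be scripted roughly as follows. This is only a sketch: the helper name `osmosis_shard_command` and the file names are hypothetical, and it assumes the `--read-pbf`, `--bounding-box` (with `completeWays=yes`) and `--write-pbf` Osmosis tasks.

```python
from typing import List


def osmosis_shard_command(source_pbf: str, shard_pbf: str,
                          west: float, south: float,
                          east: float, north: float) -> List[str]:
    """Build the Osmosis argument list that cuts one bounding box out of a PBF."""
    return [
        "osmosis",
        "--read-pbf", f"file={source_pbf}",
        "--bounding-box",
        f"left={west}", f"bottom={south}", f"right={east}", f"top={north}",
        # Keep every node of any way that intersects the box, so ways stay whole.
        "completeWays=yes",
        "--write-pbf", f"file={shard_pbf}",
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, assuming `osmosis` is on the `PATH`. Running it once per bounding box produces one PBF per shard.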

matthieun · 2019-05-31T20:08:48Z

Hello @flowrean!

Neither OSM PBF nor Atlas is designed to handle large amounts of data in one single place. When developing locally, I try to allocate 10 GB of memory to my processes, and that can only handle a handful of Atlas shards that are ~20 MB each on disk (zipped). Once in memory, those are much bigger, which is the tradeoff we chose to get very fast processing, even on complex problems.

To process larger datasets, there is an option which uses Spark: https://github.com/osmlab/atlas-generator. However, that requires you to have access to a Spark cluster and some kind of distributed storage in which to put all the sharded PBF files to process. In the end, this will distribute the processing of each shard to produce one atlas per shard, but it will still not generate a single large atlas for you.

One other option, if you only care about a specific type of data, is to take each shard individually and serially filter them down. Once this is complete, and the filtering is aggressive enough, you might be able to do a massive multi-atlas with the slimmed down shards in a reasonable amount of memory. See this StackOverflow question.
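
The filter-then-combine idea can be sketched as follows. This is a minimal illustration of the pattern only, not Atlas code: features are modeled as plain dictionaries, and `filter_and_merge` and the loader callables are hypothetical names. The point is that only one unfiltered shard is held in memory at a time.

```python
from typing import Callable, Dict, Iterable, List

Feature = Dict[str, object]  # hypothetical stand-in for an Atlas entity


def filter_and_merge(shard_loaders: Iterable[Callable[[], List[Feature]]],
                     keep: Callable[[Feature], bool]) -> List[Feature]:
    """Load shards one by one, keep only matching features, and combine them."""
    merged: List[Feature] = []
    for load in shard_loaders:
        features = load()  # only this shard is fully loaded right now
        merged.extend(f for f in features if keep(f))
        del features  # drop the unfiltered shard before loading the next one
    return merged
```

For example, with `keep=lambda f: f.get("highway") == "motorway"`, the combined result grows only with the number of motorway features, not with the total size of the shards.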

flowrean · 2019-06-20T15:25:23Z

Thank you for your clarification @matthieun, it is most valuable to me.

I have given up on creating a single large Atlas file, but I would still like to process a large OSM PBF. I would do this in shards (created by Osmosis with the completeWays option) as suggested above. But I run into a new problem: a way that crosses a shard boundary is processed more than once. The geometry of the result can also differ between shards (the number of times the line is split) if there are incoming or outgoing ways that do not appear in all shards. This also messes up the edge IDs.

Have you encountered this too or do you have any idea how to avoid this situation?
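
One possible workaround for the duplicate processing, which is an assumption on my part and not something confirmed by the Atlas maintainers, is to give each way a single owning shard and skip it everywhere else. The sketch below keeps the first occurrence of each OSM way ID; `deduplicate_ways` and the dictionary-based data model are hypothetical. It does not address the differing split points or edge IDs.

```python
from typing import Dict, List, Set

Way = Dict[str, object]  # hypothetical stand-in holding at least an "id" key


def deduplicate_ways(shards: Dict[str, List[Way]]) -> Dict[str, List[Way]]:
    """Keep each way only in the first shard (in iteration order) that contains it."""
    seen: Set[object] = set()
    result: Dict[str, List[Way]] = {}
    for shard_name, ways in shards.items():
        kept = []
        for way in ways:
            if way["id"] not in seen:
                seen.add(way["id"])
                kept.append(way)
        result[shard_name] = kept
    return result
```

Because ownership depends on iteration order, the shards should be visited in a stable order (for example sorted by name) so that repeated runs assign each way to the same shard.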

MikeGost added the New Feature label Jun 22, 2018

MikeGost mentioned this issue Jun 22, 2018

Task run failure osmlab/atlas-checks#50

Closed

MikeGost changed the title ~~Support PBF to Atlas translation irrespective of PBF size~~ Support PBF to Atlas creation irrespective of PBF size Jun 22, 2018

jklamer mentioned this issue Jun 11, 2019

Task :runChecks FAILED osmlab/atlas-checks#164

Open
