Support PBF to Atlas creation irrespective of PBF size #147
Comments
Can you give a small example of how you would shard an OSM PBF file?
Trying to follow your outline above, I used Osmosis to shard a larger OSM PBF file with the completeWays option. But when I multi-atlas the sharded atlases and clone the result into a PackedAtlas, this is again very memory intensive and takes a very long time.
Hello @flowrean! Neither OSM PBF nor Atlas is designed to handle large amounts of data in one single place. When developing locally, I try to allocate 10 GB of memory to my processes, and that can only handle a handful of Atlas shards that are ~20 MB each on disk (zipped). Once in memory, those are much bigger, which is the tradeoff we chose to get very fast processing, even on complex problems. To process larger datasets, there is an option which uses Spark: https://github.com/osmlab/atlas-generator. However, that requires access to a Spark cluster and some kind of distributed storage to hold all the sharded PBF files to process. In the end, this will distribute the processing of each shard to produce one Atlas per shard, but it will still not generate a single large Atlas for you. One other option, if you only care about a specific type of data, is to take each shard individually and filter them down serially. Once this is complete, and if the filtering is aggressive enough, you might be able to build a massive multi-atlas from the slimmed-down shards in a reasonable amount of memory. See this StackOverflow question.
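The serial filter-then-combine idea described in the comment above could look roughly like the following. This is a minimal Python sketch of the memory pattern only, not Atlas API: `slim_shards`, `load_shard`, `keep`, and the dictionary-based feature representation are all hypothetical placeholders chosen for illustration.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical feature representation: a plain dict with "id" and "tags".
Feature = Dict[str, object]


def slim_shards(
    shard_paths: Iterable[str],
    load_shard: Callable[[str], List[Feature]],
    keep: Callable[[Feature], bool],
) -> List[Feature]:
    """Load shards one at a time and retain only the features that pass `keep`.

    Only one full shard is referenced at any moment; the accumulated
    result holds just the filtered (slimmed-down) features.
    """
    slimmed: List[Feature] = []
    for path in shard_paths:
        features = load_shard(path)  # assumption: returns all features of one shard
        slimmed.extend(f for f in features if keep(f))
        del features  # drop the reference to the full shard before loading the next
    return slimmed
```

Whether the combined result fits in memory depends entirely on how aggressive the `keep` predicate is, as noted in the comment.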
Thank you for your clarification @matthieun, it is most valuable to me. I have let go of the intention to create a single large Atlas file, but I would still like to process a large OSM PBF. I would do this in shards (created by Osmosis with the completeWays option), as suggested above. But I run into a new problem: a way that crosses a shard boundary is processed more than once. The geometry of the result can also differ between shards (the number of times the line is split) if there are incoming or outgoing ways that do not appear in all shards. This also messes up the edge IDs. Have you encountered this too, or do you have any idea how to avoid this situation?
The current OSM PBF to Atlas flow is optimized for sharded PBF files. The process is very memory intensive and results in OutOfMemoryError exceptions for large PBF files. There needs to be a way to support any PBF file, irrespective of size. Here is one possible option:
Given a PBF file location and a sharding tree, shard the PBF file, produce an Atlas for each shard, and output either the sharded Atlas files or a single Atlas file (multi-atlas the sharded atlases and clone into a PackedAtlas).
If no sharding tree is provided, fall back to a slippy tile zoom level and flat sharding case, then follow the same output strategy as outlined above.
This is loosely related to issue #88. An example of a reported use-case can be found here.
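To illustrate the flat-sharding fallback proposed above, here is a minimal Python sketch that assigns coordinates to slippy tiles at a fixed zoom level using the standard Web Mercator tile formula. The function names and the `zoom-x-y` shard naming are assumptions made for this example, not confirmed Atlas API.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Web Mercator tiles are only defined up to roughly this latitude.
MAX_LATITUDE = 85.0511


def slippy_tile(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int, int]:
    """Return (zoom, x, y) of the slippy tile containing the given point."""
    n = 2 ** zoom
    lat_deg = max(-MAX_LATITUDE, min(MAX_LATITUDE, lat_deg))
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return zoom, x, y


def shard_name(lat_deg: float, lon_deg: float, zoom: int) -> str:
    """Hypothetical shard name in zoom-x-y form."""
    z, x, y = slippy_tile(lat_deg, lon_deg, zoom)
    return f"{z}-{x}-{y}"


def group_by_shard(
    points: Iterable[Tuple[float, float]], zoom: int
) -> Dict[str, List[Tuple[float, float]]]:
    """Bucket (lat, lon) points by the flat shard that contains them."""
    shards: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for lat, lon in points:
        shards[shard_name(lat, lon, zoom)].append((lat, lon))
    return dict(shards)
```

A fixed zoom level yields equally sized tiles regardless of data density, so dense urban tiles may still be large; a sharding tree addresses that by splitting dense areas further.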