Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Protect sequential read alternative #896

Closed

Conversation

rbuffat
Copy link
Contributor

@rbuffat rbuffat commented May 7, 2020

This is an outline of an alternative approach to protect sequential read that has fewer issues compared to #892

The main idea is that Collection should know about all iterators. When OGR_L_GetFeature() or OGR_L_GetFeatureCount() are called, who potentially interrupt a sequential read, Collection notifies all Iterators that they are interrupted.
In Iterator.next() it is first checked if the sequential read is interrupted. If yes, the correct position is set with OGR_L_SetNextByIndex.

This has the advantage that we do not need to know about the lifecycle of the iterators (are they still active or not). The drawback is, that in a worst-case the performance is bad as often OGR_L_SetNextByIndex is executed, which is potentially costly.

Another issue that is currently not addressed is that a user can potentially create multiple iterators, which can influence each other:

@pytest.fixture(scope="module", params=[driver for driver, raw in supported_drivers.items() if 'w' in raw
                                        and (driver not in driver_mode_mingdal['w'] or
                                             gdal_version >= GDALVersion(*driver_mode_mingdal['w'][driver][:2]))
                                        and driver not in {'DGN', 'MapInfo File', 'GPSTrackMaker', 'GPX', 'BNA', 'DXF',
                                                           'GML'}])
def slice_dataset_path(request):
    """ Create temporary datasets for test_collection_iterator_items_slice()"""

    driver = request.param
    min_id = 0
    max_id = 9
    schema = {'geometry': 'Point', 'properties': [('position', 'int')]}
    records = [{'geometry': {'type': 'Point', 'coordinates': (0.0, float(i))}, 'properties': {'position': i}} for i
               in range(min_id, max_id + 1)]

    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, get_temp_filename(driver))

    with fiona.open(path, 'w',
                    driver=driver,
                    schema=schema) as c:
        c.writerecords(records)
    yield path
    shutil.rmtree(tmpdir)

def test_multiple_iterators(slice_dataset_path):
    start = 0
    stop = 9
    step = 1

    with fiona.open(slice_dataset_path, 'r') as c:

        item_iterator = c.items(start, stop, step)
        filter_iterator = c.filter(start, stop, step)

        item = next(item_iterator)
        item = next(item_iterator)
        filter_item = next(filter_iterator)
        item = next(item_iterator)

       # This will fail, as next(filter_iterator) can also increase the position of item_iterator
        assert int(item[1]['properties']['position']) == 2

@rbuffat rbuffat mentioned this pull request May 7, 2020
@coveralls
Copy link

coveralls commented May 7, 2020

Coverage Status

Coverage increased (+0.1%) to 83.467% when pulling 8facdfa on rbuffat:sequential_read_alternative into e1f0e1b on Toblerity:maint-1.8.

@sgillies
Copy link
Member

@rbuffat I appreciate the work here!

In a future version of fiona (2.0) I would like to simplify the relationship between collection and iterator. It's too fancy and fragile now, and I worry about increasing the complexity. I think our time would be better spent discouraging or preventing users from opening multiple iterators on a collection so that we don't have to support it at all.

What would you think about amending this to allow only one iterator per collection/session?

@rbuffat
Copy link
Contributor Author

rbuffat commented May 24, 2020

I agree the relations between Session, Collection, and Iterator are quite complex and the complexity definitely (still) increases with this PR unfortunately.

I adapted the PR to only allow one iterator per session. I did not touch the now failing tests so that you can get a better picture. Most of them are probably cases where it does not matter too much.

I don't think that there are many use cases where there is a need for multiple iterators per Session.
One I can think of is with filter:

with fiona.open() as c:
   c.filter(mask=region1)
   c.filter(mask=region2)
   ...

@rbuffat
Copy link
Contributor Author

rbuffat commented Feb 18, 2021

This should be obsolete now.

@rbuffat rbuffat closed this Feb 18, 2021
@rbuffat rbuffat deleted the sequential_read_alternative branch February 18, 2021 22:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants