[WIP] Protect sequential read alternative #896

rbuffat · 2020-05-07T20:32:04Z

This is an outline of an alternative approach to protecting sequential reads that has fewer issues than #892.

The main idea is that Collection should know about all of its iterators. When OGR_L_GetFeature() or OGR_L_GetFeatureCount() is called, either of which can potentially interrupt a sequential read, Collection notifies all iterators that they have been interrupted.
Iterator.next() first checks whether the sequential read was interrupted. If so, it restores the correct position with OGR_L_SetNextByIndex.

This has the advantage that we do not need to know about the lifecycle of the iterators (whether they are still active or not). The drawback is that, in the worst case, performance is poor because OGR_L_SetNextByIndex, which is potentially costly, is executed often.
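
The notify-and-reposition scheme described above can be sketched in pure Python without GDAL. This is only an illustration under stated assumptions, not the code in this PR: `FakeLayer`, its methods, and the `interrupted` flag are hypothetical stand-ins for the OGR layer and the real Collection/Iterator internals, and the `WeakSet` is one possible way to avoid tracking iterator lifecycles.

```python
import weakref


class FakeLayer:
    """Hypothetical stand-in for an OGR layer with one shared read cursor."""

    def __init__(self, features):
        self.features = list(features)
        self.cursor = 0
        self.reposition_calls = 0

    def get_next_feature(self):
        # Plays the role of a sequential read at the shared cursor.
        feature = self.features[self.cursor]
        self.cursor += 1
        return feature

    def get_feature(self, index):
        # Plays the role of OGR_L_GetFeature: random access moves the cursor.
        self.cursor = index + 1
        return self.features[index]

    def set_next_by_index(self, index):
        # Plays the role of OGR_L_SetNextByIndex (potentially costly).
        self.cursor = index
        self.reposition_calls += 1


class Collection:
    def __init__(self, features):
        self.layer = FakeLayer(features)
        # Weak references: dead iterators drop out on their own.
        self._iterators = weakref.WeakSet()

    def __iter__(self):
        iterator = Iterator(self)
        self._iterators.add(iterator)
        return iterator

    def __getitem__(self, index):
        feature = self.layer.get_feature(index)
        # Notify every known iterator that its sequential read was interrupted.
        for iterator in self._iterators:
            iterator.interrupted = True
        return feature


class Iterator:
    def __init__(self, collection):
        self.collection = collection
        self.next_index = 0
        # Assumption: start interrupted so the first read positions explicitly.
        self.interrupted = True

    def __iter__(self):
        return self

    def __next__(self):
        layer = self.collection.layer
        if self.next_index >= len(layer.features):
            raise StopIteration
        if self.interrupted:
            # Restore the position this iterator expects before reading.
            layer.set_next_by_index(self.next_index)
            self.interrupted = False
        feature = layer.get_next_feature()
        self.next_index += 1
        return feature
```

In this sketch a random access such as `collection[4]` between two `next()` calls costs exactly one extra reposition. In the worst case, where every `next()` is preceded by a random access, every read pays for a reposition.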

Another issue that is currently not addressed is that a user can potentially create multiple iterators, which can influence each other:

@pytest.fixture(scope="module", params=[driver for driver, raw in supported_drivers.items() if 'w' in raw
                                        and (driver not in driver_mode_mingdal['w'] or
                                             gdal_version >= GDALVersion(*driver_mode_mingdal['w'][driver][:2]))
                                        and driver not in {'DGN', 'MapInfo File', 'GPSTrackMaker', 'GPX', 'BNA', 'DXF',
                                                           'GML'}])
def slice_dataset_path(request):
    """ Create temporary datasets for test_collection_iterator_items_slice()"""

    driver = request.param
    min_id = 0
    max_id = 9
    schema = {'geometry': 'Point', 'properties': [('position', 'int')]}
    records = [{'geometry': {'type': 'Point', 'coordinates': (0.0, float(i))}, 'properties': {'position': i}} for i
               in range(min_id, max_id + 1)]

    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, get_temp_filename(driver))

    with fiona.open(path, 'w',
                    driver=driver,
                    schema=schema) as c:
        c.writerecords(records)
    yield path
    shutil.rmtree(tmpdir)

def test_multiple_iterators(slice_dataset_path):
    start = 0
    stop = 9
    step = 1

    with fiona.open(slice_dataset_path, 'r') as c:

        item_iterator = c.items(start, stop, step)
        filter_iterator = c.filter(start, stop, step)

        item = next(item_iterator)
        item = next(item_iterator)
        filter_item = next(filter_iterator)
        item = next(item_iterator)

        # This will fail, as next(filter_iterator) can also increase the position of item_iterator
        assert int(item[1]['properties']['position']) == 2
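
The same interference can be reproduced without a dataset. The sketch below assumes that both iterators read from a single shared cursor and that neither tracks its own position. `SharedCursor` and `naive_iterator` are hypothetical names used only for illustration.

```python
class SharedCursor:
    """Hypothetical stand-in for a layer with a single read position."""

    def __init__(self, values):
        self.values = list(values)
        self.position = 0

    def read_next(self):
        value = self.values[self.position]
        self.position += 1
        return value


def naive_iterator(cursor):
    # Does not remember its own position; relies on the shared cursor.
    while cursor.position < len(cursor.values):
        yield cursor.read_next()


cursor = SharedCursor(range(10))
item_iterator = naive_iterator(cursor)
filter_iterator = naive_iterator(cursor)

first = next(item_iterator)
second = next(item_iterator)
stolen = next(filter_iterator)  # advances the cursor shared with item_iterator
third = next(item_iterator)     # skips the value consumed by filter_iterator
```

Here `third` is not the third value of the sequence, because `filter_iterator` consumed it.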

coveralls · 2020-05-07T20:57:24Z

Coverage increased (+0.1%) to 83.467% when pulling 8facdfa on rbuffat:sequential_read_alternative into e1f0e1b on Toblerity:maint-1.8.

sgillies · 2020-05-22T18:31:58Z

@rbuffat I appreciate the work here!

In a future version of fiona (2.0) I would like to simplify the relationship between collection and iterator. It's too fancy and fragile now, and I worry about increasing the complexity. I think our time would be better spent discouraging or preventing users from opening multiple iterators on a collection so that we don't have to support it at all.

What would you think about amending this to allow only one iterator per collection/session?

rbuffat · 2020-05-24T12:31:38Z

I agree that the relations between Session, Collection, and Iterator are quite complex, and unfortunately this PR (still) increases that complexity.

I adapted the PR to allow only one iterator per session. I did not touch the now-failing tests so that you can get a better picture. Most of them are probably cases where it does not matter too much.

I don't think that there are many use cases where there is a need for multiple iterators per Session.
One I can think of is with filter:

with fiona.open() as c:
    c.filter(mask=region1)
    c.filter(mask=region2)
    ...
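
A minimal sketch of the one-iterator-per-session restriction, assuming the check happens when the iterator is created. The `Session` class, the `iterate` method, and the choice of `RuntimeError` here are hypothetical and do not reflect Fiona's actual API.

```python
class Session:
    """Hypothetical session that hands out at most one iterator."""

    def __init__(self, features):
        self._features = list(features)
        self._iterator_created = False

    def iterate(self):
        # Refuse to create a second iterator on the same session.
        if self._iterator_created:
            raise RuntimeError("only one iterator per session is allowed")
        self._iterator_created = True
        return iter(self._features)


session = Session(["a", "b", "c"])
first_iterator = session.iterate()

try:
    session.iterate()
    second_allowed = True
except RuntimeError:
    second_allowed = False
```

Under such a rule, the two `filter` calls above would have to run in separate sessions, for example by opening the dataset twice.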

rbuffat · 2021-02-18T22:00:37Z

This should be obsolete now.

alternative iterator interrupt implementation

1991e7b

rbuffat mentioned this pull request May 7, 2020

Protect sequential read #892

Closed

allow only one active iterator

8facdfa

rbuffat added 3 commits May 24, 2020 13:31

Merge branch 'maint-1.8' into sequential_read_alternative

f8454e0

allow only creation of one iterator per session

363654a

refactor test_iterator_sequential_read_interrupted to force situation…

8a35d03

… where sequential read is interrupted

Merge branch 'maint-1.8' into sequential_read_alternative

1668e3f

rbuffat mentioned this pull request Sep 3, 2020

[WIP] Add support for FlatGeobuf #924

Merged

rbuffat closed this Feb 18, 2021

rbuffat deleted the sequential_read_alternative branch February 18, 2021 22:00
