Give the option to terminate the engine without firing Events.COMPLET… #3309

bonassifabio · 2024-12-02T17:50:10Z

As per this Pytorch discussion post, This pull request aims to allow the user to terminate an engine (via engine.terminate()) without consequently firing Events.COMPLETED. This is done by passing the optional argument skip_event_completed=True to the method terminate().

While the default behavior is still to fire Events.COMPLETED after termination, it is useful to allow the user to do otherwise. This might be useful, for example, when one of handlers of Events.COMPLETED needs to be executed only upon graceful termination of the Engine.

Alternative solutions to the one proposed in this PR could be:

include a flag has_been_terminated that is set to True when Events.TERMINATE is triggered
include a new filter to Events.COMPLETED, like Events.COMPLETED(terminated=True), which would be perhaps the most "ignitic" way to add this functionality.

Other changes:

Fixed the table in the Events docs.

Related issues:

#3308

Check list:

New tests are added (if a new feature is added)
New doc strings: description and/or example code are in RST format
Documentation is updated (if required)

…ED. The default behaviour is not changed. Note that even though Events.COMPLETED is not fired, its timer is updated.

vfdev-5

Thanks for the PR @bonassifabio !
I made the first pass on the code and left some comments, let me know what you think.

ignite/engine/engine.py

ignite/engine/events.py

Co-authored-by: vfdev <[email protected]>

bonassifabio · 2024-12-02T21:25:29Z

Thanks for your comments! At this point skip_event_completed should be renamed to skip_completed also in the docs, as per commit f1d194a.

ignite/engine/engine.py

- Do not update self.state.times[Events.COMPLETED.name] if terminated - Fixed unit test

bonassifabio · 2024-12-02T22:35:19Z

I just pushed 5f2cd88 where I fixed the unit test and hopefully the docs. I also followed your suggestion concerning the update of self.state.times[Events.COMPLETED.name]

ignite/engine/engine.py

vfdev-5 · 2024-12-02T22:40:29Z

Can you also please run the following to fix code formatting problems:

bash ./tests/run_code_style.sh install
bash ./tests/run_code_style.sh fmt

ignite/engine/engine.py

Co-authored-by: vfdev <[email protected]>

- Engine time logging moved out of the if clause. In the log message "completed" has been replaced with "finished" to avoid confusion. - Same changes applied to the method `_internal_run_legacy()`

Sorry for accidentally including it into the previous commit!

ignite/engine/events.py

vfdev-5 · 2024-12-03T00:09:24Z

ignite/engine/events.py

-    - COMPLETED : triggered when engine's run is completed
+    - COMPLETED : triggered when engine's run is completed or terminated with
+      :meth:`~ignite.engine.engine.Engine.terminate()`, unless the flag
+      `skip_completed` is set to True.

    The table below illustrates which events are triggered when various termination methods are called.


@bonassifabio actually, I find myself as well this table very misleading and hard to understand.
I think we can improve it in the following way:

EVENT_COMPLETED -> EPOCH_COMPLETED

Let's add new column on the right after "TERMINATE" and call it "COMPLETED"

Check symbol for terminate() line on TERMINATE_SINGLE_EPOCH column is wrong actually and should be replaced with x.

By the way, here is how it is generated now: https://deploy-preview-3309--pytorch-ignite-preview.netlify.app/generated/ignite.engine.events.events#ignite.engine.events.Events

Actually, I was interpreting the first column as Events.COMPLETED, but from your comment it would seem that's not the case.

What if we change the table to something like this?

Method TERMINATE_SINGLE_EPOCH EPOCH_COMPLETED TERMINATE COMPLETED

No termination x ✔ x ✔

terminate_epoch() ✔ x x x

terminate() x x ✔ ✔

terminate(skip_completed=True) x x ✔ x

Few comments:

To my understanding (please correct me if I'm wrong) if terminate() is called, the epoch is not necessarily completed. If this statement is true, it would perhaps make more sense to move EPOCH_COMPLETED before TERMINATE in the list of Events, as I did in the table, since EPOCH_COMPLETED would almost never be fired after EPOCH_COMPLETED.

The columns would thus be ordered so that the first two are "epoch wise", while the last two happen at the end of the engine run. Not sure I

I included a new row for terminate(skip_completed=True). Not sure if that's really necessary.

Thanks for the new table, looks good!
Few corrections however about its content:

The line terminate_epoch(), column "EPOCH_COMPLETED" and "COMPLETED" should be both checked as v.

About your comments, I'm OK with all your suggestions, let's keep this order and the new line.

Checking code:

from ignite.engine import Engine, Events from ignite.utils import setup_logger, logging train_data = range(10) max_epochs = 5 def train_step(engine, batch): pass trainer = Engine(train_step) # Enable trainer logger for a debug mode trainer.logger = setup_logger("trainer", level=logging.DEBUG) @trainer.on(Events.ITERATION_COMPLETED(once=12)) def call_terminate_epoch(): trainer.terminate_epoch() trainer.run(train_data, max_epochs=max_epochs)

Output:

... 2024-12-03 09:06:33,056 trainer INFO: Terminate current epoch is signaled. Current epoch iteration will stop after current iteration is finished. 2024-12-03 09:06:33,058 trainer DEBUG: 2 | 12, Firing handlers for event Events.TERMINATE_SINGLE_EPOCH 2024-12-03 09:06:33,060 trainer DEBUG: 2 | 12, Firing handlers for event Events.EPOCH_COMPLETED 2024-12-03 09:06:33,061 trainer INFO: Epoch[2] Complete. Time taken: 00:00:00.023 2024-12-03 09:06:33,064 trainer DEBUG: 3 | 12, Firing handlers for event Events.EPOCH_STARTED 2024-12-03 09:06:33,066 trainer DEBUG: 3 | 12, Firing handlers for event Events.GET_BATCH_STARTED ...

You can run it in https://pytorch-ignite.ai/playground

Thank you for pointing this out!

Do you think we should add an argument skip_completed to terminate_epoch() with the same behavior of suppressing the firing of Events.EPOCH_COMPLETED? That might be useful, for example, to avoid running the evaluator, the checkpointer, and the LR scheduler if an epoch has been terminated.

I agree on checking EPOCH_COMPLETED for terminate_epoch(). However, I think that checking COMPLETED might seem to imply that Events.COMPLETED is also fired by terminate_epoch() right after Events.EPOCH_COMPLETED. What if we leave that cell empty?

Do you think we should add an argument skip_completed to terminate_epoch() with the same behavior of suppressing the firing of Events.EPOCH_COMPLETED?

Good idea! Let's make this in a separate PR such that this one does not become too large

I agree on checking EPOCH_COMPLETED for terminate_epoch(). However, I think that checking COMPLETED might seem to imply that Events.COMPLETED is also fired by terminate_epoch() right after Events.EPOCH_COMPLETED. What if we leave that cell empty?

well, I was seeing this table as showing which events are triggered when we call terminate*() functions. I did not understand v mark on COMPLETED for terminate_epoch as the sequence of triggered events, but just whether an event is triggered or not. To avoid misunderstanding we can add a column between EPOCH_COMPLETED and TERMINATE named as for example "Other events" and check it where it is appropriate. What do you think?

Good idea! Let's make this in a separate PR such that this one does not become too large

Cool, I'll create a new PR when we're done with this one

I did not understand v mark on COMPLETED for terminate_epoch as the sequence of triggered events, but just whether an event is triggered or not.

I see the point, but doesn't the firing of TERMINATE and COMPLETED independent of terminate_epoch()?
See for example this code.

from ignite.engine import Engine, Events from ignite.utils import setup_logger, logging train_data = range(10) max_epochs = 5 def train_step(engine, batch): pass trainer = Engine(train_step) # Enable trainer logger for a debug mode trainer.logger = setup_logger("trainer", level=logging.DEBUG) @trainer.on(Events.ITERATION_COMPLETED(once=12)) def call_terminate_epoch(): trainer.terminate_epoch() @trainer.on(Events.ITERATION_COMPLETED(once=15)) def call_terminate(): trainer.terminate(skip_completed=True) trainer.run(train_data, max_epochs=max_epochs)

This is probably an edge case. As a newbie of ignite, it would honestly make more sense to see no entry for TERMINATE and COMPLETED for terminate_epoch(). However, you are for sure more experienced than me, so we can go ahead with your solution if you prefer 🙂

Solution 1

Method TERMINATE_SINGLE_EPOCH EPOCH_COMPLETED TERMINATE COMPLETED

No termination x ✔ x ✔

terminate_epoch() ✔ x

terminate() x x ✔ ✔

terminate(skip_completed=True) x x ✔ x

Solution 2

Method TERMINATE_SINGLE_EPOCH EPOCH_COMPLETED TERMINATE COMPLETED

No termination x ✔ x ✔

terminate_epoch() ✔ x x ✔

terminate() x x ✔ ✔

terminate(skip_completed=True) x x ✔ x

I see the point, but doesn't the firing of TERMINATE and COMPLETED independent of terminate_epoch()?

yes, they should be independent, i would say. If this is not the case, it should be a bug...

As a newbie of ignite, it would honestly make more sense to see no entry for TERMINATE and COMPLETED for terminate_epoch(). However, you are for sure more experienced than me, so we can go ahead with your solution if you prefer 🙂

Actually, your feedback is more important as other users may think the same (and those who know how it works do not read the docs :) ) !

Initially, the goal of the table is to provide visual representation of the info on which events are triggered in each case (when call terminate*()) and also compared with regular run.
Now, seeing its content I find this tabular representation more misleading than useful as triggered events depend on where the terminate function was called.

I would propose to keep the table (either Table 1 or Table 2), and then to open a new issue to seek some input also from other developers/users and discuss the problem more in detail there.
We could for example think to a flow chart diagram or something like that...

Personally I would remove it and maybe added some code output to show exactly how it works...
OK, up to you, either let's revert the changes or keep the most understandable version and update it later.

vfdev-5

Thanks a lot for working on this PR @bonassifabio !
I have few comments how to improve a bit the tests

tests/ignite/engine/test_engine.py

vfdev-5 · 2024-12-03T10:21:00Z

@bonassifabio we have now the CI completely broken due to yesterday change of pynvml package what we should replace (temporary patch is here #3310, once it is merged, please update your PR). By the way, if you have suggestions on a similar package to be used, we are open to suggestions :)

Update requirements-dev.txt (#3310)

bonassifabio · 2024-12-03T12:32:36Z

@vfdev-5 I am not sure why some test is failing, seems unrelated to the changes in this PR. Any input?

vfdev-5 · 2024-12-03T12:35:41Z

@vfdev-5 I am not sure why some test is failing, seems unrelated to the changes in this PR. Any input?

We can ignore failing hvd tests

vfdev-5

Thanks @bonassifabio, LGTM!
We can either fix the table here or in a follow-up PR

bonassifabio · 2024-12-03T12:57:32Z

Thanks @bonassifabio, LGTM! We can either fix the table here or in a follow-up PR

Cool, applied minimal changes to the table and will open an issue to discuss how to make it more clear.

Give the option to terminate the engine without firing Events.COMPLET…

addd578

…ED. The default behaviour is not changed. Note that even though Events.COMPLETED is not fired, its timer is updated.

github-actions bot added the module: engine Engine module label Dec 2, 2024

bonassifabio mentioned this pull request Dec 2, 2024

Add flag to Engine.terminate(skip_complete_event: bool = False) #3308

Closed

vfdev-5 reviewed Dec 2, 2024

View reviewed changes

bonassifabio and others added 6 commits December 2, 2024 22:03

Update ignite/engine/engine.py

5cbca17

Co-authored-by: vfdev <[email protected]>

Update ignite/engine/engine.py

c907d6a

Co-authored-by: vfdev <[email protected]>

Update ignite/engine/engine.py

42d9da6

Co-authored-by: vfdev <[email protected]>

Update ignite/engine/engine.py

5a5babd

Co-authored-by: vfdev <[email protected]>

Update ignite/engine/events.py

b6c9bbf

Co-authored-by: vfdev <[email protected]>

Argument skip_event_completed renamed to skip_completed

f1d194a

vfdev-5 reviewed Dec 2, 2024

View reviewed changes

ignite/engine/engine.py Outdated Show resolved Hide resolved

- Fixed docs broken links.

5f2cd88

- Do not update self.state.times[Events.COMPLETED.name] if terminated - Fixed unit test

vfdev-5 reviewed Dec 2, 2024

View reviewed changes

ignite/engine/engine.py Outdated Show resolved Hide resolved

vfdev-5 reviewed Dec 2, 2024

View reviewed changes

ignite/engine/engine.py Outdated Show resolved Hide resolved

bonassifabio and others added 3 commits December 3, 2024 00:38

Update ignite/engine/engine.py

833b56e

Co-authored-by: vfdev <[email protected]>

Refactoring and patching.

9509e0b

- Engine time logging moved out of the if clause. In the log message "completed" has been replaced with "finished" to avoid confusion. - Same changes applied to the method `_internal_run_legacy()`

Restored .gitignore

44e837e

Sorry for accidentally including it into the previous commit!

vfdev-5 reviewed Dec 2, 2024

View reviewed changes

ignite/engine/events.py Outdated Show resolved Hide resolved

Update ignite/engine/events.py

3f38645

vfdev-5 reviewed Dec 3, 2024

View reviewed changes

Fixed typo in test_engine.py

17e973f

vfdev-5 reviewed Dec 3, 2024

View reviewed changes

tests/ignite/engine/test_engine.py Outdated Show resolved Hide resolved

tests/ignite/engine/test_engine.py Outdated Show resolved Hide resolved

tests/ignite/engine/test_engine.py Outdated Show resolved Hide resolved

Parametrized test for engine.terminate(skip_completed)

7e408b7

Merge pull request #1 from pytorch/master

8ecf8f0

Update requirements-dev.txt (#3310)

vfdev-5 added 2 commits December 3, 2024 13:35

Merge branch 'master' into master

1c493eb

Merge branch 'master' into master

58ad05f

vfdev-5 approved these changes Dec 3, 2024

View reviewed changes

bonassifabio added 2 commits December 3, 2024 13:55

Update event table

cf1cbcd

Merge branch 'master' of https://github.com/bonassifabio/ignite

993678e

Fixed documentation

a133dfb

vfdev-5 merged commit 6f8ad2a into pytorch:master Dec 3, 2024
17 of 18 checks passed

This was referenced Dec 3, 2024

Add flag skip_epoch_completed to Engine.terminate_epoch() #3312

Closed

Allow to terminate an epoch without firing Events.EPOCH_COMPLETED #3313

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Give the option to terminate the engine without firing Events.COMPLET… #3309

Give the option to terminate the engine without firing Events.COMPLET… #3309

bonassifabio commented Dec 2, 2024 •

edited

Loading

vfdev-5 left a comment

bonassifabio commented Dec 2, 2024

bonassifabio commented Dec 2, 2024

vfdev-5 commented Dec 2, 2024

vfdev-5 Dec 3, 2024

bonassifabio Dec 3, 2024

vfdev-5 Dec 3, 2024

bonassifabio Dec 3, 2024

vfdev-5 Dec 3, 2024

bonassifabio Dec 3, 2024

vfdev-5 Dec 3, 2024

bonassifabio Dec 3, 2024

vfdev-5 Dec 3, 2024

vfdev-5 left a comment

vfdev-5 commented Dec 3, 2024 •

edited

Loading

bonassifabio commented Dec 3, 2024

vfdev-5 commented Dec 3, 2024

vfdev-5 left a comment

bonassifabio commented Dec 3, 2024

Method	`TERMINATE_SINGLE_EPOCH`	`EPOCH_COMPLETED`	`TERMINATE`	`COMPLETED`
No termination	x	✔	x	✔
`terminate_epoch()`	✔	x	x	x
`terminate()`	x	x	✔	✔
`terminate(skip_completed=True)`	x	x	✔	x

Give the option to terminate the engine without firing Events.COMPLET… #3309

Give the option to terminate the engine without firing Events.COMPLET… #3309

Conversation

bonassifabio commented Dec 2, 2024 • edited Loading

Other changes:

Related issues:

Check list:

vfdev-5 left a comment

Choose a reason for hiding this comment

bonassifabio commented Dec 2, 2024

bonassifabio commented Dec 2, 2024

vfdev-5 commented Dec 2, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vfdev-5 left a comment

Choose a reason for hiding this comment

vfdev-5 commented Dec 3, 2024 • edited Loading

bonassifabio commented Dec 3, 2024

vfdev-5 commented Dec 3, 2024

vfdev-5 left a comment

Choose a reason for hiding this comment

bonassifabio commented Dec 3, 2024

bonassifabio commented Dec 2, 2024 •

edited

Loading

vfdev-5 commented Dec 3, 2024 •

edited

Loading