-
-
Notifications
You must be signed in to change notification settings - Fork 623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Give the option to terminate the engine without firing Events.COMPLET… #3309
Conversation
…ED. The default behaviour is not changed. Note that even though Events.COMPLETED is not fired, its timer is updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @bonassifabio !
I made the first pass on the code and left some comments, let me know what you think.
Co-authored-by: vfdev <[email protected]>
Co-authored-by: vfdev <[email protected]>
Co-authored-by: vfdev <[email protected]>
Co-authored-by: vfdev <[email protected]>
Co-authored-by: vfdev <[email protected]>
Thanks for your comments! At this point |
- Do not update self.state.times[Events.COMPLETED.name] if terminated - Fixed unit test
I just pushed 5f2cd88 where I fixed the unit test and hopefully the docs. I also followed your suggestion concerning the update of |
Can you also please run the following to fix code formatting problems: bash ./tests/run_code_style.sh install
bash ./tests/run_code_style.sh fmt |
Co-authored-by: vfdev <[email protected]>
- Engine time logging moved out of the if clause. In the log message "completed" has been replaced with "finished" to avoid confusion. - Same changes applied to the method `_internal_run_legacy()`
Sorry for accidentally including it into the previous commit!
- COMPLETED : triggered when engine's run is completed | ||
- COMPLETED : triggered when engine's run is completed or terminated with | ||
:meth:`~ignite.engine.engine.Engine.terminate()`, unless the flag | ||
`skip_completed` is set to True. | ||
|
||
The table below illustrates which events are triggered when various termination methods are called. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bonassifabio actually, I find myself as well this table very misleading and hard to understand.
I think we can improve it in the following way:
- EVENT_COMPLETED -> EPOCH_COMPLETED
- Let's add new column on the right after "TERMINATE" and call it "COMPLETED"
- Check symbol for
terminate()
line onTERMINATE_SINGLE_EPOCH
column is wrong actually and should be replaced withx
.
By the way, here is how it is generated now: https://deploy-preview-3309--pytorch-ignite-preview.netlify.app/generated/ignite.engine.events.events#ignite.engine.events.Events
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I was interpreting the first column as Events.COMPLETED, but from your comment it would seem that's not the case.
What if we change the table to something like this?
Method | TERMINATE_SINGLE_EPOCH |
EPOCH_COMPLETED |
TERMINATE |
COMPLETED |
---|---|---|---|---|
No termination | x | ✔ | x | ✔ |
terminate_epoch() |
✔ | x | x | x |
terminate() |
x | x | ✔ | ✔ |
terminate(skip_completed=True) |
x | x | ✔ | x |
Few comments:
- To my understanding (please correct me if I'm wrong) if
terminate()
is called, the epoch is not necessarily completed. If this statement is true, it would perhaps make more sense to moveEPOCH_COMPLETED
beforeTERMINATE
in the list of Events, as I did in the table, sinceEPOCH_COMPLETED
would almost never be fired afterEPOCH_COMPLETED
. - The columns would thus be ordered so that the first two are "epoch wise", while the last two happen at the end of the engine run. Not sure I
- I included a new row for
terminate(skip_completed=True)
. Not sure if that's really necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the new table, looks good!
Few corrections however about its content:
The line terminate_epoch()
, column "EPOCH_COMPLETED" and "COMPLETED" should be both checked as v
.
About your comments, I'm OK with all your suggestions, let's keep this order and the new line.
Checking code:
from ignite.engine import Engine, Events
from ignite.utils import setup_logger, logging
train_data = range(10)
max_epochs = 5
def train_step(engine, batch):
pass
trainer = Engine(train_step)
# Enable trainer logger for a debug mode
trainer.logger = setup_logger("trainer", level=logging.DEBUG)
@trainer.on(Events.ITERATION_COMPLETED(once=12))
def call_terminate_epoch():
trainer.terminate_epoch()
trainer.run(train_data, max_epochs=max_epochs)
Output:
...
2024-12-03 09:06:33,056 trainer INFO: Terminate current epoch is signaled. Current epoch iteration will stop after current iteration is finished.
2024-12-03 09:06:33,058 trainer DEBUG: 2 | 12, Firing handlers for event Events.TERMINATE_SINGLE_EPOCH
2024-12-03 09:06:33,060 trainer DEBUG: 2 | 12, Firing handlers for event Events.EPOCH_COMPLETED
2024-12-03 09:06:33,061 trainer INFO: Epoch[2] Complete. Time taken: 00:00:00.023
2024-12-03 09:06:33,064 trainer DEBUG: 3 | 12, Firing handlers for event Events.EPOCH_STARTED
2024-12-03 09:06:33,066 trainer DEBUG: 3 | 12, Firing handlers for event Events.GET_BATCH_STARTED
...
You can run it in https://pytorch-ignite.ai/playground
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing this out!
- Do you think we should add an argument
skip_completed
toterminate_epoch()
with the same behavior of suppressing the firing ofEvents.EPOCH_COMPLETED
? That might be useful, for example, to avoid running the evaluator, the checkpointer, and the LR scheduler if an epoch has been terminated. - I agree on checking
EPOCH_COMPLETED
forterminate_epoch()
. However, I think that checkingCOMPLETED
might seem to imply thatEvents.COMPLETED
is also fired byterminate_epoch()
right afterEvents.EPOCH_COMPLETED
. What if we leave that cell empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should add an argument skip_completed to terminate_epoch() with the same behavior of suppressing the firing of Events.EPOCH_COMPLETED?
Good idea! Let's make this in a separate PR such that this one does not become too large
I agree on checking EPOCH_COMPLETED for terminate_epoch(). However, I think that checking COMPLETED might seem to imply that Events.COMPLETED is also fired by terminate_epoch() right after Events.EPOCH_COMPLETED. What if we leave that cell empty?
well, I was seeing this table as showing which events are triggered when we call terminate*()
functions. I did not understand v
mark on COMPLETED for terminate_epoch
as the sequence of triggered events, but just whether an event is triggered or not. To avoid misunderstanding we can add a column between EPOCH_COMPLETED
and TERMINATE
named as for example "Other events" and check it where it is appropriate. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! Let's make this in a separate PR such that this one does not become too large
Cool, I'll create a new PR when we're done with this one
I did not understand v mark on COMPLETED for terminate_epoch as the sequence of triggered events, but just whether an event is triggered or not.
I see the point, but doesn't the firing of TERMINATE
and COMPLETED
independent of terminate_epoch()
?
See for example this code.
from ignite.engine import Engine, Events
from ignite.utils import setup_logger, logging
train_data = range(10)
max_epochs = 5
def train_step(engine, batch):
pass
trainer = Engine(train_step)
# Enable trainer logger for a debug mode
trainer.logger = setup_logger("trainer", level=logging.DEBUG)
@trainer.on(Events.ITERATION_COMPLETED(once=12))
def call_terminate_epoch():
trainer.terminate_epoch()
@trainer.on(Events.ITERATION_COMPLETED(once=15))
def call_terminate():
trainer.terminate(skip_completed=True)
trainer.run(train_data, max_epochs=max_epochs)
This is probably an edge case. As a newbie of ignite, it would honestly make more sense to see no entry for TERMINATE
and COMPLETED
for terminate_epoch()
. However, you are for sure more experienced than me, so we can go ahead with your solution if you prefer 🙂
Solution 1
Method | TERMINATE_SINGLE_EPOCH |
EPOCH_COMPLETED |
TERMINATE |
COMPLETED |
|
---|---|---|---|---|---|
No termination | x | ✔ | x | ✔ | |
terminate_epoch() |
✔ | x | |||
terminate() |
x | x | ✔ | ✔ | |
terminate(skip_completed=True) |
x | x | ✔ | x |
Solution 2
Method | TERMINATE_SINGLE_EPOCH |
EPOCH_COMPLETED |
TERMINATE |
COMPLETED |
|
---|---|---|---|---|---|
No termination | x | ✔ | x | ✔ | |
terminate_epoch() |
✔ | x | x | ✔ | |
terminate() |
x | x | ✔ | ✔ | |
terminate(skip_completed=True) |
x | x | ✔ | x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the point, but doesn't the firing of TERMINATE and COMPLETED independent of terminate_epoch()?
yes, they should be independent, i would say. If this is not the case, it should be a bug...
As a newbie of ignite, it would honestly make more sense to see no entry for TERMINATE and COMPLETED for terminate_epoch(). However, you are for sure more experienced than me, so we can go ahead with your solution if you prefer 🙂
Actually, your feedback is more important as other users may think the same (and those who know how it works do not read the docs :) ) !
Initially, the goal of the table is to provide visual representation of the info on which events are triggered in each case (when call terminate*()
) and also compared with regular run.
Now, seeing its content I find this tabular representation more misleading than useful as triggered events depend on where the terminate function was called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would propose to keep the table (either Table 1 or Table 2), and then to open a new issue to seek some input also from other developers/users and discuss the problem more in detail there.
We could for example think to a flow chart diagram or something like that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I would remove it and maybe added some code output to show exactly how it works...
OK, up to you, either let's revert the changes or keep the most understandable version and update it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for working on this PR @bonassifabio !
I have few comments how to improve a bit the tests
@bonassifabio we have now the CI completely broken due to yesterday change of pynvml package what we should replace (temporary patch is here #3310, once it is merged, please update your PR). By the way, if you have suggestions on a similar package to be used, we are open to suggestions :) |
Update requirements-dev.txt (#3310)
@vfdev-5 I am not sure why some test is failing, seems unrelated to the changes in this PR. Any input? |
We can ignore failing hvd tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bonassifabio, LGTM!
We can either fix the table here or in a follow-up PR
Cool, applied minimal changes to the table and will open an issue to discuss how to make it more clear. |
As per this Pytorch discussion post, This pull request aims to allow the user to terminate an engine (via
engine.terminate()
) without consequently firingEvents.COMPLETED
. This is done by passing the optional argumentskip_event_completed=True
to the methodterminate()
.While the default behavior is still to fire
Events.COMPLETED
after termination, it is useful to allow the user to do otherwise. This might be useful, for example, when one of handlers ofEvents.COMPLETED
needs to be executed only upon graceful termination of the Engine.Alternative solutions to the one proposed in this PR could be:
has_been_terminated
that is set to True whenEvents.TERMINATE
is triggeredEvents.COMPLETED
, likeEvents.COMPLETED(terminated=True)
, which would be perhaps the most "ignitic" way to add this functionality.Other changes:
Fixed the table in the Events docs.
Related issues:
#3308
Check list: