WIP: Service rewrite #99

Draft
wants to merge 117 commits into
base: main

Contributor

1ntEgr8 commented Nov 4, 2024

Depends on #97, hence the larger-than-expected diff.

1ntEgr8 and others added 30 commits September 23, 2024 08:43
- update placement time of pushed placement
- dedup scheduler events with the same timestamp
Dhruv Garg and others added 13 commits December 4, 2024 01:21
Fixes a hang in run_service_experiments.py: if start-master.sh or
start-worker.sh exits with a nonzero code, or more generally if
Service.__enter__() raises an exception, the script hangs instead of
reporting the exception.
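
For illustration, a minimal Python sketch of the failure mode this commit targets. The `Service` class, its `start_commands` argument, and `run_experiment` below are hypothetical stand-ins, not the project's code, and the commit message does not say what actually caused the hang. The sketch only shows one relevant Python behavior: `__exit__` is not called when `__enter__` raises, so any cleanup of partially started processes has to happen inside `__enter__` before re-raising.

```python
import subprocess
import sys


class Service:
    """Hypothetical stand-in for the service wrapper (not the project's class)."""

    def __init__(self, start_commands):
        self._start_commands = start_commands
        self.started = []  # commands that completed successfully

    def __enter__(self):
        try:
            for cmd in self._start_commands:
                # check=True raises CalledProcessError on a nonzero exit code.
                subprocess.run(cmd, check=True)
                self.started.append(cmd)
        except Exception:
            # __exit__ is not invoked when __enter__ raises, so clean up here
            # and re-raise so the caller sees the original exception.
            self._stop()
            raise
        return self

    def __exit__(self, exc_type, exc, tb):
        self._stop()
        return False  # never swallow exceptions raised in the with-body

    def _stop(self):
        # Placeholder teardown; a real service would stop its processes here.
        self.started.clear()


def run_experiment(start_commands):
    """Run one experiment and report startup failures instead of hanging."""
    try:
        with Service(start_commands):
            return "ran"
    except subprocess.CalledProcessError as e:
        return f"startup failed with exit code {e.returncode}"


if __name__ == "__main__":
    ok = [sys.executable, "-c", "pass"]
    bad = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_experiment([ok]))
    print(run_experiment([ok, bad]))
```

The point of the design is that the exception always propagates to the caller, which can then log it and exit rather than wait on a service that never came up.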
When the last application is deregistered from the Spark service,
execute all remaining events from the simulator.  This allows the
final LOG_STATS event to be processed so we can calculate the SLO
attainment.

Unlike normal runs of the simulator, a SIMULATOR_END event is not
inserted as some tasks might not have finished in the simulator and it's
unclear when they will finish.  The simulator is patched to allow an
empty event queue in Simulator.simulate().
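
A rough sketch of the draining behavior described above, assuming a heap-ordered event queue. The `Event` dataclass, the `EventType` enum, and the `allow_empty` flag are illustrative names only; the real `Simulator.simulate()` signature is not shown in this PR description and may differ.

```python
import heapq
from dataclasses import dataclass, field
from enum import Enum, auto


class EventType(Enum):
    TASK_FINISH = auto()
    LOG_STATS = auto()
    SIMULATOR_END = auto()


@dataclass(order=True)
class Event:
    time: int
    # Exclude the type from ordering so events sort by time only.
    type: EventType = field(compare=False)


def simulate(queue, allow_empty=False):
    """Process events in time order and return the event types handled.

    With allow_empty=False, running out of events before SIMULATOR_END is
    an error. With allow_empty=True (the assumed service behavior), an
    empty queue simply ends the run.
    """
    handled = []
    while True:
        if not queue:
            if allow_empty:
                return handled
            raise RuntimeError("event queue emptied before SIMULATOR_END")
        event = heapq.heappop(queue)
        handled.append(event.type)
        if event.type is EventType.SIMULATOR_END:
            return handled


# Drain whatever is left after the last application deregisters.
remaining = [Event(5, EventType.LOG_STATS), Event(2, EventType.TASK_FINISH)]
heapq.heapify(remaining)
drained = simulate(remaining, allow_empty=True)
```

Here the final LOG_STATS event is still processed even though no SIMULATOR_END event was ever queued.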
On a TASK_FINISH event, set the task completion time to the time of
the event rather than the last time the task was stepped.  Resolves a
bug in the service where tasks that finish later than the simulator's
profiled runtime predicts get assigned the wrong completion time.
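
To make the completion-time fix concrete, here is a small sketch. The `Task` class and its fields are hypothetical, and it assumes that stepping never advances a task past its profiled finish time; the commit message does not describe the simulator's stepping logic, so treat that part as a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Hypothetical task model; field names are illustrative only."""

    start_time: int
    profiled_runtime: int
    last_step_time: int = 0
    completion_time: Optional[int] = None

    def step(self, now: int) -> None:
        # Assumption: stepping stops at the profiled finish time.
        self.last_step_time = min(now, self.start_time + self.profiled_runtime)

    def finish(self, event_time: int) -> None:
        # Use the TASK_FINISH event's timestamp, not last_step_time, so a
        # task that overruns its profile gets its real completion time.
        self.completion_time = event_time


# A task profiled at 10 time units that actually finishes at t=25.
task = Task(start_time=0, profiled_runtime=10)
task.step(now=25)
task.finish(event_time=25)
```

Under this assumption, `last_step_time` stays at 10 while `completion_time` is 25, which is the discrepancy the commit describes for tasks that run longer than profiled.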
2 participants