Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(#9585): teach Sentinel sign language #9658

Merged
merged 16 commits into from
Nov 25, 2024
Merged

Conversation

dianabarsan
Copy link
Member

@dianabarsan dianabarsan commented Nov 19, 2024

Description

No longer start and stop sentinel to stop/start transitions and run scheduled tasks. Instead make it listen to two kill signals.
Kill the test process if API is down in the before hook. This way we might get logs when the process ends up hanging.

#9585

Code review checklist

  • Readable: Concise, well named, follows the style guide, documented if necessary.
  • Documented: Configuration and user documentation on cht-docs
  • Tested: Unit and/or e2e where appropriate
  • Internationalised: All user facing text
  • Backwards compatible: Works with existing data and configuration or includes a migration. Any breaking changes documented in the release notes.

Compose URLs

If Build CI hasn't passed, these may 404:

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.

tests/utils/index.js Fixed Show fixed Hide fixed
tests/utils/index.js Fixed Show fixed Hide fixed
tests/utils/index.js Dismissed Show dismissed Hide dismissed
@dianabarsan
Copy link
Member Author

hey @garethbowen . Wanna do a PR review for a change?

I've added signals to sentinel so we don't need to stop and start it.
It doesn't entirely stabilize the k3d suites, because it doesn't address moments when API is not starting, but for that I've added one attempt to at least stop the process from hanging and instead quit early and hopefully we'll get more info from logs.

Appreciate your time and your ideas :)

@dianabarsan dianabarsan requested review from garethbowen and latin-panda and removed request for latin-panda November 22, 2024 12:00
@dianabarsan
Copy link
Member Author

Also fat fingered @latin-panda . Sorry for the noise!

Copy link
Contributor

@garethbowen garethbowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work as always.

Looks like the successful builds are pretty quick, though you did have to re-run to get a green build. Any idea why they're still timing out?

tests/e2e/upgrade/wdio.conf.js Outdated Show resolved Hide resolved
tests/utils/allure.js Show resolved Hide resolved
tests/utils/index.js Outdated Show resolved Hide resolved
tests/utils/index.js Outdated Show resolved Hide resolved
@@ -1562,6 +1553,9 @@ const isMinimumChromeVersion = process.env.CHROME_VERSION === MINIMUM_BROWSER_VE

const escapeBranchName = (branch) => branch?.replace(/[/|_]/g, '-');

const toggleSentinelTransitions = () => sendSignal('sentinel', 'USR1');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about toggle. Wouldn't it be better to have an explicit start and an explicit stop? It would be very easy to accidentally call toggle twice (eg: test retry) which leaves sentinel in an unexpected state. This would be very hard to debug because the double call could be in one test, but the failure only show up many tests later. I think having two separate signals would make it explicit. Then we can decide if calling startProcessing when already processing would be ignored, logged, or throw an error because someone forgot to clean up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, we can't send custom signals.


module.exports = {
init: () => {
process.on('SIGUSR1', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I struggled to find any documentation on this but I think we can call the signals whatever we want, right? If so let's use clearer names. What about SIGLISTEN, SIGSTOPLISTENING, and SIGRUNTASKS? Or even SIGSENTINELLISTEN etc? It's getting long but at least it'll be clear that it's a custom signal and not a Node thing?

Also I think SIGUSR1 is reserved and according to this documentation will trigger Node into debugging mode: https://nodejs.org/en/learn/getting-started/debugging - so if we can't rename as above, let's use a different number at least.

Copy link
Member Author

@dianabarsan dianabarsan Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, you can't send custom signals:

➜  ~ kill -s SIGRUNTASKS 26360
kill: unknown signal: SIGRUNTASKS
kill: type kill -l for a list of signals

So you need to select one of the signals in the list. There are some random ones that might not kill the process, but most do, and it felt weird to send a SIGURG to stop feed listening.
This is why there's a toggle to stop/start processing, because we can't send custom signals and there are only two which are "user customizable" - SIGUSR1 and SIGUSR2, which I am using.

Copy link
Contributor

@garethbowen garethbowen Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh that's disgusting. Ok. The only other way I can see to do it would be to have a single signal which then looks somewhere else (file or db) to reload config, but that almost defeats the purpose!

Let's go with what you have.

sentinel/src/lib/feed.js Show resolved Hide resolved
@dianabarsan
Copy link
Member Author

dianabarsan commented Nov 25, 2024

though you did have to re-run to get a green build. Any idea why they're still timing out?

Yea. Sometimes API fails to start. No repro locally and because the test times out, we don't get an artifact for logs.
I added a kill switch to try to get these logs and figure out what's going on. But for now this stabilizes the Sentinel job.

@dianabarsan dianabarsan merged commit 4090a4b into master Nov 25, 2024
44 checks passed
@dianabarsan dianabarsan deleted the 9585-sentinel-start-stop branch November 25, 2024 12:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants