[usage.jenkins.io] TLS Certificate expires on 2025-01-28 #4508

Closed
dduportal opened this issue Jan 22, 2025 · 4 comments

Comments

@dduportal
Contributor

Service(s)

Other

Summary

Our monitoring alerted us that the TLS certificate used for https://usage.jenkins.io/ expires on 28 January 2025.

echo -n Q | openssl s_client -servername "usage.jenkins.io" -connect "usage.jenkins.io:443" | openssl x509 -noout -dates
# ...
notAfter=Jan 28 05:03:15 2025 GMT

This certificate should be renewed automatically by certbot on this machine. Something went wrong, and it needs to be diagnosed and fixed.
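The expiry check above can be turned into a number of days remaining, which is easier to compare against an alert threshold. This is only a sketch: the function names `days_until_expiry` and `fetch_not_after` are hypothetical, and it assumes GNU `date` (for `date -d`) and the `openssl` CLI are available.

```shell
#!/usr/bin/env bash
# Sketch: convert an openssl "notAfter=" line into whole days remaining.
# Assumption: GNU date is available (needed for `date -d`).

# days_until_expiry "<notAfter line>" [now_epoch]
# The optional second argument pins "now" so the result is reproducible.
days_until_expiry() {
  local not_after="${1#notAfter=}"          # strip the "notAfter=" prefix if present
  local now_epoch="${2:-$(date -u +%s)}"
  local expiry_epoch
  expiry_epoch="$(date -u -d "${not_after}" +%s)"
  echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# fetch_not_after <host>: same openssl pipeline as in the summary,
# restricted to the expiry date (-enddate prints the notAfter line only).
fetch_not_after() {
  local host="$1"
  echo -n Q \
    | openssl s_client -servername "${host}" -connect "${host}:443" 2>/dev/null \
    | openssl x509 -noout -enddate
}

# Example (requires network access):
#   days_until_expiry "$(fetch_not_after usage.jenkins.io)"
```

Pinning "now" as a second argument is a design choice for testing only; in normal use the function reads the current time.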

Reproduction steps

No response

@dduportal dduportal added the triage Incoming issues that need review label Jan 22, 2025
@dduportal dduportal added this to the infra-team-sync-2025-01-28 milestone Jan 22, 2025
@dduportal dduportal self-assigned this Jan 22, 2025
@dduportal dduportal added usage.jenkins.io letsencrypt and removed triage Incoming issues that need review labels Jan 22, 2025
@dduportal
Contributor Author

Audit:

  • The VM has a root crontab that runs certbot daily at 06:00 UTC and writes its logs to /var/log/certbot-renew-all.log
  • The logs show the following error on renewal:
Failed to renew certificate usage.jenkins.io with error: HTTPSConnectionPool(host='acme-v02.api.letsencrypt.org', port=443): Max retries exceeded with url: /directory (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7ff9b35220b0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

=> It means that:

  • Our renewal system works: it detected the upcoming certificate expiration 25 days ago and has tried to renew once a day since
  • The renewals fail because DNS resolution is broken on the machine
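The audit above can be sketched as a few small shell helpers: count the failed attempts in the certbot log, check whether they are DNS-related, and test resolution directly. The function names are hypothetical; the log path and the matched strings come from the error message quoted above.

```shell
#!/usr/bin/env bash
# Sketch of the audit steps. All function names are illustrative only.

# count_renewal_failures <logfile>: number of failed renewal attempts logged.
count_renewal_failures() {
  local log_file="$1"
  # grep -c prints 0 but exits 1 when nothing matches; `|| true` keeps that non-fatal.
  grep -c 'Failed to renew certificate' "${log_file}" || true
}

# has_dns_failure <logfile>: succeeds if any logged error mentions name resolution.
has_dns_failure() {
  grep -q 'Temporary failure in name resolution' "$1"
}

# can_resolve <hostname>: succeeds if the system resolver returns an address.
can_resolve() {
  getent hosts "$1" >/dev/null
}

# Example, run as root on the VM:
#   count_renewal_failures /var/log/certbot-renew-all.log
#   has_dns_failure /var/log/certbot-renew-all.log && echo "DNS-related failures found"
#   can_resolve acme-v02.api.letsencrypt.org || echo "resolver is broken"
```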

@dduportal
Contributor Author

Fix:

  • Restarted the systemd DNS resolution system: service systemd-resolved restart

  • Verified the fix by restarting the Puppet agent (service puppet restart) and checking the resulting logs (journalctl -u puppet -f)
    => DNS resolution re-established

  • Then, triggered the crontab command (full command). Exit code was 0 and the logs in /var/log/certbot-renew-all.log now show:

Wed Jan 22 10:16:53 AM UTC 2025
Saving debug log to /var/log/letsencrypt/letsencrypt.log

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/usage.jenkins.io.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Renewing an existing certificate for usage.jenkins.io

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Congratulations, all renewals succeeded: 
  /etc/letsencrypt/live/usage.jenkins.io/fullchain.pem (success)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
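The manual steps above (check DNS, restart the resolver, re-check, then renew) could be wrapped in a pre-flight guard so that a stuck resolver does not silently block renewals for weeks. This is a hypothetical sketch, not what is deployed on the VM: `ensure_dns`, `RESOLVE_CMD` and `RESTART_CMD` are invented names, and `systemctl restart systemd-resolved` is assumed to be equivalent to the `service` command used above.

```shell
#!/usr/bin/env bash
# Sketch: make sure DNS works before running the renewal, restarting the
# resolver once if it does not. RESOLVE_CMD and RESTART_CMD are hypothetical
# override hooks (useful for testing); the defaults are real commands.

# ensure_dns <hostname>: succeeds if <hostname> resolves, possibly after one restart.
ensure_dns() {
  local host="$1"
  # Unquoted on purpose: the default expands to a command plus its argument.
  if ${RESOLVE_CMD:-getent hosts} "${host}" >/dev/null 2>&1; then
    return 0
  fi
  echo "DNS resolution failed for ${host}; restarting systemd-resolved" >&2
  ${RESTART_CMD:-systemctl restart systemd-resolved} || return 1
  # Re-check once; the return status of this check is the function's status.
  ${RESOLVE_CMD:-getent hosts} "${host}" >/dev/null 2>&1
}

# Example, as root (the renewal command itself is the one from the crontab,
# which is not reproduced in this issue):
#   ensure_dns acme-v02.api.letsencrypt.org && <crontab renewal command>
```

A single restart attempt is deliberate: if resolution is still broken afterwards, the guard fails loudly instead of looping.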

@dduportal
Contributor Author

Confirmed that the certificate is renewed:

echo -n Q | openssl s_client -servername "usage.jenkins.io" -connect "usage.jenkins.io:443" | openssl x509 -noout -dates
# ...
notAfter=Apr 22 09:18:32 2025 GMT

@dduportal
Contributor Author

Monitoring is back to green:

Image
