Add KaaS robustness feature tests #714

Open: cah-patrickthiem wants to merge 5 commits into main from 549-testing-kaas-robustness-features

@cah-patrickthiem cah-patrickthiem commented Aug 28, 2024

This PR adds tests for the K8s cluster robustness features defined in the SCS standard scs-0215-v1-robustness-features.
Here is a detailed listing of what is tested:

SCS-0215-v1 Robustness Features Test Coverage
1. API Server Rate Limiting
Test_scs_0215_requestLimits

  • Verifies basic request limit configurations

  • Checks API server configuration for required settings

Test_scs_0215_minRequestTimeout

  • Validates min-request-timeout setting

  • Checks configuration in API server args

Test_scs_0215_eventRateLimit

  • Confirms EventRateLimit admission controller configuration

  • Verifies plugin is enabled in API server

Test_scs_0215_apiPriorityAndFairness

  • Checks APF feature gate enablement

  • Validates API server configuration for priority and fairness

Test_scs_0215_rateLimitValues

  • Verifies specific rate limit values

  • Checks recommended settings:

    • QPS: 5000

    • Burst: 20000
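
To illustrate the kind of comparison Test_scs_0215_rateLimitValues describes, here is a minimal Python sketch. It is not the test code from this PR. The function name `check_rate_limit_values` and the dict layout (mirroring an EventRateLimit `limits` entry) are assumptions for illustration, and it assumes the check requires an exact match against the recommended values, which the description above does not state.

```python
# Recommended values taken from the list above.
RECOMMENDED_QPS = 5000
RECOMMENDED_BURST = 20000


def check_rate_limit_values(limits):
    """Return a list of problems; an empty list means every entry matches.

    `limits` is assumed to be a list of dicts with "qps" and "burst" keys
    (hypothetical layout chosen for this sketch).
    """
    if not limits:
        return ["no rate limits configured"]
    problems = []
    for index, limit in enumerate(limits):
        qps = limit.get("qps")
        burst = limit.get("burst")
        if qps != RECOMMENDED_QPS:
            problems.append(f"limits[{index}]: qps is {qps}, expected {RECOMMENDED_QPS}")
        if burst != RECOMMENDED_BURST:
            problems.append(f"limits[{index}]: burst is {burst}, expected {RECOMMENDED_BURST}")
    return problems
```

Calling it with `[{"type": "Server", "qps": 5000, "burst": 20000}]` yields an empty list, and any deviating entry is reported with its index.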

2. etcd Management
Test_scs_0215_etcdCompaction

  • Validates compaction configuration:

    • Mode: periodic

    • Retention: 8h

Test_scs_0215_etcdBackup

  • Verifies backup CronJobs setup

  • Checks backup configuration:

    • Hourly backups

    • Daily backups

    • Proper paths and schedules
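
As a rough sketch of how hourly and daily backup schedules could be told apart, the following Python snippet classifies five-field cron expressions. It is not the test code from this PR. The helper names `classify_schedule` and `backup_coverage` are hypothetical, and the rule used (fixed minute with wildcard hour means hourly, fixed minute and hour means daily) is a simplifying assumption.

```python
def classify_schedule(schedule):
    """Classify a five-field cron expression as "hourly", "daily", or "other"."""
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {schedule!r}")
    minute, hour, day_of_month, month, day_of_week = fields
    # Both patterns need a fixed minute and no day or month restriction.
    rest_is_wildcard = (day_of_month, month, day_of_week) == ("*", "*", "*")
    if not rest_is_wildcard or not minute.isdigit():
        return "other"
    if hour == "*":
        return "hourly"
    if hour.isdigit():
        return "daily"
    return "other"


def backup_coverage(schedules):
    """Report whether the given schedules include an hourly and a daily one."""
    kinds = {classify_schedule(s) for s in schedules}
    return {"hourly": "hourly" in kinds, "daily": "daily" in kinds}
```

Step expressions such as `0 */8 * * *` deliberately fall into "other", so they do not count towards either backup category in this sketch.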

3. Certificate Management
Test_scs_0215_certificateRotation

  • Check_Certificate_Rotation_Configuration:

    • Verifies kubelet certificate rotation settings

    • Validates serverTLSBootstrap and rotateCertificates

  • Check_Certificate_Controller:

    • Confirms cert-manager deployment

    • Validates certificate controller functionality
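
The kubelet part of the certificate rotation check boils down to two boolean settings. A minimal Python sketch, assuming the KubeletConfiguration has already been loaded into a dict (the function name `kubelet_rotation_problems` is hypothetical and this is not the test code from this PR):

```python
# Settings named in the certificate rotation check above.
REQUIRED_KUBELET_SETTINGS = ("serverTLSBootstrap", "rotateCertificates")


def kubelet_rotation_problems(config):
    """Return the required settings that are not explicitly set to True."""
    return [key for key in REQUIRED_KUBELET_SETTINGS if config.get(key) is not True]
```

An empty result means both settings are enabled. A missing key is reported the same way as an explicit `False`, which is a design choice of this sketch.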

@cah-patrickthiem cah-patrickthiem self-assigned this Aug 28, 2024
@cah-patrickthiem cah-patrickthiem force-pushed the 549-testing-kaas-robustness-features branch from 4e8fc4d to 6d98860 Compare October 16, 2024 13:43
@mbuechse mbuechse linked an issue Nov 4, 2024 that may be closed by this pull request
3 tasks
@cah-patrickthiem cah-patrickthiem force-pushed the 549-testing-kaas-robustness-features branch from 5c2f787 to d0c4d95 Compare November 15, 2024 11:43
@cah-patrickthiem cah-patrickthiem force-pushed the 549-testing-kaas-robustness-features branch from d0c4d95 to cbcca65 Compare November 15, 2024 11:44
@cah-patrickthiem
Author

For reference, here are the successful test logs from sonobuoy:

cat results/plugins/scs-kaas-conformance/sonobuoy_results.yaml | yq
name: scs-kaas-conformance
status: passed
meta:
  type: summary
items:
  - name: out.json
    status: passed
    meta:
      file: results/global/out.json
      type: file
    items:
      - name: Test_scs_0200_smoke
        status: passed
      - name: Test_scs_0215_requestLimits/Check_Request_Limit_Configuration
        status: passed
      - name: Test_scs_0215_requestLimits
        status: passed
      - name: Test_scs_0215_minRequestTimeout/Check_minRequestTimeout_Configuration
        status: passed
      - name: Test_scs_0215_minRequestTimeout
        status: passed
      - name: Test_scs_0215_eventRateLimit/Check_EventRateLimit_Configuration
        status: passed
      - name: Test_scs_0215_eventRateLimit
        status: passed
      - name: Test_scs_0215_apiPriorityAndFairness/Check_APF_Configuration
        status: passed
      - name: Test_scs_0215_apiPriorityAndFairness
        status: passed
      - name: Test_scs_0215_rateLimitValues/Check_Rate_Limit_Values
        status: passed
      - name: Test_scs_0215_rateLimitValues
        status: passed
      - name: Test_scs_0215_etcdCompaction/Check_Etcd_Compaction_Settings
        status: passed
      - name: Test_scs_0215_etcdCompaction
        status: passed
      - name: Test_scs_0215_etcdBackup/Check_Etcd_Backup_Configuration
        status: passed
      - name: Test_scs_0215_etcdBackup
        status: passed
      - name: Test_scs_0215_certificateRotation/Check_Certificate_Controller
        status: passed
      - name: Test_scs_0215_certificateRotation
        status: passed

[Displaying results...]
sonobuoy results *.tar.gz
Plugin: scs-kaas-conformance
Status: passed
Total: 17
Passed: 17
Failed: 0
Skipped: 0
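
For anyone who wants to cross-check the totals against the result tree, the following Python sketch counts the status of every leaf entry in a nested structure shaped like the sonobuoy_results.yaml output above. The function name `count_leaf_statuses` is hypothetical, the tree is assumed to be already loaded into dicts and lists, and counting leaves is only an assumption about how the summary relates to the tree, not a description of sonobuoy's internals.

```python
from collections import Counter


def count_leaf_statuses(node):
    """Count the "status" values of all entries that have no child items."""
    children = node.get("items") or []
    if not children:
        # A leaf contributes exactly one status to the tally.
        return Counter({node.get("status", "unknown"): 1})
    totals = Counter()
    for child in children:
        totals += count_leaf_statuses(child)
    return totals
```

Applied to the tree shown above, the leaves are the 17 entries under out.json, which is consistent with the reported total.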

@cah-patrickthiem
Author

cah-patrickthiem commented Nov 15, 2024

To make the tests pass on your K8s cluster, apply the following configurations:

  1. API Server Configuration
    Location: /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    # Admission Control
    - --enable-admission-plugins=NodeRestriction,EventRateLimit
    - --admission-control-config-file=/etc/kubernetes/admission-config.yaml
    # API Priority
    - --feature-gates=APIPriorityAndFairness=true
    - --enable-priority-and-fairness=true
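
A quick way to sanity-check a manifest like the one above is to parse the container command into flags and look for the settings. The Python sketch below does that on a plain list of arguments. The helper names `parse_flags` and `missing_settings` are hypothetical, this is not the test code from this PR, and which settings the real tests require is an assumption. The sketch treats `--enable-priority-and-fairness` as enabled when the flag is absent, since that is the kube-apiserver default.

```python
def parse_flags(command):
    """Turn ["--name=value", ...] into {"name": "value"}, skipping non-flag items."""
    flags = {}
    for arg in command:
        if not arg.startswith("--"):
            continue
        # Split on the first "=" only, so values such as "Gate=true" stay intact.
        name, _, value = arg[2:].partition("=")
        flags[name] = value
    return flags


def missing_settings(command):
    """Return descriptions of settings from the manifest above that are absent."""
    flags = parse_flags(command)
    missing = []
    plugins = flags.get("enable-admission-plugins", "").split(",")
    if "EventRateLimit" not in plugins:
        missing.append("EventRateLimit admission plugin is not enabled")
    if not flags.get("admission-control-config-file"):
        missing.append("admission-control-config-file is not set")
    if flags.get("enable-priority-and-fairness", "true") != "true":
        missing.append("API priority and fairness is disabled")
    return missing
```

Feeding it the command list from the manifest above returns an empty list.
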
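  2. Admission Configuration
    Location: /etc/kubernetes/admission-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: EventRateLimit
  path: /etc/kubernetes/event-ratelimit-config.yaml
    Location: /etc/kubernetes/event-ratelimit-config.yaml
kind: Configuration
apiVersion: eventratelimit.admission.k8s.io/v1alpha1
limits:
- burst: 20000
  qps: 5000
  type: Server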
  3. etcd Configuration
    Location: /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --auto-compaction-mode=periodic
    - --auto-compaction-retention=8h
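
To show how the two compaction flags above could be compared against the expected values (mode periodic, retention 8h), here is a small Python sketch. The helper names `retention_hours` and `compaction_matches` are hypothetical and this is not the test code from this PR. It assumes a bare integer retention means hours in periodic mode and only handles single-unit values such as `8h` or `480m`, not composite durations like `1h30m`.

```python
import re

SECONDS_PER_UNIT = {"h": 3600, "m": 60, "s": 1}


def retention_hours(value):
    """Convert a retention value such as "8h", "480m", or "8" to hours."""
    match = re.fullmatch(r"(\d+)([hms]?)", value.strip())
    if match is None:
        raise ValueError(f"unsupported retention value: {value!r}")
    amount = int(match.group(1))
    unit = match.group(2) or "h"  # assumption: no suffix means hours
    return amount * SECONDS_PER_UNIT[unit] / 3600


def compaction_matches(args, mode="periodic", hours=8):
    """Check etcd command arguments for the expected compaction settings."""
    settings = dict(
        arg[2:].split("=", 1) for arg in args if arg.startswith("--") and "=" in arg
    )
    retention = settings.get("auto-compaction-retention")
    if settings.get("auto-compaction-mode") != mode or retention is None:
        return False
    return retention_hours(retention) == hours
```

With the etcd command from the manifest above, `compaction_matches` returns `True`.
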
  4. Kubelet Configuration
    Location: /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
serverTLSBootstrap: true
rotateCertificates: true
  5. Certificate Management
    Install cert-manager:
    kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml
  6. etcd Backup CronJobs
    File: etcd-cronjobs.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
 name: etcd-backup-hourly
spec:
 schedule: "0 * * * *"
 jobTemplate:
   spec:
     template:
       spec:
         # etcd is reached via the node's loopback address, so the pod shares the host network.
         hostNetwork: true
         containers:
         - name: etcd-backup
           image: registry.k8s.io/etcd:3.5.9-0
           command:
           - /bin/sh
           - -c
           - |
             ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
               --cacert=/etc/kubernetes/pki/etcd/ca.crt \
               --cert=/etc/kubernetes/pki/etcd/server.crt \
               --key=/etc/kubernetes/pki/etcd/server.key \
               snapshot save /backup/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db
           volumeMounts:
           - name: etcd-certs
             mountPath: /etc/kubernetes/pki/etcd
             readOnly: true
           - name: backup
             mountPath: /backup
         volumes:
         - name: etcd-certs
           hostPath:
             path: /etc/kubernetes/pki/etcd
             type: Directory
         - name: backup
           hostPath:
             path: /var/lib/etcd/backup/hourly
             type: DirectoryOrCreate
         restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
 name: etcd-backup-daily
spec:
 schedule: "0 0 * * *"
 jobTemplate:
   spec:
     template:
       spec:
         # etcd is reached via the node's loopback address, so the pod shares the host network.
         hostNetwork: true
         containers:
         - name: etcd-backup
           image: registry.k8s.io/etcd:3.5.9-0
           command:
           - /bin/sh
           - -c
           - |
             ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
               --cacert=/etc/kubernetes/pki/etcd/ca.crt \
               --cert=/etc/kubernetes/pki/etcd/server.crt \
               --key=/etc/kubernetes/pki/etcd/server.key \
               snapshot save /backup/etcd-snapshot-$(date +%Y%m%d).db
           volumeMounts:
           - name: etcd-certs
             mountPath: /etc/kubernetes/pki/etcd
             readOnly: true
           - name: backup
             mountPath: /backup
         volumes:
         - name: etcd-certs
           hostPath:
             path: /etc/kubernetes/pki/etcd
             type: Directory
         - name: backup
           hostPath:
             path: /var/lib/etcd/backup/daily
             type: DirectoryOrCreate
         restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
 name: etcd-compaction
spec:
 schedule: "0 */8 * * *"
 jobTemplate:
   spec:
     template:
       spec:
         # etcd is reached via the node's loopback address, so the pod shares the host network.
         hostNetwork: true
         containers:
         - name: etcd-compaction
           image: registry.k8s.io/etcd:3.5.9-0
           command:
           - /bin/sh
           - -c
           - |
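             # Export the connection settings so both etcdctl calls below share them.
             export ETCDCTL_API=3
             export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
             export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
             export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
             export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
             # Read the current revision from the endpoint status, then compact up to it.
             rev=$(etcdctl endpoint status --write-out="json" | grep -o '"revision":[0-9]*' | grep -o '[0-9]*$')
             etcdctl compact "$rev"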
           volumeMounts:
           - name: etcd-certs
             mountPath: /etc/kubernetes/pki/etcd
             readOnly: true
         volumes:
         - name: etcd-certs
           hostPath:
             path: /etc/kubernetes/pki/etcd
             type: Directory
         restartPolicy: OnFailure

Apply via kubectl:
kubectl apply -f etcd-cronjobs.yaml

@cah-patrickthiem cah-patrickthiem marked this pull request as ready for review November 15, 2024 11:56
@cah-patrickthiem
Author

For reference, I used a self-configured kubeadm cluster to develop these tests.

@mbuechse
Contributor

Impressive! I'm not sure I am competent to review it, but I will give it a shot. About these preconditions, wouldn't it be good to put them into a 'Testing and implementation notes' supplement? This can happen within this same PR.

@mbuechse
Contributor

> For reference, I used a self-configured kubeadm cluster to develop these tests.

Impressive again! Just for increased safety, could you please also test on moin once we have the necessary permissions?

@cah-patrickthiem
Author

> Impressive! I'm not sure I am competent to review it, but I will give it a shot. About these preconditions, wouldn't it be good to put them into a 'Testing and implementation notes' supplement? This can happen within this same PR.

I talked about including the configurations with @tonifinger. We came to the same conclusion. Also, my guess is that there will be more configuration snippets from the other tested features in other PRs.

@cah-patrickthiem
Author

> > For reference, I used a self-configured kubeadm cluster to develop these tests.
>
> Impressive again! Just for increased safety, could you please also test on moin once we have the necessary permissions?

Sure, I can do that.

Successfully merging this pull request may close these issues.

[Testing] KaaS Robustness features
2 participants