Add cni-repair-controller to linkerd-cni DaemonSet (#11699)
Followup to linkerd/linkerd2-proxy-init#306
Fixes #11073

This adds the `repair-controller` container to the `linkerd-cni`
DaemonSet, along with its config in `values.yaml`.

The `linkerd-cni` image version is also bumped (from `v1.2.2` to `v1.3.0`)
so that it contains the new binary for this controller.
alpeb authored and adleong committed Jan 18, 2024
1 parent 3af6563 commit 73a43f2
Showing 5 changed files with 122 additions and 19 deletions.
14 changes: 12 additions & 2 deletions charts/linkerd2-cni/README.md
@@ -31,7 +31,7 @@ Kubernetes: `>=1.21.0-0`
| ignoreOutboundPorts | string | `""` | Default set of outbound ports to skip via iptables |
| image.name | string | `"cr.l5d.io/linkerd/cni-plugin"` | Docker image for the CNI plugin |
| image.pullPolicy | string | `"IfNotPresent"` | Pull policy for the linkerd-cni container |
| image.version | string | `"v1.2.2"` | Tag for the CNI container Docker image |
| image.version | string | `"v1.3.0"` | Tag for the CNI container Docker image |
| imagePullSecrets | list | `[]` | |
| inboundProxyPort | int | `4143` | Inbound port for the proxy container |
| logLevel | string | `"info"` | Log level for the CNI plugin |
@@ -43,7 +43,17 @@ Kubernetes: `>=1.21.0-0`
| proxyAdminPort | int | `4191` | Admin port for the proxy container |
| proxyControlPort | int | `4190` | Control port for the proxy container |
| proxyUID | int | `2102` | User id under which the proxy shall be run |
| resources | object | `{"cpu":{"limit":"","request":""},"ephemeral-storage":{"limit":"","request":""},"memory":{"limit":"","request":""}}` | Resource requests and limits for linkerd-cni daemonset containers |
| repairController.enableSecurityContext | bool | `true` | Include a securityContext in the repair-controller container |
| repairController.enabled | bool | `false` | Enables the repair-controller container |
| repairController.logFormat | string | plain | Log format (`plain` or `json`) for the repair-controller container |
| repairController.logLevel | string | info | Log level for the repair-controller container |
| repairController.resources.cpu.limit | string | `""` | Maximum amount of CPU units that the repair-controller container can use |
| repairController.resources.cpu.request | string | `""` | Amount of CPU units that the repair-controller container requests |
| repairController.resources.ephemeral-storage.limit | string | `""` | Maximum amount of ephemeral storage that the repair-controller container can use |
| repairController.resources.ephemeral-storage.request | string | `""` | Amount of ephemeral storage that the repair-controller container requests |
| repairController.resources.memory.limit | string | `""` | Maximum amount of memory that the repair-controller container can use |
| repairController.resources.memory.request | string | `""` | Amount of memory that the repair-controller container requests |
| resources | object | `{"cpu":{"limit":"","request":""},"ephemeral-storage":{"limit":"","request":""},"memory":{"limit":"","request":""}}` | Resource requests and limits for linkerd-cni daemonset container |
| resources.cpu.limit | string | `""` | Maximum amount of CPU units that the cni container can use |
| resources.cpu.request | string | `""` | Amount of CPU units that the cni container requests |
| resources.ephemeral-storage.limit | string | `""` | Maximum amount of ephemeral storage that the cni container can use |
61 changes: 61 additions & 0 deletions charts/linkerd2-cni/templates/cni-plugin.yaml
@@ -112,6 +112,14 @@ rules:
- apiGroups: [""]
  resources: ["pods", "nodes", "namespaces", "services"]
  verbs: ["list", "get", "watch"]
{{- if .Values.repairController.enabled }}
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["delete"]
- apiGroups: ["events.k8s.io"]
  resources: ["events"]
  verbs: ["create"]
{{- end }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
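The effect of the conditional rules in this hunk can be illustrated with a small Go sketch. It models only the rules visible here and checks which requests they would permit when `repairController.enabled` is toggled. The `PolicyRule` type and the `allowed` and `rulesFor` helpers are hypothetical simplifications written for this note; real RBAC evaluation also handles wildcards, resource names and any other rules in the ClusterRole, none of which is modelled.

```go
package main

import "fmt"

// PolicyRule is a hypothetical, simplified stand-in for an RBAC rule.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether s is present in list.
func contains(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// allowed reports whether any rule grants verb on resource in the given API group.
func allowed(rules []PolicyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

// rulesFor mirrors the rules shown in the hunk for a given value of
// repairController.enabled.
func rulesFor(repairEnabled bool) []PolicyRule {
	rules := []PolicyRule{
		{
			APIGroups: []string{""},
			Resources: []string{"pods", "nodes", "namespaces", "services"},
			Verbs:     []string{"list", "get", "watch"},
		},
	}
	if repairEnabled {
		rules = append(rules,
			PolicyRule{APIGroups: []string{""}, Resources: []string{"pods"}, Verbs: []string{"delete"}},
			PolicyRule{APIGroups: []string{"events.k8s.io"}, Resources: []string{"events"}, Verbs: []string{"create"}},
		)
	}
	return rules
}

func main() {
	for _, enabled := range []bool{false, true} {
		fmt.Printf("enabled=%v delete pods: %v\n", enabled, allowed(rulesFor(enabled), "", "pods", "delete"))
	}
	// prints:
	// enabled=false delete pods: false
	// enabled=true delete pods: true
}
```

Gating the extra rules on the same flag as the container keeps the DaemonSet's service account from holding pod-deletion rights when the controller is not deployed.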
@@ -274,6 +282,59 @@ spec:
        {{- if .Values.resources }}
        {{- include "partials.resources" .Values.resources | nindent 8 }}
        {{- end }}
      {{- if .Values.repairController.enabled }}
      # This container watches over pods whose linkerd-network-validator
      # container failed, probably because of a race condition while setting up
      # the CNI plugin chain, and deletes those pods so they can try acquiring a
      # proper network config again
      - name: repair-controller
        image: {{ .Values.image.name -}}:{{- .Values.image.version }}
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        env:
        - name: LINKERD_CNI_REPAIR_CONTROLLER_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: LINKERD_CNI_REPAIR_CONTROLLER_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /usr/lib/linkerd/linkerd-cni-repair-controller
        args:
        - --admin-addr=0.0.0.0:9990
        - --log-format
        - {{ .Values.repairController.logFormat }}
        - --log-level
        - {{ .Values.repairController.logLevel }}
        livenessProbe:
          httpGet:
            path: /live
            port: admin-http
        readinessProbe:
          failureThreshold: 7
          httpGet:
            path: /ready
            port: admin-http
          initialDelaySeconds: 10
        ports:
        - containerPort: 9990
          name: admin-http
        {{- if .Values.repairController.enableSecurityContext }}
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          privileged: false
          readOnlyRootFilesystem: true
          seccompProfile:
            type: RuntimeDefault
        {{- end }}
        {{- if .Values.repairController.resources }}
        {{- include "partials.resources" .Values.repairController.resources | nindent 8 }}
        {{- end }}
      {{- end }}
      volumes:
      {{- if ne .Values.destCNIBinDir .Values.destCNINetDir }}
      - name: cni-bin-dir
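To make the job described in the template comment concrete, here is a minimal Go sketch of the selection step: among the pods on the controller's own node, pick those whose `linkerd-network-validator` container failed, so that they can be deleted and recreated. The `Pod` and `ContainerStatus` types and the `podsToRepair` function are hypothetical simplifications, and two further points are assumptions: that the validator is reported among the pod's init containers, and that failure means a non-zero exit code. The real controller ships as the `linkerd-cni-repair-controller` binary and talks to the Kubernetes API, which this sketch does not do.

```go
package main

import "fmt"

// ContainerStatus is a hypothetical, trimmed-down view of a container's state.
type ContainerStatus struct {
	Name       string
	Terminated bool
	ExitCode   int
}

// Pod is a hypothetical, trimmed-down view of a pod.
type Pod struct {
	Name           string
	NodeName       string
	InitContainers []ContainerStatus // assumption: the validator is listed here
}

const validatorName = "linkerd-network-validator"

// podsToRepair returns the names of pods scheduled on nodeName whose
// validator container terminated with a non-zero exit code.
func podsToRepair(nodeName string, pods []Pod) []string {
	var out []string
	for _, p := range pods {
		if p.NodeName != nodeName {
			continue // each DaemonSet replica only looks at its own node
		}
		for _, c := range p.InitContainers {
			if c.Name == validatorName && c.Terminated && c.ExitCode != 0 {
				out = append(out, p.Name)
				break
			}
		}
	}
	return out
}

func main() {
	pods := []Pod{
		{Name: "web-1", NodeName: "node-a", InitContainers: []ContainerStatus{{Name: validatorName, Terminated: true, ExitCode: 1}}},
		{Name: "web-2", NodeName: "node-a", InitContainers: []ContainerStatus{{Name: validatorName, Terminated: true, ExitCode: 0}}},
		{Name: "web-3", NodeName: "node-b", InitContainers: []ContainerStatus{{Name: validatorName, Terminated: true, ExitCode: 1}}},
	}
	fmt.Println(podsToRepair("node-a", pods)) // prints "[web-1]"
}
```

In the sketch, the node name plays the role of the `LINKERD_CNI_REPAIR_CONTROLLER_NODE_NAME` environment variable that the template fills in from `spec.nodeName`.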
54 changes: 38 additions & 16 deletions charts/linkerd2-cni/values.yaml
@@ -53,7 +53,7 @@ image:
  # -- Docker image for the CNI plugin
  name: "cr.l5d.io/linkerd/cni-plugin"
  # -- Tag for the CNI container Docker image
  version: "v1.2.2"
  version: "v1.3.0"
  # -- Pull policy for the linkerd-cni container
  pullPolicy: IfNotPresent

@@ -71,22 +71,44 @@ imagePullSecrets: []

# -- Add additional initContainers to the daemonset
extraInitContainers: []
# - name: wait-for-other-cni
# image: busybox:1.33
# command:
# - /bin/sh
# - -xc
# - |
# for i in $(seq 1 180); do
# test -f /host/etc/cni/net.d/10-aws.conflist && exit 0
# sleep 1
# done
# exit 1
# volumeMounts:
# - mountPath: /host/etc/cni/net.d
# name: cni-net-dir

# -- Resource requests and limits for linkerd-cni daemonset containers
# The cni-repair-controller scans pods in each node to find those that have
# been injected by linkerd, and whose linkerd-network-validator container has
# failed. This is usually caused by a race between linkerd-cni and the CNI
# plugin used in the cluster. This controller deletes those failed pods so they
# can restart and retry acquiring a proper network config.
repairController:
  # -- Enables the repair-controller container
  enabled: false

  # -- Log level for the repair-controller container
  # @default -- info
  logLevel: info
  # -- Log format (`plain` or `json`) for the repair-controller container
  # @default -- plain
  logFormat: plain

  # -- Include a securityContext in the repair-controller container
  enableSecurityContext: true

  resources:
    cpu:
      # -- Maximum amount of CPU units that the repair-controller container can use
      limit: ""
      # -- Amount of CPU units that the repair-controller container requests
      request: ""
    memory:
      # -- Maximum amount of memory that the repair-controller container can use
      limit: ""
      # -- Amount of memory that the repair-controller container requests
      request: ""
    ephemeral-storage:
      # -- Maximum amount of ephemeral storage that the repair-controller container can use
      limit: ""
      # -- Amount of ephemeral storage that the repair-controller container requests
      request: ""

# -- Resource requests and limits for linkerd-cni daemonset container
resources:
  cpu:
    # -- Maximum amount of CPU units that the cni container can use
2 changes: 1 addition & 1 deletion cli/cmd/testdata/install_cni_helm_default_output.golden

10 changes: 10 additions & 0 deletions pkg/charts/cni/values.go
@@ -35,6 +35,15 @@ type Resources struct {
EphemeralStorage Constraints `json:"ephemeral-storage"`
}

// RepairController contains the config for the repair-controller container
type RepairController struct {
	Image                 Image     `json:"image"`
	LogLevel              string    `json:"logLevel"`
	LogFormat             string    `json:"logFormat"`
	EnableSecurityContext bool      `json:"enableSecurityContext"`
	Resources             Resources `json:"resources"`
}

// Values contains the top-level elements in the cni Helm chart
type Values struct {
InboundProxyPort uint `json:"inboundProxyPort"`
@@ -60,6 +69,7 @@ type Values struct {
EnablePSP bool `json:"enablePSP"`
Privileged bool `json:"privileged"`
Resources Resources `json:"resources"`
RepairController RepairController `json:"repairController"`
}

// NewValues returns a new instance of the Values type.
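As an illustration of how the JSON tags on these types line up with the keys in `values.yaml`, the following Go sketch decodes a small override document into trimmed-down copies of the structs. Only `EphemeralStorage` and the `RepairController` fields are taken from the diff. The `Constraints` fields, the `CPU` and `Memory` fields and the `parseValues` helper are assumptions, inferred from the `limit`/`request` and `cpu`/`memory` keys in `values.yaml`; the `Image` field is omitted for brevity. The sketch uses JSON input with the standard library rather than whatever loader the chart package actually uses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Constraints is an assumed shape, based on the limit/request keys in values.yaml.
type Constraints struct {
	Limit   string `json:"limit"`
	Request string `json:"request"`
}

// Resources is a trimmed copy; only EphemeralStorage appears in the diff,
// CPU and Memory are assumptions.
type Resources struct {
	CPU              Constraints `json:"cpu"`
	Memory           Constraints `json:"memory"`
	EphemeralStorage Constraints `json:"ephemeral-storage"`
}

// RepairController mirrors the struct added in the diff, minus the Image field.
type RepairController struct {
	LogLevel              string    `json:"logLevel"`
	LogFormat             string    `json:"logFormat"`
	EnableSecurityContext bool      `json:"enableSecurityContext"`
	Resources             Resources `json:"resources"`
}

// Values keeps only the field relevant to this example.
type Values struct {
	RepairController RepairController `json:"repairController"`
}

// parseValues is a hypothetical helper that decodes a JSON values document.
func parseValues(raw []byte) (Values, error) {
	var v Values
	err := json.Unmarshal(raw, &v)
	return v, err
}

func main() {
	raw := []byte(`{
	  "repairController": {
	    "logLevel": "debug",
	    "logFormat": "json",
	    "enableSecurityContext": true,
	    "resources": {"memory": {"limit": "50Mi"}}
	  }
	}`)
	v, err := parseValues(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(v.RepairController.LogLevel, v.RepairController.Resources.Memory.Limit) // prints "debug 50Mi"
}
```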