Merge branch 'master' into riscv64
chazapis committed Nov 1, 2024
2 parents cc2d56d + 4adcdf8 commit 7e82786
Showing 50 changed files with 1,987 additions and 411 deletions.
2 changes: 1 addition & 1 deletion .drone.yml
@@ -719,7 +719,7 @@ steps:
UPGRADE_CHANNEL="latest"
fi
fi
E2E_RELEASE_CHANNEL=$UPGRADE_CHANNEL go test -v -timeout=45m ./upgradecluster_test.go -ci -local
E2E_RELEASE_CHANNEL=$UPGRADE_CHANNEL go test -v -timeout=45m ./upgradecluster_test.go -ci -local -ginkgo.v
cp ./coverage.out /tmp/artifacts/upgrade-coverage.out
fi
- docker stop registry && docker rm registry
8 changes: 2 additions & 6 deletions .github/actions/vagrant-setup/action.yaml
@@ -8,26 +8,22 @@ runs:
run: |
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo sed -i 's/^# deb-src/deb-src/' /etc/apt/sources.list
- name: Install vagrant and libvirt
shell: bash
run: |
sudo apt-get update
sudo apt-get install -y libvirt-daemon libvirt-daemon-system vagrant
sudo apt-get install -y libvirt-daemon libvirt-daemon-system vagrant ruby-libvirt
sudo systemctl enable --now libvirtd
- name: Build vagrant dependencies
- name: Install vagrant dependencies
shell: bash
run: |
sudo apt-get build-dep -y vagrant ruby-libvirt
sudo apt-get install -y --no-install-recommends libxslt-dev libxml2-dev libvirt-dev ruby-bundler ruby-dev zlib1g-dev
# This is a workaround for the libvirt group not being available in the current shell
# https://github.com/actions/runner-images/issues/7670#issuecomment-1900711711
- name: Make the libvirt socket rw accessible to everyone
shell: bash
run: |
sudo chmod a+rw /var/run/libvirt/libvirt-sock

- name: Install vagrant-libvirt plugin
shell: bash
run: vagrant plugin install vagrant-libvirt
4 changes: 3 additions & 1 deletion .github/workflows/e2e.yaml
@@ -9,6 +9,7 @@ on:
- "!tests/e2e**"
- "!tests/docker**"
- ".github/**"
- "!.github/actions/**"
- "!.github/workflows/e2e.yaml"
pull_request:
paths-ignore:
@@ -19,6 +20,7 @@ on:
- "!tests/e2e**"
- "!tests/docker**"
- ".github/**"
- "!.github/actions/**"
- "!.github/workflows/e2e.yaml"
workflow_dispatch: {}

@@ -33,7 +35,7 @@ jobs:
e2e:
name: "E2E Tests"
needs: build
runs-on: ubuntu-latest
runs-on: ubuntu-24.04
timeout-minutes: 40
strategy:
fail-fast: false
6 changes: 3 additions & 3 deletions .github/workflows/integration.yaml
@@ -38,7 +38,7 @@ jobs:
strategy:
fail-fast: false
matrix:
itest: [certrotation, etcdrestore, localstorage, startup, custometcdargs, etcdsnapshot, kubeflags, longhorn, secretsencryption, flannelnone]
itest: [certrotation, cacertrotation, etcdrestore, localstorage, startup, custometcdargs, etcdsnapshot, kubeflags, longhorn, secretsencryption, flannelnone]
max-parallel: 3
steps:
- name: Checkout
@@ -56,7 +56,7 @@ jobs:
run: |
chmod +x ./dist/artifacts/k3s
mkdir -p $GOCOVERDIR
sudo -E env "PATH=$PATH" go test -v -timeout=45m ./tests/integration/${{ matrix.itest }}/... -run Integration
sudo -E env "PATH=$PATH" go test -timeout=45m ./tests/integration/${{ matrix.itest }}/... -run Integration -ginkgo.v -test.v
- name: On Failure, Launch Debug Session
uses: lhotari/action-upterm@v1
if: ${{ failure() }}
@@ -71,4 +71,4 @@ jobs:
token: ${{ secrets.CODECOV_TOKEN }}
files: ./${{ matrix.itest }}.out
flags: inttests # optional
verbose: true # optional (default = false)
verbose: true # optional (default = false)
10 changes: 8 additions & 2 deletions .github/workflows/trivy.yaml
@@ -40,13 +40,19 @@ jobs:
make package-image
make tag-image-latest
- name: Download Rancher's VEX Hub report
run: curl -fsSO https://raw.githubusercontent.com/rancher/vexhub/refs/heads/main/reports/rancher.openvex.json

- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@0.24.0
uses: aquasecurity/trivy-action@0.28.0
with:
image-ref: 'rancher/k3s:latest'
format: 'table'
severity: "HIGH,CRITICAL"
output: "trivy-report.txt"
env:
TRIVY_VEX: rancher.openvex.json
TRIVY_SHOW_SUPPRESSED: true

- name: Upload Trivy Report
uses: actions/upload-artifact@v4
@@ -93,4 +99,4 @@ jobs:
steps:
- name: Report Failure
run: |
gh issue comment ${{ github.event.issue.number }} -b ":x: Trivy scan action failed, check logs :x:"
gh issue comment ${{ github.event.issue.number }} -b ":x: Trivy scan action failed, check logs :x:"
2 changes: 1 addition & 1 deletion .github/workflows/unitcoverage.yaml
@@ -28,7 +28,7 @@ permissions:
jobs:
test:
name: Unit Tests
runs-on: ubuntu-latest
runs-on: ubuntu-24.04
timeout-minutes: 20
steps:
- name: Checkout
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -14,9 +14,9 @@ If you're interested in contributing documentation, please note the following:

If you're interested in contributing new tests, please see the [TESTING.md](./tests/TESTING.md).

## Code Convetion
## Code Convention

See the [code convetions documentation](./docs/contrib/code_conventions.md) for more information on how to write code for K3s.
See the [code conventions documentation](./docs/contrib/code_conventions.md) for more information on how to write code for K3s.

### Opening PRs and organizing commits
PRs should generally address only 1 issue at a time. If you need to fix two bugs, open two separate PRs. This will keep the scope of your pull requests smaller and allow them to be reviewed and merged more quickly.
4 changes: 2 additions & 2 deletions Dockerfile.dapper
@@ -1,4 +1,4 @@
ARG GOLANG=golang:1.22.6-alpine3.20
ARG GOLANG=golang:1.22.8-alpine3.20
FROM ${GOLANG}

# Set proxy environment variables
@@ -22,7 +22,7 @@ RUN apk -U --no-cache add \
RUN PIPX_BIN_DIR=/usr/local/bin pipx install awscli

# Install Trivy
ENV TRIVY_VERSION="0.55.2"
ENV TRIVY_VERSION="0.56.2"
RUN case "$(go env GOARCH)" in \
arm64) TRIVY_ARCH="ARM64" ;; \
amd64) TRIVY_ARCH="64bit" ;; \
2 changes: 1 addition & 1 deletion Dockerfile.local
@@ -1,4 +1,4 @@
ARG GOLANG=golang:1.22.6-alpine3.19
ARG GOLANG=golang:1.22.8-alpine3.19
FROM ${GOLANG} as infra

ARG http_proxy=$http_proxy
2 changes: 1 addition & 1 deletion Dockerfile.manifest
@@ -1,4 +1,4 @@
ARG GOLANG=golang:1.22.6-alpine3.20
ARG GOLANG=golang:1.22.8-alpine3.20
FROM ${GOLANG}

COPY --from=plugins/manifest:1.2.3 /bin/* /bin/
2 changes: 1 addition & 1 deletion Dockerfile.test
@@ -1,4 +1,4 @@
ARG GOLANG=golang:1.22.6-alpine3.20
ARG GOLANG=golang:1.22.8-alpine3.20
FROM ${GOLANG} as test-base

RUN apk -U --no-cache add bash jq
2 changes: 1 addition & 1 deletion README.md
@@ -61,7 +61,7 @@ What's with the name?

We wanted an installation of Kubernetes that was half the size in terms of memory footprint. Kubernetes is a
10 letter word stylized as k8s. So something half as big as Kubernetes would be a 5 letter word stylized as
K3s. There is neither a long-form of K3s nor official pronunciation.
K3s. A '3' is also an '8' cut in half vertically. There is neither a long-form of K3s nor official pronunciation.

Is this a fork?
---------------
2 changes: 1 addition & 1 deletion channel.yaml
@@ -1,7 +1,7 @@
# Example channels config
channels:
- name: stable
latest: v1.30.5+k3s1
latest: v1.30.6+k3s1
- name: latest
latestRegexp: .*
excludeRegexp: (^[^+]+-|v1\.25\.5\+k3s1|v1\.26\.0\+k3s1)
58 changes: 50 additions & 8 deletions cmd/k3s/main.go
@@ -30,6 +30,10 @@ var criDefaultConfigPath = "/etc/crictl.yaml"

// main entrypoint for the k3s multicall binary
func main() {
if findDebug(os.Args) {
logrus.SetLevel(logrus.DebugLevel)
}

dataDir := findDataDir(os.Args)

// Handle direct invocation via symlink alias (multicall binary behavior)
@@ -87,6 +91,24 @@ func main() {
}
}

// findDebug reads debug settings from the environment, CLI args, and config file.
func findDebug(args []string) bool {
debug, _ := strconv.ParseBool(os.Getenv(version.ProgramUpper + "_DEBUG"))
if debug {
return debug
}
fs := pflag.NewFlagSet("debug-set", pflag.ContinueOnError)
fs.ParseErrorsWhitelist.UnknownFlags = true
fs.SetOutput(io.Discard)
fs.BoolVarP(&debug, "debug", "", false, "(logging) Turn on debug logs")
fs.Parse(args)
if debug {
return debug
}
debug, _ = strconv.ParseBool(configfilearg.MustFindString(args, "debug"))
return debug
}

// findDataDir reads data-dir settings from the environment, CLI args, and config file.
// If not found, the default will be used, which varies depending on whether
// k3s is being run as root or not.
@@ -280,31 +302,45 @@ func extract(dataDir string) (string, error) {
return "", err
}

// Rename the new directory into place, before updating symlinks
if err := os.Rename(tempDest, dir); err != nil {
return "", err
}

// Create a stable CNI bin dir and place it first in the path so that users have a
// consistent location to drop their own CNI plugin binaries.
cniPath := filepath.Join(dataDir, "data", "cni")
cniBin := filepath.Join(dir, "bin", "cni")
if err := os.MkdirAll(cniPath, 0755); err != nil {
return "", err
}
// Create symlink that points at the cni multicall binary itself
logrus.Debugf("Creating symlink %s -> %s", filepath.Join(cniPath, "cni"), cniBin)
os.Remove(filepath.Join(cniPath, "cni"))
if err := os.Symlink(cniBin, filepath.Join(cniPath, "cni")); err != nil {
return "", err
}

// Find symlinks that point to the cni multicall binary, and clone them in the stable CNI bin dir.
ents, err := os.ReadDir(filepath.Join(dir, "bin"))
// Non-symlink plugins in the stable CNI bin dir will not be overwritten, to allow users to replace our
// CNI plugins with their own versions if they want. Note that the cni multicall binary itself is always
// symlinked into the stable bin dir and should not be replaced.
ents, err := os.ReadDir(filepath.Join(tempDest, "bin"))
if err != nil {
return "", err
}
for _, ent := range ents {
if info, err := ent.Info(); err == nil && info.Mode()&fs.ModeSymlink != 0 {
if target, err := os.Readlink(filepath.Join(dir, "bin", ent.Name())); err == nil && target == "cni" {
if err := os.Symlink(cniBin, filepath.Join(cniPath, ent.Name())); err != nil {
if target, err := os.Readlink(filepath.Join(tempDest, "bin", ent.Name())); err == nil && target == "cni" {
src := filepath.Join(cniPath, ent.Name())
// Check if plugin already exists in stable CNI bin dir
if info, err := os.Lstat(src); err == nil {
if info.Mode()&fs.ModeSymlink != 0 {
// Exists and is a symlink, remove it so we can create a new symlink for the new bin.
os.Remove(src)
} else {
// Not a symlink, leave it alone
logrus.Debugf("Not replacing non-symlink CNI plugin %s with mode %O", src, info.Mode())
continue
}
}
logrus.Debugf("Creating symlink %s -> %s", src, cniBin)
if err := os.Symlink(cniBin, src); err != nil {
return "", err
}
}
@@ -324,6 +360,12 @@ func extract(dataDir string) (string, error) {
return "", err
}

// Rename the new directory into place after updating symlinks, so that the k3s binary check at the start
// of this function only succeeds if everything else has been completed successfully.
if err := os.Rename(tempDest, dir); err != nil {
return "", err
}

return dir, nil
}

96 changes: 96 additions & 0 deletions docs/adrs/remove-svclb-daemonset.md
@@ -0,0 +1,96 @@
# Remove svclb daemonset

Date: 2024-09-26

## Status

Not approved

## Context

There are three types of services in Kubernetes:
* ClusterIP
* NodePort
* LoadBalancer

If we want to expose a service to external clients, i.e. clients outside of the Kubernetes cluster, we need to use the NodePort or LoadBalancer type. The latter uses an externalIP, normally a public IP, which external clients can easily reach. To support LoadBalancer services, an external controller (loadbalancer controller) is required.

The loadbalancer controller takes care of three tasks:
1 - Watches the kube-api for services of type LoadBalancer
2 - Sets up the infrastructure to provide the connectivity (externalIP ==> service)
3 - Sets the externalIP

K3s embeds a simple [loadbalancer controller](https://github.com/k3s-io/k3s/tree/master/pkg/cloudprovider) that we call svclb, which has been part of K3s since its inception. When a new service of type LoadBalancer comes up, svclb [creates a daemonset](https://github.com/k3s-io/k3s/blob/master/pkg/cloudprovider/loadbalancer.go#L35). That daemonset uses [hostPort](https://github.com/k3s-io/k3s/blob/master/pkg/cloudprovider/servicelb.go#L526-L531) to reserve the service port on all nodes. The serviceLB controller then queries the daemonset pods [to find the node IPs](https://github.com/k3s-io/k3s/blob/master/pkg/cloudprovider/servicelb.go#L291) and sets those node IPs as [the externalIPs for the service](https://github.com/k3s-io/k3s/blob/master/pkg/cloudprovider/servicelb.go#L299).
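To make that pattern concrete, here is a minimal hand-written sketch using client-go. It is not the actual k3s controller code; the package name, function name, and label selector are illustrative assumptions.

```go
package servicelbsketch

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// updateExternalIPs mirrors what the svclb controller does today: find the
// daemonset pods backing a LoadBalancer service, collect the IPs of the nodes
// they run on, and write those IPs into the service's LoadBalancer status.
func updateExternalIPs(ctx context.Context, client kubernetes.Interface, svc *corev1.Service) error {
	// The label selector is an assumption for illustration; the real controller
	// tracks its daemonset pods with its own labels.
	pods, err := client.CoreV1().Pods("kube-system").List(ctx, metav1.ListOptions{
		LabelSelector: fmt.Sprintf("svccontroller.k3s.cattle.io/svcname=%s", svc.Name),
	})
	if err != nil {
		return err
	}

	var ingress []corev1.LoadBalancerIngress
	for _, pod := range pods.Items {
		if pod.Status.HostIP != "" {
			ingress = append(ingress, corev1.LoadBalancerIngress{IP: pod.Status.HostIP})
		}
	}

	svc.Status.LoadBalancer.Ingress = ingress
	_, err = client.CoreV1().Services(svc.Namespace).UpdateStatus(ctx, svc, metav1.UpdateOptions{})
	return err
}
```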

When an external client wants to reach the service, it points at any of the node IPs and uses the service port. The flow of traffic is the following:
1 - Traffic reaches the node
2 - Because hostPort reserves the service port on the node, traffic is forwarded to the daemonset pod
3 - The daemonset pod, [using the klipper-lb image](https://github.com/k3s-io/klipper-lb), applies some iptables magic which replaces the destination IP with the clusterIP of the desired service
4 - Traffic gets routed to the service using regular Kubernetes networking

However, after some investigation, it was found that traffic never actually reaches the daemonset pod. The reason is that when a service gets an externalIP, kube-proxy reacts by adding a new rule to the iptables chain `KUBE-SERVICES`. This rule also replaces the destination IP with the clusterIP of the desired service. Moreover, the `KUBE-SERVICES` chain comes before the hostPort logic, so this is the path the traffic takes.

EXAMPLE:

Imagine a two-node cluster. The traefik service uses type LoadBalancer for two ports: 80 and 443. It gets 4 external IPs (2 IPv4 and 2 IPv6):
```
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 56m
kube-system kube-dns ClusterIP 10.43.0.10 <none> 53/UDP,53/TCP,9153/TCP 56m
kube-system metrics-server ClusterIP 10.43.55.117 <none> 443/TCP 56m
kube-system traefik LoadBalancer 10.43.206.216 10.1.1.13,10.1.1.16,fd56:5da5:a285:eea0::6,fd56:5da5:a285:eea0::8 80:30235/TCP,443:32373/TCP 56m
```

In iptables, in the OUTPUT chain, we can observe that the `KUBE-SERVICES` chain comes before `CNI-HOSTPORT-DNAT`, the chain that implements the hostPort functionality:
```
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
CNI-HOSTPORT-DNAT all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
```

In the KUBE-SERVICES chain, we can observe that there is one rule for each external-IP & port pair; these rules start with `KUBE-EXT-`:
```
Chain KUBE-SERVICES (2 references)
target prot opt source destination
KUBE-SVC-Z4ANX4WAEWEBLCTM tcp -- 0.0.0.0/0 10.43.55.117 /* kube-system/metrics-server:https cluster IP */ tcp dpt:443
KUBE-SVC-UQMCRMJZLI3FTLDP tcp -- 0.0.0.0/0 10.43.206.216 /* kube-system/traefik:web cluster IP */ tcp dpt:80
KUBE-EXT-UQMCRMJZLI3FTLDP tcp -- 0.0.0.0/0 10.1.1.13 /* kube-system/traefik:web loadbalancer IP */ tcp dpt:80
KUBE-EXT-UQMCRMJZLI3FTLDP tcp -- 0.0.0.0/0 10.1.1.16 /* kube-system/traefik:web loadbalancer IP */ tcp dpt:80
KUBE-SVC-CVG3OEGEH7H5P3HQ tcp -- 0.0.0.0/0 10.43.206.216 /* kube-system/traefik:websecure cluster IP */ tcp dpt:443
KUBE-EXT-CVG3OEGEH7H5P3HQ tcp -- 0.0.0.0/0 10.1.1.13 /* kube-system/traefik:websecure loadbalancer IP */ tcp dpt:443
KUBE-EXT-CVG3OEGEH7H5P3HQ tcp -- 0.0.0.0/0 10.1.1.16 /* kube-system/traefik:websecure loadbalancer IP */ tcp dpt:443
KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- 0.0.0.0/0 10.43.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-SVC-JD5MR3NA4I4DYORP tcp -- 0.0.0.0/0 10.43.0.10 /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
KUBE-SVC-TCOU7JCQXEZGVUNU udp -- 0.0.0.0/0 10.43.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- 0.0.0.0/0 10.43.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
KUBE-NODEPORTS all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
```

Those `KUBE-EXT` chains end up calling the rule starting with `KUBE-SVC-`, which replaces the destination IP with the IP of one of the pods implementing the service. For example:
```
Chain KUBE-EXT-CVG3OEGEH7H5P3HQ (4 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 0.0.0.0/0 0.0.0.0/0 /* masquerade traffic for kube-system/traefik:websecure external destinations */
KUBE-SVC-CVG3OEGEH7H5P3HQ all -- 0.0.0.0/0 0.0.0.0/0
```

As a consequence, the traffic never gets into the svclb daemonset pod. This can be further demonstrated by running tcpdump on the svclb daemonset pod: no traffic appears. It can also be demonstrated by tracing the iptables flow, which shows traffic following the path described above.

Therefore, if we replace the serviceLB controller's logic for finding node IPs with something that does not require the svclb daemonset, we could remove that daemonset entirely, since traffic never reaches it. The replacement should be straightforward: a daemonset ultimately means all nodes, so we could simply query the kube-api for the IPs of all nodes, as sketched below.
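A rough sketch of what that replacement could look like, reusing the client-go imports from the earlier example; the function name and the choice of which node addresses to collect are assumptions, not a final design.

```go
// Sketch only: instead of reading HostIPs from the svclb daemonset pods,
// ask the API server for the nodes directly.
func nodeAddressesForService(ctx context.Context, client kubernetes.Interface) ([]corev1.LoadBalancerIngress, error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}

	var ingress []corev1.LoadBalancerIngress
	for _, node := range nodes.Items {
		for _, addr := range node.Status.Addresses {
			// Collect external and internal node addresses; a real implementation
			// would likely filter or prefer one type over the other.
			if addr.Type == corev1.NodeExternalIP || addr.Type == corev1.NodeInternalIP {
				ingress = append(ingress, corev1.LoadBalancerIngress{IP: addr.Address})
			}
		}
	}
	return ingress, nil
}
```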


## Decision

There is one use case where klipper-lb is still needed. When deploying in a public cloud and using the public IP as the --node-external-ip, kube-proxy expects the public IP to be the destination IP. However, public clouds normally perform DNAT, so kube-proxy's rule is never matched because the incoming packet no longer carries the public IP. In that case, the packet still reaches the service because the hostPort functionality on the svclb daemonset pushes the packet to svclb, and klipper-lb then routes it to the service. Conclusion: klipper-lb is needed.

## Consequences

### Positives
* Less resource consumption as we won't need one daemonset per LoadBalancer type of service
* One fewer repo to maintain (klipper-lb)
* Easier to understand flow of traffic

### Negatives
* Possible confusion for users who have been using this feature for a long time ("Where is the daemonset?") or who rely on that daemonset for their automation
* In today's solution, if two LoadBalancer services use the same port, it is easy to notice that things don't work: the second daemonset will not deploy because the port is already taken by the first. Kube-proxy does not check whether two services use the same port; it creates both sets of rules without error, and the service whose rules land higher in the chain is the one reached when querying $nodeIP:$port. Perhaps we could add logic to the controller that warns users about a duplicated ip & port pair, as sketched below
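A minimal sketch of what such a warning could look like, assuming the same corev1 types as above plus the github.com/sirupsen/logrus logger already used elsewhere in k3s; no such check exists today.

```go
// Sketch only: warn when two LoadBalancer services would claim the same
// externalIP:port pair. Purely illustrative.
func warnDuplicatePorts(svcs []corev1.Service) {
	seen := map[string]string{} // "ip:port" -> "namespace/name"
	for _, svc := range svcs {
		if svc.Spec.Type != corev1.ServiceTypeLoadBalancer {
			continue
		}
		for _, ing := range svc.Status.LoadBalancer.Ingress {
			for _, port := range svc.Spec.Ports {
				key := fmt.Sprintf("%s:%d", ing.IP, port.Port)
				owner := svc.Namespace + "/" + svc.Name
				if prev, ok := seen[key]; ok && prev != owner {
					logrus.Warnf("services %s and %s both claim %s", prev, owner, key)
					continue
				}
				seen[key] = owner
			}
		}
	}
}
```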