Sart is a Kubernetes network load-balancer and CNI plugin that uses BGP, written in Rust. This project is inspired by MetalLB and Coil.

Warning: this project is experimental.

Sart has the following programs.

- `sart`: CLI tool to control and describe `sartd`.
- `sartd`: Daemon program with the following features.
  - bgp: BGP daemon
  - fib: FIB daemon
  - agent: Kubernetes agent running as a DaemonSet
  - controller: Kubernetes custom controller running as a Deployment
- `sart-cni`: CLI tool for CNI; a gRPC client that communicates with `sartd agent`.
Sart's BGP feature (sartd-bgp) is based on RFC 4271 - A Border Gateway Protocol 4 (BGP-4), and sartd-bgp is configurable via a gRPC interface.
Sartd-bgp has the following features. For now, sartd-bgp supports a minimal feature set to work as a BGP speaker.
- eBGP
- iBGP
- 4 Bytes ASN
- Policy injection
- Multiple paths
- IPv6 features
- BGP unnumbered
- Route Reflector
This figure shows the basic model of sartd-bgp.
Sartd-bgp uses an event-driven model.
Each peer has one event handler that waits for events, and channels over which those events are delivered. Events include the following.
- Message
- Timer
- Rib
- Admin
Once an event is generated, it is queued on a channel. The event handler then retrieves and processes the event and, according to the result, moves the state of the BGP FSM.
BGP maintains an independent FSM for each peer. This diagram shows how the BGP state moves for given events. Please refer to RFC 4271, Section 8.2.2 for details such as the event numbers.
```mermaid
stateDiagram-v2
    Idle --> Connect: 1
    Idle --> Idle: others
    Connect --> Connect: 1,9,14
    Connect --> Active: 18
    Connect --> OpenSent: 16,17
    Connect --> Idle: others
    Active --> OpenSent: 16,17
    Active --> OpenConfirm: 19
    Active --> Connect: 9
    Active --> Active: 1,14
    Active --> Idle: others
    OpenSent --> OpenSent: 1,14,16,17
    OpenSent --> Active: 18
    OpenSent --> OpenConfirm: 19
    OpenSent --> Idle: others
    OpenConfirm --> OpenConfirm: 1,11,14,16,17
    OpenConfirm --> Established: 26
    OpenConfirm --> Idle: others
    Established --> Established: 1,11,26,27
    Established --> Idle: others
```
This figure shows how sartd-bgp handles routing information. The RIB-Manager has an event handler and waits for RIB events from peers.
When Adj-RIB-In receives paths, the peer filters them with input policies and publishes an event to store them into the Loc-RIB. When the Loc-RIB receives paths, it selects a new best path; if the best path has changed, it publishes an event to all peers. When a peer receives that event from the RIB-Manager, it filters the paths with output policies and stores them into Adj-RIB-Out.
TBD
Sart has a Kubernetes network load-balancer feature. This is named `sartd-kubernetes`.
Sartd-kubernetes is a Kubernetes custom controller that consists of the following two components.

- `controller`
- `agent`

`controller` runs as a `Deployment` to manage non-node-local resources such as `Service` and `EndpointSlice`. It also manages admission webhooks and is responsible for controlling load-balancer addresses.

`agent` runs as a `DaemonSet` on each node to control node-local resources. It interacts with the node-local BGP speaker to announce load-balancer addresses to external networks.
`Sartd-kubernetes` defines the following custom resources.

- `AddressPool`
- `AddressBlock`
- `ClusterBGP`
- `NodeBGP`
- `BGPPeer`
- `BGPPeerTemplate`
- `BGPAdvertisement`

The API group of these CRDs is `sart.terassyi.net` and the version is `v1alpha2`.
This figure shows the relationships between the CRDs.
`AddressPool` defines the CIDR usable by the pool.
It has a `type` parameter to specify what the pool will be used for, and an `allocType` parameter to choose the allocation method.
```yaml
apiVersion: sart.terassyi.net/v1alpha2
kind: AddressPool
metadata:
  name: default-lb-pool
spec:
  cidr: 10.0.1.0/24
  type: service
  allocType: bit
  blockSize: 24
  autoAssign: true
```
`AddressBlock` defines a subset of an `AddressPool`, and its CIDR must be contained in the `AddressPool`'s CIDR.
The allocator entity is tied to the `AddressBlock`.
`AddressBlock`s are generated automatically by the `AddressPool`.
If the parent `AddressPool`'s type is `service`, there must be exactly one `AddressBlock` for the pool.
```yaml
apiVersion: sart.terassyi.net/v1alpha2
kind: AddressBlock
metadata:
  name: non-default-lb-pool
spec:
  autoAssign: false
  cidr: 10.0.100.0/24
  nodeRef: null
  poolRef: non-default-lb-pool
  type: service
```
`ClusterBGP` defines a cluster-wide BGP configuration.
We can select the nodes to configure with `nodeSelector`.
It can template node-local ASN and router ID settings, and it also provides a way to template BGP peer settings.
```yaml
apiVersion: sart.terassyi.net/v1alpha2
kind: ClusterBGP
metadata:
  name: clusterbgp-sample-a
spec:
  nodeSelector:
    bgp: a
  asnSelector:
    from: label
  routerIdSelector:
    from: internalAddress
  speaker:
    path: 127.0.0.1:5000
  peers:
    - peerTemplateRef: bgppeertemplate-sample
      nodeBGPSelector:
        bgp: a
```
`NodeBGP` defines a node-local BGP configuration.
It is generated automatically by `ClusterBGP` and is managed by the `agent`.
```yaml
apiVersion: sart.terassyi.net/v1alpha2
kind: NodeBGP
metadata:
  labels:
    bgp: a
  name: sart-worker
spec:
  asn: 65000
  peers:
    - name: bgppeertemplate-sample-sart-worker-65000-9.9.9.9
      ...
  routerId: 172.18.0.2
  speaker:
    path: 127.0.0.1:5000
```
`BGPPeer` defines a node-local BGP peer abstraction.
This resource is usually created by `NodeBGP`, but we can also create it manually.
It is managed by the `agent`.
```yaml
apiVersion: sart.terassyi.net/v1alpha2
kind: BGPPeer
metadata:
  labels:
    bgp: a
    bgppeer.sart.terassyi.net/node: sart-worker
  name: bgppeer-sample-sart-worker
spec:
  addr: 9.9.9.9
  asn: 65000
  nodeBGPRef: sart-worker
  speaker:
    path: 127.0.0.1:5000
```
`BGPPeerTemplate` defines a template for BGP peer configurations.
```yaml
apiVersion: sart.terassyi.net/v1alpha2
kind: BGPPeerTemplate
metadata:
  name: bgppeertemplate-sample
spec:
  asn: 65000
  addr: 9.9.9.9
  groups:
    - to-router0
```
`BGPAdvertisement` defines a prefix announced by BGPPeers.
It is generated automatically when the `controller` allocates new addresses.
```yaml
apiVersion: sart.terassyi.net/v1alpha2
kind: BGPAdvertisement
metadata:
  labels:
    kubernetes.io/service-name: app-svc-cluster
  name: app-svc-cluster-hv4l5-10.0.1.0
  namespace: test
spec:
  attrs: null
  cidr: 10.0.1.0/32
  protocol: ipv4
  type: service
```
`Sartd-kubernetes` has IPAM for Services of type `LoadBalancer`.
To allocate IP addresses to LoadBalancers, we can create pools of IP addresses by applying an `AddressPool` resource with `type: service`.

We can create multiple pools in one cluster. If there are multiple pools, we can choose the pool from which the controller allocates addresses to a LoadBalancer by adding an annotation to the Service. When we specify multiple pools in the annotation, addresses are assigned from each of the specified pools.

We can also define a default assignable pool by specifying `autoAssign: true` in the spec. If a pool's `autoAssign` is true, we can omit its name in the annotation. We cannot create multiple default assignable pools. However, when we specify multiple pools and one of them is the default assignable pool, we still have to specify its name.

We can also request specific addresses for a LoadBalancer via an annotation, and we can request multiple addresses from one pool or from multiple pools.
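As an illustration of such annotations, a Service that selects a non-default pool and requests a specific address might look like the sketch below. The annotation keys shown here are hypothetical, not taken from the source; check the project documentation for the exact names.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app-svc
  annotations:
    # Hypothetical annotation keys; the exact names are defined by the project.
    sart.terassyi.net/addresspool: non-default-lb-pool  # pool(s) to allocate from
    sart.terassyi.net/loadBalancerIPs: "10.0.100.20"    # request a specific address
spec:
  type: LoadBalancer
  selector:
    app: app
  ports:
    - port: 80
      targetPort: 8080
```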
An `AddressBlock` is generated automatically by an `AddressPool` and is a subset of the pool.
In the case of a pool for LoadBalancers (`type: service`), only one `AddressBlock` is created for the pool.
Therefore, the CIDR field of the `AddressBlock` must be equal to the `AddressPool`'s CIDR.
Its allocator is managed by the `controller`.
Sartd-kubernetes's `controller` watches `Service` and `EndpointSlice` resources.
When it detects an event for a LoadBalancer, it triggers the reconciliation logic.
First, when a LoadBalancer is created, the controller picks addresses from the desired pools: if specific addresses are requested, it picks those; otherwise, it picks addresses automatically.
If the addresses are valid, the controller allocates them and updates the status (`status.loadBalancer.ingress[].ip`) of the given `Service` resource.
The controller creates `BGPAdvertisement` resources depending on the allocated addresses and the `externalTrafficPolicy` of the LoadBalancer.
When `externalTrafficPolicy` is `Cluster`, advertisements must target all peers that match the group label.
If the group label is not specified, all existing peers are picked.
When `externalTrafficPolicy` is `Local`, advertisements must target the peers on the nodes where backend pods exist, and the target peers must change as the scheduling of backend pods changes.
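For reference, `externalTrafficPolicy` is a standard Kubernetes field on the Service spec; the Service below is illustrative.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app-svc-local
spec:
  type: LoadBalancer
  # Local: advertisements target only peers on nodes that run backend pods,
  # so external traffic reaches a backend without an extra hop.
  externalTrafficPolicy: Local
  selector:
    app: app
  ports:
    - port: 80
      targetPort: 8080
```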
Sart has a CNI plugin feature for Kubernetes. This feature is named `sart-cni`.

The sart-cni related programs are below.

- `controller`
- `agent`
- `bgp`
- `sartcni`

`controller` and `agent` are the same programs used in `sartd-kubernetes`.
`sartcni` is the CNI interface that is called by `kubelet`.
Sart-cni also has a Custom Resource Definition (CRD) based architecture like sartd-kubernetes, and it shares most of its resources with sartd-kubernetes.
The following figure shows the CRD model of sart-cni.
The difference from sartd-kubernetes's model is that `AddressBlock` resources belong to nodes, and the `agent` creates `BGPAdvertisement` resources that target only the peers on the same node as the agent.
`Sart-cni` has an IPAM feature for pods.
To allocate IP addresses to pods, we have to create at least one `AddressPool` resource with `type: pod`.
We can also create multiple pools for pods.
As described above, we can create multiple pools in one cluster. If there are multiple pools, we can choose the pool to use per pod by adding an annotation to the pod.
Unlike the LoadBalancer case, we cannot specify multiple pools for one pod.
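A sketch of choosing a pool for a pod via an annotation; the annotation key here is hypothetical and shown only for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
  annotations:
    # Hypothetical annotation key; check the project docs for the exact name.
    sart.terassyi.net/addresspool: non-default-pod-pool
spec:
  containers:
    - name: app
      image: nginx:latest
```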
Even if the annotation is not specified, an address still has to be assigned to the pod.
Therefore, we can make a pool default assignable by specifying `autoAssign: true` in the spec.
There must be exactly one default pool in a cluster.
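Adapting the `AddressPool` example shown earlier, a default pool for pods might look like the following; the CIDR and block size are illustrative values.

```yaml
apiVersion: sart.terassyi.net/v1alpha2
kind: AddressPool
metadata:
  name: default-pod-pool
spec:
  cidr: 10.1.0.0/16
  type: pod
  allocType: bit
  blockSize: 24   # one AddressBlock of this size is carved out per node
  autoAssign: true
```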
`AddressBlock`s are created automatically by the `AddressPool` and are subsets of the pool.
In the case of a pool for pods (`type: pod`), `AddressBlock`s are created according to the number of nodes, and each block belongs to a node.
One `BGPAdvertisement` is created per `AddressBlock`, and its CIDR field is the same as the `AddressBlock`'s CIDR.
When a pod is scheduled on a node, its address is picked from the `AddressBlock` that belongs to that node.
Sart-cni is implemented based on the CNI Specification.
The supported CNI versions are `v1.0.0`, `v0.4.0`, and `v0.3.1`.
The program that satisfies this specification is `sartcni`.
`Sartcni` is a stand-alone executable binary.
It is called by `kubelet` when creating a pod and delegates the given request to `sartd-agent` via a gRPC API, as follows.
The flow looks like the following.

- The admin creates the AddressPool.
- The user creates a pod.
- Kubelet on a node creates container processes.
- Kubelet (or the container runtime) calls the `sart-cni` binary to set up the interface in the container.
- `Sart-cni` calls the gRPC API of the `sartd-agent` on the same node to configure the interface.
- `Sartd-agent` gets the desired pool to assign an IP address from and refers to the block on the node.
- If a block doesn't exist on the node, `sartd-agent` requests an allocation and `sart-controller` creates one.
- `Sartd-agent` assigns an IP address to the pod from the block and configures the interface and routing information.
- `Sartd-agent` returns the result to `sart-cni`, and `sart-cni` propagates the result to the caller.