Since the SGX device plugin (or SGX "support" for Kubernetes in general) was added, we've wanted to enforce per-container sgx.intel.com/epc: <limit> values as hard limits on how much Enclave Page Cache (EPC) memory each container can use, but the kernel has not provided a mechanism for it.
The Linux kernel now has ongoing work to use the misc cgroup controller to limit sgx_epc usage. However, this work alone is not sufficient: we also need a mechanism to get the sgx.intel.com/epc: <limit> values configured into containers' misc.max. The chain is roughly:
To get "sgx.intel.com/epc: 1Mi" passed over CRI to CRI-O/containerd, annotations can be used. Most (all?) deployments using the setup from this repository end up using the SGX mutating webhook, which is responsible for pod mutations during CREATE. The existing webhook can be modified to add the necessary "EPC limit" values as annotations.
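As a rough illustration of the webhook step, the sketch below turns per-container sgx.intel.com/epc limits into annotations carrying a byte count. It is not the webhook's actual code: the annotation prefix `epc-limit.example.com/container.` and the function names are hypothetical, and the quantity parser only handles plain integers and the Ki/Mi/Gi suffixes (a real webhook would more likely rely on the Kubernetes `resource.Quantity` type).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// epcResource is the extended resource name used in this issue.
const epcResource = "sgx.intel.com/epc"

// annotationPrefix is a HYPOTHETICAL annotation namespace chosen for this sketch.
const annotationPrefix = "epc-limit.example.com/container."

// parseBinaryQuantity converts "1Mi", "512Ki" or a plain byte count to bytes.
// Simplification: only the Ki/Mi/Gi suffixes are recognized.
func parseBinaryQuantity(s string) (uint64, error) {
	mult := uint64(1)
	for suffix, m := range map[string]uint64{"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30} {
		if strings.HasSuffix(s, suffix) {
			s = strings.TrimSuffix(s, suffix)
			mult = m
			break
		}
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid EPC quantity: %w", err)
	}
	return n * mult, nil
}

// EPCAnnotations takes container name -> resource limits and returns one
// annotation per container that requests EPC, with the limit in bytes.
func EPCAnnotations(limits map[string]map[string]string) (map[string]string, error) {
	out := map[string]string{}
	for name, res := range limits {
		q, ok := res[epcResource]
		if !ok {
			continue // container does not use SGX EPC
		}
		b, err := parseBinaryQuantity(q)
		if err != nil {
			return nil, fmt.Errorf("container %q: %w", name, err)
		}
		out[annotationPrefix+name] = strconv.FormatUint(b, 10)
	}
	return out, nil
}

func main() {
	ann, err := EPCAnnotations(map[string]map[string]string{
		"enclave": {epcResource: "1Mi"},
		"sidecar": {"cpu": "100m"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(ann)
}
```

Encoding the container name in the annotation key is just one way to keep per-container limits apart in a multi-container pod; the real naming scheme is up to the implementation.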
The OCI (Linux) Runtime specification has limited configuration options to set misc.max values:

1. NRI: have an NRI plug-in that registers for 'create container' events and updates config.json's "unified": { ... } with the values defined by the annotations. This depends on the nodes using cgroup-v2 / the unified hierarchy.
2. OCI hooks: have a custom EPC hook that writes to misc.max before the container is started. This setup works for both cgroup-v1 and cgroup-v2, but containerd does not have proper support for enabling OCI hooks. CDI (prepare device plugin API changes for CDI injection #1457) can be used, but it is still in its early stages.

The proposal is to start with option 1 (NRI).
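To make the NRI approach more concrete, here is a minimal sketch of the translation such a plug-in would perform: an annotation value (assumed here to be a byte count) becomes a misc.max entry under the OCI config's linux.resources.unified map. The sketch deliberately does not use the NRI API; `UnifiedEPCLimit` is a hypothetical helper, and the "sgx_epc <value>" line format and byte units are assumptions based on how the misc controller's misc.max file is written.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// UnifiedEPCLimit returns the key/value pair to merge into the OCI config's
// linux.resources.unified map. The annotation value is ASSUMED to be a plain
// byte count; the misc cgroup expects "<resource> <limit>" in misc.max.
func UnifiedEPCLimit(annotationValue string) (map[string]string, error) {
	limit, err := strconv.ParseUint(annotationValue, 10, 64)
	if err != nil {
		return nil, fmt.Errorf("invalid EPC limit annotation %q: %w", annotationValue, err)
	}
	return map[string]string{"misc.max": fmt.Sprintf("sgx_epc %d", limit)}, nil
}

func main() {
	unified, err := UnifiedEPCLimit("1048576")
	if err != nil {
		panic(err)
	}
	// Show where the entry would sit in config.json.
	fragment := map[string]any{
		"linux": map[string]any{
			"resources": map[string]any{"unified": unified},
		},
	}
	out, err := json.MarshalIndent(fragment, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because "unified" is only honored on cgroup-v2, a plug-in along these lines would have no effect on cgroup-v1 nodes.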
For telemetry/monitoring, misc.current can be used to read per-container EPC statistics. The proposal is to start with cAdvisor + Prometheus + Grafana dashboards.
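For the monitoring side, reading a container's EPC usage could look roughly like the sketch below. It parses the text of a misc.current file, assuming one "<resource> <usage>" pair per line, and extracts the sgx_epc entry; `EPCUsage` is a hypothetical helper, and the sample content in `main` is made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// EPCUsage extracts the sgx_epc usage from the contents of a misc.current
// file. The boolean result reports whether an sgx_epc line was present.
func EPCUsage(miscCurrent string) (uint64, bool, error) {
	sc := bufio.NewScanner(strings.NewReader(miscCurrent))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 || fields[0] != "sgx_epc" {
			continue // other misc resources or malformed lines
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return 0, false, fmt.Errorf("invalid sgx_epc usage %q: %w", fields[1], err)
		}
		return v, true, nil
	}
	return 0, false, sc.Err()
}

func main() {
	// Illustrative content only; a real caller would read the container's
	// misc.current file from its cgroup directory.
	sample := "sgx_epc 524288\n"
	usage, found, err := EPCUsage(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(usage, found)
}
```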
Tasks:

- […] pkg/webhooks/sgx to add NRI annotations: SGX: set EPC limits via NRI annotations #1582
- […] ValidatingAdmissionPolicy using CEL language)
- […] misc stats and Prometheus

Future work:

Testing: