DOCA workflow support #1469

Merged
merged 30 commits into from
Feb 18, 2025
Changes from 15 commits
adb4152
DOCA workflow support
assumptionsandg Oct 8, 2024
e26298b
Try use kernel meta package
assumptionsandg Jan 15, 2025
9650481
Add DOCA install playbook
Jan 21, 2025
c271261
Release note
assumptionsandg Jan 21, 2025
9986dfc
Fix bad merge conflict
assumptionsandg Jan 21, 2025
2a93d4d
Fix whitespace
assumptionsandg Jan 21, 2025
fb78583
Use command instead
assumptionsandg Jan 21, 2025
f83772f
Default to false
assumptionsandg Jan 21, 2025
37fa44d
Merge branch 'stackhpc/2024.1' into ofed-fixes
assumptionsandg Jan 22, 2025
d224b74
Fix release note
assumptionsandg Jan 29, 2025
7b27767
Merge branch 'ofed-fixes' of github.com:stackhpc/stackhpc-kayobe-conf…
assumptionsandg Jan 29, 2025
f66506c
Create DOCA builder env
assumptionsandg Jan 30, 2025
931e82c
Disable DOCA by default in group_vars
assumptionsandg Jan 30, 2025
aed1a30
Fix environment
assumptionsandg Feb 4, 2025
4aa733b
Add Release Train documentation
assumptionsandg Feb 4, 2025
05b7150
FIx typos
assumptionsandg Feb 7, 2025
155d0ba
Update doc/source/contributor/ofed.rst
assumptionsandg Feb 10, 2025
7ed300b
Apply suggestions from code review
assumptionsandg Feb 10, 2025
39f74db
Update doc/source/contributor/ofed.rst
assumptionsandg Feb 10, 2025
af83055
Address review comments
assumptionsandg Feb 10, 2025
d744c46
Disable docker repo
assumptionsandg Feb 10, 2025
0706f6b
Fix package list
assumptionsandg Feb 10, 2025
e92d361
Test docker
assumptionsandg Feb 10, 2025
9766b10
Seed configure fix
assumptionsandg Feb 10, 2025
248270b
Update doc/source/contributor/ofed.rst
assumptionsandg Feb 11, 2025
7fdaab6
Reboot
assumptionsandg Feb 11, 2025
abb14a4
Fix reboot doc
assumptionsandg Feb 11, 2025
ce973ac
Default pulp ofed sync
assumptionsandg Feb 11, 2025
1e9f1ab
Update Pulp sync condition
assumptionsandg Feb 13, 2025
343014f
Fixup DOCA DNF install variable
assumptionsandg Feb 13, 2025
46 changes: 28 additions & 18 deletions .github/workflows/package-build-ofed.yml
@@ -1,5 +1,5 @@
---
name: Build OFED packages
name: Build OFED kernel modules
on:
workflow_dispatch:
inputs:
@@ -19,11 +19,11 @@ on:

env:
ANSIBLE_FORCE_COLOR: True
KAYOBE_ENVIRONMENT: ci-builder
KAYOBE_ENVIRONMENT: ci-doca-builder
KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}
jobs:
overcloud-ofed-packages:
name: Build OFED packages
name: Build OFED kernel modules
if: github.repository == 'stackhpc/stackhpc-kayobe-config'
runs-on: arc-skc-host-image-builder-runner
permissions: {}
@@ -48,6 +48,11 @@ jobs:
BRANCH=$(awk -F'=' '/defaultbranch/ {print $2}' src/kayobe-config/.gitreview)
echo "openstack_release=${BRANCH}" | sed -E "s,(stable|unmaintained)/,," >> $GITHUB_OUTPUT

- name: Generate OFED tag
id: ofed_tag
run: |
echo "ofed_tag=$(date +%Y%m%dT%H%M%S)" >> $GITHUB_OUTPUT

- name: Clone StackHPC Kayobe repository
uses: actions/checkout@v4
with:
@@ -86,6 +91,7 @@ jobs:
id: image_tag
run: |
echo image_tag=$(grep stackhpc_rocky_9_overcloud_host_image_version: etc/kayobe/pulp-host-image-versions.yml | awk '{print $2}') >> $GITHUB_OUTPUT
working-directory: ${{ github.workspace }}/src/kayobe-config

# Use the image override if set, otherwise use overcloud-os_distribution-os_release-tag
- name: Output image name
@@ -145,13 +151,13 @@ jobs:

- name: Write Terraform outputs
run: |
cat << EOF > src/kayobe-config/etc/kayobe/environments/ci-builder/tf-outputs.yml
cat << EOF > src/kayobe-config/etc/kayobe/environments/ci-doca-builder/tf-outputs.yml
${{ steps.tf_outputs.outputs.stdout }}
EOF

- name: Write Terraform network config
run: |
cat << EOF > src/kayobe-config/etc/kayobe/environments/ci-builder/tf-network-allocation.yml
cat << EOF > src/kayobe-config/etc/kayobe/environments/ci-doca-builder/tf-network-allocation.yml
---
aio_ips:
builder: "{{ access_ip_v4.value }}"
@@ -176,37 +182,40 @@ jobs:
- name: Bootstrap the control host
run: |
source venvs/kayobe/bin/activate &&
source src/kayobe-config/kayobe-env --environment ci-builder &&
source src/kayobe-config/kayobe-env --environment ci-doca-builder &&
kayobe control host bootstrap

- name: Run growroot playbook
run: |
source venvs/kayobe/bin/activate &&
source src/kayobe-config/kayobe-env --environment ci-builder &&
kayobe playbook run src/kayobe-config/etc/kayobe/ansible/growroot.yml
source src/kayobe-config/kayobe-env --environment ci-doca-builder &&
kayobe playbook run src/kayobe-config/etc/kayobe/ansible/growroot.yml \
-e seed_bootstrap_user="cloud-user" \
-e controller_bootstrap_user="cloud-user"
env:
KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

- name: Configure the seed host (Builder VM)
run: |
source venvs/kayobe/bin/activate &&
source src/kayobe-config/kayobe-env --environment ci-builder &&
kayobe seed host configure --skip-tags network,docker
source src/kayobe-config/kayobe-env --environment ci-doca-builder &&
kayobe seed host configure \
--skip-tags network,docker,docker-registry
env:
KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

- name: Run a distro-sync
run: |
source venvs/kayobe/bin/activate &&
source src/kayobe-config/kayobe-env --environment ci-builder &&
kayobe seed host command run --become --command "dnf distro-sync --refresh"
source src/kayobe-config/kayobe-env --environment ci-doca-builder &&
kayobe seed host command run --become --command "dnf distro-sync --refresh --assumeyes"
env:
KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

- name: Reset BLS entries on the seed host
run: |
source venvs/kayobe/bin/activate &&
source src/kayobe-config/kayobe-env --environment ci-builder &&
source src/kayobe-config/kayobe-env --environment ci-doca-builder &&
kayobe playbook run src/kayobe-config/etc/kayobe/ansible/reset-bls-entries.yml \
-e "reset_bls_host=ofed-builder"
env:
@@ -215,32 +224,33 @@ jobs:
- name: Disable noexec in /var/tmp
run: |
source venvs/kayobe/bin/activate &&
source src/kayobe-config/kayobe-env --environment ci-builder &&
source src/kayobe-config/kayobe-env --environment ci-doca-builder &&
kayobe seed host command run --become --command "sed -i 's/noexec,//g' /etc/fstab"
env:
KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

- name: Reboot to apply the kernel update
run: |
source venvs/kayobe/bin/activate &&
source src/kayobe-config/kayobe-env --environment ci-builder &&
source src/kayobe-config/kayobe-env --environment ci-doca-builder &&
kayobe playbook run src/kayobe-config/etc/kayobe/ansible/reboot.yml
env:
KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

- name: Run OFED builder playbook
run: |
source venvs/kayobe/bin/activate &&
source src/kayobe-config/kayobe-env --environment ci-builder &&
source src/kayobe-config/kayobe-env --environment ci-doca-builder &&
kayobe playbook run src/kayobe-config/etc/kayobe/ansible/build-ofed-rocky.yml
env:
KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

- name: Run OFED upload playbook
run: |
source venvs/kayobe/bin/activate &&
source src/kayobe-config/kayobe-env --environment ci-builder &&
kayobe playbook run src/kayobe-config/etc/kayobe/ansible/push-ofed.yml
source src/kayobe-config/kayobe-env --environment ci-doca-builder &&
kayobe playbook run src/kayobe-config/etc/kayobe/ansible/push-ofed.yml \
-e "ofed_tag=${{ steps.ofed_tag.outputs.ofed_tag }}"
env:
KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

85 changes: 62 additions & 23 deletions doc/source/contributor/ofed.rst
@@ -4,19 +4,17 @@ OFED

Warning: Experimental workflow subject to change

This section documents the workflow for building OFED packages for Release train integration.

The workflow builds the OFED kernel modules against the latest available kernel in Release train
(as configured in SKC) and compiles them into RPM packages to be uploaded to Ark. Additionally,
this workflow downloads the userspace OFED packages from the Nvidia repository and uploads these
to Ark.
The Nvidia DOCA framework is distributed as part of StackHPC Release Train for OFED driver support.
This repository is synced into Ark as part of the Release Train workflows. However, to ensure
compatibility with Release Train packages, we are required to build OFED modules with support for
the latest Release Train kernel.

Workflow
========

The workflow uses workflow_dispatch to manually request an OFED build, which will deploy a builder
VM, apply kayobe config to the builder, upgrade the kernel, reboot, then run two Ansible playbooks
for building and uploading OFED to Ark.
for building and uploading OFED modules to Ark.
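
A manual run can also be requested from the command line. The following is a sketch using the
GitHub CLI. The workflow file name and repository are taken from this change, while the
``stackhpc/2024.1`` ref is an assumption and any workflow inputs are left at their defaults.

.. code-block:: console

   gh workflow run package-build-ofed.yml \
     --repo stackhpc/stackhpc-kayobe-config \
     --ref stackhpc/2024.1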

Pre-requisites
--------------
@@ -25,31 +23,72 @@ Before building OFED packages, the workflow will ensure that:

* A full distro-sync has taken place, ensuring the kernel is upgraded.

* The bootloader has been configured to use the latest kernel
* The bootloader has been configured to use the latest kernel (reset-bls-entries.yml)

* noexec is disabled in the temporary logical volume.
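
The workflow removes ``noexec`` by running ``sed -i 's/noexec,//g' /etc/fstab`` on the builder.
As a minimal sketch of what that expression does, the following applies it to a sample line.
The device name and the other mount options are hypothetical.

.. code-block:: shell

   # Hypothetical fstab entry; only the sed expression is taken from the workflow.
   line='/dev/mapper/rootvg-lv_tmp /var/tmp xfs defaults,nodev,noexec,nosuid 0 0'
   # Apply the same substitution the workflow runs against /etc/fstab.
   result=$(printf '%s\n' "$line" | sed 's/noexec,//g')
   echo "$result"

Note that the expression only matches ``noexec`` when it is followed by a comma, so an entry
where ``noexec`` is the last mount option would be left unchanged.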

build-ofed
----------

Currently we only support building Rocky Linux 9 OFED packages.

In order to set up OFED, we're required to build kernel modules for the OFED drivers as
the kernels we provide in release train are unsupported by OFED. To accomplish this we
will need to use the doca-kernel-support from the doca-extra repository.
Currently we only support building Rocky Linux 9 OFED kernel module packages.

We will need to install dependencies in order to build the OFED kernel modules, and these
are installed at the beginning of the build playbook. We also install base and appstream
dependencies of userspace OFED packages here, this is intended to stop these dependencies
being pulled in later when we download the OFED packages from the doca-host repository.
The Build OFED module workflow will check that the filesystem is configured (noexec disabled)
to allow the DOCA build script to run. The workflow will also install any necessary dependencies
for the module build.

At the end of the playbook following the kernel module build, the OFED userspace packages
are downloaded from the upstream repository in order to upload these to Ark.
The build script will output a ``doca-kernel-repo`` RPM which contains all kernel modules built
as part of the workflow. When this RPM is installed, the repofile is created pointing to the
modules in `/usr/share/doca-host-<doca-version>/Modules/<kernel-version>/` on the host.
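
To check the result on a host where the RPM has been installed, the repo file and the module
directory can be inspected. This is only a sketch: the repo file name follows the pattern used
by ``install-doca.yml`` and assumes the modules were built for the running kernel, and the glob
stands in for the DOCA version.

.. code-block:: console

   cat "/etc/yum.repos.d/doca-kernel-$(uname -r).repo"
   ls /usr/share/doca-host-*/Modules/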

push-ofed
---------

As we're not syncing OFED from any upstream source, and are instead creating our own
repository of custom packages, we will be required to set up the Pulp distribution/publication
and upload the content directly to Ark. This playbook uses the Pulp CLI to upload the RPMs
to Ark.
As mentioned above, the DOCA repository is synced into the `doca` repository in Ark. This workflow
will upload the ``doca-kernel-repo`` RPM to a separate repository named `doca-modules`. The version
for this repository is set in `pulp-repo-versions.yml`, and the repository is disabled for local
Pulp syncs by default.

Install process
===============

Release Train configuration
---------------------------

The DOCA kernel module repository will need to be synced to the local Pulp service. This can be enabled
in `ofed.yml`:

.. code-block:: yaml

stackhpc_pulp_sync_ofed_modules: true

With kernel module syncing enabled, the local Pulp can be synced with Ark by running:

.. code-block:: console

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-repo-sync.yml
kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-repo-publish.yml

DOCA repositories can be templated to hosts by running Kayobe host configure.

.. code-block:: console

kayobe overcloud host configure -t dnf

StackHPC DOCA kernel modules will require the latest kernel version available in Ark for
the current Rocky minor version. You should ensure that packages are up to date by running
a package update, which can also be limited to hosts in the `mlnx` group.

.. code-block:: console

kayobe overcloud host package update --packages "*" --limit mlnx
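
Because the modules are built for one specific kernel, it can help to confirm which kernel each
host is running once the update has completed. A sketch, assuming your Kayobe release supports
the ``--show-output`` option:

.. code-block:: console

   kayobe overcloud host command run --limit mlnx --show-output --command "uname -r"

A newly installed kernel is only reported by ``uname -r`` after the host has been rebooted.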

install-doca
------------

A playbook is provided to install DOCA on hosts in the `mlnx` group. Ensure this group
is configured to include the hosts you wish to install DOCA on. To run the install
playbook:

.. code-block:: console

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/install-doca.yml
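
The ``mlnx`` group referenced above is defined in the Kayobe inventory. As an illustrative
sketch only, the following assumes that every host in the existing ``compute`` group has an
Nvidia NIC and that groups are defined in ``$KAYOBE_CONFIG_PATH/inventory/groups``. Adjust the
child groups and the file location to match your deployment.

.. code-block:: ini

   # Assumption: all compute hosts should have DOCA installed.
   [mlnx:children]
   compute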
36 changes: 4 additions & 32 deletions etc/kayobe/ansible/build-ofed-rocky.yml
@@ -1,5 +1,5 @@
---
- name: Build OFED packages
- name: Build OFED kernel modules
become: true
hosts: ofed-builder
gather_facts: false
@@ -22,7 +22,6 @@
- rpm-build
- automake
- patch
- kernel
- kernel-devel
- autoconf
- pciutils
@@ -37,37 +36,10 @@
- cmake-filesystem
- libnl3-devel
- python3-devel
- doca-extra
state: latest
update_cache: true

- name: Add DOCA host repository package
ansible.builtin.dnf:
name: "https://developer.nvidia.com/downloads/networking/secure/doca-sdk/DOCA_2.8/doca-host-2.8.0-204000_{{ stackhpc_pulp_doca_ofed_version }}_rhel9{{ stackhpc_pulp_repo_rocky_9_minor_version }}.x86_64.rpm"
disable_gpg_check: true

- name: Install DOCA extra packages
ansible.builtin.dnf:
name: doca-extra

- name: Create build directory
ansible.builtin.file:
path: /home/cloud-user/ofed
state: directory
mode: "0777"

- name: Set build directory
ansible.builtin.replace:
path: /opt/mellanox/doca/tools/doca-kernel-support
regexp: TMP_DIR=\$1
replace: TMP_DIR=/home/cloud-user/ofed

- name: Build OFED kernel modules
ansible.builtin.shell:
cmd: |
/opt/mellanox/doca/tools/doca-kernel-support

- name: Download OFED userspace packages
ansible.builtin.dnf:
name: doca-ofed-userspace
download_only: true
download_dir: /home/cloud-user/ofed
ansible.builtin.command:
cmd: /opt/mellanox/doca/tools/doca-kernel-support
28 changes: 28 additions & 0 deletions etc/kayobe/ansible/install-doca.yml
@@ -0,0 +1,28 @@
---
- name: Install DOCA
become: true
hosts: mlnx
gather_facts: true
tasks:
- name: Get running kernel
ansible.builtin.command:
cmd: "uname -r"
register: kernel

- name: Install kernel repo
ansible.builtin.dnf:
name: doca-kernel-repo
state: latest
update_cache: true

- name: Ensure correct priority for DOCA modules
ansible.builtin.lineinfile:
line: "priority=-2"
insertafter: EOF
path: "/etc/yum.repos.d/doca-kernel-{{ kernel.stdout }}.repo"

- name: Install DOCA OFED
ansible.builtin.dnf:
name: doca-ofed
state: latest
update_cache: true