Add k8s workload testing agent #669
Conversation
Force-pushed from 70fa21f to 37baac9
Cool
```diff
 plugins:
   - name: nvidia-workload
-    image: testsys-nvidia-workload-test:v0.0.3
+    image: <your k8s-workload-agent image URI>
```
Should we put the expected image URI?
We don't include that in any of the other test agent examples. Maybe after this we can go through and update them all once we have a concrete image URI for all agents.
bottlerocket/agents/src/workload.rs (outdated)
```rust
let kubeconfig_arg = vec!["--kubeconfig", kubeconfig_path];
let status = Command::new(SONOBUOY_BIN_PATH)
    .args(kubeconfig_arg.to_owned())
    .arg("run")
    .arg("--wait")
    .arg("--namespace")
    .arg("testsys-workload")
    .args(plugin_test_args)
    .status()
    .context(error::WorkloadProcessSnafu)?;
info!("Workload testing has completed, checking results");
```
It probably shouldn't be in this PR, but it would be nice to not use `--wait` on the run command in case the connection to the cluster is broken for a short period of time. It would be better to `sonobuoy run` and then routinely check `sonobuoy status`. (This also should be fixed in the sonobuoy agent.)
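For illustration, a minimal sketch of that run-then-poll shape (not code from this PR; the polling interval, the `is_finished` helper, and the exact layout of the `sonobuoy status --json` output are assumptions):

```rust
// Hypothetical helper: returns true once the `sonobuoy status --json` output
// reports an overall "complete" state. Real code would parse with serde_json;
// the exact JSON field layout here is an assumption.
fn is_finished(status_json: &[u8]) -> bool {
    String::from_utf8_lossy(status_json).contains(r#""status":"complete""#)
}

// Start the run without `--wait` so a dropped connection can't fail the test...
let _status = Command::new(SONOBUOY_BIN_PATH)
    .args(kubeconfig_arg.to_owned())
    .arg("run")
    .arg("--namespace")
    .arg("testsys-workload")
    .args(plugin_test_args)
    .status()
    .context(error::WorkloadProcessSnafu)?;

// ...then routinely check `sonobuoy status`, treating failures as transient.
loop {
    std::thread::sleep(std::time::Duration::from_secs(30));
    let output = Command::new(SONOBUOY_BIN_PATH)
        .args(kubeconfig_arg.to_owned())
        .arg("status")
        .arg("--json")
        .arg("--namespace")
        .arg("testsys-workload")
        .output();
    match output {
        Ok(output) if output.status.success() && is_finished(&output.stdout) => break,
        // A failed status call may just be a short network blip; keep polling.
        _ => continue,
    }
}
```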
```rust
let mut num_failed: u64 = 0;
let mut num_skipped: u64 = 0;
let mut progress = Vec::new();
let mut outcome_summary = HashMap::from([
```
I don't love this map. A struct that implements `Default` would be nicer imo.
("pass", 0), | ||
("passed", 0), | ||
("fail", 0), | ||
("failed", 0), | ||
("timeout", 0), | ||
("timed-out", 0), |
In the theoretical struct you wouldn't need separate fields for the alternate spellings.
I may need a clearer picture of what you're envisioning here. It looks like plugins can report either variation for each result, so somewhere we need to evaluate them.
My thought with this structure was that we wouldn't have to check each value every time; just collect it all up and get the totals at the end.
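For illustration, a sketch of what that struct could look like (hypothetical names, not code from this PR), folding the alternate spellings into a single counter at the point where each result is recorded:

```rust
/// Sketch of the suggested struct; the field and method names are hypothetical.
#[derive(Default)]
struct OutcomeSummary {
    passed: u64,
    failed: u64,
    timed_out: u64,
}

impl OutcomeSummary {
    /// Fold one raw plugin result status into the summary, collapsing the
    /// alternate spellings ("pass"/"passed", etc.) into a single counter.
    fn record(&mut self, status: &str) {
        match status {
            "pass" | "passed" => self.passed += 1,
            "fail" | "failed" => self.failed += 1,
            "timeout" | "timed-out" => self.timed_out += 1,
            _ => (),
        }
    }
}
```

The collect-then-total flow stays the same shape; only the alternate-spelling lookup moves into `record`, e.g. `outcome_summary.record(results_status)` inside the results loop.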
```rust
    .context(error::MissingSonobuoyStatusFieldSnafu { field: "plugins" })?;

for result in plugin_results {
    let plugin = result
```
Do we have somewhere in `TestResults` that we could report the list of plugins that reported failures? I guess we could put it in `OtherInfo` along with the progress report(s)?
So maybe something like:
```rust
if (results_status == "fail" || results_status == "failed") && progress_status.is_empty() {
    progress.push(format!("{}: Failed", plugin));
}
```
bottlerocket/agents/src/workload.rs (outdated)
```rust
);

// Write out the output to a file we can reference later
let mut f = File::create(plugin_yaml.clone()).context(error::WorkloadProcessSnafu)?;
```
Seeing this path could save somebody some troubleshooting time someday.
```diff
-let mut f = File::create(plugin_yaml.clone()).context(error::WorkloadProcessSnafu)?;
+let plugin_yaml = PathBuf::from(".")
+    .join(format!("{}-plugin.yaml", plugin.name))
+    .canonicalize()
+    .context(error::BadPathSnafu { path: plugin_yaml.clone() })?;
+let mut f = File::create(&plugin_yaml).context(error::FileWriteSnafu { path: plugin_yaml.clone() })?;
```
Done, with some slight modifications to make this work right. Please confirm though.
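Since `Path::canonicalize` fails for paths that don't exist yet, the modification presumably reorders the steps. A sketch of one way that could look (an assumption, not the code that actually landed; the error variants are taken from the suggestion above):

```rust
// Sketch: create the file first so the path exists on disk, then
// canonicalize it to an absolute path that is useful in logs.
let plugin_yaml = PathBuf::from(".").join(format!("{}-plugin.yaml", plugin.name));
let mut f = File::create(&plugin_yaml)
    .context(error::FileWriteSnafu { path: plugin_yaml.clone() })?;
let plugin_yaml = plugin_yaml
    .canonicalize()
    .context(error::BadPathSnafu { path: plugin_yaml.clone() })?;
```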
Force-pushed from 37baac9 to 7cfdb40
One small nit
Force-pushed from 7cfdb40 to 4184499
Looks like the "integration" tests are not happy.
This adds a test agent for running cluster workload tests. These are tests that run some sort of workload on the cluster to verify system functionality.

Signed-off-by: Sean McGinnis <[email protected]>

Force-pushed from 4184499 to 1e269d2
Was the failed integ test run just a fluke?
No, it was a legitimate unit test failure due to a quick "safe" change I had made in the last update. Sorry, I thought I had commented on this PR, but there's no comment here. (Where did I post that?!) If you look at the compare from the last force-push, it was a small change due to an optional value we look for in the results.
Issue number:
Closes: #616
Description of changes:
This adds a test agent for running cluster workload tests. These are tests that run some sort of workload on the cluster to verify system functionality.
Testing done:
Replaced the `example-test-agent` with `k8s-workload-agent` and its settings. Unable to run the NVIDIA tests against a local kind cluster, but this verified things were deployed and status was reported back. This, along with the previous step that verified the workload test itself, indicates things would perform correctly in an actual test (non-kind) cluster.

Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.