[DISCO-3210] [load test: abort] Use async gcs client for manifest #793

Herraj · 2025-02-13T22:04:21Z

References

JIRA: DISCO-3210

Description

PR Review Checklist

Put an x in the boxes that apply

This PR conforms to the Contribution Guidelines
The PR title starts with the JIRA issue reference, format example [DISCO-####], and has the same title (if applicable)
[load test: (abort|skip|warn)] keywords are applied to the last commit message (if applicable)
Documentation has been updated (if applicable)
Functional and performance test coverage has been expanded and maintained (if applicable)

Herraj · 2025-02-19T16:42:56Z

merino/providers/manifest/backends/filemanager.py

+            return GetManifestResultCode.FAIL, None
+        except ValidationError as val_err:
+            logger.error(f"Invalid manifest content: {val_err}")
+            return GetManifestResultCode.FAIL, None


As you can see I've trimmed out a lot of nested logic here but more importantly, I've gotten rid of the SKIP branch.

I'm trying to understand whether we really need it. We return the same None data as we do for FAIL; only the code differs. So logically, I see it as just another FAIL 🤔

Herraj · 2025-02-19T16:43:43Z

merino/providers/manifest/backends/manifest.py

-
-            case GetManifestResultCode.SKIP:
-                logger.info("Manifest data was not updated (SKIP).")
-                return result_code, None


Related to the comment above: we aren't doing much with SKIP here beyond emitting a different log message.

Herraj · 2025-02-19T16:44:12Z

merino/providers/manifest/provider.py

@@ -75,12 +75,9 @@ async def _fetch_data(self) -> None:
                    }
                    self.last_fetch_at = time.time()

-                case GetManifestResultCode.SKIP:
-                    return None
-


See the two comments above!

Herraj · 2025-02-19T16:47:26Z

merino/providers/manifest/backends/filemanager.py

        self.blob_name = blob_name
+        self.bucket = Bucket(storage=self.gcs_client, name=gcs_bucket_path)
+
+        # TODO figure out this
        self.blob_generation = 0


I'm trying to understand how this helps us. Do we still need it with the async logic? This ties into the SKIP logic, for which I've added comments below!

Yeah, we should keep SKIP. When we call get_blob, we can add a clause, bucket.get_blob(blob_name, if_generation_not_match=blob_generation), so we don't fetch the same file again if the version hasn't changed. It's a slight performance improvement.

The SKIP code exists to catch cases where we didn't fetch a file because there was no updated file to fetch.

misaniwere · 2025-02-20T09:00:18Z

merino/providers/manifest/backends/manifest.py

@@ -30,27 +29,22 @@ async def fetch(self) -> tuple[GetManifestResultCode, ManifestData | None]:
            (SKIP, None): If there's no new generation (blob unchanged).


We should probably update the docstring for this method.

I'm wondering whether we even need this method, since all it does now is call fetch_manifest_data. We could remove it and rename the method below to fetch.

[DISCO-3210] [load test: abort] Use async gcs client for manifest

6d1714c

Herraj self-assigned this Feb 13, 2025

Herraj added 5 commits February 13, 2025 17:10

clean up integ test

d72dafe

fix and refactor unit tests

79b0559

add new fixture and fight with unit tests

5fc07d0

fix formatting

0b7596e

refactor provider tests

ac70ba6

Herraj commented Feb 19, 2025

misaniwere reviewed Feb 20, 2025
