refactored the download module to have reusable clients #817

Merged
ethantang-db merged 11 commits into main from ethantang-db/stream_clients_refactoring
Nov 4, 2024

Contributor

@ethantang-db ethantang-db commented Oct 28, 2024

Description of changes:

Refactored download.py to reuse an existing client for the same object instead of constantly recreating clients. This should substantially cut down the number of simultaneously active clients, which can cause issues with certain cloud providers.
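
For readers unfamiliar with the pattern, here is a minimal sketch of the client-reuse idea described above. It is not the code from this PR: `ReusableDownloader`, `FakeClient`, `client_factory`, `clients_created`, and `close` are hypothetical names chosen for illustration, and the fake client stands in for whatever cloud SDK client a real downloader would hold.

```python
from typing import Any, Callable, Optional


class FakeClient:
    """Stand-in for a cloud storage client (hypothetical, for illustration)."""

    def fetch(self, remote: str, local: str) -> str:
        # A real client would perform a network download here.
        return f'{remote} -> {local}'


class ReusableDownloader:
    """Hypothetical downloader that creates its client once and reuses it."""

    def __init__(self, client_factory: Callable[[], Any]) -> None:
        self._client_factory = client_factory
        self._client: Optional[Any] = None
        self.clients_created = 0  # bookkeeping for this demo only

    def _get_client(self) -> Any:
        # Build the client lazily on first use, then keep returning the same one.
        if self._client is None:
            self._client = self._client_factory()
            self.clients_created += 1
        return self._client

    def download(self, remote: str, local: str) -> str:
        return self._get_client().fetch(remote, local)

    def close(self) -> None:
        # Drop the cached client; the next download will create a fresh one.
        self._client = None


if __name__ == '__main__':
    downloader = ReusableDownloader(FakeClient)
    for shard in ('shard.0', 'shard.1', 'shard.2'):
        downloader.download(f'remote/{shard}', f'/tmp/{shard}')
    print(downloader.clients_created)
```

In this sketch, three downloads share a single client because the object caches it after the first call; creating a new client inside `download` would instead produce one client per call.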

Base run: streaming-oci-socket-testing-f7UHTM
Branch run: streaming-oci-socket-testing-Hq3QSQ

Issue #, if available:

Fixes #728

Merge Checklist:

Put an x without space in the boxes that apply. If you are unsure about any checklist, please don't hesitate to ask. We are here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the contributor guidelines
  • This is a documentation change or typo fix. If so, skip the rest of this checklist.
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the MosaicML team.
  • I have updated any necessary documentation, including README and API docs (if appropriate).

Tests

  • I ran pre-commit on my change. (check out the pre-commit section of prerequisites)
  • I have added tests that prove my fix is effective or that my feature works (if appropriate).
  • I ran the tests locally to make sure they pass. (check out testing)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes.

Collaborator

@snarayan21 snarayan21 left a comment

this is great! left a few comments, and would like @karan6181 and @XiaohanZhangCMU to also take a look.

Collaborator

@snarayan21 snarayan21 left a comment

Ok I think this is mostly good with me. To make sure that we're not causing throughput / cache regressions, let's run the streaming regression tests off this branch. Will message you directions for that!

Member

@XiaohanZhangCMU XiaohanZhangCMU left a comment

LGTM. thanks for the massive refactoring!

Collaborator

@snarayan21 snarayan21 left a comment

One comment on the doctest you had, doesn't seem like the clean_up() function does anything, which is why I'm kinda confused about the output...

Other than that, lgtm.

@ethantang-db ethantang-db force-pushed the ethantang-db/stream_clients_refactoring branch from 827e7e5 to 3e9bc89 Compare November 4, 2024 19:37
Collaborator

@karan6181 karan6181 left a comment

LGTM. Thank You!

@ethantang-db ethantang-db merged commit 06b1d7f into main Nov 4, 2024
7 checks passed
@ethantang-db ethantang-db deleted the ethantang-db/stream_clients_refactoring branch November 4, 2024 21:50
Development

Successfully merging this pull request may close these issues.

GCS: Mosiacml-streaming overloads the GCP metadata service when too many processes are used.