Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flytekit remote updates for notebooks and versions #3109

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kumare3
Copy link
Contributor

@kumare3 kumare3 commented Feb 5, 2025

Problems:

  1. Version had to be provided when registering certain entities from the notebook
  2. Execute worked, but register did not work in the same way
  3. In cases when same entity is registered in non interactive mode, remote entity was auto selected to be latest registered if version field was empty. This is not explicitly called latest as a special version
  4. Makes version handling uniform across all entities

Example cases, now works
Screenshot 2025-02-04 at 9 35 13 PM

Screenshot 2025-02-04 at 9 35 47 PM Screenshot 2025-02-04 at 9 36 12 PM

Summary by Bito

This PR implements standardized version handling in Flytekit's remote operations, introducing 'latest' as the default version option. The changes include refining the _resolve_version method for consistent version resolution and making version parameters optional with proper defaults. The update improves behavior across interactive and non-interactive modes, particularly enhancing user experience in notebook environments while improving error messaging and documentation.

Unit tests added: False

Estimated effort to review (1-5, lower is better): 2

@flyte-bot
Copy link
Contributor

flyte-bot commented Feb 5, 2025

Code Review Agent Run #7eedad

Actionable Suggestions - 1
  • flytekit/remote/remote.py - 1
Additional Suggestions - 1
  • flytekit/remote/remote.py - 1
    • Consider consolidating version check conditions · Line 166-167
Review Details
  • Files reviewed - 1 · Commit Range: 9e0af8a..9e0af8a
    • flytekit/remote/remote.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito Logo

@flyte-bot
Copy link
Contributor

flyte-bot commented Feb 5, 2025

Changelist by Bito

This pull request implements the following key changes.

Key Change Files Impacted
Feature Improvement - Enhanced Version Handling in Remote Operations

remote.py - Standardized version handling with 'latest' as default and improved version resolution logic

Comment on lines 2055 to 2067
def _resolve_version(
self, version: typing.Optional[str], entity: typing.Any, ss: SerializationSettings
) -> typing.Tuple[str, typing.Optional[PickledEntity]]:
if version is None and self.interactive_mode_enabled:
md5_bytes, pickled_target_dict = _get_pickled_target_dict(entity)
return self._version_from_hash(
md5_bytes, ss, entity.python_interface.default_inputs_as_kwargs, *self._get_image_names(entity)
), pickled_target_dict
elif version is not None:
return version, None
raise ValueError(
"Version must be provided when not in interactive mode. If you want to use latest version pass 'latest'"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider splitting version resolution logic

Consider extracting the version resolution logic into a separate method for better code organization. The _resolve_version method currently handles both version resolution and pickled entity management, which could be split for better maintainability.

Code suggestion
Check the AI-generated fix before applying
Suggested change
def _resolve_version(
self, version: typing.Optional[str], entity: typing.Any, ss: SerializationSettings
) -> typing.Tuple[str, typing.Optional[PickledEntity]]:
if version is None and self.interactive_mode_enabled:
md5_bytes, pickled_target_dict = _get_pickled_target_dict(entity)
return self._version_from_hash(
md5_bytes, ss, entity.python_interface.default_inputs_as_kwargs, *self._get_image_names(entity)
), pickled_target_dict
elif version is not None:
return version, None
raise ValueError(
"Version must be provided when not in interactive mode. If you want to use latest version pass 'latest'"
)
def _get_pickled_entity(self, entity: typing.Any) -> typing.Optional[PickledEntity]:
if not self.interactive_mode_enabled:
return None
md5_bytes, pickled_target_dict = _get_pickled_target_dict(entity)
return pickled_target_dict
def _resolve_version(
self, version: typing.Optional[str], entity: typing.Any, ss: SerializationSettings
) -> typing.Tuple[str, typing.Optional[PickledEntity]]:
if version is not None:
return version, None
if not self.interactive_mode_enabled:
raise ValueError(
"Version must be provided when not in interactive mode. If you want to use latest version pass 'latest'"
)
md5_bytes, pickled_target_dict = _get_pickled_target_dict(entity)
version = self._version_from_hash(
md5_bytes, ss, entity.python_interface.default_inputs_as_kwargs, *self._get_image_names(entity)
)
return version, pickled_target_dict

Code Review Run #7eedad


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

@flyte-bot
Copy link
Contributor

flyte-bot commented Feb 7, 2025

Code Review Agent Run #dda473

Actionable Suggestions - 2
  • flytekit/remote/remote.py - 2
Review Details
  • Files reviewed - 1 · Commit Range: 9e0af8a..e020bd1
    • flytekit/remote/remote.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito Logo

Comment on lines +809 to +821
def _resolve_version(
self, version: typing.Optional[str], entity: typing.Any, ss: SerializationSettings
) -> typing.Tuple[str, typing.Optional[PickledEntity]]:
if version is None and self.interactive_mode_enabled:
md5_bytes, pickled_target_dict = _get_pickled_target_dict(entity)
return self._version_from_hash(
md5_bytes, ss, entity.python_interface.default_inputs_as_kwargs, *self._get_image_names(entity)
), pickled_target_dict
elif version is not None:
return version, None
raise ValueError(
"Version must be provided when not in interactive mode. If you want to use latest version pass 'latest'"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding specific type hints

Consider adding type hints for entity parameter in _resolve_version(). The current implementation accepts typing.Any which could lead to runtime errors if invalid types are passed.

Code suggestion
Check the AI-generated fix before applying
 -    def _resolve_version(self, version: typing.Optional[str], entity: typing.Any, ss: SerializationSettings
 +    def _resolve_version(self, version: typing.Optional[str], entity: typing.Union[WorkflowBase, PythonTask], ss: SerializationSettings

Code Review Run #dda473


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

project=self.default_project,
domain=self.default_domain,
)

version, _ = self._resolve_version(version, entity, serialization_settings)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving version resolution earlier

Consider moving the version resolution logic before the serialization settings initialization to avoid potential inconsistencies between version and settings.

Code suggestion
Check the AI-generated fix before applying
 -            serialization_settings = SerializationSettings(
 -                image_config=ImageConfig.auto_default_image(),
 -                project=self.default_project,
 -                domain=self.default_domain,
 -            )
 -
 -            version, _ = self._resolve_version(version, entity, serialization_settings)
 +            version, _ = self._resolve_version(version, entity, None)
 +            serialization_settings = SerializationSettings(
 +                image_config=ImageConfig.auto_default_image(),
 +                project=self.default_project,
 +                domain=self.default_domain,
 +                version=version
 +            )

Code Review Run #dda473


Is this a valid issue, or was it incorrectly flagged by the Agent?

  • it was incorrectly flagged

@wild-endeavor
Copy link
Contributor

There's a breaking change here I wanted to highlight which may or may not be okay. I think it's technically more correct, but it is a change in behavior.

Assuming you have a simple file like this

from flytekit import task, workflow, ImageSpec, FlyteRemote, Config

image = ImageSpec(
    name="yt_public",
    builder="default",
    registry="ghcr.io/wild-endeavor",
    packages=["flytekit==v1.15.0b2"],
)

@task(container_image=image, enable_deck=True)
def print_hello():
    print("hello")

@workflow
def wf():
    print_hello()

if __name__ == "__main__":
    remote = FlyteRemote(Config.for_sandbox(), default_project="flytesnacks", default_domain="development")
    remote.register_workflow(wf, version="from_main_v4_pr")

the behavior on master is as follows

  • if the user does pyflyte register, fast register is used, and the code is copied into the tarball. Code is not built into the image.
  • if the user does python file.py, fast register is not used, and the code is built into the image.

The reason the code is built into the image in the second case is because the code currently is setting the project root. When the register workflow call triggers the downstream register task call, it gets set on the image spec, causing the user code to be built in.

With this pr, without setting project root, pyflyte register remains the same, but python file.py will no longer copy code - when the workflow is run, the task will fail.

If the user wants to make this work, they'd have to change to fast_register_workflow, which actually feels more correct.

I think we can make a better experience actually - the register_workflow function right now never does fast register... even if the user passes in a serialization settings object with fast register enabled, so there's some discrepancy here to clean up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants