Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates from_functions and from_modules to use the same base "compile" #1275

Open
wants to merge 1 commit into
base: refactor/build-from-functions
Choose a base branch
from

Conversation

elijahbenizzy
Copy link
Collaborator

Buildng off the changes.

  • added with_functions in driver
  • added a base .compile function to replace from_functions, from_modules

TODO:

  • Add tests for driver-level changes
  • Wire through to hooks (as needed)
  • Maybe think about the observability/tracker piece (capturing code, etc...) -- right now it just won't show
  • Go through and change from_modules to compile in the tests/other places it's used.
  • Consider removing from_modules and from_functions (ensuring it's not used externally, it should be an internal API)
  • Wire through to async driver

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 7d7fbfe in 1 minute and 27 seconds

More details
  • Looked at 243 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 9 drafted comments based on config settings.
1. pyproject.toml:1
  • Draft comment:
    Overall, the pyproject.toml looks well organized. A minor note: the dependency for furo has been simplified from a Git URL to just 'furo'. Verify that this meets your version requirements; if a current release is critical, consider pinning a version or adding a note in the docs.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. pyproject.toml:10
  • Draft comment:
    Ensure that 'requires-python = ">=3.8, <4"' meets your expected Python support. This range may need updating if planning for future Python 4 releases.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. hamilton/graph.py:736
  • Draft comment:
    In FunctionGraph.compile() and related factory methods, the parameter 'allow_module_overrides' is passed into compile_to_nodes via the mismatched parameter 'allow_node_leveL_overrides'. It might be worth standardizing these names to reduce confusion.
  • Reason this comment was not posted:
    Marked as duplicate.
4. hamilton/graph.py:243
  • Draft comment:
    The custom_style_function is checked to enforce that it only accepts keyword arguments. This validation is helpful, but it might also be useful to document the expected signature in the Sphinx docs so users know how to implement their own.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. hamilton/graph.py:393
  • Draft comment:
    In _get_node_style and _get_function_modifier_style, a series of if/elif clauses define style dictionaries. Consider refactoring to use a mapping/dictionary lookup so that adding new modifiers or node types can be done in a more DRY and declarative fashion.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. hamilton/graph.py:1096
  • Draft comment:
    The directional DFS traversal function creates and returns a tuple of sets (nodes and user_nodes). Consider adding more explicit type hints for parameters and return values to improve maintainability and readability.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. hamilton/graph.py:1147
  • Draft comment:
    The execute method of FunctionGraph delegates to graph_functions.execute_subdag. Ensure that any exceptions or edge cases (e.g. cyclic dependencies) are properly handled, and consider adding unit tests for this functionality if not already present.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. pyproject.toml:29
  • Draft comment:
    The dependency specifications and optional-dependencies in pyproject.toml are extensive. It would be helpful to document in the Sphinx docs the purpose of various optional extras (e.g. 'dask', 'experiments', 'pandera', etc.) so that users know when to install which extras.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
9. pyproject.toml:161
  • Draft comment:
    Consider reviewing pinned dependency versions (e.g. in the 'dev' and 'docs' extras) to ensure compatibility with the broader ecosystem; if there are known issues, document them in the release notes or docs.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_Yjpdua4T2avAbOob


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

*functions: List[Tuple[str, Callable]],
config: Dict[str, Any],
adapter: lifecycle_base.LifecycleAdapterSet = None,
fg: Optional["FunctionGraph"] = None,
allow_module_overrides: bool = False,
allow_node_leveL_overrides: bool = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name 'allow_node_leveL_overrides' appears to have an inconsistent capitalization. Consider renaming it (e.g. to 'allow_node_overrides') for consistency with other parts of the code and to avoid confusion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 typo in the var name.

Comment on lines +1934 to +1943
def with_functions(self, *functions: FunctionType) -> "Builder":
"""Adds the specified functions to the list.
This can be called multiple times. If you have allow_module_overrides
set this will enabl overwriting modules or previously added functions.

:param functions:
:return: self
"""
self.functions.extend(functions)
return self
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want this? Why can't we just put it in the .with_modules?

E.g. this opens up issues of precedence when using both modules and functions.

@zilto
Copy link
Collaborator

zilto commented Feb 4, 2025

Some thoughts (no hard stance):

  • not ideal to have .with_modules() accept both modules and functions (that are not ModuleType)
  • having two entrypoints .with_modules() and .with_functions() adds burden for backwards compatibility. It's a bit more verbose when using both, but clearer (easier to test, easier to type annotate)
  • wasn't a fan of allow_module_override because it isn't related to modules. As the docstring indicates, it has to do with duplicate function names
  • would unify allow_node_overrides and allow_node_level_overrides naming
  • the new allow_node_overrides act at the node-level whereas allow_module_overrides acted at the function-level. The two are not equivalent. For example, module1.foo() and module2.foo() where module2.foo() has a @parameterize decorator. With function-level (current behavior) module1.foo() would be overriden / disappear. With node-level override (this PR), module1.foo() could coexist with the parameterized nodes produced by module2.foo().
    • Could have particularly weird cases if both module1.foo() and module2.foo() use @pipe operators that could accidentally collide
    • Function-level overrides seems generally safer / more desirable, but it can be manually achieved (e.g., iterate over function modules, and filter them by name). OTOH, node-level overrides needs to be enabled by the framework. I think it's ok to break backwards compatibility on such a narrow edge case.

I'm a big fan of this change. I think it will make testing Hamilton easier, but also testing for end users

@elijahbenizzy elijahbenizzy changed the title Updates from_functions and from_molues to use the same base "compile" Updates from_functions and from_modules to use the same base "compile" Feb 4, 2025
@elijahbenizzy
Copy link
Collaborator Author

Some thoughts (no hard stance):

  • not ideal to have .with_modules() accept both modules and functions (that are not ModuleType)

  • having two entrypoints .with_modules() and .with_functions() adds burden for backwards compatibility. It's a bit more verbose when using both, but clearer (easier to test, easier to type annotate)

  • wasn't a fan of allow_module_override because it isn't related to modules. As the docstring indicates, it has to do with duplicate function names

  • would unify allow_node_overrides and allow_node_level_overrides naming

  • the new allow_node_overrides act at the node-level whereas allow_module_overrides acted at the function-level. The two are not equivalent. For example, module1.foo() and module2.foo() where module2.foo() has a @parameterize decorator. With function-level (current behavior) module1.foo() would be overriden / disappear. With node-level override (this PR), module1.foo() could coexist with the parameterized nodes produced by module2.foo().

    • Could have particularly weird cases if both module1.foo() and module2.foo() use @pipe operators that could accidentally collide
    • Function-level overrides seems generally safer / more desirable, but it can be manually achieved (e.g., iterate over function modules, and filter them by name). OTOH, node-level overrides needs to be enabled by the framework. I think it's ok to break backwards compatibility on such a narrow edge case.

I'm a big fan of this change. I think it will make testing Hamilton easier, but also testing for end users

Ok, to clarify, I'm pretty sure allow_module_overrides is not horribly named, it's just confusing:

  1. Overrides are always at a node level (see
    if n.name in nodes and not allow_module_overrides:
    )
  2. What this does is it allows functions/nodes from one module to override those from another
  3. In the standard case (1:1 fn:node) mapping, this means that it's the same as function-level overrides
  4. In the non-standard (decorator) case (1:n fn:node) this means it's done at a node level
  5. The naming makes sense because everything is specified as modules, so in the base case it means that "nodes from one module can override those from another". As in, we're not overriding modules we're overriding with modules -- hence allow_module_overrides.

But yeah, confusing. IMO, allow_overrides is probably better, but I think it makes sense to push it down to the node level. Just don't want to use node in the public-facing API.

Also, FunctionGraph is not public facing, whereas driver is -- to check, does having it at the driver level unblock what you've been trying to do? compile would be an internal-facing API (and we'd likely kill from_functions and from_modules as they're also internal facing). When you look at it from a node perspective, I think this clears up some of your edge-cases.

@elijahbenizzy
Copy link
Collaborator Author

From an internal slack discussions (thanks @zilto):

Desired behavior:

  • .with_modules() is the main entrypoint, users pass code that's stored in versionable file
  • .allow_module_overrides() (current behavior) allows .with_modules(module_a, module_b) to have module_b override nodes from module_a
  • .with_functions() allows to build a DAG from functions alone or in combination with .with_modules(). Node names must be unique (current behavior)
  • .with_functions() + .allow_module_overrides() means the content from .with_functions() is applied last.
    • nodes produced by .with_functions() are overriden in order if duplicates

@elijahbenizzy elijahbenizzy force-pushed the build-from-function-changes branch from 7d7fbfe to 229c3de Compare February 4, 2025 17:15
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 229c3de in 25 seconds

More details
  • Looked at 243 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 7 drafted comments based on config settings.
1. pyproject.toml:29
  • Draft comment:
    Consider aligning version specifiers for dependencies consistently. For example, using an operator like '>=' might help prevent unexpected breakages later.
  • Reason this comment was not posted:
    Confidence changes required: 33%
    None
2. pyproject.toml:60
  • Draft comment:
    The optional dependency for 'furo' is now unpinned. For stability, consider pinning or specifying a version range.
  • Reason this comment was not posted:
    Confidence changes required: 33%
    None
3. pyproject.toml:68
  • Draft comment:
    Optional dependencies such as for Dask and others are defined well. Consider documenting in Sphinx docs how these extras are used.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    None
4. pyproject.toml:241
  • Draft comment:
    Dynamic versioning using hamilton.version looks good; ensure the attribute is maintained properly to avoid version mismatches.
  • Reason this comment was not posted:
    Confidence changes required: 0%
    None
5. pyproject.toml:32
  • Draft comment:
    Consider replacing the version constraint for typing_extensions. Instead of 'typing_extensions > 4.0.0', using 'typing_extensions>=4.0.0' is a more conventional expression of minimum version requirements.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. pyproject.toml:40
  • Draft comment:
    Verify that the 'dask-core' optional dependency is available on PyPI as a standalone package. If not, consider revising this extra to avoid potential installation issues.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. pyproject.toml:60
  • Draft comment:
    Optional: Consider pinning versions for some documentation-related dependencies (e.g. 'furo') in the [project.optional-dependencies] 'docs' extra to improve reproducibility of builds.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    None

Workflow ID: wflow_s1B63DSbN9RtMQyp


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

function

This makes it so that we reduce code paths -- we'll eventually use
compile for everything. This also allows us to push functions up to the
user level, enabling `with_functions`

Note the following desired behavior:

Desired behavior:

- .with_modules() is the main entrypoint, users pass code that's stored in versionable file
- .allow_module_overrides() (current behavior) allows .with_modules(module_a, module_b) to have module_b override nodes from module_a
- .with_functions() allows to build a DAG from functions alone or in combination with .with_modules(). Node names must be unique (current behavior)
- .with_functions() + .allow_module_overrides() means the content from .with_functions() is applied last.
    - nodes produced by .with_functions() are overriden in order if duplicates
@elijahbenizzy elijahbenizzy force-pushed the build-from-function-changes branch from 229c3de to 18442ee Compare February 4, 2025 21:49
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 18442ee in 1 minute and 22 seconds

More details
  • Looked at 242 lines of code in 4 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 drafted comments based on config settings.
1. hamilton/graph.py:718
  • Draft comment:
    The static methods FunctionGraph.compile, from_modules and from_functions now all delegate to the same underlying implementation. Please add or update Sphinx documentation (e.g., in docs/api.rst or docs/graph.rst) to document these unified entry points and explain the intended internal API changes.
  • Reason this comment was not posted:
    Comment was on unchanged code.
2. pyproject.toml:60
  • Draft comment:
    The dependency entry for 'furo' has been changed from a Git URL to a PyPI package. Please confirm that this matches your intended release strategy, as the Git ref might have served to get the latest preview features.
  • Reason this comment was not posted:
    Comment did not seem useful.
3. hamilton/graph.py:64
  • Draft comment:
    Consider updating the error message in add_dependency to use function names (e.g. f.name) rather than printing raw function objects. This will help improve clarity when duplicate or incompatible types are detected.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. hamilton/graph.py:172
  • Draft comment:
    In compile_to_nodes, when a duplicate node is detected, the error message currently prints the conflicting function object. Consider formatting the message to use the function's name for better readability.
  • Reason this comment was not posted:
    Comment was on unchanged code.
5. hamilton/graph.py:286
  • Draft comment:
    Typo in the docstring for _get_input_label: 'formatted aspyer a table' should be corrected.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. pyproject.toml:32
  • Draft comment:
    Consider specifying the version constraint for typing_extensions using '>=' instead of '>' for clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_HYJDmwHMqckFv93y


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@@ -142,17 +142,18 @@ def update_dependencies(
return nodes


def create_function_graph(
def compile_to_nodes(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new parameter 'allow_node_level_overrides' and corresponding error message look good. Consider clarifying in the docstring that this option controls node‐level duplicate detection, as opposed to module overrides.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants