Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: enable handling of nested fields when injecting request_option in request body_json #201

Draft
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

ChristoGrab
Copy link
Collaborator

@ChristoGrab ChristoGrab commented Jan 6, 2025

WIP

What

To support GraphQL queries natively in the CDK, we need to add handling of nested structures when injecting a RequestOption into a request body (specifically with body_json injection).

How

  • The RequestOption class now includes an optional field_path param, and each instance must declare a field_path OR field_name.
  • The new inject_into_request method determines how to handle the injection based on the existense of field_path and the specific RequestOptionType. As stated above, only body_json injections will support nested fields.
  • The combine_mappings utility function now handles nested structures when checking for mapping conflicts.
  • The following components now call on inject_into_request rather than handling the logic themselves:
    1. ApiKeyAuthenticator
    2. DatetimeBasedCursor

Suggested review order

  1. request_option.py: Has the main updated logic for validation and the new method inject_into_request
  2. mapping_helpers.py: Added a couple helper methods and extended the main function for catching conflicts in nested structures.
  3. token.py and datetime_based_cursor.py: Small updates to leverage the new injection logic.
  4. declarative_component_schema.yaml and declarative_component_schema.py: Updated schema + model
  5. Everything else is unit testing

TODO

Update the remaining components that allow request_option injection to leverage the new logic. The list is:

  1. ListPartitionRouter
  2. SubstreamPartitionRouter
  3. HttpRequester
  4. DefaultPaginator
  5. DatetimeBasedRequestOptionsProvider
  6. SimpleRetriever

@github-actions github-actions bot added the enhancement New feature or request label Jan 6, 2025
@ChristoGrab ChristoGrab changed the title feat: handle nested feat: enable handling of nested field_paths in request_option injection Jan 6, 2025
@ChristoGrab ChristoGrab changed the title feat: enable handling of nested field_paths in request_option injection feat: enable handling of nested fields when injecting request_option in request body_json Jan 9, 2025
Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some comments on this PR even if it is draft. Poke me when you want me to check it again

@@ -1057,11 +1057,17 @@ class InjectInto(Enum):

class RequestOption(BaseModel):
type: Literal["RequestOption"]
field_name: str = Field(
...,
field_name: Optional[str] = Field(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we slowly deprecate this field in favor of field_path and therefore add this information in the description/title?

if self.field_name is not None and not isinstance(
self.field_name, (str, InterpolatedString)
):
raise TypeError(f"field_name expects a string, but got {type(self.field_name)}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is overly cautious and wouldn't validate that. In Python, any parameter can be of any type but we don't validate them everywhere. We have mypy for that which should hopefully catch most of these. I think at some point, there are readability concerns that we need to balance. Was there a specific gap we wanted to address?

The some comment applies to the self.field_path condition below.

# Convert field_name and field_path into InterpolatedString objects if they are strings
if self.field_name is not None:
self.field_name = InterpolatedString.create(self.field_name, parameters=parameters)
if self.field_path is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since only field_name or field_path will be defined, we can probably use elif as elif self.field_path is not None: instead of if self.field_path is not None: to be more explicit

value: The value to inject
config: The config object to use for interpolation
"""
if self._is_field_path:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we simplify the logic by having in post_init:

        if self.field_name is not None:
            self.field_path = [InterpolatedString.create(self.field_name, parameters=parameters)]

This way, we would only have one logic to maintain and it would be the field_path one.

that string. Raise errors in the following cases:
* If there are duplicate keys across mappings
Combine multiple mappings into a single mapping, supporting nested structures.
If any of the mappings are a string, return that string. Raise errors in the following cases:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would precise that it is "If there only one mapping that is a string, it will return this mapping regardless of the other mappings".

That being said, I don't understand this behavior though. Maybe I'm wrong but before, I think it would fail if there was a string and a mapping, right?


# If any mapping is a string, return it
# Convert all mappings to flat (path, value) pairs for conflict detection
all_paths: List[List[tuple[List[str], Any]]] = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like the following logic is a bit complex. Would there be a way to simplify this? Right now, I see that we:

  • Extract all paths
  • Compare all paths to find conflicts
  • Create a new mapping that we can return

Would it be simpler to do just the last one and validate for conflicts while we do that? Something like:

def combine_mappings(
    mappings: List[Optional[Union[Mapping[str, Any], str]]],
) -> Union[Mapping[str, Any], str]:
    """
    Combine multiple mappings into a single mapping, supporting nested structures.
    If any of the mappings are a string, return that string. Raise errors in the following cases:
    * If there are conflicting paths across mappings (including nested conflicts)
    * If there are multiple string mappings
    * If there are multiple mappings containing keys and one of them is a string
    """
    if not mappings:
        return {}

    # Count how many string options we have, ignoring None values
    string_options = sum(isinstance(mapping, str) for mapping in mappings if mapping is not None)
    if string_options > 1:
        raise ValueError("Cannot combine multiple string options")

    # Filter out None values and empty mappings
    non_empty_mappings = [
        m for m in mappings if m is not None and not (isinstance(m, Mapping) and not m)
    ]

    # If there is only one string option, return it
    if string_options == 1:
        if len(non_empty_mappings) > 1:
            raise ValueError("Cannot combine multiple options if one is a string")
        return next(m for m in non_empty_mappings if isinstance(m, str))

    # Convert all mappings to flat (path, value) pairs for conflict detection
    for other in mappings[1:]:
        if other:
            merge(mappings[0], other)

    return mappings[0]


def merge(a: dict, b: dict, path=[]):
    """
    Blindly and shamelessly taken from https://stackoverflow.com/a/7205107
    """
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] != b[key]:
                raise ValueError('Duplicate keys')
        else:
            a[key] = b[key]
    return a

If we are afraid of modifying the mappings in memory, we can create a deepcopy of it as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants