-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: sanitize transformation error message to hide record values #221
Conversation
📝 WalkthroughWalkthroughThe changes enhance the error messaging mechanism in the type transformation utility within the Airbyte CDK. The Changes
Sequence DiagramsequenceDiagram
participant Transformer as TypeTransformer
participant ValidationError as Validation Error
participant ErrorMessage as Error Message Generator
Transformer->>ValidationError: Receive validation error
Transformer->>ErrorMessage: Call get_error_message()
ErrorMessage->>ErrorMessage: _get_type_structure()
ErrorMessage-->>Transformer: Return detailed error message
Hey! It looks like the changes really focus on making the error reporting clearer and more informative. Would you like to dive deeper into how the new type structure analysis works? It seems like a fascinating approach! Wdyt? 😊 Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/utils/transform.py (1)
228-236
: LGTM! The recursive type structure analysis looks solid.The implementation effectively sanitizes sensitive data by focusing on type information rather than actual values. I particularly like how it handles nested dictionaries recursively.
One thought: Would it be helpful to add a docstring explaining the purpose and return format of
_get_type_structure
? This could help future maintainers understand its security implications, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/utils/transform.py
(1 hunks)unit_tests/sources/utils/test_transform.py
(6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (2)
airbyte_cdk/sources/utils/transform.py (1)
237-240
: The error message format looks good and secure!The new format effectively communicates type transformation issues without exposing sensitive data. I like how it:
- Uses dot notation for clear field paths
- Shows type structure instead of actual values
- Maintains helpful context for debugging
unit_tests/sources/utils/test_transform.py (1)
107-107
: The test coverage looks comprehensive!The updated test cases effectively verify that:
- Error messages follow the new secure format
- Type information is correctly structured
- Various edge cases are handled properly
- No sensitive data is exposed
I particularly appreciate how the test cases cover nested objects, arrays, and complex type scenarios. This helps ensure the security fix is robust across different data structures.
Also applies to: 114-114, 199-199, 208-208, 215-215, 222-222, 255-255, 266-266, 275-275
…irbytehq/airbyte-python-cdk into christo/sanitize-error-message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/utils/transform.py (2)
237-237
: How about adding a return type hint? 🤔The method could benefit from an explicit return type annotation. Maybe something like this? wdyt?
- def _get_type_structure(self, input_data: Any, current_depth: int = 0) -> Any: + def _get_type_structure(self, input_data: Any, current_depth: int = 0) -> Union[str, Dict[str, Any]]:
246-254
: Would you consider simplifying this nested condition? 🤔The nested if-else structure could be simplified using a helper function or dictionary mapping. Here's one way to do it, wdyt?
if current_depth >= MAX_NESTING_DEPTH: - return ( - "object" - if isinstance(input_data, dict) - else "array" - if isinstance(input_data, list) - else python_to_json[type(input_data)] - ) + if isinstance(input_data, dict): + return "object" + if isinstance(input_data, list): + return "array" + return python_to_json[type(input_data)]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/utils/transform.py
(2 hunks)unit_tests/sources/utils/test_transform.py
(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- unit_tests/sources/utils/test_transform.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/utils/transform.py (2)
12-12
: Nice addition of the MAX_NESTING_DEPTH constant! 👍This addresses the previous concern about infinite recursion nicely. The value of 3 seems reasonable for type structure representation.
229-236
: Clean implementation of the sanitized error message! 🎯The new message format effectively hides sensitive values while still providing useful debugging context through type information and field path.
Co-authored-by: Ben Church <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
airbyte_cdk/sources/utils/transform.py (1)
229-236
: Consider truncating very long field paths for readability?The error message looks good and safely avoids exposing sensitive data! For better readability, what do you think about truncating very long field paths (e.g., showing only the last N segments with an ellipsis prefix)? wdyt?
- field_path = ".".join(map(str, e.path)) + MAX_PATH_SEGMENTS = 3 + path_segments = list(map(str, e.path)) + if len(path_segments) > MAX_PATH_SEGMENTS: + field_path = "..." + ".".join(path_segments[-MAX_PATH_SEGMENTS:]) + else: + field_path = ".".join(path_segments)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/utils/transform.py
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Analyze (python)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
airbyte_cdk/sources/utils/transform.py (2)
12-12
: LGTM! Good call on adding the MAX_NESTING_DEPTH constant.The constant helps prevent stack overflow from deeply nested structures while maintaining readable error messages.
229-262
: Great job on the security improvements! 🔒The changes effectively prevent sensitive data leakage while maintaining useful error messages. The type structure provides enough context for debugging without exposing actual values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/utils/transform.py (2)
229-235
: How about enhancing the docstring to be more descriptive? wdyt?The docstring could better explain the method's purpose and return value format. Here's a suggestion:
def get_error_message(self, e: ValidationError) -> str: """ - Construct a sanitized error message from a ValidationError instance. + Construct a sanitized error message from a ValidationError instance that describes + the type conversion failure without exposing sensitive data values. + + Args: + e: The ValidationError instance containing error details + + Returns: + A string message in the format: "Failed to transform value from type 'X' to type 'Y' at path: 'path'" """
237-237
: Should we add a more specific return type hint? wdyt?The return type
Any
could be more specific to better document the possible return values:- def _get_type_structure(self, input_data: Any, current_depth: int = 0) -> Any: + def _get_type_structure(self, input_data: Any, current_depth: int = 0) -> Union[str, dict[str, Any]]:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/utils/transform.py
(2 hunks)unit_tests/sources/utils/test_transform.py
(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- unit_tests/sources/utils/test_transform.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-the-guardian-api' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/utils/transform.py (2)
12-12
: LGTM! Good choice on the nesting depth.The constant is well-placed and the value of 3 provides a good balance between detail and brevity in error messages.
255-256
: 🛠️ Refactor suggestionShould we add graceful handling for custom types? wdyt?
The current implementation might raise KeyError for custom types not in
python_to_json
. Consider adding a fallback:else: - return python_to_json[type(input_data)] + try: + return python_to_json[type(input_data)] + except KeyError: + return f"custom_type({type(input_data).__name__})"Likely invalid or redundant comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What
A user alerted us to a Record transformation error that was exposing sensitive email values in their Builder test reads. Although we're not overly concerned about the security implications given these logs are not persisted long-term and should only be available to the project owner, we should err on the side of caution when logging source data. This PR updates the error message constructor to sanitize data values, only displaying the structure/type of the field causing the failure.
Summary by CodeRabbit
Bug Fixes
Documentation
Tests