Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add regex replace functionality to transformation filter extractors [Revised] #309

Merged
merged 23 commits into from
Mar 8, 2024

Conversation

ben-taussig-solo
Copy link
Contributor

@ben-taussig-solo ben-taussig-solo commented Feb 26, 2024

Description

  • This is a revised version of Add regex replace functionality to transformation filter extractors #301 , which previously merged and was then reverted with Revert new Extractor Regex Replace functionality after early merge #302
  • The original PR was accidentally merged before OCTO had the opportunity to review, which is why we decided to revert
  • During the meeting in which we decided to revert Add regex replace functionality to transformation filter extractors #301, I committed to implementing some suggested changes:
    • Addition of support for RE2 as a supported regex engine in addition to C++ std::regex
      • I did not end up doing this, for the following reasons:
        • This is a significant expansion of scope of this RFE. Adding support for this would require a significant time investment
        • RE2 is not designed for or performant relative to other regex engines when using capturing groups, which are central to the standard extraction and single value replace use cases. The goal of adding support for RE2 would be to improve the performance of extractor functionality, but there are other regex engines that are better suited for this use case
        • RE2 has differences in syntax and processing behavior that will require a significant amount of work to test effectively
    • Support for dynamic metadata as an Extractor input
      • This has been added with this PR
      • Update: we did not end up including this functionality in the final PR
    • Refactoring new replace all/replace individual value functionality to take and return std::string values, instead of absl::string_view
      • This has been added in this PR

API Changes

  • The Extraction (see Gloo Edge docs here) message has been updated to include 3 new fields:
    • mode
      • This is an enum which selects the mode of operation that selects the mode of operation that the extraction will use. Introduced with this PR are the SINGLE_REPLACE and REPLACE_ALL modes
        enum Mode {
        // Default mode. Extract the value of the subgroup-th capturing group.
        EXTRACT = 0;
        // Replace the value of the subgroup-th capturing group with the replacement_text.
        // Note: replacement_text must be set for this mode.
        SINGLE_REPLACE = 1;
        // Replace all matches of the regex in the source with the replacement_text.
        // Note: replacement_text must be set for this mode.
        // Note: subgroup is ignored for this mode. configuration will fail if subgroup is set.
        // Note: restrictions on the regex are different for this mode. See the regex field for more details.
        REPLACE_ALL = 2;
        }
    • replacement_text
      • when SINGLE_REPLACE or REPLACE_ALL modes are selected, the content matching the extractor's regex will be replaced with the value in replacement_text
    • source.dynamic_metadata
      • a dynamic_metadata field has been added to the source oneof, which specifies the source of the extraction. This value contains a key, the value of which in dynamic metadata will be provided to the extractor as input
      • Update: After discussion with Jacob, I have decide to remove this portion from the PR. I will put up a separate PR with the work that was done here. We decided that this functionality was too unstable to include in this PR

Feature Overview

  • This feature adds two new extractor "modes", which allow users to replace content in extracted values
    • These are SINGLE_REPLACE and REPLACE_ALL modes
    • In SINGLE_REPLACE mode, regex semantics are the same as with EXTRACT mode, i.e., the previous and now default extractor behavior. Users specify a regex that must match the entire input, and the content in the specified capture group is replaced with a configured string value (replacement_text)
    • in REPLACE_ALL mode, the regex is allowed to match the input multiple times. All matches are replaced by a configured string value (replacement_text)
  • This feature adds support for dynamic metadata as an extractor input
    • Dynamic metadata can be used as an input in all extraction modes

ben-taussig-solo and others added 6 commits February 5, 2024 15:48
…301)

* initial extractor implementation of replace functionality

* minor changes, testing against mergeExtractorsToBody and extraction callback

* add changelog entry

* update API to use new mode selector

update transformation_filter proto

* minor updates to comments in transformation_filter.proto

* remove existing references to no-longer-existing replace_all setting

* update replacement_text_ to a std::optional<std::string>

* remove duplicate mode enum

* update comment indicating that subgroup should never exceed regex_result size

* add AttemptReplaceFromNoMatchNonNilSubgroup test

* prevent string reallocation

* remove unnecessary if block + variable in replaceAllValues

* clean up new tests

* inline replacement_text in inja_transformer_test.cc

* more test cleanup
@solo-changelog-bot
Copy link

Issues linked to changelog:
solo-io/gloo#8706

@nfuden
Copy link
Collaborator

nfuden commented Feb 29, 2024

Step #0: Step #0 - "do_ci": �[1mtest/extensions/filters/http/transformation/inja_transformer_test.cc:64:29: �[0m�[0;1;31merror: �[0m�[1mincompatible pointer types assigning to 'const std::unordered_map<std::string, std::string> *' (aka 'const unordered_map<basic_string<char, char_traits, allocator>, basic_string<char, char_traits, allocator>> *') from 'const std::unordered_map<std::string, absl::string_view> *' (aka 'const unordered_map<basic_string<char, char_traits, allocator>, basic_string_view> *')�[0m
Step #0: Step #0 - "do_ci": typed_slot.extractions_ = &extractions;

@ben-taussig-solo
Copy link
Contributor Author

Step #0: Step #0 - "do_ci": �[1mtest/extensions/filters/http/transformation/inja_transformer_test.cc:64:29: �[0m�[0;1;31merror: �[0m�[1mincompatible pointer types assigning to 'const std::unordered_map<std::string, std::string> *' (aka 'const unordered_map<basic_string<char, char_traits, allocator>, basic_string<char, char_traits, allocator>> *') from 'const std::unordered_map<std::string, absl::string_view> *' (aka 'const unordered_map<basic_string<char, char_traits, allocator>, basic_string_view> *')�[0m Step #0: Step #0 - "do_ci": typed_slot.extractions_ = &extractions;

resolved with 741ea24

I had added a switch statement to select different modes of extractor operation, and left out break;s at the end of each case, causing us to always hit the default PANIC_DUE_TO_CORRUPT_ENUM case

Copy link
Contributor

@jbohanon jbohanon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great!

Copy link
Contributor

@jbohanon jbohanon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

nfuden
nfuden previously approved these changes Feb 29, 2024
@soloio-bulldozer soloio-bulldozer bot merged commit 5ddcf40 into main Mar 8, 2024
3 checks passed
@soloio-bulldozer soloio-bulldozer bot deleted the regex_extractor_revision branch March 8, 2024 18:50
ben-taussig-solo added a commit that referenced this pull request Mar 11, 2024
…Revised] (#309)

* Add regex replace functionality to transformation filter extractors (#301)

* initial extractor implementation of replace functionality

* minor changes, testing against mergeExtractorsToBody and extraction callback

* add changelog entry

* update API to use new mode selector

update transformation_filter proto

* minor updates to comments in transformation_filter.proto

* remove existing references to no-longer-existing replace_all setting

* update replacement_text_ to a std::optional<std::string>

* remove duplicate mode enum

* update comment indicating that subgroup should never exceed regex_result size

* add AttemptReplaceFromNoMatchNonNilSubgroup test

* prevent string reallocation

* remove unnecessary if block + variable in replaceAllValues

* clean up new tests

* inline replacement_text in inja_transformer_test.cc

* more test cleanup

* update function signatures, remove replaced_value_

* support dynamic metadata as extractor input

* update changelog location

* add API changes to go with 3175ca9

* revert support for dynamic metadata as an extractor input 3175ca9 and e2668be

* refactor calls to extract/replace

* rename replace to extractDestructive, add breaks to switch statement

* update data types to match updated function signatures in inja_transformer_test.cc

* respond to review comments

* update changelog location

* update changelog location

* separate destructive extractors and non-destructive extractors

* fix match_not_null edge case

* update inline documentation for new proto field

* add test demonstrating use of format specifiers

* update REPLACE_ALL mode to return input on no match

* return input on no match in single replace case
ben-taussig-solo added a commit that referenced this pull request Mar 11, 2024
…Revised] (#309)

* Add regex replace functionality to transformation filter extractors (#301)

* initial extractor implementation of replace functionality

* minor changes, testing against mergeExtractorsToBody and extraction callback

* add changelog entry

* update API to use new mode selector

update transformation_filter proto

* minor updates to comments in transformation_filter.proto

* remove existing references to no-longer-existing replace_all setting

* update replacement_text_ to a std::optional<std::string>

* remove duplicate mode enum

* update comment indicating that subgroup should never exceed regex_result size

* add AttemptReplaceFromNoMatchNonNilSubgroup test

* prevent string reallocation

* remove unnecessary if block + variable in replaceAllValues

* clean up new tests

* inline replacement_text in inja_transformer_test.cc

* more test cleanup

* update function signatures, remove replaced_value_

* support dynamic metadata as extractor input

* update changelog location

* add API changes to go with 3175ca9

* revert support for dynamic metadata as an extractor input 3175ca9 and e2668be

* refactor calls to extract/replace

* rename replace to extractDestructive, add breaks to switch statement

* update data types to match updated function signatures in inja_transformer_test.cc

* respond to review comments

* update changelog location

* update changelog location

* separate destructive extractors and non-destructive extractors

* fix match_not_null edge case

* update inline documentation for new proto field

* add test demonstrating use of format specifiers

* update REPLACE_ALL mode to return input on no match

* return input on no match in single replace case
soloio-bulldozer bot pushed a commit that referenced this pull request Mar 13, 2024
* Add regex replace functionality to transformation filter extractors [Revised] (#309)

* Add regex replace functionality to transformation filter extractors (#301)

* initial extractor implementation of replace functionality

* minor changes, testing against mergeExtractorsToBody and extraction callback

* add changelog entry

* update API to use new mode selector

update transformation_filter proto

* minor updates to comments in transformation_filter.proto

* remove existing references to no-longer-existing replace_all setting

* update replacement_text_ to a std::optional<std::string>

* remove duplicate mode enum

* update comment indicating that subgroup should never exceed regex_result size

* add AttemptReplaceFromNoMatchNonNilSubgroup test

* prevent string reallocation

* remove unnecessary if block + variable in replaceAllValues

* clean up new tests

* inline replacement_text in inja_transformer_test.cc

* more test cleanup

* update function signatures, remove replaced_value_

* support dynamic metadata as extractor input

* update changelog location

* add API changes to go with 3175ca9

* revert support for dynamic metadata as an extractor input 3175ca9 and e2668be

* refactor calls to extract/replace

* rename replace to extractDestructive, add breaks to switch statement

* update data types to match updated function signatures in inja_transformer_test.cc

* respond to review comments

* update changelog location

* update changelog location

* separate destructive extractors and non-destructive extractors

* fix match_not_null edge case

* update inline documentation for new proto field

* add test demonstrating use of format specifiers

* update REPLACE_ALL mode to return input on no match

* return input on no match in single replace case

* update changelog location
soloio-bulldozer bot pushed a commit that referenced this pull request Mar 13, 2024
* Add regex replace functionality to transformation filter extractors [Revised] (#309)

* Add regex replace functionality to transformation filter extractors (#301)

* initial extractor implementation of replace functionality

* minor changes, testing against mergeExtractorsToBody and extraction callback

* add changelog entry

* update API to use new mode selector

update transformation_filter proto

* minor updates to comments in transformation_filter.proto

* remove existing references to no-longer-existing replace_all setting

* update replacement_text_ to a std::optional<std::string>

* remove duplicate mode enum

* update comment indicating that subgroup should never exceed regex_result size

* add AttemptReplaceFromNoMatchNonNilSubgroup test

* prevent string reallocation

* remove unnecessary if block + variable in replaceAllValues

* clean up new tests

* inline replacement_text in inja_transformer_test.cc

* more test cleanup

* update function signatures, remove replaced_value_

* support dynamic metadata as extractor input

* update changelog location

* add API changes to go with 3175ca9

* revert support for dynamic metadata as an extractor input 3175ca9 and e2668be

* refactor calls to extract/replace

* rename replace to extractDestructive, add breaks to switch statement

* update data types to match updated function signatures in inja_transformer_test.cc

* respond to review comments

* update changelog location

* update changelog location

* separate destructive extractors and non-destructive extractors

* fix match_not_null edge case

* update inline documentation for new proto field

* add test demonstrating use of format specifiers

* update REPLACE_ALL mode to return input on no match

* return input on no match in single replace case

* update changelog

* empty commit to kick changelog bot after switching target branch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants