-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add regex replace functionality to transformation filter extractors #301
Conversation
source/extensions/filters/http/transformation/inja_transformer.cc
Outdated
Show resolved
Hide resolved
// confusing edge case in std::regex_replace when the regex is .* | ||
// apparently, this regex matches the whole input string __AND__ the | ||
// line ending, so the replacement is applied twice | ||
EXPECT_EQ("BAZBAZ", res); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers: this edge case frustrates me, trying to figure out how to handle it at the moment. Any input would be appreciated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does changing line 114 to extractor.set_regex(".*$")
change anything? or maybe "^.*$"
?
source/extensions/filters/http/transformation/inja_transformer.h
Outdated
Show resolved
Hide resolved
source/extensions/filters/http/transformation/inja_transformer.cc
Outdated
Show resolved
Hide resolved
api/envoy/config/filter/http/transformation/v2/transformation_filter.proto
Outdated
Show resolved
Hide resolved
Issues linked to changelog: |
api/envoy/config/filter/http/transformation/v2/transformation_filter.proto
Outdated
Show resolved
Hide resolved
update transformation_filter proto
767bae1
to
3554ba1
Compare
api/envoy/config/filter/http/transformation/v2/transformation_filter.proto
Outdated
Show resolved
Hide resolved
api/envoy/config/filter/http/transformation/v2/transformation_filter.proto
Outdated
Show resolved
Hide resolved
source/extensions/filters/http/transformation/inja_transformer.cc
Outdated
Show resolved
Hide resolved
source/extensions/filters/http/transformation/inja_transformer.h
Outdated
Show resolved
Hide resolved
} | ||
|
||
// If a match was found, replace all instances of the regex in the input value with the replacement_text_ value | ||
replaced = std::regex_replace(input, extract_regex_, replacement_text_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar performance concerns here since we have had issues with std::regex
in the past
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some notes/questions:
- Extractor regexes are compiled at config time, which is good (or really, the alternative is unacceptable)
extract_regex_(Solo::Regex::Utility::parseStdRegex(extractor.regex())) {
- The replaceAll path (i.e. the one we're discussing here) is not maximally efficient -- it does an efficient check to identify whether there are any matches, and then does the call you've highlighted here, which will replace all matches.
- I considered extending the efficient check to identify matches to identify all matches, after which we could manually replace matches with the replacement_text_, but I chose not to as it seemed like it would be difficult to implement/read/maintain, and would potentially be prone to bugs and edge cases. But I could go that route if the performance of
regex_replace
is problematically bad. That being said I think I would prefer to get some performance metrics before doing so
- I considered extending the efficient check to identify matches to identify all matches, after which we could manually replace matches with the replacement_text_, but I chose not to as it seemed like it would be difficult to implement/read/maintain, and would potentially be prone to bugs and edge cases. But I could go that route if the performance of
- Can you link any issues or PRs you can recall where there's discussion about
std::regex
performance issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you link any issues or PRs you can recall where there's discussion about std::regex performance issues?
Here's an issue in DLP. Similar situation - regexes are compiled at the time of configuration, but are still extremely slow. There are a few benchmarks linked in the issue that show how... hopeless std::regex
can be.
Happy to leave this as is for now, but if this issue does bubble up, there's a bit of additional and (hopefully) helpful information on that thread
source/extensions/filters/http/transformation/inja_transformer.h
Outdated
Show resolved
Hide resolved
test/extensions/filters/http/transformation/inja_transformer_replace_test.cc
Show resolved
Hide resolved
api/envoy/config/filter/http/transformation/v2/transformation_filter.proto
Outdated
Show resolved
Hide resolved
source/extensions/filters/http/transformation/inja_transformer.cc
Outdated
Show resolved
Hide resolved
source/extensions/filters/http/transformation/inja_transformer.cc
Outdated
Show resolved
Hide resolved
source/extensions/filters/http/transformation/inja_transformer.h
Outdated
Show resolved
Hide resolved
source/extensions/filters/http/transformation/inja_transformer.cc
Outdated
Show resolved
Hide resolved
Note to reviewers: I've put up a draft PR in Gloo OSS with changes to support this functionality: solo-io/gloo#9124 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly nits; looking really good
source/extensions/filters/http/transformation/inja_transformer.cc
Outdated
Show resolved
Hide resolved
test/extensions/filters/http/transformation/inja_transformer_replace_test.cc
Outdated
Show resolved
Hide resolved
test/extensions/filters/http/transformation/inja_transformer_test.cc
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good - only one (potentially) major comment that i found (replaced_value_
seems unnecessary), but looks good to me
test/extensions/filters/http/transformation/inja_transformer_replace_test.cc
Outdated
Show resolved
Hide resolved
// confusing edge case in std::regex_replace when the regex is .* | ||
// apparently, this regex matches the whole input string __AND__ the | ||
// line ending, so the replacement is applied twice | ||
EXPECT_EQ("BAZBAZ", res); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does changing line 114 to extractor.set_regex(".*$")
change anything? or maybe "^.*$"
?
test/extensions/filters/http/transformation/inja_transformer_replace_test.cc
Outdated
Show resolved
Hide resolved
test/extensions/filters/http/transformation/inja_transformer_replace_test.cc
Outdated
Show resolved
Hide resolved
Yes -- either of those regexes results in normal/expected. The issue only exists when the regex is set to Note that this is caused by |
extraction_func_ = std::bind(&Extractor::replaceAllValues, this, _1, _2); | ||
break; | ||
default: | ||
throw EnvoyException("Unknown mode"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is probably fine, but you could also consider using disregard, i think this macro is intended for other usesPANIC_DUE_TO_CORRUPT_ENUM
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
…301) * initial extractor implementation of replace functionality * minor changes, testing against mergeExtractorsToBody and extraction callback * add changelog entry * update API to use new mode selector update transformation_filter proto * minor updates to comments in transformation_filter.proto * remove existing references to no-longer-existing replace_all setting * update replacement_text_ to a std::optional<std::string> * remove duplicate mode enum * update comment indicating that subgroup should never exceed regex_result size * add AttemptReplaceFromNoMatchNonNilSubgroup test * prevent string reallocation * remove unnecessary if block + variable in replaceAllValues * clean up new tests * inline replacement_text in inja_transformer_test.cc * more test cleanup
…Revised] (#309) * Add regex replace functionality to transformation filter extractors (#301) * initial extractor implementation of replace functionality * minor changes, testing against mergeExtractorsToBody and extraction callback * add changelog entry * update API to use new mode selector update transformation_filter proto * minor updates to comments in transformation_filter.proto * remove existing references to no-longer-existing replace_all setting * update replacement_text_ to a std::optional<std::string> * remove duplicate mode enum * update comment indicating that subgroup should never exceed regex_result size * add AttemptReplaceFromNoMatchNonNilSubgroup test * prevent string reallocation * remove unnecessary if block + variable in replaceAllValues * clean up new tests * inline replacement_text in inja_transformer_test.cc * more test cleanup * update function signatures, remove replaced_value_ * support dynamic metadata as extractor input * update changelog location * add API changes to go with 3175ca9 * revert support for dynamic metadata as an extractor input 3175ca9 and e2668be * refactor calls to extract/replace * rename replace to extractDestructive, add breaks to switch statement * update data types to match updated function signatures in inja_transformer_test.cc * respond to review comments * update changelog location * update changelog location * separate destructive extractors and non-destructive extractors * fix match_not_null edge case * update inline documentation for new proto field * add test demonstrating use of format specifiers * update REPLACE_ALL mode to return input on no match * return input on no match in single replace case
…Revised] (#309) * Add regex replace functionality to transformation filter extractors (#301) * initial extractor implementation of replace functionality * minor changes, testing against mergeExtractorsToBody and extraction callback * add changelog entry * update API to use new mode selector update transformation_filter proto * minor updates to comments in transformation_filter.proto * remove existing references to no-longer-existing replace_all setting * update replacement_text_ to a std::optional<std::string> * remove duplicate mode enum * update comment indicating that subgroup should never exceed regex_result size * add AttemptReplaceFromNoMatchNonNilSubgroup test * prevent string reallocation * remove unnecessary if block + variable in replaceAllValues * clean up new tests * inline replacement_text in inja_transformer_test.cc * more test cleanup * update function signatures, remove replaced_value_ * support dynamic metadata as extractor input * update changelog location * add API changes to go with 3175ca9 * revert support for dynamic metadata as an extractor input 3175ca9 and e2668be * refactor calls to extract/replace * rename replace to extractDestructive, add breaks to switch statement * update data types to match updated function signatures in inja_transformer_test.cc * respond to review comments * update changelog location * update changelog location * separate destructive extractors and non-destructive extractors * fix match_not_null edge case * update inline documentation for new proto field * add test demonstrating use of format specifiers * update REPLACE_ALL mode to return input on no match * return input on no match in single replace case
…Revised] (#309) * Add regex replace functionality to transformation filter extractors (#301) * initial extractor implementation of replace functionality * minor changes, testing against mergeExtractorsToBody and extraction callback * add changelog entry * update API to use new mode selector update transformation_filter proto * minor updates to comments in transformation_filter.proto * remove existing references to no-longer-existing replace_all setting * update replacement_text_ to a std::optional<std::string> * remove duplicate mode enum * update comment indicating that subgroup should never exceed regex_result size * add AttemptReplaceFromNoMatchNonNilSubgroup test * prevent string reallocation * remove unnecessary if block + variable in replaceAllValues * clean up new tests * inline replacement_text in inja_transformer_test.cc * more test cleanup * update function signatures, remove replaced_value_ * support dynamic metadata as extractor input * update changelog location * add API changes to go with 3175ca9 * revert support for dynamic metadata as an extractor input 3175ca9 and e2668be * refactor calls to extract/replace * rename replace to extractDestructive, add breaks to switch statement * update data types to match updated function signatures in inja_transformer_test.cc * respond to review comments * update changelog location * update changelog location * separate destructive extractors and non-destructive extractors * fix match_not_null edge case * update inline documentation for new proto field * add test demonstrating use of format specifiers * update REPLACE_ALL mode to return input on no match * return input on no match in single replace case
* Add regex replace functionality to transformation filter extractors [Revised] (#309) * Add regex replace functionality to transformation filter extractors (#301) * initial extractor implementation of replace functionality * minor changes, testing against mergeExtractorsToBody and extraction callback * add changelog entry * update API to use new mode selector update transformation_filter proto * minor updates to comments in transformation_filter.proto * remove existing references to no-longer-existing replace_all setting * update replacement_text_ to a std::optional<std::string> * remove duplicate mode enum * update comment indicating that subgroup should never exceed regex_result size * add AttemptReplaceFromNoMatchNonNilSubgroup test * prevent string reallocation * remove unnecessary if block + variable in replaceAllValues * clean up new tests * inline replacement_text in inja_transformer_test.cc * more test cleanup * update function signatures, remove replaced_value_ * support dynamic metadata as extractor input * update changelog location * add API changes to go with 3175ca9 * revert support for dynamic metadata as an extractor input 3175ca9 and e2668be * refactor calls to extract/replace * rename replace to extractDestructive, add breaks to switch statement * update data types to match updated function signatures in inja_transformer_test.cc * respond to review comments * update changelog location * update changelog location * separate destructive extractors and non-destructive extractors * fix match_not_null edge case * update inline documentation for new proto field * add test demonstrating use of format specifiers * update REPLACE_ALL mode to return input on no match * return input on no match in single replace case * update changelog location
* Add regex replace functionality to transformation filter extractors [Revised] (#309) * Add regex replace functionality to transformation filter extractors (#301) * initial extractor implementation of replace functionality * minor changes, testing against mergeExtractorsToBody and extraction callback * add changelog entry * update API to use new mode selector update transformation_filter proto * minor updates to comments in transformation_filter.proto * remove existing references to no-longer-existing replace_all setting * update replacement_text_ to a std::optional<std::string> * remove duplicate mode enum * update comment indicating that subgroup should never exceed regex_result size * add AttemptReplaceFromNoMatchNonNilSubgroup test * prevent string reallocation * remove unnecessary if block + variable in replaceAllValues * clean up new tests * inline replacement_text in inja_transformer_test.cc * more test cleanup * update function signatures, remove replaced_value_ * support dynamic metadata as extractor input * update changelog location * add API changes to go with 3175ca9 * revert support for dynamic metadata as an extractor input 3175ca9 and e2668be * refactor calls to extract/replace * rename replace to extractDestructive, add breaks to switch statement * update data types to match updated function signatures in inja_transformer_test.cc * respond to review comments * update changelog location * update changelog location * separate destructive extractors and non-destructive extractors * fix match_not_null edge case * update inline documentation for new proto field * add test demonstrating use of format specifiers * update REPLACE_ALL mode to return input on no match * return input on no match in single replace case * update changelog * empty commit to kick changelog bot after switching target branch
Description
Overview
API Changes
Extraction
message in the transformation filter proto (see here):replacement_text
(google.protobuf.StringValue): If set, the portion of the input string that matches the regex/subgroup will be replaced with this text.-replace_all
(bool): If set to true, all matches against the regex will be replaced with the replacement text.- Note: When this field is set, the reqiurement that theregex
field must match the entire input string is relaxed. This means that the regex can match a substring of the input string, and the replacement will be applied to all matches.- Note: This field is mutually exclusive with the existingsubgroup
field. If both are set, the configuration will be rejected by envoy-gloo.mode
(enum): Used to explicitly specify the operation mode (EXTRACT
,SINGLE_REPLACE
,REPLACE_ALL
). This field replaces thereplace_all
boolean flag to provide clearer semantics and allow for future extensibility.Internal Changes
has_replacement_text_
(const bool)replacement_text
field is set in theExtraction
message.replacement_text
, so this field is used to differentiate between the case where the user has explicitly setreplacement_text
to an empty string and the case where the user has not setreplacement_text
at all.replacement_text_
(const std::string)replacement_text
field in theExtraction
message.replacement_text
field is not set, this field will be set to an empty string.-replace_all_
(const bool)- Set at construction time based on the value of thereplace_all
field in theExtraction
message.- If thereplace_all
field is not set, this field will be set to false.mode
(const enum)EXTRACT
,SINGLE_REPLACE
,REPLACE_ALL
) (defaults toEXTRACT
)extraction_func_
(function pointer)has_replacement_text_
field.- Ifhas_replacement_text_
is true, this field will be set toextractor::replaceValue
. Ifhas_replacement_text_
is false, this field will be set toextractor::extractValue
, the previous default behavior.extractValue
,replaceIndividualValue
,replaceAllValues
) based on the mode_replaced_value_
(mutable std::string)replaceValue
if a match is found.replaceValue
replace_all_
field, this method will either:replaceIndividualValue
, orreplaceAllValues
.replaceIndividualValue
andreplaceAllValues
, the following assumptions are made:replaced_value_
field in the Extractor will be set to the result of the replacement operation.replaceIndividualValue
replaceAllValues
Testing
Summary of new test cases
ExtractAndReplaceValueFromBodySubgroup
"not json body"
"not json BAZ"
ExtractAndReplaceValueFromFullBody
"not json body"
"BAZ"
ExtractAndReplaceAllFromFullBody
.*
regex."not json body"
"BAZBAZ"
(due to the edge case instd::regex_replace
with.*
)AttemptReplaceFromPartialMatch
replace_all
."not json body"
""
(no replacement due to partial match)AttemptReplaceFromPartialMatchNonNilSubgroup
"not json body"
""
ReplaceFromFullLiteralMatch
"not json body"
"BAZ"
AttemptToReplaceFromInvalidSubgroup
"not json body"
NestedSubgroups
"not json body"
"not BAZ body"
SubgroupUnset
"not json body"
"BAZ"
NoMatch
"not json body"
""
(no replacement due to no match)NilReplace
"not json body"
"not json "
NilReplaceWithSubgroupUnset
"not json body"
""
HeaderHappyPath
foo
with value"bar"
"BAZ"
ReplaceAllWithReplacementTextUnset
"bar bar bar"
ReplaceAllWithSubgroupSet
replace_all
with a subgroup set, expecting an exception due to configuration conflict."bar bar bar"
ReplaceAllHappyPath
"bar bar bar"
"BAZ BAZ BAZ"
IndividualReplaceIdentity
"bar bar bar"
"bar bar bar"
ReplaceAllIdentity
replace_all
enabled."bar bar bar"
"bar bar bar"
ReplaceAllNoMatch
replace_all
with a regex that does not match the input."not json body"
""
(no replacement due to no match)Context
Work Remaining (in subsequent PRs)