Add regex replace functionality to transformation filter extractors [Revised] #309

Merged on Mar 8, 2024 (23 commits; this view shows changes from 10 commits)
dc47659
Add regex replace functionality to transformation filter extractors (…
ben-taussig-solo Feb 1, 2024
7780a9c
update function signatures, remove replaced_value_
ben-taussig-solo Feb 8, 2024
3175ca9
support dynamic metadata as extractor input
ben-taussig-solo Feb 26, 2024
9ca3462
Merge branch 'main' into regex_extractor_revision
ben-taussig-solo Feb 26, 2024
0cf5b16
update changelog location
ben-taussig-solo Feb 26, 2024
e2668be
add API changes to go with 3175ca98f190b91ad9292a993d1a63b39e79affa
ben-taussig-solo Feb 26, 2024
22a0f84
Merge branch 'main' into regex_extractor_revision
ben-taussig-solo Feb 28, 2024
4caf6c7
revert support for dynamic metadata as an extractor input 3175ca98f19…
ben-taussig-solo Feb 28, 2024
3707234
refactor calls to extract/replace
ben-taussig-solo Feb 28, 2024
741ea24
rename replace to extractDestructive, add breaks to switch statement
ben-taussig-solo Feb 29, 2024
2a1cde6
update data types to match updated function signatures in inja_transf…
ben-taussig-solo Feb 29, 2024
5fcb545
respond to review comments
ben-taussig-solo Feb 29, 2024
3b19e65
Merge branch 'main' into regex_extractor_revision
ben-taussig-solo Feb 29, 2024
7201db5
update changelog location
ben-taussig-solo Feb 29, 2024
9cfb8a1
update changelog location
ben-taussig-solo Feb 29, 2024
9a543cf
separate destructive extractors and non-destructive extractors
ben-taussig-solo Mar 1, 2024
e170040
fix match_not_null edge case
ben-taussig-solo Mar 1, 2024
a4ed34d
update inline documentation for new proto field
ben-taussig-solo Mar 1, 2024
ef5b4d9
add test demonstrating use of format specifiers
ben-taussig-solo Mar 1, 2024
1e4f933
Merge branch 'main' into regex_extractor_revision
ben-taussig-solo Mar 4, 2024
6469b26
update REPLACE_ALL mode to return input on no match
ben-taussig-solo Mar 7, 2024
e6f2445
Merge branch 'regex_extractor_revision' of https://github.com/solo-io…
ben-taussig-solo Mar 7, 2024
bd9380c
return input on no match in single replace case
ben-taussig-solo Mar 8, 2024
@@ -153,6 +153,19 @@ message Transformation {
// Extractions can be used to extract information from the request/response.
// The extracted information can then be referenced in template fields.
message Extraction {
// The mode of operation for the extraction.
enum Mode {
// Default mode. Extract the value of the subgroup-th capturing group.
EXTRACT = 0;
// Replace the value of the subgroup-th capturing group with the replacement_text.
// Note: replacement_text must be set for this mode.
SINGLE_REPLACE = 1;
// Replace all matches of the regex in the source with the replacement_text.
// Note: replacement_text must be set for this mode.
// Note: subgroup is ignored for this mode. Configuration is rejected if subgroup is set to a non-zero value.
// Note: restrictions on the regex are different for this mode. See the regex field for more details.
REPLACE_ALL = 2;
}

// The source of the extraction
oneof source {
@@ -162,15 +175,31 @@ message Extraction {
google.protobuf.Empty body = 4;
}

// Only strings matching this regular expression will be part of the
// extraction. The most simple value for this field is '.*', which matches the
// whole source. The field is required. If extraction fails the result is an
// empty value.
// The regex field specifies the regular expression used for matching against the source content.
// - In EXTRACT mode, the entire source must match the regex. The subgroup-th capturing group,
// if specified, determines which part of the match is extracted.
// - In SINGLE_REPLACE mode, the regex also needs to match the entire source. The subgroup-th capturing group
// is targeted for replacement with the replacement_text.
// - In REPLACE_ALL mode, the regex is applied repeatedly to find all occurrences within the source that match.
// Each matching occurrence is replaced with the replacement_text, and the subgroup field is not used.
// This field is required, and if the regex does not match the source as per the selected mode, the result of
// the extraction will be an empty value.
string regex = 2;

// If your regex contains capturing groups, use this field to determine which
// group should be selected.
// For EXTRACT and SINGLE_REPLACE, refers to the portion of the text
// to extract/replace.
// Config will be rejected if this is specified in REPLACE_ALL mode.
uint32 subgroup = 3;

// The string to replace the matched portion of the source with.
// Used in SINGLE_REPLACE and REPLACE_ALL modes.
google.protobuf.StringValue replacement_text = 5;

// The mode of operation for the extraction.
// Defaults to EXTRACT.
Mode mode = 6;
}

// Defines a transformation template.
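
Reviewer note: the three modes documented in the proto comments above can be approximated outside Envoy with plain std::regex. The following is a minimal sketch, assuming the semantics shown in this diff (whole-source match for EXTRACT and SINGLE_REPLACE, empty string on no match). `Mode` and `applyExtraction` are hypothetical names for illustration and are not part of the filter; the sketch also uses std::regex_match for the whole-source requirement, which is an approximation of the check in the C++ changes.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for Extraction.Mode in the proto above.
enum class Mode { Extract, SingleReplace, ReplaceAll };

// Sketch of the per-mode behavior described in the proto comments.
// Returns "" when the regex does not match, as in the diff shown here.
std::string applyExtraction(Mode mode, const std::string &source,
                            const std::string &pattern, unsigned subgroup,
                            const std::string &replacement = "") {
  const std::regex re(pattern);
  if (mode == Mode::ReplaceAll) {
    // Replace every match anywhere in the source; subgroup is not used.
    if (!std::regex_search(source, re)) {
      return "";
    }
    return std::regex_replace(source, re, replacement);
  }
  // EXTRACT and SINGLE_REPLACE require the regex to match the whole source.
  std::smatch m;
  if (!std::regex_match(source, m, re) || subgroup >= m.size()) {
    return "";
  }
  if (mode == Mode::Extract) {
    return m[subgroup].str();
  }
  // SINGLE_REPLACE: splice the replacement over the selected group only.
  std::string out = source;
  out.replace(static_cast<std::size_t>(m.position(subgroup)),
              static_cast<std::size_t>(m.length(subgroup)), replacement);
  return out;
}
```

For example, with the pattern `/users/(\d+)` and subgroup 1, EXTRACT on `/users/123` yields the captured digits, while SINGLE_REPLACE with replacement text `me` rewrites only that group and keeps the rest of the path.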
7 changes: 7 additions & 0 deletions changelog/v1.27.3-patch2/extractor_regex_replace.yaml
@@ -0,0 +1,7 @@
changelog:
  - type: NEW_FEATURE
    resolvesIssue: false
    issueLink: https://github.com/solo-io/gloo/issues/8706
    description: >
      Update transformation filter extractors to support regex
      replace/replace all operations on extracted values.
161 changes: 150 additions & 11 deletions source/extensions/filters/http/transformation/inja_transformer.cc
@@ -56,7 +56,9 @@ getHeader(const Http::RequestOrResponseHeaderMap &header_map,
Extractor::Extractor(const envoy::api::v2::filter::http::Extraction &extractor)
: headername_(extractor.header()), body_(extractor.has_body()),
group_(extractor.subgroup()),
extract_regex_(Solo::Regex::Utility::parseStdRegex(extractor.regex())) {
extract_regex_(Solo::Regex::Utility::parseStdRegex(extractor.regex())),
replacement_text_(extractor.has_replacement_text() ? std::make_optional(extractor.replacement_text().value()) : std::nullopt),
mode_(extractor.mode()) {
// mark count == number of sub groups, and we need to add one for match number
// 0 so we test for < instead of <= see:
// http://www.cplusplus.com/reference/regex/basic_regex/mark_count/
@@ -65,6 +67,26 @@ Extractor::Extractor(const envoy::api::v2::filter::http::Extraction &extractor)
fmt::format("group {} requested for regex with only {} sub groups",
group_, extract_regex_.mark_count()));
}

switch (mode_) {
case ExtractionApi::EXTRACT:
break;
case ExtractionApi::SINGLE_REPLACE:
if (!replacement_text_.has_value()) {
throw EnvoyException("SINGLE_REPLACE mode set but no replacement text provided");
}
break;
case ExtractionApi::REPLACE_ALL:
if (!replacement_text_.has_value()) {
throw EnvoyException("REPLACE_ALL mode set but no replacement text provided");
}
if (group_ != 0) {
throw EnvoyException("REPLACE_ALL mode set but subgroup is not 0");
}
break;
default:
throw EnvoyException("Unknown mode");
}
}
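
Reviewer note: the config-time checks in the constructor above can be summarized as a standalone function. This is a sketch under stated assumptions: `ExtractorConfig` and `validationError` are hypothetical names, a plain struct stands in for the Extraction proto, and an error string is returned where the real constructor throws EnvoyException.

```cpp
#include <optional>
#include <regex>
#include <string>

enum class Mode { Extract, SingleReplace, ReplaceAll };

// Hypothetical plain-struct mirror of the Extraction proto fields.
struct ExtractorConfig {
  std::string regex;
  unsigned subgroup = 0;
  std::optional<std::string> replacement_text;
  Mode mode = Mode::Extract;
};

// Returns an error message for a rejected config, or std::nullopt if it is accepted.
std::optional<std::string> validationError(const ExtractorConfig &cfg) {
  const std::regex re(cfg.regex);
  // mark_count() excludes group 0 (the whole match), so valid indices are
  // 0..mark_count() inclusive.
  if (cfg.subgroup > re.mark_count()) {
    return std::string("group ") + std::to_string(cfg.subgroup) +
           " requested for regex with only " +
           std::to_string(re.mark_count()) + " sub groups";
  }
  switch (cfg.mode) {
  case Mode::Extract:
    break;
  case Mode::SingleReplace:
    if (!cfg.replacement_text) {
      return std::string("SINGLE_REPLACE mode set but no replacement text provided");
    }
    break;
  case Mode::ReplaceAll:
    if (!cfg.replacement_text) {
      return std::string("REPLACE_ALL mode set but no replacement text provided");
    }
    if (cfg.subgroup != 0) {
      return std::string("REPLACE_ALL mode set but subgroup is not 0");
    }
    break;
  }
  return std::nullopt;
}
```

Note that an empty replacement string is still a set value, so it passes the replacement-text checks; only an unset field is rejected.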

absl::string_view
@@ -80,7 +102,34 @@ Extractor::extract(Http::StreamFilterCallbacks &callbacks,
if (header_entries.empty()) {
return "";
}
return extractValue(callbacks, header_entries[0]->value().getStringView());
const auto &header_value = header_entries[0]->value().getStringView();
return extractValue(callbacks, header_value);
}
}

std::string
Extractor::extractDestructive(Http::StreamFilterCallbacks &callbacks,
const Http::RequestOrResponseHeaderMap &header_map,
GetBodyFunc &body) const {
if (body_) {
const std::string &string_body = body();
absl::string_view sv(string_body);
if (mode_ == ExtractionApi::SINGLE_REPLACE) {
return replaceIndividualValue(callbacks, sv);
} else {
return replaceAllValues(callbacks, sv);
}
} else {
const Http::HeaderMap::GetResult header_entries = getHeader(header_map, headername_);
if (header_entries.empty()) {
return "";
}
const auto &header_value = header_entries[0]->value().getStringView();
if (mode_ == ExtractionApi::SINGLE_REPLACE) {
return replaceIndividualValue(callbacks, header_value);
} else {
return replaceAllValues(callbacks, header_value);
}
}
}

@@ -105,6 +154,73 @@ Extractor::extractValue(Http::StreamFilterCallbacks &callbacks,
return "";
}

// Match a regex against the input value and replace the matched subgroup with the replacement_text_ value
std::string
Extractor::replaceIndividualValue(Http::StreamFilterCallbacks &callbacks,
absl::string_view value) const {
std::match_results<absl::string_view::const_iterator> regex_result;

// if there are no matches, return an empty string
if (!std::regex_search(value.begin(), value.end(), regex_result, extract_regex_)) {
ENVOY_STREAM_LOG(debug, "replaceValue: extractor regex did not match input", callbacks);
return "";
}

// if the subgroup specified is not a valid index into the match results, return an empty string
if (group_ >= regex_result.size()) {
// this should never happen as we test this in the ctor.
ASSERT(false, "no such group in the regex");
ENVOY_STREAM_LOG(debug, "replaceValue: invalid group specified for regex", callbacks);
return "";
}

// if the regex doesn't match the entire input value, return an empty string
if (regex_result[0].length() != long(value.length())) {
ENVOY_STREAM_LOG(debug, "replaceValue: Regex did not match entire input value. This is not allowed in SINGLE_REPLACE mode.", callbacks);
return "";
}

// Create a new string with the maximum possible length after replacement
auto max_possible_length = value.length() + replacement_text_.value().length();
std::string replaced;
replaced.reserve(max_possible_length);

auto subgroup_start = regex_result[group_].first;
auto subgroup_end = regex_result[group_].second;

// Copy the initial part of the string until the match
replaced.assign(value.begin(), subgroup_start);

// Append the replacement text
replaced += replacement_text_.value();

// Append the remaining part of the string after the match
replaced.append(subgroup_end, value.end());

return replaced;
}
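
Reviewer note: the whole-input check in replaceIndividualValue runs std::regex_search and then compares the length of the first match with the input length. That is not always equivalent to asking for a whole-input match with std::regex_match. The sketch below, using hypothetical helper names, contrasts the two; with an ordered alternation such as `a|ab` against the input `ab`, the search finds `a` first and the length check rejects it, while regex_match accepts the input through the second alternative.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Mirrors the check in the diff: take the first match found by regex_search
// and accept it only if it spans the whole input.
bool coversInputViaSearch(const std::string &input, const std::regex &re) {
  std::smatch m;
  return std::regex_search(input, m, re) &&
         m[0].length() == static_cast<std::ptrdiff_t>(input.size());
}

// Alternative: ask the regex engine for a whole-input match directly.
bool coversInputViaMatch(const std::string &input, const std::regex &re) {
  return std::regex_match(input, re);
}
```

Whether the stricter search-based behavior is intended is a question for the author; anchoring the pattern or reordering alternatives avoids the difference either way.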

// Match a regex against the input value and replace all instances of the regex with the replacement_text_ value
std::string
Extractor::replaceAllValues(Http::StreamFilterCallbacks &callbacks,
absl::string_view value) const {
std::string input(value.begin(), value.end());
std::string replaced;

// create an iterator to search for matches of extract_regex_ in the input string
std::sregex_iterator it(input.begin(), input.end(), extract_regex_);
std::sregex_iterator end_it; // default end iterator for comparison

// return an empty string if the regex doesn't match any part of the input value
if (it == end_it) {
ENVOY_STREAM_LOG(debug, "replaceAllValues: extractor regex did not match input", callbacks);
return "";
}

// If a match was found, replace all instances of the regex in the input value with the replacement_text_ value
return std::regex_replace(input, extract_regex_, replacement_text_.value());
}
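
Reviewer note: because replaceAllValues hands replacement_text_ directly to std::regex_replace, the replacement is interpreted with the default ECMAScript format rules, so `$1`, `$2` and `$&` refer to capture groups and the whole match. This is presumably what the "format specifiers" test commit exercises. By contrast, replaceIndividualValue appends the replacement text literally. A minimal sketch of the REPLACE_ALL behavior in this diff, with `replaceAll` as a hypothetical helper name:

```cpp
#include <regex>
#include <string>

// Returns "" when nothing matches, as replaceAllValues does in this diff.
// The replacement string may contain ECMAScript format specifiers.
std::string replaceAll(const std::string &input, const std::regex &re,
                       const std::string &replacement) {
  if (!std::regex_search(input, re)) {
    return "";
  }
  return std::regex_replace(input, re, replacement);
}
```

For instance, the pattern `(\w+)=(\d+)` with replacement `$2=$1` swaps each key and value in `foo=1&bar=2`. A replacement that needs a literal dollar sign has to escape it as `$$`.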

// A TransformerInstance is constructed by the InjaTransformer constructor at config time
// on the main thread. It access thread-local storage which is populated during the
// InjaTransformer::transform method call, which happens on the request path on any
@@ -545,27 +661,50 @@ void InjaTransformer::transform(Http::RequestOrResponseHeaderMap &header_map,
}
}
// get the extractions
std::unordered_map<std::string, absl::string_view> extractions;
std::unordered_map<std::string, std::string> extractions;
if (advanced_templates_) {
extractions.reserve(extractors_.size());
}

for (const auto &named_extractor : extractors_) {
const std::string &name = named_extractor.first;
if (advanced_templates_) {
extractions[name] =
named_extractor.second.extract(callbacks, header_map, get_body);
} else {
absl::string_view name_to_split = name;
json *current = &json_body;
const ExtractionApi::Mode mode = named_extractor.second.mode();

// prepare variables for non-advanced_templates_ scenario
absl::string_view name_to_split;
json* current = nullptr;
if (!advanced_templates_) {
name_to_split = name;
current = &json_body;
for (size_t pos = name_to_split.find("."); pos != std::string::npos;
pos = name_to_split.find(".")) {
auto &&field_name = name_to_split.substr(0, pos);
current = &(*current)[std::string(field_name)];
name_to_split = name_to_split.substr(pos + 1);
}
(*current)[std::string(name_to_split)] =
named_extractor.second.extract(callbacks, header_map, get_body);
}

switch(mode) {
case ExtractionApi::REPLACE_ALL:
case ExtractionApi::SINGLE_REPLACE: {
if (advanced_templates_) {
extractions[name] = named_extractor.second.extractDestructive(callbacks, header_map, get_body);
} else {
(*current)[std::string(name_to_split)] = named_extractor.second.extractDestructive(callbacks, header_map, get_body);
}
break;
}
case ExtractionApi::EXTRACT: {
if (advanced_templates_) {
extractions[name] = named_extractor.second.extract(callbacks, header_map, get_body);
} else {
(*current)[std::string(name_to_split)] = named_extractor.second.extract(callbacks, header_map, get_body);
}
break;
}
default: {
PANIC_DUE_TO_CORRUPT_ENUM
}
}
}
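
Reviewer note: in the non-advanced-templates path, the loop above treats an extractor name such as `a.b.c` as a path into the JSON body, descending one object per dot-separated segment and writing the result at the leaf. The sketch below reproduces that traversal without nlohmann::json; `Node` and `setByDottedName` are hypothetical stand-ins, not types from the filter.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical minimal tree standing in for nested JSON objects.
struct Node {
  std::string value;
  std::map<std::string, std::unique_ptr<Node>> children;
};

// Walk (and create) one level per '.'-separated segment, then store the
// value at the final segment.
void setByDottedName(Node &root, std::string_view name, std::string value) {
  Node *current = &root;
  for (std::size_t pos = name.find('.'); pos != std::string_view::npos;
       pos = name.find('.')) {
    auto &child = current->children[std::string(name.substr(0, pos))];
    if (!child) {
      child = std::make_unique<Node>();
    }
    current = child.get();
    name = name.substr(pos + 1);
  }
  auto &leaf = current->children[std::string(name)];
  if (!leaf) {
    leaf = std::make_unique<Node>();
  }
  leaf->value = std::move(value);
}
```

Hoisting this traversal above the switch, as the diff does, means both the destructive and non-destructive branches write to the same leaf without repeating the loop.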

14 changes: 12 additions & 2 deletions source/extensions/filters/http/transformation/inja_transformer.h
@@ -25,6 +25,7 @@ namespace HttpFilters {
namespace Transformation {

using GetBodyFunc = std::function<const std::string &()>;
using ExtractionApi = envoy::api::v2::filter::http::Extraction;

struct ThreadLocalTransformerContext : public ThreadLocal::ThreadLocalObject {
public:
@@ -33,7 +34,7 @@ struct ThreadLocalTransformerContext : public ThreadLocal::ThreadLocalObject {
const Http::RequestOrResponseHeaderMap *header_map_;
const Http::RequestHeaderMap *request_headers_;
const GetBodyFunc *body_;
const std::unordered_map<std::string, absl::string_view> *extractions_;
const std::unordered_map<std::string, std::string> *extractions_;
const nlohmann::json *context_;
const std::unordered_map<std::string, std::string> *environ_;
const envoy::config::core::v3::Metadata *cluster_metadata_;
@@ -82,15 +83,24 @@ class Extractor : Logger::Loggable<Logger::Id::filter> {
absl::string_view extract(Http::StreamFilterCallbacks &callbacks,
const Http::RequestOrResponseHeaderMap &header_map,
GetBodyFunc &body) const;

std::string extractDestructive(Http::StreamFilterCallbacks &callbacks,
const Http::RequestOrResponseHeaderMap &header_map,
GetBodyFunc &body) const;
ExtractionApi::Mode mode() const { return mode_; }
private:
absl::string_view extractValue(Http::StreamFilterCallbacks &callbacks,
absl::string_view value) const;
std::string replaceIndividualValue(Http::StreamFilterCallbacks &callbacks,
absl::string_view value) const;
std::string replaceAllValues(Http::StreamFilterCallbacks &callbacks,
absl::string_view value) const;

const Http::LowerCaseString headername_;
const bool body_;
const unsigned int group_;
const std::regex extract_regex_;
const std::optional<std::string> replacement_text_;
const ExtractionApi::Mode mode_;
};
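
Reviewer note: the switch from absl::string_view to std::string in the extractions map follows from the new return type. extract() returns a view into a header or body that outlives the map, but extractDestructive() builds a new string that has no other owner, so a map of views would dangle. A minimal sketch of that ownership difference, using std::string_view and hypothetical helper names:

```cpp
#include <string>
#include <string_view>
#include <unordered_map>

// EXTRACT-style result: a view into storage the caller already owns.
std::string_view firstToken(std::string_view source) {
  return source.substr(0, source.find(' '));
}

// Replace-style result: a freshly built string with no other owner.
std::string redactFirstToken(std::string_view source) {
  std::string out("***");
  const auto space = source.find(' ');
  if (space != std::string_view::npos) {
    out.append(source.substr(space));
  }
  return out;
}

// The map owns every value, so both kinds of result remain valid after
// the call returns.
std::unordered_map<std::string, std::string> collect(const std::string &header) {
  std::unordered_map<std::string, std::string> extractions;
  extractions["token"] = std::string(firstToken(header)); // copy of the view
  extractions["redacted"] = redactFirstToken(header);     // moved into the map
  return extractions;
}
```

The cost is one copy per EXTRACT result that was previously stored as a view; whether that matters on the request path is something the existing speed test target could help answer.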

class InjaTransformer : public Transformer {
15 changes: 15 additions & 0 deletions test/extensions/filters/http/transformation/BUILD
@@ -28,6 +28,21 @@ envoy_gloo_cc_test(
],
)

envoy_gloo_cc_test(
name = "inja_transformer_replace_test",
srcs = ["inja_transformer_replace_test.cc"],
repository = "@envoy",
deps = [
"//source/extensions/filters/http/transformation:inja_transformer_lib",
"@envoy//source/common/common:random_generator_lib",
"@envoy//source/common/common:base64_lib",
"@envoy//test/test_common:environment_lib",
"@envoy//test/mocks/http:http_mocks",
"@envoy//test/mocks/server:server_mocks",
"@envoy//test/mocks/upstream:upstream_mocks",
],
)

envoy_cc_test_binary(
name = "inja_transformer_speed_test",
srcs = ["inja_transformer_speed_test.cc"],