Would you accept a convenience function for `replace_all` that builds the `AhoCorasick` automaton for you?
#152
-
Hi Andrew!
But for my interest, more importantly, it's hard to line up the … Both of these problems can be eliminated with this utility fn I made:

```rust
/// Splits an array of (pattern, replacement) pairs into two parallel arrays.
const fn aho_corasick_builder<'a, const N: usize>(
    tuples: [(&'a str, &'a str); N],
) -> ([&'a str; N], [&'a str; N]) {
    let mut lhs = [""; N];
    let mut rhs = [""; N];
    let mut i = 0;
    while i < N {
        lhs[i] = tuples[i].0;
        rhs[i] = tuples[i].1;
        i += 1;
    }
    (lhs, rhs)
}
```

So now I can write code that looks like this:

```rust
const REPLACER: ([&str; 19], [&str; 19]) = aho_corasick_builder([
    ("^", "\\^"),      // Boost: Used to indicate boosting in query relevance scoring.
    (":", "\\:"),      // Field: Used to separate field names and values.
    (" TO ", " \\TO "), // Range: Indicates a range query (e.g., date ranges).
    ("[", "\\["),      // Range: Used to denote inclusive range queries.
    ("]", "\\]"),      // Range: Used to denote inclusive range queries.
    ("{", "\\{"),      // Range: Used to denote exclusive range queries.
    ("}", "\\}"),      // Range: Used to denote exclusive range queries.
    ("<", "\\<"),      // Range: Indicates less-than range.
    (">", "\\>"),      // Range: Indicates greater-than range.
    ("=", "\\="),      // Range: Indicates equality.
    ("+", "\\+"),      // Sign: Indicates that the term must appear in the documents.
    ("-", "\\-"),      // Sign: Indicates that the term must not appear in the documents.
    ("&&", "\\&&"),    // Boolean: Logical AND operator.
    ("||", "\\||"),    // Boolean: Logical OR operator.
    ("!", "\\!"),      // Boolean: Logical NOT operator.
    ("~", "\\~"),      // Proximity: Indicates proximity searches.
    ("?", "\\?"),      // Wildcard: Represents a single character.
    ("\\", "\\\\"),    // Escape: Used to escape special characters.
    ("/", "\\/"),      // Regex: Used to begin or end a regular expression.
]);
```

Very readable, and hard to make a mistake. (I wish we could destructure consts into two lists, but we can't.) I was wondering whether it would interest you to incorporate this directly into aho-corasick. And, more generally, what are your thoughts on multiple lists vs. lists of tuples? Thanks!
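On the aside about destructuring: a `const` item can't be bound with a tuple pattern, but each half of the tuple can be projected into its own `const` by field access. A minimal sketch, reusing the helper from the post with a shortened, illustrative three-entry table (`PATTERNS` and `REPLACEMENTS` are names chosen here for illustration, not anything from the crate):

```rust
/// Same helper as in the post: splits an array of pairs into two parallel arrays.
const fn aho_corasick_builder<'a, const N: usize>(
    tuples: [(&'a str, &'a str); N],
) -> ([&'a str; N], [&'a str; N]) {
    let mut lhs = [""; N];
    let mut rhs = [""; N];
    let mut i = 0;
    while i < N {
        lhs[i] = tuples[i].0;
        rhs[i] = tuples[i].1;
        i += 1;
    }
    (lhs, rhs)
}

// Shortened, illustrative table of (pattern, replacement) pairs.
const REPLACER: ([&str; 3], [&str; 3]) = aho_corasick_builder([
    ("^", "\\^"),
    (":", "\\:"),
    ("\\", "\\\\"),
]);

// No tuple pattern for consts, but field access works in a const context,
// so each half gets its own named const.
const PATTERNS: [&str; 3] = REPLACER.0;
const REPLACEMENTS: [&str; 3] = REPLACER.1;
```

Both halves are still derived from the single list of pairs, so they cannot drift out of alignment.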
Replies: 1 comment 5 replies
-
Thanks for reaching out. I did find your framing here a little confusing because `AhoCorasick::replace_all` only accepts one slice, but you mention two slices. After I read through your full post, I think I understand: you're talking about the original sequence of patterns (used to build `AhoCorasick`) and then the sequence of replacements for each pattern. But these are two entirely separate parts of the API.

Combining them into one API seems like bad juju to me, because it means you'll have to pay for `AhoCorasick` construction every single time you call it. Building an `AhoCorasick` can be somewhat expensive, so I'm not sure there's really enough benefit here to be worth that cost.

Moreover, I think that if lining up the patterns and the replacements is proving tricky for you, and you don't mind paying the construction cost of …

Note also the existence of …
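To make the cost argument concrete, here is a std-only sketch of the "keep pairs together, build once, reuse" pattern. `Matcher` is a hypothetical stand-in for `AhoCorasick`: it uses a naive first-pattern-wins scan, not the Aho-Corasick algorithm, and does not try to reproduce the crate's match semantics. `PAIRS`, `escape`, and `BUILDS` are likewise illustrative names. The tuple list is split once with `Iterator::unzip`, and the result is cached in a `std::sync::OnceLock`, so construction happens only on the first call.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the stand-in matcher below has been constructed.
static BUILDS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for an expensive-to-build automaton such as
/// `AhoCorasick`. It only stores the patterns; it is NOT Aho-Corasick.
struct Matcher {
    patterns: Vec<&'static str>,
}

impl Matcher {
    fn build(patterns: &[&'static str]) -> Matcher {
        BUILDS.fetch_add(1, Ordering::SeqCst);
        Matcher { patterns: patterns.to_vec() }
    }

    /// Naive scan: at each position, the first pattern (in list order) that
    /// matches is replaced by the replacement at the same index.
    fn replace_all(&self, haystack: &str, replacements: &[&str]) -> String {
        // Exactly one replacement per pattern.
        assert_eq!(self.patterns.len(), replacements.len());
        let mut out = String::with_capacity(haystack.len());
        let mut i = 0;
        'scan: while i < haystack.len() {
            for (pat, rep) in self.patterns.iter().zip(replacements) {
                if !pat.is_empty() && haystack[i..].starts_with(*pat) {
                    out.push_str(rep);
                    i += pat.len();
                    continue 'scan;
                }
            }
            // No pattern matched here: copy one character through unchanged.
            let ch = haystack[i..].chars().next().unwrap();
            out.push(ch);
            i += ch.len_utf8();
        }
        out
    }
}

// Patterns and replacements kept side by side as pairs (illustrative subset).
const PAIRS: [(&str, &str); 3] = [(":", "\\:"), ("^", "\\^"), ("\\", "\\\\")];

/// Hypothetical helper: splits PAIRS and builds the matcher on first use only;
/// later calls reuse the cached matcher and replacement list.
fn escape(query: &str) -> String {
    static CACHE: OnceLock<(Matcher, Vec<&'static str>)> = OnceLock::new();
    let (matcher, replacements) = CACHE.get_or_init(|| {
        let (patterns, replacements): (Vec<&'static str>, Vec<&'static str>) =
            PAIRS.iter().copied().unzip();
        (Matcher::build(&patterns), replacements)
    });
    matcher.replace_all(query, replacements)
}
```

With the real crate, the closure passed to `get_or_init` would build the automaton instead (in aho-corasick 1.x, `AhoCorasick::new` returns a `Result`), and the cached replacements would be passed to `replace_all` on each call. The construction cost is paid once rather than per call.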