-
Hi lovely folks, We have bindings for the tokenizers library. At some point, we need to create an Therefore, I want to go from a In the current code, we use However, in case it does make a copy, how could we go from Term to Cow? I was hoping I could go to Binary and then to a slice, but my Rust skills are failing me. I also tried to use the new I would appreciate any help you could give, thank you :)
Replies: 2 comments 6 replies
-
This indeed allocates a new string, as the Rust `String` type manages its own memory.

If we decode the term to a `&str`, that does not allocate and instead points to the binary allocated by the BEAM. We can then use the `&str` to construct a `Cow`, which should not allocate.

The slightly tricky part is managing the lifetimes: we need to introduce an explicit lifetime annotation to carry the lifetime of the input term into the output type.

```rust
fn term_to_encode_input<'a, 'b>(term: &'a Term<'b>) -> Result<EncodeInput<'b>, ExTokenizersError> {
    // Decoding to `&'b str` borrows from the term's binary instead of copying it.
    if let Ok(seq) = term.decode::<&'b str>() {
        Ok(EncodeInput::Single(seq.into()))
    } else if let Ok((seq1, seq2)) = term.decode::<(&'b str, &'b str)>() {
        Ok(EncodeInput::Dual(seq1.into(), seq2.into()))
    } else {
        Err(ExTokenizersError::Other(String::from(
            "input must be either a string or a tuple",
        )))
    }
}
```

Here we end up with the
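To make the lifetime point concrete without pulling in rustler, here is a minimal standalone sketch using only the standard library. `FakeTerm`, `decode_str`, `to_cow`, `to_owned_cow`, and `is_borrowed` are hypothetical names invented for this illustration; `FakeTerm<'b>` only mimics the idea of a handle that borrows data owned elsewhere, and is not how rustler's `Term` is implemented.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a `Term<'b>`: a handle that borrows bytes
/// owned by someone else (the BEAM, in a real NIF).
pub struct FakeTerm<'b> {
    bytes: &'b [u8],
}

impl<'b> FakeTerm<'b> {
    pub fn new(bytes: &'b [u8]) -> Self {
        FakeTerm { bytes }
    }

    /// Borrowing decode: validates UTF-8 and returns a view into the
    /// original bytes. No copy is made.
    pub fn decode_str(&self) -> Option<&'b str> {
        std::str::from_utf8(self.bytes).ok()
    }
}

/// The result is tied to `'b` (the underlying data), not to `'a`
/// (the short-lived reference to the handle).
pub fn to_cow<'a, 'b>(term: &'a FakeTerm<'b>) -> Option<Cow<'b, str>> {
    term.decode_str().map(Cow::Borrowed)
}

/// Copying alternative for contrast: allocates a fresh `String`.
pub fn to_owned_cow(term: &FakeTerm<'_>) -> Option<Cow<'static, str>> {
    term.decode_str().map(|s| Cow::Owned(s.to_owned()))
}

/// Reports whether a `Cow` is the borrowed variant.
pub fn is_borrowed(c: &Cow<'_, str>) -> bool {
    matches!(c, Cow::Borrowed(_))
}
```

Because the return type of `to_cow` mentions only `'b`, the resulting `Cow` can outlive the `&FakeTerm` it was produced from, as long as the underlying bytes are still alive. The same reasoning is why the output type above is written with `'b` rather than `'a`.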
-
Just a tiny note: Decoding as What was the issue with binary's