What is the difference between preprocessing tokenizer and input_features tokenizer? #2230
Hi, I'm trying to understand the difference between the following two configurations:

"input_features": [
    {
        "name": "both_text",
        "type": "text",
        "preprocessing": {
            "tokenizer": "space",
        },
        "encoder": "rnn",
        "cell_type": "lstm",
        "num_layers": 8,
        "reduce_output": None,
    }
],
"preprocessing": {
    "split_probabilities": [0.8, 0.1, 0.1],
},

and

"input_features": [
    {
        "name": "both_text",
        "type": "text",
        "encoder": "rnn",
        "cell_type": "lstm",
        "num_layers": 8,
        "reduce_output": None,
    }
],
"preprocessing": {
    "split_probabilities": [0.8, 0.1, 0.1],
    "text": {
        "tokenizer": "space",
    },
},

They both appear to be tokenizing the input text on spaces. Can anyone please tell me what the difference is between setting the tokenizer in the feature's own preprocessing section versus in the global preprocessing's "text" section? Also, if I set the tokenizer in both places, which one is used?
Thanks
Hi @farazk86!
A couple of questions for you regarding the error you are running into:
The two configurations you provided should be the same regarding preprocessing. Feature-specific preprocessing configuration parameters can override global preprocessing configuration parameters. This is documented here.
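To make the precedence concrete, here is a minimal sketch (not from the original thread) that sets a tokenizer in both places. Following the explanation above, the feature-level setting would apply to "both_text", while the global "text" section would only apply to text features that don't define their own tokenizer; the "characters" tokenizer below is just an illustrative second value.

# Hypothetical config sketch: feature-level preprocessing overrides the global "text" defaults.
config = {
    "input_features": [
        {
            "name": "both_text",
            "type": "text",
            "preprocessing": {
                "tokenizer": "space",  # used for "both_text" (overrides the global setting)
            },
            "encoder": "rnn",
            "cell_type": "lstm",
            "num_layers": 8,
            "reduce_output": None,
        }
    ],
    "preprocessing": {
        "split_probabilities": [0.8, 0.1, 0.1],
        "text": {
            "tokenizer": "characters",  # default for any other text feature without its own tokenizer
        },
    },
}

In your two original configurations only one tokenizer is ever specified, so the resulting preprocessing is identical either way.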