Why are the instructions always in English? #771
-
Hi @thesofakillers 👋 It is true that almost all tasks with non-English inputs/outputs have English instructions. I should also add that, when it comes to building multilingual models, less diverse inputs are often less of a concern: multilingual LMs can often parse inputs in various languages (e.g., a multilingual LM trained on SQuAD can reasonably generalize to similar questions in other languages). The main bottleneck is being able to generate text in various scripts, which depends on the outputs being in diverse languages (which we support). FYI @yizhongw
-
I was quite surprised to find out that in tasks where neither the input nor the output is in English, the instructions remain in English.
Is there a specific design reason for this other than accessibility?
For most of the best-performing models in Table 4 of the Tk-instruct paper, the instructions are part of the training input, so the input is always at least partly in English.
This makes the dataset dependent on the model's English capabilities, which makes it hard to evaluate performance on a given non-English language in isolation.
I think it would be valuable, perhaps for v3, to also provide the instructions in each task's input and output languages, in addition to English. This would make the dataset more flexible and give experimenters more freedom.