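`ORIGINAL| 2021-02-03 16:09:00 天润乳业公告,拟在新疆生产建设兵团第十二师222团投资建设10000头规模化奶牛示范牧场项目,222团予以提供牧场运营所需配套资源,222团保证二十年内免费提供本项目使用的设施农业用地,免征土地租赁费,并保障项目生产经营所需水、电等基础配套设施。 |`

`CLEANED| 02-03 16:09:00 天润乳业公告,拟在新疆生产建设兵团第十二师222团投资建设头规模化奶牛示范牧场项目,222团予以提供牧场运营所需配套资源,222团保证二十年内免费提供本项目使用的设施农业用地,免征土地租赁费,并保障项目生产经营所需水、电等基础配套设施。 |`

(English gloss of the sample: a 2021-02-03 announcement by Tianrun Dairy (天润乳业) that it plans to invest in a 10000-head large-scale demonstration dairy farm at Regiment 222 of the 12th Division of the Xinjiang Production and Construction Corps. Regiment 222 will provide the supporting resources the farm needs, guarantees free use of the facility agricultural land for twenty years with land rent waived, and ensures basic infrastructure such as water and electricity.)

As shown above, the year "2021" (together with the hyphen after it) and the number "10000" were deleted from the cleaned output. Which config option causes this deletion?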
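A quick way to confirm exactly which characters were dropped is to diff the two samples. The sketch below uses Python's standard `difflib.SequenceMatcher` on the first part of each line; the excerpt and the helper name `deleted_spans` are illustrative and not part of the cleaning tool. It shows that the hyphen after the year disappeared along with "2021", which is a useful clue when looking for the pattern responsible.

```python
import difflib


def deleted_spans(original: str, cleaned: str) -> list[str]:
    """Return the substrings of `original` that do not survive in `cleaned`."""
    # autojunk=False so no characters are silently treated as "junk"
    matcher = difflib.SequenceMatcher(None, original, cleaned, autojunk=False)
    spans = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # 'delete' = text only in original; 'replace' = text that was changed
        if tag in ("delete", "replace"):
            spans.append(original[i1:i2])
    return spans


# Excerpts of the ORIGINAL and CLEANED lines from the issue
original = "2021-02-03 16:09:00 天润乳业公告,拟在新疆生产建设兵团第十二师222团投资建设10000头规模化奶牛示范牧场项目"
cleaned = "02-03 16:09:00 天润乳业公告,拟在新疆生产建设兵团第十二师222团投资建设头规模化奶牛示范牧场项目"

print(deleted_spans(original, cleaned))  # ['2021-', '10000']
```

Note that shorter digit runs such as "02", "16", and "222" survive, so whatever removes the text seems to target longer runs of digits.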
My config file is:

    basic:
      batch_size: 3000
      input: Astock_all_converted.jsonl
      is_jsonl: true
      num_workers: 32
      output: Astock_all.jsonl
      result_key: target
      source_key: target
    extractors:
      ContentExtractor:
        save_key: pageContent
      TimeExtractor:
        save_key: pagePublishTime
      TitleExtractor:
        save_key: pageTitle
    filters:
      SimplifiedFilter:
        config_file: t2s.json
      SymbolFilter:
        filter_control: true
        filter_emoji: true
      TextCleaner:
        filter_extraspace: true
        filter_personal: true
        filter_url: true
      TextIntegrityChecker:
        do_end_clip: true
        double_mark_check: true
        end_mark_check: true
        length_check: true
        min_length: 16
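The issue does not show how each option is implemented, so the config alone cannot identify the culprit. A practical approach is to disable the filters one at a time (for example, set `filter_personal: false` under `TextCleaner`) and rerun on this sample. The sketch below mimics that ablation in plain Python. The three cleaning functions are hypothetical stand-ins written for illustration, not the tool's real code. In particular, the digit-run pattern assumed for `filter_personal` is only a guess that happens to reproduce the observed output exactly.

```python
import re

# Hypothetical stand-ins for the TextCleaner options in the config.
# They are NOT the tool's real implementations; each one only models
# what an option with that name might plausibly do.
CANDIDATES = {
    # assumed: strip http(s) URLs
    "filter_url": lambda s: re.sub(r"https?://\S+", "", s),
    # assumed: collapse runs of two or more whitespace characters
    "filter_extraspace": lambda s: re.sub(r"\s{2,}", " ", s),
    # assumed: strip runs of 4+ digits (plus one trailing hyphen),
    # as a crude phone/ID-number remover might do
    "filter_personal": lambda s: re.sub(r"\d{4,}-?", "", s),
}


def find_culprits(original: str, cleaned: str) -> list[str]:
    """Apply each candidate filter alone; return those that reproduce `cleaned`."""
    return [name for name, clean in CANDIDATES.items() if clean(original) == cleaned]


# Excerpts of the ORIGINAL and CLEANED lines from the issue
original = "2021-02-03 16:09:00 天润乳业公告,拟在新疆生产建设兵团第十二师222团投资建设10000头规模化奶牛示范牧场项目"
cleaned = "02-03 16:09:00 天润乳业公告,拟在新疆生产建设兵团第十二师222团投资建设头规模化奶牛示范牧场项目"

print(find_culprits(original, cleaned))  # ['filter_personal']
```

If the real personal-information filter behaves like this guess, it will strip any run of four or more digits, including years and quantities. A narrower pattern, for example one anchored to the 11-digit mobile-number format, would avoid that at the cost of missing other kinds of identifiers.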