Korean Grammar Correction Model Based on an LLM
A project for Introduction to Text Processing (LIS3813)
This project is ongoing.
- Dataset: 국립국어원 맞춤법 교정 말뭉치 2022 (National Institute of Korean Language Spelling Correction Corpus, 2022)
- Backbone Model: KoBART (gogamza/kobart-base-v2)
  - Baseline Model Link (HuggingFace): theSOL1/kogrammar-base (the checkpoint used in the inference example below)
  - Distilled Model Link (HuggingFace)
  - Tiny Distilled Model Link (HuggingFace)
Requirements

torch, transformers
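A minimal install command, assuming pip. Note that the inference example below passes `device_map='auto'`, which additionally requires the `accelerate` package:

```bash
pip install torch transformers accelerate
```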
Inference

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, BartConfig, pipeline

checkpoint = 'theSOL1/kogrammar-base'

# Load the tokenizer, config, and fine-tuned KoBART checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
config = BartConfig.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, config=config, device_map='auto')

# Wrap the model in a text2text-generation pipeline
pipe = pipeline('text2text-generation', model=model, tokenizer=tokenizer)

sample_text = 'ㄴㅏ는 ㄱㅏ끔 눈물을흘린다'  # deliberately misspelled input (broken jamo, missing spacing)
corrected_text = pipe(sample_text)
print(corrected_text)
```
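Continuing from the snippet above, decoding parameters can be passed directly through the pipeline call; the text2text-generation pipeline forwards keyword arguments such as `num_beams` and `max_length` to `model.generate`. The values below are illustrative, not tuned settings from this project:

```python
# Beam search tends to give more stable corrections than greedy decoding.
# num_beams and max_length are illustrative values, not project settings.
result = pipe(sample_text, num_beams=4, max_length=64)

# The pipeline returns a list of dicts with a 'generated_text' field.
print(result[0]['generated_text'])
```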