Empirical Laws of Natural Language Processing for Neural Language Generated Text

Many sequence models generate convincingly human-like text, but relatively little research has examined how closely their output matches human-written text statistically.

In this work, text is generated using Long Short Term Memory networks (LSTMs) and Generative Pretrained Transformer-2 (GPT-2). The text produced by both models follows Zipf's law and Heaps' law, two statistical regularities observed in natural-language text. One of the main findings concerns the influence of the temperature parameter on the generated text: LSTM-generated text becomes more natural as the temperature increases. A comparison between GPT-2 and LSTM output also shows that text generated by GPT-2 is closer to natural text than that generated by LSTMs.
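To make the role of temperature concrete, here is a minimal sketch of temperature-scaled sampling from a model's next-token distribution. The function name and the plain-NumPy setup are illustrative assumptions, not code from this repository's notebooks:

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0):
    """Sample a token index from logits scaled by temperature.

    Low temperature sharpens the distribution (more repetitive,
    high-frequency words dominate); high temperature flattens it
    (more diverse output, word statistics closer to natural text).
    """
    logits = np.asarray(logits, dtype=np.float64) / temperature
    # Numerically stable softmax over the scaled logits.
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # Draw the next-token index from the adjusted distribution.
    return np.random.choice(len(probs), p=probs)
```

At temperature → 0 this approaches greedy (argmax) decoding, while temperature 1.0 samples from the model's raw distribution; the finding above is that higher temperatures move LSTM output closer to natural word statistics.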

Sherlock.txt: Dataset

LSTM_Text_Generator_colab.ipynb: Explored and cleaned the dataset, tuned hyper-parameters, and trained the LSTM model

LSTM_Text_Exploration.ipynb: Generated text using the model trained in the previous notebook and verified Zipf's and Heaps' laws on the generated text (a sketch of such a check appears below)
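The notebooks' exact checks are not reproduced here; the following is a minimal sketch of how Zipf's and Heaps' laws can be measured on a token stream, assuming simple whitespace tokenization (the function names and tokenization are illustrative, not the notebooks' code):

```python
from collections import Counter
import numpy as np

def zipf_slope(tokens):
    # Rank words by frequency; under Zipf's law, log(frequency)
    # falls roughly linearly with log(rank), with slope near -1.
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return slope

def heaps_exponent(tokens):
    # Track vocabulary size V(n) as text length n grows; under
    # Heaps' law V(n) ~ K * n**beta with 0 < beta < 1.
    seen, growth = set(), []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    n = np.arange(1, len(growth) + 1)
    beta, _ = np.polyfit(np.log(n), np.log(np.array(growth, dtype=float)), 1)
    return beta

tokens = open("Sherlock.txt", encoding="utf-8").read().lower().split()
print("Zipf slope:", zipf_slope(tokens))        # close to -1 for natural text
print("Heaps exponent:", heaps_exponent(tokens))  # typically ~0.4-0.6 for English
```

Running the same two measurements on model-generated text and comparing against the human-written corpus is the essence of the evaluation described above.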

If you use this work, please cite it as:

Sumedha, Rohilla, R. (2021). Empirical Laws of Natural Language Processing for Neural Language Generated Text. In: Bhattacharya, M., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2021. Communications in Computer and Information Science, vol 1417. Springer, Cham. https://doi.org/10.1007/978-3-030-88378-2_15
