This study analysed Sinhala news reports published online, using text mining and machine learning techniques to extract important features. The extracted information is then presented in a form that makes it easier for readers to follow the news or to research reports published in the past.
As a contribution to future research on Sinhala NLP, most of the resources developed under this project, such as code snippets, datasets and other lexical resources, are made publicly available in this repository.
The study was presented at the Ruhuna International Science and Technology Conference (RISTCON 2018) on 15th February 2018.
Keywords: Sinhala language, Natural language processing, Sinhala NLP, Feature selection, Text classification, Text clustering