From e683e55370363f0248e931509fc18b97d2aacca1 Mon Sep 17 00:00:00 2001
From: nabihanaqvie
Date: Wed, 5 Jan 2022 13:35:58 -0500
Subject: [PATCH] Update README.md

---
 README.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3be77ca..da1be0b 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,10 @@
-# SANA-Project
\ No newline at end of file
+# SANA-Project
+
+The SANA project's goal is to create an Islam-specific database for research purposes. My contribution to this goal is to create a model that predicts the category from the Abstract and Title.
+
+To accomplish this, I have currently split the work into two variants, one that keeps punctuation and one that removes it, to see how much difference in overall noise this creates.
+
+Next Steps:
+1) Create a dictionary or import a list of Arabic names for grouping. Ex: Mohammed and Mohamad --> Mohammad
+2) Compare removing some punctuation marks vs. others
+3) Write the machine learning classifiers as a pipeline
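
The preprocessing steps the README describes (removing punctuation, and grouping spelling variants of Arabic names) could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `NAME_VARIANTS`, `strip_punctuation`, and `normalize_names` are hypothetical names, and the two-entry variant map only mirrors the README's own example.

```python
import re
import string

# Hypothetical variant map; the real dictionary or imported name list is
# still a "next step" in the README, so this only covers its one example.
NAME_VARIANTS = {
    "mohammed": "Mohammad",
    "mohamad": "Mohammad",
}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text):
    """Return text with all ASCII punctuation characters removed."""
    return text.translate(_PUNCT_TABLE)


def normalize_names(text, variants=None):
    """Replace known name variants (whole words, case-insensitive) with one canonical spelling."""
    if variants is None:
        variants = NAME_VARIANTS

    def replace(match):
        word = match.group(0)
        # Words that are not in the map are left unchanged.
        return variants.get(word.lower(), word)

    # [^\W\d_]+ matches runs of letters, including non-ASCII letters.
    return re.sub(r"[^\W\d_]+", replace, text)
```

Running the classifier once on `normalize_names(text)` and once on `strip_punctuation(normalize_names(text))` would give the two variants (punctuation kept vs. removed) that the README wants to compare.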