SANA-Project

The SANA project's goal is to create an Islamic-specific database for research purposes. My contribution to this goal is a model that predicts a record's category from its Abstract and Title.

To accomplish this, I have so far split the work into two variants, one that keeps punctuation and one that removes it, to see how much noise each approach introduces overall.
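A minimal sketch of how the two text variants could be produced is shown here. It is not the project's actual code: the function names `strip_punctuation` and `make_variants` are hypothetical, and it assumes the Title and Abstract are simply joined with a space and that "punctuation" means the ASCII set in Python's `string.punctuation`.

```python
import string


def strip_punctuation(text: str) -> str:
    """Remove every ASCII punctuation character from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))


def make_variants(title: str, abstract: str) -> dict:
    """Build both inputs for the comparison (hypothetical helper).

    Assumption: Title and Abstract are concatenated with a single space.
    """
    combined = f"{title} {abstract}"
    return {
        "with_punctuation": combined,
        "without_punctuation": strip_punctuation(combined),
    }


# Made-up example record, for illustration only.
variants = make_variants("Islamic Finance: An Overview", "Banking, law, and ethics.")
print(variants["with_punctuation"])
print(variants["without_punctuation"])
```

Training the same classifier on each variant and comparing the scores would then show how much the punctuation contributes to noise.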

Next Steps:

  1. Create a dictionary, or import a list, of Arabic names so that spelling variants can be grouped. Ex: Mohammed and Mohamad --> Mohammad
  2. Compare removing only some punctuation marks against removing all of them
  3. Write the machine learning classifiers as a pipeline
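The first two steps could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `NAME_VARIANTS` mapping contains only the example from the list above, the helper names are hypothetical, and keeping hyphens and apostrophes is just one possible choice of which punctuation to preserve.

```python
import re
import string

# Hypothetical mapping from lowercase spelling variants to one canonical form.
# Only the example from the list above is included; a real list would be larger.
NAME_VARIANTS = {
    "mohammed": "Mohammad",
    "mohamad": "Mohammad",
}


def normalize_names(text: str) -> str:
    """Replace known name variants with their canonical spelling."""
    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Unknown words are returned unchanged.
        return NAME_VARIANTS.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", replace, text)


def remove_punctuation_except(text: str, keep: str = "-'") -> str:
    """Remove ASCII punctuation except the characters listed in `keep`.

    Assumption: hyphens and apostrophes are kept because they often occur
    inside transliterated names.
    """
    drop = "".join(ch for ch in string.punctuation if ch not in keep)
    return text.translate(str.maketrans("", "", drop))


print(normalize_names("Mohammed and Mohamad wrote it."))
print(remove_punctuation_except("Al-Ghazali's work, revisited!"))
```

Running the name normalization before punctuation removal keeps the lookup simple, since the regular expression only matches runs of letters either way.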