Quora-Insincere-Questions-Classification

Quora Insincere Questions Classification project, carried out under Dr. Sri Phani Krishna Karri at NIT AP.

DATA SOURCE - https://www.kaggle.com/c/quora-insincere-questions-classification/data

RESEARCH PAPERS USED - http://cs229.stanford.edu/proj2019aut/data/assignment_308832_raw/26647500.pdf
https://arxiv.org/pdf/1810.04805.pdf

LIBRARIES - pandas, numpy, matplotlib, seaborn, NLTK (Natural Language Toolkit), re (regular expressions), textblob, wordcloud, PIL, tensorflow, Keras, scikit-learn, collections (this list will be updated as the project grows).

PREPROCESSING TECHNIQUES -
1. Lowercasing
2. Removing HTML
3. Removing email IDs
4. Removing URLs
5. Removing unnecessary whitespace
6. Removing stopwords
7. Lemmatization
8. Stripping possessives
9. Removing special characters
10. Expanding contractions
11. Stemming (Snowball)
12. Removing punctuation
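
The text-only steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function name `clean_question`, the tiny `CONTRACTIONS` and `STOPWORDS` tables, and the order of the steps are all assumptions. Lemmatization and Snowball stemming are left out because they need NLTK.

```python
import html
import re

# Hypothetical, deliberately tiny lookup tables for illustration only;
# a real run would use fuller lists (for example NLTK's stopword corpus).
CONTRACTIONS = {"can't": "cannot", "won't": "will not",
                "don't": "do not", "it's": "it is"}
STOPWORDS = {"a", "an", "the", "is", "it", "to", "of", "and", "at", "in", "on"}


def clean_question(text: str) -> str:
    """Apply the regex-based cleaning steps to one question string."""
    text = html.unescape(text).lower()                  # lowercasing
    text = re.sub(r"<[^>]+>", " ", text)                # remove HTML tags
    text = re.sub(r"\S+@\S+\.\S+", " ", text)           # remove email IDs
    text = re.sub(r"(https?://|www\.)\S+", " ", text)   # remove URLs
    text = text.replace("\u2019", "'")                  # normalise curly apostrophes
    for short, full in CONTRACTIONS.items():            # expand contractions
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    text = re.sub(r"'s\b", "", text)                    # strip possessives
    text = re.sub(r"[^a-z0-9\s]", " ", text)            # special characters, punctuation
    tokens = [t for t in text.split() if t not in STOPWORDS]  # remove stopwords
    return " ".join(tokens)                             # also collapses whitespace


sample = "<b>Why</b> can't I email Bob@Example.com about Quora's rules at https://quora.com/help ?"
cleaned = clean_question(sample)
print(cleaned)
```

Contractions are expanded before possessives are stripped so that "it's" becomes "it is" rather than "it"; whether the project used the same order is not stated.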

MODELS - BERT, Naive Bayes, CNN, and Logistic Regression.
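
As a rough illustration of the simplest model in this list, here is a from-scratch multinomial Naive Bayes classifier with add-one smoothing, built on `collections`. It is a sketch under stated assumptions, not the project's implementation: the class name `NaiveBayesText`, the toy sentences, and the label convention (1 = insincere, 0 = sincere) are made up for the example. A scikit-learn pipeline would be the more usual choice on the real data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace tokens, add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in doc.split():
                if word in self.vocab:             # skip words unseen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-written examples (hypothetical; not taken from the Kaggle data).
train_docs = [
    "how do i learn python quickly",
    "what is the best way to cook rice",
    "why are those people so stupid",
    "why do idiots always vote wrong",
]
train_labels = [0, 0, 1, 1]  # assumed convention: 1 = insincere, 0 = sincere

model = NaiveBayesText().fit(train_docs, train_labels)
print(model.predict("why are people stupid"))
print(model.predict("how to learn python"))
```

Scores are accumulated in log space so that long questions do not underflow to zero probability.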

Final Submission - https://drive.google.com/drive/folders/16wM0fso_SohQUxFHze-5qplPqm-o6xBB?usp=sharing