An NLP program that identifies whether a news article is real or fake, written in Python in a Jupyter Notebook.
- id: unique id for a news article
- title: the title of a news article
- author: author of the news article
- text: the text of the article; could be incomplete
- label: a label that marks the article as potentially unreliable
  - 1: unreliable
  - 0: reliable
Our dataset is the ‘train.csv’ file. It contains 2007 records, split almost evenly between 1s and 0s. We check whether there are any null values and, if there are, fill them with empty strings.
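The null check and fill described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the three rows are made-up stand-ins for ‘train.csv’ so the snippet runs on its own, and only the column names come from the dataset description.

```python
import pandas as pd

# Hypothetical stand-in rows; in the project this would be
# df = pd.read_csv("train.csv")
df = pd.DataFrame({
    "id": [0, 1, 2],
    "title": ["Headline A", None, "Headline C"],
    "author": ["Jane Doe", "John Roe", None],
    "text": ["Some article text.", None, "More article text."],
    "label": [1, 0, 1],
})

# Count the missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Replace every missing value with an empty string.
df = df.fillna("")
```

After the fill, `df.isnull().sum()` reports zero for every column, so later text processing never meets a missing value.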
- Pandas
- Stopwords
- PorterStemmer
- Regular Expressions
- TF-IDF Vectorizer
- Decision Tree Classifier
- Pickle
- We load the dataset, then check the top five rows and the summary information about the dataset.
- We don’t want any NULL values, so we check for them. The dataset contains many, so we fill them with empty strings.
- After that, we remove the id, title and author columns because this project does not need them.
- We import the libraries we need and create an object of the PorterStemmer class.
- We test it with a sample sentence.
- We create a function named stemming that uses regular expressions to remove unwanted characters, converts upper-case letters to lower case, removes stopwords and removes blank spaces.
- We split the data into training and testing sets in an 80:20 ratio: 80% for training and 20% for testing.
- We use a decision tree classifier as the algorithm in this project.
- It gives an accuracy of almost 83%.
- We import the pickle library to store the data.
- We create the fake news function, which detects whether a news article is fake or not.
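The steps above can be sketched end to end as follows. This is a minimal sketch under several assumptions, not the notebook's actual code: the ten rows are invented toy data, the small `STOPWORDS` set replaces NLTK's English stopword list so nothing has to be downloaded, the `[^a-zA-Z]` pattern and the stemming inside `stemming` are guesses at what the function does, and `stratify`, `random_state` and the choice to pickle the vectorizer together with the model are illustrative choices.

```python
import pickle
import re

import pandas as pd
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: a tiny hand-picked stopword set; the project uses NLTK's list.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "on"}

port_stem = PorterStemmer()


def stemming(content):
    """Keep letters only, lowercase, drop stopwords, stem each word."""
    content = re.sub("[^a-zA-Z]", " ", content)  # assumed pattern
    words = content.lower().split()              # split() also drops blanks
    words = [port_stem.stem(w) for w in words if w not in STOPWORDS]
    return " ".join(words)


# Hypothetical toy rows standing in for the cleaned dataset.
df = pd.DataFrame({
    "text": [
        "Shocking secret cure doctors hide from you",
        "City council approves new budget for schools",
        "Aliens secretly control the world banks",
        "Local team wins the regional championship",
        "Miracle pill melts fat overnight",
        "Scientists publish study on ocean temperatures",
        "Celebrity clone spotted in secret lab",
        "Government releases annual employment report",
        "You will not believe this one weird trick",
        "University opens new engineering building",
    ],
    "label": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})
df["text"] = df["text"].apply(stemming)

# 80% training, 20% testing.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=2
)

# Turn the cleaned text into TF-IDF features (fitted on the training split).
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print("accuracy on the toy test split:", accuracy)

# Serialize the fitted objects; pickle.dump(obj, file) would write to disk.
saved = pickle.dumps((vectorizer, model))
restored_vectorizer, restored_model = pickle.loads(saved)
```

The accuracy printed here says nothing about the real dataset, since the toy test split holds only two rows. The point is the order of the steps: clean, split, vectorize, fit, evaluate, save.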
To check a live news article, we first have to put it into the dataset.
If we feed the live news directly to the program instead, it won’t work.
The raw article has not gone through stopword removal, conversion of upper-case letters to lower case, or the Porter stemmer before it reaches the decision tree classifier.
So the program will not give an accurate result.
Once the news is in the dataset, it goes through the whole process and the model can classify it properly.
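The requirement above is that a new article passes through the same steps as the training data. One way to guarantee that, shown here only as a sketch and not as the project's implementation, is to bundle the cleaning function, the TF-IDF vectorizer and the classifier in a scikit-learn pipeline. The names `clean_text`, `fake_news`, the toy training rows and the small stopword set are all hypothetical.

```python
import re

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Assumption: a tiny stopword set instead of NLTK's downloadable list.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "on"}
stemmer = PorterStemmer()


def clean_text(text):
    """Apply the same cleaning to training text and to new text."""
    words = re.sub("[^a-zA-Z]", " ", text).lower().split()
    return " ".join(stemmer.stem(w) for w in words if w not in STOPWORDS)


# Hypothetical toy training data: 1 = unreliable, 0 = reliable.
train_texts = [
    "Shocking secret cure doctors hide from you",
    "City council approves new budget for schools",
    "Aliens secretly control the world banks",
    "Local team wins the regional championship",
    "Miracle pill melts fat overnight",
    "Scientists publish study on ocean temperatures",
]
train_labels = [1, 0, 1, 0, 1, 0]

# The vectorizer calls clean_text on every document it sees, so text passed
# to predict() is cleaned exactly like the training text.
pipeline = make_pipeline(
    TfidfVectorizer(preprocessor=clean_text),
    DecisionTreeClassifier(random_state=0),
)
pipeline.fit(train_texts, train_labels)


def fake_news(text):
    """Return 'unreliable' for label 1 and 'reliable' for label 0."""
    label = pipeline.predict([text])[0]
    return "unreliable" if label == 1 else "reliable"


print(fake_news("BREAKING!!! Secret miracle cure doctors hide"))
```

With this design the raw article is passed straight to `fake_news`, and the pipeline applies cleaning and vectorization before the decision tree sees it.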