An overview of text mining (notes)
Trump tweets: which words are most likely to come from Android, and which from iPhone?
The example we replicated in class, worth a read: Text analysis of Trump's tweets confirms he writes only the (angrier) Android half
NRC Word-Emotion Association Lexicon -- the sentiment lexicon we used in class (via the syuzhet R package)
Stanford's CoreNLP library -- a widely used programming library for natural language processing
This assignment is not required. You may turn it in by email (galkamaxd at gmail) or in person in class.
Due: 22-Mar
Choose 3 major brands that have an active presence on Twitter (e.g. Coca-Cola, McDonald's, Apple) and compare the sentiment of tweets that mention them. The sample size for each brand should be at least 1,000 tweets.
To collect the tweets, you may use either the Twitter streaming API or the REST API, searching for the brand's Twitter handle as a keyword (e.g. "searchTwitter('@CocaCola', n=4000)" from the twitteR package). Exclude retweets from the analysis (any tweet where isRetweet is TRUE).
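The collection and filtering step might look roughly like the sketch below. The helper names fetch_mentions and drop_retweets are hypothetical, and the sketch assumes the twitteR package is installed and that setup_twitter_oauth() has already been called with your own credentials. The toy data frame is made up so the filter can be tried without API access.

```r
# Hypothetical helper: fetch tweets mentioning a handle as a data frame.
# Assumes twitteR is installed and setup_twitter_oauth() was already called.
fetch_mentions <- function(handle, n = 4000) {
  tweets <- twitteR::searchTwitter(handle, n = n)
  twitteR::twListToDF(tweets)
}

# Keep only original tweets: drop every row where isRetweet is TRUE.
drop_retweets <- function(tweetDf) {
  tweetDf[!tweetDf$isRetweet, , drop = FALSE]
}

# Made-up rows standing in for real API output.
toyDf <- data.frame(
  text = c("Love my new phone", "RT @Apple: big news", "Worst fries ever"),
  isRetweet = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

originals <- drop_retweets(toyDf)
nrow(originals)  # 2 rows remain
```

With real data, the call would be something like drop_retweets(fetch_mentions('@CocaCola')). Requesting more tweets than the 1,000 minimum leaves headroom for the rows that are removed as retweets.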
To classify the sentiment, use the syuzhet R package as we did in class. In that example, we classified the sentiment of individual words; here, the items you are classifying are the full text of each tweet. The command for running the classification should look something like this: get_nrc_sentiment(tweetDf$text). It returns one row per tweet, with counts for eight emotions plus negative and positive.
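One possible way to turn the per-tweet scores into a brand comparison is sketched below. The helper names score_tweets and mean_net_sentiment are hypothetical, and using the mean of positive minus negative as the summary is an assumption, not the method from class. The toy scores are made up and only mimic the positive and negative columns that get_nrc_sentiment returns.

```r
# Score the full text of each tweet; returns one row per tweet.
# Assumes the syuzhet package is installed.
score_tweets <- function(tweetDf) {
  syuzhet::get_nrc_sentiment(tweetDf$text)
}

# Hypothetical summary: average of (positive - negative) across tweets.
mean_net_sentiment <- function(scores) {
  mean(scores$positive - scores$negative)
}

# Made-up scores shaped like two columns of the get_nrc_sentiment output.
toyScores <- list(
  BrandA = data.frame(positive = c(2, 0, 1), negative = c(0, 1, 0)),
  BrandB = data.frame(positive = c(0, 1, 0), negative = c(1, 1, 2))
)

# One summary number per brand, ready to compare or plot.
netByBrand <- sapply(toyScores, mean_net_sentiment)
print(netByBrand)
```

With real data, each list element would be score_tweets(tweetDf) for one brand's retweet-free data frame. The eight emotion columns can be compared the same way, for example with colMeans.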
Turn in the final results and all code used.