Data science work for RedditInsight.
- Segmented data by subreddit
- Used NLTK to tokenize post titles and tag each word with its part of speech
- Developed frequency analysis of nouns by subreddit
- Prepared the dataset for a predictive model: extracted the day of week and hour of day each post was created, and converted the subreddit and domain features into categorical variables.
- Evaluated the model's predictive value and decided to focus on data visualizations instead.
- Developed a clustering analysis of post data for subreddits with natural topic segmentation.
- Visualizations created from this work are in https://github.com/sheltowt/redditD3
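The part-of-speech tagging and per-subreddit noun counts could be combined roughly as below. This is a minimal sketch, not the original code: it assumes posts arrive as hypothetical `(subreddit, title)` pairs and that nouns are the Penn Treebank tags starting with `NN` (what `nltk.pos_tag` returns by default). The NLTK tagger is imported lazily so the counting logic can be exercised with a stand-in tagger; the real tagger needs NLTK's tokenizer and tagger data installed via `nltk.download`.

```python
from collections import Counter, defaultdict


def nltk_tagger(title):
    """Tokenize a title and tag each token (requires NLTK and its downloaded data)."""
    import nltk
    return nltk.pos_tag(nltk.word_tokenize(title))


def noun_frequencies(posts, tagger=nltk_tagger):
    """Count nouns per subreddit.

    posts: iterable of (subreddit, title) pairs (assumed input shape).
    Returns {subreddit: Counter({noun: count})}.
    """
    counts = defaultdict(Counter)
    for subreddit, title in posts:
        for word, tag in tagger(title):
            # NN, NNS, NNP and NNPS are the Penn Treebank noun tags
            if tag.startswith("NN"):
                counts[subreddit][word.lower()] += 1
    return counts


def top_nouns(counts, subreddit, n=10):
    """Return the n most frequent nouns for one subreddit."""
    return counts[subreddit].most_common(n)
```

Lower-casing merges "Django" and "django"; whether to do that, or to also drop stopwords, is a design choice the original notes do not specify.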
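The feature preparation for the predictive model (day of week, hour of day, categorical subreddit and domain) might look like the sketch below. It assumes each post is a dict with a Unix-timestamp field named `created_utc` (the field name Reddit's API uses), plus `subreddit` and `domain` strings; times are interpreted in UTC, and one-hot encoding is used as one common way to make the categorical variables. None of these choices are confirmed by the notes above.

```python
from datetime import datetime, timezone


def extract_time_features(created_utc):
    """Day of week (Monday=0) and hour of day, both in UTC."""
    dt = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    return {"day_of_week": dt.weekday(), "hour_of_day": dt.hour}


def build_feature_rows(posts):
    """Build one flat feature dict per post, one-hot encoding subreddit and domain."""
    subreddits = sorted({p["subreddit"] for p in posts})
    domains = sorted({p["domain"] for p in posts})
    rows = []
    for p in posts:
        row = extract_time_features(p["created_utc"])
        # one indicator column per distinct subreddit and per distinct domain
        for s in subreddits:
            row[f"subreddit={s}"] = int(p["subreddit"] == s)
        for d in domains:
            row[f"domain={d}"] = int(p["domain"] == d)
        rows.append(row)
    return rows
```

With pandas, `pd.to_datetime(..., unit="s")` followed by `pd.get_dummies` would express the same steps more compactly.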
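The notes do not say which clustering method or features were used, so the following is only an illustrative stand-in: k-means over L2-normalized bag-of-words vectors of post titles, with deterministic farthest-first initialization. In practice a library such as scikit-learn (`TfidfVectorizer` with `KMeans`) would usually be used instead; this stdlib version just makes the steps explicit.

```python
import math
from collections import Counter


def vectorize(titles):
    """Turn each title into an L2-normalized bag-of-words vector."""
    vocab = sorted({w for t in titles for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in titles:
        v = [0.0] * len(vocab)
        for word, n in Counter(t.lower().split()).items():
            v[index[word]] = float(n)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(vectors, k):
    """Start from the first vector, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(vectors, k, max_iter=50):
    """Return a cluster label for each vector (assumes k <= len(vectors))."""
    centroids = farthest_first_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        # assignment step: nearest centroid by squared Euclidean distance
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # converged: no label changed
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Normalizing each vector makes squared Euclidean distance a monotone function of cosine similarity, which keeps long and short titles comparable.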