End-to-end data engineering project in which we extract information about LessWrong posts and visualize trends in AI Alignment posts.
Scrape the latest LessWrong posts with requests and Beautiful Soup to get the raw HTML.
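A minimal sketch of the download step, assuming a single listing page is fetched per run. The URL constant, the User-Agent string, and the `fetch_html` helper are hypothetical illustrations, not the project's actual code. `session` accepts anything with a requests-style `get`, so the function can be exercised without network access. By default it creates a real `requests.Session`.

```python
from typing import Optional

# Hypothetical listing URL; check which LessWrong page actually lists the latest posts.
ALL_POSTS_URL = "https://www.lesswrong.com/allPosts"


def fetch_html(url: str = ALL_POSTS_URL, session=None, timeout: float = 10.0) -> str:
    """Download a page and return its raw HTML text.

    `session` is any object with a requests-style `get(url, headers=..., timeout=...)`.
    If omitted, a real `requests.Session` is created.
    """
    if session is None:
        import requests  # imported lazily: only needed for real HTTP calls
        session = requests.Session()
    response = session.get(
        url,
        headers={"User-Agent": "lesswrong-trends-scraper/0.1"},  # hypothetical UA string
        timeout=timeout,  # avoid hanging forever on a slow server
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text
```

Passing the session in (rather than calling `requests.get` directly) also lets one session reuse its connection across several page requests.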
Extract information such as the title, author name, post tags, and karma from the HTML.
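The extraction step could look roughly like the sketch below. It uses Beautiful Soup's CSS-selector methods (`select`, `select_one`). All selectors are hypothetical placeholders that must be replaced after inspecting the real LessWrong markup. The `Post` dataclass and `parse_posts` function are also illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

from bs4 import BeautifulSoup

# Hypothetical CSS selectors: replace them with the ones found in LessWrong's actual HTML.
POST_SELECTOR = "div.post-item"
TITLE_SELECTOR = "a.post-title"
AUTHOR_SELECTOR = "a.post-author"
TAG_SELECTOR = "a.post-tag"
KARMA_SELECTOR = "span.post-karma"


@dataclass
class Post:
    title: str
    author: Optional[str]
    tags: List[str] = field(default_factory=list)
    karma: Optional[int] = None


def _text(node) -> Optional[str]:
    """Stripped text of a tag, or None if the tag was not found."""
    return node.get_text(strip=True) if node is not None else None


def _to_int(text: Optional[str]) -> Optional[int]:
    """Parse karma (which may be negative); return None if missing or non-numeric."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


def parse_posts(html: str) -> List[Post]:
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for item in soup.select(POST_SELECTOR):
        title = _text(item.select_one(TITLE_SELECTOR))
        if not title:
            continue  # skip items that do not look like a post entry
        posts.append(
            Post(
                title=title,
                author=_text(item.select_one(AUTHOR_SELECTOR)),
                tags=[t.get_text(strip=True) for t in item.select(TAG_SELECTOR)],
                karma=_to_int(_text(item.select_one(KARMA_SELECTOR))),
            )
        )
    return posts
```

Missing fields become `None` (or an empty tag list) rather than raising. A single malformed entry therefore does not abort the whole scrape.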
- After scraping the latest n posts, we store the raw data in AWS S3, which serves as object storage.
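One way to store each scrape run is sketched below, assuming the raw HTML is written as one date-partitioned object per run. The key layout, the `raw/lesswrong` prefix, and the helper names are hypothetical choices, not the project's confirmed scheme. The S3 call is boto3's `put_object`. The client is injectable so the helper can be tested without AWS credentials.

```python
from datetime import datetime, timezone


def build_raw_key(scraped_at: datetime, n_posts: int, prefix: str = "raw/lesswrong") -> str:
    """Date-partitioned key, e.g. raw/lesswrong/2024/01/31/latest_50_posts_153000.html"""
    return f"{prefix}/{scraped_at:%Y/%m/%d}/latest_{n_posts}_posts_{scraped_at:%H%M%S}.html"


def upload_raw_html(html: str, bucket: str, n_posts: int, client=None, scraped_at=None) -> str:
    """Write the raw HTML of one scrape run to S3 and return the object key."""
    if client is None:
        import boto3  # imported lazily: only needed when talking to real S3
        client = boto3.client("s3")
    scraped_at = scraped_at or datetime.now(timezone.utc)
    key = build_raw_key(scraped_at, n_posts)
    client.put_object(
        Bucket=bucket,  # bucket name is supplied by the caller (hypothetical here)
        Key=key,
        Body=html.encode("utf-8"),
        ContentType="text/html; charset=utf-8",
    )
    return key
```

Partitioning keys by date keeps each run's raw snapshot immutable. A later transform step can then re-parse a given day without scraping again.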