Guest400123064/PennGUIN

News-based Sentiment Analysis with GDELT

Introduction

In this project, we explore the feasibility of using news articles to predict or interpolate the relationship (friendliness) between important geo-political entities, such as large companies, politicians, and military representatives. We hope to analyze and, ideally, forecast trends in socioeconomic conflicts centered on these entities. For instance, we ask questions like "Is the relationship between Shell and Nigerian governors worsening?" or "How many 'hidden parties' exist among these politicians?".

To do so, we first scrape the relevant news articles using URLs from the GDELT Project database. Next, we tokenize the articles into sentences and detect entity co-mentions within each sentence. Whenever a sentence contains a co-mention, e.g., "A and B failed to resolve their disputes across a wide range of issue areas.", we detect the event(s) mentioned in that sentence and calculate a sentiment score from the Goldstein Conflict Score of those events. We use these sentiment scores as a proxy for the friendliness between the geo-political players of interest. Finally, we construct a relationship graph from the co-mention edges, annotated with these sentiment (tonality) scores, and use it for analysis (e.g., graph clustering) and visualization.
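The sketch below illustrates the sentence-level co-mention step in a minimal form. It is not the repo's implementation. The alias table ENTITY_ALIASES and the helper names are hypothetical. A regex splitter stands in for a proper sentence tokenizer, and whole-word string matching stands in for real entity mention detection.

```python
import itertools
import re

# Hypothetical alias table: canonical entity name -> surface forms to match.
# The repo's own entity lists and matching rules may differ.
ENTITY_ALIASES = {
    "Shell": ["Shell", "Royal Dutch Shell"],
    "Nigeria": ["Nigeria", "Nigerian government"],
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def detect_entities(sentence, aliases=ENTITY_ALIASES):
    """Return canonical entities whose aliases appear as whole words in the sentence."""
    found = set()
    for entity, forms in aliases.items():
        for form in forms:
            if re.search(r"\b" + re.escape(form) + r"\b", sentence):
                found.add(entity)
                break  # one matching alias is enough for this entity
    return found


def find_comentions(text, aliases=ENTITY_ALIASES):
    """Yield (sentence, (entity_a, entity_b)) for every entity pair sharing a sentence."""
    for sentence in split_sentences(text):
        entities = sorted(detect_entities(sentence, aliases))
        for pair in itertools.combinations(entities, 2):
            yield sentence, pair
```

Given "Shell and Nigeria failed to resolve their disputes. Shell reported profits.", only the first sentence yields a pair, ("Nigeria", "Shell"). The second sentence mentions a single entity, so it contributes no edge.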

In summary, this repo archives code for:

  • News scraping given URLs, text cleaning, and sentence tokenization
  • Entity mention detection (workable, but in development)
  • Co-reference resolution (experimenting)
  • Event detection and scoring (experimenting with advanced event extraction features)
  • Graph clustering and visualization
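After each co-mentioning sentence has a score, the sentence-level values can be rolled up into graph edges. The sketch below assumes each co-mention arrives as an (entity_a, entity_b, goldstein) triple, with the value on GDELT's -10 to +10 Goldstein scale. The linear rescaling to [-1, 1] and the mean aggregation are illustrative assumptions. The repo may combine scores differently.

```python
from collections import defaultdict
from statistics import mean

GOLDSTEIN_MIN, GOLDSTEIN_MAX = -10.0, 10.0


def normalize_goldstein(score):
    """Map a Goldstein value in [-10, 10] linearly to [-1, 1] (hypothetical normalization)."""
    if not GOLDSTEIN_MIN <= score <= GOLDSTEIN_MAX:
        raise ValueError(f"Goldstein score out of range: {score}")
    return score / GOLDSTEIN_MAX


def build_edges(scored_comentions):
    """Aggregate (entity_a, entity_b, goldstein) triples into {pair: (mean_sentiment, count)}.

    Pairs are stored in sorted order so (A, B) and (B, A) share one undirected edge.
    """
    buckets = defaultdict(list)
    for a, b, goldstein in scored_comentions:
        if a == b:
            continue  # skip self-loops
        key = tuple(sorted((a, b)))
        buckets[key].append(normalize_goldstein(goldstein))
    return {pair: (mean(vals), len(vals)) for pair, vals in buckets.items()}
```

For example, the input [("Shell", "Nigeria", -5.0), ("Nigeria", "Shell", -10.0), ("Shell", "Chevron", 2.5)] produces two edges. The ("Nigeria", "Shell") edge has mean -0.75 over 2 mentions, and the ("Chevron", "Shell") edge has mean 0.25 over 1 mention. Keeping the count alongside the mean lets later analysis discount pairs that are rarely co-mentioned.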

Environment Setup

CONDA IS REQUIRED FOR SETUP. To recreate the project environment, run conda env create -f environment.yml in a terminal/command line from the project home folder, where the environment file is located. To activate the newly created environment, run conda activate pennguin

Get Started with Source Code

For low-level APIs, i.e., scraping, co-mention detection, and event detection & grading, please refer directly to the source code under $REPO_FOLDER/src. Detailed descriptions and sample usage are documented in the source files.

For graph clustering and visualization, please refer to $REPO_FOLDER/examples and read the Jupyter notebooks.
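The README does not say which clustering algorithm the notebooks use. The sketch below is a deliberately simple stand-in. It uses union-find to group entities into connected components over edges whose mean sentiment is above a threshold. The edge format ({(a, b): (mean_sentiment, count)}) and the function name are hypothetical. This approach discards negative edges entirely, whereas proper signed-graph clustering would use them as evidence that two entities belong apart.

```python
def positive_components(edges, threshold=0.0):
    """Group entities via union-find over edges whose mean sentiment exceeds `threshold`.

    `edges` maps (entity_a, entity_b) -> (mean_sentiment, count).
    Entities with no qualifying edge end up as singleton groups.
    Returns a list of sorted groups, ordered by each group's first member.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    for (a, b), (sentiment, _count) in edges.items():
        # Register both endpoints even if the edge itself is filtered out.
        find(a)
        find(b)
        if sentiment > threshold:
            union(a, b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

Consider three edges: Chevron-Shell with mean 0.25, Exxon-Chevron with mean 0.5, and Nigeria-Shell with mean -0.75. These yield two groups, ["Chevron", "Exxon", "Shell"] and ["Nigeria"]. The negative Nigeria-Shell edge keeps Nigeria out of the friendly cluster.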

Misc

  • $REPO_FOLDER/analysis contains code for past analysis. Each analysis has its own source code folder, data folder, and output folder.
  • $REPO_FOLDER/data contains global data files shared across the entire project.
