News-based Sentiment Analysis with GDELT

Introduction

In this project, we explore the feasibility of using news articles to predict or interpolate the relationship (friendliness) between important geopolitical entities, such as large companies, politicians, and military representatives. We hope to analyze and, ideally, forecast the trend of socioeconomic conflicts centered around these entities. For instance, we ask questions like "Is the relationship between Shell and Nigerian governors worsening?" or "How many 'hidden parties' exist among these politicians?".

To do so, we first scrape all relevant news articles using the URLs from the GDELT Project database. Next, we tokenize the articles into sentences and detect entity co-mentions within each sentence. Whenever a co-mention is detected within a sentence, e.g., "A and B failed to resolve their disputes across a wide range of issue areas.", we detect the event(s) mentioned in that sentence and calculate a Sentiment Score based on the Goldstein Conflict Score. We use these sentiment scores as a proxy for the friendliness between the geopolitical players of interest. Finally, we construct a relationship graph from the co-mention edges, together with tonality scores, and use it for analysis, e.g., graph clustering, and visualization.
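The sentence-level steps described above can be sketched roughly as follows. This is a minimal illustration, not the code under src: the alias table ENTITIES, the keyword table EVENT_SCORES, and all function names are hypothetical, the sentence splitter is a naive regular expression, and the keyword scores are placeholder values chosen only to fall within the Goldstein range of -10 to +10.

```python
import re
from itertools import combinations

# Hypothetical entity aliases; the real project detects mentions with its own code.
ENTITIES = {
    "Shell": ["Shell", "Royal Dutch Shell"],
    "Nigeria": ["Nigeria", "Nigerian government"],
}

# Illustrative keyword scores on the Goldstein range of -10 to +10.
# These values are placeholders, not entries from the actual scale.
EVENT_SCORES = {"dispute": -5.0, "agreement": 6.0}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_entities(sentence):
    """Return the sorted entity keys whose aliases appear in the sentence."""
    found = set()
    for name, aliases in ENTITIES.items():
        if any(re.search(r"\b" + re.escape(a) + r"\b", sentence) for a in aliases):
            found.add(name)
    return sorted(found)


def co_mentions(text):
    """Yield (entity_a, entity_b, sentence) for each pair sharing a sentence."""
    for sentence in split_sentences(text):
        for a, b in combinations(find_entities(sentence), 2):
            yield a, b, sentence


def score_sentence(sentence):
    """Average the scores of matched event keywords, or None if none match."""
    lowered = sentence.lower()
    hits = [score for word, score in EVENT_SCORES.items() if word in lowered]
    return sum(hits) / len(hits) if hits else None


if __name__ == "__main__":
    article = "Shell and Nigeria failed to resolve their disputes. Shell reported earnings."
    for a, b, sentence in co_mentions(article):
        print(a, b, score_sentence(sentence))
```

Only the first sentence of the sample article mentions both entities, so it is the only one that produces a scored edge. A real pipeline would replace the keyword lookup with proper event detection.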

In summary, this repo archives the code snippets for:

News scraping given URLs, text cleaning, and sentence tokenization
Entity mention detection (workable, but in development)
Co-reference resolution (experimenting)
Event detection and scoring (experimenting with advanced event extraction features)
Graph clustering and visualization

Environment Setup

Conda is required for setup. To create the same environment, please run conda env create -f environment.yml in a terminal/command line. The environment file is located in the project home folder. To activate the newly created environment, run conda activate pennguin

Get Started with Source Code

For low-level APIs, i.e., scraping, co-mention detection, and event detection & grading, please refer directly to the source code under $REPO_FOLDER/src. Detailed descriptions and sample usage are documented in the source files.

For graph clustering and visualization, please refer to $REPO_FOLDER/examples and read the Jupyter notebooks.
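For orientation before opening the notebooks, here is a minimal sketch of how scored co-mention edges could be turned into a graph and grouped. It is an assumption-laden stand-in: the function names build_graph and friendly_clusters are hypothetical, and connected components over positively scored edges is used here only as a simple substitute for whatever clustering method the notebooks actually apply.

```python
from collections import defaultdict


def build_graph(scored_edges):
    """Average the scores of repeated (a, b) co-mentions into one edge weight."""
    totals = defaultdict(list)
    for a, b, score in scored_edges:
        # Sort the pair so (a, b) and (b, a) land on the same undirected edge.
        totals[tuple(sorted((a, b)))].append(score)
    return {pair: sum(values) / len(values) for pair, values in totals.items()}


def friendly_clusters(graph, threshold=0.0):
    """Group entities connected by edges whose mean score exceeds the threshold."""
    neighbors = defaultdict(set)
    nodes = set()
    for (a, b), weight in graph.items():
        nodes.update((a, b))
        if weight > threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first search over the friendly edges only.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


if __name__ == "__main__":
    edges = [("A", "B", 4.0), ("B", "A", 2.0), ("B", "C", -6.0), ("C", "D", 3.0)]
    graph = build_graph(edges)
    print(graph)
    print(friendly_clusters(graph))
```

In the sample data, the hostile edge between B and C separates the entities into two groups. Raising the threshold makes the grouping stricter.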

Misc

$REPO_FOLDER/analysis contains code for past analysis. Each analysis has its own source code folder, data folder, and output folder.
$REPO_FOLDER/data contains global data files shared across the entire project.

Guest400123064/PennGUIN

Folders and files

Latest commit

History

Repository files navigation

News-based Sentiment Analysis with GDELT

Introduction

Environment Setup

Get Started with Source Code

Misc

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages