Seminar Project in Social Informatics of Large Language Models. Its goal is to assess how accurately the state-of-the-art LLM GPT-4 can mimic the speeches of German parliamentary politicians.

The code directory contains the Jupyter notebooks needed to recreate our project. The header of each notebook states who worked on its code. The data directory contains some of the data our project needs. Because some of our data files are too large for GitHub, we provide them via cloud storage. For simplicity's sake, we recommend downloading the data directory from the cloud storage and using it to replace the data directory in the repository.\
The notebooks should be run in the following order:
- speeches_preprocessing.ipynb
- prompt_engeniering.ipynb
- analysis_manifestoberta.ipynb / analysis_pca.ipynb
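
The run order above can also be scripted. The following is a minimal sketch, assuming the notebooks sit in a `code/` directory relative to the repository root and that Jupyter's `nbconvert` is installed. The helper names `build_commands` and `run_all` are hypothetical and not part of this repository. Each notebook is executed in place, and the script stops at the first failure so that later steps never run on missing outputs.

```python
import subprocess
from pathlib import Path

# Run order taken from this README. The two analysis notebooks are listed as
# alternatives ("/"), so their relative order here is an arbitrary choice.
NOTEBOOKS = [
    "speeches_preprocessing.ipynb",
    "prompt_engeniering.ipynb",
    "analysis_manifestoberta.ipynb",
    "analysis_pca.ipynb",
]


def build_commands(code_dir="code"):
    """Return one `jupyter nbconvert` command per notebook, in run order.

    Assumption: the notebooks live in `code_dir` (hypothetical default "code").
    """
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(Path(code_dir) / name)]
        for name in NOTEBOOKS
    ]


def run_all(code_dir="code"):
    """Execute every notebook in order, overwriting each with its executed copy."""
    for cmd in build_commands(code_dir):
        # check=True raises CalledProcessError if a notebook fails,
        # which halts the remaining steps of the pipeline.
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the repository root executes the whole pipeline. Because `--inplace` overwrites each notebook with its executed version, commit or back up any local changes first.
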

This documents who was responsible for what parts of our final report:

- Introduction - Andri, Elias
- Data & Methods
  - Speech Selection - Andri, Elias, Jakob
  - Politician Selection - Andri
  - GPT-4 Prompting - Andri
  - Placement Analysis - Elias, Jakob
- Results
  - Manifestoberta - Jakob
  - Doc2Vec - Elias
- Discussion - Elias, Jakob