This repository contains the code for CBS News' 2022 analysis of FBI data on homicide clearance rates in the US. It includes the code and documentation used in the analysis; the data files are too large to store here and are downloaded separately, as described below.
This repository uses Make to create a workflow that can be easily reproduced with a single command.
The project is divided into tasks, each of which is contained in its own directory:
Task folder | Description |
---|---|
Extract | Turns the raw annual fixed-width files from the FBI into single CSV files. |
Transform | Cleans the outputs of the extract tasks. |
Merge | Merges any files that need to be combined before loading. |
Load | Loads the outputs of the transformations into a database using Django. |
Report | Generates reports using Jinja, which are sent to each local station. |
View the README file in each task folder for additional documentation of that task.
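As an illustration of what the Extract step does conceptually, here is a minimal Python sketch that slices fixed-width records into CSV rows. It is not the code in extract/. The column names, offsets, and sample records are hypothetical; the real layouts are in the schema PDFs in documents/.

```python
import csv
import io

# Hypothetical schema: (column name, start, end), 0-based and end-exclusive.
# The real offsets come from the FBI schema PDFs, not from this sketch.
SCHEMA = [
    ("state", 0, 2),
    ("year", 2, 6),
    ("homicides", 6, 11),
]


def parse_line(line, schema=SCHEMA):
    """Slice one fixed-width record into a dict of whitespace-stripped strings."""
    return {name: line[start:end].strip() for name, start, end in schema}


def fixed_width_to_csv(lines, out, schema=SCHEMA):
    """Write every non-blank fixed-width line to `out` as one CSV row."""
    fieldnames = [name for name, _, _ in schema]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(parse_line(line.rstrip("\n"), schema))


if __name__ == "__main__":
    # Made-up sample records, padded to the hypothetical 11-character layout.
    sample = ["AA2020   12\n", "BB2021    7\n"]
    buffer = io.StringIO()
    fixed_width_to_csv(sample, buffer)
    print(buffer.getvalue(), end="")
```

Keeping the schema as data rather than hard-coding slices makes it easier to handle annual files whose layouts differ from year to year.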
The data files themselves are not uploaded to this repository because they total more than 50 GB. Instead, to reproduce this workflow from a fresh clone, run the following:
make venv/bin/activate
source venv/bin/activate
make raw
This runs a series of commands in the root Makefile that initialize your virtual environment, install all Python dependencies, and download the input files to the appropriate folder in raw/.
NOTE: These files are very large, and the download can take over an hour depending on your internet speed.
The Python files in scripts/ are used to run various stages of the workflow, for example downloading the raw data files.
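Because the downloads are large, a download script benefits from skipping files that are already on disk. The following is a minimal sketch of that idea, not the actual code in scripts/. The function name `download_missing` and the example URL are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_missing(urls, dest_dir, fetch=urlretrieve):
    """Download each URL into dest_dir unless a file with that name exists.

    `fetch` takes (url, filename) and defaults to urllib's urlretrieve.
    Returns the list of paths that were actually fetched on this run.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    fetched = []
    for url in urls:
        # Use the last path component of the URL as the local file name.
        target = dest / Path(urlparse(url).path).name
        if target.exists():
            continue  # already downloaded on a previous run
        fetch(url, str(target))
        fetched.append(target)
    return fetched


# Example call (hypothetical URL, shown only to illustrate usage):
# download_missing(["https://example.invalid/files/2020.zip"], "raw/")
```

Passing the fetch function as a parameter keeps the skip logic testable without network access. Note that this sketch only checks whether a file exists, so a partially downloaded file would need to be deleted before re-running.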
documents/ contains PDF files that are not part of the workflow itself but were used in its creation, for example the fixed-width file schemas used in extract/.
NOTE: This workflow was created on Linux and uses Linux tools, so it will not work on Windows unless you run it under WSL.
- Clone this repository.
- Run the steps to download the raw data as described above.
- Run
make