Please read this note from the Federal Election Commission on permissible use of this data before cloning or contributing
The FEC publishes individual and PAC contributions to candidates in a searchable web app and as bulk downloads.
The goal of this project is to lower the barrier to entry for analysis of money in US politics by loading the bulk downloads into a Postgres database. This helps in a couple of ways:
- Some of the tables (particularly the individual_contributions table) get prohibitively large for ad-hoc analysis using pandas or excel on a normal laptop. Whether or not Postgres will solve this for you depends on your machine, the number of years of data you load, and the queries you run, but it will help in many situations.
- The normalization of the dataset across a few tables tables lends itself to manipulation using SQL. There are some examples at the end of this README that help demonstrate this.
If you don't have Homebrew go here and install it.
Then run this to install postgres on your machine and then start it as a service. If you do this, then there is no further configuration needed to use the scripts here.
brew install postgres
brew services start postgres
This repo consists of two components right now:
- A set of postgres table definitions that are compatible with the schema published by the FEC and the data available in their bulk downloads. These are annotated with the comments in the FEC's data definition files.
- A bash script that downloads files from the FEC's public S3 bucket, does some minor character formatting to make sure the files are ready to be loaded, and copies them into the tables defined in 1.
It can be used like this to load data from the last two election cycles -- 2020 and 2018 -- into a postgres database:
sh load-fec-year.sh 2020 2018
Specify postgres configurations in environment variables by following the official postgres documentation and running something like:
PGHOST=localhost PGUSER=postgres PGPASSWORD=mysecretpass sh load-fec-year.sh 2020 2018
All tables have an additional column added called file_year
. This corresponds to the election
cycle that the bulk download files were associated with. Adding file_year
makes it easy to add
files from multiple election cycles into the same database and run queries across all of them.
The FEC documentation is very thorough so reviewing that is essential for
navigating the schema -- just be aware that the file_year
is being added to all tables.
For example, this query will show expenditures by committees in opposition of a candidate by election cycle:
SELECT
SUM(ccc.transaction_amt) transaction_total,
cm.cmte_nm committee_name,
cm.file_year,
cand.cand_name candidate_name
FROM committee_candidate_contributions ccc
JOIN committee_master cm
ON cm.cmte_id = ccc.cmte_id AND cm.file_year = ccc.file_year
JOIN candidate_master cand
ON cand.cand_id = ccc.cand_id AND cand.file_year = ccc.file_year
WHERE (ccc.transaction_tp = '24A' OR ccc.transaction_tp = '24N') -- https://www.fec.gov/campaign-finance-data/transaction-type-code-descriptions/
GROUP BY
cm.file_year,
ccc.cand_id,
cand.cand_name,
cm.cmte_nm
ORDER BY transaction_total DESC
LIMIT 100
This will show the employers who are listed most often in individual contributions to campaign committees by election cycle:
SELECT employer, file_year, count(*) distinct_contributions
FROM individual_contributions ci
JOIN committee_master cm
USING(cmte_id, file_year)
GROUP BY employer, file_year
ORDER BY distinct_contributions DESC
LIMIT 100