Predicting Federal Campaign Contributions using Demographic Data

Political donations are terribly important.

  • First, it's expensive to run for political office. Experts estimate that running for (and winning)
    the presidency in 2016 cost somewhere between 3 and 5 billion dollars.
  • Second, donations by individuals are an act of investment. People are essentially buying in
    to the process, and probably expect something in return.

One challenge is that most of the money comes from a small minority of donors, which obscures exactly
whom the campaign is meant to represent.

For my Insight Data Science Fellowship project, I built an interactive web application that uses US
Census demographics to predict donations in a geographical area, thus helping campaign fundraisers
answer two critical questions:

  1. Who is the candidate supposed to represent?
  2. Where should they target their efforts to raise more money?


This dashboard displays a random sample of 50,000 individual donations given to federal election candidates and political action committees (PACs) in the 2015-2016 election cycle.

  • The heatmap (right) shows the distribution of donations.
  • The histograms and bar graphs (left) represent demographic features associated with these donations.
  • The 'Occupation', 'Donation Size', and 'Demographic Cluster' bar graphs are responsive: click on a bar to see more detailed information about donors in that group.
  • Click on the 'Donations over Time' chart (upper left) to zoom in on a particular date range.
  • Markers on the map represent 'under-performing' zip codes. Click on a marker to see the zip code and the amount ($) by which it is under-performing.

[Dashboard panels and axis labels: Number of Donations over Time (2015 - 2016); Contributors by Occupation Type; Donation Sizes; Histogram of Cluster Donations; Demographic Cluster; HeatMap of Donations; Number of Donations]

Tess Jeffers


To predict campaign contributions in a zip code, I'm using:

  1. Individual Campaign Contributions, aggregated by the Federal Election Commission (FEC)
  2. US Demographic Data from the American Community Survey

After normalizing and correcting for missing and skewed data, I built a 135-feature Gradient Boosting Regression model to predict the total campaign contributions in a zip code using the demographic features of that area.
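To make the idea of gradient boosting concrete, here is a minimal sketch in plain Python. It is not the project's code: it uses depth-1 trees (stumps) and squared-error loss, and the two features (median income in $1,000s, share of college graduates) and all numbers are made up for illustration. In practice a library implementation, such as scikit-learn's GradientBoostingRegressor, would be the usual choice.

```python
from statistics import mean

def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: predict the mean residual
        m = mean(residuals)
        return (0, float("inf"), m, m)
    return best[1:]

def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals and adds a damped copy."""
    base = mean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Hypothetical zip-code features: [median income ($1,000s), share with college degree]
X = [[40, 0.20], [55, 0.30], [70, 0.45], [90, 0.60], [120, 0.75], [150, 0.80]]
# Hypothetical total contributions ($) for the same zip codes
y = [10_000, 18_000, 40_000, 75_000, 150_000, 210_000]

model = fit_gradient_boosting(X, y)
fitted = [predict(model, row) for row in X]
baseline_mse = mse(y, [mean(y)] * len(y))  # error of always predicting the mean
model_mse = mse(y, fitted)
print(f"baseline MSE: {baseline_mse:.0f}, boosted MSE: {model_mse:.0f}")
```

The learning rate shrinks each stump's contribution, so the model approaches the training targets gradually instead of fitting them in one step.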

In addition, I used hierarchical agglomerative clustering to generate demographic profiles that I could use to better understand contribution behavior across different demographic clusters.
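As a rough illustration of how agglomerative clustering builds groups from the bottom up, the sketch below repeatedly merges the two closest clusters until a target number remains. Average linkage, Euclidean distance, and the six standardized two-feature points are all assumptions chosen for the example, not details taken from the project.

```python
from math import dist  # Euclidean distance, Python 3.8+

def average_linkage(a, b, points):
    """Mean pairwise distance between two clusters given as lists of point indices."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerative_clusters(points, n_clusters):
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under average linkage.
        p, q = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: average_linkage(clusters[pq[0]], clusters[pq[1]], points),
        )
        # Merge cluster q into cluster p (q > p, so deleting q leaves p in place).
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

# Hypothetical standardized features per zip code: (income, median age)
points = [(-1.0, -0.9), (-0.8, -1.1), (1.1, 0.9), (0.9, 1.2), (1.0, -1.0), (1.3, -0.8)]

clusters = agglomerative_clusters(points, n_clusters=3)
print(clusters)
```

Each resulting cluster can then be summarized by its average feature values to give a readable demographic profile.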

Together, this yields a predictive model for targeting and visualizing underperforming zip codes. Take a look at the "MONEYTALK$ DASHBOARD".

For each zip code in the US, I calculate the total dollar amount it is expected to raise. If the model were perfectly accurate, all zip codes (blue dots) would fall on the 45 degree line (gold line). Of course, my model isn't perfect, and in fact the outliers represent an opportunity. Take for example the zip code '10024', the Upper West Side (UWS) in Manhattan. My model predicts the UWS should raise ~ $1,700,000. In actuality, this zip code alone raised over $17,000,000, more than ten times the prediction. Knowing the location of these 'over-performing' zip codes could be useful for planning high-budget donation events, like $10,000 / plate dinners, galas, etc.

On the other end of the spectrum, 'under-performing' zip codes represent locations where, for whatever reason, people have not donated to the capacity that the model predicts they should. These zip codes represent an opportunity for a candidate to refine their strategy, send out additional flyers, or otherwise attempt better messaging to connect with the demographics in that area.
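The over- and under-performing labels come down to the gap between actual and predicted totals. Here is a minimal sketch of that arithmetic: the 10024 figures are the rounded values quoted above, while "zip_A" and "zip_B" and their amounts are hypothetical placeholders.

```python
# Predicted vs. actual totals in dollars. Only 10024 uses figures from the text
# (rounded); zip_A and zip_B are made-up entries for illustration.
totals = {
    "10024": {"predicted": 1_700_000, "actual": 17_000_000},
    "zip_A": {"predicted": 500_000, "actual": 200_000},
    "zip_B": {"predicted": 250_000, "actual": 240_000},
}

def performance_gaps(totals):
    """Return (zip, actual - predicted) pairs, most under-performing first."""
    gaps = {z: v["actual"] - v["predicted"] for z, v in totals.items()}
    return sorted(gaps.items(), key=lambda item: item[1])

ranked = performance_gaps(totals)

# Negative gap: raised less than predicted. Report the shortfall as a positive amount.
under_performing = [(z, -gap) for z, gap in ranked if gap < 0]
# Positive gap: raised more than predicted. Largest surplus first.
over_performing = [(z, gap) for z, gap in reversed(ranked) if gap > 0]

for z, shortfall in under_performing:
    print(f"{z} is under-performing by ${shortfall:,}")
for z, surplus in over_performing:
    print(f"{z} is over-performing by ${surplus:,}")
```

Sorting by the gap gives a simple priority list: the largest shortfalls are candidates for extra outreach, and the largest surpluses are candidates for high-budget events.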


Take a look!


I am currently a Data Science Fellow at Insight Data Science, NYC.

I recently completed my PhD in Quantitative and Computational Biology from Princeton University,
where I studied how the genome is folded inside our cells. That project taught me how to use big data
and clever analyses to visualize otherwise invisible microscopic trends.

In my free time, you can find me exploring NYC, going on very long walks (see for example:
The Great Saunter), and curating a small insect collection.

Let's connect!