Political donations are terribly important.
One challenge is that most of the money comes from a small minority of donors, obscuring exactly
whom the campaign is meant to represent.
For my Insight Data Science Fellowship project, I built an interactive web application that uses US
Census demographics to predict donations in a geographical area, thus helping campaign fundraisers
answer two critical questions:
This dashboard displays a random sample of 50,000 individual donations given to federal election candidates and political action committees (PACs) in the 2015-2016 election cycle.
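The page doesn't say how the 50,000-donation sample was drawn. One standard way to take a uniform random sample of fixed size from a donation file too large to hold in memory is reservoir sampling (Algorithm R). The sketch below is an illustration under that assumption, not the method used for the dashboard; the function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of k items from an iterable of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep record i with probability k / (i + 1), evicting a random current member.
            j = rng.randint(0, i)  # inclusive on both ends: i + 1 possible values
            if j < k:
                reservoir[j] = record
    return reservoir
```

Usage might look like `sample = reservoir_sample(donation_records, k=50_000, seed=42)`, where `donation_records` is a hypothetical iterator over parsed donation rows. The design choice is that the file is read once, in a single pass, and memory stays proportional to `k` rather than to the number of donations.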
To predict campaign contributions in a zip code, I'm using:
After normalizing the data and correcting for missing values and skew, I built a 135-feature Gradient Boosting Regression model that predicts the total campaign contributions in a zip code from that area's demographic features.
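The page doesn't include the preprocessing or model code. As a minimal sketch, assuming missing values are filled with column medians and the right-skewed dollar target is log-transformed with `log1p`, the core idea of gradient boosting for squared error (fit each new small tree to the residuals of the current ensemble, then add it with a shrinking learning rate) looks like this in pure Python. All function names and the toy data are hypothetical, and depth-1 stumps stand in for the deeper trees a library implementation such as scikit-learn's `GradientBoostingRegressor` would use.

```python
import math
import statistics
from typing import List, Optional, Tuple

# A regression stump: (feature index, threshold, value if <= threshold, value if > threshold).
Stump = Tuple[int, float, float, float]
# A fitted model: (base prediction, learning rate, list of stumps).
Model = Tuple[float, float, List[Stump]]


def impute_median(X: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace missing values (None) in each column with that column's median."""
    n_cols = len(X[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in X if row[j] is not None]
        medians.append(statistics.median(observed) if observed else 0.0)
    return [[medians[j] if row[j] is None else row[j] for j in range(n_cols)] for row in X]


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Find the single-feature split that minimizes squared error on the residuals r."""
    best: Optional[Stump] = None
    best_sse = math.inf
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, (j, t, lm, rm)
    if best is None:  # every feature is constant: predict the mean residual everywhere
        m = sum(r) / len(r)
        best = (0, math.inf, m, m)
    return best


def predict_stump(stump: Stump, row: List[float]) -> float:
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_gbr(X: List[List[float]], y: List[float], n_rounds: int = 100,
            learning_rate: float = 0.1) -> Model:
    """Gradient boosting for squared loss: each new stump is fit to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, row) for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict_gbr(model: Model, row: List[float]) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Hypothetical toy data: two demographic features per zip code, one value missing.
X_raw = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [4.0, 40.0]]
totals = [100.0, 200.0, 5000.0, 10000.0]  # right-skewed dollar totals
X = impute_median(X_raw)
y = [math.log1p(v) for v in totals]  # log1p compresses the skewed target
model = fit_gbr(X, y, n_rounds=200, learning_rate=0.1)
estimates = [math.expm1(predict_gbr(model, row)) for row in X]  # back to dollars
```

Fitting in log space means the model is penalized for relative rather than absolute errors, so a handful of very wealthy zip codes can't dominate the loss; `expm1` maps predictions back to dollars.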
In addition, I used hierarchical agglomerative clustering to generate demographic profiles, which help show how contribution behavior varies across demographic clusters.
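To illustrate how hierarchical agglomerative clustering works, here is a small pure-Python sketch: every point starts as its own cluster, and the two closest clusters are merged repeatedly until `n_clusters` remain. The page doesn't say which linkage was used; this sketch assumes average linkage on Euclidean distance and assumes features are already standardized so no single feature dominates the distance. Each cluster's mean feature vector serves as a simple "demographic profile". Function names and data are hypothetical, and for real data a library implementation (for example scikit-learn's `AgglomerativeClustering`) would be far faster.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points: Sequence[Sequence[float]], n_clusters: int) -> List[int]:
    """Average-linkage agglomerative clustering; returns one cluster label per point."""
    clusters = [[i] for i in range(len(points))]
    dist = [[euclidean(p, q) for q in points] for p in points]

    def linkage(a: List[int], b: List[int]) -> float:
        # Average pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters (i < j, so deleting j keeps index i valid).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


def cluster_profiles(points: Sequence[Sequence[float]], labels: List[int]) -> Dict[int, List[float]]:
    """Mean feature vector of each cluster: a simple demographic profile."""
    profiles = {}
    for label in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == label]
        profiles[label] = [sum(col) / len(members) for col in zip(*members)]
    return profiles


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]  # hypothetical standardized features
labels = agglomerative(points, n_clusters=2)
profiles = cluster_profiles(points, labels)
```

Because merges are never undone, the sequence of merges forms a tree (dendrogram), and cutting it at different heights gives coarser or finer profiles without re-running the clustering.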
Together, these yield a predictive model for targeting and visualizing underperforming zip codes. Take a look at the "MONEYTALK$ DASHBOARD".
For each zip code in the US, I calculate the total dollar amount it is expected to raise. If the model were perfectly accurate, every zip code (blue dots) would fall on the 45-degree line (gold line), where predicted and actual totals are equal. Of course, my model isn't perfect, and the zip codes that fall far from the line are in fact an opportunity. Take, for example, zip code 10024, the Upper West Side of Manhattan. My model predicts the UWS should raise about $1,700,000; in reality, this zip code alone raised over $17,000,000, roughly ten times the prediction. Knowing the location of these 'over-performing' zip codes could be useful for planning high-budget donation events, such as $10,000-a-plate dinners and galas.
On the other end of the spectrum, 'under-performing' zip codes are places where, for whatever reason, residents have not donated as much as the model predicts they could. These zip codes give a candidate an opportunity to refine their strategy, send out additional flyers, or otherwise craft messaging that better connects with the demographics in that area.
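One simple way to turn the predicted-versus-actual comparison into lists of over- and under-performing zip codes is to compute the ratio of actual to predicted totals, where a ratio of 1.0 means the zip code sits on the 45-degree line. This is a hedged sketch, not the dashboard's actual logic: the function name, the 2x cutoff, and the two non-10024 zip codes are hypothetical, and the 10024 figures are the approximate values quoted in the text.

```python
from typing import Dict, List, Tuple


def flag_zip_codes(
    actual: Dict[str, float],
    predicted: Dict[str, float],
    threshold: float = 2.0,
) -> Dict[str, List[Tuple[str, float]]]:
    """Split zip codes into over- and under-performers by the actual / predicted ratio.

    A ratio of 1.0 means the zip code sits exactly on the 45-degree line.
    `threshold` (hypothetical default: 2x) sets how far off the line counts as an outlier.
    """
    over: List[Tuple[str, float]] = []
    under: List[Tuple[str, float]] = []
    for zip_code, actual_total in actual.items():
        ratio = actual_total / predicted[zip_code]
        if ratio >= threshold:
            over.append((zip_code, ratio))
        elif ratio <= 1 / threshold:
            under.append((zip_code, ratio))
    over.sort(key=lambda item: item[1], reverse=True)  # biggest over-performers first
    under.sort(key=lambda item: item[1])  # biggest shortfalls first
    return {"over": over, "under": under}


# 10024 uses the approximate figures from the text; the other two entries are hypothetical.
actual = {"10024": 17_000_000.0, "hypothetical-A": 100_000.0, "hypothetical-B": 600_000.0}
predicted = {"10024": 1_700_000.0, "hypothetical-A": 500_000.0, "hypothetical-B": 500_000.0}
flags = flag_zip_codes(actual, predicted)
```

Here 10024 comes out with a ratio of 10 (an event-planning candidate), "hypothetical-A" at 0.2 (an outreach candidate), and "hypothetical-B" at 1.2 stays unflagged. Using a ratio rather than a dollar difference keeps small and large zip codes on a comparable scale.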
Take a look!
I am currently a Data Science Fellow at Insight Data Science, NYC.
I recently completed my PhD in Quantitative and Computational Biology from Princeton University,
where I studied how the genome is folded inside our cells. That project taught me how to use big data
and clever analyses to visualize otherwise invisible microscopic trends.
In my free time, you can find me exploring NYC, going on very long walks (see, for example,
The Great Saunter), and curating a small insect collection.