The Supreme Court v. The Court of Public Opinion

This is my capstone project for The Data Incubator, an eight-week data science program I took part in during the summer of 2018. The goal of the project is to build a tracker that displays public opinion and Supreme Court opinions over time, in an effort to understand if and how the two are correlated.

See the README on GitHub for implementation details; here is the outline. Start with a corpus of plain-text SCOTUS opinions (which I obtained with the help of Mike Lissner, the Free Law Project, and CourtListener) and public-opinion survey data (I used ANES). When a user inputs a set of keywords, we filter for the relevant SCOTUS opinions and public-opinion Q&A pairs. This is really just a keyword search, which I implemented using TF-IDF.
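
The keyword-search step can be sketched with scikit-learn's `TfidfVectorizer`. The three-document corpus and the `search` helper below are illustrative stand-ins, not the project's actual code:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for the plain-text SCOTUS opinions
opinions = [
    "The First Amendment protects freedom of speech...",
    "The tax statute at issue in this case...",
    "Freedom of religion under the First Amendment...",
]

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(opinions)

def search(keywords, top_k=2):
    """Rank documents by cosine similarity to the keyword query."""
    query_vec = vectorizer.transform([keywords])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    ranked = scores.argsort()[::-1][:top_k]
    # Keep only documents that actually matched at least one keyword
    return [(i, scores[i]) for i in ranked if scores[i] > 0]

print(search("first amendment speech"))
```

The same query would be run against the public-opinion questions, keeping only the opinions and Q&A pairs that score above some relevance threshold.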

The Supreme Court's own database has binary variables indicating whether each decision leans liberal or conservative, and that is the axis we'll use. There are many spectra to consider, but this one is the most straightforward. The decision direction supplies the sign and the vote ratio supplies the magnitude, giving us the polarity of each SCOTUS opinion.
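
As a toy illustration of how direction and vote ratio might combine into a single polarity (the function and its conventions are my own, not the database's actual coding scheme):

```python
# Hypothetical sketch: turn a decision's ideological direction and its
# vote split into a polarity in [-1, +1].
def scotus_polarity(direction, maj_votes, min_votes):
    """direction: +1 for a liberal decision, -1 for a conservative one.
    A 9-0 decision gets full magnitude; a 5-4 split is much weaker."""
    margin = (maj_votes - min_votes) / (maj_votes + min_votes)
    return direction * margin

print(scotus_polarity(+1, 9, 0))  # unanimous liberal -> +1.0
print(scotus_polarity(-1, 5, 4))  # narrow conservative -> -1/9
```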

Public opinion is trickier, as we need to analyze the political sentiment of each Q&A pair. There are several response types, such as {YES, NO}, {STRONGLY AGREE, AGREE, NEUTRAL, DISAGREE, STRONGLY DISAGREE}, a number from 0 to 100 indicating 'warmth of feeling' toward something, and more. We want to map each pair to a number in [-1, +1], and there seem to be two strategies for accomplishing this, both of which involve some machine learning.

One option is to make use of the response words, merging them with the questions to form statements, which are easier to analyze. For instance, take (Q: "Do you believe the government should play an active role in citizens' lives?"). If (A: "Yes."), then (Q: "...") + (A: "...") = (S: "Yes, I believe the gov't should play an active role in citizens' lives"). When this statement is passed through a neural network (a CNN trained to measure political bias), we expect a positive score to pop out. If the response type includes "strongly agree", then we may obtain (S: "I believe strongly that the gov't should play an active role in citizens' lives."), which should register close to +1, indicating a liberal sentiment.
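
The merging step could look something like the toy heuristic below. The rewrite rules are invented for illustration, and the downstream bias model is not shown:

```python
# Illustrative sketch of merging a question and a response into a
# declarative statement for a downstream sentiment/bias model.
def merge(question, answer):
    stem = question.rstrip("?")
    # Strip a "Do you believe ..."-style prefix (toy heuristic only)
    for prefix in ("Do you believe ", "Do you think "):
        if stem.startswith(prefix):
            stem = stem[len(prefix):]
            break
    if answer.upper() == "YES":
        return "I believe " + stem + "."
    if answer.upper() == "STRONGLY AGREE":
        return "I believe strongly that " + stem + "."
    return "I do not believe " + stem + "."

print(merge("Do you believe the government should play an active role "
            "in citizens' lives?", "Yes"))
```

A real pipeline would need far more robust question rewriting; this only conveys the shape of the idea.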

The issue with that strategy is its heavy reliance on the sentiment analyzer, which may be very noisy. By merging the responses into sentences, you lose the direct results of the survey, which may cause a severe loss of accuracy. Here is an alternate, hybrid approach. For each response type, numeric or text, scale the responses to [-1, +1] purely in terms of positive or negative sentiment, without considering political bias yet. That is, {YES, NO} -> {+1, -1}, {STRONGLY AGREE, ..., STRONGLY DISAGREE} -> {+1, +0.5, 0, -0.5, -1}, and so on. Separately, classify the questions into one of three categories {-1, 0, +1} based on whether an affirmative response would be considered liberal (+1), conservative (-1), or irrelevant/undecided (0). This is a pre-processing step that can be done however you like: with a trained classifier, by hand, with a group of people, etc. It may even be possible to reuse the strategy above, merging only the word "YES" with the question and checking whether the result is positive or negative, or just measuring the difference relative to Q + "NO". As ANES only has 1,000 questions, I opted to classify as many as I could by hand, since the questions are a little messy and a hand-labeled set will provide a benchmark for any future classifier. Multiplying the question orientation by the response magnitude gives the polarity.
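
The hybrid scheme can be sketched as follows; the scale tables and orientation labels are illustrative assumptions, not the actual ANES codings:

```python
# Map each response type onto [-1, +1] by sentiment alone,
# ignoring political bias at this stage.
RESPONSE_SCALES = {
    "yes_no": {"YES": 1.0, "NO": -1.0},
    "agree_5": {
        "STRONGLY AGREE": 1.0, "AGREE": 0.5, "NEUTRAL": 0.0,
        "DISAGREE": -0.5, "STRONGLY DISAGREE": -1.0,
    },
}

def thermometer(value):
    """Map a 0-100 feeling-thermometer rating to [-1, +1]."""
    return (value - 50) / 50

def polarity(question_orientation, response_magnitude):
    """question_orientation: +1 liberal, -1 conservative, 0 irrelevant.
    response_magnitude: the scaled response in [-1, +1]."""
    return question_orientation * response_magnitude

# A question labeled liberal (+1), answered STRONGLY AGREE -> +1.0
print(polarity(+1, RESPONSE_SCALES["agree_5"]["STRONGLY AGREE"]))
# The same question answered NO on a yes/no scale -> -1.0
print(polarity(+1, RESPONSE_SCALES["yes_no"]["NO"]))
```

The key design point is that the survey's direct results survive intact: only the three-way question orientation requires any judgment or learning.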

The user interface is a Flask webapp that accepts the keywords through a form and displays a plot of the two data series, a linear regression (or other best-fit curve) for each, and a number measuring their correlation.
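
The statistics behind the plot might look like the sketch below, using NumPy for the best-fit lines and SciPy for the correlation; the data points are made up for illustration:

```python
import numpy as np
from scipy.stats import pearsonr

# Made-up polarity series for a hypothetical keyword query
years = np.array([1990, 1994, 1998, 2002, 2006, 2010])
scotus = np.array([0.2, 0.1, -0.1, -0.3, -0.2, -0.4])  # SCOTUS polarity
public = np.array([0.3, 0.1, 0.0, -0.2, -0.1, -0.3])   # public polarity

# Degree-1 polynomial fit (a linear regression) for each series over time
scotus_fit = np.polyfit(years, scotus, 1)
public_fit = np.polyfit(years, public, 1)

# Pearson correlation between the two polarity series
r, p_value = pearsonr(scotus, public)
print(f"slopes: {scotus_fit[0]:.4f}, {public_fit[0]:.4f}; r = {r:.3f}")
```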