Data Science for Social Good fellows present their project results

Published on September 3, 2020

This year, two interdisciplinary teams at the eScience Institute’s Data Science for Social Good (DSSG) program tackled timely issues, conducting projects to identify disinformation articles about the coronavirus and detect minority vote dilution resulting from geographic boundary setting in state, city, county and school board districts.

On August 19th, the DSSG student fellows presented the results of their projects, conducted with project leads and data scientists, to more than 130 people via Zoom. The ten-week summer program brings student fellows from universities around the country together with data and domain researchers, and real-world stakeholders, to work on collaborative projects for societal benefit. This year the program took place remotely for the first time due to the coronavirus pandemic.

Descriptions of this year’s two projects are below:

eiCompare: Making Every Vote Count

This project significantly enhances a software package called eiCompare that identifies minority vote dilution in city, county, state and school board districts with racially polarized voting patterns. The software is used in court cases to argue for redrawing district boundaries to uphold the right to equal representation in the Voting Rights Act. The project was largely based on a recent court case in East Ramapo, New York, in which minority vote dilution shown by eiCompare was used as evidence. The software will be pivotal in national redistricting next year following the 2020 U.S. Census. Minority vote dilution is proven by showing that a large, geographically compact and politically cohesive minority group is unable to elect preferred candidates due to “bloc voting” by the majority. However, these criteria are difficult to prove because ballot choices are confidential and voter registration files in most states do not identify race.

To accommodate these complexities, the software uses Ecological Inference (EI) methods to estimate the percentage of each racial group that voted for each candidate in prior elections. These estimates are used to evaluate existing district boundaries and to assess new district maps created by opposing parties in court. The process, newly improved by the DSSG team, has three steps: geocoding addresses from voter files to merge them with racial data from the U.S. Census; using Bayesian Improved Surname Geocoding (BISG) to infer individual voter race from surname and location patterns in the Census; and aggregating and merging these data with election results at the voting precinct level to predict turnout in future elections.
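
To illustrate the BISG step, the Bayesian update behind it can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the eiCompare code: the function name `bisg_posterior`, the group labels and every probability below are hypothetical values chosen for illustration, not Census figures. The idea is to combine the share of each group among people with a given surname with the share of each group's population living in a given area, then normalize.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and location evidence with Bayes' rule.

    p_race_given_surname: dict mapping group -> P(group | surname)
    p_geo_given_race:     dict mapping group -> P(location | group)
    Returns a dict mapping group -> P(group | surname, location),
    assuming surname and location are independent given the group.
    """
    # Unnormalized posterior: prior from surname times likelihood of location.
    unnormalized = {
        group: p_race_given_surname[group] * p_geo_given_race[group]
        for group in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("No group has nonzero probability for this voter.")
    # Normalize so the probabilities sum to 1.
    return {group: value / total for group, value in unnormalized.items()}


# Hypothetical inputs (illustrative only, not real Census data).
surname_prior = {"group_a": 0.6, "group_b": 0.4}
location_likelihood = {"group_a": 0.001, "group_b": 0.003}

posterior = bisg_posterior(surname_prior, location_likelihood)
for group, probability in posterior.items():
    print(f"{group}: {probability:.3f}")
```

In this made-up example the surname alone favors group_a, but the voter lives in an area where group_b is three times as concentrated, so the combined estimate shifts toward group_b. Per-voter probabilities of this kind can then be summed within each precinct to estimate its racial composition.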

The fellows are Juandalyn Burke, a doctoral student in epidemiology at the University of Washington; Ari Decter-Frain, a doctoral student in policy analysis and management at Cornell University; Hikari Murayama, a master’s student in the Energy and Resources Group at the University of California (UC) Berkeley; and Pratik Sachdeva, a doctoral student in physics at UC Berkeley. They worked with project leads Matt A. Barreto, professor of political science and Chicana/o Studies at UCLA, and Loren Collingwood, associate professor of political science at UC Riverside; and data scientists Scott Henderson, a research scientist in the Department of Earth and Space Sciences and data science fellow at the eScience Institute, and Spencer Wood, a research scientist at the eScience Institute and senior research scientist with EarthLab.

Identifying Coronavirus Disinformation Online

This project creates an open-source model to identify disinformation articles about the coronavirus using machine learning and natural language processing techniques. The tool is being designed in partnership with the nonprofit organization Global Disinformation Index (GDI), which works to defund disinformation sources by identifying high-risk websites for advertising technology companies, which automatically place ads on websites on behalf of companies and organizations. Since the sales happen in real time, with ads loading in the seconds before each website is displayed, there is no systematic review of website content. This tool will help GDI flag websites that have a large quantity of disinformation articles, defined as intentionally deceptive or adversarial in nature, to help advertisers avoid placing ads there.

The project team includes fellows George Hope Chidziwisano, a doctoral candidate in media and information at Michigan State University; Richa Gupta, a master’s student in quantitative methods in the social sciences at Columbia University; Kseniya Husak, a master’s student in public policy and information science at the University of Michigan; and Maya Luetke, a doctoral candidate in epidemiology at Indiana University, Bloomington. The project is led by Maggie Engler, Lead Data Scientist, and Lucas Wright, Senior Researcher, at GDI. Technical guidance is provided by Noah Benson, a senior data scientist, and Vaughn Iverson, a senior research scientist at the eScience Institute.

Continue reading at the eScience Institute.

Originally written by Emily Keller for the eScience Institute.
