AUGUST 6, 2020 — Social media has become the latest method to monitor the spread of diseases such as influenza or coronavirus. However, machine learning algorithms used to train and classify tweets have an inherent bias because they do not account for how minority groups potentially communicate health information.
These are the findings made by UTSA researchers in one of the first studies of bias conducted on biomedical content on the microblogging and social networking service Twitter.
The researchers found that simple models offer the fairest system to survey how minority groups communicate health behaviors, such as vaccine adoption or incidence of flu. Without a fair natural language processing system, governments and other organizations that rely on social media may limit vaccines and other resources used to tackle disease within certain populations.
“The problem is that if machine bias is left unchecked, it can aggravate health disparities instead of improving them,” said Anthony Rios, assistant professor in the Department of Information Systems and Cyber Security in UTSA’s College of Business.
According to Rios, computers are used to monitor and classify millions of tweets to track how disease content spreads. There are many advantages to the use of machine learning, primarily that health organizations can deploy the algorithms quickly and at large geographic scales. Yet surveillance systems are based mostly on one dialect and, in essence, don’t account for how a minority group might use different terms or a specific communicative style. Therefore, organizations can assume incorrectly that healthy behaviors or enough medical supplies exist within certain regions.
In this study, the UTSA scientists analyzed two data sets to examine both bias and fairness on influenza-related tasks, including identifying influenza-related tweets, detecting whether a tweet is about an infection or simply raising awareness, detecting whether a user is discussing themselves or someone else, and identifying vaccine-related tweets.
Bias can be abundant in machine learning methods developed for a wide variety of natural language processing tasks, including how text is classified or how a system learns about words. For instance, machine learning methods can generate word embeddings, or vector representations, for terms; that is, numerical representations of words that a computer can process. But the learned representations may become skewed. In some cases this can lead to gender bias in which the word “man” is similar to “doctor,” while “woman” is similar to “nurse.”
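As a rough illustration of how a skewed embedding shows up, the sketch below measures similarity between word vectors with cosine similarity. The three-dimensional vectors are hand-written, hypothetical values chosen only to mimic the skew described above; they are not taken from the UTSA study or from any trained model, and the helper names are assumptions.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-written for illustration only.
# Real embeddings are learned from text and usually have hundreds of dimensions.
embeddings = {
    "man":    [0.9, 0.1, 0.3],
    "woman":  [0.1, 0.9, 0.3],
    "doctor": [0.8, 0.2, 0.5],
    "nurse":  [0.2, 0.8, 0.5],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def closest_occupation(word, occupations=("doctor", "nurse")):
    """Return the occupation whose vector is most similar to the given word."""
    return max(
        occupations,
        key=lambda occ: cosine_similarity(embeddings[word], embeddings[occ]),
    )

print(closest_occupation("man"))    # doctor
print(closest_occupation("woman"))  # nurse
```

With these made-up vectors, each gendered word lands nearest to a different occupation, which is the kind of skew that can carry over into downstream classifiers.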
In a review of fairness, which is related to bias, the researchers explored the integrity of the influenza classifiers built using different machine learning algorithms, including linear models and neural networks. In the analysis a very specific definition of fairness was applied. Intuitively, a machine learning model is fair if the predictive performance (its accuracy) is the same when it is applied to two different groups of data for the same task.
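The intuitive definition above can be sketched as a comparison of per-group accuracy. The snippet below is a minimal illustration under stated assumptions: the labels, predictions and group tags are invented toy data, and the function names `accuracy_by_group` and `accuracy_gap` are hypothetical, not the metric implementation used by the researchers.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Compute accuracy separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}

def accuracy_gap(per_group):
    """Difference between the best and worst group accuracy (0.0 means equal)."""
    return max(per_group.values()) - min(per_group.values())

# Invented toy data: 1 = influenza-related tweet, 0 = not influenza-related.
y_true = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
groups = ["SAE"] * 6 + ["AAE"] * 4

per_group = accuracy_by_group(y_true, y_pred, groups)
print(per_group)                # {'SAE': 1.0, 'AAE': 0.5}
print(accuracy_gap(per_group))  # 0.5
```

Under this reading, a gap near zero indicates the classifier performs about equally well on both groups, while a large gap signals unfairness.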
“Our task involves detecting influenza-related tweets on social media. Our groups are tweets written in either Standard American English or African American Vernacular English. If an unfair model is applied to geographical regions with a large number of AAE speakers, then it may not perform as the model developers expected. Because the number of speakers of SAE is larger than AAE speakers, a model can be both highly accurate and unfair,” said Rios.
“For influenza-related tasks we found that neural networks were more accurate, but simple machine learning methods produced fairer predictions,” Rios added.
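A short worked calculation shows how a model can be both highly accurate and unfair when one group dominates the data. The group shares and accuracies below are hypothetical numbers picked for illustration; they are not results from the study.

```python
def overall_accuracy(group_stats):
    """Weighted average of group accuracies.

    group_stats maps a group name to (share of all tweets, accuracy on that group).
    """
    return sum(share * acc for share, acc in group_stats.values())

# Hypothetical figures: 90% of tweets are SAE, 10% are AAE.
stats = {
    "SAE": (0.90, 0.95),
    "AAE": (0.10, 0.60),
}

overall = overall_accuracy(stats)
gap = stats["SAE"][1] - stats["AAE"][1]

print(f"overall accuracy: {overall:.3f}")  # overall accuracy: 0.915
print(f"accuracy gap: {gap:.2f}")          # accuracy gap: 0.35
```

Because the smaller group contributes little to the weighted average, the overall score stays above 90% even though accuracy on that group is far lower.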
France, South Korea, Australia and Singapore have all deployed COVID-19 applications. Even Apple and Google have built software into their mobile platforms to deploy digital contact tracing among users. However, privacy issues have put governments and technology companies at odds, limiting the information that epidemiologists need to understand the spread of the virus.
“Although there are still privacy and ethical issues in social media use for research, it is potentially a great way to observe health trends, since platforms are agnostic and don’t require people to download anything or check in. Using social media, we can conduct disease surveillance tasks, such as predicting infection rates or estimating infection risk. Moreover, social media can be used to understand the public’s view about potential treatments and vaccinations,” added Rios.
It’s estimated that influenza vaccination rates are lower by 10% among Hispanic and African American communities, resulting in approximately 2,000 preventable deaths per year. Moreover, the timetable for COVID-19 vaccine development is anywhere between six months and two years. It’s for this reason that Rios urges natural language processing data scientists to examine how health-related algorithms are built.
Worldwide, coronavirus has resurged in many countries, while in more than 30 U.S. states cases continue to climb, leaving local governments with a shortage of contact tracers, a key tool used to contain the disease. For this reason, machine learning offers immediate benefits and new technology to help with digital tracing or predicting potential outbreaks.
There are current limitations to the UTSA analysis. Since most NLP bias research does not analyze public health applications, and curating large biomedical data sets is difficult, the findings are based on small samples. This is why the researchers want to bring more attention to the issue of fairness when scientists build biomedical NLP data sets to train machines to code and classify health-related information written by different populations.
Brandon Lwowski, a UTSA doctoral student, is co-lead of the study, which was funded by the National Science Foundation.
UTSA Today is produced by University Communications and Marketing, the official news source of The University of Texas at San Antonio.