AUGUST 6, 2020 — Social media has become the latest method to monitor the spread of diseases such as influenza or coronavirus. However, machine learning algorithms trained to classify tweets have an inherent bias because they do not account for how minority groups may communicate health information.
These are the findings made by UTSA researchers in one of the first studies of bias conducted on biomedical content on the microblogging and social networking service Twitter.
The researchers found that simple models offer the fairest system to survey how minority groups communicate health behaviors, such as vaccine adoption or incidence of flu. Without a fair natural language processing system, governments and other organizations that rely on social media may limit vaccines and other resources used to tackle disease within certain populations.
“The problem is that if machine bias is left unchecked, it can aggravate health disparities instead of improving them,” said Anthony Rios, assistant professor in the Department of Information Systems and Cyber Security in UTSA’s College of Business.
According to Rios, computers are used to monitor and classify millions of tweets to track how disease content spreads. There are many advantages to the use of machine learning, primarily that health organizations can deploy the algorithms quickly and at large geographic scales. Yet surveillance systems are based mostly on one dialect and, in essence, don’t account for how a minority group might use different terms or a specific communicative style. Therefore, organizations can assume incorrectly that healthy behaviors or enough medical supplies exist within certain regions.
In this study, the UTSA scientists analyzed two data sets to examine both bias and fairness on influenza-related tasks, including identifying influenza-related tweets, detecting whether a tweet is about an infection or simply raising awareness, detecting whether a user is discussing themselves or someone else, and identifying vaccine-related tweets.
Bias can be abundant in machine learning methods developed for a wide variety of natural language processing tasks, including how text is classified or how a system learns about words. For instance, machine learning methods can generate word embeddings, or vector representations of terms: numerical representations of words that a computer can process. But the learned representations may become skewed. In some cases this can lead to gender bias, in which the word man is similar to doctor while woman is similar to nurse.
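As a rough illustration of what "similar" means for word embeddings, the sketch below compares words by cosine similarity, a common way to measure how close two vectors are. The three-dimensional vectors are hypothetical toy values invented for this example, not embeddings learned from real text and not taken from the UTSA study; they are chosen only to mimic the skew described above.

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, made up for illustration only.
toy_embeddings = {
    "man":    [1.0, 0.0, 0.2],
    "woman":  [0.0, 1.0, 0.2],
    "doctor": [0.9, 0.1, 0.3],
    "nurse":  [0.1, 0.9, 0.3],
}

for person in ("man", "woman"):
    for job in ("doctor", "nurse"):
        score = cosine_similarity(toy_embeddings[person], toy_embeddings[job])
        print(f"{person:>5} ~ {job:<6} {score:.3f}")
```

In these toy vectors, man scores closer to doctor and woman closer to nurse, which is the kind of pattern a skewed embedding would show.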
In a review of fairness, which is related to bias, the researchers explored the integrity of the influenza classifiers built using different machine learning algorithms, including linear models and neural networks. In the analysis a very specific definition of fairness was applied. Intuitively, a machine learning model is fair if the predictive performance (its accuracy) is the same when it is applied to two different groups of data for the same task.
“Our task involves detecting influenza-related tweets on social media. Our groups are tweets written in either Standard American English or African American Vernacular English. If an unfair model is applied to geographical regions with a large number of AAE speakers, then it may not perform as the model developers expected. Because the number of speakers of SAE is larger than AAE speakers, a model can be both highly accurate and unfair,” said Rios.
“For influenza-related tasks we found that neural networks were more accurate, but simple machine learning methods produced fairer predictions,” Rios added.
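The fairness definition above, equal predictive performance across groups, can be sketched in a few lines. The example below is a minimal illustration and not the researchers' code: the function names, the group labels and the 90/10 split of toy tweets are all assumptions made for this sketch. It shows how a classifier can look accurate overall while performing much worse on the smaller group.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Compute classification accuracy separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}

def accuracy_gap(per_group):
    """Largest difference in accuracy between any two groups (0 means equal)."""
    return max(per_group.values()) - min(per_group.values())

# Hypothetical toy data: 90 tweets labeled SAE and 10 labeled AAE,
# all truly influenza-related (label 1). The predictions are invented.
groups = ["SAE"] * 90 + ["AAE"] * 10
y_true = [1] * 100
y_pred = [1] * 88 + [0] * 2 + [1] * 5 + [0] * 5

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
per_group = accuracy_by_group(y_true, y_pred, groups)

print(f"overall accuracy: {overall:.2f}")
for group, acc in per_group.items():
    print(f"{group} accuracy: {acc:.2f}")
print(f"accuracy gap: {accuracy_gap(per_group):.2f}")
```

Because the larger group dominates the overall score, the gap between groups only becomes visible when accuracy is reported per group.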
France, South Korea, Australia and Singapore have all deployed COVID-19 applications. Even Apple and Google Android platforms have created built-in software to deploy digital contact tracing among users. However, privacy issues have put governments and technology companies at odds—limiting the information that epidemiologists need to understand the spread of the virus.
“Although there are still privacy and ethical issues in social media use for research, it is potentially a great way to observe health trends, since platforms are agnostic and don’t require people to download anything or check in. Using social media, we can conduct disease surveillance tasks, such as predicting infection rates or estimating infection risk. Moreover, social media can be used to understand the public’s view about potential treatments and vaccinations,” added Rios.
It’s estimated that influenza vaccination rates are lower by 10% among Hispanic and African American communities, resulting in approximately 2,000 preventable deaths per year. Moreover, the timetable for COVID-19 vaccine development is anywhere between six months and two years. It’s for this reason that Rios urges natural language processing data scientists to examine how health-related algorithms are built.
Coronavirus has resurged in many countries worldwide. In more than 30 U.S. states, cases continue to climb, leaving local governments with a shortage of contact tracers, a key tool used to contain the disease. It’s for this reason that machine learning offers immediate benefits and new technology to help with digital tracing or predicting potential outbreaks.
There are current limitations to the UTSA analysis. Since most NLP bias research does not analyze public health applications, and curating large biomedical data sets is difficult, the findings are based on small samples. This is why the researchers want to bring more attention to the issue of fairness when scientists build biomedical NLP data sets to train machines to code and classify health-related information written by different populations.
Brandon Lwowski, a UTSA doctoral student, is co-lead of the study, which was funded by the National Science Foundation.
UTSA Today is produced by University Communications and Marketing, the official news source of The University of Texas at San Antonio.