AUGUST 6, 2020 — Social media has become the latest tool for monitoring the spread of diseases such as influenza or coronavirus. However, the machine learning algorithms trained to classify tweets have an inherent bias because they do not account for how minority groups may communicate health information.
These are the findings of UTSA researchers in one of the first studies of bias in biomedical content on Twitter, the microblogging and social networking service.
The researchers found that simple models offer the fairest system for surveying how minority groups communicate health behaviors, such as vaccine adoption or incidence of flu. Without a fair natural language processing system, governments and other organizations that rely on social media may limit the vaccines and other resources used to tackle disease within certain populations.
“The problem is that if machine bias is left unchecked, it can aggravate health disparities instead of improving them,” said Anthony Rios, assistant professor in the Department of Information Systems and Cyber Security in UTSA’s College of Business.
According to Rios, computers are used to monitor and classify millions of tweets to track how disease content spreads. Machine learning has many advantages, chief among them that health organizations can deploy the algorithms quickly and at large geographic scales. Yet surveillance systems are based mostly on one dialect and, in essence, don't account for how a minority group might use different terms or a specific communicative style. As a result, organizations may incorrectly assume that healthy behaviors are present, or that enough medical supplies exist, within certain regions.
In this study, the UTSA scientists used two data sets to examine both bias and fairness on influenza-related tasks, including identifying influenza-related tweets, detecting whether a tweet reports an actual infection or simply raises awareness, detecting whether a user is discussing themselves or someone else, and identifying vaccine-related tweets.
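To make the first of these tasks concrete, here is a minimal sketch of one kind of "simple" linear model: a bag-of-words logistic regression, trained with stochastic gradient descent, that labels a tweet as influenza-related (1) or not (0). The toy tweets, the function names (`features`, `train`, `predict`) and the hyperparameters are hypothetical and for illustration only; the article does not describe the features or models the study actually used.

```python
import math
from collections import Counter

# Hypothetical, hand-written toy tweets (1 = influenza-related, 0 = not).
# These are NOT drawn from the UTSA data sets.
TRAIN = [
    ("i have the flu and a fever", 1),
    ("stuck in bed with influenza", 1),
    ("got my flu shot today", 1),
    ("flu season is here get vaccinated", 1),
    ("great game last night", 0),
    ("new phone who dis", 0),
    ("traffic is terrible this morning", 0),
    ("loving this sunny weather", 0),
]

def features(text):
    """Bag-of-words: map each lowercase token to its count."""
    return Counter(text.lower().split())

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, feats):
    """Linear score: bias plus the weighted sum of token counts."""
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in feats.items())

def train(data, epochs=200, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in data:
            feats = features(text)
            # Gradient of the log-loss w.r.t. the score is (label - probability).
            error = label - sigmoid(score(weights, bias, feats))
            bias += lr * error
            for tok, n in feats.items():
                weights[tok] = weights.get(tok, 0.0) + lr * error * n
    return weights, bias

def predict(weights, bias, text):
    """Return 1 (influenza-related) if the linear score is positive, else 0."""
    return 1 if score(weights, bias, features(text)) > 0 else 0

if __name__ == "__main__":
    w, b = train(TRAIN)
    for text in ("my flu is getting worse", "traffic was terrible"):
        print(text, "->", predict(w, b, text))
```

Each learned weight shows how strongly a word pushes a tweet toward the influenza label. Because the model is just a weighted word count, it is easy to inspect which terms it relies on, which could matter if one dialect uses different words for the same symptoms.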
Bias can pervade machine learning methods developed for a wide variety of natural language processing tasks, from classifying text to learning what words mean. For instance, machine learning methods can generate word embeddings, or vector representations of terms: representations of words as numerical values that a computer can process. But the learned representations may become skewed. In some cases this can lead to gender bias, in which the word man is similar to doctor, while woman is similar to nurse.
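The notion of one word being "similar" to another can be sketched with cosine similarity, a standard way to compare embedding vectors. The three-dimensional vectors below are invented toy values chosen to mimic the skew just described; they are not taken from any real embedding model or from the UTSA study.

```python
import math

# Toy 3-dimensional "embeddings" invented for illustration only.
# Real embeddings (e.g. word2vec or GloVe) have hundreds of dimensions
# and are learned from large text corpora.
EMBEDDINGS = {
    "man":    [0.9, 0.1, 0.3],
    "woman":  [0.1, 0.9, 0.3],
    "doctor": [0.8, 0.2, 0.5],
    "nurse":  [0.2, 0.8, 0.5],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity(w1, w2):
    """Cosine similarity between two words in the toy vocabulary."""
    return cosine(EMBEDDINGS[w1], EMBEDDINGS[w2])

if __name__ == "__main__":
    for gendered in ("man", "woman"):
        for job in ("doctor", "nurse"):
            print(f"{gendered:>5} ~ {job:<6} {similarity(gendered, job):.3f}")
```

With these toy vectors, man scores closer to doctor than to nurse and woman the reverse, which is the kind of skewed association bias studies look for in real embeddings.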
In a review of fairness, which is closely related to bias, the researchers examined influenza classifiers built with different machine learning algorithms, including linear models and neural networks. The analysis applied a specific definition of fairness. Intuitively, a machine learning model is fair if its predictive performance (its accuracy) is the same when it is applied to two different groups of data for the same task.
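The equal-accuracy definition of fairness can be sketched as follows, assuming binary labels and a group tag (for example, the dialect a tweet is written in) for every example. The function names and toy labels are hypothetical, and the study's own evaluation may use different metrics or groupings.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def group_accuracies(y_true, y_pred, groups):
    """Accuracy computed separately for each group tag (e.g. dialect)."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: accuracy(ts, ps) for g, (ts, ps) in buckets.items()}

def fairness_gap(y_true, y_pred, groups):
    """Largest accuracy difference between any two groups.
    0.0 means the model is 'fair' under the equal-accuracy definition."""
    accs = group_accuracies(y_true, y_pred, groups)
    return max(accs.values()) - min(accs.values())

if __name__ == "__main__":
    # Hypothetical toy labels: 1 = influenza-related, 0 = not.
    y_true = [1, 0, 1, 1, 0, 1, 0, 1]
    y_pred = [1, 0, 1, 1, 0, 0, 1, 1]
    groups = ["SAE"] * 4 + ["AAE"] * 4
    print(accuracy(y_true, y_pred))                  # 0.75
    print(group_accuracies(y_true, y_pred, groups))  # {'SAE': 1.0, 'AAE': 0.5}
    print(fairness_gap(y_true, y_pred, groups))      # 0.5
```

A single overall accuracy of 0.75 hides the fact that this toy model is perfect on one group and no better than a coin flip on the other; reporting per-group accuracy makes the gap visible.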
“Our task involves detecting influenza-related tweets on social media. Our groups are tweets written in either Standard American English (SAE) or African American Vernacular English (AAE). If an unfair model is applied to geographical regions with a large number of AAE speakers, then it may not perform as the model developers expected. Because the number of speakers of SAE is larger than AAE speakers, a model can be both highly accurate and unfair,” said Rios.
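A short worked example shows how a model can be both accurate and unfair. Assume, hypothetically, that 90% of tweets are written in SAE (Standard American English) and 10% in AAE (African American Vernacular English), and that a model is 90% accurate on SAE tweets but only 70% accurate on AAE tweets. These are illustrative numbers, not results from the study. With $p$ denoting each group's share of the tweets, overall accuracy is the share-weighted average of the group accuracies:

```latex
\begin{aligned}
\mathrm{Acc}_{\text{overall}} &= p_{\text{SAE}}\,\mathrm{Acc}_{\text{SAE}} + p_{\text{AAE}}\,\mathrm{Acc}_{\text{AAE}} \\
&= 0.9 \times 0.90 + 0.1 \times 0.70 = 0.81 + 0.07 = 0.88, \\
\Delta &= \lvert \mathrm{Acc}_{\text{SAE}} - \mathrm{Acc}_{\text{AAE}} \rvert = \lvert 0.90 - 0.70 \rvert = 0.20.
\end{aligned}
```

The headline figure of 88% looks strong because the larger group dominates the average, yet the model is 20 percentage points less accurate for the smaller group.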
“For influenza-related tasks we found that neural networks were more accurate, but simple machine learning methods produced fairer predictions,” Rios added.
France, South Korea, Australia and Singapore have all deployed COVID-19 applications. Apple and Google have also built support for digital contact tracing into their iOS and Android platforms. However, privacy issues have put governments and technology companies at odds, limiting the information that epidemiologists need to understand the spread of the virus.
“Although there are still privacy and ethical issues in social media use for research, it is potentially a great way to observe health trends, since platforms are agnostic and don’t require people to download anything or check in. Using social media, we can conduct disease surveillance tasks, such as predicting infection rates or estimating infection risk. Moreover, social media can be used to understand the public’s view about potential treatments and vaccinations,” added Rios.
It’s estimated that influenza vaccination rates are 10% lower among Hispanic and African American communities, resulting in approximately 2,000 preventable deaths per year. Moreover, estimates of the timeline for developing a COVID-19 vaccine range from six months to two years. For these reasons, Rios urges natural language processing data scientists to examine how health-related algorithms are built.
The coronavirus has resurged in many countries worldwide. In more than 30 U.S. states, cases continue to climb, leaving local governments short of contact tracers, who are key to containing the disease. For this reason, machine learning offers immediate benefits, with new technology that can help with digital tracing or predicting potential outbreaks.
The UTSA analysis has limitations. Because most natural language processing (NLP) bias research does not analyze public health applications, and because curating large biomedical data sets is difficult, the findings are based on small samples. This is why the researchers want to bring more attention to the issue of fairness when scientists build biomedical NLP data sets to train machines to code and classify health-related information written by different populations.
Brandon Lwowski, a UTSA doctoral student, is co-lead on the study, which was funded by the National Science Foundation.