A new study by researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the MIT Jameel Clinic assesses the impact of discriminatory AI models on systems that are intended to provide advice in urgent situations.
Artificial intelligence (AI) systems, particularly those based on machine learning, are increasingly used in medicine: they evaluate X-rays, help diagnose specific diseases, and aid decision-making in other areas of health care.
The paper’s lead author is Hammaad Adam, a PhD student at MIT’s Institute for Data, Systems, and Society. The co-authors are Aparna Balagopalan and Emily Alsentzer, both PhD students, and Professors Fotini Christia and Marzyeh Ghassemi.
The researchers found that the software behind such systems can itself be a point of weakness. Unlike computer hardware, software is written by fallible humans and trained on data that may itself be skewed. Recent research shows that machine learning models can encode biases against minority subgroups, and the recommendations those models make may reflect the same tendencies. The study was published in Communications Medicine.
“We show that how the advice is framed can have significant repercussions. Fortunately, the harm caused by biased models can be limited (though not necessarily eliminated) when the advice is presented differently,” explained Adam.
AI models used in medicine can suffer from inaccuracies and inconsistencies, partly because the data used to train them are often not representative of real-world settings. Different X-ray machines, for example, can record images differently and thus produce different results. Furthermore, models trained primarily on data from white patients may be less accurate when applied to other groups. The Communications Medicine paper is not concerned with such issues but with problems caused by biases and ways to mitigate the negative consequences.
The experiment’s key finding is that prescriptive recommendations from a biased AI system heavily influenced participants. The researchers also found that descriptive rather than prescriptive recommendations allowed participants to retain their original, unbiased decision-making.
In other words, the bias built into an AI model can be reduced by appropriately framing the advice rendered. Why the different outcomes, depending on how advice is posed? Adam explains that when someone is told to do something, such as call the police, there is little room for doubt. Simply describing the situation, by contrast, leaves room for a participant’s own interpretation, letting them stay flexible and consider the case for themselves.
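The distinction the researchers draw can be sketched in code. The snippet below is purely illustrative (the function name and message wording are hypothetical, not taken from the study): it shows how the same binary model output could be surfaced either as a prescriptive instruction or as a descriptive report that leaves the judgement to the human.

```python
# Hypothetical illustration of the framing distinction described in the study.
# "prescriptive" tells the user what to do; "descriptive" merely reports the
# model's assessment, leaving the final decision to the human reader.

def frame_advice(risk_flag: bool, style: str) -> str:
    """Render a model's risk flag in one of two framings."""
    if style == "prescriptive":
        return "Call the police." if risk_flag else "No action needed."
    if style == "descriptive":
        return ("The model flags a possible risk of violence."
                if risk_flag
                else "The model flags no risk of violence.")
    raise ValueError("style must be 'prescriptive' or 'descriptive'")

print(frame_advice(True, "prescriptive"))  # Call the police.
print(frame_advice(True, "descriptive"))   # The model flags a possible risk of violence.
```

Under this reading, a biased model's false positives translate directly into commands in the prescriptive framing, while the descriptive framing hands the same signal to the human as one input among several.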
Second, the researchers discovered that the language models commonly used to provide advice are susceptible to bias. Language models represent a class of machine learning systems trained on text, such as the entire contents of Wikipedia and other web material. When these models are fine-tuned on a much smaller subset of data, the resulting models are prone to bias. A model might be fine-tuned, for example, on only 2,000 sentences rather than the 8 million web pages used in pretraining.
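The mechanism can be made concrete with a toy model. The sketch below is not the paper's method: it is a minimal bag-of-words classifier, fitted on a tiny, deliberately skewed fine-tuning set (all names and example sentences are invented). Because a group token is spuriously correlated with the label in the small dataset, the model learns the group itself as the deciding feature.

```python
from collections import Counter

# Toy illustration (not the study's actual models): a crude word-count
# classifier "fine-tuned" on four skewed crisis descriptions. The group
# word ("groupA"/"groupB") is perfectly correlated with the label, so the
# model picks up the group token rather than the behaviour as its signal.

FINE_TUNE_SET = [
    ("groupA person pacing and shouting outside", "call_police"),
    ("groupA person refusing to speak calmly", "call_police"),
    ("groupB person pacing and shouting outside", "offer_help"),
    ("groupB person refusing to speak calmly", "offer_help"),
]

def train(examples):
    """Count word-label co-occurrences (a naive-Bayes-style tally)."""
    counts = {}
    for text, label in examples:
        for word in text.split():
            counts.setdefault(word, Counter())[label] += 1
    return counts

def predict(counts, text):
    """Sum per-word label counts; return the highest-scoring label."""
    scores = Counter()
    for word in text.split():
        scores.update(counts.get(word, Counter()))
    return min(scores, key=lambda lbl: (-scores[lbl], lbl))

model = train(FINE_TUNE_SET)

# Identical behaviour, different group word -> different recommendation.
print(predict(model, "groupA person pacing calmly"))  # call_police
print(predict(model, "groupB person pacing calmly"))  # offer_help
```

With only a handful of fine-tuning sentences, there is nothing to average away the spurious correlation; a larger and more balanced set would dilute the group token's influence.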
Third, the MIT researchers discovered that biased models’ recommendations could mislead unbiased decision-makers. Medical training (or lack thereof) had no discernible effect on responses. Biased models influenced clinicians just as much as non-experts. According to Adam, these findings could apply to other settings and are not necessarily restricted to healthcare situations.
In New Zealand, researchers have explored the possibility of using AI to predict the length of court sentences. Dr Andrew Lensen of the School of Engineering and Computer Science and Dr Marcin Betkier of the Law School are hopeful that AI can improve sentencing in court. Their optimism stems from AI’s success in predicting certain criminal behaviours, such as financial fraud. Although they have not evaluated the model in a courtroom setting, they are confident that AI can have a role in the sentencing process.