Researchers have developed a training method that helps workers learn when, and when not, to trust the artificial intelligence systems they work with.
A busy hospital radiologist uses an artificial intelligence system to help diagnose medical conditions from patients’ X-ray images. The AI can speed up her diagnoses, but how does she know when to trust its predictions?
She doesn’t. Instead, she may rely on her own expertise, the system’s reported confidence level, or an explanation of how the algorithm made its prediction to form a judgment. Any of these may seem convincing and still be wrong.
MIT researchers have created an onboarding method that helps humans learn when an AI “teammate” is trustworthy. It guides them to better distinguish the situations in which the machine makes correct predictions from those in which it makes mistakes.
The training technique could help people who work with AI agents make better decisions and reach conclusions faster.
Hussein Mozannar, a graduate student in the Clinical Machine Learning Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL), says the work proposes a teaching phase in which the AI model is gradually introduced to the user, so the user can see its strengths and weaknesses firsthand. The method mimics the way the human will interact with the AI in practice, while providing feedback to help them understand each interaction they have with the AI.
Mozannar co-authored the paper with Arvind Satyanarayan, an assistant professor of computer science in CSAIL, and with senior author David Sontag, an associate professor of electrical engineering and computer science at MIT and leader of the Clinical Machine Learning Group. The research will be presented at the Association for the Advancement of Artificial Intelligence conference in February.
Mental models
This research builds on the mental models people form of one another. If the radiologist is unsure about a case, she may consult a colleague with expertise in that area. From experience and her knowledge of that colleague, she has built a mental model of his strengths and weaknesses that helps her assess his advice.
Mozannar says humans build similar mental models when they interact with AI agents, so it matters that those models be accurate. Cognitive science suggests that humans make decisions for complex tasks by remembering past interactions and experiences. So the researchers devised an onboarding process that provides representative examples of the AI and a human working together, examples the person can refer to later. They created an algorithm to identify the examples from which the human will learn the most about the AI.
Mozannar says the method first learns the human expert’s biases and strengths from observations of their past decisions, made without AI guidance. “We combine our knowledge of the AI with our understanding of the human to determine where the AI will be most helpful to the human. We then get cases in which the AI is needed by the human and cases in which it is not,” he says.
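The article doesn’t spell out the selection algorithm, but in spirit it could resemble the following Python sketch, which compares the human’s and the AI’s past accuracy within regions of the task (grouped here, purely as an illustrative assumption, by topic) and picks a representative case from each region as a teaching example. All names and the grouping feature are invented for illustration.

```python
# A minimal sketch of teaching-example selection, assuming we have logged,
# for each past task instance, whether the human and the AI were correct.
# Grouping by a "topic" feature is a stand-in for the paper's actual
# region-discovery step, which this article does not describe.

from dataclasses import dataclass

@dataclass
class Example:
    question: str
    topic: str           # hypothetical feature used to group cases into regions
    human_correct: bool  # from the human's past decisions, made without AI help
    ai_correct: bool     # from the AI's predictions on the same cases

def pick_teaching_examples(history: list[Example]) -> dict[str, list[Example]]:
    """Return representative cases where the human should, or should not, rely on the AI."""
    regions: dict[str, list[Example]] = {}
    for ex in history:
        regions.setdefault(ex.topic, []).append(ex)

    lessons: dict[str, list[Example]] = {"rely_on_ai": [], "rely_on_self": []}
    for topic, cases in regions.items():
        ai_acc = sum(c.ai_correct for c in cases) / len(cases)
        human_acc = sum(c.human_correct for c in cases) / len(cases)
        # If the AI outperforms the human in this region, teach deferral;
        # otherwise, teach the human to trust their own judgment here.
        bucket = "rely_on_ai" if ai_acc > human_acc else "rely_on_self"
        lessons[bucket].append(cases[0])  # one representative example per region
    return lessons
```

Each selected case would then be turned into a teaching example like the ones described next.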
The researchers tested their onboarding method on a passage-based question-answering task: the user receives a written passage and a question whose answer is contained in the passage. The user can answer the question herself or click a button to “let the AI answer.” She can’t see the AI’s answer in advance, however, so she must rely on her mental model of the AI. The onboarding process begins by showing these kinds of examples to the user, who tries to answer with the AI system’s help. The human might be right or wrong, and so might the AI, but in either case, after the example is solved, the user sees the AI’s answer and an explanation of why it made its prediction, along with two contrasting examples that help her understand why the AI did what it did.
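As a rough rendering of that interaction (the interface details here are invented, not the study’s actual setup), one onboarding round might look like this in Python:

```python
# Illustrative sketch of a single onboarding round. Unlike the real task,
# onboarding reveals the AI's answer and an explanation after each attempt,
# so the user can refine her mental model. All prompts are invented.

def onboarding_round(passage: str, question: str,
                     ai_answer: str, ai_explanation: str, truth: str) -> bool:
    print(passage)
    print(question)
    user = input("Type your answer, or DEFER to let the AI answer: ").strip()
    final = ai_answer if user.upper() == "DEFER" else user
    # Feedback step: show the AI's answer and its reasoning either way.
    print(f"The AI answered: {ai_answer} ({ai_explanation})")
    correct = final.lower() == truth.lower()
    print("Correct!" if correct else f"Incorrect; the answer was {truth}.")
    return correct
```

In the real task, by contrast, the feedback step is absent, which is exactly why the user must fall back on the mental model she built during onboarding.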
Perhaps a training question asks which of two plants is native to more continents, based on a convoluted paragraph from a botany textbook. The user can answer the question herself or let the AI system answer it. She then sees two more examples that help her understand the AI’s capabilities: the AI may be wrong about a question on fruit but right about a question on geology. In each case, the words the system used to make its prediction are highlighted. Mozannar explains that the highlighted words help the user understand the limits of the AI agent.
The user then writes down the rule she draws from each teaching example. Writing it down helps her retain the lesson, and she can refer to the rules later to guide her interactions with the agent. These rules also formalize the user’s mental model of the AI.
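If a user’s written rules were made explicit, they might amount to a small lookup from a recognized situation to a defer-or-not decision. This is a hypothetical formalization; in the study, the rules stay in the user’s own words.

```python
# Hypothetical formalization of a user's written rules. The topics echo the
# article's examples (right about geology, wrong about fruit), but the
# structure and names are invented for illustration.

RULES = {
    "geology": "defer_to_ai",   # "the AI tends to be right about geology"
    "fruit": "answer_myself",   # "the AI struggles with fruit questions"
}

def decide(topic: str) -> str:
    # When no rule covers the situation, default to answering yourself.
    return RULES.get(topic, "answer_myself")

print(decide("geology"))  # defer_to_ai
print(decide("botany"))   # answer_myself (no rule learned for botany yet)
```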
The impact of teaching
The researchers tested the teaching method with three groups of participants. One group went through the full onboarding process, a second received the teaching examples but not the follow-up contrasting examples, and a baseline group received no teaching but could see the AI’s answer in advance.
“The participants who received teaching performed just as well as those who did not receive it but could see the AI’s answer,” Mozannar says. The conclusion, he adds, is that they learned to simulate the AI’s answers as well as if they had seen them.
The researchers then dug into the data to examine the rules each participant wrote. Nearly half of those who received training wrote down accurate lessons about the AI’s abilities. Participants with accurate lessons were right on 63 percent of the examples, those with inaccurate lessons were right on 54 percent, and those who received no teaching but could see the AI’s answers were right on 57 percent.
The key takeaway, he says, is that successful teaching has a significant impact: when participants are taught effectively, they perform better than if they are simply given the answer.
However, the results also show room for improvement. Only about half of the trained participants built accurate mental models of the AI, and even those who did were right only 63 percent of the time. Even though they learned the correct lessons, Mozannar says, they didn’t always follow their own rules.
That question leaves the researchers scratching their heads: even when people know the AI should be right, why don’t they listen to their own mental models? They want to explore this in future work, and they plan to refine the onboarding process to make it faster. They are also interested in running user studies with more complex AI models, particularly in health care settings.