“Data scientists don’t always ask questions persistently enough”

A study carried out by Dennis Collaris and Hilde Weerts of TU/e’s department of Mathematics and Computer Science shows that explanation techniques don’t always meet the expectations of data scientists, which can result in misuse. These techniques help explain why self-learning computer models make certain predictions. Misapplication by data scientists could lead to a false interpretation of the outcomes, resulting in all kinds of consequences.

photo Anyaberkut / iStock

It seems practical: let a computer do all the thinking until it arrives at the best decision. And it is practical. One only has to think of the explosive growth in popularity of self-learning computer models. But the use of these models also comes with a risk, especially when they get to co-decide whether someone is eligible for a loan, or when they need to make predictions about someone’s health. How can you be sure that the model took the right steps to arrive at the result?

To determine that, data scientists use explanation techniques. These techniques allow them to analyze and explain outcomes, and, consequently, to determine a model’s reliability. But the study carried out by Collaris and Weerts shows that data scientists sometimes have a hard time interpreting these techniques.

Feature importance

The problem lies in the term ‘feature importance,’ which refers to a set of techniques that determine which features in the data the model considers important for making predictions. These techniques assign a score to each input feature: the higher the score, the more important that feature is for determining the prediction or outcome. Imagine a model that predicts how healthy a person is. In that case, the variable age will play a more important role than name or height, and will therefore be assigned a higher feature importance score.
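As an illustration of what such scores look like in practice, here is a minimal Python sketch for the health example above; the model and the numbers are entirely hypothetical:

```python
# Hypothetical feature importance scores for a health-prediction model:
# one score per input feature, where a higher score means the feature
# matters more for the prediction. The numbers are made up for illustration.
feature_importance = {
    "age": 0.62,     # strongly influences the predicted health
    "height": 0.07,  # minor influence
    "name": 0.00,    # irrelevant to the prediction
}

# List the features from most to least important.
for feature, score in sorted(feature_importance.items(), key=lambda kv: -kv[1]):
    print(f"{feature:>8}: {score:.2f}")
```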

Many people use feature importance as an umbrella term for two different techniques that both determine which factors are important for making predictions, but that do so in entirely different ways, Weerts explains. She’s talking about gradient-based and ablation-based techniques. Weerts specializes in the latter.

Data scientists think to themselves: an explanation, cool. I’ll use it. But they don’t really understand the underlying technique

Dennis Collaris
Postdoc at Mathematics and Computer Science

“When you apply an ablation-based approach, you determine a feature’s importance by removing it. Imagine: someone is 25 years old right now. How would the prediction change if this person were younger or older? We replace 25 with every other possible age.” The extent to which the prediction changes on average makes it possible to deduce how important it is that this person is 25 years old.
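A minimal Python sketch of this ablation idea, assuming a hypothetical toy model in which the predicted health risk depends only on age, might look like this:

```python
import numpy as np

def ablation_importance(model, x, feature, candidate_values):
    """Ablation-style importance of one feature for one instance x (a dict):
    replace the feature's actual value with every candidate value and measure
    how much the prediction changes on average. Illustrative sketch only;
    real implementations differ in how they pick the replacement values."""
    baseline = model(x)
    changes = [abs(model(dict(x, **{feature: v})) - baseline) for v in candidate_values]
    return float(np.mean(changes))

# Hypothetical toy model: predicted risk simply grows with age.
toy_model = lambda person: min(person["age"] / 100.0, 1.0)

person = {"age": 25, "height": 180}
print(ablation_importance(toy_model, person, "age", range(0, 101)))       # clearly > 0
print(ablation_importance(toy_model, person, "height", range(150, 200)))  # 0.0
```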

Collaris specializes in the gradient-based approach, and explains how this technique differs slightly from the ablation-based approach. “This technique determines what happens to a prediction when you apply minimal adjustments. If a slight change in height leads to a completely different medical prediction, you could conclude that height is an important feature for the model.”
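The gradient idea can be sketched in the same style; here the gradient is approximated with a finite difference on the same hypothetical toy model (real implementations typically use the model’s actual gradients):

```python
def gradient_importance(model, x, feature, eps=1e-3):
    """Gradient-style importance for one instance x (a dict): how strongly does
    the prediction react to a tiny change in this feature? Approximated here
    with a central finite difference, purely for illustration."""
    up = dict(x, **{feature: x[feature] + eps})
    down = dict(x, **{feature: x[feature] - eps})
    return (model(up) - model(down)) / (2 * eps)

# Same hypothetical toy model as above: predicted risk grows with age only.
toy_model = lambda person: min(person["age"] / 100.0, 1.0)

person = {"age": 25, "height": 180}
print(gradient_importance(toy_model, person, "age"))     # 0.01: locally sensitive
print(gradient_importance(toy_model, person, "height"))  # 0.0: no local effect
```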

Interpretation

The important difference lies in the interpretation of the two techniques, Weerts adds. “When you apply the gradient approach, a higher feature importance for age indicates that as a person gets older, the prediction is also likely to increase, and vice versa. The ablation approach, by contrast, only indicates that the fact that a person is 25 years of age is important, but it doesn’t say anything about what would happen if that person were younger or older. That makes the interpretation substantially different.”
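A small worked example makes that difference concrete. Assume a hypothetical threshold model in which only being an adult matters: around age 25 the prediction is locally flat, so a gradient-based score is zero, while an ablation-based score is clearly non-zero because younger replacement ages would flip the prediction:

```python
import numpy as np

# Hypothetical threshold model: the prediction depends only on whether the
# person is an adult (age >= 18).
step_model = lambda person: 1.0 if person["age"] >= 18 else 0.0
person = {"age": 25}

# Gradient view: a tiny nudge around 25 changes nothing, so the score is zero.
eps = 1e-3
grad = (step_model({"age": 25 + eps}) - step_model({"age": 25 - eps})) / (2 * eps)
print(grad)  # 0.0: locally, age appears not to matter at all

# Ablation view: replacing 25 with every age from 0 to 100 flips the
# prediction for all ages below 18, so the average change is clearly non-zero.
changes = [abs(step_model({"age": a}) - step_model(person)) for a in range(101)]
print(np.mean(changes))  # ~0.18: age is important for this instance
```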

Interestingly, not all data scientists and researchers are aware of this difference, the researchers say. “Out of frustration over this, we decided to write a paper. In our study, we looked at people’s expectations, and whether those expectations matched reality.” As it turned out, many of the data scientists they interviewed had expectations that weren’t in line with what the techniques actually did. Collaris: “Some people even expected that feature importance was a combination of the two techniques, even though they are totally incompatible.”

The same thing

How is it possible that data scientists, whose job it is to interpret predictions made by computer models, don’t fully understand these techniques? Even some of the people who do research on explanation methods sometimes think that gradient and ablation techniques are the same thing. “We happen to have immersed ourselves in these techniques,” Weerts says, “but our research is quite niche.” The problem, Collaris says, is that this is a booming field, with lots of people who are just starting out. Many of them don’t fully grasp the nuances and subtle differences between techniques.

Nevertheless, these kinds of techniques are used quite often, because they have convenient implementations in the programming language Python, Weerts says. “Data scientists think to themselves: nice package, lots of people use it, we’ll import it. Bleep bleep, nice characters, looks pretty cool.” What happens, according to her, is that when something looks logical, people are likely to assume that it’s probably right. “That has been studied. And when a model comes up with something illogical, people search for obscure answers. Or they’ll simply say: well, it’s just a complex model. Math, right?” People don’t ask questions persistently enough, Collaris adds. “They think to themselves: an explanation, cool. I’ll use it. But they don’t really understand the underlying technique.”
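The interview does not name specific packages, but scikit-learn’s permutation_importance, an ablation-style technique, shows how little code stands between importing a package and obtaining importance scores, which is exactly what makes it tempting to use the numbers without asking what they measure:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Fit an off-the-shelf model on a bundled example dataset.
X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(random_state=0).fit(X, y)

# One call yields a score per feature: easy to print, easy to misread.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(result.importances_mean)
```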

European regulation

Data scientists who make models should always be responsible for generating an explanation, and they should continue to be involved, Collaris believes. They can then check that explanation before it is offered to other people. The paper can help them with that. Weerts: “In section 4 of our paper, we mention several features for data scientists to consider.” In addition, the European Union has drafted a regulation on AI, Collaris adds. “It requires companies to disclose information about models. And if data scientists manage to generate an explanation that they’ve thought about long and hard, you can easily present that explanation to people who have even less knowledge in this field, such as people from insurance companies.”

Fortunately, explainable AI is now almost as up-and-coming as AI itself, Collaris says. “That’s because everyone understands: this can’t go on any longer. We’re using these models everywhere in society; we make predictions about just about everything these days. It’s simply wrong not to have clear insight into that.” As self-learning computer models become increasingly important tools for supporting decisions, data scientists need to ask themselves which explanation technique is required at which moment, Weerts says. “That question needs to be answered, otherwise people might draw wrong conclusions based on that explanation.”
