And then there is the problem of interpreting machine behaviour: how did the system arrive at its classification, and is its reasoning defensible? Several approaches have emerged in this area. Some require an additional layer that checks the interpretation after the fact, while others rely on human supervision. These approaches may work some of the time, but Professor He says that in practice they can quickly run into problems. ‘Wouldn’t it be better to design the model in such a way that it produces an interpretation?’, she asks. ‘This is what we propose’.
In response, Professor He and her team have developed Project HINT, an acronym for Hierarchical Interpretable Neural Text classification. The idea, she continues, is to identify the hierarchical relationships within a text, between its words, phrases and the topics they build up, and how these interact, in order to gain a better understanding of what is truly going on in the text. HINT builds such a hierarchy and uses it to generate automatic explanations grounded in meaningful topics rather than in individual words and phrases alone.
In a typical movie review, for instance, cinema enthusiasts may comment positively on some aspects of the film’s production while remaining critical, or even dismissive, of the movie as a whole. HINT aims to identify how these different levels of the text contribute to the overall judgement: interpretation is built in, not added on afterwards. ‘The model is explainable by design’, Professor He confirms.
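To make the idea concrete, the sketch below shows a toy hierarchical classifier written in the spirit of HINT; it is not Professor He’s implementation, and every name, dimension and design choice in it is an illustrative assumption. Sentences are encoded from word embeddings, softly assigned to a small set of latent topics, and the review’s label is predicted from the resulting topic mixture. Because the topic-assignment weights are part of the forward pass, they double as the model’s explanation rather than being bolted on later.

```python
import torch
import torch.nn as nn


class ToyHierarchicalClassifier(nn.Module):
    """A minimal, hypothetical sketch of a hierarchical, topic-based classifier.

    Words -> sentence vectors -> soft topic assignments -> document label.
    The per-sentence topic weights and the document-level topic mixture are
    returned alongside the prediction and serve as the built-in explanation.
    """

    def __init__(self, vocab_size, embed_dim=64, num_topics=8, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.sent_encoder = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.topic_vectors = nn.Parameter(torch.randn(num_topics, embed_dim))
        self.classifier = nn.Linear(num_topics, num_classes)

    def forward(self, sentences):
        # sentences: (num_sentences, max_words) tensor of word ids for one review
        word_vecs = self.embed(sentences)                 # (S, W, D)
        _, sent_vecs = self.sent_encoder(word_vecs)       # final state: (1, S, D)
        sent_vecs = sent_vecs.squeeze(0)                  # (S, D)
        # Soft assignment of each sentence to the latent topics
        sent_topics = torch.softmax(sent_vecs @ self.topic_vectors.T, dim=-1)  # (S, T)
        # Document-level topic mixture: average of the sentence-level assignments
        doc_topics = sent_topics.mean(dim=0)              # (T,)
        logits = self.classifier(doc_topics)              # (num_classes,)
        return logits, doc_topics, sent_topics


# Toy usage: one review of three sentences over a 100-word vocabulary
model = ToyHierarchicalClassifier(vocab_size=100)
review = torch.randint(0, 100, (3, 12))
logits, doc_topics, sent_topics = model(review)
print(logits.shape, doc_topics.shape, sent_topics.shape)  # (2,) (8,) (3, 8)
```

In this toy version a sentence praising the cinematography and a sentence dismissing the plot would load on different topics, and inspecting `sent_topics` and `doc_topics` shows which of those topics drove the final label. The published HINT model is considerably more sophisticated, but the principle it illustrates is the same: the explanation falls out of the way the model is designed, rather than being reconstructed afterwards.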