Research project
FCT PhD Scholarship: Self-Explanatory Deep Learning Models with Concept-based Multimodal Explanations for Medical Diagnosis [2022.11566.BD]
Funder
Authors
Publications
Self-Explanatory Deep Learning Models with Concept-based Multimodal Explanations for Medical Imaging Diagnosis
Publication. Patrício, Cristiano Pires; Neves, João Carlos Raposo; Teixeira, Luís Filipe Pinto de Almeida
The remarkable performance of deep learning models in automated medical imaging diagnosis is achieved at the expense of the low interpretability of their representations. The opaque nature of these methods, which often operate as “black boxes”, remains a major barrier to their adoption in real-world applications, especially in high-stakes scenarios such as healthcare. This lack of interpretability motivated the development of eXplainable Artificial Intelligence (XAI) techniques capable of explaining model decisions so that humans can understand and interpret their decision-making. Early efforts in XAI applied to images relied mainly on post-hoc strategies that generate model-agnostic explanations by assessing the influence of input regions on predictions. However, these explanations are often ambiguous and unreliable. Similarly, textual explanations face challenges, as language models are prone to generating inaccurate content, including ambiguous or factually incorrect statements. As an alternative, Concept Bottleneck Models (CBMs) offer an inherently interpretable design, where the final predictions are explicitly derived from intermediate human-understandable concepts. Nevertheless, CBMs face several critical limitations. Their reliance on manual concept annotations, the lack of visual interpretability for the predicted concepts, and the need for model retraining when new concepts are introduced hinder their utility and scalability. This thesis addresses these limitations by introducing methods capable of generating multimodal explanations grounded in human-understandable concepts, thereby enhancing both the transparency and the interpretability of the model output. First, we present a comprehensive survey of state-of-the-art XAI methods, datasets, and evaluation metrics in medical image diagnosis, highlighting existing gaps and open challenges in the XAI literature.
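The CBM design described above can be sketched minimally: an input is first mapped to human-understandable concept scores, and the diagnosis is derived only from those concepts. This is an illustrative toy, not the thesis implementation; the weight matrices and feature dimensions below are placeholder assumptions, not learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x, W_g):
    """Concept predictor: maps image features to concept probabilities."""
    return 1.0 / (1.0 + np.exp(-x @ W_g))  # sigmoid per concept

def f(c, W_f):
    """Label predictor: the diagnosis is derived *only* from the concepts."""
    logits = c @ W_f
    return int(np.argmax(logits))

# Toy dimensions: 8 image features, 3 concepts (e.g. "asymmetry",
# "irregular border", "atypical pigment network"), 2 diagnostic classes.
W_g = rng.normal(size=(8, 3))  # placeholder, would be learned in practice
W_f = rng.normal(size=(3, 2))  # placeholder, would be learned in practice

x = rng.normal(size=8)        # image features from some backbone (assumed)
concepts = g(x, W_g)          # human-inspectable intermediate explanation
diagnosis = f(concepts, W_f)  # final prediction grounded in the concepts
```

Because the label head sees only `concepts`, a clinician can inspect (or even edit) the predicted concept scores and observe how the diagnosis changes, which is the interpretability property the bottleneck provides.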
Building on these insights, we propose two concept-based approaches for skin lesion diagnosis: one extending conventional CBMs to produce concept-based visual explanations, and another that leverages a transformer-based architecture with learnable concept tokens, improving the visual coherence of concept explanations through a dedicated architecture and regularization. To reduce reliance on concept annotations, we further explore Vision-Language Models (VLMs), proposing strategies that automatically annotate concepts and predict the final diagnosis either through a linear classifier or by prompting Large Language Models (LLMs). To overcome the lack of visual context in disease prediction in these latter approaches, we propose CBVLM, a training-free framework that integrates off-the-shelf Large Vision-Language Models (LVLMs) to jointly generate concept-based explanations and predict disease diagnoses grounded in both semantic concepts and visual demonstration examples. Beyond concept-based explanations, we also demonstrate that interpretability can be achieved even in constrained scenarios with limited annotations. Specifically, we propose an unsupervised framework for brain Magnetic Resonance Imaging (MRI) tumor detection that learns to reconstruct benign patterns of an input image using only a dataset of healthy examples. At inference, when presented with a brain MRI containing anomalous patterns, the reconstruction error between the input and the reconstructed image highlights potential tumor regions, allowing intuitive and interpretable anomaly localization. The results obtained from the methods proposed in this thesis demonstrate that it is possible to enhance the interpretability of CBMs by integrating visual concept explanations consistent with the learned concepts, while reducing their reliance on manual concept annotations without sacrificing interpretability or performance.
Furthermore, extensive experiments across various medical imaging modalities, including dermoscopy, radiology, eye fundus imaging, and brain MRI, demonstrate that the proposed approaches not only improve disease diagnosis, but also provide more transparent and faithful multimodal explanations, paving the way for safer clinical integration and increased trust.
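The reconstruction-error idea behind the unsupervised tumor-detection framework can be illustrated with a self-contained toy. Here `reconstruct` is a stand-in for a model trained only on healthy scans (it reproduces global structure but not local anomalies); the thresholding rule and the simulated scan are illustrative assumptions, not the thesis's actual model.

```python
import numpy as np

def reconstruct(image):
    """Placeholder for a reconstruction model trained on healthy data:
    it reproduces the dominant (healthy) intensity but cannot reproduce
    localized anomalies it never saw during training."""
    return np.full_like(image, image.mean())

def anomaly_map(image):
    """Per-pixel reconstruction error: large where the input deviates
    from the healthy patterns the model can reconstruct."""
    return np.abs(image - reconstruct(image))

# Toy "MRI slice": uniform healthy tissue with one bright anomalous patch.
scan = np.full((16, 16), 0.2)
scan[5:8, 5:8] = 0.9  # simulated tumor region

amap = anomaly_map(scan)
# Simple illustrative threshold: pixels far above the mean error.
tumor_mask = amap > amap.mean() + 2 * amap.std()
```

The resulting `tumor_mask` localizes the anomalous patch directly from the error map, which is what makes this style of anomaly detection intuitively interpretable: the explanation is the spatial map itself.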
Organizational units
Description
Keywords
Engineering and technology; Engineering and technology/Electrical engineering, electronic engineering, information engineering
Contributors
Funders
Funding entity
Fundação para a Ciência e a Tecnologia, I.P.
Funding programme
PhD Studentship
Grant number
2022.11566.BD
