Search results

Showing 1 - 2 of 2
  • Self-Explanatory Deep Learning Models with Concept-based Multimodal Explanations for Medical Imaging Diagnosis
    Publication. Patrício, Cristiano Pires; Neves, João Carlos Raposo; Teixeira, Luís Filipe Pinto de Almeida
    The remarkable performance of deep learning models in automated medical imaging diagnosis is achieved at the expense of the low interpretability of their representations. The opaque nature of these methods, which often operate as “black boxes”, remains a major barrier to their adoption in real-world applications, especially in high-stakes scenarios such as healthcare. This lack of interpretability motivated the development of eXplainable Artificial Intelligence (XAI) techniques capable of explaining model decisions so that humans can understand and interpret their decision-making. Early efforts in XAI applied to images relied mainly on post-hoc strategies that generate model-agnostic explanations by assessing the influence of input regions on predictions. However, these explanations are often ambiguous and unreliable. Similarly, textual explanations face challenges, as language models are prone to generating inaccurate content, including ambiguous or factually incorrect statements. As an alternative, Concept Bottleneck Models (CBMs) offer an inherently interpretable design, in which the final predictions are explicitly derived from intermediate human-understandable concepts. Nevertheless, CBMs face several critical limitations: their reliance on manual concept annotations, the lack of visual interpretability for the predicted concepts, and the need for model retraining when new concepts are introduced hinder their utility and scalability.
    This thesis addresses these limitations by introducing methods capable of generating multimodal explanations grounded in human-understandable concepts, thereby enhancing both the transparency and the interpretability of the model output. First, we present a comprehensive survey of state-of-the-art XAI methods, datasets, and evaluation metrics in medical image diagnosis, highlighting existing gaps and open challenges in the XAI literature. Building on these insights, we propose two concept-based approaches for skin lesion diagnosis: one extending conventional CBMs to produce concept-based visual explanations, and another leveraging a transformer-based architecture with learnable concept tokens, improving the visual coherence of concept explanations through a dedicated architecture and regularization. To reduce the reliance on concept annotations, we further explore Vision-Language Models (VLMs), proposing strategies that automatically annotate concepts and predict the final diagnosis either through a linear classifier or by prompting Large Language Models (LLMs). To overcome the lack of visual context in disease prediction in these latter approaches, we propose CBVLM, a training-free framework that integrates off-the-shelf Large Vision-Language Models (LVLMs) to jointly generate concept-based explanations and predict disease diagnoses grounded in both semantic concepts and visual demonstration examples.
    Beyond concept-based explanations, we also demonstrate that interpretability can be achieved even in constrained scenarios with limited annotations. Specifically, we propose an unsupervised framework for brain Magnetic Resonance Imaging (MRI) tumor detection that learns to reconstruct the benign patterns of an input image using only a dataset of healthy examples. At inference, when presented with a brain MRI containing anomalous patterns, the reconstruction error between the input and the reconstructed image highlights potential tumor regions, enabling intuitive and interpretable anomaly localization.
    The results obtained from the methods proposed in this thesis demonstrate that it is possible to enhance the interpretability of CBMs by integrating visual concept explanations consistent with the learned concepts, while reducing their reliance on manual concept annotations and maintaining interpretability and performance. Furthermore, extensive experiments across various medical imaging modalities, including dermoscopy, radiology, eye fundus imaging, and brain MRI, demonstrate that the proposed approaches not only improve disease diagnosis but also provide more transparent and faithful multimodal explanations, paving the way for safer clinical integration and increased trust. (A minimal, illustrative concept-bottleneck sketch is given after this results list.)
  • A zero-shot learning method for recognizing objects using low-power devices
    Publication. Patrício, Cristiano Pires; Neves, João Carlos Raposo; Proença, Hugo Pedro Martins Carriço
    Zero-Shot Learning (ZSL) has been a subject of increasing interest due to its revolutionary paradigm, which simulates the human ability to recognize objects that have never been seen before. ZSL models must be capable of recognizing classes that do not appear during training, using only the provided textual descriptions of the unseen classes as an aid. Despite the extensive benchmarking around the ZSL paradigm, few works have assessed the computational performance of the developed strategies in terms of inference time. Furthermore, no work has evaluated the effect of using CNN architectures other than the de facto standard ResNet101, such as lightweight architectures, or the feasibility of deploying zero-shot learning approaches in real-world scenarios, particularly on low-power devices. Consequently, in this dissertation, we carried out an extensive benchmark to analyze the impact of using lightweight CNN architectures on ZSL performance, allowing us to assess how ZSL methods perform in real-world scenarios, particularly when run on low-power devices. Our experimental results demonstrate that the impact on ZSL accuracy is not significant when a lightweight architecture is adopted, indicating that such low-power devices can effectively run ZSL methods. (A minimal, illustrative zero-shot inference sketch is also given after this results list.)
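Below is a minimal, illustrative sketch of the generic Concept Bottleneck Model idea referenced in the first abstract: the image is mapped to a small set of human-understandable concept scores, and the diagnosis is predicted only from those scores. The backbone, concept names, and dimensions are hypothetical placeholders and do not correspond to the architectures actually proposed in the thesis.

```python
# Minimal sketch of a generic Concept Bottleneck Model (CBM): the image is
# first mapped to human-understandable concept scores, and the diagnosis is
# predicted *only* from those concepts. Backbone, concept names, and sizes
# are illustrative placeholders, not the thesis's actual models.
import torch
import torch.nn as nn
import torchvision.models as models

CONCEPTS = ["asymmetry", "irregular_border", "atypical_pigment_network"]  # hypothetical concepts
NUM_CLASSES = 2  # e.g. benign vs. malignant

class ConceptBottleneckModel(nn.Module):
    def __init__(self, num_concepts=len(CONCEPTS), num_classes=NUM_CLASSES):
        super().__init__()
        backbone = models.resnet18(weights=None)                  # any image encoder works
        backbone.fc = nn.Identity()
        self.encoder = backbone                                    # image -> 512-d features
        self.concept_head = nn.Linear(512, num_concepts)           # features -> concept logits
        self.label_head = nn.Linear(num_concepts, num_classes)     # concepts -> diagnosis

    def forward(self, x):
        feats = self.encoder(x)
        concept_probs = torch.sigmoid(self.concept_head(feats))    # interpretable bottleneck
        class_logits = self.label_head(concept_probs)              # prediction depends only on concepts
        return concept_probs, class_logits

model = ConceptBottleneckModel()
image = torch.randn(1, 3, 224, 224)                                # dummy dermoscopy-sized input
concepts, diagnosis = model(image)
for name, p in zip(CONCEPTS, concepts[0].tolist()):
    print(f"{name}: {p:.2f}")                                      # per-concept explanation
print("predicted class:", diagnosis.argmax(dim=1).item())
```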
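The second abstract's zero-shot setting can be illustrated with a similarly minimal sketch: CNN image features (a lightweight backbone is used here purely as an example; the abstract does not name the specific architectures benchmarked) are projected into a class-description/attribute space and compared against per-class signatures, so unseen classes can be predicted without any training images. The projection weights and attribute matrix below are random placeholders standing in for components that would be learned on seen classes.

```python
# Minimal sketch of attribute-based zero-shot inference with a lightweight CNN.
# The projection and class-attribute matrix are random placeholders; in practice
# they come from learned models and from the textual/attribute class descriptions.
import torch
import torch.nn.functional as F
import torchvision.models as models

NUM_UNSEEN_CLASSES = 10
ATTR_DIM = 85                                        # e.g. 85 attributes per class, AWA2-style

# 1) Lightweight CNN backbone as the image feature extractor (example choice).
backbone = models.mobilenet_v2(weights=None)
backbone.classifier = torch.nn.Identity()            # keep the 1280-d pooled features

# 2) Per-class attribute/description signatures (placeholder: random vectors).
class_attributes = F.normalize(torch.randn(NUM_UNSEEN_CLASSES, ATTR_DIM), dim=1)

# 3) Visual-to-attribute projection (would be learned on seen classes).
visual_to_attr = torch.nn.Linear(1280, ATTR_DIM)

def zero_shot_predict(image: torch.Tensor) -> int:
    """Return the index of the most compatible unseen class for one image."""
    with torch.no_grad():
        feats = backbone(image)                       # (1, 1280)
        attr_pred = F.normalize(visual_to_attr(feats), dim=1)
        scores = attr_pred @ class_attributes.T       # cosine-style compatibility scores
        return scores.argmax(dim=1).item()

dummy_image = torch.randn(1, 3, 224, 224)
print("predicted unseen class:", zero_shot_predict(dummy_image))
```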