Browsing by Author "Duarte, Rodrigo Manuel Teixeira"
- Combining Text and Visual Modalities for Enhanced Portuguese Image Retrieval
Publication. Duarte, Rodrigo Manuel Teixeira; Campos, Ricardo Nuno Taborda; Proença, Hugo Pedro Martins Carriço
The availability of digital images on the Internet has grown exponentially in recent years. This has made it challenging for users to find relevant images in Information Retrieval (IR) tasks, as search engines are often unable to interpret image content accurately. The challenge is even greater when searching for images in languages other than English, especially low-to-mid-resource languages such as Portuguese, which often lack the necessary linguistic resources. Several approaches have been proposed to address these issues, such as multimodal language models that attempt to understand both image content and the associated text. However, most of these models are fine-tuned primarily for English. Another common strategy relies on machine translation: queries in the target language are translated into English before being processed. This solution is also imperfect, as the meaning of a query can be lost in translation, leading to suboptimal results. This MSc thesis tackles the challenge by developing and evaluating multimodal approaches to Portuguese image retrieval, with a specific focus on the limitations and opportunities of current vision-language models. Our hypothesis is that combining text-based and image-based retrieval modalities through innovative score adjustment mechanisms will yield more effective results than either approach alone. The primary objective of this research is to develop an effective image IR system for Portuguese queries and to establish performance baselines for this domain. To this end, we created a Portuguese image retrieval evaluation dataset comprising 80 queries and 5,201 annotated images from the Portuguese Presidency website.
We developed a novel hybrid retrieval algorithm that combines text-based and image-based retrieval through mathematical score adjustment mechanisms, using the K-Nearest Neighbors (KNN) algorithm for similarity matching. Our evaluation covered traditional text-based IR methods, commercial search engines, Portuguese-specific language models, and state-of-the-art vision-language models. The results revealed that multilingual vision-language models, particularly OpenCLIP xlm-roberta-base, substantially outperformed traditional text-based approaches, by 62% in Mean Reciprocal Rank (MRR), and achieved 71% better performance with shorter queries than with longer descriptive formulations. Surprisingly, fine-tuning decreased performance across all metrics, with degradations ranging from 16% to 28%, suggesting that pre-trained multilingual representations are more valuable than domain-specific adaptation. The proposed hybrid algorithm achieved a meaningful improvement, raising MRR by 1.8% over the best baseline approach.
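The abstract names the ingredients of the hybrid algorithm (KNN similarity matching in each modality, a score adjustment that merges the two result lists, evaluation by MRR) but not the exact formula. The Python sketch below is therefore an illustration under stated assumptions, not the thesis's method: it uses brute-force cosine KNN, a hypothetical fusion rule (min-max normalization, a weighted sum controlled by `alpha`, and an `overlap_bonus` for images found by both modalities), and made-up 2-D vectors standing in for the embeddings that a model such as OpenCLIP xlm-roberta-base would produce. All function names, parameters, and data here are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn(query_vec, index, k):
    """Brute-force k-nearest neighbors by cosine similarity: {image_id: score} for the top k."""
    ranked = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return {img: cosine(query_vec, vec) for img, vec in ranked[:k]}

def rank(scores):
    """Order image ids by descending score, breaking ties by id."""
    return sorted(scores, key=lambda img: (-scores[img], img))

def min_max(scores):
    """Rescale one modality's scores to [0, 1] so the two modalities can be combined."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {img: 1.0 for img in scores}
    return {img: (s - lo) / (hi - lo) for img, s in scores.items()}

def hybrid_rank(text_scores, image_scores, alpha=0.5, overlap_bonus=0.1):
    """Hypothetical score adjustment: weighted sum of normalized scores,
    plus a bonus for images that both modalities retrieved."""
    t, v = min_max(text_scores), min_max(image_scores)
    fused = {}
    for img in set(t) | set(v):
        score = alpha * t.get(img, 0.0) + (1 - alpha) * v.get(img, 0.0)
        if img in t and img in v:
            score += overlap_bonus
        fused[img] = score
    return rank(fused)

def mean_reciprocal_rank(rankings, relevant):
    """MRR: mean over queries of 1/rank of the first relevant image (0 if none is retrieved)."""
    if not rankings:
        return 0.0
    total = 0.0
    for qid, ranking in rankings.items():
        for pos, img in enumerate(ranking, start=1):
            if img in relevant.get(qid, set()):
                total += 1.0 / pos
                break
    return total / len(rankings)

# Made-up embeddings: "text" vectors stand in for caption embeddings,
# "image" vectors for visual embeddings of the same four images.
text_index = {"img_a": [1.0, 0.0], "img_b": [0.8, 0.6], "img_c": [0.6, 0.8], "img_d": [0.0, 1.0]}
image_index = {"img_a": [0.0, 1.0], "img_b": [0.8, 0.6], "img_c": [0.6, 0.8], "img_d": [1.0, 0.0]}
query_text_vec = [1.0, 0.0]    # query as embedded by the (hypothetical) text encoder
query_visual_vec = [1.0, 0.0]  # query as embedded by the (hypothetical) vision-language text tower

text_hits = knn(query_text_vec, text_index, k=3)
image_hits = knn(query_visual_vec, image_index, k=3)
hybrid = hybrid_rank(text_hits, image_hits)
relevant = {"q1": {"img_b"}}

print("text  :", rank(text_hits), mean_reciprocal_rank({"q1": rank(text_hits)}, relevant))
print("image :", rank(image_hits), mean_reciprocal_rank({"q1": rank(image_hits)}, relevant))
print("hybrid:", hybrid, mean_reciprocal_rank({"q1": hybrid}, relevant))
```

On this toy input, the relevant image `img_b` sits second in both single-modality rankings (reciprocal rank 0.5) but first in the fused ranking (reciprocal rank 1.0), which is the kind of effect a score adjustment aims for. The numbers illustrate the mechanics only and say nothing about the results reported above; for large collections, an approximate nearest-neighbor index is a common substitute for the brute-force search used here.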
