Browsing by Issue Date, starting with "2025-02-19"
- Bug Taxonomy Classification Using Machine Learning Algorithms
  Publication. Caldeira, Beatriz Isabel Trocas; Pombo, Nuno Gonçalo Coelho Costa
  Bugs are a natural occurrence in software development. As society increasingly relies on software, the frequency of these occurrences has grown as well, potentially leading to catastrophic consequences for products or businesses. Various bug tracking systems have been developed to monitor and manage the bug resolution process. These platforms enable the teams responsible for bug analysis to view identified bugs, track the most common issues, and monitor their resolution status. The challenge, however, lies in analyzing each of these reports. In large-scale products, hundreds of reports can be submitted daily through these platforms, whether by other developers, penetration testers, or end users. End users, who likely lack knowledge of the software, its system architecture, or other internal processes, may submit incorrect or unclear reports. Automating this manual analysis and classification, a complex and time-consuming task for developers, is an idea that various researchers have explored. This research aims to develop a comprehensive and detailed classification schema that provides additional, automated information to report analysts, thereby facilitating part of their analysis process. The proposed solution leverages the high performance and capabilities of BERT, a model rooted in the Transformer architecture, to classify these reports as accurately as possible based on the textual descriptions within each report. A dataset was created following this schema, designed through a divide-and-conquer approach, and multiple models were trained on each category. The results indicate that BERT can form the basis of a robust solution for this purpose, even when provided with a small amount of labeled data as input and despite the need for some refinement in the training process. The best results were obtained when the title and the description were used together as input to the model, with one model achieving an overall accuracy of 75% and the lowest accuracy being 54.6%. (A minimal sketch of this kind of fine-tuning setup appears after this list.)
- Deep Learning-Based Software Defect Prediction via Semantic Key Features of Source Code, Handling Imbalanced Datasets
  Publication. Andrade, Hiro Gaspar Inglês de; Pombo, Nuno Gonçalo Coelho Costa; Pais, Sebastião Augusto Rodrigues Figueiredo
  This work, a master's thesis in Computer Engineering at the University of Beira Interior, addresses software defect prediction (SDP), with the main objective of developing a predictive model using contextual features generated by deep learning models. To achieve the defined goals, five fundamental steps were followed: data preprocessing, mapping and embedding of tokens, extraction of contextual information, handling of datasets with class imbalance, and building the machine learning model for defect prediction. The dataset used was PROMISE, which encompasses software projects developed in Java, with multiple versions of each project. The experiments were conducted individually for each version, using static features and contextual features generated through LSTM networks. The models were evaluated on the AUC, Accuracy, MCC, Recall, and Precision metrics. In general, the use of contextual features resulted in significantly better performance. Among the models tested, Logistic Regression proved the most effective, demonstrating the best predictive capability. However, when different versions of the projects were combined, a drop in performance was recorded, with MCC showing low values, especially for Naive Bayes, which in some scenarios even produced negative values. This phenomenon can be explained by factors such as concept drift (the change in data behavior over time) and overfitting (when the model fits the training data so closely that its ability to generalize is compromised), issues that were not explored in depth here and are left for future work. (A sketch of the imbalance-handling and evaluation stage appears further below.)
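Neither record includes code, so here is a minimal sketch of the fine-tuning setup the first thesis describes: BERT classifying bug reports from their combined title and description text, using the Hugging Face transformers and datasets libraries. The file name bug_reports.csv, the column names title, description, and label, and the label count of 5 are hypothetical, not details from the thesis.

```python
# Minimal sketch, assuming a CSV of bug reports with hypothetical
# "title", "description", and integer "label" columns.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def encode(batch):
    # The best-performing configuration in the abstract used the title
    # and the description together, so both fields are concatenated.
    text = [t + " " + d for t, d in zip(batch["title"], batch["description"])]
    return tokenizer(text, truncation=True, padding="max_length", max_length=256)

dataset = load_dataset("csv", data_files="bug_reports.csv")  # hypothetical file
dataset = dataset.map(encode, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5)  # the schema's category count is an assumption

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
)
trainer.train()
```

The divide-and-conquer schema in the thesis trains multiple such models, one per category; the sketch shows a single one.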

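For the second thesis, a minimal sketch of the final prediction stage, under stated assumptions: the abstract does not name its imbalance-handling technique, so SMOTE stands in, and random features stand in for the LSTM-generated contextual features. Only the Logistic Regression classifier and the five reported metrics come from the abstract.

```python
# Minimal sketch: oversample the minority (defective) class, train
# Logistic Regression, and report the five metrics from the abstract.
import numpy as np
from imblearn.over_sampling import SMOTE  # one common remedy; the technique
                                          # used in the thesis is not named
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, matthews_corrcoef,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))             # stand-in for LSTM contextual features
y = (rng.random(500) < 0.15).astype(int)   # ~15% defective: an imbalanced dataset

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # balance training data only

clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
proba = clf.predict_proba(X_te)[:, 1]
pred = (proba >= 0.5).astype(int)

print("AUC      ", roc_auc_score(y_te, proba))
print("Accuracy ", accuracy_score(y_te, pred))
print("MCC      ", matthews_corrcoef(y_te, pred))
print("Recall   ", recall_score(y_te, pred))
print("Precision", precision_score(y_te, pred))
```

Resampling only the training split avoids leaking synthetic points into the evaluation set, which matters particularly for MCC on imbalanced data.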