Browsing by Author "Caldeira, Beatriz Isabel Trocas"
- Bug Taxonomy Classification Using Machine Learning Algorithms
Publication. Caldeira, Beatriz Isabel Trocas; Pombo, Nuno Gonçalo Coelho Costa
Bugs are a natural occurrence in software development. As society increasingly relies on software, bugs have become more frequent, with potentially catastrophic consequences for products or businesses. To monitor and manage the bug resolution process, various bug tracking systems have been developed. These platforms enable the teams responsible for bug analysis to view identified bugs, track the most common issues, and monitor their resolution status. The challenge, however, lies in analyzing each report. In large-scale products, hundreds of reports can be submitted daily through these platforms by other developers, pen testers, or end users. End users, who are likely to lack knowledge of the software, its system architecture, or other inherent processes, may submit incorrect or unclear reports. Automating the manual analysis and classification of these reports, a complex and time-consuming task for developers, is an idea that various researchers have explored. This research aims to develop a comprehensive and detailed classification schema that can provide additional, automated information to report analysts, thereby facilitating part of their analysis process. To achieve this, the proposed solution leverages the BERT model, which is rooted in the Transformer architecture, to classify reports as accurately as possible based on the textual descriptions within each report. A dataset was created following this schema, which was designed through a divide-and-conquer approach, and multiple models were trained on each category.
The results indicate that BERT can form the basis of a robust solution for this purpose, even when given only a small amount of labeled data and despite the need for some refinement in the training process. The best results were obtained when the title and the description were used together as model input, with one model reaching an overall accuracy of 75% and the lowest accuracy being 54.6%.
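
The input setup described above (title only, description only, or both together) and the overall accuracy metric can be illustrated with a minimal sketch. It is not the authors' implementation: the `BugReport` class, the `build_input` and `accuracy` helpers, and the example labels are all hypothetical, and joining the two fields with a single space is an assumption about how they might be combined before tokenization.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    """Hypothetical container for the two text fields of a bug report."""
    title: str
    description: str


def build_input(report, use_title=True, use_description=True):
    """Build the text passed to a classifier from the selected fields.

    Assumption: the fields are joined with one space; the source does not
    say how the title and description were combined.
    """
    parts = []
    if use_title:
        parts.append(report.title.strip())
    if use_description:
        parts.append(report.description.strip())
    # Skip empty fields so no stray separator is left behind.
    return " ".join(part for part in parts if part)


def accuracy(predicted, expected):
    """Overall accuracy: the fraction of reports whose label was predicted correctly."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one label is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Illustrative data only; the labels are not taken from the thesis schema.
report = BugReport(" Crash on save ", "App closes when saving a large file.")
combined_text = build_input(report)
title_only_text = build_input(report, use_description=False)
score = accuracy(["ui", "logic", "ui", "ui"], ["ui", "logic", "logic", "ui"])
```

In a real pipeline, the string returned by `build_input` would be tokenized and fed to a fine-tuned BERT classifier, with one such model per category of the schema; that step is omitted here to keep the sketch free of external dependencies.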
