Browsing by Author "Jesus, Edgar Daniel Santos de"
- Preventing School-bullying through Automated Video Analysis
Publication. Jesus, Edgar Daniel Santos de; Neves, João Carlos Raposo
Society today strives to prevent discrimination, whether it takes the form of offensive words or violent behavior. Most teenagers who suffer bullying at school have learning difficulties and, consequently, low grades. Recent studies by health professionals show that the marks left by such events can lead to depression, low self-esteem, and self-destructive behaviors. To address this problem, non-profit institutions try to prevent bullying through awareness campaigns. However, these institutions have limitations that make it impossible to detect most occurrences, leaving victims without assistance. These reasons motivate us to search for new solutions based on automated systems that can detect, as it happens, the persons involved in bullying on school property. With the help of a Portuguese non-profit organization focused on bullying, a study was carried out to collect information about the best-known behaviors of persons involved in bullying and their effects on society, providing guidelines for identifying these events. Next, we surveyed computer vision and artificial intelligence technologies that analyze videos captured by surveillance cameras and predict which type of action each one contains. We present a variety of architectures, from the first model capable of classifying human behavior in videos to current state-of-the-art architectures composed of two 3D convolution streams that extract spatial and temporal features.
A search for previous deep learning studies on bullying recognition in school videos found three scientific papers that had already investigated this problem. Our analysis of these studies shows the need for a novel dataset that represents all types of bullying actions and for a new model architecture capable of identifying these events with high accuracy. Based on the studies reviewed in Chapters 2 and 3, a few guidelines were created to mimic bullying behavior on school grounds with a group of teenagers. Three hundred fifty clips were shot in bathrooms, classrooms, hallways, and canteens with five children aged 7 to 18. Another 200 videos were acquired from the Internet and categorized alongside the recorded videos, producing a balanced dataset of 550 trimmed videos. The data cleaning process removed audio and black sidebars. The Kinetics 400 dataset was downloaded and used to fine-tune the deep learning pipelines. The SlowFast, I3D, C2D, and FGN architectures were used to build the application. The FGN was the only model that produced plausible results when trained from scratch, finishing training with an accuracy of around 70% on the test dataset. However, when the ideal threshold is employed, this value drops to around 51%. Following the successful training from scratch with the FGN, K-Fold Cross Validation was applied, dividing the dataset into ten folds so that the entire dataset is tested. The final result is the average over the ten models, which attained an accuracy of 65.67%. When trained from scratch, the other three models failed to converge and only reached satisfactory performance when fine-tuned from the Kinetics 400 weights. These three models do not perform well when trained from scratch because they contain numerous parameters that must be learned, indicating that larger datasets are required.
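The ten-fold evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names `k_fold_indices` and `cross_validate`, the fixed shuffle seed, and the `train_and_score` callback (standing in for training one model and returning its accuracy on the held-out fold) are all assumptions made for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Train k models, each tested on a different fold; return (mean, per-fold scores).

    train_and_score(train_idx, test_idx) is a hypothetical callback that
    trains a fresh model on train_idx and returns its accuracy on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return mean(scores), scores
```

With 550 videos and ten folds, each model would be tested on 55 videos and trained on the remaining 495, so every video is used for testing exactly once.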
The SlowFast model obtained approximately 83% accuracy when selecting the class with the highest probability, and this score was unchanged when adopting the optimum threshold. The I3D model scored 81% on the test dataset when taking the class with the highest probability; applying the optimum threshold raised this to the best accuracy, approximately 87%. Finally, the C2D model obtained approximately 77% accuracy on the test dataset and maintained this performance when using the optimum threshold. These thresholds were determined from the ROC curve by searching for the threshold with the highest number of true positives and the lowest number of false positives. Ultimately, this study offers a unique bullying dataset whose activities highlight the bullying theme and have more attributes than well-known conflict datasets. After cleaning and labeling, the dataset comprises 550 trimmed bullying and non-bullying videos. Because of the sensitivity of the topic and the need for authorization from each student's guardian, filming the videos and securing school locations and students was challenging. As future work, we suggest network compression through knowledge distillation, in which a smaller student model is taught with knowledge derived from a large model, to reduce the number of parameters and thus the computing resources required while maintaining accuracy. This approach allows the model to run in inference mode on IoT devices rather than transferring data over the Internet to large data centers, which adds a layer of security given the sensitivity of the bullying topic and of school video footage. Another proposed enhancement is to record new bullying and non-bullying videos to add features and variation to the dataset.