Browsing by Author "Brito, Pedro Jorge Franco"
- Object Detection in Data Acquired From Aerial Devices
Publication. Brito, Pedro Jorge Franco; Proença, Hugo Pedro Martins Carriço
Object detection, in both images and videos, has seen extraordinary advances, with state-of-the-art architectures achieving close to perfect precision on large modern datasets. Because these models are trained on large-scale datasets, most of them can adapt to almost any other real-world scenario if given enough data. Nevertheless, there is one specific scenario, aerial images, in which these models tend to perform worse because of the inherent characteristics of such images. The main difference between typical object detection datasets and aerial object detection datasets is the scale of the objects that need to be located and identified. Moreover, factors such as image brightness, object rotation and detail, and background colours also play a crucial role in a model's performance, regardless of its architecture. Deep learning models make decisions based on the features they can extract from the training data. This works particularly well in standard scenarios, where images portray the object at a scale at which its details are clear and allow the model to distinguish it from other objects and from the background. However, when the image is captured from 50 meters above, the object's details diminish considerably, which makes it harder for deep learning models to extract the meaningful features needed to identify and localize the object. Nowadays, many surveillance systems use static cameras placed at pre-defined locations; for some scenarios, however, a more appropriate approach would be to use drones that surveil a particular area along a specific route. 
More specifically, this type of surveillance is suited to scenarios where it is not feasible to cover the whole area with static cameras, such as wild forests. The first objective of this dissertation is to gather a dataset that focuses on detecting people and vehicles in wild-forest scenarios. The dataset was captured using a DJI drone in four distinct zones of Serra da Estrela. It contains instances captured under different weather conditions (sunny and foggy) and at different times of day (morning, afternoon and evening). It also covers four types of terrain (earth, tar, forest, and gravel) and two object classes (person and vehicle). The second objective of this dissertation is to analyze precisely how state-of-the-art single-frame-based and video object detectors perform on this dataset. The analysis focuses on the models' performance for each object class in every terrain. This allows us to show the exact situations in which the different models stand out and those in which they tend to perform the worst. Finally, based on the results of this first phase of experiments, we propose two methods, each aiming to solve a different problem that emerged from applying state-of-the-art models to aerial images. The first method aims to improve the performance of video object detectors in certain situations by using background removal algorithms to delineate the specific areas in which the detectors' predictions are considered valid. One of the main problems with creating a high-quality dataset from scratch is the intensive and time-consuming annotation process that follows data gathering. To address this, the second method we propose is a self-supervised architecture that aims to tackle the scarcity of high-quality aerial datasets. 
The main idea is to analyze how useful unlabelled data is for these problems and thus avoid the immensely time-consuming process of labelling an entire full-scale aerial dataset. The reported results show that, even with only a partially labelled dataset, it is possible to use the unlabelled data in a self-supervised manner to further improve the model's performance.
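
The abstract does not describe how the first method combines background removal with detector output, so the following is only a minimal sketch of one plausible reading: a detection is kept when enough of its box overlaps a binary foreground mask. The function names, the box layout `(x1, y1, x2, y2, score)`, and the `min_fraction` threshold are all assumptions made for illustration and are not taken from the dissertation.

```python
from typing import List, Tuple

# Hypothetical box layout: (x1, y1, x2, y2, score), with x2 and y2 exclusive.
Box = Tuple[int, int, int, int, float]


def foreground_fraction(mask: List[List[int]], box: Box) -> float:
    """Return the share of pixels inside `box` that the mask marks as foreground (1)."""
    x1, y1, x2, y2, _ = box
    height = len(mask)
    width = len(mask[0]) if height else 0
    # Clip the box to the image bounds.
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    if x2 <= x1 or y2 <= y1:
        return 0.0
    foreground = sum(mask[y][x] for y in range(y1, y2) for x in range(x1, x2))
    return foreground / ((x2 - x1) * (y2 - y1))


def filter_detections(
    mask: List[List[int]], boxes: List[Box], min_fraction: float = 0.2
) -> List[Box]:
    """Keep only detections whose box overlaps the foreground mask enough.

    `min_fraction` is an assumed threshold, not a value from the source.
    """
    return [b for b in boxes if foreground_fraction(mask, b) >= min_fraction]


# Toy 6x6 mask with a 2x2 foreground blob at rows 1-2, columns 1-2.
mask = [[0] * 6 for _ in range(6)]
for y in (1, 2):
    for x in (1, 2):
        mask[y][x] = 1

detections = [
    (1, 1, 3, 3, 0.9),  # sits exactly on the blob
    (3, 3, 6, 6, 0.8),  # lies entirely on background
    (0, 0, 4, 4, 0.5),  # loosely contains the blob
]
kept = filter_detections(mask, detections)
```

In practice the mask would come from a background subtraction step applied to the video frames. Here it is built by hand so the example stays self-contained.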