Repository logo
 
Loading...
Project Logo
Research Project

NOVA Laboratory for Computer Science and Informatics

Authors

Publications

Real-time 2D–3D door detection and state classification on a low-power device
Publication . Ramôa, João Gaspar; Lopes, Vasco; Alexandre, Luís; Mogo, Sandra
In this paper, we propose three methods for door state classifcation with the goal to improve robot navigation in indoor spaces. These methods were also developed to be used in other areas and applications since they are not limited to door detection as other related works are. Our methods work ofine, in low-powered computers as the Jetson Nano, in real-time with the ability to diferentiate between open, closed and semi-open doors. We use the 3D object classifcation, PointNet, real-time semantic segmentation algorithms such as, FastFCN, FC-HarDNet, SegNet and BiSeNet, the object detection algorithm, DetectNet and 2D object classifcation networks, AlexNet and GoogleNet. We built a 3D and RGB door dataset with images from several indoor environments using a 3D Realsense camera D435. This dataset is freely available online. All methods are analysed taking into account their accuracy and the speed of the algorithm in a low powered computer. We conclude that it is possible to have a door classifcation algorithm running in real-time on a low-power device.
Improving Neural Architecture Search With Bayesian Optimization and Generalization Mechanisms
Publication . Lopes, Vasco Ferrinho; Alexandre, Luís Filipe Barbosa de Almeida
Advances in Artificial Intelligence (AI) and Machine Learning (ML) obtained impressive breakthroughs and remarkable results in various problems. These advances can be largely attributed to deep learning algorithms, especially Convolutional Neural Networks (CNNs). The ever-growing success of CNNs is mainly due to the ingenuity and engineering efforts of human experts who have designed and optimized powerful neural network architectures, which obtained unprecedented results in a vast panoply of tasks. However, applying a ML method to a problem for which it has not been explicitly tailor-made usually leads to sub-optimal results, which in extreme cases can even lead to poor performances, thus hindering the sustainability of a system and the wide-spread application of ML by non-experts. Designing tailor-made CNNs for specific problems is a difficult task, as many design choices depend on each other. Thus, it became logical to automate this process by designing and developing automated Neural Architecture Search (NAS) methods. Architectures found with NAS achieve state-of-the-art performance in various tasks, outperforming human-designed networks. However, NAS methods still face several problems. Most heavily rely on human-defined assumptions constraining the search, such as the architecture’s outer-skeletons, number of layers, parameter heuristics, and search spaces. Common search spaces consist of repeatable modules (cells) instead of fully exploring the architecture’s search space by designing entire architectures (macro-search), which requires deep human expertise and restricts the search to pre-defined settings and narrows the exploration of new and diverse architectures by having forced rules. Also, considerable computation is still inherent to most NAS methods, and only a few can perform macro-search. In this thesis, we focused on proposing novel solutions to mitigate the problems mentioned above. First, we provide a comprehensive review of NAS components, methods, and benchmarks. For the latter, we conduct a study on operation importance to evaluate how the operation pool of search spaces influences the performance of generated architectures. Following, we studied how different neural networks behave for different classification problems and proposed two novel methods to improve upon existing neural networks with NAS by i) searching for a new classification head and ii) searching for a fusion method that allows performing multimodal classification. We then looked into improving the search cost of NAS methods by proposing a zero-proxy estimation strategy that scores architectures at initialization stage through the analysis of the Jacobian matrix and an evolutionary strategy that generates architectures by performing operation mutation and by leveraging the zero-cost proxy estimation to efficiently guide the search process. To further improve the capabilities of NAS methods, we extend the analysis of architectures at initialization stage by proposing a second zero-cost proxy method, which looks at the Neural Tangent Kernel of a generated architecture to infer its final performance if trained. With this, we also propose a novel search space that leverages large pre-trained feature extractors (CNNs) and forces the search only to a small middleware architecture that learns a downstream task. These two methods showed that large models can be efficiently leveraged to learn new tasks without requiring any fine-tuning or extensive computational resources. To further improve the search and memory costs of NAS methods, we proposed MANAS. This method frames NAS as a multi-agent optimization problem and uses independent agents that search for operations in a distributed manner. With MANAS, we showed that both the search cost and the memory resources can be heavily reduced while improving the final performance. Finally, to push NAS to less constrained search spaces and settings, we proposed LCMNAS, a NAS method that performs macrosearch without relying on pre-defined heuristics or bounded search spaces. LCMNAS introduces three components for the NAS pipeline: i) a method that leverages information about well-known architectures to autonomously generate complex search spaces based on weighted directed graphs with hidden properties, ii) an evolutionary search strategy that generates complete architectures from scratch, and iii) a mixed-performance estimation approach that combines information about architectures at initialization stage and lower fidelity estimates to infer their trainability and capacity to model complex functions. Results obtained by the proposed methods show that it is possible to improve NAS methods regarding search and memory costs, as well as computation requirements, while still obtaining state-of-the-art results. All proposed methods were evaluated in multiple search spaces and several data sets, showing improved performances while requiring only a fraction of previous NAS methods’ time and computation needs.
6D Pose Estimation and Object Recognition
Publication . Pereira, Nuno José Matos; Alexandre, Luís Filipe Barbosa de Almeida
6D pose estimation is a computer vision task where the objective is to estimate the 3 degrees of freedom of the object’s position (translation vector) and the other 3 degrees of freedom for the object’s orientation (rotation matrix). 6D pose estimation is a hard problem to tackle due to the possible scene cluttering, illumination variability, object truncations, and different shapes, sizes, textures, and similarities between objects. However, 6D pose estimation methods are used in multiple contexts like augmented reality, for example, where badly placed objects into the real-world can break the experience of augmented reality. Another application example is the use of augmented reality in the industry to train new and competent workers where virtual objects need to be placed in the correct positions to look like real objects or simulate their placement in the correct positions. In the context of Industry 4.0, robotic systems require adaptation to handle unconstrained pick-and-place tasks, human-robot interaction and collaboration, and autonomous robot movement. These environments and tasks are dependent on methods that perform object detection, object localization, object segmentation, and object pose estimation. To have accurate robotic manipulation, unconstrained pick-and-place, and scene understanding, accurate object detection and 6D pose estimation methods are needed. This thesis presents methods that were developed to tackle the 6D pose estimation problem as-well as the implementations of proposed pipelines in the real-world. To use the proposed pipelines in the real-world a data set needed to be capture and annotated to train and test the methods. Some controlling robot routines and interfaces were developed in order to be able to control a UR3 robot in the pipelines. The MaskedFusion method, proposed by us, achieves pose estimation accuracy below 6mm in the LineMOD dataset and an AUC score of 93.3% in the challenging YCB-Video dataset. Despite longer training time, MaskedFusion demonstrates low inference time, making it suitable for real-time applications. A study was performed about the effectiveness of employing different color spaces and improved segmentation algorithms to enhance the accuracy of 6D pose estimation methods. Moreover, the proposed MPF6D outperforms other approaches, achieving remarkable accuracy of 99.7% in the LineMOD dataset and 98.06% in the YCB-Video dataset, showcasing its potential for high-precision 6D pose estimation. Additionally, the thesis presents object grasping methods with exceptional accuracy. The first approach, comprising data capture, object detection, 6D pose estimation, grasping detection, robot planning, and motion execution, achieves a 90% success rate in non-controlled environment tests. Leveraging a diverse dataset with varying light conditions proves critical for accurate performance in real-world scenarios. Furthermore, an alternative method demonstrates accurate object grasping without relying on 6D pose estimation, offering faster execution and requiring less computational power. With a remarkable 96% accuracy and an average execution time of 5.59 seconds on a laptop without an NVIDIA GPU, this method demonstrates efficiency and practicality performing unconstrained pick-and-place tasks using a UR3 robot.
Contributions to Permissionless Decentralized Networks for Digital Currencies Based on Delegated Proof of Stake
Publication . Morais, Rui Pedro Bernardo de; Crocker, Paul Andrew; Sousa, Simão Melo de
With the growing and flourishing of human societies came the desire to exchange what was deemed as valuable, be it a good or a service. Initially this exchange was made directly through barter, either synchronously or asynchronously with debt. The first had the downside of requiring coincidence of wants and the second the need for trust. Both were very inefficient and did not scale well. So, what we call money was invented, which is nothing more than a good that is used as medium of exchange between other goods and services. Since then, money has changed form and has acquired new functions, namely unit of account and store of value. The most recent form of money is digital currency. This money cannot be transferred physically like other forms, so it needs a digital network to be transferred, which can have different characteristics. This thesis concerns a specific type of networks for digital currencies: permissionless, meaning that any participant can have read and write access to the network; decentralized, meaning that no single entity controls the network; and that use Delegated Proof of Stake (DPoS) as a Sybil defence mechanism, to prevent the network from being controlled by malicious actors that create numerous false identities. Its research tries to fulfil the vision that a network for digital currencies, besides being permissionless and decentralized, should be scalable, monetary policy agnostic, anonymous and have high performance. Three different layers of the network are studied: the communication layer, responsible for sending and receiving messages, the transaction layer, responsible for validating those messages, and the consensus layer, responsible for reaching agreement on the state of the network. The first two goals can be achieved in the communication layer. On one hand, a vertical way to scale the system is proposed composed of a peer management and traffic prioritization design based on DPoS, offering an alternative to highly disseminated fee-based models. On the other hand, a horizontal way to scale is presented through database sharding. In the transaction layer, a general framework to make DPoS compatible with anonymity is described. More specifically, two different approaches to achieve amount anonymity are proposed: one based on multi-party computation and the other on the Diffie-Hellman key exchange. Finally, a new decoy selection algorithm, called SimpleDSA, is developed to improve sender anonymity. The consensus layer features two innovative consensus algorithms, Nero and Echidna, and two methods for state machine replication: Sphinx (leader-based) and Cerberus (leaderless). These developments aim to enhance the performance of the network, specifically by decreasing the latency of its state changes and increasing the throughput, i.e., increasing the number of state changes per unit of time. A protocol that instantiates the transaction and consensus layer, called Adamastor, is formalized with security proofs and implemented with a prototype in the Rust language. Benchmarks demonstrate the practicality of the scheme and potential application to decentralized payment systems. While further research is needed, particularly in implementing a fully operational network, it sets a foundation for future advancements. In conclusion, this thesis contributes to the area of knowledge that results from the fusion of economics and computer science, by offering technical solutions for implementing a vision of a more inclusive, fairer, efficient, and secure financial system. The implications of this work are far-reaching, suggesting a future where digital currencies play a significant role in shaping global finance and technology.

Organizational Units

Description

Keywords

Contributors

Funders

Funding agency

Fundação para a Ciência e a Tecnologia

Funding programme

6817 - DCRRNI ID

Funding Award Number

UIDB/04516/2020

ID