Research Project
NOVA Laboratory for Computer Science and Informatics
Publications
Real-time 2D–3D door detection and state classification on a low-power device
Publication . Ramôa, João Gaspar; Lopes, Vasco; Alexandre, Luís; Mogo, Sandra
In this paper, we propose three methods for door state classification with the goal of improving robot navigation in indoor
spaces. These methods were also developed to be used in other areas and applications, since they are not limited to door
detection as other related works are. Our methods work offline, on low-powered computers such as the Jetson Nano, in real time,
with the ability to differentiate between open, closed and semi-open doors. We use the 3D object classification network PointNet;
real-time semantic segmentation algorithms such as FastFCN, FC-HarDNet, SegNet and BiSeNet; the object detection
algorithm DetectNet; and the 2D object classification networks AlexNet and GoogLeNet. We built a 3D and RGB door dataset
with images from several indoor environments using a RealSense D435 3D camera. This dataset is freely available online.
All methods are analysed taking into account their accuracy and speed on a low-powered computer.
We conclude that it is possible to have a door classification algorithm running in real time on a low-power device.
Improving Neural Architecture Search With Bayesian Optimization and Generalization Mechanisms
Publication . Lopes, Vasco Ferrinho; Alexandre, Luís Filipe Barbosa de Almeida
Advances in Artificial Intelligence (AI) and Machine Learning (ML) have produced impressive
breakthroughs and remarkable results in various problems. These advances can be
largely attributed to deep learning algorithms, especially Convolutional Neural Networks
(CNNs). The ever-growing success of CNNs is mainly due to the ingenuity and engineering
efforts of human experts who have designed and optimized powerful neural network
architectures, which obtained unprecedented results in a vast panoply of tasks. However,
applying an ML method to a problem for which it has not been explicitly tailor-made usually
leads to sub-optimal results, and in extreme cases to poor performance,
thus hindering the sustainability of a system and the widespread application of ML by
non-experts. Designing tailor-made CNNs for specific problems is a difficult task, as many
design choices depend on each other. Thus, it became logical to automate this process by
designing and developing automated Neural Architecture Search (NAS) methods.
Architectures found with NAS achieve state-of-the-art performance in various tasks,
outperforming human-designed networks. However, NAS methods still face several problems.
Most heavily rely on human-defined assumptions constraining the search, such
as the architecture’s outer-skeletons, number of layers, parameter heuristics, and search
spaces. Common search spaces consist of repeatable modules (cells) instead of fully exploring
the architecture’s search space by designing entire architectures (macro-search). This
requires deep human expertise, restricts the search to pre-defined settings, and
narrows the exploration of new and diverse architectures through forced rules. Also, most
NAS methods still require considerable computation, and only a few can perform
macro-search.
In this thesis, we focused on proposing novel solutions to mitigate the problems mentioned
above. First, we provide a comprehensive review of NAS components, methods,
and benchmarks. For the latter, we conduct a study on operation importance to evaluate
how the operation pool of search spaces influences the performance of generated architectures.
Next, we studied how different neural networks behave for different classification
problems and proposed two novel methods to improve upon existing neural networks
with NAS by i) searching for a new classification head and ii) searching for a fusion
method that allows performing multimodal classification. We then looked into improving
the search cost of NAS methods by proposing a zero-cost proxy estimation strategy that
scores architectures at initialization stage through the analysis of the Jacobian matrix and
an evolutionary strategy that generates architectures by performing operation mutation
and by leveraging the zero-cost proxy estimation to efficiently guide the search process.
To further improve the capabilities of NAS methods, we extend the analysis of architectures
at initialization stage by proposing a second zero-cost proxy method, which looks
at the Neural Tangent Kernel of a generated architecture to infer its final performance if
trained. With this, we also propose a novel search space that leverages large pre-trained
feature extractors (CNNs) and restricts the search to a small middleware architecture
that learns a downstream task. These two methods showed that large models can be efficiently
leveraged to learn new tasks without requiring any fine-tuning or extensive computational
resources. To further improve the search and memory costs of NAS methods,
we proposed MANAS. This method frames NAS as a multi-agent optimization problem
and uses independent agents that search for operations in a distributed manner. With
MANAS, we showed that both the search cost and the memory resources can be heavily
reduced while improving the final performance. Finally, to push NAS to less constrained
search spaces and settings, we proposed LCMNAS, a NAS method that performs macro-search
without relying on pre-defined heuristics or bounded search spaces. LCMNAS introduces
three components for the NAS pipeline: i) a method that leverages information
about well-known architectures to autonomously generate complex search spaces based
on weighted directed graphs with hidden properties, ii) an evolutionary search strategy
that generates complete architectures from scratch, and iii) a mixed-performance estimation
approach that combines information about architectures at initialization stage and
lower fidelity estimates to infer their trainability and capacity to model complex functions.
Results obtained by the proposed methods show that it is possible to improve NAS
methods regarding search and memory costs, as well as computation requirements, while
still obtaining state-of-the-art results. All proposed methods were evaluated in multiple
search spaces and several data sets, showing improved performances while requiring only
a fraction of previous NAS methods’ time and computation needs.
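The idea of an evolutionary search that mutates operations and is guided by a cheap score can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the operation pool `OPS`, the toy `proxy_score` (a stand-in for a zero-cost proxy, not the Jacobian or Neural Tangent Kernel analysis described above), and the tournament-plus-ageing selection scheme. It is not the thesis's implementation.

```python
import random

# Hypothetical operation pool; real NAS search spaces define their own.
OPS = ["conv3x3", "conv1x1", "skip", "avgpool"]


def proxy_score(arch):
    """Toy stand-in for a zero-cost proxy (assumption: higher is better).

    A real proxy would inspect the untrained network; this version only
    rewards convolutions so that the loop has something to optimise.
    """
    weights = {"conv3x3": 3, "conv1x1": 2, "skip": 1, "avgpool": 0}
    return sum(weights[op] for op in arch)


def mutate(arch, rng):
    """Operation mutation: replace one randomly chosen operation."""
    child = list(arch)
    i = rng.randrange(len(child))
    child[i] = rng.choice([op for op in OPS if op != child[i]])
    return child


def evolve(n_ops=6, population=8, generations=30, seed=0):
    """Return the best-scoring architecture seen during the search."""
    rng = random.Random(seed)
    pop = [[rng.choice(OPS) for _ in range(n_ops)] for _ in range(population)]
    best = max(pop, key=proxy_score)
    for _ in range(generations):
        # Tournament selection: the best of three random individuals is the parent.
        parent = max(rng.sample(pop, 3), key=proxy_score)
        child = mutate(parent, rng)
        pop.append(child)
        pop.pop(0)  # ageing: discard the oldest individual
        if proxy_score(child) > proxy_score(best):
            best = child
    return best


print(evolve())
```

Because candidates are ranked by the proxy alone, no network is trained inside the loop, which is where the reduction in search cost comes from in this kind of scheme.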
6D Pose Estimation and Object Recognition
Publication . Pereira, Nuno José Matos; Alexandre, Luís Filipe Barbosa de Almeida
6D pose estimation is a computer vision task where the objective is to estimate the 3
degrees of freedom of the object’s position (translation vector) and the other 3 degrees of
freedom for the object’s orientation (rotation matrix). 6D pose estimation is a hard problem
to tackle due to the possible scene cluttering, illumination variability, object truncations,
and different shapes, sizes, textures, and similarities between objects. Nevertheless, 6D
pose estimation methods are used in many contexts. In augmented reality, for example,
virtual objects that are badly placed in the real world can break the experience.
Another application example is the use of augmented reality in industry to
train new and competent workers, where virtual objects need to be placed in the correct
positions to look like real objects or to simulate their placement in the correct positions. In
the context of Industry 4.0, robotic systems require adaptation to handle unconstrained
pick-and-place tasks, human-robot interaction and collaboration, and autonomous robot
movement. These environments and tasks are dependent on methods that perform object
detection, object localization, object segmentation, and object pose estimation. To have
accurate robotic manipulation, unconstrained pick-and-place, and scene understanding,
accurate object detection and 6D pose estimation methods are needed.
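As a small worked example of what a 6D pose encodes, the sketch below applies a rotation matrix and a translation vector to a point of an object model. It assumes the common convention that the pose maps object-frame points into the camera frame; the numeric values and function names are illustrative only.

```python
import math


def rotation_z(theta):
    """3x3 rotation matrix for a rotation of theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_pose(R, t, p):
    """Transform point p by the pose (R, t): returns R @ p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


R = rotation_z(math.pi / 2)  # orientation: 90 degrees about z (3 degrees of freedom in general)
t = [0.1, 0.0, 0.5]          # position in metres (3 degrees of freedom)
corner = [0.02, 0.0, 0.0]    # a point on the object model, in the object frame

print(apply_pose(R, t, corner))
```

A pose estimator outputs `R` and `t`; a robot or an augmented reality renderer then uses them in exactly this way to place model points in the scene.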
This thesis presents methods developed to tackle the 6D pose estimation problem,
as well as implementations of the proposed pipelines in the real world. To use the
proposed pipelines in the real world, a data set had to be captured and annotated to
train and test the methods. Robot control routines and interfaces were also developed
in order to control a UR3 robot in the pipelines.
The MaskedFusion method, proposed by us, achieves a pose estimation error below
6 mm on the LineMOD dataset and an AUC score of 93.3% on the challenging YCB-Video
dataset. Despite a longer training time, MaskedFusion demonstrates low inference time,
making it suitable for real-time applications. A study was performed about the effectiveness
of employing different color spaces and improved segmentation algorithms to enhance
the accuracy of 6D pose estimation methods.
Moreover, the proposed MPF6D outperforms other approaches, achieving remarkable
accuracy of 99.7% in the LineMOD dataset and 98.06% in the YCB-Video dataset, showcasing
its potential for high-precision 6D pose estimation. Additionally, the thesis presents
object grasping methods with exceptional accuracy. The first approach, comprising data
capture, object detection, 6D pose estimation, grasping detection, robot planning, and
motion execution, achieves a 90% success rate in non-controlled environment tests. Leveraging
a diverse dataset with varying light conditions proves critical for accurate performance in real-world scenarios. Furthermore, an alternative method demonstrates accurate
object grasping without relying on 6D pose estimation, offering faster execution and
requiring less computational power. With a remarkable 96% accuracy and an average
execution time of 5.59 seconds on a laptop without an NVIDIA GPU, this method demonstrates
efficiency and practicality performing unconstrained pick-and-place tasks using a
UR3 robot.
Contributions to Permissionless Decentralized Networks for Digital Currencies Based on Delegated Proof of Stake
Publication . Morais, Rui Pedro Bernardo de; Crocker, Paul Andrew; Sousa, Simão Melo de
As human societies grew and flourished, so did the desire to exchange what
was deemed valuable, be it a good or a service. Initially this exchange was made directly
through barter, either synchronously or asynchronously with debt. The former had
the downside of requiring a coincidence of wants, and the latter required trust. Both
were very inefficient and did not scale well. So what we call money was invented, which
is nothing more than a good that is used as a medium of exchange between other goods and
services. Since then, money has changed form and has acquired new functions, namely
unit of account and store of value. The most recent form of money is digital currency. This
money cannot be transferred physically like other forms, so it needs a digital network to
be transferred, which can have different characteristics.
This thesis concerns a specific type of network for digital currencies: permissionless,
meaning that any participant can have read and write access to the network; decentralized,
meaning that no single entity controls the network; and using Delegated Proof of Stake
(DPoS) as a Sybil defence mechanism, to prevent the network from being controlled by
malicious actors that create numerous false identities.
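Why stake-weighted delegation resists Sybil attacks can be shown with a minimal sketch. The account names, stake values and the `elect_delegates` function are hypothetical and chosen only for illustration; real DPoS networks add many rules that are not modelled here.

```python
from collections import defaultdict


def elect_delegates(stakes, votes, n_delegates):
    """Rank candidates by the total stake delegated to them.

    stakes: account -> stake held by that account
    votes:  account -> candidate the account delegates its stake to
    """
    weight = defaultdict(int)
    for account, candidate in votes.items():
        weight[candidate] += stakes[account]
    # Highest weight first; ties broken by candidate name for determinism.
    ranked = sorted(weight, key=lambda c: (-weight[c], c))
    return ranked[:n_delegates], dict(weight)


# An attacker holding 100 units of stake in a single account...
stakes = {"alice": 600, "bob": 300, "mallory": 100}
votes = {"alice": "A", "bob": "B", "mallory": "M"}
_, single = elect_delegates(stakes, votes, 2)

# ...gains nothing by splitting that stake across 100 fake identities.
sybil_stakes = {"alice": 600, "bob": 300}
sybil_votes = {"alice": "A", "bob": "B"}
for i in range(100):
    sybil_stakes[f"fake{i}"] = 1
    sybil_votes[f"fake{i}"] = "M"
delegates, split = elect_delegates(sybil_stakes, sybil_votes, 2)

print(single["M"], split["M"], delegates)
```

Influence is proportional to stake rather than to the number of identities, so creating accounts is free but confers no additional voting weight.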
This research pursues the vision that a network for digital currencies, besides being
permissionless and decentralized, should be scalable, monetary policy agnostic, anonymous
and have high performance. Three different layers of the network are studied: the
communication layer, responsible for sending and receiving messages, the transaction
layer, responsible for validating those messages, and the consensus layer, responsible for
reaching agreement on the state of the network.
The first two goals can be achieved in the communication layer. On the one hand, a vertical
way to scale the system is proposed, consisting of a peer management and traffic prioritization
design based on DPoS, which offers an alternative to widespread fee-based
models. On the other hand, a horizontal way to scale is presented through database sharding.
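As a generic illustration of database sharding, the sketch below assigns record keys to shards by hashing. The key format, the number of shards and the hash-modulo scheme are assumptions made for this example and are not taken from the thesis.

```python
import hashlib


def shard_for(key: str, n_shards: int) -> int:
    """Map a record key to one of n_shards partitions (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def partition(keys, n_shards):
    """Group keys by shard so that each shard can be stored and served separately."""
    shards = {i: [] for i in range(n_shards)}
    for key in keys:
        shards[shard_for(key, n_shards)].append(key)
    return shards


accounts = [f"account-{i}" for i in range(1000)]
shards = partition(accounts, 4)
print({i: len(v) for i, v in shards.items()})
```

Each node then only needs to hold and update its own shard, which is what allows capacity to grow by adding nodes.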
In the transaction layer, a general framework to make DPoS compatible with anonymity is
described. More specifically, two different approaches to achieve amount anonymity are
proposed: one based on multi-party computation and the other on the Diffie-Hellman
key exchange. Finally, a new decoy selection algorithm, called SimpleDSA, is developed
to improve sender anonymity.
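For readers unfamiliar with the Diffie-Hellman key exchange mentioned above, the following sketch shows only the underlying primitive: two parties derive the same secret from exchanged public values. It is not the amount anonymity scheme proposed in the thesis, and the 64-bit modulus and base are toy parameters chosen for readability, far too small for real use.

```python
import secrets

# Toy parameters for illustration only. Real deployments use standardised
# groups or elliptic curves with much larger sizes.
P = 2**64 - 59  # a prime modulus
G = 5           # arbitrary base chosen for this example


def keypair():
    """Return (private, public), where public = G**private mod P."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    return private, pow(G, private, P)


def shared_secret(my_private, their_public):
    """Combine one's own private key with the other party's public key."""
    return pow(their_public, my_private, P)


a_priv, a_pub = keypair()
b_priv, b_pub = keypair()

# Both sides compute G**(a_priv * b_priv) mod P without revealing private keys.
print(shared_secret(a_priv, b_pub) == shared_secret(b_priv, a_pub))
```

A shared value of this kind can then be used to blind or encrypt data that only the two parties should be able to read.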
The consensus layer features two innovative consensus algorithms, Nero and Echidna,
and two methods for state machine replication: Sphinx (leader-based) and Cerberus (leaderless).
These developments aim to enhance the performance of the network, specifically
by decreasing the latency of its state changes and increasing the throughput, i.e., increasing
the number of state changes per unit of time.
A protocol that instantiates the transaction and consensus layers, called Adamastor, is formalized
with security proofs and implemented as a prototype in the Rust language.
Benchmarks demonstrate the practicality of the scheme and potential application to decentralized
payment systems. While further research is needed, particularly in implementing
a fully operational network, this work lays a foundation for future advancements.
In conclusion, this thesis contributes to the area of knowledge that results from the fusion
of economics and computer science, by offering technical solutions for implementing a
vision of a more inclusive, fairer, efficient, and secure financial system. The implications
of this work are far-reaching, suggesting a future where digital currencies play a significant
role in shaping global finance and technology.
Funders
Funding agency
Fundação para a Ciência e a Tecnologia
Funding programme
6817 - DCRRNI ID
Funding Award Number
UIDB/04516/2020