Correia, André Rosa de Sousa Porfírio

Search Results

  • Improving the Robustness of Demonstration Learning
    Publication . Correia, André Rosa de Sousa Porfírio; Alexandre, Luís Filipe Barbosa de Almeida
    With the rapid progress of machine learning, Reinforcement Learning (RL) has been used to automate human tasks in different areas. However, training such agents is difficult and restricted to expert users. Moreover, it is mostly limited to simulation environments due to the high cost and safety concerns of interactions in the real world. Demonstration Learning is a paradigm in which an agent learns to perform a task by imitating the behavior of an expert shown in demonstrations. It is a relatively recent area in machine learning, but it is gaining significant traction because of its potential for learning complex behaviors from demonstrations. Learning from demonstration accelerates the learning process by improving sample efficiency, while also reducing the effort of the programmer. Because it learns without interacting with the environment, demonstration learning can allow the automation of a wide range of real-world applications such as robotics and healthcare. Demonstration learning methods still struggle with a plethora of problems. The estimated policy is reliant on the coverage of the data set, which can be difficult to collect. Direct imitation through behavior cloning learns the distribution of the data set. However, this is often not enough and the methods may struggle to generalize to unseen scenarios. If the agent visits out-of-distribution cases, not only will it not know what to do, but the consequences in the real world can be catastrophic. Because of this, offline RL methods try to specifically reduce the distributional shift. In this thesis, we focused on proposing novel methods to tackle some of the open problems in demonstration learning. We start by introducing the fundamental concepts, methodologies, and algorithms that underpin the proposed methods in this thesis. Then, we provide a comprehensive study of the state-of-the-art of Demonstration Learning methods. 
This study allowed us to understand existing methods and expose the open problems which motivate this thesis. We then developed five methods that improve upon the state-of-the-art and solve different problems. The first method proposes to tackle the context problem, where policies are restricted to the context in which they were trained. We propose a method to learn context-invariant image representations with contrastive learning, by making use of a multi-view demonstration data set. We show that these representations can be used in lieu of the original images to learn a policy with standard reinforcement learning algorithms. This work also contributed a benchmark environment and a demonstration data set. Next, we tackled the potential of combining reinforcement learning with demonstration learning to cover the weaknesses of both paradigms. Specifically, we developed a method to improve the safety of reinforcement learning agents during their learning process. The proposed method makes use of a demonstration data set with safe and unsafe trajectories. Before each interaction, the method evaluates the trajectory and stops it if it deems it unsafe. The method was used to augment state-of-the-art reinforcement learning methods, and it reduced the crash rate significantly, which also resulted in a slight increase in performance. In the following work, we acknowledged the significant strides made in sequence modelling and their impact on a plethora of machine learning problems. We noticed that these methods had recently been applied to demonstration learning. However, the state-of-the-art method was reliant on task knowledge and user interaction to perform. We proposed a hierarchical method which identifies important states in each demonstration, and uses them to guide the sequence model. The result is a method that is task and user independent but also achieves better performance than the previous state-of-the-art. 
Next, we made use of the novel Mamba architecture to improve upon the previous sequence modelling method. By replacing the Transformer architecture with Mamba, we proposed two methods that reduce the complexity and inference time while also improving the performance. Finally, we apply demonstration learning to under-explored applications. Specifically, we apply demonstration learning to teach an agent to dance to music. We describe the insight of modelling the task of learning to dance as a translation task, where the agent learns to translate from the language of music to the language of dance. We used the experience gained from the two sequence modelling methods to propose two variants: using the Transformer or the Mamba architecture. The method modifies the standard sequence modelling architecture to process sequences of audio features and translate them to dance poses. Results show that the method can translate diverse and unseen music to high-quality dance motions coherent within the genre. Results obtained by the proposed methods advance the state-of-the-art in Demonstration Learning and provide solutions to open problems in the field. All the proposed methods were compared against state-of-the-art baselines and evaluated on several tasks and diverse data sets, improving the performance and tackling their respective problems.
  • Software for a Service Robot
    Publication . Correia, André Rosa de Sousa Porfírio; Alexandre, Luís Filipe Barbosa de Almeida
    Service robots are becoming more commonplace every year due to advances in artificial intelligence, substituting humans in increasingly complex tasks. When an autonomous and competent service robot performs routine tasks in place of its human owners, the owners' productivity increases. The search for better service robots has led to the creation of competitions where such robots are tested and the state-of-the-art technology is pushed further. Socialab acquired a Turtlebot2 robot to serve people around the university campus and one day participate in such competitions. With hopes of achieving these goals, the laboratory has proposed a variety of projects over the years, each adding new layers of functionality to the robot. As each student has tackled their respective project, the developed software has been continuously accumulating. However, each completed project has remained separate from the others and hasn’t been used since. Hence, the developed software is being wasted. Therefore, it is imperative to integrate all the available software into the robot. Yet, as new projects are proposed, the problem of scattered software can reoccur after the integration of the currently available ones. Furthermore, the more functionality is developed, the harder and longer its integration becomes. To prevent this entirely, it is necessary to create structural software that eases the development of new functionality as well as its integration with the current software. To achieve this, a class was developed which is responsible for controlling the execution of all processes running in the robot, on which the different software depends. Additionally, research was done on multiple competitions to identify the most commonly required functionality traits, which we refer to as modules. Afterwards, an implementation of each of these modules was developed. 
Because of their universality, their implementation allows future software that requires any of the modules to simply import them, rather than having to re-implement them. In line with good software quality practices, if any of the modules needs an upgrade, this upgrade simply has to be performed on the respective module, instead of upgrading every piece of software that uses this module. This was the first goal of this thesis. After creating a solid foundation for robot software development, the focus shifted towards the creation of new functionality. The different tasks were obtained from the previous research of various robotic competitions. The idea is that if the robot can perform such tasks then it can participate in the competitions, while the same functionalities can be used around campus. The list aimed to be as long as possible with the goal of leaving the robot with as much functionality as possible while taking into consideration the time constraints of the development of this thesis. Seven tasks were selected. The implementation of each task is explained in detail. As each task was developed, the implemented steps were turned into modules, therefore respecting the initial goal of flexible and reusable software. Because of this, as more tasks were developed, the following task’s implementation was increasingly simpler, as some of the requirements were already available from the development of their predecessors. The tasks required knowledge from different areas of artificial intelligence. This led to the broadening of my knowledge rather than specialization in a single area. With this work, we show how distinct robotic tasks were implemented. Due to the varied nature of the tasks, we show how to tackle a multitude of different problems that appear in the area of artificial intelligence. Additionally, the work presents an approach to create a solid foundation for the development and integration of increasingly more software. 
The tasks are benchmarked, meaning future updates of the tasks can be performed and proved superior through the comparison of their results.