Cordeiro, João Paulo da Costa

Search Results

  • ECIR 2018: Text2Story Workshop-Narrative Extraction from Texts
    Publication . Jorge, Alípio Mário Guedes; Campos, Ricardo; Jatowt, Adam; Nunes, Sergio; Rocha, Conceição; Cordeiro, João; Pasquali, Arian; Mangaravite, Vitor
    The 1st International Workshop on Narrative Extraction from Texts (Text2Story 2018) was held in conjunction with the 40th European Conference on Information Retrieval (ECIR 2018) in Grenoble on 26 March 2018. The workshop aimed to foster collaboration among researchers on a wide range of multidisciplinary issues related to the text-to-narrative structure. The program consisted of two keynote talks, six research presentations, a poster session and a slot for demo presentations. This report briefly summarizes the workshop.
  • Report on the Second International Workshop on Narrative Extraction from Texts (Text2Story 2019)
    Publication . Jorge, Alípio Mário Guedes; Campos, Ricardo; Jatowt, Adam; Bhatia, Sumit; Pasquali, Arian; Cordeiro, João; Rocha, Conceição; Mangaravite, Vítor
    The Second International Workshop on Narrative Extraction from Texts (Text2Story’19 [http://text2story19.inesctec.pt/]) was held on the 14th of April 2019, in conjunction with the 41st European Conference on Information Retrieval (ECIR 2019) in Cologne, Germany. The workshop provided a platform for researchers in IR, NLP, and design and visualization to come together and share the recent advances in extraction and formal representation of narratives. The workshop consisted of two invited talks, ten research paper presentations, and a poster and demo session. The proceedings of the workshop are available online at http://ceur-ws.org/Vol-2342/
  • SocialNetCrawler: Online Social Network Crawler
    Publication . Pais, S.; Cordeiro, João; Martins, Ricardo; Albardeiro, Miguel Ângelo Serra
    The emergence and popularization of online social networks suddenly made available a large amount of data on social organization, interaction and human behavior. All this information opens new perspectives and challenges for the study of social systems and is of interest to many fields. Although most online social networks are recent, a vast number of scientific papers have already been published on this topic, dealing with a broad range of analytical methods and applications. A tool capable of gathering tailored information from social networks can therefore help many researchers in their work, especially in the area of Natural Language Processing (NLP). Nowadays, social networks are precisely the everyday medium in which people most often use written language, so the ubiquitous crawling of social networks is of the utmost importance for researchers. Such a tool allows researchers to obtain the relevant information they need and to focus on what really matters, without losing time developing their own crawler. In this paper, we present an extensive analysis of the existing social networks and their APIs, and describe the conception and design of a social network crawler intended to help NLP researchers.
  • Empirical study of verbs and prepositions in European Portuguese with recourse to Web/Text Mining
    Publication . Cordeiro, João; Brazdil, Pavel; Leal, António
    This chapter describes our study of verbs and prepositions in European Portuguese as they are used in current newspaper articles. The aim is to enrich the information that is available in dictionaries. This particular study focusses on verbs indicating movement. We have analyzed articles in six Portuguese newspapers and extracted more than 200 thousand potentially relevant verb + preposition/prepositional locution cases. These were processed to identify similar cases and obtain the corresponding frequencies. Furthermore, we have also used a clustering algorithm with the objective of discovering clusters of similar verbs that are associated with similar prepositions/prepositional locutions. Although this latest set of results is still preliminary, some similarities among verbs have already been uncovered. We hope to consolidate these results in the future.
  • Análise de sentimento em artigos de opinião [Sentiment analysis in opinion articles]
    Publication . Silva, Fatima; Silvano, Purificação; Leal, António; Oliveira, Fátima; Brazdil, Pavel; Cordeiro, João; Oliveira, Débora
    This study sits at the interface between linguistics and computer science and aims to carry out a computational analysis of opinion articles in the field of economics and finance, following the theoretical framework of sentiment analysis. The main objectives of the work are i) to determine the orientation of the sentiment, positive or negative, and the intensity of that orientation by annotating the polarity of the lexicon, with a focus on nouns and adjectives, in the segments where opinion is expressed, and ii) to verify whether a lexicon specific to economics and finance has advantages over a general lexicon in the automatic assignment of sentiment. To achieve these objectives, a corpus of 45 texts was selected and analysed in two phases by annotators with different backgrounds. First, a sample of 10 texts was obtained and annotated by the linguistics researchers, co-authors of this article, with the aim of developing a linguistic model to determine the orientation and intensity of the polarity of terms in opinion articles and to extract lexicon terms relevant to this area of study. Next, a set of 35 texts was annotated by university students, following the method used for the first sample. Based on the linguistic annotation, the computer science team sought to determine to what extent a general sentiment lexicon for Portuguese, SentiLex, is sufficient to characterise the sentiment of a sentence satisfactorily, or whether EconoLex, a domain-specific sentiment lexicon, would be more effective. The specific lexicon includes terms and multiword expressions relevant to the domain of economics and finance and to the Portuguese language, and was compiled by the authors of this study. The data were analysed using a mixed methodology, qualitative and quantitative.
    The results obtained allow us to consider the following as contributions of this research: i) the development of the linguistic annotation model adopted for analysing the orientation and intensity of lexical polarity, especially of nouns and adjectives; ii) the central, though not exclusive, role of adjectives in determining sentiment polarity in the opinion segments of the articles in the corpus; iii) the development of a new Portuguese sentiment lexicon specific to economics and finance; iv) the improved performance of EconoLex⨁SentiLex over SentiLex in the automatic characterisation of sentiment. Despite these positive results, some limitations remain, and they define the elements to be developed as this interdisciplinary work continues, namely a more detailed linguistic analysis of the grammatical classes studied, the consideration of other linguistic elements/structures that are decisive for characterising sentiment at the noun phrase/sentence level, the enlargement of the corpus, the expansion of the domain-specific lexicon, and the refinement of the automatic methods for identifying sentiment terms in opinion texts and determining their intensity.
  • Association and Temporality between News and Tweets
    Publication . Cordeiro, João; Brazdil, Pavel; Moutinho, Vânia
    With the advent of social media, the boundaries between mainstream journalism and social networks are becoming blurred. User-generated content is increasing, and hence journalists dedicate considerable time to searching platforms such as Facebook and Twitter to announce, spread, and monitor news and to crowd-check information. Many studies have looked at social networks as news sources, but the relationship and interconnections between this type of platform and news media have not been thoroughly investigated. In this work, we have studied a series of news articles and examined a set of related comments on a social network during a period of six months. Specifically, a sample of articles from generalist Portuguese news sources published in the first semester of 2016 was clustered, and the resulting clusters were then associated with tweets of Portuguese users using a similarity measure. Focusing on a subset of clusters, we have performed a temporal analysis by examining the evolution of the two types of documents (articles and tweets) and the timing of when they appeared. It appears that for some stories, namely Brexit and the European Football Cup, the publishing of news articles intensifies on key dates (event-oriented), while the discussion on social media is more balanced throughout the months leading up to those events.
  • Extracting Adverse Drug Effects from User Experiences: A Baseline
    Publication . Abrantes, Diogo; Cordeiro, João
    It has been shown that pharmacovigilance benefits from the analysis and extraction of user-generated data from blogs, medical forums or other social networks, regarding mentions of or complaints about adverse effects that occur from taking certain drugs. Data mining, machine learning, pattern recognition, content summarization, and natural language processing techniques are often used in this field with promising results. However, there are still several difficulties concerning the extraction, as the highly domain-specific vocabulary presents a few challenges. This is mainly because patients tend to use idiomatic or vernacular expressions along with descriptive symptom explanations, which tend to deviate from grammatical rules or expected terms. To address this issue, we propose a well-curated baseline. We believe that building a specific lexicon, identifying common linguistic patterns and observing certain phrasal structures is key to first understanding how a user generates content online. From there, we can later develop sets of tailored rules that will allow data classification/extraction systems to potentially improve their efficiency at these tasks.