| Name | Description | Size | Format |
|---|---|---|---|
| | | 534.19 KB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
In recent years there has been a growing flow of readily accessible content on the
World Wide Web (Web), so that there is now an excessive accumulation of texts of many
kinds. Despite the positive aspects and the potential this represents, a new problem
arises: the need to develop tools and methodologies capable of processing this content
at the level of the opinions and/or sentiments expressed in it.
Evaluating Web content is not an easy task. Content evaluation techniques belong to the
field of sentiment analysis, and many approaches have been proposed. This thesis takes a
different direction: it aims to evaluate Web content for European Portuguese. The
underlying criterion is to lay the foundations for building sentiment classifiers in
the future.
Emotional lexicons underpin most methods that perform sentiment analysis. Although a
large number of such resources are available to the scientific community, extensive
research found no comparable resource for European Portuguese. With companies and
individuals increasingly interested in obtaining real-time information about products
from Web data, there is a need to build an emotional lexicon for Portuguese that can be
used for sentiment analysis in this language.
To fill this gap, an emotional lexicon for Portuguese was built automatically. Methods
that perform sentiment analysis use lexicons built manually or semi-automatically, which
raises the problem of adding linguistic knowledge to the lexicons, a problem inherent in
the way they are built.
Since identifying sentiments is the key to the process, it is necessary to know that
sentiments are synonymous with subjectivity. The challenge posed in this dissertation is
to build a subjective lexicon for European Portuguese automatically, by applying
statistical techniques.
The lexicon is built from the corpora that are used. This study requires a subjective corpus (consisting of blog texts) and an objective corpus (consisting of texts from the CETEMPúblico newspaper corpus). The corpora were chosen on the basis of the study by Pais [21], which demonstrated the similarity between a set of blogs and a newspaper corpus relative to a corpus made up of subjective and objective texts. To identify subjectivity in text, information on the morphological categories (part-of-speech) of single words and of compound words (n-grams) was used. These subjectivity clues were extracted with tools that perform the process automatically. This work demonstrated that it is possible to build a subjectivity lexicon for European Portuguese by applying statistical techniques and using corpora that are not manually annotated, together with tools that automatically extract the subjectivity clues.
Recent years have seen a growing volume of content and easy access to the Web, leading to an excessive accumulation of texts of many kinds. Despite the positive aspects and the promising potential, a new problem has appeared: the need to develop tools and methods capable of processing this content in terms of opinions and feelings. Evaluating Web content is not an easy task. Content evaluation techniques belong to the field of sentiment analysis, and many approaches have been proposed. This thesis follows a different course by evaluating Web content for the European Portuguese language. The initial criterion adopted is to lay the foundations for creating sentiment classifiers. The emotional lexicon is the basis of most sentiment analysis methods. Although many such resources are available to the scientific community, extensive research showed that there is no similar resource for European Portuguese. Given the interest of companies and individuals in obtaining product information from Web data in real time, there is a need to build an emotional lexicon for Portuguese that can be used for sentiment analysis. To address this gap, an emotional lexicon for Portuguese has been built. Sentiment analysis methods use lexicons built manually or semi-automatically, which raises the problem of adding linguistic knowledge to the lexicon, a problem tied to the way the lexicons are built. Since feelings are the key to the process, it is important to know that feelings are synonymous with subjectivity. The goal of this dissertation is to build a subjective lexicon for European Portuguese automatically, using statistical techniques. The lexicon is built from the corpora that are used. This research requires a subjective corpus (made up of blog text) and an objective corpus (made up of text from the CETEMPúblico newspaper corpus).
The corpora were chosen on the basis of the research by Pais [21], which demonstrated the similarity between a set of blogs and a newspaper corpus relative to a corpus made up of subjective and objective texts. To identify subjectivity in text, information on the morphological categories (part-of-speech) of single and compound (n-gram) words was used. These subjectivity clues were extracted with tools that perform the process automatically. This work demonstrates that it is possible to build a subjectivity lexicon for the European Portuguese language by applying statistical techniques and using corpora that are not manually annotated, together with tools that automatically extract subjectivity clues.
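
As a rough illustration of the idea of contrasting a subjective and an objective corpus, the sketch below scores single words and bigrams by how much more frequent they are in one corpus than in the other. It is only a minimal sketch under stated assumptions: the abstract does not say which statistical measure the thesis uses, so the smoothed log-ratio of relative frequencies is a stand-in; the function names (`count_terms`, `subjectivity_scores`), the `min_count` threshold, and the toy sentences are all hypothetical; and the part-of-speech information mentioned above is left out entirely.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_terms(sentences, max_n=2):
    """Count all 1..max_n-grams over lowercased, whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


def subjectivity_scores(subjective, objective, max_n=2, min_count=2):
    """Score terms by the log-ratio of their smoothed relative frequencies.

    ASSUMPTION: this measure is illustrative, not the one used in the thesis.
    Positive scores mean the term is relatively more frequent in the
    subjective corpus; negative scores mean the opposite.
    """
    subj = count_terms(subjective, max_n)
    obj = count_terms(objective, max_n)
    subj_total = sum(subj.values())
    obj_total = sum(obj.values())
    vocab = set(subj) | set(obj)
    scores = {}
    for term in vocab:
        if subj[term] + obj[term] < min_count:
            continue  # too rare to give a stable estimate
        # Add-one smoothing avoids log(0) for terms missing from one corpus.
        p_subj = (subj[term] + 1) / (subj_total + len(vocab))
        p_obj = (obj[term] + 1) / (obj_total + len(vocab))
        scores[term] = math.log(p_subj / p_obj)
    return scores


# Toy stand-ins (hypothetical data) for the blog and newspaper corpora.
blog_sentences = [
    "adorei este filme maravilhoso",
    "que filme aborrecido odiei",
    "adorei o livro",
]
news_sentences = [
    "o governo aprovou o plano",
    "o filme estreou ontem",
    "o governo anunciou medidas",
]

scores = subjectivity_scores(blog_sentences, news_sentences)
# Candidate lexicon: terms skewed towards the subjective corpus, strongest first.
lexicon = sorted((t for t, s in scores.items() if s > 0), key=lambda t: -scores[t])
```

On real corpora one would presumably tokenize properly, filter candidates by part-of-speech pattern, and choose the threshold empirically; none of those choices are specified in the abstract.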
Description
Keywords
Subjectivity analysis - Computer science; Portuguese language - Subjectivity - Computing tools - Lexicon; Information retrieval - Internet - Subjectivity; Computational linguistics - Emotional dictionary - Web