| Size | Format |
|---|---|
| 767.41 KB | Adobe PDF |
Abstract(s)
The expansion of the Internet and the emergence of social networks have driven a constant growth of online text. Text Sentiment Analysis (TSA) emerged to help organizations monitor the opinions their customers express through this new communication channel. This field is concerned with developing computer systems that predict sentiment in large amounts of unstructured text.
Because it is a recent area, this subdomain of Natural Language Processing lacks resources for economic-domain text in Portuguese. Given this situation, and with the aim of providing Portuguese, as has been done for English, with knowledge, tools and resources for TSA in the economic domain, this work examined whether generic Portuguese TSA lexicons perform well when applied to a specific domain. The work focused particularly on the economic domain.
To this end, we developed SentiSoft, a TSA system. Using lexical databases such as Sentilex-Pt and OpLexicon, the accuracy ranged between 81% and 74% in the experiments on generic text, whereas on economic-domain text it fell below 35%. The variation in the semantic meaning of words depending on context was identified as the main cause of this failure.
It was therefore concluded that generic Portuguese lexicons do not perform well when used in specific domains, and the construction of a lexicon dedicated exclusively to the economic domain was suggested.
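The abstract does not describe SentiSoft's internals. As a minimal sketch, assuming a simple additive lexicon-based classifier (a common baseline, not necessarily the method used in this work), the Python below shows how per-word polarities from a lexicon yield a sentence label, how the accuracy (hit rate) is computed, and how a context-dependent word sense can produce errors on economic text. The lexicon, its entries, the sample sentences and the functions `tokenize`, `classify` and `accuracy` are all hypothetical; none of the entries are taken from Sentilex-Pt or OpLexicon.

```python
import re

# Hypothetical toy lexicon mapping words to a prior polarity of -1 or +1.
# Entries are invented for illustration, NOT taken from Sentilex-Pt or OpLexicon.
GENERIC_LEXICON = {
    "bom": 1,
    "ótimo": 1,
    "excelente": 1,
    "mau": -1,
    "péssimo": -1,
    "queda": -1,
    "desemprego": -1,
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (\\w is Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def classify(text: str, lexicon: dict[str, int]) -> str:
    """Sum the polarity of every token found in the lexicon; unknown words count as 0."""
    score = sum(lexicon.get(token, 0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(samples: list[tuple[str, str]], lexicon: dict[str, int]) -> float:
    """Fraction of samples whose predicted label matches the gold label (hit rate)."""
    hits = sum(classify(text, lexicon) == gold for text, gold in samples)
    return hits / len(samples)


# Hypothetical labelled examples.
generic_samples = [
    ("O filme foi ótimo", "positive"),
    ("Um serviço péssimo", "negative"),
]
economic_samples = [
    # A fall in unemployment is good news, but the generic lexicon scores
    # both "queda" (fall) and "desemprego" (unemployment) as negative.
    ("Queda do desemprego", "positive"),
    ("Lucro excelente", "positive"),
]

print(accuracy(generic_samples, GENERIC_LEXICON))   # → 1.0
print(accuracy(economic_samples, GENERIC_LEXICON))  # → 0.5
```

In this toy run, "Queda do desemprego" (a fall in unemployment) is labelled negative because both words carry a negative prior polarity out of context. This is the kind of context-dependent shift in meaning that the abstract identifies as the main cause of the low accuracy on economic text.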
The expansion of the Internet and the emergence of social networks are driving the constant growth of online text. Text Sentiment Analysis (TSA) arose to help organizations monitor the opinions their customers express through this new communication channel. This field is concerned with developing computer systems that classify sentiment in large amounts of unstructured text. Being a recent area, this subdomain of Natural Language Processing suffers from a lack of resources for economic-domain text in Portuguese. Given this reality, and with the aim of equipping Portuguese, as has been done for English, with knowledge, tools and resources for TSA in the economic domain, this work examined whether generic Portuguese TSA lexicons perform well when used in a specific domain, working particularly with the economic domain. For this purpose, we developed SentiSoft, a TSA system. Using lexical databases such as Sentilex-Pt and OpLexicon, the hit rate ranged from 81% to 74% on generic text, whereas on economic-domain text it was below 35%. The variation in the semantic sense of words depending on context was identified as the main cause of this failure. Thus, it was concluded that generic Portuguese lexicons do not perform well when used in specific domains, and the construction of a lexicon dedicated exclusively to the economic domain was suggested.
Keywords
Text Sentiment Analysis; Economic Domain; Lexicon; Accuracy Rate
