Size | Format
---|---
1.03 MB | Adobe PDF
Authors
Advisor(s)
Abstract(s)
Recent years have seen a surge in the statistical processing of natural language, in particular in corpus-based methods oriented toward language acquisition. Polysemy is identified as the main obstacle to many tasks in the area, and to thesaurus construction in particular. This dissertation summarizes the current results of work on automatic synonymy discovery. The focus is on the difficulties that arise from polysemy and on linguistically and empirically motivated means of dealing with it. In particular, we propose an unsupervised method to identify word usage profiles pertinent to specific word meanings. Further, we show that the routine of verifying every possibility in search of semantic relations is not only computationally expensive but also counterproductive. As a consequence, we propose applying a recently developed system for paraphrase extraction and alignment, so that the exhaustive search is avoided in an unsupervised manner. This led to a method that creates short lists of word pairs that are highly likely to stand in a synonymy relation.
The results show that the negative impact of polysemy is significantly reduced for the part of the polysemy spectrum that covers about two thirds of the vocabulary. Besides increasing the probability of discovering frequently manifested synonymy relations, paraphrase alignment proved to highlight infrequent word meanings and to reliably identify a set of very specific semantic relations.
Description
Keywords
Computational linguistics; Natural language - Automatic processing; Natural language - Synonymy; Natural language - Lexical analysis; Natural language - Semantic relations; Natural language processing