| Name | Description | Size | Format |
|---|---|---|---|
| | | 1.12 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
Plagiarism in documents, books, and the arts in general has serious consequences for society.
The existence of dishonest people in academia, industry, and the press who appropriate the
intellectual property of others has led some organizations to issue anti-plagiarism rules and to
adopt technological means to confront this problem and prevent its spread.
Plagiarism Automatic Detection (PAD) systems are undoubtedly the main means used to identify
cases of plagiarism in text documents available on the Web.
To obfuscate the fraud (that is, to hide the plagiarism) in a large text document, plagiarists
sometimes extract short sentences, which are then manipulated: converted from active to passive
voice and vice versa, with words replaced by synonyms and antonyms [ASA12, AIAA15, ASI+17].
On the other hand, with larger text pairs the text alignment process is laborious, which makes it
less efficient and even less effective, especially when obfuscation has been attempted.
This work aimed to propose less complex PAD methods that make the Detailed Analysis process
more efficient and more effective. To that end, we developed two PAD methods. The first is a
plagiarism detection method that recursively segments the source document into three blocks in
order to identify both small and large paraphrased plagiarized segments, effectively and with
high time efficiency.
The second proposed method is Plagiarism Search by Vector Scanning. This method uses word
embeddings (word2vec) without resorting to matrix computations, and it can detect both small
and large plagiarized segments, even under a high level of obfuscation, efficiently and with a
high level of effectiveness.
The results presented in Chapter 4 demonstrate the effectiveness and efficiency of the methods
proposed in this dissertation.
The existence of dishonest people in academia, industry, and the press who appropriate the intellectual property of others has led some organizations to produce norms to combat plagiarism and to adopt technological means to confront this problem and prevent its propagation. Plagiarism Automatic Detection (PAD) systems are undoubtedly the main means used to identify situations involving plagiarism in text documents available on the Web. To obfuscate the fraud (that is, to hide the plagiarism) in a large text document, plagiarists sometimes extract short sentences, which are then manipulated and transformed from active to passive voice and vice versa, with words replaced by synonyms and antonyms [ASA12, AIAA15, ASI+17]. On the other hand, with larger pairs of texts, the text alignment process is tedious, which makes it less efficient and even less effective, especially if obfuscation has been attempted. This work aimed to propose less complex PAD methods that make the Detailed Analysis process more efficient and more effective. To this end, we developed two PAD methods. The first is a plagiarism detection method that uses recursive segmentation of the source document into three blocks in order to identify small and large paraphrased plagiarized segments effectively and with a high level of time efficiency. The second proposed method is Plagiarism Search by Vector Scanning. This method uses word embeddings (word2vec) without recourse to matrix calculations, and it can detect both small and large plagiarized segments, even under a high level of obfuscation, efficiently and with a high level of effectiveness. The results presented in Chapter 4 demonstrate the effectiveness and efficiency of the methods proposed in this dissertation.
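The abstract does not give the details of either method, so the following is only a minimal, generic sketch of scanning two texts with word vectors. It is not the dissertation's algorithm. The toy vectors in `TOY_VECTORS`, the function names, the window size, and the threshold are all hypothetical choices made for illustration; a real system would load trained word2vec vectors instead. Each window is summarized as the sum of its word vectors, and windows are compared by cosine similarity using plain loops rather than matrix operations.

```python
import math

# Hypothetical 3-dimensional toy vectors standing in for trained word2vec embeddings.
TOY_VECTORS = {
    "car": (0.9, 0.1, 0.0),
    "automobile": (0.8, 0.2, 0.0),
    "fast": (0.1, 0.9, 0.0),
    "quick": (0.2, 0.8, 0.0),
    "banana": (0.0, 0.0, 1.0),
    "yellow": (0.0, 0.1, 0.9),
}
DIM = 3


def window_vector(tokens):
    """Sum the vectors of the known words in a window; unknown words are skipped."""
    total = [0.0] * DIM
    for token in tokens:
        vector = TOY_VECTORS.get(token)
        if vector is not None:
            for k in range(DIM):
                total[k] += vector[k]
    return total


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def scan(source_tokens, suspicious_tokens, window=2, threshold=0.95):
    """Return (source_index, suspicious_index, similarity) for similar window pairs."""
    source_windows = [
        window_vector(source_tokens[i:i + window])
        for i in range(len(source_tokens) - window + 1)
    ]
    matches = []
    for j in range(len(suspicious_tokens) - window + 1):
        suspicious_vec = window_vector(suspicious_tokens[j:j + window])
        for i, source_vec in enumerate(source_windows):
            similarity = cosine(source_vec, suspicious_vec)
            if similarity >= threshold:
                matches.append((i, j, similarity))
    return matches


if __name__ == "__main__":
    source = ["yellow", "banana", "fast", "car"]
    suspicious = ["quick", "automobile"]
    for match in scan(source, suspicious):
        print(match)
```

In this toy example the suspicious window "quick automobile" shares no words with the source window "fast car", yet the two windows have nearly identical summed vectors, which is how embeddings can expose synonym substitution of the kind described in the abstract.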
Description
Keywords
Detailed Analysis; Automatic Plagiarism Detection; Extrinsic Plagiarism; Word2vec; Information Retrieval; Document Similarity
