PhraseNET: Detección y extracción automatizada de unidades fraseológicas

Author:

Jose Luiz de Lucca

Supervisor:

María Lluïsa Carrió

Publisher/Publishing institution:

Universitat Politècnica de València

City:

Valencia

Year:

2011

Publication type:

Thesis

Thesis type:

Doctoral thesis

Subject areas:

Linguistics

Applied linguistics

Computational linguistics

Corpus linguistics

Text linguistics

Description:

This article describes a new method for identifying and extracting phraseological units from textual corpora. Several classifications of phraseological units exist, among which we highlight the one proposed by Corpas Pastor (1996). Starting from a broad conception of phraseology, this author classifies Spanish phraseological units into three categories: collocations, locutions, and phraseological utterances (fixed forms and routine formulas). Of these, our research covers locutions and phraseological utterances.

Extraction proceeds sentence by sentence, and the proposed architecture is based on statistics and relational algebra. Its main characteristic is the sparing use of linguistic resources, which are replaced by search algorithms and statistical methods. Four experiments are presented in this paper to show the extraction of phraseological units from textual corpora. They demonstrate the usefulness of the architecture by comparing the system's results with manual extraction. The main advantage of the system is the very large volume of data it can analyze.

Web page:

http://www.infoling.org/repository/PhDdiss-Infoling-8-10-2015.pdf

Email:

jldlme@hotmail.com

16/06/2016 Publications
