Mohamed Khemakhem | Associated PhD Student

Dynamics and Experiences of Globalization
Centre Marc Bloch, Friedrichstraße 191, D-10117 Berlin
Email: mohamed.khemakhem  ( at )  inria.fr Tel: +49(0) 30 / 20 93 70700 or 70707

Home Institution : Paris Diderot University - Paris 7 | Position : PhD Student | Disciplines : Computer science |

CV File
Title of thesis

Standard-based lexical models for automatically structured dictionaries

Summary of thesis

This project is motivated by the decisive role that lexical resources play in various disciplines dealing with natural languages. In particular, the digitization of lexical resources over the past couple of decades has raised the issue of structuring their content so that it can be decoded and exploited.

 

Standardization bodies have already carried out substantial work to define dedicated models and practices for representing these key language resources. The leading standards in this area are the Text Encoding Initiative (TEI) and the Lexical Markup Framework (LMF). While TEI offers a well-established framework for structuring a wide range of texts and dedicates a whole chapter to lexical resources, LMF focuses specifically on modelling lexical resources and offers a meta-model for representing different linguistic levels. Given the similarities and specificities of their approaches and the encoding alternatives they propose, I hypothesize that TEI and LMF can each improve the other.

Moreover, techniques for applying these standards to structure existing digitized lexical resources are still badly needed. This line of research requires further effort to overcome the complex challenges it poses for the related language engineering tasks.


The goal of this project is to advance research on the standardization and structuring of lexical resources. I plan to propose a TEI-LMF customization by studying the mapping between the two standards. In addition, I will investigate the use of machine learning techniques to automatically detect structures in varied dictionary samples and to generate resources encoded according to this TEI-LMF customization.

Supervisor

Dr. Laurent Romary

Automatic extraction of structural and lexical information from digitized classical dictionaries (Extraction automatique d’informations structurelles et lexicales à partir de dictionnaires classiques numérisés)


Affiliated Institute

© Centre Marc Bloch 2018 - Franco-German Research Centre for Social Sciences, Berlin