Mohamed Khemakhem | Associate Doctoral Candidate
Paris Diderot University - Paris 7
Title of thesis
Standard-based lexical models for automatically structured dictionaries
Summary of thesis
This project is motivated by the decisive role that lexical resources play in the various disciplines dealing with natural languages. In particular, the digitization of lexical resources over the past couple of decades has raised the issue of how to structure their content so that it can be decoded and exploited.
Substantial work has already been carried out by standardization bodies to define dedicated models and practices for representing these key language resources. The leading standards in this area are the Text Encoding Initiative (TEI) and the Lexical Markup Framework (LMF). While TEI offers a well-established framework for structuring a wide range of texts and dedicates a whole chapter to lexical resources, LMF focuses specifically on modelling lexical resources and offers a meta-model for representing different linguistic levels. Given the similarities and the specificities of their approaches, and the encoding alternatives they propose, I support the hypothesis that TEI and LMF can improve each other.
Moreover, there is still a serious need for techniques that apply these standards to structure existing digitized lexical resources. This line of research requires further effort to overcome the complex challenges it poses for the related language engineering tasks.
The goal of this project is to advance research on the standardization and structuring of lexical resources. I plan to propose a TEI-LMF customization by studying the mapping between the two standards. In addition, I will investigate the use of machine learning techniques to automatically detect structures in varied dictionary samples and to generate resources encoded according to this TEI-LMF customization.
Dr. Laurent Romary