Centre Marc Bloch: Mohamed Khemakhem

Mohamed Khemakhem | Associate Postgraduate

Former Member

Dynamics and Experiences of Globalisation (Dynamiken und Erfahrungen der Globalisierung)

Centre Marc Bloch, Friedrichstraße 191, D-10117 Berlin

Email: mohamed.khemakhem ( at ) inria.fr

Home Institution: Paris Diderot University - Paris 7 | Position: PhD Student | Disciplines: Computer science

CV File

Title of thesis

Standard-based lexical models for automatically structured dictionaries

Summary of thesis

This project is motivated by the decisive role that lexical resources play in the various disciplines dealing with natural languages. In particular, the digitization of lexical resources over the past couple of decades has raised the issue of how to structure their content so that it can be decoded and exploited.

Substantial work has already been carried out by standardization bodies to define dedicated models and practices for representing these key language resources. The leading standards in this direction are the Text Encoding Initiative (TEI) and the Lexical Markup Framework (LMF). While TEI offers a well-established framework for structuring a wide range of texts and dedicates a whole chapter to lexical resources, LMF focuses specifically on modelling lexical resources and offers a meta-model for representing different linguistic levels. Given the similarities and specificities of their approaches and the encoding alternatives they propose, I hypothesize that TEI and LMF can each improve the other.

Moreover, there is still a serious need for techniques that apply these standards to the structuring of existing digitized lexical resources. This line of research requires further effort to overcome the complex challenges it poses for the related language engineering tasks.

The goal of this project is to advance research in the field of standardization and structuring of lexical resources. I plan to propose a TEI-LMF customization by studying the mapping between the two standards. In addition, I will investigate the use of machine learning techniques to automatically detect structures in varied dictionary samples and to generate resources that conform to this TEI-LMF customization.

Supervisor

Dr. Laurent Romary

Automatic Structuring of Structural Information in Digitised Dictionaries

My thesis topic is the automatic extraction of lexical information from old digitized dictionaries and the generation of structured, standardized electronic versions of them. This research project is part of wider efforts to enhance the value of human heritage and to allow the advanced exploitation of a wide range of resources with a lexical or encyclopedic structure. My research objective is to answer questions about the genericity and scalability of the automatic analysis and standardization approach when applied to such resources. The techniques developed will be integrated into European digital humanities platforms aimed at supporting researchers in this emerging field.

Publications

An exhaustive list of my scientific publications is available on my HAL profile.