Dr Ahmed Omer
Postdoctoral Research Associate

Profile

Biography

I am an AI and Natural Language Processing (NLP) specialist with academic and industry experience, focusing in particular on stylometry, forensic linguistics, and the analysis of spoken corpora. I hold a PhD in Computer Science from the University of Wolverhampton, where my research applied machine-learning and computational linguistic methods to the study of written and spoken language.

My academic work has focused on the computational analysis of authorship, writing style, and linguistic variation. At the University of Wolverhampton, I conducted stylometric analyses on large text datasets for authorship attribution and forensic comparison, and applied classification and clustering techniques to spoken corpora for the automatic categorisation of audio data. I also worked extensively with speech data, developing systems to analyse phonetic and linguistic variation in recorded speech.

In parallel with my academic research, I have worked in industry as an AI NLP expert at XTM International, a translation management software company. There, I applied NLP techniques to multilingual text analysis, including named entity extraction and inappropriate language detection, gaining experience in deploying research-led language technologies in real-world environments.

Alongside these roles, I have been active as a freelance NLP engineer, working on projects that integrate speech technologies with applied linguistic research. These include the development of a technology-assisted consecutive interpreting tool providing real-time speech-to-text translation, created in collaboration with researchers at the University of Málaga.

My current work at York contributes to a sub-project of the Common European Language Indication and Analysis (CELIA) project, led by Dr George Brown (Lancaster) and Prof Sam Hellmuth (York). The York work focuses on developing a pipeline for processing Arabic dialectal speech corpora, including the Intonational Variation in Arabic corpus and the Dialectal Variation in the Arabic Levant corpus.

Research

Overview

My research focuses on stylometry, forensic linguistics, and spoken corpus analysis, employing computational and machine-learning approaches to investigate authorship, linguistic variation, and speech patterns, with a particular emphasis on Arabic data.

A core strand of my work is computational stylometry and forensic authorship analysis. I have investigated writing style and authorship attribution using statistical and machine-learning methods, including the use of Arabic-specific features such as metrical systems in poetry and stylistic markers in prose. This work contributes to forensic and comparative linguistic analysis by identifying distinctive linguistic patterns across authors and texts.

I also conduct research on spoken corpora and speech data, focusing on the automatic analysis and classification of audio recordings. This includes work on phonetic variation and dialect clustering, as well as the use of speech-to-text technologies for linguistic analysis and interpreter support. My research explores how spoken corpora can be systematically collected, annotated, and analysed for both academic and applied purposes.

My work on Arabic dialectology examines phonetic and linguistic variation across regional dialects, combining corpus-based methods with machine-learning techniques. This line of research is extended in the current CELIA-affiliated project. In addition to academic research, I am interested in the application of forensic and corpus-based methods in professional and industrial contexts, particularly where robust linguistic analysis is required for multilingual or speech-based data.

Publications

Selected publications

Gaber, M., Pastor, G. C., & Omer, A. (2020). Speech-to-Text Technology as a Documentation Tool for Interpreters: A New Approach to Compiling an Ad Hoc Corpus and Extracting Terminology from Video-Recorded Speeches. TRANS: Revista de Traductología, 24, 263–281.
Omer, A. I. A., & Oakes, M. P. (2017). Arud, the Metrical System of Arabic Poetry, as a Feature Set for Authorship Attribution. Proceedings of the 2017 IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA), 431–436.
Omer, A. I., Zampieri, M., & Oakes, M. (2018). Phonetic Differences for Dialect Clustering. Proceedings of the 9th International Conference on Information and Communication Systems (ICICS), 145–150.
Omer, A., & Oakes, M. (2019). Writing Styles of Salwa and Al-Qarni. Proceedings of the 3rd Workshop on Arabic Corpus Linguistics, 16–21.
Omer, A., & Oakes, M. (2019). Computer Stylometric Comparison of Writings by Qassim Amin and Mohammed Abdu on Women’s Rights. Proceedings of the 3rd Workshop on Arabic Corpus Linguistics.
