Completed projects

Accent and Identity on the Scottish/English Border (AISEB)

The AISEB project examined patterns of phonological variation in the speech of 160 inhabitants of four towns lying close to the Scottish/English border (Berwick upon Tweed, Carlisle, Eyemouth, and Gretna), and related them to attitudes and orientations the participants expressed towards local and national identity issues. The variables of particular interest were (r) (presence/absence of post-vocalic rhoticity and variants of overtly-realised /r/), Voice Onset Time in /p t k b d g/, and the qualities of the FACE, GOAT and NURSE vowels. We also investigated interactions between these variables, for example that of the merging BIRD~BERTH~BURT > NURSE lexical sets with derhoticisation in the Scottish (Eyemouth and Gretna) varieties. To complement the production and attitudinal data, the researchers also collected responses gathered during perception experiments run on a subset of 40 participants, to test the sensitivity of local people to subtle phonetic differences that index membership of various locally-relevant social groups. This innovative tripartite approach to the investigation of the inter-relationships between language and identities in a border region allowed us to develop a richly detailed picture of how the adoption or rejection of sound changes is mediated by speakers' differing evaluations of the phonetic variants involved. The results of AISEB have also cast useful light on the role played by individual and group agency in the diffusion of linguistic change both synchronically and in the historical context.

Acquisition of Korean Wh-Words by Heritage Speakers and Second Language Learners

  • Co-I: Heather Marsden
  • Dec 2013 – Nov 2014
  • Academy of Korean Studies Grant: AKS-2013-R88

This research focuses on a complex and unusual characteristic of Korean, namely the dual function of Korean wh-words. Korean wh-words function both as wh-question words (e.g., 'who') and as indefinite pronouns (e.g., 'anyone'/'someone'). We investigate how second language knowledge of a range of properties associated with the dual function of Korean wh-words is acquired, by means of an experimental study that compares Korean language learners from different first-language backgrounds (English, Japanese and Chinese) with Korean heritage speakers whose dominant languages are English, Japanese and Chinese. The findings will have implications for the theory of second language acquisition and also for Korean language education (a fast-growing area, world-wide) and bilingual education. This is a joint project with Kook-Hee Gil (Sheffield).

Canonical Typology in Action

A major challenge when studying large sets of diverse languages is data comparability. The CTIA project aims to solve this problem by applying the Canonical Typology method, which treats linguistic features as multidimensional, to identify the limits of linguistic variation in grammatical agreement. We will enhance a dataset for grammatical agreement in 15 languages, using data points from the Surrey Database of Agreement and adding new information for dimensions of variation, with the aim of identifying how much of the possible space for variation is actually populated by empirical data.

Combining Gender and Classifiers in Natural Language

An AHRC project on gender and classifiers, involving collaboration with the University of Surrey. Genders and classifiers are two different types of system that do a similar thing, namely categorize nouns, so it is reasonable to assume that they would be mutually exclusive. If a language has a classifier system, we don't normally expect it to have a gender system, and vice versa. However, a few languages have both ('dual categorization'). This project investigates what happens when languages have such dual systems and compares them with languages that have only one such system, or none.

Diagnostic Instruments for Autism in Deaf Children (DIADs)

The project aims to translate a diagnostic instrument for autism into British Sign Language (BSL). This involves a team of clinicians (professionals including clinical psychologists, specialists on autism, speech and language therapists, some Deaf, some hearing). The team has translated the questions from English into BSL, then done a blind back-translation (i.e. they are translated back into English by people who haven't seen the original English), then discussed whether the BSL is an accurate version of the English. Richard Ogden has been providing them with his linguistic knowledge and helping them to resolve some of the problems of translation so they can achieve a good BSL version of the test.

Endangered Complexity

A joint AHRC/ESRC project on the Oto-Manguean languages of Mexico, involving collaboration with the University of Surrey. There are about 200 of these languages, and many of them are severely threatened or endangered. The Oto-Manguean languages have complex inflectional morphology (the system for encoding grammatical information on words). They combine suffixes, prefixes, complex tonal patterns and stem alternations into many different inflectional classes. Understanding how the Oto-Manguean languages work provides important evidence as to the possible limits of inflectional complexity.

From Competing Theories to Fieldwork: The Challenge of an Extreme Agreement System

An AHRC-funded project on the Archi agreement system, involving collaboration with Essex, Harvard, and Surrey. The Nakh-Daghestanian language Archi provides a rich source of data on the interaction between morphology and syntax, particularly in relation to the role of both components in agreement. A wide variety of domains and constructions in Archi manifest agreement. This makes Archi a particularly valuable language for investigating the mechanisms of, and constraints on, this important part of the grammatical system.

Intonational Variation in Arabic

The Intonational Variation in Arabic project is adapting methodology used to document intonational variation in English, to generate a public-access corpus of Arabic speech, using a parallel set of sentences, stories and conversations, recorded with 18-24 year olds in five regions of the Arab world. Additional data from older speakers (50+) and in nearby cities will reveal changes in progress and local variation. Detailed prosodic transcription will yield intonational descriptions of individual dialects and cross-dialectal comparisons, for use by linguists, learners and teachers of Arabic and other users.

Investigating the 'Supported Conversation' Intervention Technique: A Study of Interactions between Health Care Professionals and People with Aphasia

  • Co-I: Traci Walker
  • Jan 2013 – Jan 2014
  • Funding by C2D2

This research is a pilot study of the linguistic structures used in communication between people with aphasia after stroke and speech and language therapists. The study will focus on the use of an intervention technique known as Supported Conversation for Persons with Aphasia (SCA). Using the methodology of Conversation Analysis, the research addresses a gap in knowledge about which linguistic and sequential structures are used in SCA, and which (if any) are avoided. This information can then be compared to what is known about the linguistic and sequential structures generally deployed in typical, non-impaired conversation in order to 1) translate the findings to clinicians with different specialisms, to maximise the person with aphasia's right to participate in decision-making about their care, and 2) develop linguistically-informed methods of evaluating the efficacy of the SCA intervention. This is a joint project with Ian Watt (Health Sciences, University of York).


Meeting Darwin's last challenge: toward a global tree of human languages and genes

LanGeLin (Language and Gene Lineages) is the acronym for the ERC-funded research project 'Meeting Darwin's last challenge: toward a global tree of human languages and genes' coordinated by Professor Giuseppe Longobardi, PI, running from December 2012 to November 2018. The project addresses one question, formulated by Charles Darwin in The Origin of Species, namely whether the cultural transmission and differentiation of languages over the period of human history matches the biological transmission and differentiation of the genetic characters which define the populations of the world. The project is by definition interdisciplinary, and the work of the linguistic researchers in York is complemented by the participation of population geneticists and molecular anthropologists based at the University of Ferrara and the University of Bologna-Alma Mater Studiorum.

Marie Curie FP7 Network BBfor2

The FP7 Marie Curie Initial Training Network BBfor2 (Bayesian Biometrics for Forensics) consists of nine European research institutes and three associated partners. The Network provides regular workshops and summer schools, so that the PhD students and senior researchers can exchange research experience, insights and ideas. The main areas of research are speaker recognition (comparison), face recognition, and fingerprint recognition. These areas are studied both individually and in combination. The challenge of applying biometric techniques in a forensic context is to be able to deal with the uncontrolled quality of the evidence, and to provide calibrated likelihood scores.

Two projects are based at York, and York staff also co-supervise projects based elsewhere.

Multimodal speech and speaker recognition

  • Research Fellow: Natalie Fecher
  • Supervisors: Dominic Watt and David van Leeuwen (TNO & Radboud University, Nijmegen)

With various forms of biometric technologies becoming available, there is a growing need for scientists who are able to assess the merits of these technologies when applied to forensics. This project investigates multimodal speech and speaker recognition from a forensic perspective through assessments of the performance of human subjects and automated speech and speaker recognition systems where the quality and quantity of the information available in the audiovisual signal is manipulated experimentally. For example, the talker's face may be partially or completely obscured by clothing such as face-concealing garments or safety equipment worn for occupational, recreational or religious reasons, or for the commission of robberies, assaults or terrorist activities.

Calculation of likelihood ratios using phonetic and linguistic features

  • Research Fellow: Erica Gold
  • Supervisors: Peter French and Didier Meuwly (Netherlands Forensic Institute)

The most prevalent strain of forensic speaker comparison work across European experts and institutions involves auditory-phonetic and acoustic examinations of speech samples on a wide range of parameters including voice quality, intonation, rhythm and consonant and vowel realisations. Those working within this tradition express their conclusions within a variety of different frameworks. Most do not currently utilise a likelihood ratio (logical inference) framework, which is acknowledged to be the goal for forensic science generally.

There are practical difficulties in interpreting the results of a phonetic acoustic comparison of speech samples in a likelihood ratio framework. These include the use of speech data from the appropriate sections of the background population to estimate empirically the evidential value of the results and to combine them in an overall statistical assessment. The aim of the project is to find solutions to these problems, thus paving the way for forensic speaker comparison work to become conceptually aligned with more developed areas of forensic science such as DNA analysis.

Meaning in Language Learning

The Meaning in Language Learning network is a forum for multi-disciplinary dialogue among language learning experts, language teaching practitioners and other stakeholders in language learning who may not typically come into contact with each other. The theme of the dialogue centres on the most fundamental aspect of cross-cultural communication: transmission of meaning from one language to another. The focus on meaning comes out of state-of-the-art theoretical linguistic research into the role that grammar plays in the multiple ways that different languages express meaning.

Two key goals of the network are (1) to develop collaborative research projects that differ from existing research by incorporating insights from both theoretical linguistics and classroom practice; and (2) to share insights about language learning from linguistic research and from classroom practice, through a series of workshops and events.

Modelling Features for Forensic Speaker Comparison

In forensic speaker comparison (FSC), experts compare speech patterns in criminal and suspect audio recordings to assess the evidence under competing prosecution and defence hypotheses, i.e. the criminal voice is that of the suspect versus that of someone else. There is a move toward expressing expert evidence in the form of Bayesian likelihood ratios. Speech presents considerable difficulties for this approach, as different types of data are analysed in forensic casework: linguistic data can be normally or non-normally distributed; variables can be continuous or discrete; and complex correlations exist between variables. It is imperative to develop statistical models that cater for these difficulties. This project brings together leading forensic statisticians with forensic phoneticians. We will explore typical forensic phonetic data to assess the value of complex datasets for statistical modelling. We aim to develop new statistical models that incorporate a broader array of phonetic variables into FSC analyses and thus quantify forensic phonetic evidence more reliably.
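As a rough illustration of the likelihood ratio idea described above, the sketch below computes a ratio for a single continuous measurement. It is not the project's model: it assumes, purely for simplicity, that the feature is normally distributed both for the suspect and for the background population, which is exactly the kind of assumption the project notes may not hold for real phonetic data. The function names and all numbers are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(x, suspect_mean, suspect_sd, population_mean, population_sd):
    """Ratio of the probability density of observation x under two hypotheses.

    Numerator: the criminal recording was produced by the suspect.
    Denominator: it was produced by someone else in the background population.
    Both distributions are assumed to be normal (an illustrative simplification).
    """
    numerator = gaussian_pdf(x, suspect_mean, suspect_sd)
    denominator = gaussian_pdf(x, population_mean, population_sd)
    return numerator / denominator


# Hypothetical values for one feature measured in the criminal recording.
lr = likelihood_ratio(
    x=120.0,
    suspect_mean=118.0,
    suspect_sd=4.0,
    population_mean=110.0,
    population_sd=12.0,
)
print(f"LR = {lr:.2f}, log10 LR = {math.log10(lr):.2f}")
```

A ratio above 1 means the observation is more probable under the same-speaker hypothesis than under the different-speaker hypothesis, and a ratio below 1 means the reverse. Handling discrete, non-normal and correlated variables requires more elaborate models than this one-feature sketch.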

Morphological Complexity: Typology as a Tool for Delineating Cognitive Organization

This ERC-funded project is a comprehensive typological investigation of morphological complexity and involves collaboration with colleagues at Surrey and Brighton. Work at York focuses on two research strands. The first strand, Discovering Complexity, with Roger Evans (Brighton), concentrates on the machine learning of inflectional classes, where we investigate how much can be learned without building language-specific knowledge into the system. The second strand, with colleagues at Surrey, uses the Network Morphology theoretical framework to investigate defaults and irregularity in morphological systems.

Network for the Interdisciplinary Study of Second Language Learning (NISSLL)

NISSLL aims to serve as a forum for interaction between language teachers and language learning researchers of all disciplines. Traditionally, opportunities for interaction between language teaching practitioners and second language acquisition researchers have been rare, despite the shared focus of their activities. This network aims to overcome this communication gap through a series of workshops, and through the development of joint research and engagement activities.

Perfecting the Babble App: Turning a Voicing Detection App into a Full Babble Detector

An app has been developed which produces shapes on the screen of an iPad when an infant makes a voiced vocalisation. The colour and movement of the shapes are random, but the size of each shape changes depending on the loudness of the utterance. Images only appear during voicing (they disappear as soon as the voicing ends). The interface is currently being improved, with a view to making the app available on the App Store. The aim of this project is to finish developing the algorithm for detecting consonants and to combine it with the app, creating an app that responds only to vocalisations which contain consonants. The final app is intended to have clinical applications as well as research uses for investigating language development in typical and atypical populations, whose babble we hope to encourage using this app. This will hopefully lead to better language outcomes for these populations (e.g., deaf infants).

Phonological Database of Scottish English (PDSE)

  • Principal Investigator: Dom Watt
  • Fieldworkers: Mhairi Urquhart, Beth Cole, Jillian Oddie (Aberdeen), Paula Sochanik (York)

This project, funded by an IAFPA research grant (€2,200), ran from 2006 to 2008 and produced a set of recordings of 87 male and female speakers of Scottish English from Aberdeen, Dundee, Edinburgh and Glasgow. Each speaker read (twice) a 192-item wordlist made up of words containing consonants and vowels of particular interest. These were /r/, /l/, /hw/, /x/, /p t k/, vowels exemplifying the Scottish Vowel Length Rule, and the vowels of the FACE, GOAT and NURSE lexical sets (the last of which in fact subsumes three separate phonemes for some Scottish English speakers, such that the vowels of bird, he(a)rd and curd are contrastive). Speakers were also asked to read two text passages: 'Comma Gets a Cure' (Honorof, McCullough & Somerville 2000) and 'The North Wind and the Sun'. Analyses of Voice Onset Time (VOT) in the voiceless and voiced oral stop series and of average fundamental frequency (f0) were presented at the 2008 IAFPA conference.

Daniel Ezra Johnson (York / US National Census Bureau) and Tom Fitz-Hugh (York) also assisted with the organisation and analysis of the data.

Pluralised Mass Nouns as a Window to Linguistic Variation

This project explores the properties of the mass/count distinction across languages, focusing specifically on languages that display the peculiar property of "plural mass nouns" (e.g. 'There were waterS on the floor'). We employ experimental as well as theoretical techniques in order to establish the relevant syntactic and semantic properties of these languages and, more generally, to understand how languages can vary in the ways they encode the distinction between objects and substances.

Schwa Project

In 2010, an annual forensic science research project was established to encourage collaborative work amongst current PhD students within the Forensic Speech Science Research Group. Each year, we aim to work on a different team project that is forensically relevant. Our goal is to foster new and relevant research in the field of Forensic Speech Science, while working as a cohesive group.

For 2010-2011 our topic was:

Establishing the inter- and intra- speaker variability of word-final schwa in varieties of English and German

This research will analyze formant frequency measurements for word-final schwa in spontaneous speech from a large number of adult males in varieties of both English and German.

Research goals

  • Establish population statistics for word-final schwa in varieties of English and German
  • Assess the inter- and intra-speaker stability of schwa as a speaker discriminant
  • Aid in current research on vowel normalization techniques by providing distributions for schwa formant frequency data
  • Compare results across dialects and languages in order to consider possible socio-phonetic differences in schwa


  • Louisa Stevens
  • Natalie Fecher
  • Erica Gold (coordinator)
  • Colleen Kavanagh
  • Christin Kirchhuebel
  • Richard Rhodes
  • Lisa Roberts
  • Sukanya Thaitechawat

Temporal Co-ordination in Conversation

  • PI: Richard Ogden
  • Sep 2012 – Sep 2014
  • British Academy Research Project

Successful turn-taking in real-time in conversation requires a great deal of co-ordination; when people's activities are co-ordinated, their sense of well-being and togetherness ('sociality') is enhanced. This project is part of an exploratory collaborative study with Sarah Hawkins and Ian Cross at the Centre for Music and Science at Cambridge. Our longer-term aim is to investigate parameters governing successful interaction in conversation and music-making through a major interdisciplinary project. We are looking at how adjacent pairs of turns (such as question + answer) in conversation are mutually timed in speech and gesture, with a parallel study of interactions in music.

The Effect of Textbook Instruction on Non-Native English Knowledge of Any

Most coursebooks for learners of English include 'rules' about the use of any and its compounds, along the lines of 'Use any in questions and in negated sentences'. This project aims to shed light on the development of implicit L2 knowledge, by finding out what learners know about uses of any that are not covered by textbook generalisations. We are collecting data with possible influences from Chinese and Arabic in mind, since the majority of our participants will be native Chinese or native Arabic speakers. This is a joint project with Melinda Whong (Leeds) and Kook-Hee Gil (Sheffield).

The Oxford Corpus of Old Japanese

The Oxford Corpus of Old Japanese (OCOJ) is a long-term research project which will develop a comprehensive annotated digital corpus of all extant texts from the Old Japanese period, with an associated dictionary and translations. This is the earliest attested stage of Japanese, from the Asuka and Nara periods of Japanese history (7th-8th centuries AD), and the formative literate period of Japan. These texts are therefore of paramount importance for the study and understanding of the origins and development of civilization in Japan, including language, writing, literature, religion, history, and culture. The corpus is designed to support research in any of these areas.

Translating the Strengths and Difficulties Questionnaire into British Sign Language

  • PI: Richard Ogden
  • Funding by the NHS to Lime Trees Deaf CAMHS, York

The Strengths and Difficulties Questionnaire (SDQ) is a brief behavioural screening questionnaire about 3-16 year olds. It is used as part of a child's initial assessment in clinics and can influence how the assessment is carried out and which professionals are involved in that assessment. I have been working as a linguistic consultant on a project based at Lime Trees Child and Adolescent Mental Health Service in York to translate the English version of the SDQ into British Sign Language for use with Deaf children whose first language is BSL. This has involved making the questionnaire linguistically and culturally appropriate. The translation is now complete, and is being trialled.