An introduction to the science of the singing voice

Helena Daffern and Jude Brereton 1

1. Introduction 

Below is a brief summary of the paper presented on Thursday 9th July 2009 with a list of references for further reading. The session provided an introduction to the three main aspects of voice science – physiology; acoustics; perception – and briefly considered the importance of scientific understanding in the study of musicological evidence. The partnership In2Voice, run by Jude Brereton and Helena Daffern, provides lectures, workshops and private tuition in musical science and vocal acoustics, covering topics from the basic concepts of sound and hearing to the latest research areas being explored in musical acoustics, voice analysis and music technology.

2. Understanding

Although the intricate workings of the complex musculature involved in the singing process are still not fully understood, comprehension has progressed greatly in the last fifty years with the development of a number of technologies such as video stroboscopy, recording technology and voice synthesis. Prior to the invention of the video imaging devices and electrolaryngograph systems available today, which allow live, real-time assessment of the vocal apparatus, much of the understanding of the voice was based on speculation and informed guesswork. Even as late as the mid-19th century, basic misunderstandings about the vocal apparatus and its workings were still prevalent: Mancini’s 1777 account of the larynx acting like a flute, with air crossing the hole of the windpipe to create the sound, was still popular, even though there were also theories, based on the results of dissection, that described the voice as a reed-like mechanism. It was the development of the laryngoscope (a system of mirrors making it possible to view the working larynx for the first time) by Manuel Garcia in 1854 that launched a new, scientifically grounded method for exploring the workings of the vocal apparatus.

3. Physiology

The vocal system can be analysed in three parts: the Power Source (lungs), the Sound Source (vocal folds) and the Resonator (or Sound Modifiers, i.e. lips, tongue, jaw, mouth, vocal tract).


The lungs are connected to the diaphragm and the intercostal muscles of the rib cage and behave like a set of bellows: as the diaphragm (a dome-shaped sheet of muscle cross-sectioning the body at the bottom of the rib cage) contracts, the rib cage expands and the abdominal organs move downwards. The lungs expand in turn, creating negative pressure in the thorax, which causes air to move into the lungs. As the diaphragm rises and the rib cage returns to its pre-expanded state, air leaves the lungs via the windpipe. If the glottis is closed, or partially closed, when air leaves the lungs, phonation occurs and a pitched sound is produced.


The vocal folds’ primary function is to act as a valve to protect the lungs from solids, such as foodstuffs, and liquids. The positioning of the larynx and the function of the vocal folds have evolved and are now also well adapted for use in communication through speech. Situated within the larynx, a highly complex system of musculature suspended in the windpipe, the vocal folds vibrate when air is expelled from the lungs and passes through the glottis, the space between the vocal folds. The vibrating motion of the vocal folds results in a complex periodic waveform. This waveform consists of a fundamental frequency and its related partials (overtones) in the harmonic series. The creation of a pitched sound through the vibration of the vocal folds is known as phonation. The fundamental frequency (and associated heard pitch) produced by the vocal folds corresponds to the number of times they vibrate every second. So, for example, a sung fundamental frequency of 440Hz (A4 at modern pitch) arises from 440 vocal fold vibration cycles per second.
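The link between fundamental frequency and vocal fold vibration can be put into numbers. The short Python sketch below is offered only as an illustration: the function name `cycle_period_ms` is hypothetical, and it assumes a perfectly periodic vibration, which a real voice only approximates. The notes either side of A4 are included on the assumption that each octave doubles or halves the frequency.

```python
def cycle_period_ms(f0_hz: float) -> float:
    """Duration of one vocal fold vibration cycle, in milliseconds.

    Assumes an idealised, perfectly periodic vibration at f0_hz.
    """
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    return 1000.0 / f0_hz


# A4 is the example from the text; A3 and A5 are the octaves below and above.
for note, f0 in [("A3", 220.0), ("A4", 440.0), ("A5", 880.0)]:
    print(f"{note}: {f0:.0f} cycles per second, "
          f"{cycle_period_ms(f0):.2f} ms per cycle")
```

At 440Hz each open-and-close cycle of the vocal folds therefore lasts a little under 2.3 milliseconds, and each octave rise halves that duration.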

The phonation frequency is controlled by the complex musculature of the laryngeal mechanism, with prominent control from the cricoid and thyroid cartilages, which contribute to the tilting of the larynx, via a hinged mechanism, in order to stretch the vocal folds. The folds stretch and lengthen to raise the fundamental frequency of a sung note through the contraction of the cricothyroid muscle, which tilts the thyroid forwards and tenses and elongates the vocal folds.

The complex musculature system of the larynx is explained clearly in Atkinson and McHanwell (see the list of resources below).

Sound Modifiers: RESONATOR

Garcia was the first to consider and prove that the area above the vocal folds (the sound modifiers: the vocal tract, tongue, mouth, jaw, teeth, lips and nasal passages) acts as a resonator. As mentioned above, the speed at which the vocal folds vibrate controls the heard pitch of the sound being sung (i.e. the phonation frequency). However, it is the manipulation of the sound modifiers that changes the quality of the sound, including the perception of different timbres, vowel sounds and perceived loudness.

4. Perception – Formants

The position and shape of the sound modifiers change the relative energy of the harmonics (overtones) in the sound spectrum, producing broad peaks in the spectral envelope, known as formants. It is the patterning of these peaks (formants) that the ear interprets as vowels. As Figure 1 shows, the first two peaks (formants 1 and 2) are close together whilst the third formant is further separated: the brain recognises this pattern of peaks as an ‘ah’ vowel. When the second formant is raised towards the third formant by movement of the middle of the tongue upwards towards the hard palate, the ear recognises an ‘ee’ vowel.

Figure 1: Images showing the impact of the sound modifiers on the voice source sound spectrum to create formant peaks when producing an ‘ah’ vowel.

(reproduced with permission from Howard and Murphy 2008)

5. Perception – projection

Several techniques are employed by singers, particularly opera singers, to increase their perceived loudness and allow them to be heard over large orchestras in concert halls and opera houses.

The most commonly known of these techniques is the ‘singer’s formant’ or ‘singer’s formant cluster’, which applies most significantly to the operatic tenor voice. The singer’s formant cluster manifests as a peak in the spectral envelope between around 2kHz and 4kHz. It is usually achieved by manipulation of the sound modifiers (e.g. an increase of pharyngeal space), which results in the bunching together of formants 4, 5 and 6, and is particularly related to the technique of lowering the larynx. Although the spectral envelopes of each orchestral instrument differ (giving each instrument its identifiable timbre), an idealised spectral envelope of the acoustic output from a whole orchestra, illustrated below (Figure 2), shows a gradual decay in the amplitude of harmonics, meaning relatively low amplitude in the frequency region of the singer’s formant cluster. The 2kHz–4kHz frequency region of the singer’s formant cluster is also the region in which human hearing is most sensitive, adding to the auditory significance of the technique.

Figure 2: The impact of the singer’s formant cluster on an idealised spectral envelope of an orchestra and singer.

(reproduced with permission from Howard and Murphy 2008)

The acoustic possibilities of the singer’s formant cluster become less useful the higher the voice-type of the singer. Whilst low female voices may make use of the technique, soprano voices in particular make use of different techniques, both physiologically and acoustically.

The singer’s formant cluster increases the relative amplitude of harmonics in the spectrum between 2kHz and 4kHz. As a complex tone includes harmonics at integer multiples of the fundamental frequency (F0), i.e. 1st harmonic (H1) = F0 × 1, 2nd harmonic (H2) = F0 × 2, 3rd harmonic (H3) = F0 × 3 and so on, the higher the fundamental frequency, the fewer harmonics will be present in the singer’s formant area of the spectrum. For example, Table 1 shows the first twenty harmonics of the note A3 (220Hz) sung by an alto or tenor voice and the first twenty harmonics of the note A5 (880Hz), which might be sung by a soprano:

Table 1: Showing the values of the first 20 harmonics in a sung tone of A3 (220Hz) and A5 (880Hz)
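The values in Table 1 follow directly from the harmonic series, so they can be reproduced with a short Python sketch. The function names `harmonics` and `in_band` are illustrative only, and the band edges of exactly 2000Hz and 4000Hz are an assumption standing in for the approximate 2kHz–4kHz region described above.

```python
def harmonics(f0_hz, count=20):
    """Return the first `count` harmonics (integer multiples of F0), in Hz."""
    return [f0_hz * n for n in range(1, count + 1)]


def in_band(freqs_hz, low_hz=2000, high_hz=4000):
    """Keep only frequencies inside the assumed singer's formant region."""
    return [f for f in freqs_hz if low_hz <= f <= high_hz]


a3 = harmonics(220)  # A3, e.g. an alto or tenor voice
a5 = harmonics(880)  # A5, e.g. a soprano voice

# Print the harmonic numbers and frequencies side by side.
print(f"{'Harmonic':>8} {'A3 (Hz)':>8} {'A5 (Hz)':>8}")
for n, (h_a3, h_a5) in enumerate(zip(a3, a5), start=1):
    print(f"{'H' + str(n):>8} {h_a3:>8} {h_a5:>8}")

print("A3 harmonics in 2-4 kHz:", in_band(a3))
print("A5 harmonics in 2-4 kHz:", in_band(a5))
```

Under these assumptions, nine harmonics of A3 (H10 to H18, 2200Hz to 3960Hz) fall within the region, whereas only two harmonics of A5 (H3 at 2640Hz and H4 at 3520Hz) do.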

As the table shows, a soprano singing A5 (880Hz) will produce far fewer harmonics within the singer’s formant cluster range (2kHz–4kHz) than a lower voice singing A3 (220Hz). The manipulation of formant frequencies in the singer’s formant region would therefore be of little use, as there is much less sound energy in that region of the spectrum to enhance. Instead, it is thought that sopranos employ a method known as formant tuning, which makes use of a first formant that would otherwise be redundant at its natural speech frequency. As an example, consider an ‘ah’ vowel. In the speech of an adult female the expected frequencies of the first three formants for this vowel would be 850Hz, 1200Hz and 2800Hz. However, when singing an A5, whose fundamental frequency is 880Hz, the first formant is not being made use of, as there is no sound energy in its frequency range to enhance (amplify). In addition, the spectral envelope no longer has distinguishable peaks, which puts vowel recognition into question when higher fundamental frequencies are sung (see the idealised illustration in Figure 3).

Figure 3: Showing the spacing of the harmonics at different fundamental frequencies changing the impact of the formant frequencies on the spectral envelope

(reproduced with permission from Howard and Murphy 2008)

The soprano singer therefore tunes the (now redundant) first formant (again by manipulating the placement of the sound modifiers) up to (or near to) the fundamental frequency, greatly increasing the relative intensity of the fundamental and increasing the perceived loudness of the sung tone.
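The tuning rule described above can be sketched in a few lines of Python. This is a deliberately crude model rather than a description of what singers actually do: the function name `tune_first_formant` is hypothetical, the formant values are the adult female ‘ah’ figures quoted earlier, and the sketch assumes that the first formant is moved exactly onto the fundamental while the other formants stay fixed.

```python
# Speech formant frequencies (Hz) for an adult female 'ah' vowel, as quoted above.
AH_FORMANTS_HZ = (850.0, 1200.0, 2800.0)


def tune_first_formant(f0_hz, formants_hz=AH_FORMANTS_HZ):
    """Raise F1 to F0 when the sung fundamental lies above the speech F1.

    Simplification: F1 is placed exactly on F0 and the remaining formants
    are left untouched; in practice the whole vocal tract is adjusted.
    """
    f1, *rest = formants_hz
    return (max(f1, f0_hz), *rest)


for note, f0 in [("A3", 220.0), ("A4", 440.0), ("A5", 880.0)]:
    f1 = tune_first_formant(f0)[0]
    status = "raised to F0" if f1 > AH_FORMANTS_HZ[0] else "left at speech value"
    print(f"{note} (F0 = {f0:.0f} Hz): F1 = {f1:.0f} Hz, {status}")
```

In this simplified model only the A5 triggers any tuning, since it is the only one of the three notes whose fundamental (880Hz) lies above the speech value of the first formant (850Hz).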

The brief explanation of some of the key concepts in voice science given above is intended as an introduction to the research discipline of voice science. Voice science is a relatively new and very much developing interdisciplinary field of research which needs to be embraced by any researcher considering the voice from any perspective, be it musicological, historical or performance based. Below is a list of resources that explain in more detail some of the topics of voice science introduced in the summary above. 

Useful Resources 


Texts on the science of singing:

Howard, D.M. (2008) ‘Acoustics of the castrato voice’, in Moreschi and the voice of the castrato, Clapton, N., London: Haus Books, 227–58

Howard, D.M., and Murphy, D.T. (2008) Voice science, acoustics and recording, San Diego: Plural Press

Sundberg, J. (1987) The science of the singing voice, DeKalb: Northern Illinois University Press

Titze, I. (1994) Principles of voice production, Englewood Cliffs: Prentice Hall

‘All-rounder’ texts on musical acoustics: 

Hall, D.E. (1980) Music Acoustics: An Introduction, Wadsworth Publishing Co.

Howard, D.M., and Angus, J.A.S. (2000) Acoustics and psychoacoustics, 2nd Ed., Oxford: Focal Press

Rossing, T.D. (1982) The Science of Sound, Addison-Wesley Publishing Co.

Sundberg, J. (1989) The Science of Musical Sounds, San Diego: Academic Press

Anatomy and physiology of voice: 

Atkinson, M. and McHanwell, S. (2002) Basic Medical Science for Speech, Hearing and Language Students, Whurr Publishers

Texts on Hearing Music:

Deutsch, D. (ed.) (1982) The psychology of music, Academic Press

Moore, B.C.J. (1989) Introduction to the psychology of hearing, Academic Press

Pickles, J.O. (1988) Introduction to the physiology of hearing, Academic Press

Yost, W.A., and Nielsen, D.W. (1984) Fundamentals of hearing: An introduction, Holt, Rinehart and Winston


British Voice Association: a group which brings together the different professionals who work with the voice. Its website includes some downloadable resources and plenty of useful information.

Voice and Speech Trainers Association, Inc., USA: links to many useful website resources, including sections on anatomy and vocal health.

Thanks to David Howard for providing the figures for this summary.

  1. Audio Lab, Department of Electronics, University of York