Helping the young soprano bring emotional truth to seventeenth- and eighteenth-century recitative through the use of speech mode

Christopher Allan


Is there a synergy between contemporary voice science and early music? Can the developing soprano be brought to a more emotionally true performance of 18th-century recitative by using some of the precepts of voice science, especially the use of the speech (modal) mode of the voice defined by researchers such as Jo Estill?

In conservatories today much of the music that a commencing undergraduate soprano will perform was composed in the 17th and 18th centuries; it therefore has great relevance to the vocal pedagogue and student alike. Much of the recitative found in the 18th century is within the range C4–E5. This falls directly in the speech mode range of the soprano voice. We also know that much of the dramatic and emotional content of a cantata or operatic work is contained in the recitative as the composer moves the singer towards the aria that follows.

The developing soprano will often have difficulty establishing control over that range of the voice, especially if she has used falsetto quality in the past and has not been able to access the lower fourth from C4–F4. This can result in a colourless reading of recitative.

This paper will describe a range of exercises developed to facilitate access to the full quality of the voice across the range, giving much-needed vibrancy and emotional response to the performance.

1. Introduction

Singing teachers have long sought ways to introduce concepts to their students that are both effective in the manner of transmission and effective in the manner in which they engage the student’s ability to assimilate those concepts and make them part of their singing technique. Thus much time is spent by voice teachers speaking with each other about successful (and sometimes unsuccessful) ways that they have experienced in teaching such and such an idea, or asking each other what we might do to help a certain student find the solution to a particular vocal challenge. What I am attempting to do, in my current work, is to find an effective way of measuring some of these ideas so that I can establish with some sort of certainty that my methods are having a concrete effect on my students.

I was appointed to the University of Newcastle, Australia, in 1995 as a teacher of singing. I have taught at Newcastle ever since, with about 15 tertiary students per year as well as those who simply wish to learn to sing without completing a bachelor’s degree. I was teaching my students in a manner that was essentially the same as I was taught, whilst attempting to avoid what I saw as the negative aspects of the teachers that I had as a student. I had, at this stage in my life, a fairly strong notion of what was good and bad with regard to the singing voice: I knew which noises I liked and which I didn’t – and if I was at all honest, I would say that many sounds produced by non-classical singers were not to my liking. In 1996 I attended a week-long course run through the then National Voice Centre at Sydney University given by the American pedagogue, Jo Estill. I came away a changed man: not only was I prepared to embrace many more types of vocal sound, I was also now interested in the field of voice science.

Perhaps Estill’s greatest contribution was to attempt to identify various vocal qualities and see if she could then find out what was happening anatomically. In the course of her research, she produced a number of videos in collaboration with doctors and other voice scientists which sought to identify those qualities and see, in real time, what was occurring at the vocal folds and in the structures above and around the larynx. Estill identified six vocal qualities, which were published in 1997 in two books. These publications, Compulsory Figures for Voice Level One (Primer of Basic Figures) and Level Two (Six Basic Voice Qualities), outline her exercises for gaining control over the various parts of the vocal anatomy and differentiate the attributes of the six qualities (Estill, 1997b). One of the qualities she described was that of the basic mode of the voice – speech quality (Estill, 1997b) (often referred to by many as ‘Modal’ voice).

As defined by Estill, the characteristics of speech quality are:

  • It is the quality heard in everyday educated speech.
  • The vocal tract is in a neutral position and relaxed.
  • The perception of effort is at the larynx.
  • As there is more acoustic energy in the lower range, and because with lower frequencies, there are more partials in the upper spectrum, this quality has the highest intelligibility.
  • It can be an exciting component in the classic Opera quality. (Estill, 1997a, 11)

I have found value in using this information when working with my students, most of whom are of the age group where they have what might be considered a ‘transitional voice’, a voice that is moving from an adolescent sound to an adult sound.

2. The benefits of using Speech Mode to the student and the teacher

Why is speech mode useful in the development of young singers? Many of our voice students at the University of Newcastle are female and aged around eighteen on entry to the Bachelor of Music degree. Many come from a background of choral singing or training, while others have had one-on-one training leading to their entry into the degree programme. Most singing teachers will not bring a young voice on too early, so voices are not overly developed; indeed, it is usual for a young singer to enter a tertiary course with a background in voice production, but it is not expected that their technique will be overly advanced at the age of eighteen. Rather it is the potential in the voice and person that is the guide when selecting students for entry at that age. It is understood that the voice is in an early stage of development on entry to an undergraduate programme and that there is an expectation of further advanced study once the student completes that degree programme.

If the student has a strong choral background then one can expect that the sound is most likely to have a falsetto quality. In my experience, most choral conductors of youth choirs will opt for an essentially falsetto quality; certainly in Australia this would appear to be the norm. I speak of falsetto quality, rather than what we might know as the falsetto sound typical of, say, the counter-tenor. This quality has the following traits: the vocal folds are not fully adducted;1 they touch at either end rather than along the full length of the fold. As a result the onset of the voice is aspirant, resulting in a breathy quality and often poor breath management. The voice may have what could be described as a ‘hollow’ quality and vibrato is most often absent. In addition the vocal folds stiffen and the mucosal wave produced by them is quite different to that produced by fully adducted vocal folds.

In addition, young choral students are often encouraged by their directors to ‘take big breaths’, which most will interpret as ‘overfilling’ with air. The result of this phenomenon is that the majority will present with an aspirated onset from that cause as well as from the posture of the vocal folds. The sound produced may be breathy, mostly without vibrato but, in the upper register, will carry quite well. If the chorister has a good ear, we know that this can produce a singer with a lovely quality – often described as ‘angelic’ or ‘ethereal’.

In the 2005 edition of the Estill Voice Training Systems publication of Estill’s Level Two (Six Voice Qualities) redefined by author Mary McDonald Klimek et al, Falsetto Quality is described in this manner:

  • At the vocal fold the body cover of the fold is stiff.
  • It shows an absent or fleeting closed phase during vibration, a minimally obstructed flow of breath, with the highest air-flow rate of any quality.
  • The vocal tract structures are Mid, relaxed.
  • Acoustically there will be very little intensity about the first formant (0.5–1.0 kHz).
  • The quality favours the high range. In mid to low range Falsetto Quality is generally weak and trying to make lower pitches louder can result in voicing and or pitch breaks.
  • Generally, it lacks emotional intensity. (Klimek et al., 2005, 22)

The final point is of particular importance to this discussion as it is the lack of emotional intensity that will inhibit an emotionally truthful performance.

The benefits of using speech quality to encourage a vibrant sound in a developing voice are:

  • Many of the desirable postures in the larynx are already established in speech – they do not have to be taught.
  • Speech, in the majority of persons, has a balanced onset, shows adduction of the vocal folds, and the intake of breath is ‘natural’ (i.e. the student will not ‘overfill’ if that is their habitual manner of taking in breath to sing).
  • The move from speech to singing is therefore to increase the sub-glottic pressure. In other words, to increase the level of breath support.
  • It feels ‘natural’ to the student.

3. Perceptual problems

There are some challenges that both the student and the teacher will face in attempting to change the type of sound the student is habitually used to producing.

  • Many young singers produce a ‘singing noise’.
  • The singer hears the voice differently to the listener.
  • Vocal colour may remain static in the developing singer due to the student’s reluctance to try a range of sounds.
  • The physical effort of producing the voice may be in the wrong place.

It is my experience that many young singers present with what I term a ‘singing noise’. This is often croon-like, may be aspirated and, more importantly, will not have a central core to the sound. Due to the fact that they may have been praised for producing a certain range of sounds during their teenage years, have passed the local examining body’s grade examinations and won prizes in junior competitions, the singer may be unwilling to change their sound and to explore and adopt a new range of alternative sounds suggested by a singing teacher. Those from a church or cathedral choir background may have become reliant on the building to supply the sorts of resonance necessary for carrying the sound and be unwilling to adopt an internal acoustic based on vowel resonance that so clearly typifies the sound made by a professional singer (not just what many may term an opera singer, but also the concert singer and professional chorister).

Furthermore due to the embryonic state of the singer’s technique, developing singers often display a lack of adequate strength of breath support. This normally means that the student is unable to maintain an open and balanced vocal tract. As voice teacher Janice Chapman maintains:

The interaction of airflow and the resonating system cannot be ignored. Inadequate airflow will cause the pharyngeal space to reduce which will affect the beauty of the tone. (Chapman, 2006, 84)

The development of an adequate breath management system is obviously mandatory.

As the throat is not able to be maintained in an ‘open’ posture, a consequence is often a lazy soft palate which further dissipates the sound by allowing air to escape through the nose. The resulting sound produced by the singer may be pleasant but will often show a reduction in the range of overtones that are produced by the voice. It is the spread of overtones in the classical singer’s sound that give the richness to the tone and the energy in the sound required to adequately perform the repertoire. Many authors, such as Vennard (1967), McKinney (1994), Miller (1996) and Chapman (2006) note that it is the resonant potential of the human voice that must be developed to obtain a mature classical sound. The use of speech quality, with the range of partials in the upper spectrum (as defined by Estill), may be a method of introducing a young singer to the possibilities of a resonant sound. McKinney has the following useful observation:

The modal voice has a broad harmonic spectrum, rich in overtones, because of the rolling motion of the cords. (McKinney, 1994, 97)

It is important, therefore, to begin to equip the singer with the techniques suitable for producing a classical sound.

As with most forms of training, there are negative aspects of using speech mode that need to be taken into consideration. These include the following issues:

  • The student may attempt to push speech range beyond its register limit if the mode is confused with chest voice – which may be described as that voice which uses the heaviest mechanism of the voice exclusively.
  • As pitches rise there may be a tendency to tighten the throat and for the larynx to rise excessively.
  • Constriction of muscles in the pharynx and tension in the jaw may occur if the student forces the voice.

The singing teacher will, at the same time as introducing the concept of speech mode, teach a coordinated vocal onset, and be watchful for jaw and throat tension. It is at this stage that the singer’s perception of her own sound may raise some challenges. The introduction of concepts, such as a neutral position for the larynx, will alter the sound and the feeling of singing for the student. Indeed, work on an effective closure of the soft palate will change both the sensations the student will feel and the sound they will hear. These, perhaps new and untried, vocal postures will result in changes in the sound, which may prove difficult for the student to accept. As well as changes in the sound, there may also be changes in the kinaesthetic feedback that a student receives. Bunch notes:

It can be disturbing when one realises that what others hear is different, particularly when the sound is perceived as representing one’s self-image and personality in some cases a teacher will have trouble implementing a change in tonal quality because the student incorrectly feels he is being asked to change his personality. (Bunch, 1997, 11)

There are some basic techniques that singing teachers may introduce to further assist the student towards a more mature sound. These include:

  • The concept of the open throat or retraction of the false vocal folds as identified by Estill (1997). The concept of the ‘pre-yawn’ posture, used by many teachers, will also assist the student to establish an open throat.
  • The alignment of the body, especially the position of the jaw, neck, shoulders and hips will need to be carefully watched to ensure that the student is not adding unwanted and unhelpful tension while singing.

4. The move from speech to singing

I use some simple exercises to enable the student to begin to explore a full sound. I move from speech into singing using basic speech as a starting point.

Some examples of the exercises are:

  • Counting. Simply from 1–5 and back. This corresponds (of course) to a five-note scale. I then add in repeated 5s. I.e. 1 2 3 4 5 5 5 5 5 4 3 2 1. The exercises are begun around middle C. It is important that the student speaks the numbers aloud first; they are then asked to sing the scale while maintaining the feeling of speaking. We begin slowly but gradually increase the speed until the scale can be sung very quickly. What then happens is that the speed of breath flow is increased as an automatic result, moving from a speaking sound to a singing sound. I am especially alert at this stage that the onset of sound is speech like – that is NOT aspirated. With some careful guidance, especially not allowing the student to sing the lower pitches too strongly or loudly, I have found that this simple exercise can allow the student to move from speech to singing while maintaining full adduction and therefore a full sound.
  • Speaking (declaiming) the text aloud. This process typically produces a number of consequences, one of which is to begin to equip the student to declaim recitative effectively. Apart from the obvious point of allowing them to invest the text with meaning, the student is encouraged to find the emotional context of the words and to feel that in their body. This means that we are also now encouraging the abdominal support to engage, an important factor in ensuring an emotional performance. Students are often mystified to find that they are able to speak long vocal phrases when the text is placed on a monotone, yet they are unable to sing the same length of phrase without running out of breath. Once again speech mode assists by promoting a coordinated onset, and speaking out aloud (to an imaginary audience some feet away) engages the support muscles. Students are encouraged to speak the phrase and then to sing it. The balanced vocal tract brought about by using this technique means that pressed phonation is often avoided.
  • Speaking on a monotone enables students to monitor their phonation, checking for aspirant beginnings and places where aspirated consonants and breathy vowels can lose or waste breath during a phrase. There is an adjunct to this technique that I have found very useful. That is the concept of the ‘Spooky Lady’. This was suggested by an American teacher, Clayne Robison, in a workshop at Newcastle University and is the idea of varying the pitch whilst speaking as if telling a story using an exaggerated form of delivery. This technique is useful because it combines the balanced onset with a freedom and flexibility in the support muscles. It allows the student to move within a range of pitches whilst essentially using the ‘monotone’ idea. This is performed in lower and higher ranges and serves to further assist with the maintenance of a uniform quality to the voice.

‘Si canta come si parla’ is an often-quoted phrase from the earlier days of bel canto teaching (Miller, 1996, 74). I have ruminated about a number of ways that one could interpret that phrase – from the obvious ‘sing as you speak’ (which is essentially the subject of this paper) but also that students should be encouraged to speak as they would sing – in other words, to use the vocal apparatus in normal everyday speech as they would whilst singing. This is quite a challenge for many young singers. We learn to speak by experience from childhood and can develop any number of different ways of producing similar sounds and may, in fact, develop speech habits that are detrimental to vocal function. A singer, however, needs a relaxed and open vocal tract, a lifted soft palate and a neutral position for the larynx, all of which may be in completely different positions during everyday speech.

The important thing, as far as increasing the listener’s emotional response to music, is to ensure that there is a balanced range of partials. That is, the fundamental energy of the note is balanced with the range of higher peaks of energy that the voice is capable of producing. The peak of energy around 2800–3500 Hz is known as the singer’s formant and is often referred to as the ‘ring’ of the voice (Vennard, 1967, 96). As well as contributing greatly to the carrying power of the voice, the presence of balanced partials will transfer the emotional content of the music more easily to the listener. If the tone is breathy, fuzzy or overly strident for example, the listener may feel uncomfortable knowing that ‘something’ is lacking. Regardless of how beautiful the initial impression of the sound may be, sooner or later we realize that we want more – we want to be moved. A balanced set of partials will assist here.
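
The relationship between a sung pitch and the singer’s formant region can be illustrated with a little arithmetic. The sketch below is only an illustration: it assumes equal temperament with A4 = 440 Hz (Baroque ensembles often tune lower), takes the 2800–3500 Hz band from the text above, and uses hypothetical function names of my own choosing. It lists which harmonics of a given note fall inside that band.

```python
import math

A4_HZ = 440.0  # assumption: modern concert pitch, equal temperament

# Semitone distance of each natural note from A within the same octave.
NOTE_OFFSETS = {"C": -9, "D": -7, "E": -5, "F": -4, "G": -2, "A": 0, "B": 2}


def note_to_hz(name):
    """Equal-tempered frequency of a note such as 'C4', 'Eb4' or 'F#4'."""
    letter, rest = name[0], name[1:]
    accidental = 0
    if rest[0] == "b":
        accidental, rest = -1, rest[1:]
    elif rest[0] == "#":
        accidental, rest = 1, rest[1:]
    octave = int(rest)
    semitones = NOTE_OFFSETS[letter] + accidental + 12 * (octave - 4)
    return A4_HZ * 2 ** (semitones / 12)


def harmonics_in_band(f0, low_hz=2800.0, high_hz=3500.0):
    """Harmonic numbers of f0 whose frequencies lie inside the band."""
    first = math.ceil(low_hz / f0)
    last = math.floor(high_hz / f0)
    return list(range(first, last + 1))


if __name__ == "__main__":
    for note in ("C4", "F4", "G4", "E5"):
        f0 = note_to_hz(note)
        print(note, round(f0, 1), harmonics_in_band(f0))
```

Under these assumptions, C4 has three harmonics (the 11th to the 13th) inside the band, whereas G4 has only the 8th. Whether those harmonics actually carry energy depends on the vocal quality, which is the point of the discussion above.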

In order to gain a truthful and emotional performance, two main things are needed. Firstly, that the vocal apparatus is ‘set up’ in the most appropriate manner. Secondly, that the voice is rich in overtones and presents the listener with a range of vocal colours, to which the listener may respond – an essential element of Baroque vocal performance. Speech mode may well set the voice up so that the singer can attain both objectives. Recitative, the spoken ‘dialogue’ that is a major component of seventeenth- and eighteenth-century vocal music, requires both clear articulation and a range of vocal colours commensurate with the emotional requirements of the text. If the young singer is using a falsetto quality as a habitual mode of voice production, the resultant sound will not have an even spread of partials. In addition, the use of falsetto quality in the lower range of the voice will mean (as Estill found) that the sound is weak and will neither carry to audience members effectively nor have emotional intensity.

As a way of illustrating the difference that the use of speech quality can make, a student of mine agreed to make a short recording of part of the recitative ‘Thy Hand, Belinda’ from Purcell’s opera Dido and Aeneas. The particular student has a problem with the range of notes previously mentioned, from C4–F4. At F4 she tends to slip in and out of falsetto quality. I asked her to record the recitative firstly without concentrating on what was happening as she negotiated the phrase ‘More I would but death invades me’, which moves around the troublesome area (Eb4–G4). Following that recording, she was asked to speak the recitative aloud as if delivering it to an audience. The recitative was then re-recorded. A spectral analysis was then made of the two recordings – finding an average analysis for the previously mentioned phrase. The analysis was made using the Sony Sound Forge programme. The results of the two recordings are shown in the following graphs.

Recording 1:

Note the area on the graph between 1500 Hz and 4000 Hz.

Recording 2:

Note that the peaks of energy in the 1500–2000 Hz area are more even and that the spread of peaks is more consistent in the area from 3000–4000 Hz in recording 2 than in recording 1.

The spectral graphs show decibels (level) on the vertical axis and hertz (frequency) along the horizontal axis. The major difference between the first and second recording is that the second recording exhibits stronger peaks of energy in the region from 1500–2000 Hz and in the region between 3000–4000 Hz. The spread of the peaks of energy is more even and consistent in the second analysis. The three major peaks of energy seen on the left-hand side of the analyses show as higher peaks on the second analysis, essentially because the volume (energy) in the sound is stronger in the second recording.

The simple analyses would appear to agree with the assertions of Estill and McKinney noted earlier. They suggest that speech quality (or ‘modal voice’ as McKinney terms it) has a broad harmonic spectrum that is rich in overtones (McKinney, 1994, 97). In addition the energy shown in the second analysis shows stronger and more even peaks in the spectral region between 2800–3500 Hz corresponding to the area of energy known as the singer’s formant.
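
For readers who wish to attempt a comparable measurement without Sound Forge, the following is a minimal sketch of the underlying idea: measuring how much of a sound’s energy lies in a given frequency band. It is not the analysis used for the graphs above. The two signals are synthetic tones, not the student’s recordings, and the function names, sample rate, frame length and harmonic amplitudes are all assumptions chosen for illustration. Only the band limits come from the text.

```python
import cmath
import math


def synth_tone(f0, amplitudes, sample_rate=16000, n=512):
    """Sum of sine-wave harmonics of f0; amplitudes[0] is the fundamental."""
    return [
        sum(a * math.sin(2 * math.pi * f0 * (h + 1) * i / sample_rate)
            for h, a in enumerate(amplitudes))
        for i in range(n)
    ]


def magnitude_spectrum(frame, sample_rate):
    """Return (frequencies_hz, magnitudes) for one frame via a naive DFT."""
    n = len(frame)
    # A Hann window reduces leakage between neighbouring partials.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def band_level_db(frame, sample_rate, low_hz, high_hz):
    """Energy inside [low_hz, high_hz] relative to total energy, in dB."""
    freqs, mags = magnitude_spectrum(frame, sample_rate)
    total = sum(m * m for m in mags)
    band = sum(m * m for f, m in zip(freqs, mags) if low_hz <= f <= high_hz)
    if band == 0 or total == 0:
        return float("-inf")
    return 10 * math.log10(band / total)


SAMPLE_RATE = 16000
F0 = 311.13  # approximately Eb4 when A4 = 440 Hz (an assumption)

# Hypothetical stand-ins: upper partials fall away quickly in the first
# tone and more slowly in the second.
weak_tone = synth_tone(F0, [1 / (h + 1) ** 2 for h in range(12)], SAMPLE_RATE)
rich_tone = synth_tone(F0, [1 / (h + 1) for h in range(12)], SAMPLE_RATE)

weak_db = band_level_db(weak_tone, SAMPLE_RATE, 2800, 3500)
rich_db = band_level_db(rich_tone, SAMPLE_RATE, 2800, 3500)

if __name__ == "__main__":
    print("2800-3500 Hz band, weak upper partials: %.1f dB" % weak_db)
    print("2800-3500 Hz band, rich upper partials: %.1f dB" % rich_db)
```

With real recordings one would average this measurement over many frames of the phrase, as the average analysis described above does, and the same caveats about microphone quality and recording conditions would apply.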

5. Conclusions

Although this research is at an early stage, there are some conclusions that may be drawn.

  • Speech mode may be an ideal way to find the core of the voice, the individual sound of the person.
  • It is an easy concept for the student to grasp, and, perhaps more importantly, one they can work with outside the studio.
  • Working on general speech quality and the speaking voice has a considerable effect on the singing voice.
  • Using speech mode to rehearse and declaim the text of recitatives and songs has a strong application for an emotional connection to the text.

There is much still to be done, including the use of a more sensitive microphone and more favourable conditions under which to obtain samples, as well as learning to use the Sound Forge programme to greater effect. The singing students with whom I work confirm the conclusions I have noted above. I can say anecdotally that, having worked with speech mode for some time now, I find that its use promotes the development of a unified vocal quality which can be used effectively throughout the entire singable range required for the majority of baroque and classical works.

For the singer at a transitional stage of development, it has proven to be an effective method of establishing a coordinated onset and of assisting in building a dynamic breath management system. All of this means that the student’s sound improves, the voice carries well in a hall and – again anecdotally – they find an ease of production that they may not have experienced before. As well, a range of vocal colours is encouraged, using speech as a modelling factor, which can further enhance the student’s performance. Although it is certain that dynamic vocal colour will play a part in a performance of any musical style or period, the extensive use of recitative in the Baroque era necessitates the use of a range of vocal colours in order to clearly communicate the emotional content found therein. The student must be able to negotiate such a range of colour in order to effectively portray the varied emotion of recitative.

6. Wider application

As all of the concepts noted above are based on both scientific and technical knowledge including experience in the vocal studio, the use of speech mode and the technical concepts of open throat and breath support may be transferred to assist any professional voice user. They also have application to professional singers – of any genre – with appropriate adjustments for style and delivery. Effective posture for the voice user has significant effects on the efficiency of vocal production and may be applied to any profession that requires extensive use of the voice. Further research into areas of muscle use including tension and relaxation may provide more insights into the effective function of the voice whether it is being used for speech or singing.


BUNCH, M. (1997) Dynamics of the Singing Voice, New York, Springer Wien.

CHAPMAN, J. L. (2006) Singing and Teaching Singing: a Holistic Approach to Classical Voice, Abington, Plural Publishing, Inc.

ESTILL, J. (1997a) Compulsory Figures for Voice: a User's Guide to Voice Quality. Level Two, Six Basic Voice Qualities, Santa Rosa, Estill Voice Training Systems.

ESTILL, J. (1997b) Compulsory Figures for Voice: a User's Guide to Voice Quality. Level One, Primer of Basic Figures, Santa Rosa, Estill Voice Training Systems.

KLIMEK, M. M., OBERT, K. & STEINHAUER, K. (2005) The Estill Voice Training System. Level Two, Figure Combinations for Six Voice Qualities: Workbook, Estill Voice Training Systems International, LLC.

MCKINNEY, J. C. (1994) The Diagnosis and Correction of Vocal Faults, Nashville, Genevox Music Group.

MILLER, R. (1996) The Structure of Singing: System and Art in Vocal Technique, Belmont, CA, Wadsworth Group, Thomson Learning Inc.

VENNARD, W. (1967) Singing: the Mechanism and the Technic, New York, Carl Fischer Inc.

Christopher Allan

Well known as a soloist with major choral societies such as the Sydney Philharmonia Choirs, Newcastle University Choir and chamber choir Coro Innominata, Christopher Allan is respected for his interpretation of works ranging from the sixteenth century to the present day. Recent major performances have included appearances in Bach’s St John and St Matthew Passions, Messiah, Orff’s Carmina Burana and the Requiems by Mozart and Duruflé. From 1994 to 2001 Christopher Allan was guest artist with the Sydney-based vocal ensemble The Song Company, with whom he has performed both Early Music and contemporary works in concert, for broadcast and on CD. With The Song Company he appeared in the world premiere performances of Vincent Plush’s Funereal Rites and Raffaello Marcellino’s Fish Tales. Additionally as an ensemble singer, he is a member of the professional chamber choirs ACO Voices and Cantillation, both in concert and on recording. For many years as a member of the Opera Australia Chorus, Christopher Allan has appeared in such operas as Billy Budd, Tannhäuser, The Barber of Seville, Turandot and Don Carlos as well as undertaking understudy work for the company. In recital he has presented several major cycles including Copland’s American Songs and Vaughan Williams’ Five Mystical Songs. Christopher is a noted interpreter of the vocal works of Nigel Butterley, most recently giving a performance of Butterley’s Blake Songs at Canberra’s National Library and at Newcastle Conservatorium with the composer as an accompanist. Christopher Allan is Deputy Head of the School of Drama, Fine Art and Music and Head of the Vocal Department at the University of Newcastle and is presently undertaking PhD studies in vocal pedagogy.

  1. Adduct: to draw together, in this case the word describes the inner edges of the vocal folds meeting along their length.