D.G. Malham

Department of Music

University of York.

Updated version of paper presented at AES UK "Second Century of Audio" Conference, London, 7-8th June 1999

In the dying years of the twentieth century, after more than a hundred years of recorded sound and half a century in which the use of two channel stereo has been widely regarded as synonymous with the high fidelity reproduction of recorded music, multichannel surround sound has finally begun to make real inroads into the audio market. This has largely been engendered by the public's exposure to a growing number of films making use of what we may term "cinema style" surround systems. Unfortunately, cinema style systems, just like their two or three channel stereo predecessors, are firmly rooted in the classical Western concert paradigm of the performers being positioned on a restricted stage area situated in front of a seated audience. The assumption that all music can be treated the same way is totally unwarranted and is simply untrue for much non-Western music, for Western music before the Seventeenth Century and for much contemporary music, from Electroacoustic to Techno. It is the author's firmly held belief that the term "High Fidelity" should mean just that - fidelity to the original performers' and composers' musical intentions. This should include all the spatial aspects of the performance as well as the more obvious requirements of flat frequency response and low distortion. As we move into the new century, we should be concentrating on ways to achieve this and it is the purpose of this paper to review some of the progress made so far and look at the prospects for further developments.


It is useful here to introduce definitions of two terms related to the nature of soundfields and, in particular, those which a recording/reproduction system generates. A homogeneous sound reproduction system is defined as one in which no direction is preferentially treated and a coherent system as one in which the image remains stable, ie is subject to no significant discontinuities, if the listener changes position within it, though the image may change as, indeed, a natural soundfield does. Note that for the purpose of this paper, a natural soundfield will always be assumed to be both homogeneous and coherent. Even though it is possible to conceive of ways in which a natural soundfield might not be homogeneous or even coherent, we will be concerned solely with the homogeneity and coherency of the sound reproduction system since high fidelity requires that the recording and reproduction system in use does not impose its own spatial context.

Cinema Surround Systems

Cinema systems are, in general, not homogeneous, since the location of sounds coming from on-screen action is far more rigorously controlled than that of sounds coming from off-screen, especially in the rear surround area. This is widely regarded as essential in the case of the presentation of visual material on a front-mounted screen, since a lack of correspondence between visual and audio images (especially when on-screen characters are speaking) has been found to be very distracting. On the other hand, such systems can, in one sense, be regarded as coherent since the coherency of the frontal image is ensured by the use of decorrelated sounds in the rest of the audio image. This is usually achieved by delaying and diffusing the sound coming from the surround speakers and restricting their frequency response (although this is not the case in newer systems). Despite the fact that they were not originally intended for the purpose and the fact that they meet the letter rather than the spirit of coherency, such systems are being pressed more and more into use for music recording and composition. As we have seen above, it can be argued that the ideal system for music is one in which the image of the reproduced soundfield, whether recorded or synthesised, is both homogeneous and coherent. As a result of a deliberate design choice, Cinema Style surround does not meet the first of these criteria and only meets the second in a limited fashion. It seems from an informal analysis of the various reports that have been published, that recording engineers and producers who are trying to use these systems for pure music recordings may, through the way they distribute sounds between the limited number of channels in use, be attempting to circumvent the lack of homogeneity in an empirical fashion. For instance, it is possible, by careful tailoring of the speaker feeds, to approach the homogeneous/coherent criteria within the context of a particular system's actual layout(1). 
However, as has long been recognized [1], if recordings made using this approach are to work on more than just the originating system, similarity of layout is essential. Unfortunately the lack of effective standardisation of loudspeaker locations for Cinema Style systems (despite the existence of a standard in BS1116) rules this approach out. Indeed, since it is widely recognized that only a minority of existing stereo system owners have their speakers correctly set up, it seems unlikely that standardisation could ever form the basis of a practical mass market system. An alternative is to develop systems based on the concept of utilising a transformation matrix between the loudspeaker layout in the originating studio and the listener's home system.

An early but good example of this approach is the Ambisonic system devised in the 1970's by Michael Gerzon, Peter Fellgett, Peter Craven and Geoffrey Barton [2][3][4], which was also independently developed by Cooper and Shiga [5]. In the Ambisonic system, the sounds and their directional components are encoded vectorially in a set of spherical harmonics. In the simplest fully three dimensional version (ie in a first order system), only four components are required, the four signals collectively being termed the B Format signals. By applying a suitable transformation matrix (or decoder) almost any regular, three-dimensional array of speakers placed around a listener may be driven. The results over the whole of the sphere round the listener can be nearly as good as stereo is capable of over a mere 60° arc in front of the listener. The nature of B format is such that it can be treated computationally as a single entity. Note that it does not matter whether a soundfield encoded in this manner contains only a single sound source or, as in the case of a natural soundfield, a multiplicity of them with a multiplicity of different positions. In either case, it can be subject to transformations, such as rotation, tilting, tumbling, or mirroring by applying identical algorithms [6][7]. Many different transforms can be applied simultaneously to an arbitrarily complex B format coded soundfield using, for the first order case, just one multiplication of the 4x1 input signal matrix by a 4x4 matrix of coefficients [8]. The computing power required to do so in real time, even on better than CD quality audio, is easily within the reach of most modern PCs or workstations. The approach can even be used to form the basis of a spatial computing engine within a system intended to output binaural sound to headphones or to speakers using transaural algorithms [9]. 
This approach is now in use in the Lake DSP Huron processor where it is employed in order to reduce the computational loading which is associated with pure binaural systems where realistic or near-realistic soundscapes are involved. By placing all the sound sources in a B format soundfield including, if required, full complexity natural soundfields recorded with a Soundfield microphone [10][11], the processing involved in rotating, tilting, etc. the full soundfield as required in a head tracking configuration is significantly simplified compared to that involved when directly processing the HRTF's. The B format signals can then be decoded to virtual speaker feed signals and only these need to be passed through HRTF filters. Since these are limited to a single fixed set of HRTF's, it is possible to do all necessary operations on standard hardware, even when full head tracking is in use.
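The single-matrix soundfield transformation described above can be illustrated with a minimal sketch. The sign conventions and the 0.707 weighting on W are assumptions drawn from common first order B format practice rather than from any specific implementation discussed here.

```python
import math

def rotate_z(b, alpha):
    """Rotate a first-order B-format sample [W, X, Y, Z] about the
    vertical axis by alpha radians (anticlockwise) using a single
    4x4 matrix multiplication, as described in the text."""
    c, s = math.cos(alpha), math.sin(alpha)
    m = [
        [1, 0,  0, 0],   # W is omnidirectional: unchanged by rotation
        [0, c, -s, 0],   # X' = X cos(alpha) - Y sin(alpha)
        [0, s,  c, 0],   # Y' = X sin(alpha) + Y cos(alpha)
        [0, 0,  0, 1],   # Z (height) is unchanged by a horizontal rotation
    ]
    return [sum(m[i][j] * b[j] for j in range(4)) for i in range(4)]
```

A soundfield of arbitrary complexity is rotated by the same four-by-four product; a source encoded at centre front (W = 0.707, X = 1, Y = 0, Z = 0) moves to hard left after a 90° rotation.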

Binaural Systems
Although binaural systems are in general homogeneous, they can only meet the criteria set out above for coherence if head tracking can be used. Since, at present, it is not possible to apply the necessary spatial manipulations to a binaurally recorded natural soundfield, this is currently only practicable in the case of a fully synthesized binaural soundfield or one which was originally recorded using a homogeneous/coherent technique such as Ambisonics. This restricts fully homogeneous/coherent applications of binaural techniques to Virtual Reality systems, computer games and multitrack studio-based recordings. The reader will no doubt have noted that the limitations of binaural recordings in this respect have very carefully been tagged "at present". Since this paper is to be given at a conference entitled "Audio - the second century", it has to be borne in mind that the number and capacity of the transmission/recording channels we have available will continue to increase and that it is by no means unthinkable that in not too many years it may be practicable to distribute multichannel binaural recordings from a dummy head with a multiplicity of "ears", from which head tracking could be used to select (and, if necessary, interpolate between) pairs according to the direction the listener is facing. Although such a system might lose some of the detail in the HRTF's (since the "head" would probably have to be simplified to a sphere), it is the present author's experience that adding head tracking can stabilise even very poor binaural presentations such as those caused by a severe mismatch between the recorded HRTF's and the listeners' own. It is the author's understanding that partial versions of such systems, operating in the horizontal plane only, have already been demonstrated. This approach does not necessarily have to lead to a totally unthinkable number of channels. 
Given the fact that storage capacity of computing systems is doubling roughly every eighteen months, we should certainly be prepared to think about such possibilities. For instance, if 15° resolution in the horizontal and vertical planes were sufficient, which it might be with suitable interpolation algorithms, then only 272 channels would be needed [12] and the computing power necessary to interpolate between the selected channel pairs for a given head orientation would not be excessive, even for a typical home computer now, let alone what might be commonplace in a few years' time.

A further, and probably insurmountable, problem inherent in using headphones alone to present sonic images is that the air-ear-brain path is not the only one through which we perceive sound, since bone conduction is known to play a part as are body cavity resonances where bass frequencies are concerned. For full fidelity, these mechanisms also need to be allowed for.

Pure binaural systems also fail to meet the coherence criterion when dealing with recorded natural soundfields, except within a very small area. This is due to the fact that under lateral translation the imaging rapidly becomes chaotic as the listener position ceases to be equivalent to the recording position. Of course, if it is regarded as permissible for the listener to "carry" the sound image with him/her, then a restricted form of coherency remains, based solely on the correctness of the image under rotational variation. However, this is sufficiently divorced from reality not to be viable for the majority of cases, even in the long run.

Loudspeaker Systems
Non-binaural systems, ie those based solely on loudspeaker presentation of material, have their own problems. For systems based on an extension of the simple form of phantom imaging employed in two channel stereo systems, homogeneity is difficult if not impossible to achieve. Any source placed in the direction of an individual speaker always tends to be preferentially treated. This can also be seen in Poletti's modifications to the Ambisonic system as contained in his 1996 paper [13]. Although the laws for panning sounds that he proposes do lead to lower errors, this lowering is very unevenly spread, with significantly smaller errors occurring in the direction of the speakers. The system cannot, therefore, meet the homogeneity criterion, even though it should meet the coherency criterion provided an even distribution of speakers is used and the listener is not too close to any one speaker. However, for systems based on 5.1 or other cinema related layouts with uneven angular separations of the speakers, there is little doubt that it would be very difficult to achieve both homogeneity and coherency simultaneously.

For material which is amplitude panned between pairs (or at most, small numbers) of speakers, there are always likely to be problems associated with sound sources placed between speakers because of the nature of phantom images. To avoid this, one option would be to use a single speaker for each direction. For a system such as this to be fully homogeneous/coherent it would need to match the resolution of the human directional perception mechanisms. These have a resolving capacity of 1° at best [14]. If the transducers were positioned on a spherical surface around the listening area so as to match this Minimum Audible Angle (MAA), over 40,000 point sources would be required(2) and it may be some considerable time before we can contemplate this in the home! Moreover, there are equal if not greater problems inherent in trying to devise ways of capturing the necessary information from natural soundfields although if such a high capacity system could be implemented it would be amenable to the use of transformation matrices. We must therefore look at other approaches in order to reduce the problem to manageable levels.

There are presently three systems which allow for the use of the transformation matrix approach to differences between studio and home installations which are also capable of performances close to that implied in the definition of a homogeneous and coherent system. They are:

Wave Field Synthesis [15][18], holophony [16][19] and Ambisonics [2][3][4].

These have, in the past, been presented as competing technologies but it has emerged in recent papers that these approaches are closely related [17][18]. For instance, Ambisonics was recently shown [16] to be a special case of holophony(3) and both holophony and Wave Field Synthesis can be regarded as equivalent in principle (if not in implementation detail) as they both exploit Huygens' Principle(4) so that they can substitute a more limited number of real sources (ie loudspeakers) for the imaginary sources making up the wavefront.

Indeed, Nicol has stated(6)

"Wave Field Synthesis is one particular implementation of Holophony: it is based on the general Holophonic approach, but it uses some approximations/simplifications which make the specificity of the system. For instance, in theory, the acoustical field must be recorded over a surface: in the Wave Field Synthesis system, it is recorded along a curve. (and in the implementation of holophony in [16] - author). Moreover, the Wave Field Synthesis system uses specific solutions to the problems of truncature and spatial aliasing raised by the implementation of an Holophonic system."

All three systems, then, take essentially the same approach in that they all attempt to reconstruct in the listening space a set of audio wavefronts which matches those in the recording space. Of course, the systems are not restricted to recorded natural soundfields, since a set of audio wavefronts can be synthesized which will produce the required soundfield in those situations where either it is not desirable to retain the original natural soundfield (such as when the recording venue has poor acoustics) or where the recording itself is the artistic entity.

Wave Field Synthesis

Wave Field Synthesis derives from the work of Berkhout (see, for instance, Berkhout [20]) which in turn is related to the original Bell Labs work of the 1930's [21][22] which used a curtain of microphones hung in front of the sound source feeding a similar curtain of loudspeakers. In much of the published work that has been reviewed for this paper [20][15][18] the system is defined in terms of planar structures of transducers, whether microphones or loudspeakers. The elements in these structures can have their signals weighted so as to minimise the effects of spatial aliasing resulting from the excessive spacings necessary to allow for the non-zero size of transducers in a practical array(7). Further reductions in the large numbers of transducers involved are achieved by using one dimensional, linear arrays [18] rather than plane surfaces. A maximum of four arrays in a square is referred to, although there seems to be no theoretical reason why there should be any such limit. It is not entirely clear from the published work that I have read whether practical systems are available for homogeneous and coherent reproduction of natural soundfields, even when limited to the simplest horizontal case.

For instance, Horbach states [18]:

"...it must be realised that it is very difficult to record only the directional field from behind the array, while suppressing the components from the other half space."

and he then goes on to suggest that it is much better to synthesise the Wave Field than record it. Other papers, however, indicate that it is at least possible to use a line of microphones connected to a line of speakers, so that sound coming from a stage (note the use of the western paradigm) can be amplified or recorded. Note that although this is a volume solution with its concomitant high coherency it would not be homogeneous.
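The idea of driving an array so that it reconstructs the wavefront of a virtual source can be caricatured by a simple delay-and-sum sketch: each loudspeaker is fed the source signal delayed and attenuated according to its distance from the virtual source position. This is a crude illustration of the principle only, not Berkhout's actual synthesis operator, and the geometry and speed of sound used are assumptions.

```python
import math

def array_feeds(source, speakers, c=343.0):
    """For a virtual point source behind a linear loudspeaker array,
    return a (delay_seconds, amplitude_weight) pair per speaker: the
    further a speaker is from the virtual source, the later and the
    weaker its feed, so the superposed outputs approximate the
    source's curved wavefront in front of the array."""
    feeds = []
    for sx, sy in speakers:
        d = math.hypot(sx - source[0], sy - source[1])
        feeds.append((d / c, 1.0 / d))
    return feeds
```

For a source directly behind the centre of the array, the centre speaker fires first, with the outer speakers progressively delayed, which is exactly the curvature sketched in the Huygens' principle appendix.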

Holophony
In holophony (as described by Nicol) a spherical surface is employed, rather than a planar one, although this is simplified to a circular array in a single plane for the same reasons of practicality as the choice of one-dimensional arrays in Wave Field Synthesis. With matching loudspeaker and microphone arrays having both monopole (ie omnidirectional) and dipole (ie figure-of-eight) transducers, a one-to-one connection of the two arrays results in a full reconstruction of the plane wave within the array, up to a maximum frequency set by the transducer spacing. This is approximately 1.4 kHz for a 12cm transducer spacing. Although this seems rather poor, perceptual experiments have shown that this does not severely degrade spatial imaging [23]. If the source direction is known, monopole microphones and loudspeakers can be used. However, this necessitates the use of a spatial weighting in which, in the simplest case, only those loudspeakers directly facing the direction of the original plane wave are switched on. This is obviously impractical under most real conditions of use and as a result degradation of the reconstruction has to be tolerated if recordings of real soundfields are required. This may lead to it failing on coherency, though it is homogeneous in the horizontal plane in current practical implementations. With a spherical system it should be fully homogeneous.
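The approximately 1.4 kHz limit quoted above for a 12cm spacing is consistent with the usual spatial sampling criterion of half a wavelength per transducer spacing; the following sketch assumes that criterion and a speed of sound of 343 m/s.

```python
def aliasing_frequency(spacing_m, c=343.0):
    """Upper frequency limit for correct wavefront reconstruction by an
    array of transducers spaced spacing_m apart, assuming the spacing
    must not exceed half a wavelength (spatial Nyquist criterion)."""
    return c / (2.0 * spacing_m)
```

With a 12cm spacing this gives just over 1.4 kHz; halving the spacing (and so doubling the transducer count along each dimension) doubles the usable bandwidth.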


Ambisonics
Ambisonic systems also work by reconstructing wavefronts. However, unlike either Wave Field synthesis or holophony, the sampling and reconstruction are done at a point, rather than over an area. As such, the demands for channel capacity are much reduced. The directional information is encoded in a set of channels which are related to each other and the directions of the sound source by coefficients defined in terms of the spherical harmonic series (see appendix B). Basic, ie first order, Ambisonic systems have been in use since the 1970's but it is only in the last few years that any serious work has been published on second and higher orders [24][25]. In these papers it is shown that whilst the Ambisonic system produces full reconstruction - and hence coherence and homogeneity - for all frequencies and directions only at the central point, the area over which the error is low increases asymptotically with increasing order. The frequency at which any given level of error is exceeded also increases with increasing order. The system is, like holophony, inherently homogeneous and is coherent at the centre, at least for head rotation at lower frequencies. Because it is performing reconstruction at a point, it cannot fully meet the criteria for coherency under conditions of translation. However the degradation as the listener moves off centre is relatively graceful and sufficiently similar to what would happen in a natural soundfield as not to be unduly disturbing to a listener [26]. Daniel has shown [24] by a statistical analysis of the pattern of localisation errors which occur as a listener moves off centre that the wide area performance is very significantly improved by simply going from first to second order, all other things staying the same. Unfortunately, there is currently no second (or higher) order microphone system available. This restricts systems using components above first order to synthesised images. 
However, as the papers published by Nicol and Emerit have shown, Ambisonics can be regarded as a special case of holophony. It is possible, therefore, to think in terms of using a Holophonic microphone array to produce the signals required for higher orders of Ambisonics [17]. How practical this might be remains to be seen since, as far as the author is aware, no rigorous error analysis has yet been undertaken, other than those relating to the spatial aliasing arising from the practical limitations on the number of microphones used. Error sources for which analysis results have not yet been published include variations in microphone sensitivity, frequency response and position. Some work has been done on the effect of using different polar patterns for arrays of nonidentical loudspeakers [27] but nothing has been published on the effect of polar plot variations between supposedly identical microphones or with frequency in individual microphones. Furthermore, the acoustic interactions and disturbances caused by having so many microphone capsules occupying a small volume, which can already be observed in the first order soundfield microphone, have not been examined though they will undoubtedly pose more of a problem than in the first order case. This will be especially relevant in the spherical version necessary for full homogeneity.

Despite these problems, Ambisonics still seems to represent the most practical system to pursue in the short and medium term. In particular, its elegant and highly efficient method for encoding directional information has been identified as extensible to storing many other forms of audio related information. Spherical harmonics have, for instance, been shown to be an efficient way of representing HRTF's [28]. Both Horbach [18] and Nicol [17] have shown that it is an efficient channel encoding method for volume solution wavefront reconstruction methodologies as well as for single point ones. The present author published at an earlier AES-UK conference a proposal for using spherical harmonics for encoding all relevant information about the channel contents of a multichannel recording [29]. For recordings made with nonhomogeneous or non-coherent systems this could include details such as speaker placement, sensitivities and frequency responses. A detailed description such as this, used in conjunction with the sort of computing power likely to be available to him/her, would allow the listener of the future with their fully homogeneous and coherent system to experience earlier recordings, even where they are not in themselves homogeneous/coherent, exactly as intended by the original recording engineer or producer.

Mixed Systems

In the medium term, mixed systems may appear as proposed by both Nicol and Horbach, with higher order Ambisonic at-a-point reconstruction used in combination with a limited implementation of a volume reconstruction solution such as holophony. This would give near full fidelity in the central area from the Ambisonic system together with the enhanced area of coherency typical of the use of a volume reconstruction solution. In the long term, full, three dimensional, volume reconstruction solutions will almost certainly become technologically feasible. Whether they are ever deployed for home systems (or even commercial or research ones) is likely to rest on considerations other than the technological, it being difficult enough to fit the five speakers plus sub-woofer of a cinema style system into the average living room, let alone the kind of numbers the sort of systems discussed here would require and the cost that would seem to imply. On the other hand, the very numbers involved may provide their own solution: since no one speaker would have to provide any significant level of sound output (a fact already noted in Ambisonic systems), they could be very small. Moreover, the appearance of flat panel transducers (Bank, 1998) with reasonable fidelity points the way to future possibilities in that direction. The cost may also be driven down by the large numbers involved.

Conclusions
It has been shown that for an audio system to merit the term "high fidelity" in the fullest sense, it must meet not only the goals of low distortion and wide, flat frequency response, but must also be both homogeneous in its treatment of the 3-D audio space the listener is immersed in and capable of remaining coherent as the listener moves within that space. For the home listener, those criteria are currently most closely met by systems based on the Ambisonic approach. In the medium to long term, as storage capacities of media and channel capacities of transmission systems (wired or un-wired) continue what appears to be an inexorable rise, volume solutions to recording, synthesising and reproducing soundfields will become practical for in-home use, not just for specialised commercial and research applications. If the appropriate approach is used and if sufficient information is available, all current approaches to surround sound recording could be transparently subsumed within such a system. This would make it possible to leave all the original artistic decisions about how and where to place material within the limitations of today's nonhomogeneous, non-coherent systems intact whilst allowing engineers and producers using the new system full artistic freedom in their use of sound in space.


Appendix A
The number of transducers with an angular separation of a radians that are needed to cover the surface of a unit sphere, ie one with a radius of one, can be approximated as follows, provided a is small enough. The transducers can be considered as occupying a circular area on the surface of the sphere generated by the intersection of the conical angle a with the surface. The radius r of such a circle is

r = sin(a/2)

and if the angle is small enough, the area A of the circle approximates to

A = pi(a/2)^2

As the surface area of a unit sphere is 4pi and the packing efficiency of circles on a surface pi/4 for square packing, the number of transducers that can be packed in this fashion is approximated by

N = (4pi x pi/4)/A = 4pi/a^2

Note that there are a number of approximations in this, not least in that there are more efficient packing schemes, but for small values of the conic angle, it gives at least an indication of the order of magnitude of the required numbers of transducers, although it does tend to underestimate somewhat. If the system is oversampled by a factor of 2, the number of transducers required is over 160,000.
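The approximation above can be checked numerically; the sketch below simply evaluates N = 4pi/a^2 for an angular separation given in degrees, and reproduces the "over 40,000" figure quoted for 1° separation and the "over 160,000" figure for twofold oversampling (ie halving the angular separation).

```python
import math

def transducer_count(a_deg):
    """Approximate number of point sources needed on a sphere for an
    angular spacing of a_deg degrees: N = 4*pi / a^2 with a in radians,
    combining the circle area pi*(a/2)^2 with the pi/4 square-packing
    efficiency of circles on a surface."""
    a = math.radians(a_deg)
    return 4 * math.pi / a ** 2
```

At 1° this gives roughly 41,000 transducers; at 0.5° (oversampling by a factor of two in each direction) roughly four times as many.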

Appendix B
The position of a sound source S within a three-dimensional soundfield is encoded in the four signals which make up the B format thus;

W = S x 0.707 (omnidirectional)
X = S x cos(phi)cos(theta) (front-back)
Y = S x sin(phi)cos(theta) (left-right)
Z = S x sin(theta) (up-down)

where phi is the anticlockwise angle from centre front and theta is the elevation. These signals are equivalent to three figure-of-eight microphones at right angles to each other, as indicated by the directions in the brackets. These three signals, together with an omnidirectional (pressure) signal, all have to be effectively coincident over the frequency range of interest. This defines the standard zeroeth and first order Ambisonic B format signals.

Second order components vary at twice the angular rate of the first order ones. For a fully homogeneous system, in which the height component is to be included, five extra channels are required. However, for a horizontal only second order system the required additional components are (after Bamford [25]);

U = S x cos(2 phi)
V = S x sin(2 phi)

giving a total of five channels, W, X, Y, U and V, which fits nicely into current cinema style distribution systems.
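The horizontal encoding equations of this appendix can be collected into a short sketch; the 0.707 weighting on W follows the convention stated above, and the second order components follow Bamford-style horizontal definitions.

```python
import math

def encode_2h(phi):
    """Encode a unit source at azimuth phi (anticlockwise from centre
    front, in the horizontal plane) into the five channels W, X, Y, U, V:
    first order components vary at the angular rate phi, second order
    components (U, V) at twice that rate."""
    return (1 / math.sqrt(2),      # W: omnidirectional, -3 dB weighting
            math.cos(phi),         # X: front-back
            math.sin(phi),         # Y: left-right
            math.cos(2 * phi),     # U: second order, cos(2*phi)
            math.sin(2 * phi))     # V: second order, sin(2*phi)
```

A source panned to hard left (phi = 90°) gives X = 0, Y = 1 as expected, while U = -1 and V = 0, showing the doubled angular rate of the second order pair.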

One important factor which the presence of second order components influences is the maximum directivity of the source directional response (SDR) pattern [26] that each speaker exhibits as the apparent source location moves. For first order systems, the maximum ratio of response in the front half of the polar pattern to that in the rear half (ignoring polarity inversions) is only 9 dB, whereas in a second order system, this is increased to nearly 25 dB. It is this factor, coupled with the higher frequency up to which full wavefront reconstruction operates, which is the physical mechanism behind the improved coherence of a second order system. Listeners moving off centre have to move further towards those speakers which are away from the source direction before the sound level from them seriously distorts the desired directional impression. Moreover, the increased directivity of the SDR pattern, coupled with larger numbers of speakers, results in there being a reduced reliance on phantom imaging as system order increases. In the limit, the directivity would make possible the use of a single speaker per MAA, but well before that we may find that the additional benefit of going up another order ceases to be worthwhile.


Wavefront reconstruction by Huygens' principle

Since both holophony and Wave Field Synthesis use Huygens' principle to reconstruct wavefronts, it is shown here graphically. A linear array of eight transducers is used to recreate a curved wavefront as if it were originating from a source on the left. Note that in practice, the monopole (ie omnidirectional) speakers used in this array will result in a backwards propagating wavefront which will reflect off room boundaries and may cause problems. The use of unidirectional transducers, or omnidirectional ones mounted in the room boundaries, is indicated.


[1] Weiland, F. Chr. 1975. "Electronic Music - Musical Aspects of the Electronic Medium" Institute of Sonology, Utrecht State University (internal publication)

[2] Gerzon, Michael A. 1972 'Periphony: With-height Sound Reproduction' Journal of the Audio Engineering Society, Vol. 21 No. 1 Jan/Feb 1972 pp.2-10

[3] Gerzon, Michael A. 1975a "Ambisonics. Part two: Studio Techniques" Studio Sound, August 1975 pp24-26,28 & 30

[4] Fellgett, Peter, 1975 'Ambisonics. Part one: general system description' Studio Sound, August 1975 pp20-22 & 40

[5] Cooper, D.H., and Shiga, T. 1972. 'Discrete Matrix Multi-channel Stereo' Journal of the Audio Engineering Society, Vol. 20, No. 5, June 1972 pp. 346-360

[6] Gerzon, M. A. 1975c 'Panpot and Soundfield Controls' NRDC Ambisonic Technology Report No. 3, August 1975.

[7] Gerzon, M. A. 1975d 'Artificial Reverberations and Spreader Devices' NRDC Ambisonic Technology Report No. 4. August 1975.

[8] Malham, D.G., 1987 'Computer Control of Ambisonic Soundfields' Preprint No. 2463(H2) presented at the 82nd AES convention 1987 10-13 March, London

[9] Malham, D.G. 1993 "3-D sound for virtual reality using Ambisonic techniques" Published as an addendum to the proceedings of the third annual conference on Virtual Reality. VR93 - Virtual Reality International 93. London April 1993. Westport: Meckler

[10] Gerzon, M. A. 1975b ' The Design of Precisely Coincident Microphone Arrays for Stereo and Surround Sound' Preprint No. 20, 50th Convention of the Audio Engineering Society, London, March 1975

[11] Farrar, K. 1979 "The Soundfield Microphone" Wireless World, November 1979, pp. 99-103

[12] Begault, D. R. 1994 "3-D Sound for Virtual Reality and Multimedia" AP Professional, Boston, San Diego, New York, London, Sydney, Tokyo, Toronto, p.143

[13] Poletti, M. 1996: "The Design of Encoding Functions for Stereophonic and Polyphonic Sound Systems" Journal of the Audio Engineering Society, Vol.44, No.11, November 1996, pp:948-963

[14] Strybel, T.Z., Manligas, C.L., and Perrott, D.R. 1992 "Minimum Audible Movement Angle as a Function of the Azimuth and Elevation of the Source" Human Factors, Vol. 34, pp. 267-275, quoted in Begault 1994, page 50.

[15] Boone, M.M., Verheijen, E.N.G. and Jansen, G. 1996 "Virtual Reality by Sound Reproduction Based on Wave Field Synthesis" 100th Convention of the Audio Engineering Society, preprint 4145, Copenhagen, May 1996

[16] Nicol, R. and Emerit, M. 1998 "Reproducing 3D-Sound for Video Conferencing: A Comparison Between Holophony and Ambisonic" Proceedings of the First COST-G6 Workshop on Digital Audio Effects (DAFX98), Barcelona, November 1998, pp. 17-20 http://www.iua.upf.es/dafx98/

[17] Nicol, R. and Emerit, M. 1999 "3D-Sound Reproduction over an Extensive Listening Area: a Hybrid Method Derived from Holophony and Ambisonic" The Audio Engineering Society 16th International Conference on Spatial Sound Reproduction, Helsinki 1999, preprint no. s66819

[18] Horbach U., Boone, M.M., 1999 "Future Transmission and Rendering Formats for Multichannel Sound" The Audio Engineering Society 16th International Conference on Spatial Sound Reproduction, Helsinki 1999, preprint no.s59711

[19] Jessel, M. 1973 "Acoustique théorique, propagation et holophonie" Masson, Paris 1973 (referenced in Nicol and Emerit 1998)

[20] Berkhout, A.J. 1988 "A Holographic Approach to Acoustic Control" Journal of the Audio Engineering Society, Vol. 36, No. 12, December 1988 pp. 977 - 995

[21] Snow, W.B. 1953 "Basic Principles of Stereophonic Sound" Journal of the SMPTE vol 61, November 1953, pp 567 - 589, reprinted in "Stereophonic Techniques", ed. John Eargle, Audio Engineering Society, New York 1986 pp 9 - 31

[22] Fox, Barry 1982. "Early Stereo Recording" Studio Sound, Vol 24, No. 5 May 1982, p36-42

[23] Start, E. 1997 "Direct Sound Enhancement by Wave Field Synthesis" Ph.D. Thesis, Delft University of Technology, Netherlands - quoted in Nicol, 1999

[24] Daniel, Jérôme, Rault, Jean-Bernard and Polack, Jean-Dominique 1998 'Ambisonics Encoding of Other Audio Formats for Multiple Listening Conditions' preprint no. 4795, 105th Audio Engineering Society Convention, Sept. 1998. (Corrected version available by contacting the authors at Centre Commun d'Etudes de Télédiffusion et Télécommunications, Cesson Sévigné, France)

[25] Bamford, J., Vanderkooy, J. 1995 "Ambisonic Sound for Us" In Proc. 99th Convention of the Audio Engineering Society, preprint 4138

[26] Malham, D.G., 1992, Experience with Large Area 3-D Ambisonic sound systems, Proceedings of the Institute of Acoustics, Volume 14 Part 5 November 1992 pp. 209-216

[27] de Vries, D. 1996 "Sound Reinforcement by Wavefield Synthesis: Adaptation of the Synthesis Operator to the Loudspeaker Directivity Characteristics" Journal of the Audio Engineering Society, Vol. 44, No. 12, December 1996, pp. 1120-1131

[28] Evans, M.J., Angus, J.A.S. and Tew, A.I. 1998 "Analyzing Head-Related Transfer Function Measurements using Surface Spherical Harmonics" Journal of the Acoustical Society of America, Vol. 104, No. 4, October 1998, pp. 2400-2411

[29] Malham, D.G. 1996 "Ambisonics in the New Media - 'The Technology Comes of Age'", (invited paper), proceedings, AES UK Audio for New Media Conference, pp95-101, London, 25-26 March 1996

[30] Bank, G. and Harris, N. "The Distributed Mode Loudspeaker - Theory and Practise" Proceedings, AES UK Microphones & Loudspeakers Conference March 1998, pp:129-135

1. See, for instance, "Surround Sound Special" EQ Volume 8, Issue 10, October 1997, pp 70-107

2. According to Nicol [16], Holophony was originally proposed in 1973 by M.Jessel [19]

3. Huygens' Principle states that each point on a propagating wavefront may be regarded as a secondary source of spherical wavelets; a large (ideally infinite) number of such secondary sources adds up to form the subsequent wavefront.

4. Email to the sursound mailing list, 26/11/98

5. In a corollary to the sampling theorem applied in the digitization of audio signals in the time domain, transducers must be spaced at no more than λ/2 for the highest frequency of interest. This implies in excess of 55,000 transducers for the recording and reconstruction of an audio wavefront passing through a two metre square surface if frequencies up to 20kHz are to be accurately handled.
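The transducer count in this footnote can be checked with a back-of-envelope calculation. The speed of sound assumed here (c ≈ 340 m/s) is an assumption of the example, not a figure from the paper:

```python
import math

C = 340.0          # speed of sound in m/s (assumed value)
F_MAX = 20_000.0   # highest frequency of interest, Hz
SIDE = 2.0         # side length of the square surface, m

wavelength = C / F_MAX              # ~17 mm at 20 kHz
spacing = wavelength / 2.0          # spatial Nyquist limit, ~8.5 mm
per_side = math.floor(SIDE / spacing) + 1   # transducers along one edge
total = per_side ** 2               # transducers over the whole surface

print(per_side, total)              # in excess of 55,000 transducers
```

With these assumptions the grid is 236 transducers on a side, about 55,700 in all, consistent with the "in excess of 55,000" figure quoted above.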



This page is administered by Dave Malham.
Last updated 4th. May, 2000
