Speech technology has made great strides over the past three decades. Automatic speech recognition (ASR), text-to-speech synthesis, voice and language recognition, speech enhancement, and auditory prostheses have all matured during this interval. The Springer Handbook of Speech Processing describes many of the methods and practices used to accomplish these advances. It is a highly technical tome, with most of its 53 chapters laden with equations, illustrations, and tables. The quality of writing is excellent: the prose is clear, easy to understand, and succinct. The illustrations are professionally drawn and uniform in format and appearance. The volume has the look and feel of a professional textbook, an impressive feat given that 85 authors were involved. The Handbook would be well suited as a textbook for an advanced graduate-level course in speech technology and engineering.
A typical chapter begins with a short summary of its contents and a brief historical overview. More technical (i.e., mathematical) material follows. Many chapters conclude with a discussion of commercial applications and a brief summary. In short, the volume strives to strike a balance between the practical and the theoretical, and it usually succeeds.
The Handbook consists of nine sections: (1) Production, Perception and Modeling of Speech, (2) Signal Processing in Speech, (3) Speech Coding, (4) Text-to-Speech Synthesis, (5) Speech Recognition, (6) Speaker Recognition, (7) Language Recognition, (8) Speech Enhancement, and (9) Multichannel Speech Processing. A complete listing of chapter titles can be found at http://www.springer.com/engineering/signals/book/978-3-540-49125-5?detailsPage=toc. A particularly useful feature is the accompanying DVD, which contains a fully searchable electronic version (PDF format) of the Handbook. Its interface allows the reader to traverse the text in a highly intuitive way, making the book a pleasure to read in electronic form.
As with any book of this length and scope, some chapters are more successful than others in conveying the essence of a field. Particularly detailed and useful are the chapters on pitch extraction, speech synthesis, automatic speech recognition, and environmental robustness; many of these are definitive treatments and should remain useful for years to come. Especially helpful is the detailed discussion of experimental and computational data, which clarifies and enhances the theoretical sections through concrete examples.
Perhaps the volume’s greatest topical weakness is its scanty treatment of speech perception and production. An additional chapter or two on the neuroscience and cognition of spoken language (and its visual analog) would have been welcome. The chapter on commercial applications of automatic speech recognition is dated because of significant recent changes in the industry.
Another weakness of the Handbook is its focus on the past and present and its relative neglect of the future. Only a few chapters discuss future trends in a meaningful way, the most notable example being “Towards Superhuman Speech Recognition,” which provides a superb description of how ASR systems may function in the years ahead. Also lacking is a concerted attempt by the editors to link the chapters into an overarching theoretical framework. Given the book’s broad scope, this omission is understandable. However, many readers will wonder what ties the chapters together other than a shared focus on speech technology.
Many of the authors (as well as the editors) have had a professional association with Bell Laboratories at some stage of their careers. Fortunately, given Bell Labs’ distinguished research record, this slant generally enhances the Handbook’s utility. Approaches pioneered at other institutions are nonetheless well represented.
The bibliography accompanying each chapter is extensive. In the electronic version, each in-text citation is linked to the corresponding reference, greatly facilitating its use. Occasionally, however, a chapter’s bibliography is overly selective, reflecting the authors’ particular point of view (e.g., the chapters on speech perception and nonlinear cochlear processing).
In summary, The Springer Handbook of Speech Processing is a first-rate production, providing a definitive treatment of the methods and techniques used in contemporary speech technology. Although its cost is high, the Handbook’s superb quality and comprehensive treatment of highly technical material should prove an attractive investment for advanced students and speech engineering professionals.