AUTOMATIC CUE-ENHANCEMENT OF NATURAL SPEECH FOR IMPROVED INTELLIGIBILITY
Investigators: Dr Valerie Hazan and Dr Mark Huckvale
Research Fellows: Dr Andrew Simpson (Yr 1) and Dr Marta Ortega (Yrs 2 and 3)
In this project, 'enhancement' refers to processing a clean source speech signal such that its intelligibility is more resistant to subsequent degradation. This kind of enhancement is relevant in at least two application areas: in telecommunications, where a speech signal is degraded by characteristics of the channel (e.g. noise, band-limits, coding system or reverberation); and in speech and language therapy and second language learning, where a speech signal can be emphasised in a computer-based training system to help a client develop phonetic discrimination abilities. Our approach differs from conventional signal enhancement in that the regions of the signal to be enhanced via selective amplification are selected on phonetic grounds.
In previous work, this ‘cue-enhancement’ approach was found to significantly increase the intelligibility of speech in noise. However, the practical application of the technique was limited by the fact that the regions of the speech signal to be enhanced needed to be manually labelled. The principal aim of this project was therefore to automate the identification and enhancement of ‘landmark’ regions containing a high density of acoustic cues and to demonstrate improvements in intelligibility at least equal to that obtained for manually-enhanced materials.
Achieving this aim required meeting the following objectives: (i) to find good methods for the automatic identification of potential enhancement regions (PERs); (ii) to investigate the effect of errors in automatic PER identification on the intelligibility of enhanced speech; (iii) to work with more natural speech material; and (iv) to work with typical signals and degradations.
We extended our previous findings by investigating the effect of speaker and listener characteristics on speech intelligibility for enhanced speech presented in noise. We also showed that cue-enhancement can improve the intelligibility of consonant distinctions that are particularly difficult for second language (L2) learners as they are assimilated to the same sounds in the listeners’ first language. We have implemented a technique for automatic cue-enhancement via the automatic identification of potential enhancement regions (PERs). Little loss in intelligibility was seen between the manually-tagged and automatically-enhanced materials. In some of the intelligibility tests, although enhancement did improve intelligibility scores overall relative to natural speech, the overall statistical effects were non-significant, partly due to the variance associated with the speech materials used.
We have produced Windows-based software ("UCL Enhance") that allows a user to automatically enhance speech materials using our phonetically-motivated cue-enhancement approach or standard techniques (amplitude compression, spectral subtraction). This software is freely available from our project website. The user has control over the level of amplification and the regions to be enhanced, so that the software can be used for further evaluations of the technique as well as for specific applications. It will be particularly useful for applications (such as auditory training) that require large amounts of speech materials, as manual tagging is extremely time-consuming.
For further information contact: Dr Valerie Hazan (email@example.com)
Project website: http://www.phon.ucl.ac.uk/enhance/
EPSRC PROJECT GRANT GR/J10426
Start date: Sep 25 1993 End date: Sep 24 1996
ENHANCEMENT OF THE INTELLIGIBILITY OF NATURAL AND SYNTHETIC SPEECH
Investigators: Dr Valerie Hazan and Prof. Adrian Fourcin
Research Fellow: Dr Andrew Simpson
Listeners may have great difficulty in perceiving speech when it is heard in the presence of noise. This is because much of the acoustic information which signals features of the speech sounds is masked by noise, and therefore inaudible. The problem is especially acute for the perception of consonants as the acoustic information which characterises them is often of low intensity and brief duration.
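The masking argument above can be illustrated with a small worked example. The sketch below is not taken from the project; it uses hypothetical constant-amplitude segments (not real speech) to show that, in the same steady noise, a low-intensity consonant segment has a much lower local signal-to-noise ratio than a high-intensity vowel segment. The names rms, snr_db and the toy values are assumptions made for illustration only.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB, computed from the RMS levels of two segments."""
    return 20.0 * math.log10(rms(signal) / rms(noise))

# Hypothetical toy segments: constant-amplitude stand-ins, not real speech.
vowel = [0.5, -0.5] * 100        # high-intensity, longer segment
consonant = [0.05, -0.05] * 20   # low-intensity, brief segment
noise = [0.1, -0.1] * 100        # steady background noise

vowel_snr = snr_db(vowel, noise)          # positive: vowel is above the noise
consonant_snr = snr_db(consonant, noise)  # negative: consonant is below the noise
```

With these assumed values the vowel sits well above the noise while the consonant falls below it, which is the situation in which consonant cues are most likely to be masked.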
There has been much interest in the field of speech enhancement, which aims to make speech more ‘resistant’ to noise interference. Most studies take the approach of trying to filter out the noise and therefore reduce its effect on the perception of the speech. Unfortunately, this approach usually affects the quality of the speech signal as well. Our approach is quite different in that we aim to enhance the speech signal before it is affected by noise, in order to create a ‘super-speech’ which will be less affected by noise than normal speech. The enhanced speech still sounds natural because we only manipulate short sections of the speech signal. The key is that these regions are carefully selected and are those which are known to contain a high concentration of information about the identity of consonants.
In this project, we first investigated which consonants produced by speech synthesis systems are most often misperceived by listeners. In the second phase of the work, we evaluated different types of enhancement targeted on the regions of the speech signal which carry information about these ‘vulnerable’ features. These included increasing the intensity and filtering the regions of the consonant itself, and also increasing the intensity of vowel onset and offset regions. We first evaluated the efficacy of these enhancements on natural speech, using highly controlled nonsense vowel-consonant-vowel words. Significant increases in speech intelligibility in noise were obtained for the enhanced words, relative to the unenhanced words presented in the same amount of background noise. We then applied these enhancements to complete sentences produced by the same male speaker. Again, significant increases in speech intelligibility were obtained. The final aim of the work was to see whether these enhancements would also be successful in increasing the intelligibility of synthetic speech produced by a text-to-speech system. The effects of enhancements on synthetic speech intelligibility were much more variable than with natural speech, but appeared to be related to the degree of naturalness of the synthesised sentence.
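As a rough illustration of the intensity-based enhancements described above, the following sketch raises the level of labelled regions of a signal by a gain expressed in dB. It is a minimal sketch under stated assumptions, not the processing used in the project: the function name amplify_regions, the region format (half-open sample index ranges) and the 6 dB gain are all hypothetical, and the filtering-based enhancements are not covered.

```python
def amplify_regions(samples, regions, gain_db):
    """Return a copy of samples with each labelled region scaled by gain_db.

    regions: list of (start, end) sample indices, half-open [start, end).
    Overlapping regions are amplified once, not cumulatively, because each
    output sample is computed from the original sample.
    """
    gain = 10.0 ** (gain_db / 20.0)  # convert dB to a linear amplitude factor
    out = list(samples)
    for start, end in regions:
        # Clip the region to the valid index range of the signal.
        for i in range(max(start, 0), min(end, len(out))):
            out[i] = samples[i] * gain
    return out

# Hypothetical example: a flat signal with one labelled region (samples 2..4).
signal = [0.1] * 10
regions = [(2, 5)]
enhanced = amplify_regions(signal, regions, 6.0)
```

Only the samples inside the labelled region change; the rest of the signal is left untouched, which is the sense in which the amplification is selective. A practical implementation would probably also smooth the gain at region boundaries to avoid audible discontinuities; that is an assumption and is omitted here for brevity.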
This project has demonstrated the significant effect of enhancing ‘landmark regions’ of the speech signal on speech intelligibility when the signal is subsequently degraded. There are clear practical applications for the type of material developed so far (pre-recorded and annotated natural speech and text-to-speech synthesis). These include speech technology applications such as voicemail and telephone-based information services but also other applications such as the development of auditory training material for language-impaired listeners and second-language learners.
Contact person: Dr Valerie Hazan, Dept of Phonetics and Linguistics,
UCL, 4 Stephenson Way, London NW1 2HE. Email: firstname.lastname@example.org