Enhancing speech intelligibility
Using knowledge about acoustic cues: Approaches that enhance speech before it is contaminated by noise fall into two groups: those that perform automatic enhancement based on gross physical characteristics of the signal, and those that precisely target and enhance the acoustic cues known to encode phonetic contrasts. We have taken the latter approach, locating and enhancing regions that contain acoustic cues to phonetic contrasts in consonants.
Identifying the acoustic cues: Inspiration for which cues to enhance comes from several sources. One is research on how speakers change their speech when communicating in background noise, which shows systematic changes in how they encode phonetic contrasts. In stops, for example, stop gaps are longer, formant transitions are longer, and bursts are more salient. Mimicking such effects might therefore be expected to improve the intelligibility of a speech signal.
Automatic enhancement of acoustic cues: In our early work, the regions of the speech signal to be enhanced were labelled manually using set criteria. This process is very labour-intensive, especially if a large corpus is required (e.g. for auditory training). In our second project, we developed a process to identify potential enhancement regions automatically. The UCL Enhance software allows a user to enhance speech materials automatically, using either our phonetically motivated cue-enhancement approach or standard techniques (amplitude compression, spectral subtraction). This software is freely available from our project website.
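Spectral subtraction, one of the standard techniques mentioned above, can be sketched as follows. This is a generic textbook version written for illustration, not the UCL Enhance implementation: it assumes a separate noise-only recording is available, estimates the average noise magnitude spectrum from it, subtracts that estimate from each windowed frame of the noisy signal (keeping a small spectral floor), and resynthesises by overlap-add using the noisy phase. The function names and the parameters `frame_len`, `hop` and `floor` are illustrative choices.

```python
import numpy as np


def _frames(x, frame_len, hop):
    """Split x into overlapping frames of frame_len samples, hop samples apart."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])


def spectral_subtraction(noisy, noise_sample, frame_len=512, hop=256, floor=0.02):
    """Magnitude spectral subtraction with overlap-add resynthesis (illustrative sketch)."""
    noisy = np.asarray(noisy, dtype=float)
    noise_sample = np.asarray(noise_sample, dtype=float)
    # Periodic Hann window: copies spaced frame_len / 2 apart sum to one.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)

    # Average noise magnitude spectrum, estimated from the noise-only sample.
    noise_spec = np.fft.rfft(_frames(noise_sample, frame_len, hop) * window, axis=1)
    noise_mag = np.abs(noise_spec).mean(axis=0)

    # Zero-pad so every original sample is covered by overlapping frames.
    padded = np.pad(noisy, frame_len)
    out = np.zeros(len(padded))
    win_sum = np.zeros(len(padded))
    for i, frame in enumerate(_frames(padded, frame_len, hop)):
        spec = np.fft.rfft(frame * window)
        mag = np.abs(spec)
        # Subtract the noise estimate, but never go below a fraction of the input.
        clean_mag = np.maximum(mag - noise_mag, floor * mag)
        # Resynthesise with the noisy phase and overlap-add.
        clean = np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec)), n=frame_len)
        start = i * hop
        out[start:start + frame_len] += clean
        win_sum[start:start + frame_len] += window

    # Undo the analysis-window weighting, then strip the padding.
    out = np.where(win_sum > 1e-8, out / np.maximum(win_sum, 1e-8), 0.0)
    return out[frame_len:frame_len + len(noisy)]
```

Passing an all-zero noise sample returns the input unchanged, which is a quick check that the overlap-add resynthesis is lossless. With real noise, the `floor` parameter trades residual "musical noise" against distortion of the speech.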
Our experimental findings: In our first EPSRC-funded project, we took VCV and semantically unpredictable sentence (SUS) materials produced by an adult male speaker and annotated the acoustic cues in information-rich regions of the signals. These regions comprised the formant transitions at the formation and release of the constriction or occlusion, and the cues at the release of, or during, the constriction or occlusion (friction, nasal murmur, burst and aspiration). Formant transitions were amplified to counteract the drop in amplitude near the constriction, with the weakest voicing cycles receiving the most amplification. Occlusion and constriction cues were also amplified to increase their salience. When stimuli manipulated in this way were mixed with speech-shaped noise at 0 dB SNR and presented to listeners with normal hearing, consonant intelligibility improved by an average of 5% for sentence material and 10% for VCV material.
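The amplification rule for formant transitions and the mixing step can be sketched as follows. This is a hypothetical illustration, not our published processing: it assumes the voicing cycles of a labelled transition region are already available as `(start, end)` sample indices, raises each cycle towards the level of the strongest cycle so that the weakest cycles receive the most gain (capped at an assumed `max_gain_db`), and then adds noise scaled to a target SNR measured over the whole signal. The noise is whatever array the caller passes in; generating speech-shaped noise is outside this sketch, as is the choice of whether the SNR is set relative to the original or the enhanced speech.

```python
import numpy as np


def rms(x):
    """Root-mean-square level of a signal segment."""
    return float(np.sqrt(np.mean(np.square(x))))


def boost_weak_cycles(signal, cycle_bounds, max_gain_db=12.0):
    """Raise each voicing cycle in a labelled region towards the strongest cycle.

    cycle_bounds is a hypothetical list of (start, end) sample indices, one per
    voicing cycle. Weaker cycles get more gain; no cycle gets more than max_gain_db.
    """
    y = np.asarray(signal, dtype=float).copy()
    levels = [rms(y[start:end]) for start, end in cycle_bounds]
    target = max(levels)
    max_gain = 10 ** (max_gain_db / 20)
    for (start, end), level in zip(cycle_bounds, levels):
        if level > 0:
            # Gain needed to reach the strongest cycle's level, capped.
            y[start:end] *= min(target / level, max_gain)
    return y


def mix_at_snr(speech, noise, snr_db):
    """Add noise scaled so that the overall speech-to-noise ratio equals snr_db."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as the speech")
    noise = noise[:len(speech)]
    # SNR_dB = 20 * log10(rms(speech) / rms(scale * noise)), solved for scale.
    scale = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return speech + scale * noise
```

For the 0 dB condition described above, one would call `mix_at_snr(enhanced, noise, 0.0)`. A practical gain rule would also smooth the gain across cycle boundaries to avoid audible clicks, which this sketch omits for clarity.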
Does enhancement work for all speakers? We have extended this work to investigate speaker effects by measuring the effect of enhancement on VCV stimuli produced by two male and two female speakers without phonetic training. Enhancement improved intelligibility for all four speakers, and the lower a speaker's initial intelligibility, the greater the improvement from enhancement.
Does enhancement work for all listeners? Enhancement has been shown to improve intelligibility in noise for both native and non-native listeners. However, the size of the benefit varies across listeners, and a minority of listeners do not show higher intelligibility scores for cue-enhanced speech.
What are the applications of this technique? There has been much recent interest in using speech enhancement in auditory training for second-language learners or for children with language or reading disorders. We have shown that cue enhancement improves the intelligibility of consonant distinctions that are particularly difficult for second-language (L2) learners, even without auditory training.