Johnson, Kevin (1997) Identification and correction of speech repairs in the context of an automatic speech recognition system. Doctoral thesis, Durham University.
Recent advances in automatic speech recognition systems for read (dictated) speech have led researchers to confront the problem of recognising more spontaneous speech. A number of problems, such as disfluencies, appear when read speech is replaced with spontaneous speech. In this work we deal specifically with what we class as speech-repairs. Most disfluency processes deal with speech-repairs at the sentence level. This is too late in the process of speech understanding. Speech recognition systems have problems recognising speech containing speech-repairs. The approach taken in this work is to deal with speech-repairs during the recognition process. Through an analysis of spontaneous speech the grammatical structure of speech-repairs was identified as a possible source of information. It is this grammatical structure, along with some pattern matching to eliminate false positives, that is used in the approach taken in this work. These repair structures are identified within a word lattice and, when found, result in a SKIP being added to the lattice to allow the reparandum of the repair to be ignored during the hypothesis generation process. Word fragment information is included using a sub-word pattern matching process, and cue phrases are also identified within the lattice and used in the repair detection process. These simple, yet effective, techniques have proved very successful in identifying and correcting speech-repairs in a number of evaluations performed on a speech recognition system incorporating the repair procedure.
On an unseen spontaneous lecture taken from the Durham corpus, using a dictionary of 2,275 words and phoneme corruption of 15%, the system achieved a correction recall rate of 72% and a correction precision rate of 75%. The achievements of the project include the automatic detection and correction of speech-repairs, including word fragments and cue phrases, in the sub-section of an automatic speech recognition system processing spontaneous speech.
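The idea of adding a SKIP to the lattice so that the reparandum can be ignored may be easier to follow with a small illustration. The Python sketch below is not the thesis implementation. It assumes a single-path lattice built from a word list, a hypothetical cue-word set (`CUE_WORDS`), and a deliberately simple repair pattern: the repair restarts with the first word of the reparandum, and is accepted only if a cue word intervenes or the repeated words match exactly. All function names are illustrative, and word fragments and grammatical structure are not modelled.

```python
SKIP = "<SKIP>"
CUE_WORDS = {"uh", "um", "sorry"}  # hypothetical cue list, not taken from the thesis


def find_repairs(words, max_len=3, cues=CUE_WORDS):
    """Return (start, resume) pairs; words[start:resume] is reparandum plus cues."""
    repairs = []
    i = 0
    while i < len(words):
        match = None
        for length in range(1, max_len + 1):
            k = i + length
            saw_cue = False
            # Step over any cue words that follow the candidate reparandum.
            while k < len(words) and words[k] in cues:
                saw_cue = True
                k += 1
            if k >= len(words) or words[k] != words[i]:
                continue
            # Guard against false positives: need a cue or an exact repetition.
            exact = words[i:i + length] == words[k:k + length]
            if saw_cue or exact:
                match = (i, k)
                break
        if match:
            repairs.append(match)
            i = match[1]
        else:
            i += 1
    return repairs


def build_lattice(words, repairs):
    """Map each node to its outgoing (label, next_node) arcs, plus SKIP arcs."""
    lattice = {n: [] for n in range(len(words) + 1)}
    for n, word in enumerate(words):
        lattice[n].append((word, n + 1))
    for start, resume in repairs:
        lattice[start].append((SKIP, resume))
    return lattice


def repaired_hypothesis(lattice):
    """Walk the lattice, taking a SKIP arc whenever one leaves the current node."""
    node, final = 0, max(lattice)
    output = []
    while node != final:
        arcs = lattice[node]
        skips = [arc for arc in arcs if arc[0] == SKIP]
        label, nxt = skips[0] if skips else arcs[0]
        if label != SKIP:
            output.append(label)
        node = nxt
    return output


words = "go to the left uh to the right".split()
lattice = build_lattice(words, find_repairs(words))
print(" ".join(repaired_hypothesis(lattice)))
```

In this toy example the SKIP arc runs from the start of "to the left" to the restart at the second "to", so the generated hypothesis omits both the reparandum and the cue word. A real recogniser would score SKIP arcs alongside competing word hypotheses rather than always following them.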
|Item Type:||Thesis (Doctoral)|
|Award:||Doctor of Philosophy|
|Copyright:||Copyright of this thesis is held by the author|
|Deposited On:||24 Oct 2012 15:07|