STUDY OF RECOGNITION OF MANDARIN SPEECH IN A MICROCOMPUTER | ScholarBank@NUS

Please use this identifier to cite or link to this item: https://scholarbank.nus.edu.sg/handle/10635/166307

DC Field	Value
dc.title	STUDY OF RECOGNITION OF MANDARIN SPEECH IN A MICROCOMPUTER
dc.contributor.author	EDWARD TAN
dc.date.accessioned	2020-04-01T03:14:34Z
dc.date.available	2020-04-01T03:14:34Z
dc.date.issued	1989
dc.identifier.citation	EDWARD TAN (1989). STUDY OF RECOGNITION OF MANDARIN SPEECH IN A MICROCOMPUTER. ScholarBank@NUS Repository.
dc.identifier.uri	https://scholarbank.nus.edu.sg/handle/10635/166307
dc.description.abstract	Speech recognition processing is highly computation-intensive and is normally done on a minicomputer or a mainframe. On a microcomputer, with its limited processing power and data storage, it is difficult or impossible to achieve the same level of performance. The main objective of this thesis is to study and develop speech recognition techniques suitable for recognising isolated Mandarin speech using a microcomputer. Mandarin, unlike some other spoken languages, has some natural features that are favourable to speech recognition. Each word in Mandarin is monosyllabic and hence relatively short, which saves processing time. The phonetic structure of Mandarin sounds is orderly and conforms without exception to only a few rules. Mandarin also has only a finite number of distinct sounds (word pronunciations), so the structure and unique characteristics of each sound can be studied and categorised. The knowledge gained from this study can be used to develop a classification scheme that further shortens the processing time. Another unique characteristic of Mandarin is that it is tonal. The tonal inflection of a spoken word is related to the change in pitch as the word is spoken. Knowing this behaviour of the pitch, the tone of the spoken word can be identified. This provides independent supplementary information that helps identify the correct word from among several possible candidates, and hence increases the successful speech recognition rate. This thesis is organised as a collection of self-contained modules. Chapter 1 gives some background information on the structure of Mandarin speech sounds and an introduction to the two speech recognition techniques used in this thesis. One of them, the spectral analysis method, uses the ear as a model.
It recognises speech by analysing it into its spectral components, on the assumption that the same speech sound has similar spectral components. The other is the linear predictive method, which uses the mouth as a model. This method mathematically constructs models of speech producers and determines which of them is the most likely to produce an unknown sound. Chapter 2 gives a complete description of how the spectral analysis method is adapted for use in a microcomputer. Using this method on an IBM AT microcomputer, a recognition rate of 79% and a response time of just under 2 seconds are achievable. Chapter 3 gives a detailed description of how the linear predictive method is adapted for use in a microcomputer. Its recognition rate is not as good as that obtained using the spectral analysis method. Without the help of a floating-point co-processor or a signal processing co-processor, this method is clearly not feasible for use in a microcomputer. Chapter 4 gives a detailed description of how the tones of Mandarin words are determined in a microcomputer. Using an IBM AT microcomputer, an average tone recognition rate of 95% and an average response time of 3.2 seconds are achievable. The structure of each of the 1241 possible Mandarin speech sounds is studied. Many of these structures have been found to closely resemble one another; that is, they have an inherent tendency to cluster into groups of similar structures. This tendency has been found to be independent of the speaker, the sex of the speaker and the method used for analysing the speech sounds. On average, about 278 such groups have been found, and all 1241 Mandarin speech sounds are distributed among them. Chapter 5 gives the full details of how this grouping is done using the spectral analysis method and the linear predictive method. It also shows how the grouping technique speeds up the recognition process by a factor of at least 3. The effectiveness of using tone in the speech recognition process is explored in Chapter 6.
Three strategies are used to evaluate this effectiveness. The first is a tone-priority strategy. The entire reference set of word templates is divided into four subsets according to tone, and the potential candidates for the unknown word are selected only from the subset that has the same tone as the unknown word. The second and third strategies are group-priority ones: the most favourable groups are selected first, using the grouping technique discussed in Chapter 5. Under the second strategy, the members of the shortlisted groups are considered potential candidates only if they have the same tone as the unknown word. In contrast, under the third strategy, the members that have the same tone as the unknown word are ranked above the others in the shortlisted groups before the potential candidates are selected. All three strategies have their own merits. The choice of which one to incorporate into the speech recogniser therefore depends on how strongly the successful tone recognition rate affects the successful speech recognition rate. With the average achievable successful tone recognition rate of 96%, the third strategy has been found to improve the successful recognition rate by about 2%. The other two strategies perform worse, although both shorten the processing time. A developer of a speech recogniser for Mandarin faces a unique problem. Even if the speech recogniser is technically perfect, a Chinese character still cannot be identified from its pronunciation alone, because many other Chinese characters have exactly the same pronunciation. This problem is addressed in Chapter 7. The solution is to display all these characters on the monitor screen and to have the desired character selected using either the keyboard or speech.
When this function is implemented in the operating system rather than in the speech recogniser, this method has been found to provide a very effective user interface for inputting Chinese characters by speech. Finally, in Chapter 8, the findings of the various investigations are presented and some salient points are discussed.
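The linear predictive method summarised above treats the vocal tract (the "mouth") as an all-pole filter whose coefficients are estimated from each speech frame. The abstract does not say how the thesis computes these coefficients or which distance it uses to compare them. The following is therefore only a minimal Python sketch of one standard way to obtain them: the autocorrelation method, solved with the Levinson-Durbin recursion. The function names `autocorrelation` and `lpc` and the choice of predictor order are illustrative assumptions. Framing and windowing are assumed to happen before this step.

```python
def autocorrelation(frame, order):
    """Autocorrelation r[0..order] of one (already windowed) speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def lpc(frame, order):
    """Linear prediction coefficients via the Levinson-Durbin recursion.

    Illustrative sketch, not the thesis's code. Returns (a, err) with
    a[0] == 1, so the prediction error is
        e[n] = s[n] + a[1]*s[n-1] + ... + a[order]*s[n-order],
    and err is the final prediction-error energy.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    if r[0] == 0.0:
        return a, 0.0  # silent frame: nothing to predict
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update a[1..i-1] from the previous stage's coefficients.
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # error energy shrinks at every stage
    return a, err


if __name__ == "__main__":
    coeffs, energy = lpc([1.0, 2.0], 1)
    print(coeffs, energy)
```

Every stage of the recursion performs floating-point divisions and multiply-accumulates. This is consistent with the abstract's finding that the method was impractical on a microcomputer without a floating-point or signal processing co-processor.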
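The tone identification described for Chapter 4 rests on the observation that each Mandarin tone has a characteristic pitch movement. Conventionally these are described as level (tone 1), rising (tone 2), dipping (tone 3) and falling (tone 4). The abstract gives no details of the pitch tracker or the decision rules used. The Python sketch below is therefore only an illustration under stated assumptions:

- It takes an already-extracted pitch (F0) contour in hertz, with 0 marking unvoiced frames.
- It applies a hypothetical start/middle/end rule.
- The name `classify_tone` and the 5% flatness tolerance `flat_tol` are illustrative and are not taken from the thesis.

```python
from statistics import mean


def classify_tone(f0_hz, flat_tol=0.05):
    """Guess the Mandarin tone (1-4) of one syllable from its pitch contour.

    Hypothetical rule set, not the thesis's: the voiced part of the
    contour is divided by its mean pitch, split into start, middle and
    end segments, and the shape of the three segment means decides.
    """
    voiced = [f for f in f0_hz if f > 0]  # 0 marks an unvoiced frame
    if len(voiced) < 3:
        raise ValueError("need at least three voiced pitch values")
    m = mean(voiced)
    norm = [f / m for f in voiced]  # removes the speaker's absolute pitch level
    third = len(norm) // 3
    start = mean(norm[:third])
    middle = mean(norm[third:len(norm) - third])
    end = mean(norm[len(norm) - third:])

    if middle < start - flat_tol and middle < end - flat_tol:
        return 3  # dipping: falls, then rises again
    if end - start > flat_tol:
        return 2  # rising
    if start - end > flat_tol:
        return 4  # falling
    return 1      # level


if __name__ == "__main__":
    print(classify_tone([200] * 10))                         # prints 1
    print(classify_tone([150 + 10 * i for i in range(10)]))  # prints 2
```

Normalising by the mean pitch lets the same thresholds serve both low-pitched and high-pitched voices. A practical recogniser would still need a robust pitch tracker, which this sketch does not attempt.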
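Chapters 5 and 6 describe a two-stage search. The unknown word is first compared with the roughly 278 groups rather than with all 1241 reference sounds. Only members of the best-matching groups are then examined, and tone serves either as a filter or as a ranking key. The abstract does not specify the features, the distance measure or how many groups are shortlisted. The Python sketch below therefore makes these assumptions:

- Each template is a fixed-length feature vector, stored as a tuple `(word, tone, group_id, features)`.
- Templates are compared by Euclidean distance (`math.dist`).
- Group centroids are precomputed.
- `recognise`, `n_groups` and `n_candidates` are hypothetical names.

The sketch only aims to show how the three strategies differ in which templates reach the final comparison and in what order.

```python
from math import dist  # Euclidean distance, Python 3.8+


def recognise(unknown, unknown_tone, templates, centroids,
              strategy, n_groups=3, n_candidates=5):
    """Return up to n_candidates template words, best match first.

    templates: list of (word, tone, group_id, feature_vector) tuples
    centroids: {group_id: feature_vector}, one representative per group
    strategy 1: tone priority; search only templates with the same tone
    strategy 2: group priority; shortlist groups, keep same-tone members only
    strategy 3: group priority; shortlist groups, rank same-tone members first
    """
    if strategy not in (1, 2, 3):
        raise ValueError("strategy must be 1, 2 or 3")

    if strategy == 1:
        pool = [t for t in templates if t[1] == unknown_tone]
    else:
        # Stage 1: compare the unknown word with the group centroids only.
        ranked_groups = sorted(centroids,
                               key=lambda g: dist(unknown, centroids[g]))
        shortlist = set(ranked_groups[:n_groups])
        pool = [t for t in templates if t[2] in shortlist]
        if strategy == 2:
            pool = [t for t in pool if t[1] == unknown_tone]

    # Stage 2: full comparison against the templates that survived.
    if strategy == 3:
        # False sorts before True, so same-tone members come first.
        def key(t):
            return (t[1] != unknown_tone, dist(unknown, t[3]))
    else:
        def key(t):
            return dist(unknown, t[3])

    return [t[0] for t in sorted(pool, key=key)[:n_candidates]]


if __name__ == "__main__":
    demo_templates = [
        ("ma1", 1, "A", (1.0, 0.0)),
        ("ma3", 3, "A", (1.1, 0.0)),
        ("ba1", 1, "B", (5.0, 5.0)),
        ("ba4", 4, "B", (5.1, 5.0)),
    ]
    demo_centroids = {"A": (1.05, 0.0), "B": (5.05, 5.0)}
    for s in (1, 2, 3):
        print(s, recognise((1.12, 0.0), 1, demo_templates, demo_centroids,
                           strategy=s, n_groups=1))
```

In the demo, the unknown word is acoustically closest to `ma3` but its tone is recognised as tone 1. Strategy 2 discards `ma3` entirely. Strategy 3 keeps it as a fallback behind `ma1`, which matters when the tone recogniser is wrong. This mirrors the abstract's point that the best strategy depends on how reliable tone recognition is.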
dc.source	CCK BATCHLOAD 20200327
dc.type	Thesis
dc.contributor.department	INFORMATION SYSTEMS & COMPUTER SCIENCE
dc.contributor.supervisor	POO GEE SWEE
dc.description.degree	Master's
dc.description.degreeconferred	MASTER OF SCIENCE
Appears in Collections:	Master's Theses (Restricted)


Files in This Item:

File	Description	Size	Format	Access Settings	Version
b16814368.PDF		7.14 MB	Adobe PDF	RESTRICTED	None


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.