31/10/2014

Journal 2007

INTERNATIONAL PROCEEDINGS

A ROBUST WAVELET-BASED TEXT-INDEPENDENT SPEAKER IDENTIFICATION

Phung Trung Nghia, Pham Viet Binh, Nguyen Huu Thai, Nguyen Thanh Ha, Prayoth Kumsawat.

Abstract: This study proposes a robust text-independent speaker identification method based on the Discrete Wavelet Transform (DWT), Mel-Frequency Discrete Wavelet Coefficients (MFDWC), wavelet-based sub-band weighting, and the Likelihood Combination Gaussian Mixture Model (LCGMM). The method was evaluated on text-independent speaker identification against the widely used MFCC-feature recognizer, a full-band recognizer, and an equal sub-band weighting recognizer. Experimental results show that the proposed method achieved a higher recognition rate than the others on our Vietnamese speech corpus, for both clean speech and speech corrupted by white noise.

Proceedings of the IEEE International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007), Sivakasi, India, pp. 219 – 223, 12/2007.
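
The abstract contrasts equal sub-band weighting with wavelet-based sub-band weighting but does not state the combination rule. A minimal sketch of one common form, a normalized weighted sum of per-sub-band log-likelihoods, is given below. The function names, the weights, and the scores are hypothetical illustrations and are not taken from the paper.

```python
def combine_subband_scores(loglik, weights):
    """Return one combined score per speaker.

    loglik maps a speaker id to a list of per-sub-band log-likelihoods
    (hypothetical values; in practice they would come from sub-band GMMs).
    weights holds one non-negative weight per sub-band.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalize so weights sum to 1
    scores = {}
    for speaker, bands in loglik.items():
        if len(bands) != len(norm):
            raise ValueError("one log-likelihood per sub-band is required")
        scores[speaker] = sum(w * ll for w, ll in zip(norm, bands))
    return scores


def identify(loglik, weights):
    """Pick the speaker with the highest combined score."""
    scores = combine_subband_scores(loglik, weights)
    return max(scores, key=scores.get)


# Hypothetical two-band scores for two enrolled speakers.
loglik = {"spk_a": [-10.0, -30.0], "spk_b": [-20.0, -12.0]}

equal_choice = identify(loglik, [1.0, 1.0])     # equal sub-band weighting
weighted_choice = identify(loglik, [3.0, 1.0])  # first band trusted more
```

With these made-up numbers the decision changes when the first band is weighted more heavily, which is the kind of effect sub-band weighting is meant to exploit when some bands are less reliable than others.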

A LOW BIT RATE WIDE-BAND SPEECH CODER IN THE PERCEPTUAL WAVELET PACKET DOMAIN

Phung Trung Nghia, Vu Ngoc Phan

Abstract: Speech is the most common type of information carried in telecommunications. There are several methods and standards for speech coding, and most of them target narrowband speech. In modern telecommunication systems, wideband speech coding is very important. The wavelet transform is an efficient signal processing tool for speech coding. Conventional wavelet speech coders use a global or sub-band dependent wavelet threshold to allocate bits in each sub-band. This is not very efficient for wideband speech because these thresholds do not closely match human auditory perception. Using a psychoacoustic model with temporal and simultaneous masking properties, we are able to estimate a threshold close to that of human hearing. Most speech and audio coding algorithms rely solely on simultaneous masking models. This paper presents a wavelet packet based wideband speech coder incorporating backward temporal, forward temporal, and simultaneous masking models. The coder also uses additional lossless compression. By applying this model, we obtained a bit rate of approximately 25 kbps while preserving perceptual quality for single-channel wideband speech sampled at 16 kHz.

Proceedings of the International Symposium on Electrical and Electronics (ISEE 2007), Ho Chi Minh City, Vietnam, Track 2, pp. 139 – 144, 10/2007.
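
The abstract refers to conventional coders that apply a threshold to the wavelet coefficients of each sub-band. A minimal sketch of that idea follows, assuming a one-level Haar transform and a fixed, hypothetical threshold. It does not reproduce the paper's wavelet packet decomposition or its masking-based thresholds.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_dwt(signal):
    """One-level Haar DWT; the signal length must be even."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def hard_threshold(coeffs, threshold):
    """Zero every coefficient whose magnitude is below the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


# Hypothetical four-sample frame.
frame = [4.0, 4.0, 2.0, 0.0]
approx, detail = haar_dwt(frame)

# Without thresholding the transform is invertible.
exact = haar_idwt(approx, detail)

# A fixed threshold of 2.0 (an arbitrary illustrative value) removes the
# small detail coefficient, so fewer coefficients need to be coded.
lossy = haar_idwt(approx, hard_threshold(detail, 2.0))
```

In a perceptual coder the fixed threshold would be replaced by a per-sub-band value derived from the masking model, so that only coefficients judged inaudible are discarded.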


THAI NGUYEN UNIVERSITY OF INFORMATION AND COMMUNICATION TECHNOLOGY
