Journal 2007

INTERNATIONAL PROCEEDINGS

A ROBUST WAVELET-BASED TEXT-INDEPENDENT SPEAKER IDENTIFICATION

Phung Trung Nghia, Pham Viet Binh, Nguyen Huu Thai, Nguyen Thanh Ha, Prayoth Kumsawat

Abstract: This study proposes a robust text-independent speaker identification method based on the Discrete Wavelet Transform (DWT), Mel-Frequency Discrete Wavelet Coefficients (MFDWC), wavelet-based sub-band weighting, and the Likelihood Combination Gaussian Mixture Model (LCGMM). The method was compared with the widely used MFCC-feature recognizer, a full-band recognizer, and an equal sub-band weighting recognizer. Experimental results show that the proposed method achieves a higher recognition rate than the others on our Vietnamese speech corpus, for both clean speech and speech corrupted by white noise.

Proceedings of the IEEE International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007), Sivakasi, India, pp. 219–223, 12/2007.

A LOW BIT RATE WIDE-BAND SPEECH CODER IN THE PERCEPTUAL WAVELET PACKET DOMAIN

Phung Trung Nghia, Vu Ngoc Phan 

Abstract: Speech is the most common form of information in telecommunications. There are several methods and standards for speech coding, most of which target narrowband speech. In modern telecommunication systems, however, wideband speech coding is very important. The wavelet transform is an efficient signal processing tool for speech coding. Conventional wavelet speech coders use a global or sub-band-dependent wavelet threshold to allocate bits in each sub-band. This is not very efficient for wideband speech because these thresholds do not closely match human auditory perception. Using a psychoacoustic model with temporal and simultaneous masking properties, the threshold can be estimated closer to human hearing. Most speech and audio coding algorithms rely solely on simultaneous masking models. This paper presents a wavelet-packet-based wideband speech coder incorporating backward temporal, forward temporal, and simultaneous masking models. The coder also uses additional lossless compression. With this model, we obtained a bit rate of approximately 25 kbps while preserving perceptual quality for single-channel wideband speech sampled at 16 kHz.

Proceedings of the International Symposium on Electrical and Electronics (ISEE 2007), Ho Chi Minh City, Vietnam, Track 2, pp. 139–144, 10/2007.
