Comparative Study of Speaker Segmentation Using MFCC and LSF


Kiran C. Jamgade
Naresh P. Jawarkar

Abstract

Speaker segmentation is the process of partitioning a long speech signal into homogeneous segments by detecting changes in speaker identity. Speaker segmentation algorithms can be broadly classified into three categories: model-based, metric-based and hybrid. In model-based segmentation, a set of models is derived and trained for different speaker classes from a training corpus, and incoming speech streams are classified using these models [4]. In many cases, however, prior knowledge of the speakers and acoustic classes is not available. Examples of model-based classification methods include Gaussian Mixture Models (GMM), GMM with multilayer perceptron (MLP), K-nearest neighbour (KNN) and Support Vector Machines (SVM) [5]. The objective of this paper is to implement a speaker diarization system using distance metrics. The database consists of recorded files that contain a random number of male and female voices and non-speech signals such as music and noise. Two feature extraction techniques are used: Mel Frequency Cepstral Coefficients (MFCC) and Line Spectral Frequencies (LSF). Segmentation uses Hotelling's T2-statistic and the Bayesian Information Criterion (BIC): coarse segmentation is carried out with the T2 distance, and the BIC distance is then used to refine and confirm each speaker change, after which the signal is classified into segments. Finally, we present an analysis of speaker segmentation performance with MFCC and LSF features, reported in terms of precision, recall and figure of merit, and identify important areas for future research.
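
The abstract does not state the formulas behind the two distance measures. As a rough illustration only, the sketch below uses the common textbook forms of Hotelling's T2-statistic and the delta-BIC score for two adjacent windows of feature vectors (for example MFCC or LSF frames). The function names, the penalty weight `lam` and the use of full covariance matrices are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def hotelling_t2(x, y):
    """Two-sample Hotelling T2 between windows x and y (rows are frames)."""
    n1, n2 = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled covariance of the two windows (unbiased estimates).
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    return (n1 * n2) / (n1 + n2) * (diff @ np.linalg.solve(pooled, diff))


def delta_bic(x, y, lam=1.0):
    """Delta-BIC for a candidate change point between windows x and y.

    A positive value favours modelling x and y with two separate Gaussians,
    which is read here as evidence of a speaker change.
    """
    z = np.vstack([x, y])
    n1, n2, n = len(x), len(y), len(z)
    d = z.shape[1]

    def logdet(a):
        # Log-determinant of the maximum-likelihood covariance estimate.
        return np.linalg.slogdet(np.cov(a, rowvar=False, bias=True))[1]

    # Model-complexity penalty: d mean terms plus d(d+1)/2 covariance terms.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return 0.5 * (n * logdet(z) - n1 * logdet(x) - n2 * logdet(y)) - penalty
```

In a two-pass scheme like the one described, the cheaper T2 score would be computed at every candidate boundary, and delta-BIC would be evaluated only around the T2 peaks to accept or reject each candidate.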


How to Cite
Jamgade, K. C., & Jawarkar, N. P. (2016). Comparative Study of Speaker Segmentation Using MFCC and LSF. The International Journal of Science & Technoledge, 4(5). Retrieved from http://internationaljournalcorner.com/index.php/theijst/article/view/123849