Gaussian Mixture Model (GMM) Implementation Using EM Algorithm

Resource Overview

This paper presents an implementation of Gaussian Mixture Models (GMM), a classic speaker recognition algorithm, using the Expectation-Maximization (EM) algorithm. The study simulates the noise robustness of GMM-based recognition under various acoustic environments, yielding insights for practical applications. Key implementation aspects include parameter initialization strategies and convergence criteria for the EM iteration process.
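The paper does not publish its code, but the initialization and convergence choices it highlights can be sketched as follows. This is a minimal diagonal-covariance GMM fit in NumPy (our illustration, not the authors' implementation): means are initialized from random data points, covariances from the global variance, and the EM loop stops when the log-likelihood improvement falls below a tolerance. The function name `fit_gmm` and all parameter defaults are our own assumptions.

```python
import numpy as np

def fit_gmm(X, k, n_iter=200, tol=1e-6, seed=0):
    """Fit a k-component diagonal-covariance GMM to X (n_samples, n_dims) via EM.

    Illustrative sketch; initialization and stopping rule are assumptions,
    not the paper's exact choices.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialization: k random samples as means, uniform weights,
    # global per-dimension variance as every component's covariance.
    means = X[rng.choice(n, size=k, replace=False)]
    covs = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step in the log domain: log w_j + log N(x_i | mu_j, diag(cov_j))
        diff = X[:, None, :] - means                       # (n, k, d)
        log_p = (np.log(weights)
                 - 0.5 * (np.log(2 * np.pi * covs).sum(axis=1)
                          + (diff ** 2 / covs).sum(axis=2)))
        log_norm = np.logaddexp.reduce(log_p, axis=1)      # log-sum-exp per sample
        ll = log_norm.sum()
        resp = np.exp(log_p - log_norm[:, None])           # posterior responsibilities
        # M-step: weighted maximum-likelihood parameter updates
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = resp.T @ X / nk[:, None]
        covs = resp.T @ (X ** 2) / nk[:, None] - means ** 2 + 1e-6
        # Convergence criterion: small log-likelihood improvement
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, covs, ll
```

In practice k-means is a common alternative initializer; random-sample initialization is used here only to keep the sketch self-contained.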

Detailed Documentation

In the field of speech recognition, Gaussian Mixture Models (GMM) are widely employed. This study investigates the recognition performance of GMM models under different noise conditions through an EM-based GMM implementation. The EM algorithm involves two key phases: the E-step computes posterior probabilities using Bayes' theorem, while the M-step updates model parameters through maximum likelihood estimation. To simulate diverse acoustic environments, multiple noise types were incorporated, including white noise, pink noise, and brown noise. The results show that GMM performance varies only slightly across most noise conditions, although certain noisy environments yielded notable performance improvements, a finding of considerable importance for enhancing speech recognition algorithms. The implementation uses logarithmic probabilities to prevent numerical underflow during likelihood calculations.
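The three noise types named above differ in their power spectra: white noise is flat, pink noise falls off as 1/f, and brown noise as 1/f². A common way to simulate them, sketched below under our own assumptions (the function names `make_noise` and `add_noise` and the spectral-shaping approach are illustrative, not taken from the paper), is to shape the spectrum of white noise and then mix it into the clean signal at a target SNR.

```python
import numpy as np

def make_noise(kind, n, seed=0):
    """Generate unit-variance white, pink (1/f), or brown (1/f^2) noise
    by spectrally shaping Gaussian white noise."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(n)
    if kind == "white":
        return white
    spec = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]              # avoid division by zero at DC
    if kind == "pink":
        spec /= np.sqrt(freqs)       # amplitude ~ 1/sqrt(f) -> power ~ 1/f
    elif kind == "brown":
        spec /= freqs                # amplitude ~ 1/f -> power ~ 1/f^2
    else:
        raise ValueError(f"unknown noise kind: {kind}")
    noise = np.fft.irfft(spec, n)
    return noise / np.std(noise)     # normalize to unit variance

def add_noise(signal, noise, snr_db):
    """Scale noise so the mixture has the requested SNR (dB), then add it."""
    p_sig = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise
```

With helpers like these, the same utterance can be corrupted at a range of SNRs for each noise type before feature extraction, which is the kind of controlled acoustic variation the study describes.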
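The motivation for the log-domain trick mentioned at the end is easy to demonstrate: a likelihood is a product of many per-frame probabilities, and in floating point that product underflows to zero long before a typical utterance ends, while the sum of log-probabilities stays finite.

```python
import numpy as np

# 2000 frames, each with a small per-frame likelihood of 1e-5.
p = np.full(2000, 1e-5)

print(np.prod(p))         # 0.0 -- the product underflows double precision
print(np.sum(np.log(p)))  # ~ -23025.9 -- the log-likelihood stays finite
```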