Module 3

Speech and Language Processing with Applications

Objective:

This course is concerned with the analysis and processing of speech signals for the development of voice-biometric-based applications. It gives participants hands-on experience with the tasks involved in analyzing speech signals: extracting different kinds of information, detecting different events in the speech signal, and developing speech enhancement, speech recognition, speaker recognition, speaker diarization, and language identification systems, together with their use in real-world applications. The course also includes several invited talks from leading experts working in different application areas of speech and language processing. These talks may inspire the research community to look at speech and language processing technology from a different perspective.

Course Outcome:

Upon successful completion of this course, students, faculty members, and researchers should be able to understand the following:

  • Speech production and perception, information sources in speech, linguistic aspects of speech, acoustic and articulatory phonetics, and the nature of the speech signal.
  • Basic concepts in the analysis and front-end processing of speech signals for extracting relevant information, and the issues involved in robust feature extraction.
  • Basic concepts of speech enhancement: the need for it, the issues in developing state-of-the-art speech enhancement systems, and their use in real-world applications.
  • Modeling techniques for developing pattern recognition systems based on voice biometrics.
  • Hands-on experience in developing state-of-the-art speaker identification and verification systems and applying them to remote authentication and forensics. Issues in developing robust speaker recognition systems under practical operating conditions and limited-data conditions will also be addressed.
  • Hands-on experience in developing speaker diarization systems and applying them to audio retrieval and the segmentation of multi-speaker speech signals. The issues involved in developing speaker diarization systems will be discussed at length and resolved.
  • Hands-on experience in developing state-of-the-art speech recognition systems and applying them to real-world problems.
  • Hands-on experience in developing language identification systems and applying them to real-world problems.

Course Outline:

  • Module 1:

    Speech production and perception, Information sources in speech, Linguistic aspects of speech, Acoustic and articulatory phonetics, Nature of the speech signal, Models for speech analysis.

  • Module 2: Short-term processing:

    Overview of Fourier representation, Short-time Fourier transform (STFT), Filter-bank view of the STFT, Time, Frequency and Time-Frequency analysis.

  • Module 3: Cepstrum analysis:

    Basis and Development, Homomorphic signal processing, Real and Complex cepstrum, Mel-frequency cepstral coefficients (MFCCs), Delta and Delta-Delta features.

  • Module 4: Linear Prediction (LP) analysis:

    Basis and Development, Levinson-Durbin method, Normalized error, LP spectrum, LP cepstrum, LP residual.

  • Module 5: Modeling techniques for speech and language processing:

    Dynamic Time Warping (DTW), Vector Quantization (VQ), Gaussian Mixture Model (GMM), GMM-Universal Background Model (GMM-UBM), Hidden Markov Model (HMM), N-grams, Artificial Neural Network (ANN), Support Vector Machine (SVM), Joint Factor Analysis (JFA), i-vector.

  • Module 6: Applications of speech and language processing:
    • Development of speech enhancement system:

      Objective, Issues, Development of speech enhancement systems using spectral and temporal processing methods.

    • Development of speaker recognition system:

      Objective, Issues, Block diagram description, Classification, Development of text-dependent, text-independent and voice-password-based speaker identification and verification systems.

    • Development of speaker diarization system:

      Objective, Issues, Block diagram description, Development of speaker diarization systems.

    • Development of speech recognition system:

      Objective, Issues, Block diagram description, Development of speech recognition systems.

    • Development of language identification system:

      Objective, Issues, Block diagram description, Development of language identification systems.