2012 International Summer School in Language and Speech Technologies (SSLST 2012), Tarragona, Spain, July 30 – August 3, 2012
In this series of lectures, we shall study speech perception and the perceptual space of consonants. First we will address the auditory system, including the cochlea and the early auditory pathway. Next we explore auditory and phoneme feature spaces that define the plosive (p,t,k,b,d,g) and fricative (s,S,f,T,z,Z) consonants. For example, what distinguishes /t/ from /d/ and /p/, or /s/ from /S/? Knowing the features that define the consonants is critical to improving speech coding and speech recognition software, and to understanding hearing aid signal processing. The information presented in these lectures is based on research by the author and his students, as described in the references given below. Issues rarely addressed will include the dynamic range of the auditory system and of speech, and the phoneme error rate as a function of the signal-to-noise ratio. Software will be provided so that the students may modify speech sounds themselves. This will require the students to have their own PC running Matlab.
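To make the last point concrete, here is a minimal Python sketch (not the course software, which is Matlab-based; the function name and all numbers are illustrative) of the basic manipulation behind a phoneme-error-rate-versus-SNR experiment: mixing a speech token with white noise at a controlled signal-to-noise ratio.

    # Hypothetical sketch: mix speech with white noise at a chosen SNR (dB),
    # the independent variable in phoneme error rate experiments.
    import numpy as np

    def mix_at_snr(speech, snr_db, seed=0):
        """Return speech plus white noise scaled to the requested SNR in dB."""
        rng = np.random.default_rng(seed)
        noise = rng.standard_normal(len(speech))
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2)
        # Scale noise so that 10*log10(p_speech / p_noise_scaled) == snr_db.
        scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
        return speech + scale * noise

    # Example: a 100 ms synthetic tone at 16 kHz, mixed at 0 dB SNR.
    fs = 16000
    t = np.arange(0, 0.1, 1.0 / fs)
    token = np.sin(2 * np.pi * 500 * t)
    noisy = mix_at_snr(token, snr_db=0.0)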
After reviewing the current state of the art in HMM-based automatic speech recognition (ASR), we will discuss hybrid systems combining HMMs and artificial neural networks (ANNs), as well as the current trend towards using phone and subword-unit posterior distributions (also often referred to as “categorical distributions”) in new types of HMMs, or directly as new HMM features.
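As a rough illustration of the “posteriors as features” idea (often called the tandem approach), here is a toy numpy sketch; the random linear layer stands in for a trained ANN, and all shapes and names are hypothetical.

    # Toy sketch: an ANN produces per-frame phone posteriors, and their
    # logarithms are then used as feature vectors for a conventional HMM system.
    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    rng = np.random.default_rng(0)
    n_frames, n_acoustic, n_phones = 200, 39, 40   # e.g. MFCC+deltas, phone set

    # Stand-in for a trained MLP: one random linear layer plus softmax.
    W = rng.standard_normal((n_acoustic, n_phones)) * 0.1
    acoustic = rng.standard_normal((n_frames, n_acoustic))
    posteriors = softmax(acoustic @ W)             # per-frame P(phone | frame)

    # Log posteriors as new "HMM features" (a small floor avoids log(0)).
    tandem_features = np.log(posteriors + 1e-10)
    print(tandem_features.shape)                   # (200, 40)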
In the second part of the course, we will focus on new trends in multilingual speech processing, including multilingual speech recognition, multilingual speech synthesis, and the convergence between the two. Indeed, over the last decade, ASR and text-to-speech (TTS) technologies have converged towards statistical parametric approaches, and we believe that properly addressing complex multilingual ASR and TTS tasks (including for low-resourced languages), with the goal of improving the robustness and quality of both speech recognition and speech synthesis systems, will require looking at these problems in an integrated way.
Among its most advanced topics, one of the objectives of the present course is thus to investigate multiple related facets of the multilingual ASR and TTS problems, focusing on the key aspects of cross-language and speaker adaptation, and in particular on those approaches that aim at reducing the gap between speech recognition and speech synthesis.
This course will assume some minimal knowledge of statistical pattern processing and speech signal processing.
Statistical machine translation is nowadays among the most popular and active research fields in natural language processing. This crash course offers a general introduction to the problem and applications of machine translation, followed by five lectures focusing on the core techniques and approaches of statistical machine translation. References to open-source software, language resources, and benchmarks will also be given so that interested students can put into practice what they have learned during the course.
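For a taste of the core techniques, here is a hedged Python sketch of IBM Model 1 word-alignment training by expectation-maximization, one of the classic building blocks of statistical MT; the three-sentence corpus and all variable names are illustrative only.

    # Toy sketch of IBM Model 1 word-alignment training via EM, a classic
    # building block of statistical MT (corpus and names are illustrative).
    from collections import defaultdict

    corpus = [(["das", "haus"], ["the", "house"]),
              (["das", "buch"], ["the", "book"]),
              (["ein", "buch"], ["a", "book"])]

    # Uniform initialization of the translation table t(e | f).
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(10):                      # EM iterations
        count = defaultdict(float)           # expected counts c(e, f)
        total = defaultdict(float)           # expected counts c(f)
        for fs, es in corpus:
            for e in es:                     # E-step: soft-align e to each f
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    p = t[(e, f)] / norm
                    count[(e, f)] += p
                    total[f] += p
        for (e, f), c in count.items():      # M-step: renormalize
            t[(e, f)] = c / total[f]

    print(round(t[("house", "haus")], 3))    # rises towards 1.0 with training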
Spoken language understanding (SLU) investigates human-machine and human-human communication by leveraging technologies from signal processing, pattern recognition, machine learning, and artificial intelligence. SLU systems are designed to extract meaning from speech utterances, and their applications are vast, ranging from conversational agents (or companions) to meeting summarization and speech and language analytics. In these lectures, we will define the problem of speech understanding, present current grammar-based and data-driven models, and describe the types of semantic structures used in the latest advanced SLU systems. In the last part, we will review current research challenges and SLU system case studies.
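As a minimal illustration of the grammar-based side of SLU, here is a hypothetical Python sketch in which a few hand-written patterns map an utterance to a semantic frame (an intent plus slots); real systems use far richer grammars or statistical models, and every name here is invented.

    # Hypothetical sketch: a tiny pattern "grammar" mapping an utterance
    # to a semantic frame consisting of an intent and filled slots.
    import re

    GRAMMAR = [
        (r"book a flight from (?P<src>\w+) to (?P<dst>\w+)", "BookFlight"),
        (r"what is the weather in (?P<city>\w+)", "GetWeather"),
    ]

    def parse(utterance):
        """Return a frame {'intent': ..., 'slots': {...}} or None."""
        for pattern, intent in GRAMMAR:
            m = re.search(pattern, utterance.lower())
            if m:
                return {"intent": intent, "slots": m.groupdict()}
        return None

    print(parse("Book a flight from Boston to Denver"))
    # {'intent': 'BookFlight', 'slots': {'src': 'boston', 'dst': 'denver'}}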
This course covers key ideas at the junction of natural language processing (NLP) and machine learning. The goal is to make it easier for NLP researchers to follow relevant research in machine learning, and to contribute to the growing body of research that uses advanced statistical modeling techniques to solve hard language processing problems. The tutorial breaks down into three main parts.
Probabilistic graphical models. Probabilistic graphical models are a major topic in machine learning. They provide a foundation for statistical modeling of complex data, and starting points (if not full-blown solutions) for inference and learning algorithms. They generalize many familiar methods in NLP. We'll cover Bayesian networks, Markov networks, and the relationship between them, and present inference as the central question when working with graphical models.
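To make the inference question concrete, here is a small Python sketch of exact inference by enumeration in a three-node Bayesian network; the structure and probabilities are invented for illustration.

    # Exact inference by enumeration in a tiny (invented) Bayesian network:
    # Rain -> WetGrass <- Sprinkler.
    import itertools

    P_rain = {True: 0.2, False: 0.8}
    P_sprinkler = {True: 0.3, False: 0.7}
    P_wet = {  # P(WetGrass=True | Rain, Sprinkler)
        (True, True): 0.99, (True, False): 0.9,
        (False, True): 0.8, (False, False): 0.05,
    }

    def joint(r, s, w):
        """Full joint probability of one assignment (r, s, w)."""
        pw = P_wet[(r, s)] if w else 1.0 - P_wet[(r, s)]
        return P_rain[r] * P_sprinkler[s] * pw

    # Query P(Rain=True | WetGrass=True) by summing out Sprinkler.
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True)
              for r, s in itertools.product((True, False), repeat=2))
    print(round(num / den, 3))   # about 0.457 with these numbers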
Linear structure models. Most problems in linguistic analysis are currently solved by applying discrete optimization techniques (dynamic programming, search, and others) to identify a structure that maximizes some score, given an input. We describe a few ways to think about the problem of prediction itself (a kind of inference), and review key approaches to learning structured prediction models. An emphasis will be placed on unifying a wide range of approaches (generative models, conditional models, structured perceptron, structured max margin).
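As a concrete instance of "identify a structure that maximizes some score," here is a short Python sketch of Viterbi decoding of a best tag sequence under a simple additive (HMM-style) linear score; the two-tag model and all weights are made up.

    # Viterbi decoding: find the tag sequence maximizing a sum of
    # transition and emission scores (all numbers are illustrative).
    import numpy as np

    tags = ["N", "V"]
    emit = {"time": [2.0, 0.5], "flies": [1.0, 1.5]}   # score(word, tag)
    trans = np.array([[0.2, 1.0],                       # score(prev_tag, tag)
                      [1.0, 0.2]])

    def viterbi(words):
        n, k = len(words), len(tags)
        score = np.full((n, k), -np.inf)
        back = np.zeros((n, k), dtype=int)
        score[0] = emit[words[0]]
        for i in range(1, n):
            for t in range(k):
                cand = score[i - 1] + trans[:, t] + emit[words[i]][t]
                back[i, t] = cand.argmax()
                score[i, t] = cand.max()
        # Follow back-pointers from the best final tag.
        best = [int(score[-1].argmax())]
        for i in range(n - 1, 0, -1):
            best.append(int(back[i, best[-1]]))
        return [tags[t] for t in reversed(best)]

    print(viterbi(["time", "flies"]))   # ['N', 'V'] under these weights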
Incomplete data. Since we will never have as much annotated linguistic data as we'd like in all the languages, domains, and genres for which we'd like to do NLP, semi-supervised and unsupervised learning have become hugely important. We show how the foundations from the first two parts can be extended to provide a framework for learning with incomplete data. We'll review Expectation-Maximization in light of what we have covered so far and discuss recently proposed Bayesian techniques.
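As a preview, here is a hedged Python sketch of EM on the classic two-coin problem, where the identity of the coin behind each flip sequence is the incomplete data; the flip counts and initial parameters follow the standard textbook example.

    # EM with incomplete data: estimate the biases of two coins when we see
    # head counts but not which coin produced each sequence (toy example).
    import math

    flips = [(5, 10), (9, 10), (8, 10), (4, 10), (7, 10)]  # (heads, tosses)
    theta = [0.6, 0.5]                                     # initial biases

    def lik(h, n, p):
        """Binomial likelihood of h heads in n tosses with bias p."""
        return math.comb(n, h) * p**h * (1 - p)**(n - h)

    for _ in range(20):                       # EM iterations
        heads = [0.0, 0.0]
        tosses = [0.0, 0.0]
        for h, n in flips:                    # E-step: coin responsibilities
            w = [lik(h, n, theta[k]) for k in range(2)]
            z = sum(w)
            for k in range(2):
                r = w[k] / z
                heads[k] += r * h
                tosses[k] += r * n
        theta = [heads[k] / tosses[k] for k in range(2)]  # M-step
    print([round(p, 3) for p in theta])       # roughly [0.80, 0.52]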
The objective of this course is to introduce the basics of processing speech signals to extract features of speech production for various applications such as speech recognition, speaker recognition, and speech enhancement. No prior background in speech or signal processing is required; knowledge of basic mathematics at degree level is assumed. This self-contained course consists of four parts: