Course Description

Eneko Agirre (U Basque Country), [introductory/intermediate, 8 hours]

Semantic Processing of Text: Word Sense Disambiguation, Entity Linking and Semantic Similarity

Grasping the meaning of text beyond simple keyword matching is one of the key challenges of text processing applications, including machine translation and web search. This course will survey semantic processing techniques that deal with the meaning of words from two perspectives. Word Sense Disambiguation (WSD) examines words in context and selects the intended sense from among the possibilities listed in a dictionary. Semantic Similarity measures the degree to which two words mean the same thing. We will present a range of techniques, including corpus-based supervised and unsupervised systems and knowledge-based systems. Beyond traditional WSD, we will show that the same techniques can be applied to Entity Linking, where the target is to disambiguate mentions of real-world people and organizations.
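As a concrete illustration of the knowledge-based family of techniques, the simplified Lesk algorithm picks the dictionary sense whose gloss shares the most words with the context. The sketch below uses a toy two-sense inventory for "bank" invented for illustration; it is not drawn from the course materials:

```python
# Simplified Lesk: choose the sense whose gloss overlaps most with the context.

def lesk(context_words, senses):
    """senses: mapping sense_id -> gloss (a string)."""
    context = set(context_words)
    def overlap(gloss):
        # Count shared word types between the context and the gloss.
        return len(context & set(gloss.lower().split()))
    return max(senses, key=lambda s: overlap(senses[s]))

# Toy sense inventory (invented glosses, not a real dictionary):
bank_senses = {
    "bank/finance": "a financial institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a body of water such as a river",
}

sense = lesk("i deposited money at the bank".split(), bank_senses)
```

Here the gloss word "money" appears in the context, so the financial sense wins; real systems use far richer context representations and sense inventories such as WordNet.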


William J. Byrne (Cambridge), [introductory/advanced, 6 hours]

Weighted Finite State Transducers in Statistical Machine Translation

This short course will present some recent advances in statistical machine translation (SMT) using modelling approaches based on Weighted Finite State Transducers (WFSTs) and Finite State Automata (FSA). The course focus will be on decoding procedures for SMT, i.e. the generation of translations using stochastic translation grammars and language models. WFSTs can offer a very powerful modelling framework for language processing. For problems which can be formulated in terms of WFSTs or FSAs, there are general purpose algorithms which can be used to implement efficient and exact search and estimation procedures. This is true even for problems which are not inherently finite state, such as translation with some stochastic context free grammars.
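The point about general-purpose algorithms can be illustrated with a toy sketch in plain Python (not OpenFst): composing a "translation options" transducer with a "language model" acceptor in the tropical semiring, then taking the shortest path. All states, symbols and weights below are invented toy values:

```python
import heapq

# A transducer is (arcs, start, finals); each arc is (src, isym, osym, dst, weight).
# Tropical semiring: path weights combine by +, alternatives are resolved by min.

def compose(t1, t2):
    """Product construction: match t1's output symbols against t2's input symbols."""
    arcs = []
    for (s1, i1, o1, d1, w1) in t1[0]:
        for (s2, i2, o2, d2, w2) in t2[0]:
            if o1 == i2:
                arcs.append(((s1, s2), i1, o2, (d1, d2), w1 + w2))
    finals = {(f1, f2) for f1 in t1[2] for f2 in t2[2]}
    return (arcs, (t1[1], t2[1]), finals)

def shortest_path(t):
    """Dijkstra = single-source shortest distance in the tropical semiring."""
    arcs, start, finals = t
    heap, seen = [(0.0, start, [])], set()
    while heap:
        cost, state, out = heapq.heappop(heap)
        if state in finals:
            return cost, " ".join(out)
        if state in seen:
            continue
        seen.add(state)
        for (s, _, o, d, w) in arcs:
            if s == state:
                heapq.heappush(heap, (cost + w, d, out + [o]))

# Toy "translation options" transducer and "language model" acceptor:
t_trans = ([(0, "la", "the", 1, 0.1), (0, "la", "it", 1, 0.7),
            (1, "casa", "house", 2, 0.2), (1, "casa", "home", 2, 0.5)], 0, {2})
t_lm = ([(0, "the", "the", 1, 0.1), (0, "it", "it", 1, 0.3),
         (1, "house", "house", 2, 0.1), (1, "home", "home", 2, 0.4)], 0, {2})
cost, best = shortest_path(compose(t_trans, t_lm))
```

In OpenFst these two steps correspond to the library's composition and shortest-path operations; real SMT translation lattices are of course vastly larger than this toy.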

The course will begin with an introduction to WFSTs, pushdown automata, and semirings in the context of SMT. The use of WFST and FSA modelling approaches will be presented for: SMT decoding with phrase-based models; SMT decoding with stochastic synchronous context free grammars (e.g. Hiero); SMT parameter optimisation (MERT); the use of large language models and 'fast' grammars in translation; translation lattice generation; and rescoring procedures such as minimum Bayes risk decoding and system combination. Implementations using the OpenFst toolkit will also be described.

The course material will be suitable for researchers already familiar with SMT and who wish to learn about alternative methods in decoder design. Enough background will be given so that researchers new to machine translation or unfamiliar with applications of WFSTs in natural language processing will also find the material appropriate.


Marcello Federico (Fondazione Bruno Kessler, Trento), [introductory/advanced, 8 hours]

Statistical Language Modeling

Statistical language models (LMs) are now a fundamental component of language processing technologies such as speech recognition, machine translation, and optical character recognition. The availability of software toolkits such as SRILM and IRSTLM now makes it easy for anyone to build and integrate LMs into applications. However, given the many options these toolkits offer, it is not always easy to find the optimal configuration and training method for a particular case. These lectures will survey basic and advanced concepts of n-gram LMs, covering both theoretical and implementation issues. Several practical cases of LM estimation and adaptation will be discussed and solved using the IRSTLM open-source toolkit.
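As a minimal illustration of the kind of estimation such toolkits perform, the sketch below trains a bigram model with linear interpolation smoothing. The corpus and the interpolation weight are toy values for illustration, not something IRSTLM would produce:

```python
from collections import Counter

def train_bigram(sentences, lam=0.7):
    """Bigram LM with linear interpolation against the unigram distribution:
    P(w | prev) = lam * P_bigram(w | prev) + (1 - lam) * P_unigram(w)."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    total = sum(unigrams.values())
    def prob(w, prev):
        p_uni = unigrams[w] / total
        p_bi = bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
        return lam * p_bi + (1 - lam) * p_uni
    return prob

prob = train_bigram(["the cat sat", "the cat ran", "a dog sat"])
```

Interpolation guarantees a nonzero probability for unseen bigrams such as "the dog"; production toolkits use principled smoothing schemes (e.g. modified Kneser-Ney) and tune the weights on held-out data.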


Ralph Grishman (New York), [intermediate, 8 hours]

Information Extraction

Information extraction is the process of creating semantically structured information from unstructured text. We will present methods for identifying and classifying names and other textual references to entities; for capturing semantic relations; and for recognizing events and their arguments. We will consider hand-coded rules and various machine learning approaches, including fully supervised learning, semi-supervised learning, and distant supervision. (Basic machine learning concepts will be reviewed, but some prior acquaintance with machine learning methods or corpus-trained language models will be helpful.) Several application domains will be briefly described. Notes for an earlier version of this course (for SSLST 2011) are available online.
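As a toy illustration of the hand-coded-rule end of this spectrum, the sketch below tags runs of capitalized tokens as candidate entity mentions and classifies them using invented trigger-word lists; real rule systems are far richer and interact with parsing and gazetteers:

```python
# Hand-coded rules for entity mention detection and classification (toy lists):
PERSON_TRIGGERS = {"mr.", "ms.", "dr.", "prof."}
ORG_SUFFIXES = {"inc.", "corp.", "university"}

def extract_names(tokens):
    """Tag maximal runs of capitalized tokens; classify by trigger/suffix rules."""
    entities, i = [], 0
    while i < len(tokens):
        if tokens[i][0].isupper():
            j = i
            while j < len(tokens) and tokens[j][0].isupper():
                j += 1
            span, label = tokens[i:j], "NAME"
            if span[0].lower() in PERSON_TRIGGERS:   # e.g. "Dr. Alice Chen"
                span, label = span[1:], "PERSON"
            elif span[-1].lower() in ORG_SUFFIXES:   # e.g. "Acme Corp."
                label = "ORG"
            if span:
                entities.append((" ".join(span), label))
            i = j
        else:
            i += 1
    return entities

ents = extract_names("Dr. Alice Chen joined Acme Corp. in March".split())
```

The residual "NAME" label for unclassified mentions ("March" here, wrongly) shows why such rules are typically combined with the supervised and semi-supervised methods the course covers.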


Geoffrey K. Pullum (Edinburgh), [introductory/intermediate, 8 hours]

The Formal Properties of Human Languages: Description with a View to Implementation

This short course, concentrating primarily on English, gives a view of the general properties of human languages designed to be relevant to someone working in the context of language and speech technologies.

In the morning classes I survey what we know from a mathematical standpoint, presupposing only an elementary knowledge of formal languages such as a beginning course on the theory of computation might provide. I outline the basic results on complexity of syntactic structure that might be relevant to the machine recognition, analysis, and translation of human languages; give a critical analysis of the reasons for thinking that English is not finite-state; and argue that context-free parsing is probably adequate, which means the problem of syntactic complexity has been somewhat overestimated. I will also touch on the descriptive complexity view of these issues, which gives a somewhat different perspective.
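The claim that context-free parsing is computationally manageable can be illustrated with the classic CYK recognizer, which runs in O(n^3) time for any grammar in Chomsky normal form. The toy grammar below is invented for illustration:

```python
# CYK recognition for a grammar in Chomsky normal form.
# lexicon: word -> set of preterminal categories; rules: (B, C) -> set of A with A -> B C.

def cyk(words, lexicon, rules):
    n = len(words)
    # chart[i][j] = categories spanning words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexicon.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):              # all binary split points
                for B in chart[i][k]:
                    for C in chart[k][j]:
                        chart[i][j] |= rules.get((B, C), set())
    return "S" in chart[0][n]

lexicon = {"the": {"Det"}, "dog": {"N"}, "barks": {"VP"}}
rules = {("Det", "N"): {"NP"}, ("NP", "VP"): {"S"}}
ok = cyk("the dog barks".split(), lexicon, rules)
```

The three nested span loops are the source of the cubic bound, which is the sense in which context-free recognition, if adequate for English, keeps syntactic complexity tame.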

In the evening classes I will provide a brief and lively review of the fundamental grammatical constructions of English, drawing mainly on The Cambridge Grammar of the English Language (CGEL). The lectures will constitute a guide to how to make intelligent use of CGEL as a reference work, and will also include a brief comparison between the category system that CGEL uses and the remarkably similar one employed for the annotation of the Penn Treebank.


Jian Su (Institute for Infocomm Research, Singapore), [advanced, 4 hours]

Coreference Resolution and Discourse Relation Recognition

Coreference resolution, the task of linking different mentions of the same entity or event in a text, is important for any intelligent natural language processing system. We will present methods for intra- and inter-document coreference resolution for both entities and events. The part on cross-document entity coreference resolution and entity linking will extend the corresponding portion of the Semantic Processing of Text course. Furthermore, we will present approaches for recognizing discourse relations, such as Temporal, Causal and Contrast relations, which capture the internal structure and logical relations of coherent text units. Discourse relation recognition is likewise important for many natural language processing applications, such as summarization, information extraction and question answering.


Christoph Tillmann (IBM T.J. Watson Research Center), [intermediate, 6 hours]

Simple and Effective Algorithms and Models for Non-hierarchical Statistical Machine Translation

In the area of statistical machine translation, non-hierarchical alignment and phrase-based translation models yield close to state-of-the-art translation results despite their conceptual simplicity. We will give a detailed presentation of algorithms and models for some key problems in statistical machine translation: dynamic-programming-based beam search, large-scale discriminative training, and comparable data extraction. We will trace related work in the field with respect to these algorithms while presenting some simple and efficient solutions.
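A minimal sketch of dynamic-programming beam search for monotone phrase-based decoding (no reordering, no language model) might look like the following; the phrase table and costs are invented toy values:

```python
import heapq

def beam_decode(source, phrase_table, beam_size=2, max_phrase=2):
    """stacks[k] holds hypotheses (cost, output phrases) covering source[:k]."""
    stacks = [[] for _ in range(len(source) + 1)]
    stacks[0] = [(0.0, [])]
    for k in range(len(source)):
        stacks[k] = heapq.nsmallest(beam_size, stacks[k])  # histogram pruning
        for cost, out in stacks[k]:
            # Extend each surviving hypothesis by one source phrase.
            for j in range(k + 1, min(k + max_phrase, len(source)) + 1):
                src = tuple(source[k:j])
                for tgt, w in phrase_table.get(src, ()):
                    stacks[j].append((cost + w, out + [tgt]))
    best = min(stacks[-1])
    return " ".join(best[1])

# Toy phrase table: source phrase -> [(target phrase, cost)], lower cost = better.
phrase_table = {
    ("la",): [("the", 0.2), ("it", 0.8)],
    ("casa",): [("house", 0.3)],
    ("la", "casa"): [("the house", 0.1)],
    ("blanca",): [("white", 0.2)],
}
translation = beam_decode(["la", "casa", "blanca"], phrase_table)
```

Grouping hypotheses into stacks by the number of source words covered is the dynamic-programming core; real decoders add language model scores, reordering, and recombination of hypotheses with identical states.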


David R. Traum (U Southern California), [introductory, 8 hours]

Approaches to Dialogue Systems and Dialogue Management

This introductory course will present an overview of some of the most popular current approaches to dialogue system organization. In the first lecture, we will briefly survey some prominent dialogue domains and systems that engage in dialogue within those domains. We will go over the different functional components of a dialogue system and some different approaches to providing that functionality. In the remaining lectures, we will focus on the dialogue management component and discuss different approaches to dialogue manager construction, including initiative-response techniques, information-retrieval-inspired techniques, finite-state systems, and frame-based, plan- and agent-based, and information-state-based methods. The lectures will include demonstrations of some representative systems and dialogue system toolkits.
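As a toy illustration of the finite-state approach, the sketch below encodes a dialogue manager as a transition table keyed on (state, user dialogue act) pairs. The flight-booking flow, state names and act names are invented for illustration:

```python
# Finite-state dialogue manager: (state, user act) -> (next state, system prompt).
TRANSITIONS = {
    ("greet", "request_flight"): ("ask_date", "What day would you like to fly?"),
    ("ask_date", "give_date"): ("confirm", "Shall I book it?"),
    ("confirm", "affirm"): ("done", "Booked. Goodbye!"),
    ("confirm", "deny"): ("ask_date", "OK, what day instead?"),
}

def dialogue(acts, state="greet"):
    """Run a sequence of user dialogue acts through the state machine."""
    prompts = []
    for act in acts:
        # Unrecognized acts leave the state unchanged and re-prompt.
        state, prompt = TRANSITIONS.get((state, act), (state, "Sorry, say again?"))
        prompts.append(prompt)
    return state, prompts

state, prompts = dialogue(["request_flight", "give_date", "affirm"])
```

The rigidity of such a machine (every path must be enumerated in advance) is precisely what motivates the frame-based, plan-based and information-state approaches covered later in the course.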