Eneko Agirre (U Basque Country), [introductory/intermediate, 8 hours]
Semantic Processing of Text: Word Sense Disambiguation, Entity Linking and Semantic Similarity
Grasping the meaning of text beyond simple keyword matching is one of the key challenges of text processing applications, including machine translation and web search. This course will give an overview of semantic processing techniques that deal with the meaning of words from two perspectives. Word Sense Disambiguation (WSD) examines words in context and selects the intended sense among the possibilities listed in a dictionary. Semantic Similarity measures the degree to which two words mean the same. We will present a range of techniques, including corpus-based supervised and unsupervised systems and knowledge-based systems. Beyond traditional WSD, we will show that the same techniques can be applied to Entity Linking, where the goal is to link mentions in text to the real-world people and organizations they refer to.
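As a rough illustration of the dictionary-based view of WSD described above, the following sketch applies a simplified Lesk-style overlap heuristic: it picks the sense whose gloss shares the most content words with the context. The sense inventory, the glosses, the stopword list and the function names are all hypothetical, written only for this example; this is one common knowledge-based baseline, not necessarily a method covered in the course.

```python
import re

# Toy sense inventory. The glosses are hypothetical, written for this sketch,
# and are not taken from any real dictionary.
SENSES = {
    "bank": {
        "bank/finance": "an institution that accepts deposits of money and makes loans",
        "bank/river": "the sloping land beside a river or lake",
    }
}

STOPWORDS = {"a", "an", "the", "of", "and", "or", "that", "to", "in", "on", "at", "by", "is", "was"}


def content_words(text):
    """Lower-case the text, keep alphabetic tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, context):
    """Return the sense whose gloss shares the most content words with the context.

    Ties are broken in favour of the sense listed first in the inventory.
    """
    context_words = content_words(context) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(content_words(gloss) & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    print(simplified_lesk("bank", "She sat on the bank of the river and watched the lake"))
    print(simplified_lesk("bank", "He went to the bank to ask for loans and deposit money"))
```

The heuristic relies on exact word overlap, so it ignores inflection ("deposit" does not match "deposits") and fails whenever the context shares no words with any gloss.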
- Eneko Agirre and Philip Edmonds (Eds.), Word Sense Disambiguation: Algorithms and Applications, Springer 2006
- Roberto Navigli, Word Sense Disambiguation: a Survey, ACM Computing Surveys, 41(2):1-69, ACM Press, 2009
- Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics, 2nd edition, Prentice-Hall, 2009 (Chapters 19 and 20 cover WSD and Similarity)
- Mark Dredze, Paul McNamee, Delip Rao, Adam Gerber, and Tim Finin, Entity disambiguation for knowledge base population, In Proceedings of the 23rd International Conference on Computational Linguistics (COLING), Stroudsburg, USA: 277-285, 2010
- Heng Ji and Ralph Grishman, Knowledge Base Population: Successful Approaches and Challenges, In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, USA: 1148–1158, 2011
William J. Byrne (Cambridge), [introductory/advanced, 6 hours]
Weighted Finite State Transducers in Statistical Machine Translation
This short course will present some recent advances in statistical machine translation (SMT) using modelling approaches based on Weighted Finite State Transducers (WFSTs) and Finite State Automata (FSA). The course focus will be on decoding procedures for SMT, i.e. the generation of translations using stochastic translation grammars and language models. WFSTs can offer a very powerful modelling framework for language processing. For problems which can be formulated in terms of WFSTs or FSAs, there are general purpose algorithms which can be used to implement efficient and exact search and estimation procedures. This is true even for problems which are not inherently finite state, such as translation with some stochastic context free grammars.
The course will begin with an introduction to WFSTs, pushdown automata, and semirings in the context of SMT. The use of WFST and FSA modelling approaches will be presented for: SMT decoding with phrase-based models; SMT decoding with stochastic synchronous context free grammars (e.g. Hiero); SMT parameter optimisation (MERT); the use of large language models and 'fast' grammars in translation; translation lattice generation; and rescoring procedures such as minimum Bayes risk decoding and system combination. Implementations using the OpenFst toolkit will also be described.
The course material will be suitable for researchers who are already familiar with SMT and wish to learn about alternative methods in decoder design. Enough background will be given that researchers new to machine translation or unfamiliar with applications of WFSTs in natural language processing will also find the material appropriate.
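To give a flavour of the semiring view mentioned in the course outline, here is a minimal sketch of shortest-path search over a small acyclic word lattice in the tropical semiring, where "addition" is min and "multiplication" is ordinary addition of costs. The lattice, its costs and all function names are hypothetical; the code is plain Python and does not use the OpenFst API.

```python
import math

# Tropical semiring: "plus" is min, "times" is +, with identities +inf and 0.
ZERO, ONE = math.inf, 0.0


def t_plus(a, b):
    return min(a, b)


def t_times(a, b):
    return a + b


# Hypothetical acyclic lattice: state -> list of (next_state, word, cost).
# States are numbered in topological order; costs stand in for negative
# log probabilities.
ARCS = {
    0: [(1, "the", 0.5), (1, "a", 1.0)],
    1: [(2, "house", 0.75), (2, "home", 0.25)],
    2: [(3, "is", 0.25)],
    3: [],
}


def shortest_path(arcs, start, final):
    """Return (cost, words) of the cheapest path from start to final.

    Assumes the automaton is acyclic, that state numbers follow a
    topological order, and that final is reachable from start.
    """
    dist = {state: ZERO for state in arcs}
    back = {}
    dist[start] = ONE
    for state in sorted(arcs):
        for nxt, word, cost in arcs[state]:
            best = t_plus(dist[nxt], t_times(dist[state], cost))
            if best != dist[nxt]:
                dist[nxt] = best
                back[nxt] = (state, word)
    # Follow backpointers from the final state to recover the word sequence.
    words, state = [], final
    while state != start:
        state, word = back[state]
        words.append(word)
    return dist[final], words[::-1]


if __name__ == "__main__":
    print(shortest_path(ARCS, 0, 3))
```

Swapping `t_plus` and `t_times` for another semiring (for example, log-add and + to sum over all paths) changes what the same traversal computes, which is the main appeal of the semiring abstraction; the backpointer step, however, is only meaningful for min-like semirings.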
- Cyril Allauzen, Mehryar Mohri, and Brian Roark, Generalized algorithms for constructing statistical language models. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics: 40-47, 2003.
- Graeme Blackwood, Adrià de Gispert, and William Byrne, Efficient path counting transducers for minimum Bayes-risk decoding of statistical machine translation lattices. In Proceedings of the Annual Meeting of the Association for Computational Linguistics: 27-32, 2010.
- David Chiang, Hierarchical phrase-based translation. Computational Linguistics, 33(2): 201-228, 2007.
- Adrià de Gispert, Gonzalo Iglesias, Graeme Blackwood, Eduardo R. Banga, and William Byrne, Hierarchical phrase-based translation with weighted finite state transducers and shallow-n grammars. Computational Linguistics, 36(3): 505-533, 2010.
- Gonzalo Iglesias, Cyril Allauzen, William Byrne, Adrià de Gispert, and Michael Riley, Hierarchical Phrase-based Translation Representations. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing: 1373-1383.
- K. Knight and Y. Al-Onaizan, Translation with Finite-State Devices. Proceedings of the 4th AMTA Conference: 421-437, 1998.
- S. Kumar, Y. Deng, and W. Byrne, A weighted finite state transducer translation template model for statistical machine translation. Journal of Natural Language Engineering, 12(1): 35-75, 2006.
- Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley, Weighted Finite-State Transducers in Speech Recognition. Computer Speech and Language, 16(1): 69-88, 2002.
- Mehryar Mohri, Weighted automata algorithms. In Manfred Droste, Werner Kuich, and Heiko Vogler, editors, Handbook of Weighted Automata. Monographs in Theoretical Computer Science. Springer: 213-254, 2009.
- Roy Tromble, Shankar Kumar, Franz Och, Wolfgang Macherey, Lattice Minimum Bayes-Risk Decoding for Statistical Machine Translation. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing: 620-629.
Marcello Federico (Fondazione Bruno Kessler, Trento), [introductory/advanced, 8 hours]
Statistical Language Modeling
Statistical language models (LMs) are now a fundamental component of language processing technologies such as speech recognition, machine translation, and optical character recognition. The availability of software toolkits such as SRILM and IRSTLM now allows anyone to build LMs and integrate them into an application with little effort. However, given the many options these toolkits offer, it is not always easy to find the optimal configuration and training method for a particular case. These lectures will survey basic and advanced concepts of n-gram LMs, covering both theoretical and implementation issues. Several practical cases of LM estimation and adaptation will be discussed and solved using the IRSTLM open source toolkit.
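For readers new to n-gram LMs, the following minimal sketch estimates a bigram model with add-one (Laplace) smoothing and computes the perplexity of a sentence. It is plain Python and does not use SRILM or IRSTLM; the training sentences and function names are hypothetical, and add-one smoothing is chosen only for brevity (practical toolkits typically use stronger methods such as Kneser-Ney). The sketch assumes every word scored at test time occurs in the training vocabulary.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect history and bigram counts from whitespace-tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])           # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    # Words that can be predicted: all training words plus the end marker.
    vocab = {w for sentence in sentences for w in sentence.split()} | {"</s>"}
    return histories, bigrams, vocab


def prob(word, history, model):
    """Add-one smoothed estimate of P(word | history)."""
    histories, bigrams, vocab = model
    return (bigrams[(history, word)] + 1) / (histories[history] + len(vocab))


def perplexity(sentence, model):
    """Per-token perplexity of a sentence, including the end-of-sentence marker."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = sum(math.log(prob(w, h, model)) for h, w in zip(tokens, tokens[1:]))
    return math.exp(-log_prob / (len(tokens) - 1))


if __name__ == "__main__":
    model = train_bigram(["the cat sat", "the dog sat", "the cat ran"])
    print(prob("cat", "the", model))
    print(perplexity("the cat sat", model), perplexity("sat cat the", model))
```

Because one count is added for each vocabulary item, the smoothed probabilities for any given history still sum to one, while unseen bigrams receive a small non-zero probability instead of zero.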
References (most of them available on the web):
- A.L. Berger, V.J. Della Pietra, S.A. Della Pietra, A maximum entropy approach to natural language processing, Computational Linguistics, 22(1):39-71, 1996
- S.F. Chen, J. Goodman, An empirical study of smoothing techniques for language modeling, TR-10-98, Harvard University, 1998
- C.D. Manning, H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999
- M. Federico, N. Bertoldi, Broadcast news LM adaptation using contemporary texts, Eurospeech 2001: 239-242
- M. Federico, Language model adaptation through topic decomposition and MDI estimation, ICASSP 2002: 773-776
- J.R. Bellegarda, Statistical language model adaptation: review and perspectives, Speech Communication, 42:93-108, 2004
- T. Brants, A.C. Popat, P. Xu, F.J. Och, J. Dean, Large language models in machine translation, EMNLP 2007: 858-867
- D. Jurafsky and J.H. Martin, Speech and Language Processing, Prentice Hall, 2009
- H. Schwenk, Continuous space language models, Computer Speech & Language, 21:492-518, 2007
- N. Ruiz, M. Federico, Topic adaptation for lecture translation through bilingual latent semantic models, 6th Workshop on Statistical Machine Translation: 294-302, Edinburgh, 2011
Ralph Grishman (New York), [intermediate, 8 hours]
Information extraction is the process of creating semantically
structured information from unstructured text. We will present
methods for identifying and classifying names and other textual
references to entities; for capturing semantic relations; and for
recognizing events and their arguments. We will consider
hand-coded rules and various machine learning approaches,
including fully-supervised learning, semi-supervised learning, and
distant supervision. (Basic machine learning concepts will be
reviewed, but some prior acquaintance with machine learning
methods or corpus-trained language models will be helpful.)
Several application domains will be briefly described. Notes
for an earlier version of this course (for SSLST 2011) can be
- R. Grishman, Information Extraction. The Handbook of
Computational Linguistics and Natural Language Processing, eds.
A. Clark, C. Fox, and S. Lappin, Wiley-Blackwell, 515–530,
- R. Grishman, Information Extraction. Oxford Handbook
of Computational Linguistics, Ed: Ruslan Mitkov, Oxford,
- Marie-Francine Moens, Information
Extraction: Algorithms and Prospects in a Retrieval
Context, Springer, 2006.
Geoffrey K. Pullum (Edinburgh), [introductory/intermediate, 8 hours]
The Formal Properties of Human Languages: Description with a View to Implementation
This short course, concentrating primarily on English, gives a
view of the general properties of human languages designed to be
relevant to someone working in the context of language and speech
In the morning classes I survey what we know from
a mathematical standpoint, presupposing only an elementary knowledge
of formal languages such as a beginning course on the theory of computation
might provide. I outline the basic results on complexity of syntactic
structure that might be relevant to the machine recognition, analysis,
and translation of human languages; give a critical analysis of the
reasons for thinking that English is not finite-state; and argue that
context-free parsing is probably adequate, which means the problem of
syntactic complexity has been somewhat overestimated. I will also
touch on the descriptive complexity view of these issues, which gives
a somewhat different perspective.
In the evening classes I will provide a brief and lively
review of the fundamental grammatical constructions of English, drawing
mainly on The Cambridge Grammar of the English Language (CGEL). The
lectures will constitute a guide to how to make intelligent use of CGEL
as a reference work, and will also include a brief comparison between
the category system that CGEL uses and the remarkably similar one employed
for the annotation of the Penn Treebank.
Rodney Huddleston and Geoffrey K. Pullum (2002): The Cambridge Grammar
of the English Language. Cambridge University Press. [This is a very
large reference work (1,860 pages). It will be referred to, but is not an
item that students need to purchase.]
Rodney Huddleston and Geoffrey K. Pullum (2005): A Student's Introduction
to English Grammar. Cambridge University Press. [It would be useful
for students to own a copy of this book, though it is not essential.]
Pullum, Geoffrey K. and Gerald Gazdar (1982): Natural languages and
context-free languages. Linguistics and Philosophy 4, 471-504. [An
illustrative work on the context-freeness issue, giving a good idea
of what most of the early arguments are like.]
Pullum, Geoffrey K. and Kyle Rawlins (2007): Argument or no argument?
Linguistics and Philosophy 30(2), 277-287. [A more recent defense of
the view that English has never been shown to be non-context-free,
interesting for showing how issues of syntax dissolve away to reveal
semantic issues of a very different character.]
Beatrice Santorini (1990):
Part-of-Speech Tagging Guidelines for the Penn Treebank Project.
University of Pennsylvania.
Available at various locations on the web, including
Jian Su (Institute for Infocomm Research, Singapore), [advanced, 4 hours]
Coreference Resolution and Discourse Relation Recognition
Coreference resolution, the task of linking different mentions of the same entity or event in a text, is important for any intelligent natural language processing system. We will present methods for intra- and inter-document coreference resolution on both entities and events. The part on cross-document entity coreference resolution and entity linking extends the corresponding material on entity linking in the Semantic Processing of Text course. We will also present approaches for recognizing discourse relations, such as Temporal, Causal and Contrast relations, which capture the internal structure and logical relations of coherent text units. Discourse relation recognition is likewise an important technology for many natural language processing applications, such as summarization, information extraction and question answering.
- Vincent Ng, Supervised Noun Phrase Coreference Research: The First Fifteen Years, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1396–1411, Uppsala, July 2010
- Xiaofeng Yang, Jian Su, Jun Lang, Chew Lim Tan, Ting Liu, Sheng Li, An Entity-Mention Model for Coreference Resolution with Inductive Logic Programming, Proceedings of ACL-08: HLT, pp. 843-851, Columbus, OH, June 2008
- Xiaofeng Yang, Jian Su, and Chew Lim Tan, A twin-candidate model for learning-based anaphora resolution. Computational Linguistics, 34(3):327–356, 2008
- Bin Chen, Jian Su and Chew Lim Tan, Resolving Event Noun Phrases to Their Verbal Mentions, Proceedings of the 2010 conference on Empirical Methods in Natural Language Processing, pp. 872-881, Cambridge, MA, October 2010
- Bin Chen, Jian Su and Chew Lim Tan, A Twin-Candidate Based Approach for Event Pronoun Resolution using Composite Kernel, Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pp. 188-196, Beijing, August 2010
- Cosmin Adrian Bejan, Sanda Harabagiu, Unsupervised Event Coreference Resolution with Rich Linguistic Features, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1412–1422, Uppsala, July 2010
- Heng Ji, Ralph Grishman, Knowledge Base Population: Successful Approaches and Challenges, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 1148–1158, Portland, OR, June 2011
- Wei Zhang, Yanchuan Sim, Jian Su and Chew-Lim Tan, Entity Linking with Effective Acronym Expansion, Instance Selection and Topic Modeling, Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI’11), pp. 1909-1914, Barcelona, July 2011
- Wei Zhang, Jian Su and Chew Lim Tan, Entity Linking Leveraging Automatically Generated Annotation, Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pp. 1290-1298, Beijing, August 2010
- WenTing Wang, Jian Su and Chew Lim Tan, Kernel Based Discourse Relation Recognition With Temporal Ordering Information, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 710–719, Uppsala, July 2010
- Emily Pitler, Annie Louis, Ani Nenkova, Automatic sense prediction for implicit discourse relations in text, Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pp. 683–691, Suntec, Singapore, August 2009
- Zhi Min Zhou, Man Lan, Zheng Yu Niu, Yu Xu and Jian Su, The Effects of Discourse Connectives Prediction on Implicit Discourse Relation Recognition, Proceedings of the SIGDIAL 2010 conference, pp. 139-146, Tokyo, September, 2010
Christoph Tillmann (IBM T.J. Watson Research Center), [intermediate, 6 hours]
Simple and Effective Algorithms and Models for Non-hierarchical Statistical Machine Translation
In the area of statistical machine translation, non-hierarchical alignment and phrase-based translation models yield close to state-of-the-art translation results despite their conceptual simplicity. We will give a detailed presentation of algorithms and models for some key problems in statistical machine translation: dynamic-programming-based beam search, large-scale discriminative training, and comparable data extraction. We will trace related work in the field with respect to these algorithms while presenting some simple and efficient solutions.
- P. Koehn, F.J. Och, and D. Marcu. Statistical Phrase-Based Translation. In Proceedings of HLT-NAACL'03, pp. 127-133, Edmonton, Alberta, Canada, May-June. (http://aclweb.org/anthology-new/N/N03/N03-1017.pdf), 2003
- C. Tillmann. A Unigram Orientation Model for Statistical Machine Translation. In Companion vol. of HLT/NAACL'04, pp. 101-104, Boston, Massachusetts, May. (http://aclweb.org/anthology-new/N/N04/N04-4026.pdf), 2004
- F.J. Och and H. Ney. The Alignment Template Approach for Statistical MT. Computational Linguistics 30(4):417-450. (http://aclweb.org/anthology-new/J/J04/J04-4002.pdf), 2004
- C. Tillmann and T. Zhang. A Discriminative Global Training Algorithm for Statistical MT. In Proceedings of Coling/ACL’06, pp. 721-728, Sydney, Australia, July. (http://aclweb.org/anthology-new/P/P06/P06-1091.pdf), 2006
- M. Galley and C.D. Manning. A Simple and Effective Hierarchical Phrase Reordering Model. In Proceedings of EMNLP'08, pp. 848-856, Honolulu, Hawaii, October. (http://aclweb.org/anthology/D08-1089), 2008
- C. Tillmann. A Beam-Search Extraction Algorithm for Comparable Data. In Short Papers of ACL’09, pp. 225-228, Singapore, August. (http://aclweb.org/anthology-new/P/P09/P09-2057.pdf), 2009
- C. Tillmann. Handling Complexity in Decoding for SMT. In Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation, J. Olive, C. Christianson and J. McCary (eds.), pp. 280-287, Springer: Berlin.
David R. Traum (U Southern California), [introductory, 8 hours]
Approaches to Dialogue Systems and Dialogue Management
This introductory course will present an overview of some of the most popular current approaches to dialogue system organization. In the first lecture, we will briefly survey some prominent dialogue domains and systems that engage in dialogue within those domains. We will go over the functional components of a dialogue system and some of the approaches used to provide that functionality. In the remaining lectures, we will focus on the dialogue management component and discuss different approaches to dialogue manager construction, including initiative-response techniques, Information Retrieval-inspired techniques, finite-state systems, and frame-based, plan- and agent-based, and information-state-based methods. Included will be demonstrations of some representative systems and dialogue system toolkits.
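Of the dialogue management approaches listed above, the finite-state one is the simplest to show in code. The sketch below is a hypothetical toy example, not taken from any of the systems or toolkits demonstrated in the course: each state has a system prompt, a slot to fill, and a fixed successor state, and an empty user answer causes the same prompt to be repeated.

```python
# Hypothetical dialogue graph for a toy ticket-booking task.
STATES = {
    "ask_destination": {
        "prompt": "Where would you like to travel?",
        "slot": "destination",
        "next": "ask_date",
    },
    "ask_date": {"prompt": "On what date?", "slot": "date", "next": "confirm"},
    "confirm": {"prompt": "Shall I book it?", "slot": "confirmed", "next": "done"},
}


def run_dialogue(user_turns, start="ask_destination"):
    """Drive the state machine with a scripted list of user answers.

    Returns the state reached, the filled slots, and the transcript.
    """
    state, slots, transcript = start, {}, []
    turns = iter(user_turns)
    while state != "done":
        node = STATES[state]
        transcript.append(("system", node["prompt"]))
        answer = next(turns, None)
        if answer is None:          # the user stopped responding
            break
        transcript.append(("user", answer))
        if not answer.strip():      # empty input: stay in the state and re-prompt
            continue
        slots[node["slot"]] = answer.strip()
        state = node["next"]
    return state, slots, transcript


if __name__ == "__main__":
    final_state, slots, transcript = run_dialogue(["Paris", "", "Friday", "yes"])
    for speaker, text in transcript:
        print(f"{speaker}: {text}")
    print(final_state, slots)
```

The system keeps the initiative throughout and the user cannot volunteer information out of order, which is the main limitation that frame-based and information-state approaches are designed to relax.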
- Kristiina Jokinen, Michael McTear, Spoken Dialogue Systems, Morgan & Claypool, Synthesis Lectures on Human Language Technologies, 2010
- Anton Leuski, David R. Traum, NPCEditor, Creating Virtual Human Dialogue Using Information Retrieval Techniques. AI Magazine 32(2): 42-56 (2011)
- Dave Raggett, Getting started with VoiceXML 2.0, 2001
- Charles Rich, Candace L. Sidner, Neal Lesh, COLLAGEN: Applying Collaborative Discourse Theory to Human-Computer Interaction, AI Magazine 22(4): 15-26 (2001)
- David Traum, Staffan Larsson, The Information State Approach to Dialogue Management, in Current and New Directions in Discourse and Dialogue, Eds. Jan van Kuppevelt and Ronnie Smith, Kluwer, 2003: 325-354
- Richard Wallace, AIML Overview