International Summer School on Deep Learning
 17th to 21st July 2017, Bilbao, Spain

Course Description


Keynotes (To be completed)

Li Deng Citadel

Recent Advances in Unsupervised Deep Learning

Most deep learning algorithms are supervised, requiring a large amount of paired input-output data to train the parameters (e.g. DNN weights) in the learning systems. Such data are often very expensive to acquire in many practical applications. Unsupervised learning aims to eliminate the use of such costly training data in learning the system parameters, and is expected to become a new driving force for future breakthroughs in artificial intelligence (AI) applications.

The key to successful unsupervised learning is to intelligently exploit rich sources of world knowledge and prior information, including inherent statistical structures of input and output, nonlinear (bi-directional) relations between input and output (both inside and outside the application domains), and distributional properties of input/output sequences. In this keynote, I will present a set of recent experiments on unsupervised learning in sequential classification tasks. The novel unsupervised learning algorithm to be described, inspired by concepts from cryptography research, carefully explores the statistical structure in output sequences, and is shown to achieve classification accuracy comparable to the fully supervised system.

Li Deng recently joined Citadel, one of the most successful investment firms in the world, as its Chief AI Officer. Prior to Citadel, he was Chief Scientist of AI and Partner Research Manager at Microsoft. Prior to Microsoft, he was a tenured Full Professor at the University of Waterloo in Ontario, Canada. He was a Fellow of the IEEE, the Acoustical Society of America, and the International Speech Communication Association. He received the 2015 IEEE SPS Technical Achievement Award for his 'Outstanding Contributions to Automatic Speech Recognition and Deep Learning' and numerous best paper awards detailing these contributions.

Richard Socher Salesforce

Tackling the Limits of Deep Learning

Deep learning has made great progress in a variety of language tasks. However, there are still many practical and theoretical problems and limitations. In this talk I will introduce solutions to some of these:
How to predict previously unseen words at test time.
How to have a single input and output encoding for words.
How to grow a single model for many tasks.
How to use a single end-to-end trainable architecture for question answering.

Richard Socher is Chief Scientist at Salesforce, where he leads the company's research efforts and works on bringing state-of-the-art artificial intelligence solutions to Salesforce.
Prior to Salesforce, Richard was the CEO and founder of MetaMind, a startup acquired by Salesforce in April 2016. MetaMind's deep learning AI platform analyzes, labels and makes predictions on image and text data so businesses can make smarter, faster and more accurate decisions than ever before.
Richard was awarded the Distinguished Application Paper Award at the International Conference on Machine Learning (ICML) 2011, the 2011 Yahoo! Key Scientific Challenges Award, a Microsoft Research PhD Fellowship in 2012, a 2013 'Magic Grant' from the Brown Institute for Media Innovation, the 2014 GigaOM Structure Award and is currently a member of the WEF Young Global Leaders Class of 2017.
Richard obtained his PhD from Stanford working on deep learning with Chris Manning and Andrew Ng and won the best Stanford CS PhD thesis award.

30 Courses

Narendra Ahuja University of Illinois, Urbana-Champaign

Basics of Deep Learning with Applications to Image Processing, Pattern Recognition and Computer Vision


This course covers the fundamentals of different deep learning architectures, which will be explained through three types of mainstream applications in image processing, pattern recognition and computer vision. A range of network architectures will be reviewed, including multi-layer perceptrons, sparse auto-encoders, restricted Boltzmann machines, and convolutional neural networks. These networks will be illustrated with applications in three categories, each characterized by the type of output into which the input image is transformed. Specifically, the categories are characterized by:
i) Image to image transformation (e.g., image denoising and colorization -- here the output is an image of the same complexity as the input)
ii) Image to mid-level representation (e.g., image segmentation, contour detection -- here the output image is compact as it depicts only certain succinct properties of the image)
iii) Image to high-level representation (e.g., image classification, object detection and face recognition -- here the output is a description of the input image, confined to a small number of bits)

Basics and Network Architectures:
1) Basic concepts from probability and information theory
2) Introduction to linear classifiers, logistic regression, gradient descent and optimization
3) Neural networks and backpropagation
4) Sparse auto-encoders and restricted Boltzmann machines
5) Basics of convolutional neural networks
6) General practices in model training
7) Deconvolution, ConvNet visualization
8) Popular ConvNet architectures used in computer vision.

Applications:
1) Image Denoising, Image Colorization
2) Clustering, Image Segmentation, Contour Detection
3) Image Classification, Object Detection, Face Recognition

[1] Christopher M. Bishop, Neural Networks for Pattern Recognition, 1995.
[2] Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press, 2016.
[3] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel, Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1989.
[4] A. Krizhevsky, I. Sutskever and G. E. Hinton, Imagenet Classification with Deep Convolutional Neural Networks, NIPS, 2012.
[5] Ross Girshick, Jeff Donahue, Trevor Darrell and Jitendra Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR, 2014.
[6] Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato and Lior Wolf, DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR, 2014.
[7] Jonathan Long, Evan Shelhamer and Trevor Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR, 2015.
[8] E. Ergul, N. Arica, N. Ahuja and S. Erturk, Clustering Through Hybrid Network Architecture With Support Vectors, IEEE Trans. on Neural Networks and Learning Systems, 2016.
[9] Richard Zhang, Phillip Isola, Alexei A. Efros. Colorful Image Colorization, ECCV, 2016.

Linear Algebra and Calculus, Probability and Statistics. Basics of Image Processing, Pattern Recognition and Computer Vision.

Narendra Ahuja is Research Professor at the University of Illinois at Urbana-Champaign Dept. of Electrical and Computer Engineering, Beckman Institute, and Coordinated Science Laboratory, and Founding Director of Information Technology Research Academy, Ministry of Electronics and Information Technology, Government of India. He received B.E. with honors in electronics engineering from Birla Institute of Technology and Science, Pilani, India, M.E. with distinction in electrical communication engineering from Indian Institute of Science, Bangalore, India, and Ph.D. in computer science from University of Maryland, College Park, USA. In 1979, he joined the faculty of the University of Illinois where he was Donald Biggar Willet Professor of Engineering until 2012. During 1999-2002, he served as the Founding Director of International Institute of Information Technology, Hyderabad, India. He has co-authored the books Pattern Models (Wiley), Motion and Structure from Image Sequences (Springer-Verlag), and Face and Gesture Recognition (Kluwer). His awards include: Emanuel R. Piore award of the IEEE, Technology Achievement Award of the International Society for Optical Engineering, and TA Stewart-Dyer/Frederick Harvey Trevithick Prize of the Institution of Mechanical Engineers; and with his students, best paper awards from International Conferences on Pattern Recognition (Piero Zamperoni Award, etc.), Symposium on Eye Tracking Research and Applications, IEEE International Workshop on Computer Vision in Sports and IEEE Transactions on Multimedia. He has received 4 patents. His algorithms and prototype systems have been used by about 10 companies/other organizations. He is a fellow of IEEE, AAAI, IAPR, ACM, AAAS and SPIE.
This course will be offered in collaboration with Dr. Jagannadan Varadarajan, Research Scientist, Advanced Digital Sciences Center, Singapore. Dr. Varadarajan received his Ph.D. in computer science from EPFL, Switzerland in 2012. His interests include computer vision and machine learning.

Pierre Baldi University of California, Irvine

Deep Learning: Theory and Applications to the Natural Sciences


The process of learning is essential for building natural or artificial intelligent systems. Thus, not surprisingly, machine learning is at the center of artificial intelligence today. And deep learning--essentially learning in complex systems comprised of multiple processing stages--is at the forefront of machine learning. The lectures will provide an overview of neural networks and deep learning with an emphasis on first principles and theoretical foundations. The lectures will also provide a brief historical perspective of the field. Applications will be focused on difficult problems in the natural sciences, from physics, to chemistry, and to biology.

1: Introduction and Historical Background. Building Blocks. Architectures. Shallow Networks. Design and Learning.
2: Deep Networks. Backpropagation. Underfitting, Overfitting, and Tricks of the Trade.
3: Two-Layer Networks. Universal Approximation Properties. Autoencoders.
4: Learning in the Machine. Local Learning and the Learning Channel. Dropout. Optimality of BP and Random BP.
5: Convolutional Neural Networks. Applications.
6: Recurrent Networks. Hopfield model. Boltzmann machines.
7: Recursive and Recurrent Networks. Design and Learning. Inner and Outer Approaches.
8: Applications to Physics.
9: Applications to Chemistry.
10: Applications to Biology.

Basic algebra, calculus, and probability at the introductory college level. Some previous knowledge of machine learning could be useful but is not required.

Pierre Baldi earned MS degrees in Mathematics and Psychology from the University of Paris, and a PhD in Mathematics from the California Institute of Technology. He is currently Chancellor's Professor in the Department of Computer Science, Director of the Institute for Genomics and Bioinformatics, and Associate Director of the Center for Machine Learning and Intelligent Systems at the University of California Irvine. The long term focus of his research is on understanding intelligence in brains and machines. He has made several contributions to the theory of deep learning, and developed and applied deep learning methods for problems in the natural sciences such as the detection of exotic particles in physics, the prediction of reactions in chemistry, and the prediction of protein secondary and tertiary structure in biology. He has written four books and over 300 peer-reviewed articles. He is the recipient of the 1993 Lew Allen Award at JPL, the 2010 E. R. Caianiello Prize for research in machine learning, and a 2014 Google Faculty Research Award. He is an elected Fellow of the AAAS, AAAI, IEEE, ACM, and ISCB.

Sven Behnke University of Bonn

Visual Perception using Deep Convolutional Neural Networks


The abundance of devices with cameras and numerous real-world application scenarios for AI and autonomous robots create an increasing demand for human-level visual perception of complex scenes. Hierarchical convolutional neural networks have a long and successful history for learning pattern recognition tasks in visual perception. They extract increasingly complex features by local, convolutional computations, and create invariances to transformations by pooling operations. In recent years, advances in parallel computing, such as the use of programmable GPUs, the availability of large annotated image and video data sets, and advances in deep learning methods have yielded dramatic progress in visual perception performance. The course will cover motivations for deep convolutional networks from visual statistics and visual cortex, feed-forward networks for image categorization, object detection, and object-class segmentation, recurrent architectures for visual perception, 3D perception, and spatial transformations in convolutional networks. State-of-the-art examples from computer vision and robotics will be used to illustrate the approaches.
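
As a rough illustration of the two building blocks named above (local convolutional feature extraction and pooling), the following minimal sketch uses plain Python lists. The function names, the toy images and the edge-detecting kernel are assumptions made for this example and are not taken from the course material. It shows that two images whose edge differs by one pixel give different feature maps but identical pooled maps, which is the kind of small-shift invariance pooling is meant to provide.

```python
def conv2d_valid(image, kernel):
    """Valid-mode 2D cross-correlation (the operation CNN layers usually call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size):
    """Non-overlapping max pooling with a square window of side `size`."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical toy inputs: a vertical edge, and the same edge shifted left by one pixel.
image_a = [[0, 0, 1, 1, 1] for _ in range(4)]
image_b = [[0, 1, 1, 1, 1] for _ in range(4)]
edge_kernel = [[-1, 1]]  # responds to a dark-to-bright step along a row

features_a = conv2d_valid(image_a, edge_kernel)
features_b = conv2d_valid(image_b, edge_kernel)
pooled_a = max_pool(features_a, 2)
pooled_b = max_pool(features_b, 2)

print(features_a[0], features_b[0])  # the raw responses differ
print(pooled_a == pooled_b)          # the pooled responses agree
```

The invariance only holds while the shift stays inside one pooling window; a larger shift would move the response into a neighbouring pooled cell.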

1. Motivation of deep convolutional networks by image statistics
2. Biological background: Visual cortex
3. Feed-forward convolutional networks for image categorization
4. Object detection
5. Semantic segmentation
6. Recurrent convolutional networks for video processing
7. 3D perception
8. Spatial image transformations

[1] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278-2324, 1998
[2] S. Behnke: Hierarchical neural networks for image interpretation, LNCS 2766, Springer, 2003.
[3] D. Scherer, A. C. Müller, S. Behnke: Evaluation of pooling operations in convolutional architectures for object recognition. ICANN, 2010.
[4] D Scherer, H Schulz, S Behnke: Accelerating large-scale convolutional neural networks with parallel graphics multiprocessors. ICANN, 2010.
[5] H Schulz, S Behnke: Learning Object-Class Segmentation with Convolutional Neural Networks, ESANN, 2012.
[6] R. Memisevic: Learning to Relate Images. IEEE Trans. Pattern Anal. Mach. Intell. 35(8): 1829-1846, 2013.
[7] J. Schmidhuber: Deep learning in neural networks: An overview. Neural Networks 61: 85-117, 2015.
[8] M Schwarz, H Schulz, S Behnke: RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features. ICRA, 2015.
[9] I. Goodfellow, Y. Bengio, A. Courville: Deep learning, MIT Press, 2016.
[10] M. S. Pavel, H. Schulz, S. Behnke: Object class segmentation of RGB-D video using recurrent convolutional neural networks. Neural Networks 88:105-113, 2017.
[11] M. Schwarz, A. Milan, A.S. Periyasamy, S. Behnke: RGB-D object detection and semantic segmentation for autonomous manipulation in clutter. International Journal of Robotics Research, 2017.
[12] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, Y. Wei: Deformable Convolutional Networks. arXiv:1703.06211, 2017.

Basic knowledge of neural networks, image processing, and machine learning

Sven Behnke is professor for Autonomous Intelligent Systems at University of Bonn, Germany. He received a MS degree in Computer Science in 1997 from Martin-Luther-Universität Halle-Wittenberg and has been investigating deep learning since. In 1998, he proposed a hierarchical recurrent convolutional neural architecture – Neural Abstraction Pyramid – for which he developed unsupervised methods for learning feature hierarchies and supervised training for computer vision tasks like superresolution, image reconstruction, semantic segmentation, and object detection. In 2002, he obtained a PhD in Computer Science on the topic Hierarchical Neural Networks for Image Interpretation from Freie Universität Berlin. He spent the year 2003 as postdoctoral researcher at the International Computer Science Institute, Berkeley, CA. From 2004 to 2008, Sven Behnke headed the Humanoid Robots Group at Albert-Ludwigs-Universität Freiburg. His research interests include deep learning and cognitive robotics.

Mohammed Bennamoun University of Western Australia

Deep Learning for Computer Vision


There has been a surge of opportunities for the development of deep learning algorithms & platforms for advanced vision systems and smart robots to operate indoors (in messy living environments), underwater, and in the air (with drones). This has been boosted by the availability of high performance computing, large amounts of visual data (big data), and the recent introduction of new sensors (e.g., 3D video sensors). These systems will reduce the high costs associated with health and home care for the elderly, and enhance competitiveness in the agriculture & marine economies. This lecture will give a brief introduction to computer vision, then provide detailed coverage of artificial neural networks, and focus on two main deep learning networks, namely Convolutional Neural Networks (CNNs) and Auto-encoders, and their applications in the development of vision systems.

Session 1: In this session we will address the following: what is computer vision; feature extraction and classification; why deep learning (engineered features vs learned features); importance of sensing, high performance computing, and big data for deep learning; image understanding and the ultimate goal of computer vision.
Session 2: Artificial Neural Networks basics: artificial neuron characteristics (activation function); types of architectures (feed-forward networks vs. recurrent networks); types of learning rules.
Session 3: Feed-forward networks and their training: Single Layer Perceptron (SLP), Multi-layer Perceptron (MLP), and back-propagation.
Session 4: Deep learning and why training is difficult with more layers, and how to solve it (how to train and debug large-scale and deep multi-layer neural networks).
Session 5: Convolutional Neural Networks & variants (with tools & libraries), and their application to computer vision.
Session 6: Auto-Encoders and their application to computer vision

Artificial Neural Network basics:
[1] M. Bennamoun, Lecture notes on Slideshare:
[2] Richard Lippmann, “An Introduction to Computing with Neural Nets”, IEEE ASSP Magazine, April 1987.
[3] Richard Lippmann, “Pattern Classification using Neural Networks”, IEEE Communication Magazine, November 1989.
[4] Anil Jain, Jianchang Mao, and K.M. Mohiuddin, “Artificial Neural Networks: A Tutorial”, IEEE Computer Magazine, Vol. 29, Issue 3, March 1996.
[5] L. Fausett, “Fundamentals of Neural Networks”, Prentice-Hall, 1994.
[6] J.M. Zurada, “Introduction to Artificial Neural Systems”, West Publishing Company, 1992.
Deep learning:
[7] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, “Exploring Strategies for Training Deep Neural Networks”, Journal of Machine Learning Research, 2009.
[8] Y. Bengio, “Learning Deep Architectures for AI”, Foundations and Trends in Machine Learning, 2009.
[9] Y. LeCun, L. Bottou, G. Orr, and K.-R. Müller, “Efficient backprop,” in Neural Networks: Tricks of the Trade. New York, NY, USA: Springer, 2012, vol. 7700, pp. 9–48.
[10] S. H Khan, M. Bennamoun, F. Sohel, and R. Togneri, “Automatic shadow detection and removal from a single image”, IEEE transactions on pattern analysis and machine intelligence, Vol. 38(3), 2016.
[11] M Hayat, M Bennamoun, S An, “Deep reconstruction models for image set classification”, Vol. 37 (4), 2015.

Basic knowledge of linear algebra, and statistics

Mohammed Bennamoun is currently a W/Professor at the School of Computer Science and Software Engineering at The University of Western Australia. He lectured in robotics at Queen's, and then joined QUT in 1993 as an associate lecturer. He then became a lecturer in 1996 and a senior lecturer in 1998 at QUT. In January 2003, he joined The University of Western Australia as an associate professor. He was also the director of a research center from 1998-2002. He is the co-author of the book Object Recognition: Fundamentals and Case Studies (Springer-Verlag, 2001). He has published close to 100 journal papers and 250 conference papers. His areas of interest include control theory, robotics, obstacle avoidance, object recognition, artificial neural networks, signal/image processing, and computer vision. More information is available on his website.

This course will be delivered with the help of Dr. H. Rahmani & Dr. S.A. Shah:

Syed Afaq Ali Shah obtained his PhD from the University of Western Australia (UWA) in the area of 3D computer vision. He currently works as a research associate at UWA. His research interests include deep learning, 3D face/object recognition, 3D modelling and image segmentation.

Hossein Rahmani completed his PhD from The University of Western Australia. He has published several papers in conferences and journals such as CVPR, ECCV, and TPAMI. He is currently a Research Fellow in the School of Computer Science and Software Engineering at The University of Western Australia. His research interests include computer vision, action recognition, 3D shape analysis, and machine learning.

Hervé Bourlard Idiap Research Institute

Deep Sequence Modeling: Historical Perspective and Current Trends


In this lecture, I will review the key approaches towards sequence processing, including hidden Markov models (HMM), multilayer perceptrons (MLP, now called Deep Neural Networks, DNNs), and hybrid HMM/DNN approaches (now simply referred to as deep learning). Instead of going through multiple experiments and applications, comparing different DNN architectures, and demonstrating the DNN magic, I will mainly focus on the basic understanding of those families of hierarchical models, with key relationships with statistical modelling and linear algebra, and more recent approaches such as sparse recovery modelling.
The first part of the talk will recall the key basis of HMMs, with training formulated either in terms of maximum likelihood (production model) or maximum a posteriori (recognition/discriminant model). Indeed, proper understanding of a posteriori probabilities and posterior distributions of sequences is particularly important in the context of DNNs. The second part of the talk will then review the use and properties of MLPs/DNNs as powerful discriminant classifiers. We will not have time to recall most of the basics (like Error Back-Propagation, etc.), known for nearly 40 years, but will focus on their principled relationships with statistical modeling, linear algebra, and quality evaluation, leading to fully fledged hybrid HMM/DNN models. The third and last module of the lecture will then focus on current research trends in improving DNN-based sequence modeling, including new HMM models (e.g., Kullback-Leibler based HMM), sparse recovery/compressive sensing modeling, and different posterior-based modeling approaches better exploiting/complementing DNN properties.
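
To make the hybrid HMM/DNN idea concrete, here is a minimal sketch under stated assumptions: a network is assumed to output per-frame state posteriors P(state | frame), these are divided by the state priors to obtain scaled likelihoods, and the result is used as the emission score in a standard Viterbi search. The function names and all numbers below are hypothetical and chosen only for illustration; this is not the lecturer's implementation.

```python
import math


def scaled_log_likelihoods(posteriors, priors):
    """log( P(state | frame) / P(state) ), proportional to log p(frame | state)."""
    return [
        [math.log(p / priors[k]) for k, p in enumerate(frame)]
        for frame in posteriors
    ]


def viterbi(log_emit, log_trans, log_init):
    """Most likely state sequence given log emission, transition and initial scores."""
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backpointers = []
    for frame in log_emit[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s] + frame[s])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical posteriors from a two-state classifier over four frames.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
priors = [0.5, 0.5]
log_trans = [[math.log(0.8), math.log(0.2)],
             [math.log(0.2), math.log(0.8)]]
log_init = [math.log(0.5), math.log(0.5)]

log_emit = scaled_log_likelihoods(posteriors, priors)
best_path = viterbi(log_emit, log_trans, log_init)
print(best_path)
```

Dividing by the priors matters when the state classes are unbalanced in the training data; with the uniform priors used here it only adds a constant to every score.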

Bourlard, H. and Morgan, N., Connectionist Speech Recognition - A Hybrid Approach, Kluwer Academic Publishers, ISBN 0-7923-9396-1, 1994.

[1] Morgan, N. and Bourlard, H. (1995), “Neural Networks for Statistical Recognition of Continuous Speech,” Proceedings of the IEEE, vol. 83, no. 5, pp. 741-770, May 1995.
[2] Bourlard, H. and Morgan, N. (1993), “Continuous Speech Recognition by Connectionist Statistical Methods,” IEEE Trans. on Neural Networks, vol. 4, no. 6, pp. 893-909.
[3] Bourlard, H. and Kamp, Y. (1988), “Auto-Association by Multilayer Perceptrons and Singular Value Decomposition,” Biological Cybernetics, vol. 59, pp. 291-294.

This talk will assume some minimum knowledge in statistical pattern processing and linear algebra. Although we will not explicitly focus on specific types of temporal sequences, most of what will be discussed here has been successfully exploited for speech recognition over the last 30 years. Speech recognition will thus often be taken as a typical example.

Hervé Bourlard received the Electrical and Computer Science Engineering degree and the PhD degree in Applied Sciences both from “Faculté Polytechnique de Mons”, Mons, Belgium. After having been a member of the Scientific Staff at the MBLE Philips Research Laboratory of Brussels and an R&D Manager at L&H SpeechProducts, he is now (since 1996) Director of the Idiap Research Institute, Full Professor at the Swiss Federal Institute of Technology Lausanne (EPFL), and Founding Director of the Swiss NSF National Centre of Competence in Research on “Interactive Multimodal Information Management (IM2)”. Having spent (since 1988) several long-term and short-term visits (initially as a Guest Scientist) at the International Computer Science Institute (ICSI), Berkeley, CA, he is now an ICSI External Fellow and a member of its Board of Trustees. His main research interests include signal (and speech) processing, statistical pattern classification, applied mathematics, multi-channel processing, artificial neural networks, with applications to a wide range of Information and Communication Technologies, including spoken language processing, speech and speaker recognition, language modeling, multimodal interaction, augmented multi-party interaction, and distant group collaborative environments. H. Bourlard is the author/coauthor/editor of 8 books and over 330 reviewed papers (including one IEEE paper award) and book chapters. He is a Fellow of the IEEE, a Fellow of the International Speech Communication Association (ISCA), a Senior Member of ACM, Member of the Swiss Academy of Engineering Sciences, and an elected member of the ACM Europe Council. He is (or has been) a member of the program/scientific committees of numerous international conferences and on the Editorial Board of several journals (e.g., past co-Editor-in-Chief of “Speech Communication”). He is the recipient of several scientific and entrepreneurship awards.

Thomas Breuel NVIDIA Corporation

Segmentation, Processing, and Tracking, with Applications to Video, Gaming, VR, and Self-driving Cars


Image-to-Image Transformations, Semantic Segmentation, Mid-Level Vision

The course will cover deep learning for image-to-image transformations; such transformations are important in a number of application areas:

  • self-driving cars
  • inside-out tracking for VR/AR
  • gesture recognition and analysis
  • smart cameras
  • video gaming
  • robotics
We will be covering important image-to-image network architectures and their applications to these domains:
  • image filtering with deep learning
  • image and video enhancement and coding with deep learning
  • optical flow and motion segmentation
  • texture detection, classification, and segmentation
  • semantic image segmentation, including medical applications
  • face and object detection and tracking
  • modeling image degradation and style transfer
  • training on synthetic data and model-based vision
The course will cover the major network architectures, training methods, and data sets, as well as connections with classical image processing, statistics, computer vision, and human vision.


Participants should have basic familiarity with deep learning, including common architectures and training methods.

Thomas Breuel works on deep learning and computer vision at NVIDIA Research, with applications to self-driving cars, gaming, and image/video analysis. Prior to NVIDIA, he was a full professor of computer science at the University of Kaiserslautern (Germany) and worked as a researcher at Google, Xerox PARC, the IBM Almaden Research Center, IDIAP, Switzerland, as well as a consultant to the US Bureau of the Census. He is an alumnus of the Massachusetts Institute of Technology and Harvard University.

George Cybenko Dartmouth College

Deep Learning of Behaviors


For the purposes of this short course, behaviors are the dynamics exhibited by systems over time. There is growing interest and success in applying various types of machine learning techniques to the modeling and learning of human, natural and machine generated behaviors. This course will review a variety of behavior modeling approaches and provide a tutorial on classical, recurrent neural network and recurrent deep learning methods for learning behaviors. Application areas will include social behaviors and computer systems behaviors for security purposes, as well as other domains.

- Overview of behavior modeling and application domains
- Goals of and metrics for behavior learning
- Recurrent neural networks and long short term memory (LSTM)
- Software implementations and hands-on exercises (TensorFlow, Keras)
- Comparisons with classical methods


  • Rabiner, Lawrence R. 'A tutorial on hidden Markov models and selected applications in speech recognition.' Proceedings of the IEEE 77.2 (1989): 257-286.
  • Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. 'Speech recognition with deep recurrent neural networks.' Acoustics, speech and signal processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013.
  • Cybenko, George, and Valentino Crespi. 'Learning hidden Markov models using nonnegative matrix factorization.' IEEE Transactions on Information Theory 57.6 (2011): 3963-3970.
  • Williams, Ronald J., and David Zipser. 'A learning algorithm for continually running fully recurrent neural networks.' Neural computation 1.2 (1989): 270-280.
  • Pascanu, Razvan, et al. 'How to construct deep recurrent neural networks.' arXiv preprint arXiv:1312.6026 (2013).
  • Borja de Balle Pige, Learning Finite-State Machines,

Familiarity with numerical linear algebra, probability and statistics, machine learning basics.

George Cybenko is the Dorothy and Walter Gramm Professor of Engineering at Dartmouth. Professor Cybenko has made research contributions in signal processing, neural computing, parallel computing and computational behavioral analysis. He was the Founding Editor-in-Chief of IEEE/AIP Computing in Science and Engineering, IEEE Security & Privacy and IEEE Transactions on Computational Social Systems. Professor Cybenko is a Fellow of the IEEE, received the 2016 SPIE Eric A. Lehrfeld Award for outstanding contributions to global homeland security and the US Air Force Commander's Public Service Award. He obtained his BS (Toronto) and PhD (Princeton) degrees in Mathematics. He has held visiting appointments at MIT, Stanford and Leiden University where he was the Kloosterman Visiting Distinguished Professor. Cybenko is co-founder of Flowtraq Inc, which focuses on commercial software and services for large-scale network flow security and analytics.

Rina Dechter & Alexander Ihler University of California, Irvine

Algorithms for Reasoning with Probabilistic Graphical Models


The course will cover the primary exact and approximate algorithms for reasoning with probabilistic graphical models (e.g., Bayesian and Markov networks, influence diagrams, and Markov decision processes). I will present inference-based, message-passing schemes (e.g., variable elimination) and search-based, conditioning schemes (e.g., cycle-cutset conditioning and AND/OR search). Each class possesses distinguished characteristics and in particular has different time vs. space behavior. I will emphasize the dependence of both schemes on a few graph parameters such as the treewidth, cycle-cutset, and (the pseudo-tree) height. I will start from exact algorithms and will move to approximate schemes that are anytime, including weighted mini-bucket schemes with cost-shifting.
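
As a small, hedged illustration of the inference-based family mentioned above, the sketch below computes the partition function of a three-variable chain Markov network by variable elimination and checks it against brute-force enumeration. The factor representation, the function names and the toy potentials are assumptions made for this example; it is not the instructors' software and omits everything beyond the basic multiply-then-sum-out step.

```python
from itertools import product

# A factor is a pair (scope, table): scope is a tuple of variable names and
# table maps each assignment tuple (in scope order) to a non-negative value.


def multiply(f, g, domain):
    scope = tuple(dict.fromkeys(f[0] + g[0]))  # ordered union of both scopes
    table = {}
    for values in product(domain, repeat=len(scope)):
        assign = dict(zip(scope, values))
        fv = f[1][tuple(assign[v] for v in f[0])]
        gv = g[1][tuple(assign[v] for v in g[0])]
        table[values] = fv * gv
    return scope, table


def sum_out(f, var):
    idx = f[0].index(var)
    scope = f[0][:idx] + f[0][idx + 1:]
    table = {}
    for values, val in f[1].items():
        key = values[:idx] + values[idx + 1:]
        table[key] = table.get(key, 0.0) + val
    return scope, table


def partition_function(factors, order, domain=(0, 1)):
    """Eliminate variables in `order`; assumes `order` lists every variable."""
    factors = list(factors)
    for var in order:
        bucket = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        if not bucket:
            continue
        combined = bucket[0]
        for f in bucket[1:]:
            combined = multiply(combined, f, domain)
        factors.append(sum_out(combined, var))
    result = 1.0
    for _, table in factors:  # only empty-scope factors remain
        result *= table[()]
    return result


def brute_force(factors, variables, domain=(0, 1)):
    total = 0.0
    for values in product(domain, repeat=len(variables)):
        assign = dict(zip(variables, values))
        term = 1.0
        for scope, table in factors:
            term *= table[tuple(assign[v] for v in scope)]
        total += term
    return total


# Hypothetical chain A - B - C with a potential that favours equal neighbours.
agree = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
chain = [(("A", "B"), agree), (("B", "C"), agree)]

z_ve = partition_function(chain, order=["A", "C", "B"])
z_bf = brute_force(chain, ["A", "B", "C"])
print(z_ve, z_bf)
```

The largest intermediate scope created by `multiply` depends on the elimination order, which is where graph parameters such as the treewidth enter the time and space cost.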


  • Introduction to graphical models; queries (i.e., MAP, Partition function, Marginals and Marginal Map) and algorithms (Inference and search).
  • Inference algorithms: Bucket-elimination and tree-clustering schemes.
  • Bounded inference: mini-bucket and weighted mini-buckets, belief propagation schemes (BP and IJGP), cost-shifting schemes.
  • AND/OR search spaces for graphical models, AND/OR Branch and Bound for combinatorial optimization (MAP/MPE).
  • Generating heuristics using mini-bucket elimination with tightening by cost-shifting (e.g., weighted mini-bucket with moment matching).
  • Marginal Map by AND/OR search
  • Sampling: Cutset-sampling, SampleSearch and AND/OR sampling (as time permits)

Rina Dechter:
Reasoning with Probabilistic and Deterministic Graphical Models: Exact Algorithms. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers 2013
Alexander Ihler, Natalia Flerova, Rina Dechter, and Lars Otten. "Join-graph based cost-shifting schemes" in Proceedings of UAI 2012
Dechter, R. and Rish, I., "Mini-Buckets: A General Scheme for Bounded Inference" In "Journal of the ACM", Vol. 50, Issue 2: pages 107-153, March 2003.
Robert Mateescu, Kalev Kask, Vibhav Gogate, and Rina Dechter. "Join-Graph Propagation Algorithms." JAIR'2009

Basic Computer Science

Rina Dechter's research centers on computational aspects of automated reasoning and knowledge representation including search, constraint processing, and probabilistic reasoning. She is a professor of computer science at the University of California, Irvine. She holds a Ph.D. from UCLA, an M.S. degree in applied mathematics from the Weizmann Institute, and a B.S. in mathematics and statistics from the Hebrew University in Jerusalem. She is the author of Constraint Processing, published by Morgan Kaufmann (2003), and Reasoning with Probabilistic and Deterministic Graphical Models: Exact Algorithms, published by Morgan and Claypool (2013), has co-authored over 150 research papers, and has served on the editorial boards of Artificial Intelligence, the Constraint Journal, Journal of Artificial Intelligence Research (JAIR), and Journal of Machine Learning Research (JMLR). She has been a fellow of the American Association of Artificial Intelligence since 1994, was a Radcliffe Fellow in 2005–2006, received the 2007 Association of Constraint Programming (ACP) Research Excellence Award, and is a 2013 ACM Fellow. She has been Co-Editor-in-Chief of Artificial Intelligence since 2011. She is also co-editor with Hector Geffner and Joe Halpern of the book Heuristics, Probability and Causality: A Tribute to Judea Pearl, College Publications, 2010.

Alex Ihler is an associate professor of computer science at the University of California, Irvine. He received his MS and PhD degrees from the Massachusetts Institute of Technology in 2000 and 2005, and a BS from the California Institute of Technology in 1998. His research spans several areas of machine learning, with a particular focus on probabilistic, graphical model representations, including Bayesian networks, Markov random fields, and influence diagrams, and with applications to domains such as sensor networks, computer vision, and computational biology. He is the co-author of over 60 research papers, and the recipient of an NSF CAREER award. He is the director of UC Irvine's Center for Machine Learning, and has served on the editorial boards of Machine Learning (MLJ), Artificial Intelligence (AIJ), and the Journal of Machine Learning Research (JMLR).

Li Deng Citadel

An Overview of Deep Learning for Speech, Image, Text, and Multi-modal Processing



Part I: Basics of Machine Learning and Applications --- deep and shallow
- machine learning founding principles
- machine learning and deep learning
- shallow machine learning vs. deep machine learning
- taxonomy of machine learning: a learning-paradigm perspective
- taxonomy of speech, image, text, and multi-modal applications: a signal-processing perspective
Part II: Deep Neural Networks (DNN): Why gradient vanishes & how to rescue it
- history of neural nets for speech recognition: why they failed
- one equation for backprop update --- why gradients may easily vanish for DNN learning
- five ways of rescuing gradient vanishing
- an alternative way of training DNN (deep stacking net)
- recurrent nets: my experiments in 90s (for speech) and current perspectives
Part III: How Deep Learning Disrupted Speech (and Image) Recognition
- shallow models dominating speech: 30+ years from 80s
- deep generative models for speech: 10 years of research before DNN disruption
- pros and cons of generative vs discriminative models
- how speech is produced and perceived by humans: a comprehensive computational model
- several theories of human perception
- variational inference/learning for deep generative speech model (experiments late 90's to mid 2000)
- a very different kind of deep generative model: deep belief nets (2006)
- the arrival of DNN for speech and its early successes: a historical perspective (2009-2011)
- more recent development of deep learning for speech
- a perspective on recent innovations in speech recognition
- how to do truly unsupervised learning for future speech recognition (and other AI tasks)
Part IV: Deep Learning for Text and Multi-Modal Processing
- AI to move from perception to cognition: key roles of language/text
- concept of symbolic/semantic embedding
- word and text embedding
- build text embedding on top of sub-word units: practical necessity for many applications
- distant supervised embedding
- deep structured semantic modeling (DSSM)
- use of DSSM for multi-modal deep learning: Microsoft's first generation image captioning system
- DSSM for contextual search in Microsoft Office/Word
Part V: Limitations of Current Deep Learning and How to Overcome Them
- Interpretability problem
- Symbolic-neural integration for reasoning: tensor-product representations
- Where do labels come from: the need for unsupervised learning via rich priors and self learning via interactions
- Vertical applications

G. E. Hinton, R. Salakhutdinov. 'Reducing the Dimensionality of Data with Neural Networks'. Science 313: 504–507, 2006.
G. E. Hinton, L. Deng, D. Yu, et al. 'Deep Neural Networks for Acoustic Modeling in Speech Recognition: The shared views of four research groups,' IEEE Signal Processing Magazine, pp. 82–97, November 2012. (plus other papers in the same special issue)
G. Dahl, D. Yu, L. Deng, A. Acero. 'Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition'. IEEE Trans. Audio, Speech, and Language Processing, Vol 20(1): 30–42, 2012. (plus other papers in the same special issue)
Y. Bengio, A. Courville, and P. Vincent. 'Representation Learning: A Review and New Perspectives,' IEEE Trans. PAMI, special issue Learning Deep Architectures, 2013.
J. Schmidhuber. 'Deep learning in neural networks: An overview,' arXiv, October 2014.
Y. LeCun, Y. Bengio, and G. Hinton. 'Deep Learning', Nature, Vol. 521, May 2015.
J. Bellegarda and C. Monz. 'State of the art in statistical methods for language and speech processing,' Computer Speech and Language, 2015.
Li Deng, Navdeep Jaitly. Chapter 1.2: Deep Discriminative and Generative Models for Speech Pattern Recognition, in Handbook of Pattern Recognition and Computer Vision (Ed. C.H. Chen), World Scientific, 2016.
Dong Yu and Li Deng, Automatic Speech Recognition – A Deep Learning Approach, Springer, 2015.
Li Deng and Dong Yu, DEEP LEARNING — Methods and Applications. NOW Publishers, June 2014.
Goodfellow, Bengio, Courville. Deep Learning, MIT Press, 2016.
Li Deng and Yang Liu (eds), Deep Learning in Natural Language Processing, Springer, 2017-2018.
Li Deng and Doug O’Shaughnessy, SPEECH PROCESSING — A Dynamic and Optimization-Oriented Approach, Marcel Dekker Inc., June 2003.


Li Deng recently joined Citadel, one of the most successful investment firms in the world, as its Chief AI Officer. Prior to Citadel, he was Chief Scientist of AI and Partner Research Manager at Microsoft. Prior to Microsoft, he was a tenured Full Professor at the University of Waterloo in Ontario, Canada. He was a Fellow of the IEEE, the Acoustical Society of America, and the International Speech Communication Association. He received the 2015 IEEE SPS Technical Achievement Award for his 'Outstanding Contributions to Automatic Speech Recognition and Deep Learning' and numerous best paper and technical awards related to these and other contributions to artificial intelligence, machine learning, multimedia signal processing, and their industrial applications.

Jianfeng Gao Microsoft Research

An Introduction to Deep Learning for Natural Language Processing


In this talk, I start with a brief introduction to the history of deep learning and its application to natural language processing (NLP) tasks. Then I describe in detail the deep learning technologies recently developed for three areas of NLP. The first is a series of deep learning models of semantic similarity between texts and images, a task that is fundamental to a wide range of applications, such as Web search ranking, recommendation, image captioning and machine translation. The second is a set of neural models developed for machine reading comprehension and question answering. The third is the use of deep learning for various dialogue agents, including task-completion bots and social chat bots.

Part 1. Introduction to deep learning and natural language processing (NLP)
- A brief history of deep learning
- An example of neural models for query classification
- Overview of deep learning models for NLP tasks
Part 2. Deep Semantic Similarity Models (DSSM) for text processing
- Challenges of modeling semantic similarity
- What is DSSM
- DSSM for Web search ranking
- DSSM for recommendation
- DSSM for automatic image captioning and other tasks
Part 3. Deep learning for Machine Reading Comprehension (MRC) and Question Answering (QA)
- Challenges of MRC and QA
- A brief review of symbolic approaches
- From symbolic to neural approaches
- State of the art MRC models
- Toward an open-domain QA system
Part 4. Deep learning for dialogue
- Challenges of developing open-domain dialogue agents
- The development of task-oriented dialogue agents using deep reinforcement learning
- The development of neural conversation engines for social chat bots


Part 1: Yih, He & Gao. Deep learning and continuous representations for natural language processing. Tutorial presented in HLT-NAACL-2015, IJCAI-2016.

Part 2 (DSSM): We have developed a series of deep semantic similarity models (DSSM, a.k.a. Sent2Vec), which have been used for many text and image processing tasks, including web search [Huang et al. 2013, Shen et al. 2014], recommendation [Gao et al. 2014a], machine translation [Gao et al. 2014b], and QA [Yih et al. 2015].

Part 3 (MRC): We released a new MRC dataset, called MS MARCO; and have developed a series of reasoning networks for MRC, aka ReasoNet and ReasoNet with shared memory.

Part 4 (Dialogue): We have developed neural network models for social bots trained on Twitter data [project site] and task-completion bots [Lipton et al. 2016;Bhuwan et al. 2016] trained via deep reinforcement learning using a user simulator.

No prerequisites.


Jianfeng Gao is Partner Research Manager in the Deep Learning Technology Center (DLTC) at Microsoft Research, Redmond. He works on deep learning for text and image processing and leads the development of AI systems for dialogue, machine reading comprehension (MRC), question answering (QA), and enterprise applications. From 2006 to 2014, he was Principal Researcher in the Natural Language Processing Group at Microsoft Research, Redmond, where he worked on Web search, query understanding and reformulation, ads prediction, and statistical machine translation. From 2005 to 2006, he was a research lead in the Natural Interactive Services Division at Microsoft, where he worked on Project X, an effort to develop a natural user interface for Windows. From 1999 to 2005, he was Research Lead in the Natural Language Computing Group at Microsoft Research Asia. He, together with his colleagues, developed the first Chinese speech recognition system released with Microsoft Office, the Chinese/Japanese Input Method Editors (IME) which were the leading products in the market, and the natural language platform for Windows Vista.

Michael Gschwind IBM T.J. Watson Research Center

Deploying Deep Learning Applications at the Enterprise Scale


A confluence of new artificial neural network architectures and unprecedented compute capabilities based on numeric accelerators has reinvigorated interest in Artificial Intelligence based on neural processing. Initial successful deployments in hyperscale internet services are now driving broader commercial interest in adopting Deep Learning as a design principle for cognitive applications in the enterprise. In this class, we will review hardware acceleration and co-optimized software frameworks for Deep Learning, and discuss model development and deployment to accelerate adoption of Deep Learning based solutions for enterprise deployments.


Session 1:
1. Hardware Foundations of the Great AI Re-Awakening
2. Deployment models for DNN Training and Inference
Session 2:
1. Optimized High Performance Training Frameworks
2. Parallel Training Environments
Session 3:
1. Developing Models with Expressive Interfaces
2. Lab Demo

M. Gschwind, Need for Speed: Accelerated Deep Learning on Power, GPU Technology Conference, Washington DC, October 2016.


Dr. Michael Gschwind is Chief Engineer for Machine Learning and Deep Learning for IBM Systems where he leads the development of hardware/software integrated products for cognitive computing. During his career, Dr. Gschwind has been a technical leader for IBM’s key transformational initiatives, leading the development of the OpenPOWER Hardware Architecture as well as the software interfaces of the OpenPOWER Software Ecosystem. In previous assignments, he was a chief architect for Blue Gene, POWER8, and POWER7. As chief architect for the Cell BE, Dr. Gschwind created the first programmable numeric accelerator, serving as chief architect for both hardware and software architecture. In addition to his industry career, Dr. Gschwind has held faculty appointments at Princeton University and Technische Universität Wien. While at Technische Universität Wien, Dr. Gschwind invented the concept of neural network training and inference accelerators. Dr. Gschwind is a Fellow of the IEEE, an IBM Master Inventor and a Member of the IBM Academy of Technology.

Yufei Huang University of Texas, San Antonio

Deep Learning for Precision Medicine and Biomedical informatics


Precision medicine represents a new paradigm for disease prevention and therapy. It aims at determining a more personalized treatment by considering individual variability in genomic makeup, environment, and lifestyle. Such a paradigm-changing approach to medicine relies heavily on the collection of heterogeneous high-throughput data from healthy and patient populations at multiple levels over a long time period. Identifying individualized genomic and proteomic markers from these diverse, high-dimensional datasets is a grand challenge that calls for the development of new and powerful computational and machine learning models. As integrated biomedical informatics has become a focus area in precision medicine, we will discuss in this lecture recent developments in deep learning methods for biomedical informatics and precision medicine.

The course will cover the following topics:

1. Background of molecular biology and high throughput sequencing technologies
a. DNA, RNA, proteins
c. Genotype, mutation, SNPs, precision medicine
d. High-throughput sequencing technology: DNA-seq, ChIP-seq, RNA-seq, WES, CLIP-seq, Hi-C, MeRIP-seq, etc.
2. Overview of deep learning methods
b. Visualization of deep learning models
3. Deep learning for understanding gene regulation
a. Deep learning for predicting transcription factor binding
b. Deep learning for predicting splicing codes
c. Deep learning for predicting gene expression changes
4. Deep learning for understanding RNA regulation
a. Deep learning for predicting long noncoding RNA
b. Deep learning for predicting microRNA targets
c. Deep learning for predicting m6A methylation
5. Deep learning for precision medicine
a. Deep learning for predicting SNPs
b. Deep learning for classifying cancer types
c. Deep learning for predicting drug targets
d. Deep learning for predicting patient survival

Park, Yongjin, and Manolis Kellis. 'Deep learning for regulatory genomics.' Nat Biotechnol 33.8 (2015): 825-6.
Alipanahi, Babak, et al. 'Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning.' Nature biotechnology 33.8 (2015): 831-838.
Zhou, Jian, and Olga G. Troyanskaya. 'Predicting effects of noncoding variants with deep learning-based sequence model.' Nature methods 12.10 (2015): 931-934.

Probability and machine learning basics; having some knowledge of genomics is preferred but not required.

Yufei Huang received his Ph.D. degree in electrical engineering from the State University of New York at Stony Brook in 2001. Since 2002, he has been with the Department of Electrical and Computer Engineering at the University of Texas at San Antonio (UTSA), where he is now Professor. He is also an adjunct professor at the Dept. of Epidemiology and Biostatistics at the University of Texas Health Science Center at San Antonio. He has been a visiting professor at the Center of Bioinformatics, Harvard Center for Neurodegeneration & Repair. Dr. Huang’s expertise is in the areas of computational biology, computational neuroergonomics, brain computer interface, and machine learning. He is currently focusing on uncovering the functions of mRNA methylation using high-throughput sequencing technologies, developing passive EEG-based brain-machine interaction, and developing deep learning algorithms for precision medicine and for predicting cognitive behavior from big EEG data. He was a recipient of the National Science Foundation (NSF) CAREER Award in 2005, the Best Paper Award of the 2017 IEEE Biomedical and Health Informatics conference, the Best Paper Award of the 2006 Artificial Neural Networks in Engineering Conference, and the 2007 Best Paper Award of IEEE Signal Processing Magazine. His research has been supported by NSF, the National Institutes of Health, the Air Force Office of Scientific Research, the Army Research Lab, the Department of Defense, and the Qatar National Research Fund. He is a member of the IEEE Biomedical and Health Informatics Technical Committee and has served as an editor of multiple journals.

Soo-Young Lee Korea Advanced Institute of Science and Technology

Multi-modal Deep Learning for the Recognition of Human Emotions in the Wild


Human emotion is an internal state of the human brain that leads to different decisions and behavior from the same sensory inputs. Therefore, for efficient interaction between human and machine, it is important for the machine to estimate human emotions. Because emotion is an internal state, the classification accuracy achievable from a single modality is not high. For example, our result ranked first with only 61.6% accuracy in the emotion recognition task from facial images at the EmotiW 2015 challenge. In this tutorial, we will start with our deep neural networks for the challenge. Then, to improve the accuracy, we will introduce speech and text modalities to recognize human emotions. Finally, we will discuss how to combine those three modalities based on Early Integration and Late Integration models. We will also introduce a new integration method based on top-down attention.
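The difference between the two integration styles can be sketched as follows, assuming the common reading of the terms: early integration fuses features before a single classifier, while late integration combines the outputs of per-modality classifiers. The function names, the weighting scheme and the toy numbers are hypothetical and do not describe the authors' system.

```python
def early_integration(features_by_modality):
    """Concatenate per-modality feature vectors into one classifier input."""
    fused = []
    for name in sorted(features_by_modality):  # fixed order keeps the layout stable
        fused.extend(features_by_modality[name])
    return fused


def late_integration(probs_by_modality, weights=None):
    """Weighted average of per-modality class-probability vectors."""
    names = sorted(probs_by_modality)
    if weights is None:
        # Assumption for this sketch: equal trust in every modality.
        weights = {name: 1.0 / len(names) for name in names}
    n_classes = len(probs_by_modality[names[0]])
    return [
        sum(weights[name] * probs_by_modality[name][k] for name in names)
        for k in range(n_classes)
    ]


# Toy two-class example with made-up numbers for three modalities.
features = {"face": [1.0, 2.0], "speech": [3.0], "text": [4.0, 5.0]}
probs = {"face": [0.6, 0.4], "speech": [0.2, 0.8], "text": [0.4, 0.6]}
fused_features = early_integration(features)
fused_probs = late_integration(probs)
```

With early integration a single model can learn cross-modal interactions but needs all inputs aligned; with late integration each modality keeps its own model and only the decisions are merged.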

1. Introduction to emotion recognition
- A brief history
- Overview of modality for emotion recognition
2. Deep learning for emotion recognition from facial images
- Learning facial representations for emotion recognition
- Deep CNN architectures for emotion recognition
- Committee machine with many CNN classifiers
3. Deep learning for emotion recognition from speech and text
- Learning speech representations for emotion recognition
- Learning text representations for emotion recognition
4. Multi-modal integration for emotion recognition
- Early Integration model
- Late Integration model
- Top-Down Attention model
5. Further issues

B.K. Kim, J. Roh, S.Y. Dong, S.Y. Lee, “Hierarchical committee of deep convolutional neural networks for robust facial expression recognition,” J Multimodal User Interfaces (2016) 10:173-189

No prerequisites.

Soo-Young Lee is a professor of Electrical Engineering at Korea Advanced Institute of Science and Technology. In 1997, he established the Brain Science Research Centre at KAIST, and led the Korean Brain Neuroinformatics Research Program from 1998 to 2008. He is now also a Co-Director of the Center for Artificial Intelligence Research at KAIST, and leads the Emotional Dialogue Project, a Korean National Flagship Project. He is President of the Asia-Pacific Neural Network Society in 2017, and has received the Presidential Award from INNS and the Outstanding Achievement Award from APNNS. His research interests lie in artificial cognitive systems with human-like intelligent behavior based on biological brain information processing. He has worked on speech and image recognition, natural language processing, situation awareness, internal-state recognition, and human-like dialog systems. Among the many internal states, he is especially interested in emotion, sympathy, trust, and personality. His work combines computational models and cognitive neuroscience experiments. His group ranked first in the emotion recognition challenge from facial images (EmotiW; Emotion Recognition in the Wild) in 2015.

Li Erran Li Columbia University

Deep Reinforcement Learning: Recent Advances and Frontiers


Deep reinforcement learning has enabled artificial agents to achieve human-level performance across many challenging domains, e.g. playing Atari games and Go. I will present several important algorithms, including deep Q-Networks, asynchronous actor-critic algorithms (A3C), DDPG, SVG, and guided policy search. I will discuss major challenges and promising results towards making deep reinforcement learning applicable to real-world problems in robotics and natural language processing.

1. Introduction to reinforcement learning (RL)
2. Value-based deep RL
  Deep Q-learning (deep Q-Networks)
  Programming assignment of deep Q-Networks in OpenAI Gym to play Atari games
3. Policy-based deep RL
  Policy gradients
  Asynchronous actor-critic algorithms (A3C)
  Natural gradients and trust region optimization (TRPO)
  Deep deterministic policy gradients (DDPG), SVG
4. Model-based deep RL: AlphaGo and guided policy search
5. Deep learning in multi-agent games: fictitious self-play
6. Inverse RL
7. Transfer in RL
8. Frontiers
  Application to robotics
  Application to natural language understanding
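
The value-based part of the outline rests on the one-step Q-learning target. The sketch below shows that target and a tabular update as a minimal illustration; the function names, the table layout and the numbers are assumptions for this example. In a deep Q-Network the table lookup is replaced by a neural network, and Mnih et al. (2015) compute the next-state values with a separate, periodically updated target network.

```python
def q_learning_target(reward, next_q_values, done, gamma=0.99):
    """One-step target: r if the episode ended, else r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def tabular_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """Move Q(state, action) a fraction alpha towards the one-step target."""
    target = q_learning_target(reward, q[next_state], done, gamma)
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical two-state, two-action table: q[state][action].
q = {0: [0.0, 0.0], 1: [1.0, 2.0]}
updated = tabular_update(q, state=0, action=1, reward=1.0, next_state=1,
                         done=False, alpha=0.5, gamma=0.9)
```

A deep Q-Network minimizes the squared difference between its prediction and this same target, which is what the tabular update does in closed form for a single entry.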


  • Sutton, R. S. and Barto, A. G. (2017). Reinforcement Learning: An Introduction (2nd Edition, in preparation). MIT Press.
  • Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.
  • Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484–489.
  • Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014). Deterministic policy gradient algorithms. In the International Conference on Machine Learning (ICML).
  • Mnih, V., Badia, A. P., Mirza, M., Graves, A., Harley, T., Lillicrap, T. P., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In the International Conference on Machine Learning (ICML).
  • Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., and de Freitas, N. (2016b). Dueling network architectures for deep reinforcement learning. In the International Conference on Machine Learning (ICML).
  • Levine, S., Finn, C., Darrell, T., and Abbeel, P. (2016a). End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research, 17:1–40.
  • Finn, C., Christiano, P., Abbeel, P., and Levine, S. (2016). A connection between GANs, inverse reinforcement learning, and energy-based models. In NIPS 2016 Workshop on Adversarial Training.
  • Finn, C. and Levine, S. (2016). Deep visual foresight for planning robot motion. ArXiv e-prints.
  • Gu, S., Holly, E., Lillicrap, T., and Levine, S. (2016a). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. ArXiv e-prints.
  • Gu, S., Lillicrap, T., Sutskever, I., and Levine, S. (2016b). Continuous deep Q-learning with model-based acceleration. In the International Conference on Machine Learning (ICML).
  • Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2016). Continuous control with deep reinforcement learning. In the International Conference on Learning Representations (ICLR).

Basic knowledge of reinforcement learning, deep learning and Markov decision processes

Li Erran Li received his Ph.D. in Computer Science from Cornell University, advised by Joseph Halpern. He is currently with Uber and is an adjunct professor in the Computer Science Department of Columbia University. Before that, he worked as a researcher at Bell Labs. His research interests are AI, deep learning, machine learning algorithms and systems. He is an IEEE Fellow and an ACM Distinguished Scientist.

Michael C. Mozer University of Colorado, Boulder

Incorporating Domain Bias into Neural Networks


Deep learning is often pitched as a general, universal solution to AI. The pitch promises that with sufficient data, a generic neural architecture and learning algorithm can perform end-to-end processing; it is not necessary to understand the domain, engineer features, or specialize models. Although this fantasy holds true in the limit of infinite data and infinite computing cycles, bounds on either -- or on the quality or completeness of data -- often make the promise of deep learning hollow. To overcome limitations of data and computing, an alternative is to customize models to characteristics of the domain. Much of the art of modern deep learning is determining how to incorporate diverse forms of domain knowledge into a model via its representations, architecture, loss function, and data transformations. Domain-appropriate biases constrain the learning problem and thereby compensate for data limitations. A classic form of bias for vision tasks -- used even before the invention of back propagation -- is the convolutional architecture, exploiting the homogeneity of image statistics and the relevance of local spatial structure. Many generic tricks of the trade in deep learning can be cast in this manner -- as suitable forms of domain bias. Beyond these generic tricks, I will work through illustrations of domains in which prior knowledge can be leveraged to creatively construct models. My own particular research interest involves cognitively-informed machine learning, where an understanding of the mechanisms of human perception, cognition, and reasoning can serve as a powerful constraint on models that are intended to predict human preferences and behavior.
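
To make the convolutional bias concrete, here is a minimal one-dimensional sketch (a simplification of the image case; the signal and kernel values are made up). One small kernel is reused at every position, so the layer has few parameters and a shifted input yields a correspondingly shifted output.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation: the same kernel weights are applied at every position."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


signal = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0] + signal[:-1]      # the same pattern, moved one step to the right
kernel = [1, 0, -1]              # three shared weights, whatever the signal length

response = conv1d(signal, kernel)
shifted_response = conv1d(shifted, kernel)
```

Away from the borders, `shifted_response` is `response` moved one step to the right. A fully connected layer would have to learn that regularity from data, whereas here it is built into the architecture.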

* The scaling problem
* Bias-variance dilemma
* Imposing domain-appropriate bias
- via loss functions
- via representations and representational constraints
- via data augmentation
- via architecture design
* Case studies of model crafting: memory in humans and recurrent networks

Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias-variance dilemma. Neural Computation, v. 4, n. 1, pp. 1-58.
Other references will be provided during the lectures.

Basic background in probability and statistics

Michael Mozer received a Ph.D. in Cognitive Science at the University of California at San Diego in 1987. Following a postdoctoral fellowship with Geoffrey Hinton at the University of Toronto, he joined the faculty at the University of Colorado at Boulder and is presently a Professor in the Department of Computer Science and the Institute of Cognitive Science. He is secretary of the Neural Information Processing Systems Foundation and has served as chair of the Cognitive Science Society. He is interested both in developing machine learning algorithms that leverage insights from human cognition, and in developing software tools to optimize human performance using machine learning methods.

Roderick Murray-Smith University of Glasgow

Applications of Deep Learning Models in Human-Computer Interaction Research


The opportunities for interaction with computer systems are rapidly expanding beyond traditional input and output paradigms: full-body motion sensors, brain-computer interfaces, 3D displays, touch panels are now commonplace commercial items. The profusion of new sensing devices for human input and the new display channels which are becoming available offer the potential to create more involving, expressive and efficient interactions in a much wider range of contexts. Dealing with these complex sources of human intention requires appropriate mathematical methods; modelling and analysis of interactions requires sophisticated methods which can transform streams of data from complex sensors into estimates of human intention.
This tutorial will focus on the use of inference and dynamical modelling in human-computer interaction. The combination of modern statistical inference and real-time closed loop modelling offers rich possibilities in building interactive systems, but there is a significant gap between the techniques commonly used in HCI and the mathematical tools available in other fields of computing science. This tutorial aims to illustrate how to bring these mathematical tools to bear on interaction problems, and will cover basic theory and example applications from:

  • mobile interaction
  • interaction with large music collections. This will include work on the Bang & Olufsen Beomoment product and Syntonetic’s Moodgalaxy, which combine Gaussian process priors, nonlinear dimensionality reduction and inferred moods to give you new ways to explore your music collection. I will also summarise some of our recent work on using the entropy of inferred mood and genre features to understand users’ criteria for playlist curation.
  • 3D human motion and 3D capacitive sensing systems. Future interactions will often be casual interactions: flowing ‘around device’ or ‘over device’ interactions, potentially combined with speech recognition technologies. I will look at the role of control theory and information theory in the analysis of such systems. I will give examples where we have developed 3D capacitive touch systems using particle filters and deep convolutional networks to infer finger pose and position above the surface of the device, and then created a series of ‘flow-based interactions’ which allow more carefree around-device gesturing.

General basic background in machine learning and interest in human-computer interaction or information retrieval


Roderick Murray-Smith is a Professor of Computing Science at Glasgow University, leading the Inference, Dynamics and Interaction research group, and heads the 50-strong Section on Information, Data and Analysis, which also includes the Information Retrieval, Computer Vision & Autonomous systems and IDEAS Big Data groups. He works in the overlap between machine learning, interaction design and control theory. In recent years his research has included multimodal sensor-based interaction with mobile devices, mobile spatial interaction, AR/VR, Brain-Computer interaction and nonparametric machine learning. Prior to this he held positions at the Hamilton Institute, NUIM, Technical University of Denmark, M.I.T. (Mike Jordan’s lab), and Daimler-Benz Research, Berlin, and was the Director of SICSA, the Scottish Informatics and Computing Science Alliance (all academic CS departments in Scotland). He works closely with the mobile phone industry, having worked together with Nokia, Samsung, FT/Orange, Microsoft and Bang & Olufsen. He was a member of Nokia's Scientific Advisory Board and is a member of the Scientific Advisory Board for the Finnish Centre of Excellence in Computational Inference Research. He has co-authored three edited volumes, 29 journal papers, 18 book chapters, and 88 conference papers.

Hermann Ney RWTH Aachen University

Speech Recognition and Machine Translation: From Statistical Decision Theory to Machine Learning and Deep Neural Networks


The last 40 years have seen dramatic progress in machine learning and statistical methods for speech and language processing, such as speech recognition, handwriting recognition and machine translation. Most of the key statistical concepts were originally developed for speech recognition. Examples of such key concepts are the Bayes decision rule for minimum error rate and probabilistic approaches to acoustic modelling (e.g. hidden Markov models) and language modelling. Recently, the accuracy of speech recognition has been improved significantly by the use of artificial neural networks, such as deep feedforward multi-layer perceptrons and recurrent neural networks (incl. the long short-term memory extension). We will discuss these approaches in detail and how they fit into the probabilistic approach.
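
As a point of reference, the Bayes decision rule for minimum error rate mentioned above is commonly written as shown below. The notation is an assumption of this sketch and is not taken from the course materials: x denotes the observed acoustic sequence and w a candidate word sequence.

```latex
\hat{w} = \operatorname*{arg\,max}_{w} \; p(w \mid x)
        = \operatorname*{arg\,max}_{w} \; p(w)\, p(x \mid w)
```

In this reading, p(w) plays the role of the language model and p(x | w) that of the acoustic model (for example a hidden Markov model).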

Part 1: Statistical Decision Theory, Machine Learning and Neural Networks.
Part 2: Speech Recognition (Time Alignment, Hidden Markov models, neural nets, attention models)
Part 3: Machine Translation (Word Alignment, Hidden Markov models, neural nets, attention models).

Bourlard, H. and Morgan, N., Connectionist Speech Recognition - A Hybrid Approach, Kluwer Academic Publishers, ISBN 0-7923-9396-1, 1994.
L. Deng, D. Yu: Deep learning: methods and applications. Foundations and Trends in Signal Processing, Vol. 7, No. 3–4, pp. 197-387, 2014.
P. Koehn: Statistical Machine Translation, Cambridge University Press, 2010.

Familiarity with linear algebra, numerical mathematics, probability and statistics, elementary machine learning.

Hermann Ney is a full professor of computer science at RWTH Aachen University, Germany. His main research interests lie in the area of statistical classification, machine learning, neural networks and human language technology and specific applications to speech recognition, machine translation and handwriting recognition. In particular, he has worked on dynamic programming and discriminative training for speech recognition, on language modelling and on machine translation. His work has resulted in more than 700 conference and journal papers (h-index 87, 39000 citations; estimated using Google Scholar). He and his team contributed to a large number of European (e.g. TC-STAR, QUAERO, TRANSLECTURES, EU-BRIDGE) and American (e.g. GALE, BOLT, BABEL) large-scale joint projects. Hermann Ney is a fellow of both IEEE and ISCA (Int. Speech Communication Association). In 2005, he was the recipient of the Technical Achievement Award of the IEEE Signal Processing Society. In 2010, he was awarded a senior DIGITEO chair at LIMSI/CNRS in Paris, France. In 2013, he received the award of honour of the International Association for Machine Translation. In 2016, he was awarded an advanced grant of the European Research Council (ERC).

Jose C. Principe University of Florida

Cognitive Architectures for Object Recognition in Video



I- Requisites for a Cognitive Architecture
• Processing in space
• Processing in time and memory
• Top-down and bottom-up processing
• Extraction of information from data with generative models
• Attention

II- Putting it all together
• Empirical Bayes with generative models
• Clustering of time series with linear state models
• Information Theoretic Autoencoders

III- Current work
• Extraction of time signatures with kernel ARMA
• Attention-based video recognition
• Augmenting Deep Learning with memory



Jose C. Principe is a Distinguished Professor of Electrical and Computer Engineering at the University of Florida, where he teaches advanced signal processing, machine learning and artificial neural networks (ANNs). He is Eckis Professor and the Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL). The CNEL Lab innovated signal and pattern recognition principles based on information theoretic criteria, as well as filtering in functional spaces. His secondary area of interest has focused on applications to computational neuroscience, Brain Machine Interfaces and brain dynamics. Dr. Principe is a Fellow of the IEEE, AIMBE, and IAMBE. Dr. Principe received the Gabor Award from the INNS, the Career Achievement Award from the IEEE EMBS and the Neural Network Pioneer Award of the IEEE CIS. He has more than 33 patents awarded and over 800 publications in the areas of adaptive signal processing, control of nonlinear dynamical systems, machine learning and neural networks, and information theoretic learning, with applications to neurotechnology and brain computer interfaces. He directed 93 Ph.D. dissertations and 65 Master's theses. In 2000 he wrote an interactive electronic book entitled 'Neural and Adaptive Systems', published by John Wiley and Sons, and more recently co-authored several books: 'Brain Machine Interface Engineering', Morgan and Claypool; 'Information Theoretic Learning', Springer; 'Kernel Adaptive Filtering', Wiley; and 'System Parameter Adaption: Information Theoretic Criteria and Algorithms', Elsevier. He has received four Honorary Doctor Degrees, from Finland, Italy, Brazil and Colombia, and routinely serves on international scientific advisory boards of Universities and Companies. He has received extensive funding from NSF, NIH and DOD (ONR, DARPA, AFOSR).

Marc’Aurelio Ranzato Facebook AI Research

Learning Representations for Vision, Speech and Text Processing Applications


This course will cover the foundations of deep learning with its application to vision and text understanding. Attendees will become familiar with two cornerstone concepts behind most successful applications of deep learning today: convolutional neural networks and embeddings. The former is employed in audio and visual processing applications, while the latter is used for representing text, graphs and other discrete or symbolic data. Finally, we are going to learn how to further extend these methods to deal with sequential data, like videos. Lectures will provide intuitions, the underlying mathematics, typical applications, code snippets and references. By the end of these three lectures, attendees are expected to gain enough familiarity to be able to apply these basic tools to standard datasets on their own.

Session 1: Deep Learning for Vision and Audio Processing Applications

The basics: from logistic regression to fully connected neural networks, and from fully connected neural networks to convolutional neural networks (CNNs).
Special layers used in vision applications.
Example of CNNs using the PyTorch open source library.

Session 2: Deep Learning for Text Processing Applications

How neural networks can be adapted to work with discrete symbolic data like text: the concept of embedding.
Methods using embeddings for a variety of text application tasks.
Example of learning from text using PyTorch.

Session 3: Deep Learning for Sequential Data

Learning from sequences: Recurrent Neural Networks (RNNs).
How to train and generate from RNNs, variants of RNNs.
Examples of applications of RNNs using PyTorch.

Basic knowledge of linear algebra, calculus, and statistics.

Marc’Aurelio Ranzato is currently a research scientist at the Facebook AI Research lab in New York City. He previously worked at Google in the Brain team from 2011 to 2013, and before that, he was a post-doctoral fellow in Machine Learning, University of Toronto, with Geoffrey Hinton. He earned his Ph.D. in Computer Science at New York University advised by Yann LeCun. He is originally from Padova in Italy, where he graduated in Electronics Engineering. Marc’Aurelio is interested in Machine Learning, Computer Vision, Natural Language Processing and, more generally, Artificial Intelligence. More specifically, he has worked on methods to learn hierarchical representations of data, unsupervised learning and methods for structured prediction. His research has been applied to visual object recognition, face recognition, speech recognition, machine translation, summarization and many other tasks. Marc’Aurelio has served as Area Chair for several major conferences, like NIPS, ICML, CVPR, and ICCV. He has been Senior Program Chair for ICLR 2017 and guest editor for IJCV.

Maximilian Riesenhuber Georgetown University

Deep Learning in the Brain


In recent years, deep convolutional neural networks (CNNs) have been shown to deliver excellent performance on a variety of object detection tasks, and much has been made of how these CNNs have been inspired by insights into how the brain performs vision. Foremost among these insights was the notion of the brain’s visual system as a 'simple-to-complex' feedforward hierarchy of interleaving pooling and template match stages. Yet, the neuroscience underlying CNNs is more than 20 years old, and neuroscience research since then, using various brain imaging techniques and other experimental and computational approaches, has dramatically advanced our understanding of how the brain recognizes objects and assigns meaning to sensory stimuli. This course will review the traditional picture of visual processing in the brain that underlies CNNs, in particular the concept of a feedforward, 'simple-to-complex' hierarchy, and then present new insights from neuroscience regarding flexibility in the brain’s processing hierarchy, including shortcuts, additional levels, feedback signaling and interactions across different hierarchies. These insights have greatly impacted our understanding of how the brain can perform ultra-fast object localization, learn new concepts by leveraging prior learning, and learn in deep hierarchies.

Session 1: The basics: Vision in the brain: Feedforward, simple-to-complex hierarchies
Session 2: New insights into deep hierarchies in the brain: Deep, deeper and shallower processing; re-entrant signals for learning and conscious awareness.
Session 3: Learning across modalities: From objects to words, audition, and touch.

Riesenhuber, M., & Poggio, T. (2002). Neural Mechanisms of Object Recognition. Current Opinion in Neurobiology 12: 162-168.

Some basic neuroscience knowledge is helpful, as is having a brain.

Dr. Riesenhuber is Director of the Laboratory for Computational Cognitive Neuroscience and Professor of Neuroscience at Georgetown University Medical Center. His research investigates the neural mechanisms underlying object recognition and task learning in the human brain across sensory modalities. Current research foci are ultra-rapid object localization, leveraging prior learning, multi-tasking, and vibrotactile object recognition and speech. The computational model at the core of his research has been quite successful in elucidating the neural mechanisms underlying robust invariant object recognition, contributing to Technology Review Magazine’s naming him one of their TR100 in 2003, “the 100 people under age 35 whose contributions to emerging technologies will profoundly influence our world.” Dr. Riesenhuber has received several awards, including a McDonnell-Pew Award in Cognitive Neuroscience and an NSF CAREER Award. He holds a PhD in computational neuroscience from MIT.

Ruslan Salakhutdinov Carnegie Mellon University

Foundations of Deep Learning and its Recent Advances


The goal of the tutorial is to introduce the recent and exciting developments of various deep learning methods. The core focus will be placed on algorithms that can learn multi-layer hierarchies of representations, emphasizing their applications in information retrieval, data mining, collaborative filtering, and computer vision.

The tutorial will be split into two parts.
1: The first part will provide a gentle introduction to graphical models, neural networks, and deep learning models. Topics will include:
Unsupervised learning methods, including autoencoders, restricted Boltzmann machines, and methods for learning over-complete representations.
Supervised methods for deep models, including deep convolutional neural network models and their applications to text comprehension, data mining, image and video analysis.
2: The second part of the tutorial will introduce more advanced models, including Variational Autoencoders, Generative Adversarial Networks, Deep Boltzmann Machines, and Recurrent Neural Networks. We will also address mathematical issues, focusing on efficient large-scale optimization methods for inference and learning.
Throughout the tutorial, we will highlight applications of deep learning methods in the areas of natural language processing, reading comprehension, multimodal learning, collaborative filtering, and image/video analysis.


Basic knowledge of probability, linear algebra, and introductory machine learning.

Ruslan Salakhutdinov received his PhD in machine learning (computer science) from the University of Toronto in 2009. After spending two post-doctoral years at the Massachusetts Institute of Technology Artificial Intelligence Lab, he joined the University of Toronto as an Assistant Professor in the Department of Computer Science and Department of Statistics. In February of 2016, he joined the Machine Learning Department at Carnegie Mellon University as an Associate Professor.
Ruslan's primary interests lie in deep learning, machine learning, and large-scale optimization. His main research goal is to understand the computational and statistical principles required for discovering structure in large amounts of data. He is an action editor of the Journal of Machine Learning Research and served on the senior programme committee of several learning conferences including NIPS and ICML. He is an Alfred P. Sloan Research Fellow, Microsoft Research Faculty Fellow, Canada Research Chair in Statistical Machine Learning, a recipient of the Early Researcher Award, Connaught New Researcher Award, Google Faculty Award, Nvidia's Pioneers of AI award, and is a Senior Fellow of the Canadian Institute for Advanced Research.

Alessandro Sperduti University of Padua

Deep Learning for Sequences


With the diffusion of cheap sensors, sensor-equipped devices (e.g., drones), and sensor networks (such as Internet of Things), as well as the development of inexpensive human-machine interaction interfaces, the ability to quickly and effectively process sequential data is becoming more and more important. Many are the tasks that may benefit from advancements in this field, ranging from monitoring and classification of human behavior to prediction of future events. Many are the approaches that have been proposed in the past to learn in sequential domains, ranging from linear models to early models of Recurrent Neural Networks, up to more recent Deep Learning solutions. The lectures will start with the presentation of relevant sequential domains, introducing scenarios involving different types of sequences (e.g., symbolic sequences, time series, multivariate sequences) and tasks (e.g., classification, prediction, transduction). Linear models are first introduced, including linear auto-encoders for sequences. Subsequently non-linear models and related training algorithms are recalled, starting from early versions of Recurrent Neural Networks. Computational problems and proposed solutions will be presented, including novel linear-based pre-training approaches. Finally, more recent Deep Learning models will be discussed. Lectures will close with some theoretical considerations on the relationships between Feed-forward and Recurrent Neural Networks, and a discussion about dealing with more complex data (e.g., trees and graphs).

1. Introduction to sequential domains and related computational tasks
2. Linear models for sequences
3. Linear auto-encoders for sequences: optimal and approximated solutions
4. Recurrent Neural Network models and related training algorithms
5. Computational problems of Recurrent Neural Networks and some 'solutions'
6. Novel linear-based pre-training approaches for Recurrent Neural Networks
7. Recent Deep Learning models
8. Relationship between Feed-forward and Recurrent Neural Networks
9. Beyond sequences: trees and graphs


Basic algebra, calculus, and probability at the introductory college level.

Prof. Sperduti has been a full professor of Computer Science at the Department of Mathematics of the University of Padova since March 1, 2002. Previously, he was associate professor (1998-2002) and assistant professor (1995-1998) at the Department of Computer Science of the University of Pisa. His research interests are mainly in Neural Networks, Kernel Methods, and Process Mining. Prof. Sperduti has been a PC member of several conferences (such as IJCAI, ECAI, ICML, ECML, SIGIR, ECIR, SDM, IJCNN, ICANN, ESANN, ...), and guest editor of special issues for the journals Neural Networks, IEEE TKDE, and Cognitive Systems Research. He is on the editorial board of the journal Theoretical Computer Science (Section C), the European Journal on Artificial Intelligence, IEEE Intelligent Systems Magazine, and the journal Neural Networks. He was associate editor (2009-2012) for the IEEE Transactions on Neural Networks and Learning Systems. From 2001 to 2010, he was a member of the European Neural Networks Society (ENNS) Executive Committee. He was chair of the DMTC of IEEE CIS for the years 2009 and 2010, chair of the NNTC for the years 2011 and 2012, and chair of the IEEE CIS Student Games-Based Competition Committee for the year 2013. He is a senior member of the IEEE. He has delivered several tutorials at major Artificial Intelligence conferences (WCCI 2012, IJCAI 2001, IJCAI 1999, IJCAI 1997) and summer schools. He was the recipient of the 2000 AI*IA (Italian Association for Artificial Intelligence) 'MARCO SOMALVICO' Young Researcher Award. He has been an invited plenary speaker at the conferences ICANN 2001, WSOM 2007, and CIDM 2013. Prof. Sperduti is the author of more than 180 publications in refereed journals, conferences, and chapters in books.

Jimeng Sun Georgia Institute of Technology

Interpretable Deep Learning Models for Healthcare Applications


It is widely believed that deep learning techniques could fundamentally change healthcare industries. Recent developments in deep learning have achieved success in many other applications, such as computer vision, natural language processing, and speech recognition. However, healthcare applications pose many significantly different challenges to existing deep learning models. Examples include, but are not limited to, interpretation of predictions, heterogeneity in data, missing values, rare events, and privacy issues. In this short class, we will discuss a series of problems in healthcare that can benefit from deep learning models, the challenges, and recent advances in addressing them.


  • Introduction of machine learning problems in healthcare data
  • Application of existing deep learning models to healthcare
    • CNN for medical images
    • RNN for heart failure prediction
    • CNN-RNN for EEG analysis
  • New models
    • Attention models
      • Temporal attention models
      • Attention on medical ontology
    • Representation learning
      • Word2vec on medical codes
      • Med2vec: Two-level representation learning
    • Data synthesis
      • MedGAN for generating discrete EHR data
  • Q&A


  1. Choi, Edward, Andy Schuetz, Walter F Stewart, and Jimeng Sun "Using recurrent neural network models for early detection of heart failure onset" Journal of the American Medical Informatics Association 2016; doi: 10.1093/jamia/ocw112
  2. Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F. Stewart, and Jimeng Sun, "Doctor AI: Predicting Clinical Events via Recurrent Neural Networks", Machine learning for Healthcare 2016, arXiv:1511.05942 [cs.LG]
  3. Choi, Edward, Mohammad Taha Bahadori, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. 2016. “RETAIN: Interpretable Predictive Model in Healthcare Using Reverse Time Attention Mechanism.” NIPS’16
  4. Edward Choi, Mohammad Bahadori, Elizabeth Searles, Catherine Coffey, Michael Thompson, James Bost, Javier Tejedor-Sojo, and Jimeng Sun. “Multi-layer Representation Learning for Medical Concepts” KDD’16.
  5. Choi, Edward, Mohammad Taha Bahadori, Le Song, Walter F. Stewart, and Jimeng Sun. 2016. “GRAM: Graph-Based Attention Model for Healthcare Representation Learning.” arXiv [cs.LG]. arXiv.
  6. Choi, Edward, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, and Jimeng Sun. 2017. “Generating Multi-Label Discrete Electronic Health Records Using Generative Adversarial Networks.” arXiv [cs.LG]. arXiv.
  7. Esteva, Andre, Brett Kuprel, Roberto A. Novoa, Justin Ko, Susan M. Swetter, Helen M. Blau, and Sebastian Thrun. 2017. “Dermatologist-Level Classification of Skin Cancer with Deep Neural Networks.” Nature 542 (7639): 115–18.
  8. Gulshan, Varun, Lily Peng, Marc Coram, Martin C. Stumpe, Derek Wu, Arunachalam Narayanaswamy, Subhashini Venugopalan, et al. 2016. “Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs.” JAMA: The Journal of the American Medical Association 316 (22): 2402–10.

The tutorial is targeted at researchers in machine learning as well as researchers working on health-related applications. It will also attract a broader audience working on applying deep learning models to applications with heterogeneous data. The prerequisites include graduate-level machine learning classes and ideally basic knowledge of deep learning.

Jimeng Sun is an associate professor in the College of Computing at the Georgia Institute of Technology (Georgia Tech). Prior to Georgia Tech, he was a researcher at IBM TJ Watson Research Center. His research focuses on health analytics and data mining, especially in designing tensor factorizations, deep learning methods, and large-scale predictive modeling systems. Dr. Sun has been collaborating with many healthcare organizations: Children's Healthcare of Atlanta, Vanderbilt University Medical Center, Mass General Hospital, Sutter Health, Geisinger, Northwestern and UCB. He has published over 120 papers and filed over 20 patents (5 granted). He received the ICDM best research paper award in 2008, the SDM best research paper award in 2007, and the KDD Dissertation runner-up award in 2008. Dr. Sun received his B.S. and M.Phil. in Computer Science from Hong Kong University of Science and Technology in 2002 and 2003, and his M.Sc. and PhD in Computer Science from Carnegie Mellon University in 2006 and 2007.

Julian Togelius New York University

(Deep) Learning for (Video) Games


We will discuss methods of applying AI, in particular deep learning but also other methods from machine learning, optimization and tree search, to games, in particular video games. While AI methods have been used for playing board games since the birth of the concept of AI itself, they have only relatively recently been applied to playing a broad range of video games, which offer new kinds of challenges. There are at least two different reasons for doing this: to provide value for game design and development, for example by automatically testing games or providing interesting adversaries, and to test AI methods in realistic environments. However, the applications of deep learning and other similar methods go beyond playing games, and into creating game content itself. This is a challenging task for AI methods, which has often been approached with search-based methods, but where a crop of machine learning-based approaches has recently appeared.

Part 1: Playing Games
- Why play games?
- Tree search methods
- Neuroevolution and reinforcement learning
- Supervised learning

Part 2: Generating Content
- Why generate content?
- Constructive methods
- Search-based methods
- Machine learning-based methods

Shaker, N., Togelius, J., & Nelson, M. (2016). Procedural Content Generation In Games. Springer. Available for free online.
Yannakakis, G. N., Togelius, J. (2017). Artificial Intelligence in Games. Springer (being published later this year).
Yannakakis, G. N., & Togelius, J. (2015). A panorama of artificial and computational intelligence in games. IEEE Transactions on Computational Intelligence and AI in Games, 7(4), 317-335.

An understanding of artificial intelligence, including at least basic knowledge of modern machine learning methods. An interest in games.

Julian Togelius is an Associate Professor in the Department of Computer Science and Engineering, New York University, USA. He works on all aspects of computational intelligence and games and on selected topics in evolutionary computation and evolutionary reinforcement learning. His current main research directions involve search-based procedural content generation in games, general video game playing, player modeling, and fair and relevant benchmarking of AI through game-based competitions. He is a past chair of the IEEE CIS Technical Committee on Games, and an associate editor of IEEE Transactions on Computational Intelligence and Games. Togelius holds a BA from Lund University, an MSc from the University of Sussex, and a PhD from the University of Essex. He has previously worked at IDSIA in Lugano and at the IT University of Copenhagen.

Joos Vandewalle KU Leuven

Data Processing Methods, and Applications of Least Squares Support Vector Machines


The course starts with a basic part where the methods, opportunities and limitations of neural networks and learning machines are briefly summarized. Support Vector Machines (SVMs) are a powerful and widely studied methodology for solving problems in nonlinear classification, function estimation and density estimation. Least Squares Support Vector Machines (LS-SVMs) are reformulations of the standard SVMs that gain simplicity and performance by solving linear KKT systems, and they have strong application potential. LS-SVMs are closely related to regularization networks and Gaussian processes but additionally emphasize and exploit deeper insights and opportunities of primal-dual formulations. Links are established between kernel versions of classical pattern recognition algorithms, such as kernel Fisher discriminant analysis, and extensions to unsupervised learning, recurrent networks and control. Robustness, sparseness and weightings can be incorporated into LS-SVMs. Recent developments are in kernel spectral clustering, data visualization and dimensionality reduction, and survival analysis. For very large scale problems a method of Fixed Size LS-SVM is proposed. Several successful applications in our interdisciplinary research on medical diagnostics and time series predictions will be highlighted. The use of neural networks and LS-SVMs in deep learning will be presented.
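
To make the phrase "solving linear KKT systems" concrete, the following is a minimal sketch of an LS-SVM binary classifier, assuming the standard dual formulation from the Suykens et al. book listed in the readings. The function names, the RBF kernel, the values of gamma and sigma, and the small pure-Python linear solver are all assumptions made for illustration; they are not taken from the course materials.

```python
import math


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        # Partial pivoting is required here: the top-left entry of the
        # LS-SVM system is zero.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Train an LS-SVM classifier; labels y must be +1 or -1.

    Solves the single linear system
        [ 0   y^T             ] [ b     ]   [ 0 ]
        [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]
    with Omega[i][j] = y_i * y_j * K(x_i, x_j).
    Returns the bias b and the list of dual variables alpha.
    """
    n = len(X)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    rhs = [0.0] + [1.0] * n
    for i in range(n):
        A[0][i + 1] = y[i]
        A[i + 1][0] = y[i]
        for j in range(n):
            A[i + 1][j + 1] = y[i] * y[j] * rbf_kernel(X[i], X[j], sigma)
        A[i + 1][i + 1] += 1.0 / gamma  # regularization on the diagonal
    solution = solve_linear(A, rhs)
    return solution[0], solution[1:]


def lssvm_predict(x, X, y, b, alpha, sigma=1.0):
    """Classify x as +1 or -1 using the trained dual variables."""
    score = b + sum(
        a * yi * rbf_kernel(x, xi, sigma) for a, yi, xi in zip(alpha, y, X)
    )
    return 1 if score >= 0 else -1
```

Unlike a standard SVM, which requires a quadratic program, training here reduces to one call to a linear solver. The price is that every training point generally receives a non-zero dual variable, which is why sparseness has to be incorporated separately.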

Part 1: Methods, capabilities, limitations and fascinating applications of artificial neural networks and support vector machines.
Part 2: Least squares support vector machines (LS-SVM): the methods and the use of supervised and unsupervised designs.
Part 3: Applications of neural networks and support vector machines in research on medical diagnostics for ovarian cancer, time series modeling for electrical energy consumption, and weather forecasting.
Part 4: Combining neural networks and LS-SVM with deep learning methods.

J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific, Singapore, 2002 (ISBN 981-238-151-1)
Van Gestel T., Suykens J.A.K., Baesens B., Viaene S., Vanthienen J., Dedene G., De Moor B., Vandewalle J., Benchmarking Least Squares Support Vector Machine Classifiers, Machine Learning, vol. 54, no. 1, Jan. 2004, pp. 5-32.
Suykens J.A.K., Vandewalle J., The K.U.Leuven competition data : a challenge for advanced neural network techniques, in Proc. of the European Symposium on Artificial Neural Networks (ESANN'2000), Bruges, Belgium, 2000, pp. 299-304.
Lu C., Van Gestel T., Suykens J.A.K., Van Huffel S., Vergote I., Timmerman D., Preoperative prediction of malignancy of ovarian tumors using least squares support vector machines, Artificial Intelligence in Medicine, vol. 28, no. 3, Jul. 2003, pp. 281-306.
Espinoza M., Suykens J.A.K., Belmans R., De Moor B., Electric Load Forecasting - Using kernel based modeling for nonlinear system identification, IEEE Control Systems Magazine, Special Issue on Applications of System Identification, vol. 27, no. 5, Oct. 2007, pp. 43-57.
Langone R., Mall R., Vandewalle J., Suykens J. A. K., Discovering cluster dynamics using kernel spectral methods, in Chapter 1 of Complex Systems and Networks, (Jinhu L., Xinghuo Y., Guanrong C., and Wenwu Y., eds.), vol. 2, The Springer Series in Understanding Complex Systems, Springer-Verlag Berlin, 2016, pp. 1-24.
Karevan Z., Suykens J.A.K., Clustering-based feature selection for black-box weather temperature prediction, in Proc. of the International Joint Conference on Neural Networks (IJCNN), Vancouver, Canada, Jul. 2016, pp. 1-8.

Familiarity with linear algebra, calculus, probability theory, statistics, basic optimization and algorithm design is expected, at the level typically taught in bachelor's programs in computer science or engineering.

Joos Vandewalle is emeritus full professor with assignments at the Department of Electrical Engineering (ESAT), Katholieke Universiteit Leuven, Belgium. He headed the SCD division at ESAT, with more than 150 researchers. He held visiting positions at the University of California, Berkeley and I3S CNRS Sophia Antipolis, France. For more than 20 years he taught courses in linear algebra, linear and nonlinear system and circuit theory, signal processing and neural networks. His research interests are in mathematical system theory and its applications in circuit theory, control, signal processing, cryptography and neural networks, where he supervised 43 PhDs. He (co-)authored more than 300 international journal papers and 6 books. He obtained several best paper awards and research awards. His publications have received over 38,000 Google Scholar citations. He is a Fellow of IEEE, IET, and EURASIP and a member of the Academia Europaea and of the Belgian Academy of Sciences. From 2009 till 2013 he was a member of the Board of Governors of the IEEE Circuits and Systems Society. He is a member of the jury of the BBVA Foundation Frontiers of Knowledge Award in ICT. He is currently the president of the Royal Belgian Academy KVAB.

Ying Nian Wu University of California, Los Angeles

Generative Modeling and Unsupervised Learning


This short course introduces generative models and the associated learning, inference and sampling algorithms that are commonly used in unsupervised learning, i.e., learning from unlabeled data. We shall focus on two classes of models, namely the undirected energy based models and the directed latent variable models, including the deep undirected and directed graphical models parametrized by convolutional neural networks.

(1) Markov random field models, maximum entropy and maximum likelihood.
(2) Latent factor models including factor analysis, independent component analysis, sparse coding, matrix factorization; auto-encoder and embedding.
(3) Deep undirected and directed models, maximum likelihood, contrastive divergence, variational, adversarial, and cooperative learning methods.

The focus will be on (3), with (1) and (2) laying the groundwork.

Ian Goodfellow and Yoshua Bengio and Aaron Courville, Deep Learning, MIT press, 2016.

Basic knowledge of probability, statistics and machine learning.

Ying Nian Wu received his Ph.D. degree in statistics from Harvard in 1996. He was an assistant professor from 1997 to 1999 in the Department of Statistics, University of Michigan. He joined the University of California, Los Angeles (UCLA) in 1999, and is currently a professor in the Department of Statistics, UCLA. His research interests include statistical modeling, computing and learning, with applications in computer vision.

Eric P. Xing Carnegie Mellon University

Statistical Machine Learning Perspectives of Extending Deep Neural Networks: Kernels, Logics, Regularizers, Priors, and Distributed Algorithms


Scott Wen-tau Yih Microsoft Research

Continuous Representations for Natural Language Understanding


Understanding human language has been one of the long-standing research goals since the dawn of AI. In this lecture, we will discuss how recently developed methods, mostly deep neural network models, advance the state of the art. The lecture starts with a broad introduction to natural language processing research, analyzing why understanding language remains difficult. We will introduce several representative NLP tasks and discuss the role of machine learning in data-driven approaches. Historical and modern paradigms of problem formulations and models will also be briefly surveyed in this part.
The rest of the lecture focuses on two key natural language understanding tasks: information extraction and question answering. Transforming unstructured text to structured databases, information extraction aims to find the entities and their relationships in text, as well as to make the extracted facts easily accessible programmatically. We will first give an overview of the basic problem setting, such as binary relation extraction as a sequence labeling problem. After that, we will place more emphasis on the latest distant supervision methods, which model the multi-sentence, n-ary relation settings using structured LSTM models. New approaches for embedding entities and relations in a knowledge base, for reasoning about previously unknown facts, will also be covered.
Question answering, while often used as the means to demonstrate machine intelligence, is an important application for fulfilling users' information needs. In this part, we'll start by introducing the general framework of answering questions using unstructured text, such as Wikipedia or the Web, and describe the current state-of-the-art deep learning approaches. We will also discuss how to leverage structured data such as databases or tables to answer questions, with a focus on semantic parsing methods.


Part 1. Background of Natural Language Processing and Machine Learning

  • What is natural language understanding and why is it difficult?
  • Representative NLP tasks
  • Machine learning models & paradigms

Part 2. Knowledge Base Completion and Information Extraction

  • Information extraction tasks: bridging unstructured text and structured databases
  • Simple and complex problem settings
    • Binary to n-ary relations
    • Distant supervision for combating annotation bottleneck
    • Going beyond sentence boundaries
  • Sequence labeling to structured LSTM models
  • Deep-learning approaches for embedding structured knowledge and text

Part 3. Semantic Parsing and Question Answering

  • Current research trend of question answering
  • Question answering with unstructured text and the Web
  • Question answering with knowledge bases
    • Semantic parsing (of questions)
    • Matching questions and answers in embedding space
    • Information extraction and text matching


Yih, He & Gao. Deep learning and continuous representations for natural language processing. Tutorial presented in HLT-NAACL-2015, IJCAI-2016.

Yih & Ma. Question answering with knowledge bases, Web and beyond. Tutorial presented in HLT-NAACL-2016, SIGIR-2016.

Poon, Quirk, Toutanova & Yih. Natural Language Processing for Precision Medicine. Tutorial to be presented in ACL-2017.

No prerequisites.

Scott Wen-tau Yih is a Senior Researcher at Microsoft Research Redmond. His research interests include natural language processing, machine learning and information retrieval. Yih received his Ph.D. in computer science at the University of Illinois at Urbana-Champaign. His work on joint inference using integer linear programming (ILP) helped the UIUC team win the CoNLL-05 shared task on semantic role labeling, and the approach has been widely adopted in the NLP community since then. After joining Microsoft Research, he has worked on email spam filtering, keyword extraction and search & ad relevance. His recent work focuses on continuous semantic representations using neural networks and matrix/tensor decomposition methods, with applications in lexical semantics, knowledge base embedding, semantic parsing and question answering. Yih received the best paper award from CoNLL-2011 and an outstanding paper award from ACL-2015, and has served as area chair (HLT-NAACL-12, ACL-14, EMNLP-16,17), program co-chair (CEAS-09, CoNLL-14) and action/associate editor (TACL, JAIR) in recent years.

Georgios N. Yannakakis University of Malta

Deep Learning for Games - But Not for Playing them


Can AI understand how players feel, think and react and, in turn, automatically design new games for them? Can those computationally designed games be considered creative? When does this happen and who judges after all? How can Deep Learning help us achieve these goals?
In this course I will address the above questions by positioning computer games as the ideal application domain for computational creativity, affective computing and machine (deep) learning because of the unique features they offer. For that purpose, I will identify a number of key creative facets in modern game development and discuss their required orchestration for a final successful game product. I will also focus on the study of player emotion and will detail the key phases for efficient game-based affect interaction. Advanced deep learning methods for player experience modeling, game adaptation, procedural content generation, and computational game creativity will be showcased via a plethora of projects developed at the Institute of Digital Games, University of Malta.

Session 1: Introduction to the domain of games. Why are games the ideal arena for AI and deep learning (DL)? What is there beyond gameplaying for AI/DL and why is it even more challenging?
Session 2: Deep learning for modeling players
Session 3: Deep learning for generating content

H. P. Martinez, Y. Bengio and G. N. Yannakakis, “Learning Deep Physiological Models of Affect,” IEEE Computational Intelligence Magazine, Special Issue on Computational Intelligence and Affective Computing, pp. 20-33, May, 2013.
A. Liapis, H. P. Martinez, J. Togelius and G. N. Yannakakis, “Transforming Exploratory Creativity with DeLeNoX,” in Proceedings of the Fourth International Conference on Computational Creativity, pp. 71–78, 2013.
H. P. Martinez and G. N. Yannakakis, “Deep Multimodal Fusion: Combining Discrete Events and Continuous Signals,” in Proceedings of the International Conference on Multimodal Interaction (ICMI), 2014.

Basic knowledge of statistics and calculus.

Prof. Georgios N. Yannakakis is an Associate Professor at the Institute of Digital Games, University of Malta (UoM). He received the Ph.D. degree in Informatics from the University of Edinburgh in 2005. Prior to joining the Institute of Digital Games, UoM, in 2012, he was an Associate Professor at the Center for Computer Games Research at the IT University of Copenhagen. He does research at the crossroads of artificial intelligence, computational creativity, affective computing, advanced game technology, and human-computer interaction. Georgios N. Yannakakis is one of the leading researchers within player affective modeling and adaptive content generation for games and has pioneered the use of preference learning algorithms to create statistical models of player experience which drive the automatic generation of personalized game content. He has published over 200 journal and conference papers in the aforementioned fields; his work has been cited broadly and received multiple awards. His research has been supported by numerous national and European grants and has been featured in Science Magazine and New Scientist among other venues. He has given keynote talks at the most recognised conferences in the areas of his research activity and he has organized a few of the most respected conferences in the areas of game artificial intelligence (IEEE CIG 2010) and games research (FDG 2013). He is an Associate Editor of the IEEE Transactions on Computational Intelligence and AI in Games. Finally, he is the co-author of the first academic textbook on game AI, offering a holistic view of the field (to be published in 2017 by Springer).

Richard Zemel University of Toronto

Learning to Understand Images and Text


Few-shot classification is a task in which a classifier must be adapted to accommodate new classes not seen in training, given only a few examples of each of these classes. A naive approach, such as re-training the model on the new data, would severely overfit. While the problem is quite difficult, it has been demonstrated that humans have the ability to perform even one-shot classification, where only a single example of each new class is given, with a high degree of accuracy. Recently there has been a flurry of work in machine learning, and significant progress, on this problem. In this course we will discuss the variety of approaches to this important problem.

The few-shot classification problem; non-parametric models; metric learning; deep learning for few-shot; novel loss functions and approaches
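
To make the few-shot setting concrete, here is a minimal sketch of one simple non-parametric approach: average the few labeled examples of each new class into a centroid and assign a query to the nearest one. This is only an illustration and is not taken from the course material; the function names, the 2-D feature vectors and the class labels are all hypothetical, and in practice the features would come from a learned embedding rather than be given by hand.

```python
from collections import defaultdict
from math import dist


def class_centroids(support):
    """Average the few labeled examples of each new class into one centroid."""
    grouped = defaultdict(list)
    for features, label in support:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(query, centroids):
    """Assign the query to the class whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: dist(query, centroids[label]))


# Hypothetical 2-D embeddings: two classes unseen in training, two examples each.
support_set = [
    ((0.0, 0.0), "cat"),
    ((0.0, 2.0), "cat"),
    ((4.0, 4.0), "dog"),
    ((6.0, 4.0), "dog"),
]
centroids = class_centroids(support_set)
prediction = classify((1.0, 1.5), centroids)
```

No parameters are re-trained on the new classes here, which is why this kind of model avoids the overfitting of naive re-training; the quality of the result then depends entirely on how well the embedding separates classes.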

Metric learning: A survey. Brian Kulis. Foundations and Trends in Machine Learning, 5(4):287–364, 2012.
One shot learning of simple visual concepts. Brenden Lake, Ruslan Salakhutdinov, Jason Gross, Joshua Tenenbaum. In CogSci, 2011.
Siamese neural networks for one-shot image recognition. Gregory Koch, Richard Zemel, Ruslan Salakhutdinov. ICML Deep Learning Workshop, 2015.

Familiarity with math, probability and statistics is expected, at the level of an undergraduate computer science or engineering program. Basic knowledge of deep networks will also be assumed.

Richard Zemel is a Professor of Computer Science at the University of Toronto, and the Research Director and Co-Founder of the new Vector Institute for Artificial Intelligence. Prior to that he was on the faculty at the University of Arizona, and a Postdoctoral Fellow at the Salk Institute and at CMU. He received the B.Sc. in History & Science from Harvard, and a Ph.D. in Computer Science from the University of Toronto. His awards and honors include a Young Investigator Award from the ONR and a US Presidential Scholar award. He is a Senior Fellow of the Canadian Institute for Advanced Research, an NVIDIA Pioneer of AI, and a member of the NIPS Advisory Board. His recent research interests include learning with weak labels, models of images and text, and fairness.