The process of learning is essential for building natural or artificial intelligent systems. Thus, not surprisingly, machine learning is at the center of artificial intelligence today. And deep learning--essentially learning in complex systems composed of multiple processing stages--is at the forefront of machine learning. The lectures will provide an overview of neural networks and deep learning with an emphasis on first principles and theoretical foundations. The lectures will also provide a brief historical perspective of the field. Applications will focus on difficult problems in the natural sciences, from physics to chemistry to biology.
1: Introduction and Historical Background. Building Blocks. Architectures. Shallow Networks. Design and Learning.
2: Deep Networks. Backpropagation. Underfitting, Overfitting, and Tricks of the Trade.
3: Two-Layer Networks. Universal Approximation Properties. Compressive and Expansive Autoencoders. Network capacity.
4: Learning in the Machine. Local Learning and the Learning Channel. Hebbian Learning. Dropout. Optimality of BP and Random BP.
5: Architectures (Convolutional, Siamese, GANs, etc). Applications.
6: Recurrent Networks. Hopfield model. Boltzmann machines.
7: Recursive and Recurrent Networks. Design and Learning. Inner and Outer Approaches.
8: Applications to Physics (High Energy, Neutrino, Antimatter, Dark Matter, etc.)
9: Applications to Chemistry (Molecules, Reactions, etc).
10: Applications to Biology (Proteins, DNA, Biomedical Imaging, etc).
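As a concrete companion to the lecture topics above, here is a minimal two-layer network trained by backpropagation in NumPy. The XOR task, architecture, and hyperparameters are illustrative choices, not taken from the lectures:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the classic task a single-layer (linearly separable) model cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two-layer network: 2 inputs -> 4 tanh hidden units -> 1 sigmoid output.
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)

lr, losses = 0.5, []
for _ in range(2000):
    # forward pass
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    losses.append(-np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))
    # backward pass: propagate gradients layer by layer (chain rule)
    dlogits = (p - y) / len(X)            # cross-entropy + sigmoid gradient
    dW2, db2 = h.T @ dlogits, dlogits.sum(0)
    dh = dlogits @ W2.T * (1 - h ** 2)    # through the tanh nonlinearity
    dW1, db1 = X.T @ dh, dh.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

print(round(losses[0], 3), "->", round(losses[-1], 3))
```

The hidden layer is what lets the network solve XOR; a shallow model with no hidden layer leaves the cross-entropy stuck near its initial value.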
Basic algebra, calculus, and probability at the introductory college level. Some previous knowledge of machine learning could be useful but not required.
Pierre Baldi earned MS degrees in Mathematics and Psychology from the University of Paris, and a PhD in Mathematics from the California Institute of Technology. He is currently Chancellor's Professor in the Department of Computer Science, Director of the Institute for Genomics and Bioinformatics, and Associate Director of the Center for Machine Learning and Intelligent Systems at the University of California, Irvine. The long-term focus of his research is on understanding intelligence in brains and machines. He has made several contributions to the theory of deep learning, and has developed and applied deep learning methods to problems in the natural sciences such as the detection of exotic particles in physics, the prediction of reactions in chemistry, and the prediction of protein secondary and tertiary structure in biology. He has written four books and over 300 peer-reviewed articles. He is the recipient of the 1993 Lew Allen Award at JPL, the 2010 E. R. Caianiello Prize for research in machine learning, and a 2014 Google Faculty Research Award. He is an elected Fellow of the AAAS, AAAI, IEEE, ACM, and ISCB.
One of the central challenges in machine learning today is the high parameter complexity of models relative to large amounts of heterogeneous, noisy data. What is the relevant information in the data for selecting a model / hypothesis from a hypothesis class? Machine learning research has answered this question to a scientifically satisfactory degree for supervised learning, i.e., classification and regression. Without teacher-provided guidance, however, model selection and validation still appear to be a magical engineering art, mostly dominated by heuristics. The course will cover traditional model selection methods such as AIC and BIC, but also the stability method and a novel approach based on information theory. The resulting selection score, called the posterior agreement criterion, requires hypotheses to agree on two different instances drawn from the same data source. Such a robustness criterion captures the spirit of cross-validation and ensures that model hypotheses are selected according to the signal in the data and are not significantly affected by noise.
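To make the traditional baselines concrete, here is a sketch of AIC/BIC model selection on synthetic data. The polynomial-regression setup and all numbers are invented for illustration; the posterior agreement criterion itself is not implemented here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: a quadratic signal plus Gaussian noise.
n = 200
x = np.linspace(-1, 1, n)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(0, 0.3, n)

def ic_scores(degree):
    """AIC and BIC for a least-squares polynomial fit with Gaussian noise."""
    coef = np.polyfit(x, y, degree)
    resid = y - np.polyval(coef, x)
    sigma2 = np.mean(resid ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = degree + 2                      # polynomial coefficients + noise variance
    aic = 2 * k - 2 * loglik
    bic = k * np.log(n) - 2 * loglik
    return aic, bic

degrees = range(1, 9)
bics = [ic_scores(d)[1] for d in degrees]
best = degrees[np.argmin(bics)]
print("degree chosen by BIC:", best)
```

Underfitting (degree 1) is punished by the likelihood term, while overfitting (high degrees) is punished by the complexity penalty, which is stronger for BIC (k log n) than for AIC (2k); the minimum typically lands at the true model order.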
Algorithms and Gibbs distributions, Maximum Entropy method, AIC, BIC, stability selection,
Information Theoretic Model Validation, algorithms as time-evolving posterior distributions; examples in approximate sorting, approximate spanning trees, and pipeline tuning in biomedical applications.
J.M. Buhmann et al., Robust optimization in the presence of uncertainty: A generic approach, Journal of Computer and System Sciences 94, pp. 135-166, 2018
J.M. Buhmann, Information theoretic model validation for clustering, ISIT 2010, Austin, pp. 1398-1402, 2010
J.M. Buhmann, SIMBAD: Emergence of Pattern Similarity, in Advances in Vision and Pattern Recognition, ed. Marcello Pelillo, Springer, 2013, ISBN 978-1-4471-5627-7
Introductory course in Machine Learning and/or Statistics.
Joachim M. Buhmann has been a full Professor of Computer Science at ETH Zurich since October 2003. He heads the Institute for Machine Learning at the Department of Computer Science. Joachim Buhmann studied physics at the Technical University of Munich and was awarded a PhD for his work on artificial neural networks in 1988. After research appointments at the University of Southern California and at the Lawrence Livermore National Laboratory, he joined the University of Bonn as professor for practical computer science (1992-2003). Buhmann's research interests cover theory and applications of machine learning and artificial intelligence, as well as a wide range of subjects related to information processing in the life sciences. His conceptual and theoretical work on machine learning investigates the central question of how complex models and algorithms in data analysis (Big Data) can be validated when they are estimated from empirical observations. In particular, the concepts of statistical and algorithmic complexity and their mutual dependency need to be understood in this context. Joachim Buhmann served as Director of Studies for Computer Science (2008-2013) and as Vice-Rector for Study Programmes (2014-2017) of ETH Zurich. The German Pattern Recognition Society (DAGM) awarded him an honorary membership in 2017. He was elected as an individual member of the Swiss Academy of Engineering Sciences (SATW) in the same year.
A confluence of new artificial neural network architectures and unprecedented compute capabilities based on numeric accelerators has reinvigorated interest in artificial intelligence based on neural processing. First successful deployments in hyperscale internet services are now driving broader commercial interest in adopting deep learning as a design principle for cognitive applications in the enterprise. In this class, we will review hardware acceleration and co-optimized software frameworks for deep learning, and discuss model development and deployment to accelerate adoption of deep-learning-based solutions in the enterprise.
1a. Hardware Foundations of the Great AI Re-Awakening
1b. Deployment models for DNN Training and Inference
Optimized High Performance Training Frameworks
Parallel Training Environments
M. Gschwind, Need for Speed: Accelerated Deep Learning on Power, GPU Technology Conference, Washington DC, October 2016.
Dr. Michael Gschwind is Chief Engineer for Machine Learning and Deep Learning for IBM Systems, where he leads the development of hardware/software integrated products for cognitive computing. Over the past several years, he led the creation of the OpenPOWER Linux environment supporting GPU accelerators, created and brought to market several generations of PowerAI, led the optimization of PowerAI for Watson workloads, and currently leads the development of the Deep Learning at Scale (DL@S) high performance cloud environment for deep learning at IBM. During his career, Dr. Gschwind has been a technical leader for IBM's key transformational initiatives, leading the development of the OpenPOWER Hardware Architecture as well as the software interfaces of the OpenPOWER Software Ecosystem. In previous assignments, he was a chief architect for Blue Gene, POWER8, and POWER7. As chief architect for the Cell BE, Dr. Gschwind created the first programmable numeric accelerator, serving as chief architect for both hardware and software architecture. In addition to his industry career, Dr. Gschwind has held faculty appointments at Princeton University and Technische Universität Wien. While at Technische Universität Wien, Dr. Gschwind invented the concept of neural network training and inference accelerators. Dr. Gschwind is a Fellow of the IEEE, an ACM Distinguished Speaker, Chair of the ACM SIGMICRO Executive Committee, an IBM Master Inventor, and a Member of the IBM Academy of Technology.
Medical imaging is becoming increasingly important in modern medicine, including radiology, pathology, surgery, and neuroscience. Typical diagnostic radiology has several shortcomings due to the qualitative reading of a human observer. In addition, recent medical imaging equipment produces a tremendous amount of image data, which makes typical medical image reading nearly impractical. Recently, deep learning has shown better accuracy for detection and classification in computer vision, and these advances can be rapidly applied to medical imaging. I'll introduce the methodology of data science, including machine learning and deep learning, and deep-learning-based applications in computer vision and computer-aided diagnosis in radiology and pathology. In addition, I'll suggest some practical considerations for applying these technologies to the clinical workflow, including efficient labeling technology, interpretability and visualization (no black box), uncertainty (data level, decision level), reproducibility of deep learning, novelty in supervised learning, one-shot or multi-shot learning for imbalanced datasets or rare diseases, deep survival, and physics-induced machine learning.
1. Introduction to data science, machine learning, and deep learning
2. Deep learning in computer vision and applications
3. Deep learning for computer aided detection/diagnosis in radiology
4. Deep learning for computer aided detection/diagnosis in pathology
5. Practical consideration for deep learning application in medicine
- Efficient labeling technology
- Interpretability and visualization (no black box)
- Uncertainty (data level, decision level)
- Reproducibility of deep learning
- Novelty in supervised learning
- One-shot or multi-shot learning for imbalanced datasets or rare diseases
- Deep survival
- Physics-induced machine learning
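As one concrete illustration of the imbalanced-data point above, a common remedy is to weight the loss by inverse class frequency so that the rare class is not ignored. A minimal sketch in NumPy (the labels and predicted probabilities are invented for illustration):

```python
import numpy as np

# Hypothetical imbalanced label set: 95 "normal" scans, 5 with a rare disease.
labels = np.array([0] * 95 + [1] * 5)

# Inverse-frequency class weights, a standard remedy for class imbalance.
counts = np.bincount(labels)
weights = len(labels) / (len(counts) * counts)

def cross_entropy(p, y, w=None):
    """Binary cross-entropy, optionally weighted per class."""
    w = np.ones_like(p) if w is None else w[y]
    return float(np.mean(w * -(y * np.log(p) + (1 - y) * np.log(1 - p))))

# A model confident that everything is "normal" looks fine under the plain
# loss, but is heavily penalized once the rare class is upweighted.
p = np.full(len(labels), 0.05)
loss_plain = cross_entropy(p, labels)
loss_weighted = cross_entropy(p, labels, weights)
print(loss_plain, loss_weighted)
```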
Deep into the Brain: Artificial Intelligence in Stroke Imaging. Lee EJ, Kim YH, Kim N, Kang DW. J Stroke. 2017 Sep;19(3):277-285. doi: 10.5853/jos.2017.02054. Epub 2017 Sep 29. Review.
Comparison of Shallow and Deep Learning Methods on Classifying the Regional Pattern of Diffuse Lung Disease, Guk Bae Kim, Kyu-Hwan Jung, Yeha Lee, Hyun-Jun Kim, Namkug* Kim, Sanghoon Jun, Joon Beom Seo, David A. Lynch, Journal of Digital Imaging, 17 October 2017 (co-CA)
Development of a Computer-Aided Differential Diagnosis System to Distinguish Between Usual Interstitial Pneumonia and Non-specific Interstitial Pneumonia Using Texture- and Shape-Based Hierarchical Classifiers on HRCT Images. Jun S, Park B, Seo JB, Lee S, Kim N*. J Digit Imaging. 2017 Sep 7. doi: 10.1007/s10278-017-0018-y. PMID: 28884381 (co-CA)
Deep Learning in Medical Imaging: General Overview. Lee JG, Jun S, Cho YW, Lee H, Kim GB, Seo JB, Kim N*. Korean J Radiol. 2017 Jul-Aug;18(4):570-584. doi: 10.3348/kjr.2017.18.4.570. Epub 2017 May 19. Review. PMID: 28670152
Deep Learning: A Primer for Radiologists, Gabriel Chartrand, et al, Radiographics, Volume 37, Issue 7, 2017
Basic knowledge of computer algorithms and software; knowledge of machine learning and deep learning is recommended.
Namkug Kim is a professor at the University of Ulsan College of Medicine and also holds an appointment at Asan Medical Center (http://eng.amc.seoul.kr/), one of the leading hospitals in South Korea. He is currently a dually appointed assistant professor in the Department of Convergence Medicine and Radiology. He received his BS, MS, and PhD degrees from the Department of Industrial Engineering at Seoul National University and is the author of about 160 peer-reviewed original articles and 90 patents (https://scholar.google.com/citations?user=namkugkim). His research interests are in image-based clinical applications, including artificial intelligence in medicine, 3D printing in medicine, computer-aided diagnosis, computer-aided surgery, robotic interventions, and medical image processing.
This course will start with an introduction of two basic machine learning subsystems: Feature Engineering (e.g. CNNs for image/speech feature extraction) and Label Engineering, e.g. the multi-layer perceptron (MLP). The great success of DNNs in broad applications of deep learning hinges upon the rich nonlinear space embedded in their nonlinear hidden (neuron) layers. However, we face two major challenges: (1) the curse of depth and (2) the ad hoc nature of deep learning. Fortunately, many solutions have been proposed to effectively overcome the 'vanishing gradient' problem due to the curse of depth. In particular, we shall elaborate on (a) cross-entropy (with amplified gradients) as an effective surrogate for the 0-1 loss; (b) the merit of ReLU neurons; and (c) the vital roles of bagging, mini-batch, and dropout.
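The "amplified gradients" point can be seen in two lines of NumPy: for a sigmoid output unit, the mean-squared-error gradient with respect to the pre-activation carries a factor p(1 - p) that vanishes when the unit saturates, while the cross-entropy gradient is simply p - y. The specific numbers below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A saturated, badly wrong output unit: target y = 1, pre-activation z = -6.
z, y = -6.0, 1.0
p = sigmoid(z)                        # close to 0

grad_mse = (p - y) * p * (1 - p)      # d/dz of 0.5*(p - y)^2: vanishes as p saturates
grad_ce = (p - y)                     # d/dz of cross-entropy: stays large when wrong

print(grad_mse, grad_ce)
```

The amplification factor is 1 / (p(1 - p)), so the worse and more saturated the unit, the stronger the corrective signal under cross-entropy.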
It is widely recognized that the ad hoc nature of deep learning renders its success at the mercy of trial and error. To combat this problem, we advocate a methodic and cost-effective learning paradigm (MINDnet) to train multi-layer networks. In particular, MINDnet elegantly circumvents the curse of depth by harnessing a new notion of omni-present supervision, i.e. teachers hidden within a sort of 'Trojan horse' traveling along with the forward-propagating signals from the input to the hidden layers. Therefore, one can directly harvest the teacher's information at any hidden layer in the MLP, i.e., no propagation (NP) will be required. This leads to a new and slender 'inheritance layer' that summarizes (inherits) all the discriminant information embedded in the previous layer. Moreover, by augmenting the inheritance layer with additional randomized nodes and applying back-propagation (BP) learning again, the discriminant power of the network can be further enhanced. Finally, we have compared MINDnet with several popular learning models on real-world datasets, including the CIFAR, MNIST, mHealth, HAR, Yale, Olivetti, and Essex datasets. Our preliminary simulations suggest some superiority of MINDnet. For example, for the CIFAR-10 dataset, 97.9%+/-0.16% (MINDnet) > 97.4% (CutNet) > 96.0% (DenseNet) > 93.6% (ResNet).
Introduction of two basic machine learning subsystems:
- Feature Engineering: CNN for Image/Speech Feature Extraction
- Label Engineering: multi-layer deep learning networks
Introduce supplementary (SVM-based) subsystems for validation and prediction and highlight their
vital roles in optimization and generalization.
Introduce an effective surrogate function (to surrogate the 0-1 loss) in the training phase:
- How/why cross-entropy offers amplified gradients.
Introduce network friendly training metrics:
• equivalent optimization metrics: LSE (Gauss), FDR (Fisher) and Mutual Information (Shannon).
Derive Back-propagation (BP) Algorithm for
- back-propagation of 1st-order (gradient) and 2nd-order (Hessian) functions
Discuss effective remedies for tackling the vanishing gradient problem in deep networks:
- bagging, mini-batch, and dropout
Introduce MINDnet learning paradigm:
- Why the acronym MIND: Monotonically INcreasing Discriminant (MIND).
- A simple solution to overcome the Curse of Depth: No-propagation (NP) learning algorithm
o How to harness the teacher information “hidden” in the hidden layer?
- How to use a small number of nodes (inheritance layer) to fully summarize (inherit) all the
useful information embedded in the entire previous layer?
- To highlight the vital role of BP/NP hybrid learning.
Elaborate the detailed procedure to successively construct MINDnets with gradually growing depth:
• (vertical expansion) = full inheritance with a small number of nodes
• (horizontal expansion) = Inheritance Theorem + random nodes
Demonstrate that the prediction accuracy indeed improves as the MINDnet grows deeper:
• Via a Synthetic dataset, we shall conduct an extensive comparative study of various machine
learning tools in the literature.
- compare MINDnet with other existing networks based on real-world datasets such as CIFAR, MNIST, Yale, Olivetti, Essex, mHealth, HAR, etc.
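Among the remedies listed above, dropout is simple enough to sketch directly. A minimal inverted-dropout layer in NumPy (the dropout rate and input are illustrative; "inverted" means survivors are rescaled at training time so no correction is needed at test time):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, rate, train=True):
    """Inverted dropout: zero units with probability `rate`, rescale survivors."""
    if not train or rate == 0.0:
        return h
    mask = rng.random(h.shape) >= rate
    return h * mask / (1.0 - rate)

h = np.ones((10000, 8))
out = dropout(h, rate=0.5)
# Rescaling by 1/(1 - rate) keeps the expected activation unchanged.
print(round(float(out.mean()), 2))
```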
1. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, Cambridge, USA, 2016.
2. C.M. Bishop, Pattern Recognition and Machine Learning, Berlin: Springer.
3. S.Y. Kung, Digital Neural Networks. Prentice Hall, 1993.
4. S.Y. Kung, Kernel Methods and Machine Learning, Cambridge University Press, 2014.
5. Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2016). Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530.
6. Koh, P. W., & Liang, P. (2017). Understanding black-box predictions via influence functions. arXiv preprint arXiv:1703.04730.
Linear algebra; understanding of the design and analysis of algorithms.
S.Y. Kung, Life Fellow of IEEE, is a Professor in the Department of Electrical Engineering at Princeton University. His research areas include machine learning, data mining, systematic design of (deep-learning) neural networks, statistical estimation, VLSI array processors, signal and multimedia information processing, and most recently compressive privacy. He was a founding member of several Technical Committees (TC) of the IEEE Signal Processing Society. He was elected to Fellow in 1988 and served as a Member of the Board of Governors of the IEEE Signal Processing Society (1989-1991). He was a recipient of the IEEE Signal Processing Society's Technical Achievement Award for contributions on "parallel processing and neural network algorithms for signal processing" (1992); a Distinguished Lecturer of the IEEE Signal Processing Society (1994); a recipient of the IEEE Signal Processing Society's Best Paper Award for his publication on principal component neural networks (1996); and a recipient of the IEEE Third Millennium Medal (2000). Since 1990, he has been the Editor-in-Chief of the Journal of VLSI Signal Processing Systems. He served as the first Associate Editor in the VLSI area (1984) and the first Associate Editor in neural networks (1991) for the IEEE Transactions on Signal Processing. He has authored and co-authored more than 500 technical publications and numerous textbooks, including "VLSI Array Processors", Prentice-Hall (1988); "Digital Neural Networks", Prentice-Hall (1993); "Principal Component Neural Networks", John Wiley (1996); "Biometric Authentication: A Machine Learning Approach", Prentice-Hall (2004); and "Kernel Methods and Machine Learning", Cambridge University Press (2014).
Deep reinforcement learning has enabled artificial agents to achieve human-level performance across many challenging domains, e.g. playing Atari games and Go. I will cover the foundations of reinforcement learning and present several important algorithms, including deep Q-networks (DQN), asynchronous advantage actor-critic (A3C), DDPG, SVG, guided policy search, and temporal difference models (TDM). I will discuss major challenges and promising results toward making deep reinforcement learning applicable to real-world problems in robotics and natural language processing.
1. Introduction to reinforcement learning (RL)
2. Value-based deep RL
Deep Q-learning (deep Q-Networks)
Temporal-difference model (TDM)
3. Policy-based deep RL
Asynchronous advantage actor-critic (A3C)
Natural gradients and trust region optimization (TRPO)
Deep deterministic policy gradients (DDPG), SVG
4. Model-based deep RL: guided policy search
5. Deep learning in multi-agent environment: fictitious self-play
6. Imitation learning: GAIL and InfoGAIL
7. Inverse RL
8. Transfer learning, multitask learning and meta learning in RL
Application to robotics
Application to natural language understanding
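Underlying the deep variants in the outline above is the tabular Q-learning update, the foundation of value-based RL. A self-contained sketch on a toy MDP (the environment and hyperparameters are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy deterministic chain MDP: states 0..4, actions {0: left, 1: right};
# reward 1 on reaching the terminal goal state 4, reward 0 otherwise.
n_states, n_actions, goal = 5, 2, 4

def step(s, a):
    s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == goal), s2 == goal

Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9

for _ in range(300):
    s, done = 0, False
    while not done:
        a = int(rng.integers(n_actions))   # uniform random behavior policy
        s2, r, done = step(s, a)
        # Off-policy Q-learning update: bootstrap from the greedy next value.
        target = r + gamma * (0.0 if done else Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

# The greedy policy extracted from Q should always move right, toward the goal.
print([int(Q[s].argmax()) for s in range(goal)])
```

Q-learning is off-policy: even under purely random behavior, the learned Q converges toward the optimal values (here gamma^k discounted by distance to the goal), and the greedy policy read off the table is optimal. DQN replaces the table with a neural network.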
V. Pong, S. Gu, M. Dalal, S. Levine, Temporal Difference Models: Model-Free Deep RL for Model Based Control, ICLR 2018
Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., Shillingford, B., and de Freitas, N. (2016). Learning to learn by gradient descent by gradient descent. In the Annual Conference on Neural Information Processing Systems (NIPS).
Asri, L. E., He, J., and Suleman, K. (2016). A sequence-to-sequence model for user simulation in spoken dialogue systems. In Annual Meeting of the International Speech Communication Association (INTERSPEECH).
Babaeizadeh, M., Frosio, I., Tyree, S., Clemons, J., and Kautz, J. (2017). Reinforcement learning through asynchronous advantage actor-critic on a GPU. Submitted to Int'l Conference on Learning Representations.
Bahdanau, D., Brakel, P., Xu, K., Goyal, A., Lowe, R., Pineau, J., Courville, A., and Bengio, Y. (2017). An actor-critic algorithm for sequence prediction. Submitted to Int’l Conference on Learning Representations.
Chebotar, Y., Kalakrishnan, M., Yahya, A., Li, A., Schaal, S., and Levine, S. (2016). Path integral guided policy search. ArXiv e-prints.
Deng, L. and Liu, Y. (2017). Deep Learning in Natural Language Processing (edited book, scheduled August 2017). Springer.
Dhingra, B., Li, L., Li, X., Gao, J., Chen, Y.-N., Ahmed, F., and Deng, L. (2016). End-to-End Reinforcement Learning of Dialogue Agents for Information Access. ArXiv e-prints.
Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87.
Dulac-Arnold, G., Evans, R., van Hasselt, H., Sunehag, P., Lillicrap, T., Hunt, J., Mann, T., Weber, T., Degris, T., and Coppin, B. (2016). Deep reinforcement learning in large discrete action spaces. In the International Conference on Machine Learning (ICML).
Finn, C., Christiano, P., Abbeel, P., and Levine, S. (2016). A connection between GANs, inverse reinforcement learning, and energy-based models. In NIPS 2016 Workshop on Adversarial Training.
Finn, C. and Levine, S. (2016). Deep visual foresight for planning robot motion. ArXiv e-prints.
Finn, C., Yu, T., Fu, J., Abbeel, P., and Levine, S. (2017). Generalizing skills with semi-supervised reinforcement learning. Submitted to Int'l Conference on Learning Representations.
Florensa, C., Duan, Y., and Abbeel, P. (2017). Stochastic neural networks for hierarchical reinforcement learning. Submitted to Int’l Conference on Learning Representations.
García, J. and Fernández, F. (2015). A comprehensive survey on safe reinforcement learning. The Journal of Machine Learning Research, 16:1437–1480.
Basic knowledge of reinforcement learning, deep learning and Markov decision processes
Li Erran Li received his Ph.D. in Computer Science from Cornell University advised by Joseph Halpern. He is currently with Uber ATG and an adjunct professor in the Computer Science Department of Columbia University. His research interests are AI, deep learning, machine learning algorithms and systems. He is an IEEE Fellow and an ACM Fellow.
A major challenge of modern machine learning and artificial intelligence is to offer understanding and reasoning for domains such as complex real-world environments, humans and their activities, medical imaging analytics, and real-world image generation. Addressing such problems for meta-knowledge creation requires methods that combine deep neural networks, sparse methods, mixed norms, AI, and deformable modeling. This course will introduce these new concepts and methodologies and will focus on three main topics: a) deriving high-order information from complex scenes and human movement for event understanding, b) Generative Adversarial Networks (GANs) and deep learning for real-world image and video generation and storytelling, and c) cardiac and cancer medical image analytics.
1. Scene and Human Motion Understanding
Neural Nets and Nonnegative Matrix Factorization Concepts
Using NNs for Scene Understanding
Human Motion Understanding and Sign Language Understanding
2. GANs and other Deep Learning Methods for Scene Generation and Storytelling
Introduction to GANs
Modifications to develop Stack GANs for scene generation from text
Video Generation from Sentences
3. Medical Image Analytics
Deformable Models and Deep Learning
Cancer Diagnosis from Clinical and Preclinical Data
RED-Net: A recurrent encoder-decoder network for video-based face alignment. Xi Peng, Rogerio Feris, Xiaoyu Wang, Dimitris Metaxas. International Journal of Computer Vision (IJCV), 2018.
CR-GAN: Learning Complete Representations for Multi-view Generation. Yu Tian, Xi Peng, Long Zhao, Shaoting Zhang, Dimitris Metaxas. International Joint Conference on Artificial Intelligence (IJCAI), 2018.
Jointly optimize data augmentation and network training: Adversarial data augmentation. Xi Peng, Zhiqiang Tang, Fei Yang, Rogerio S Feris, Dimitris Metaxas. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
3D Motion Modeling and Reconstruction of Left Ventricle Wall in Cardiac MRI. Dong Yang, Pengxiang Wu, Chaowei Tan, Kilian M Pohl, Leon Axel, Dimitris Metaxas. Functional Imaging and Modeling of the Heart, FIMH 2017.
Deep Image-to-Image Recurrent Network with Shape Basis Learning for Automatic Vertebra Labeling in Large-Scale 3D CT Volumes. Dong Yang, Tao Xiong, Daguang Xu, S Kevin Zhou, Zhoubing Xu, Mingqing Chen, JinHyeong Park, Sasa Grbic, Trac D Tran, Sang Peter Chin, Dimitris Metaxas, Dorin Comaniciu. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 498-506, 2017.
Automatic liver segmentation using an adversarial image-to-image network. Dong Yang, Daguang Xu, S Kevin Zhou, Bogdan Georgescu, Mingqing Chen, Sasa Grbic, Dimitris Metaxas, Dorin Comaniciu. International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2017.
Pixel-Wise Neural Cell Instance Segmentation. Jingru Yi, Pengxiang Wu, Daniel J Hoeppner, Dimitris Metaxas. Proceedings of the IEEE ISBI, 2018
Multi-Component Deformable Models Coupled with 2D-3D U-Net for Automated Probabilistic Segmentation of Cardiac Walls and Blood. Dong Yang, Huang Qiaoying, Leon Axel and Dimitris Metaxas. Proceedings of IEEE ISBI, 2018.
Show Me a Story: Towards Coherent Neural Story Illustration. Hareesh Ravi, Lezi Wang, Carlos Muniz, Leonid Sigal, Dimitris Metaxas, Mubbasir Kapadia. Procs CVPR 2018.
Improving GANs Using Optimal Transport. Tim Salimans, Han Zhang, Alec Radford, Dimitris Metaxas. ICLR 2018.
A recurrent encoder-decoder network for sequential face alignment. Xi Peng, Rogerio Feris, Xiaoyu Wang, Dimitris Metaxas European Conference on Computer Vision (ECCV), 2016
Parallel sparse subspace clustering via joint sample and parameter blockwise partition. B Liu, XT Yuan, Y Yu, Q Liu, DN Metaxas. ACM Transactions on Embedded Computing Systems (TECS) 16 (3), 75, 2017.
StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks. H Zhang, T Xu, H Li, S Zhang, X Wang, X Huang, D Metaxas. arXiv preprint arXiv:1710.10916, 2017.
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks. H Zhang, T Xu, H Li, S Zhang, X Wang, X Huang, D Metaxas. IEEE Int. Conf. Comput. Vision (ICCV), 5907-5915, 2017.
Calculus and PDEs, Basic Optimization Methods, Deep Neural Nets, Numerical analysis.
Dr. Dimitris Metaxas is a Distinguished Professor and Chair of the Computer Science Department at Rutgers University. He is director of the Center for Computational Biomedicine, Imaging and Modeling (CBIM). He has also been a tenured faculty member in the Computer and Information Science Department of the University of Pennsylvania. Prof. Metaxas received a Diploma with highest honors in Electrical Engineering and Computer Science from the National Technical University of Athens, Greece, an M.Sc. in Computer Science from the University of Maryland, College Park, and a Ph.D. in Computer Science from the University of Toronto. Dr. Metaxas has been conducting research toward the development of formal methods to advance understanding of complex scenes and human movement, multimodal aspects of human language and ASL, medical imaging, computer vision, and computer graphics. His research emphasizes the development of formal models for shape and motion representation and understanding, deterministic and statistical object modeling and tracking, deformable models, sparse learning methods for segmentation, generative adversarial networks, and augmenting neural net methods for understanding. Dr. Metaxas has published over 500 research articles in these areas and has graduated 46 PhD students. The above research has been funded by NSF, NIH, ONR, AFOSR, DARPA, HSARPA and the ARO. Dr. Metaxas has received several best paper awards, and he has 7 patents. He was awarded a Fulbright Fellowship in 1986, is a recipient of NSF Research Initiation and Career awards and an ONR YIP, and is a Fellow of the MICCAI Society, a Fellow of the American Institute of Medical and Biological Engineers, and a Fellow of IEEE. He has been involved with the organization of several major conferences in vision and medical image analysis, including ICCV 2007, ICCV 2011, MICCAI 2008 and CVPR 2014.
I- Requisites for a Cognitive Architecture
• Processing in space
• Processing in time and memory
• Top-down and bottom-up processing
• Extraction of information from data with generative models
II- Putting it all together
• Empirical Bayes with generative models
• Clustering of time series with linear state models
• Information Theoretic Autoencoders
III- Current work
• Extraction of time signatures with kernel ARMA
• Attention Based video recognition
• Augmenting Deep Learning with memory
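As a minimal reference point for the autoencoder material in the outline above, here is a tied-weight linear autoencoder trained by plain gradient descent. The data and hyperparameters are invented for illustration; with a one-unit linear bottleneck, the optimum recovers the principal direction of the data, which is the classical PCA connection:

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated data that is approximately one-dimensional: a latent scalar t
# scaled along a fixed direction, plus small isotropic noise.
t = rng.normal(0.0, 1.0, (300, 1))
direction = np.array([[1.0, 2.0, -1.0]])
X = t @ direction + rng.normal(0.0, 0.1, (300, 3))

# Tied-weight linear autoencoder with a one-unit bottleneck:
# encode c = w.x, decode x_hat = c * w.
w = rng.normal(0.0, 0.1, 3)

def recon_error(w):
    code = X @ w
    return float(np.mean((X - np.outer(code, w)) ** 2))

lr = 0.01
err0 = recon_error(w)
for _ in range(500):
    code = X @ w                      # encode
    R = np.outer(code, w) - X         # reconstruction residual
    # Gradient of the mean squared reconstruction error w.r.t. the tied weights.
    grad = 2.0 / X.size * (R.T @ code + X.T @ (R @ w))
    w -= lr * grad
err_final = recon_error(w)
print(err0, "->", err_final)
```

After training, the residual error is essentially the noise orthogonal to the learned direction; stacking such bottlenecks with nonlinearities gives the compressive autoencoders of deep learning.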
Jose C. Principe is a Distinguished Professor of Electrical and Computer Engineering at the University of Florida, where he teaches advanced signal processing, machine learning and artificial neural networks (ANNs). He is Eckis Professor and the Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL) www.cnel.ufl.edu. The CNEL Lab innovated signal and pattern recognition principles based on information theoretic criteria, as well as filtering in functional spaces. His secondary area of interest has focused on applications to computational neuroscience, brain-machine interfaces and brain dynamics. Dr. Principe is a Fellow of the IEEE, AIMBE, and IAMBE. Dr. Principe received the Gabor Award from the INNS, the Career Achievement Award from the IEEE EMBS, and the Neural Network Pioneer Award of the IEEE CIS. He has more than 33 patents awarded and over 800 publications in the areas of adaptive signal processing, control of nonlinear dynamical systems, machine learning and neural networks, and information theoretic learning, with applications to neurotechnology and brain-computer interfaces. He directed 93 Ph.D. dissertations and 65 Master's theses. He wrote in 2000 an interactive electronic book entitled 'Neural and Adaptive Systems', published by John Wiley and Sons, and more recently co-authored several books: 'Brain Machine Interface Engineering', Morgan and Claypool; 'Information Theoretic Learning', Springer; 'Kernel Adaptive Filtering', Wiley; and 'System Parameter Adaption: Information Theoretic Criteria and Algorithms', Elsevier. He has received four Honorary Doctor degrees, from Finland, Italy, Brazil and Colombia, and routinely serves on international scientific advisory boards of universities and companies. He has received extensive funding from NSF, NIH and DOD (ONR, DARPA, AFOSR).
Speech conveys many types of information to the listener. Beyond just the words, the speech signal provides information about what language is being spoken, who is speaking, the emotional state of the speaker, and the acoustic environment in which the speech is occurring. Such extra-word information can be useful for many areas such as secure access, device personalization, audio searching, and medical interactions. Powerful machine learning techniques, including statistical, geometric, and neural pattern recognition, have been applied over several decades to build effective systems for automatically recognizing these types of characteristics from challenging, real-world speech recordings. In this tutorial we will introduce the audience to the fundamentals of speaker, language, and emotion recognition, going from the science behind speech production to the machine learning building blocks underpinning modern recognition systems. We will describe the details of implementing these recognition systems, covering the critical role of data in the training and testing of systems. The important areas of domain adaptation, channel compensation, diarization, and effective evaluation design and interpretation will also be covered.
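The likelihood-ratio framework at the heart of classical GMM/UBM speaker verification can be caricatured in a few lines of NumPy, with single diagonal Gaussians standing in for full mixture models and synthetic 2-D points standing in for MFCC frames (everything below is an invented illustration, not a real system):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "feature frames" for two speakers (stand-ins for MFCC vectors).
spk_a = rng.normal([0.0, 0.0], 0.5, (500, 2))
spk_b = rng.normal([2.0, 2.0], 0.5, (500, 2))

def fit_gaussian(X):
    """Diagonal Gaussian as a one-component stand-in for a GMM."""
    return X.mean(0), X.var(0) + 1e-6

def avg_loglik(X, mu, var):
    # average per-frame log-likelihood under a diagonal Gaussian
    return np.mean(-0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(1))

# "Universal background model" pooled over speakers, plus a target model for A.
ubm = fit_gaussian(np.vstack([spk_a, spk_b]))
target = fit_gaussian(spk_a)

# Verification score = log-likelihood ratio of target model vs. UBM.
test_a = rng.normal([0.0, 0.0], 0.5, (200, 2))   # genuine trial (speaker A)
test_b = rng.normal([2.0, 2.0], 0.5, (200, 2))   # impostor trial (speaker B)
score_genuine = avg_loglik(test_a, *target) - avg_loglik(test_a, *ubm)
score_impostor = avg_loglik(test_b, *target) - avg_loglik(test_b, *ubm)
print(score_genuine, score_impostor)
```

A genuine trial scores above the decision threshold (here, positive) and an impostor trial below it; real systems replace the single Gaussians with adapted GMMs, i-vectors, or x-vector embeddings as in the references below.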
D. A. Reynolds, T. F. Quatieri, R. B. Dunn, 'Speaker verification using adapted Gaussian mixture models', Digital Signal Processing, vol. 10, no. 1-3, pp. 19-41, 2000
F. Bimbot et al., 'A tutorial on text-independent speaker verification', EURASIP Journal on Advances in Signal Processing, 2004
T. Kinnunen, H. Li, 'An overview of text-independent speaker recognition: From features to supervectors', Speech Communication, Volume 52, Issue 1, 2010, Pages 12-40
N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel and P. Ouellet, 'Front-End Factor Analysis for Speaker Verification,' in IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 788-798, May 2011
N Dehak, PA Torres-Carrasquillo, D Reynolds, R Dehak, 'Language recognition via i-vectors and dimensionality reduction', Interspeech 2012
F. Richardson, D. Reynolds and N. Dehak, 'Deep Neural Network Approaches to Speaker and Language Recognition,' in IEEE Signal Processing Letters, vol. 22, no. 10, pp. 1671-1675, Oct. 2015
D Snyder, D Garcia-Romero, G Sell, D Povey, 'X-vectors: Robust DNN embeddings for speaker recognition', ICASSP 2018
Some knowledge of digital signal processing, probability and statistics, and linear algebra
Douglas Reynolds is a senior member of the technical staff at MIT Lincoln Laboratory, where he provides technical oversight of the speech projects in speaker and language recognition and speech-content based information retrieval. Dr. Reynolds joined the Human Language Technology Group as a member of technical staff in 1992 conducting research in the areas of robust speaker recognition (identification and verification), transient classification and robust speech representations for recognition. During this period, he invented and developed several widely used techniques in the area of speaker recognition, such as robust modeling with GMMs, application of a universal background model to text-independent recognition tasks, the use of Bayesian adaptation to train and update speaker models, fast scoring techniques for GMM based systems, the development and use of a handset/channel-type detector, and several normalization techniques based on the handset/channel-type detector. In 2002, Dr. Reynolds led the SuperSID project at the JHU Summer Workshop where new approaches to exploiting high-level information for improved speaker recognition were explored. These and other ideas have been implemented in the Lincoln speaker recognition system which has won several annual international speaker recognition evaluations conducted by the National Institute of Standards and Technology.
Dr. Reynolds is a Fellow of the IEEE, a member of the IEEE Signal Processing Society's Speech Technical Committee, and has worked to launch the Odyssey Speaker Recognition Workshop series.
Najim Dehak received his PhD from the School of Advanced Technology, Montreal, in 2009. During his PhD studies he worked with the Computer Research Institute of Montreal, Canada. He is well known as a leading developer of the i-vector representation for speaker recognition. He first introduced this method, which has become the state of the art in this field, during the 2008 summer Center for Language and Speech Processing workshop at Johns Hopkins University. This approach has become one of the best-known speech representations in the entire speech community.
Dr. Dehak is currently a faculty member of the Department of Electrical & Computer Engineering at Johns Hopkins University. Prior to joining Johns Hopkins, he was a research scientist in the Spoken Language Systems Group at the MIT Computer Science and Artificial Intelligence Laboratory. His research interests are in machine learning approaches applied to speech processing, audio classification, and health applications. He is a senior member of IEEE and member of the IEEE Speech and Language Technical Committee.
This course will deal with the injection of deep learning algorithms into multimodal and multisensorial signal analysis, such as of audio, video, or physiological signals. The methods shown will, however, be applicable to a broad range of further signals. We will first deal with pre-processing, such as by autoencoders, and feature representation learning, such as by convolutional neural networks, as a basis for end-to-end learning from raw signals. Then, we shall discuss modelling for decision making, such as by recurrent neural networks with long short-term memory or gated recurrent units. We will also elaborate on the impact of topologies, including multiple targets with shared layers and bottlenecks, and on how to move towards self-shaping networks in the sense of Automatic Machine Learning (AutoML). In a last part, we will deal with data efficiency, such as by weak supervision with the human in the loop based on active and semi-supervised learning, transfer learning, or generative adversarial networks. The content shown will be accompanied by open-source implementations in corresponding toolkits available on GitHub. Application examples will come from the domains of Affective Computing, Multimedia Retrieval, and mHealth.
1) Pre-Processing and Feature Representation Learning (AEs, CNNs, end-to-end)
2) Modelling for Decision Making (Feature Space Optimisation, Topologies, AutoML)
3) Data Efficiency (GANs, Transfer Learning, Weak Supervision)
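As a rough, self-contained sketch of the first outline point (feature representation learning by autoencoders), the following trains a tiny tied-weight autoencoder on signal frames by gradient descent. It is purely didactic and is not taken from any of the referenced toolkits:

```python
import numpy as np

def train_autoencoder(X, n_hidden, lr=0.1, epochs=500, seed=0):
    """Tied-weight sigmoid autoencoder minimizing squared reconstruction
    error by batch gradient descent; returns the learned parameters."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0, 0.1, (d, n_hidden))   # encoder weights (decoder is W.T)
    b = np.zeros(n_hidden)                  # hidden bias
    c = np.zeros(d)                         # output bias
    for _ in range(epochs):
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden code
        X_hat = H @ W.T + c                       # reconstruction
        err = X_hat - X
        dH = err @ W * H * (1 - H)                # backprop to hidden layer
        gW = X.T @ dH + err.T @ H                 # encoder + decoder gradients
        W -= lr * gW / n
        b -= lr * dH.sum(axis=0) / n
        c -= lr * err.sum(axis=0) / n
    return W, b, c

def reconstruction_error(X, W, b, c):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return float(((H @ W.T + c - X) ** 2).mean())
```

The hidden activations `H` serve as the learned feature representation that later layers, or an end-to-end network, would build on.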
The Handbook of Multimodal-Multisensor Interfaces, Vol. 2, S. Oviatt, B. Schuller, P. R. Cohen, D. Sonntag, G. Potamianos, A. Krüger (eds.), 2018 (forthcoming)
https://github.com/end2you/end2you
Attendees should be familiar with Machine Learning and Neural Networks in general. They should further have basic knowledge of Signal Processing.
Björn W. Schuller received his diploma, doctoral degree, habilitation, and Adjunct Teaching Professorship, all in EE/IT, from TUM in Munich, Germany. He is the Head of GLAM - the Group on Language, Audio & Music - at Imperial College London, UK, Full Professor and ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany, co-founding CEO of audEERING, and permanent Visiting Professor at HIT, China. Before that, he was Full Professor at the University of Passau, Germany, and worked with Joanneum Research in Graz, Austria, and the CNRS-LIMSI in Orsay, France. He is a Fellow of the IEEE, President-Emeritus of the AAAC, and a Senior Member of the ACM. He has (co-)authored 700+ publications (18000+ citations, h-index=66), and is Editor-in-Chief of the IEEE Transactions on Affective Computing, General Chair of ACII 2019, ACII Asia 2018, and ACM ICMI 2014, and a Program Chair of Interspeech 2019, ACM ICMI 2019/2013, ACII 2015/2011, and IEEE SocialCom 2012, amongst manifold further commitments and service to the community. His 20+ awards include having been honoured as one of 40 extraordinary scientists under the age of 40 by the WEF in 2015. He has served as Coordinator/PI in 10+ European projects, is an ERC Starting Grantee, and is a consultant for companies such as GN, Huawei, and Samsung.
This presentation will primarily focus on learning algorithms with reduced iterations or no iterations at all. Some of these algorithms have closed-form solutions; some do not adjust their structures once constructed. The main algorithms considered in this talk are randomized neural networks, kernel ridge regression, and random forests. These non-iterative methods have attracted the attention of researchers because of their high accuracy as well as their fast training, enabled by their non-iterative nature or closed-form training solutions. For example, random forests deliver top classification performance. The presentation will include the basic methods as well as their state-of-the-art realizations. These algorithms will be benchmarked on classification, time series forecasting, and visual tracking datasets. Future research directions will also be suggested.
Non-iterative algorithms or algorithms with closed-form training solutions
Randomization based neural networks and their variants
Kernel Ridge Regression and its variants
Random Forest and its variants
Applications of the above methods in classification, time series and visual tracking
Benchmarking of these methods
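To make the closed-form-training idea concrete, here is a minimal kernel ridge regression sketch in NumPy: training amounts to solving a single linear system, alpha = (K + lambda*I)^{-1} y, with no iterations. The code is illustrative and not taken from the lecture materials:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def krr_fit(X, y, lam=1e-3, gamma=1.0):
    """Closed-form training: solve (K + lam*I) alpha = y. No iterations."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_test, gamma=1.0):
    """Prediction is a kernel expansion over the training points."""
    return rbf_kernel(X_test, X_train, gamma) @ alpha
```

The regularizer `lam` trades training fit against smoothness; the randomized-network methods in the outline (e.g. RVFL) admit a similarly non-iterative least-squares solution for their output weights.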
(Additional References will be included in the lecture materials)
X. Qiu, P. N. Suganthan, G. A. J. Amaratunga, Ensemble incremental learning Random Vector Functional Link network for short-term electric load forecasting, Knowledge-Based Systems 145, 182-196, 2018.
L Zhang, PN Suganthan, Benchmarking Ensemble Classifiers with Novel Co-Trained Kernel Ridge Regression and Random Vector Functional Link Ensembles [Research Frontier], IEEE Computational Intelligence Magazine 12 (4), 61-72, 2017.
L Zhang, PN Suganthan, Visual tracking with convolutional random vector functional link network, IEEE Transactions on Cybernetics 47 (10), 3243-3253.
L Zhang, PN Suganthan, Robust visual tracking via co-trained Kernelized correlation filters, Pattern Recognition 69, 82-93, 2017.
L Zhang, PN Suganthan, A survey of randomized algorithms for training neural networks, Information Sciences 364, 146-155, 2016.
L Zhang, PN Suganthan, Oblique decision tree ensemble via multisurface proximal support vector machine, IEEE Transactions on Cybernetics 45 (10), 2165-2176, 2015.
Basic knowledge of neural networks, pattern classification, decision trees will be advantageous.
Ponnuthurai Nagaratnam Suganthan (or P. N. Suganthan) received the BA degree, Postgraduate Certificate, and MA degree in Electrical and Information Engineering from the University of Cambridge, UK, in 1990, 1992 and 1994, respectively. After completing his PhD research in 1995, he served as a pre-doctoral Research Assistant in the Dept of Electrical Engineering, University of Sydney in 1995-96 and a lecturer in the Dept of Computer Science and Electrical Engineering, University of Queensland in 1996-99. He moved to NTU in 1999. He is an Editorial Board Member of the Evolutionary Computation Journal, MIT Press. He is an associate editor of the IEEE Trans. on Cybernetics (2012-), IEEE Trans. on Evolutionary Computation (2005-), Information Sciences (Elsevier) (2009-), Pattern Recognition (Elsevier) (2001-) and Int. J. of Swarm Intelligence Research (2009-) journals. He is a founding co-editor-in-chief of Swarm and Evolutionary Computation (2010-), an SCI-indexed Elsevier journal. His co-authored SaDE paper (published in April 2009) won the IEEE Trans. on Evolutionary Computation outstanding paper award in 2012. His former PhD student, Dr Jane Jing Liang, won the IEEE CIS Outstanding PhD Dissertation Award in 2014. His research interests include swarm and evolutionary algorithms, pattern recognition, big data, deep learning, and applications of swarm, evolutionary and machine learning algorithms. He was selected as one of the highly cited researchers in computer science by Thomson Reuters in 2015, 2016, and 2017. He served as the General Chair of the IEEE SSCI 2013. He has been a member of the IEEE since 1990 and a Fellow since 2015. He was an elected AdCom member of the IEEE Computational Intelligence Society (CIS) in 2014-2016. Google Scholar: http://scholar.google.com.sg/citations?hl=en&user=yZNzBU0AAAAJ&view_op=list_works&pagesize=100
Neural networks and deep learning, together with support vector machines and kernel methods, have been among the most powerful and successful techniques in machine learning and data-driven modelling. Initially, in artificial neural networks, the use of one-hidden-layer feedforward networks was common because of their universal approximation property. However, the existence of many local minima in the training process was a drawback. Therefore, support vector machines and kernel methods, which rely on solving convex optimization problems in classification and regression, became widely used. In the meantime, computing power has increased and data have become abundantly available in many applications. As a result, one can currently afford to train deep models consisting of (many) more layers and interconnection weights. Examples of successful deep learning models are convolutional neural networks, stacked autoencoders, deep Boltzmann machines, deep generative models and generative adversarial networks. In this course we will explain several synergies between neural networks, deep learning, least squares support vector machines and kernel methods. A key role is played by primal and dual model representations and different duality principles. In this way the bigger picture will be revealed for neural networks, deep learning and kernel machines, and future perspectives will be outlined.
The material is organized into 3 parts:
- Part I: Neural networks, support vector machines and kernel methods
- Part II: Restricted Boltzmann machines, kernel machines and deep learning
- Part III: Deep restricted kernel machines and future perspectives
In Part I a basic introduction is given to support vector machines (SVM) and kernel methods, with emphasis on their artificial neural network (ANN) interpretations. The latter can be understood in view of primal and dual model representations, expressed in terms of the feature map and the kernel function, respectively. For least squares support vector machines (LS-SVM), such characterizations exist for supervised and unsupervised learning, including classification, regression, kernel principal component analysis (KPCA), kernel spectral clustering (KSC), kernel canonical correlation analysis (KCCA), and others. Primal and dual representations are also relevant for obtaining efficient training algorithms, tailored to the nature of the given application (high-dimensional input spaces versus large data sizes). Application examples are given, e.g., in black-box weather forecasting, pollution modelling, prediction of energy consumption, and community detection in networks.
In Part II we explain how to obtain a so-called restricted kernel machine (RKM) representation for models related to least squares support vector machines. By using a principle of conjugate feature duality, it is possible to obtain a representation similar to that of restricted Boltzmann machines (RBM) (with visible and hidden units), which are used in deep belief networks (DBN) and deep Boltzmann machines (DBM). The principle is explained for both supervised and unsupervised learning. Related to kernel principal component analysis, a generative model is obtained within the restricted kernel machine framework; the trained model is then able to generate new data examples.
In Part III deep restricted kernel machines (Deep RKM) are explained, which consist of restricted kernel machines taken in a deep architecture. In these models a distinction is made between depth in a layer sense and depth in a level sense. Links with, and differences from, stacked autoencoders and deep Boltzmann machines are given. The framework makes it possible to conceive both deep feedforward neural networks (DNN) and deep kernel machines, through primal and dual model representations. In this case one has multiple feature maps over the different levels, together with multiple kernel functions. By fusing the objectives of the different levels (e.g. several KPCA levels followed by an LS-SVM classifier) in the deep architecture, the training process becomes faster and gives improved solutions. Different training algorithms and methods for large data sets will be discussed.
Finally, based on the newly obtained insights, future perspectives and challenges will be outlined.
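As a small illustration of the primal-dual theme of Part I, the LS-SVM classifier of Suykens et al. (2002) can be trained by solving a single linear system in its dual variables. The sketch below is illustrative, with function names and hyperparameters chosen arbitrarily:

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def lssvm_train(X, y, C=10.0, gamma=1.0):
    """Solve the LS-SVM dual linear system
        [ 0   y^T         ] [b]   [0]
        [ y   Omega + I/C ] [a] = [1]
    with Omega_ij = y_i y_j K(x_i, x_j) and labels y_i in {-1, +1}."""
    n = len(y)
    Omega = np.outer(y, y) * rbf(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]          # bias b, dual variables alpha

def lssvm_predict(X_train, y_train, b, alpha, X_test, gamma=1.0):
    """Dual model representation: sign of a kernel expansion plus bias."""
    return np.sign(rbf(X_test, X_train, gamma) @ (alpha * y_train) + b)
```

The prediction function is exactly the dual model representation discussed in Part I; the primal counterpart would express the same classifier through an explicit feature map.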
Bengio Y., Learning deep architectures for AI, Boston: Now, 2009.
Fischer A., Igel C., Training restricted Boltzmann machines: An introduction. Pattern Recognition, 47, 25-39, 2014.
Goodfellow I., Bengio Y., Courville A., Deep learning, Cambridge, MA: MIT Press, 2016.
Hinton G.E., What kind of graphical model is the brain?, In Proc. 19th International Joint Conference on Artificial Intelligence, pp. 1765-1775, 2005.
Hinton G.E., Osindero S., Teh Y.-W., A fast learning algorithm for deep belief nets, Neural Computation, 18, 1527-1554, 2006.
LeCun Y., Bengio Y., Hinton G., Deep learning, Nature, 521, 436-444, 2015.
Lin H.W., Tegmark M., Rolnick D., Why does deep and cheap learning work so well?, Journal of Statistical Physics 168 (6), 1223-1247, 2017.
Mall R., Langone R., Suykens J.A.K., Multilevel Hierarchical Kernel Spectral Clustering for Real-Life Large Scale Complex Networks, PLOS ONE, e99966, 9(6), 1-18, 2014.
Mehrkanoon S., Suykens J.A.K., Deep hybrid neural-kernel networks using random Fourier features, Neurocomputing, Vol. 298, pp. 46-54, July 2018.
Salakhutdinov R., Learning deep generative models, Annu. Rev. Stat. Appl., 2, 361-385, 2015.
Schölkopf B., Smola A., Learning with kernels, Cambridge, MA: MIT Press, 2002.
Schreurs J., Suykens J.A.K., Generative Kernel PCA, ESANN 2018.
Suykens J.A.K., Van Gestel T., De Brabanter J., De Moor B., Vandewalle J., Least squares support vector machines, Singapore: World Scientific, 2002.
Suykens J.A.K., Alzate C., Pelckmans K., Primal and dual model representations in kernel-based learning, Statistics Surveys, vol. 4, pp. 148-183, Aug. 2010.
Suykens J.A.K., Deep Restricted Kernel Machines using Conjugate Feature Duality, Neural Computation, vol. 29, no. 8, pp. 2123-2163, Aug. 2017.
Vapnik V., Statistical learning theory, New York: Wiley, 1998.
Basics of linear algebra
Johan A.K. Suykens was born in Willebroek, Belgium, on May 18, 1966. He received the master's degree in Electro-Mechanical Engineering and the PhD degree in Applied Sciences from the Katholieke Universiteit Leuven, in 1989 and 1995, respectively. In 1996 he was a Visiting Postdoctoral Researcher at the University of California, Berkeley. He has been a Postdoctoral Researcher with the Fund for Scientific Research FWO Flanders and is currently a full Professor with KU Leuven. He is author of the books 'Artificial Neural Networks for Modelling and Control of Non-linear Systems' (Kluwer Academic Publishers) and 'Least Squares Support Vector Machines' (World Scientific), co-author of the book 'Cellular Neural Networks, Multi-Scroll Chaos and Synchronization' (World Scientific) and editor of the books 'Nonlinear Modeling: Advanced Black-Box Techniques' (Kluwer Academic Publishers), 'Advances in Learning Theory: Methods, Models and Applications' (IOS Press) and 'Regularization, Optimization, Kernels, and Support Vector Machines' (Chapman & Hall/CRC). In 1998 he organized an International Workshop on Nonlinear Modelling with Time-series Prediction Competition. He has served as associate editor for the IEEE Transactions on Circuits and Systems (1997-1999 and 2004-2007), the IEEE Transactions on Neural Networks (1998-2009) and the IEEE Transactions on Neural Networks and Learning Systems (from 2017). He received an IEEE Signal Processing Society 1999 Best Paper Award and several Best Paper Awards at international conferences. He is a recipient of the International Neural Networks Society INNS 2000 Young Investigator Award for significant contributions in the field of neural networks.
He has served as a Director and Organizer of the NATO Advanced Study Institute on Learning Theory and Practice (Leuven 2002), as a program co-chair for the International Joint Conference on Neural Networks 2004 and the International Symposium on Nonlinear Theory and its Applications 2005, as an organizer of the International Symposium on Synchronization in Complex Networks 2007, a co-organizer of the NIPS 2010 workshop on Tensors, Kernels and Machine Learning, and chair of ROKS 2013. He has been awarded an ERC Advanced Grant 2011 and 2017, and has been elevated IEEE Fellow 2015 for developing least squares support vector machines.
The past few years have seen a dramatic increase in the performance of recognition systems thanks to the introduction of deep networks for representation learning. However, the mathematical reasons for this success remain elusive. For example, a key issue is that the neural network training problem is nonconvex, hence optimization algorithms are not guaranteed to return a global minimum. The first part of this tutorial will overview recent work on the theory of deep learning that aims to understand how to design the network architecture, how to regularize the network weights, and how to guarantee global optimality. The second part of this tutorial will present sufficient conditions to guarantee that local minima are globally optimal and that a local descent strategy can reach a global minimum from any initialization. Such conditions apply to problems in matrix factorization, tensor factorization and deep learning. The third part of this tutorial will present an analysis of dropout for matrix factorization, and establish connections between dropout and low-rank regularization.
1. Introduction to Deep Learning Theory: Optimization, Regularization and Architecture Design
2. Global Optimality in Matrix Factorization
3. Global Optimality in Tensor Factorization and Deep Learning
4. Dropout as a Low-Rank Regularizer for Matrix Factorization
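A toy illustration of the nonconvex objective studied in parts 2 and 3: gradient descent on f(U, V) = ||X - U V^T||_F^2 from a small random initialization, which in this benign low-rank setting reaches a global minimum despite nonconvexity. This is a didactic sketch, not the tutorial's actual analysis:

```python
import numpy as np

def factorize(X, r, lr=0.01, steps=3000, seed=0):
    """Gradient descent on the nonconvex objective f(U, V) = ||X - U V^T||_F^2.
    For an exactly rank-r target X, local descent from small random
    initialization converges to a global minimum (zero residual) here."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.normal(0, 0.1, (m, r))
    V = rng.normal(0, 0.1, (n, r))
    for _ in range(steps):
        R = U @ V.T - X                                   # residual
        # simultaneous gradient step in U and V
        U, V = U - lr * (R @ V), V - lr * (R.T @ U)
    return U, V
```

For larger problems the same descent can stall or find only approximate solutions; the sufficient conditions discussed in the tutorial characterize when local minima are guaranteed to be global.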
Basic understanding of sparse and low-rank representation and non-convex optimization.
Rene Vidal is a Professor of Biomedical Engineering and the Inaugural Director of the Mathematical Institute for Data Science at The Johns Hopkins University. His research focuses on the development of theory and algorithms for the analysis of complex high-dimensional datasets such as images, videos, time-series and biomedical data. Dr. Vidal has been Associate Editor of TPAMI and CVIU, Program Chair of ICCV and CVPR, co-author of the book 'Generalized Principal Component Analysis' (2016), and co-author of more than 200 articles in machine learning, computer vision, biomedical image analysis, hybrid systems, robotics and signal processing. He is a fellow of the IEEE, IAPR and Sloan Foundation, an ONR Young Investigator, and has received numerous awards for his work, including the 2012 J.K. Aggarwal Prize for 'outstanding contributions to generalized principal component analysis (GPCA) and subspace clustering in computer vision and pattern recognition' as well as best paper awards in machine learning, computer vision, controls, and medical robotics.
The goal is to introduce recent advances in object tracking based on deep learning and related approaches. Performance evaluation and challenging factors in this field will also be discussed.
Brief history of visual tracking
Deep learning methods
Challenges and future research directions
Y. Wu, J. Lim, and M.-H. Yang, Object Tracking Benchmark, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
H. Nam and B. Han, Learning Multi-domain Convolutional Neural Networks for Visual Tracking, CVPR, 2016.
M. Danelljan, G. Bhat, F. Khan, M. Felsberg, ECO: Efficient Convolution Operators for Tracking. CVPR, 2017.
Basic knowledge in computer vision and intermediate knowledge in deep learning.
Ming-Hsuan Yang is a Professor of Electrical Engineering and Computer Science at the University of California, Merced, and a visiting researcher at Google Cloud. He serves as a program co-chair of the IEEE International Conference on Computer Vision (ICCV) in 2019, and served as program co-chair of the Asian Conference on Computer Vision (ACCV) in 2014 and general co-chair of ACCV 2016. He served as an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) from 2007 to 2011, and currently serves as an associate editor of the International Journal of Computer Vision (IJCV), Computer Vision and Image Understanding (CVIU), Image and Vision Computing (IVC) and the Journal of Artificial Intelligence Research (JAIR). Yang received the Google Faculty Award in 2009 and the Distinguished Early Career Research Award from the UC Merced Senate in 2011. He received the Faculty Early Career Development (CAREER) award from the National Science Foundation in 2012, and the Distinguished Research Award from the UC Merced Senate in 2015. He is a senior member of the IEEE and the ACM.
This lecture will give a brief introduction to convolutional neural networks (CNNs). The convolution, pooling, and fully-connected layers will be introduced. The neuroscience underlying CNNs will be discussed. The hyperparameter optimization of CNNs will be presented. Several typical convolutional neural networks will be analyzed and compared, including LeNet, AlexNet, VGG, NiN, GoogLeNet, ResNet, etc. The use of CNNs for segmentation will be briefly discussed. State-of-the-art examples will be used to illustrate CNN approaches.
(i) ImageNet and ILSVRC
(ii) Convolutional neural network, Convolution layers, pooling layer
(iii) Dropout; Batch normalization; data augmentation
(iv) Neuroscientific basis, Random search, LeNet
(v) Transfer learning, AlexNet, 1x1 convolution, VGG
(vi) Network in network, GoogLeNet, ResNet
(vii) R-CNN, Fast(er) R-CNN, Mask R-CNN
(viii) Application to cerebral microbleeding, radar imaging, etc.
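To fix ideas, the two core layer operations from point (ii) can be written in a few lines of NumPy. This is a didactic sketch; real CNN layers add channels, strides, padding, and learned kernels:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, the operation CNN convolution
    layers apply (usually followed by a bias and a nonlinearity)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling with stride equal to the window size,
    which downsamples the feature map while keeping the strongest responses."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))
```

For example, correlating an image with the kernel [[1, -1]] responds strongly at vertical edges, and pooling halves the spatial resolution; stacking many such learned filters with nonlinearities is what LeNet through ResNet do at scale.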
1. Deng, J., W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei (2009). ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. 248-255.
2. Ioffe, S. and C. Szegedy (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
3. Bergstra, J. and Y. Bengio (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research 13(Feb): 281-305.
4. Krizhevsky, A., I. Sutskever and G. E. Hinton (2012). ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems. 1097-1105.
5. Simonyan, K. and A. Zisserman (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
6. Szegedy, C., W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich (2015). Going deeper with convolutions. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.
7. He, K., X. Zhang, S. Ren and J. Sun (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770-778.
8. Goodfellow, I., Y. Bengio, A. Courville and Y. Bengio (2016). Deep learning, MIT press Cambridge.
Linear Algebra and Calculus, Probability and Statistics, Basics of Image Processing, Pattern Recognition and Computer Vision
Dr. Yu-Dong Zhang serves as a Professor (permanent) in the Department of Informatics, University of Leicester, UK. He is a guest professor at Henan Polytechnic University, China. His research interests are deep learning in signal processing and medical image processing.