(Rutherford Appleton Laboratory, UK Science and Technology Facilities Council)
Big Scientific Data and Data Science 
Srinivas Aluru
(Georgia Institute of Technology)
High Performance Computational Biology [intermediate]
David A. Bader
(Georgia Institute of Technology)
Massive-scale Graph Analytics [introductory/intermediate]
Ümit V. Çatalyürek
(Georgia Institute of Technology)
HPC Graph Analytics [introductory/intermediate]
Alan Edelman
(Massachusetts Institute of Technology)
Julia, with an Introduction to Performance and Machine Learning [introductory]
Richard Fujimoto
(Georgia Institute of Technology)
Parallel Discrete Event Simulation [intermediate]
Timothy C. Germann
(Los Alamos National Laboratory)
HPC Frontiers in Computational Materials Science and Engineering [intermediate]
(University of Houston)
Energy Efficient Computing [introductory/intermediate]
(University of Zürich)
Code Performance Optimizations [introductory/intermediate]
Andrew Lumsdaine
(Pacific Northwest National Laboratory)
Modern C++ for High-performance Computing [intermediate/advanced]
(Virginia Polytechnic Institute and State University)
Massively Interacting Bio-social Systems: Pervasive, Personalized and Precision Analytics [introductory/advanced]
(North Carolina State University)
How to Parallelize Your Code: Taking Stencils from OpenMP to MPI, CUDA and TensorFlow [introductory/intermediate]
(Virginia Polytechnic Institute and State University)
Revealing Parallelism: How to Decompose your Problem into Concurrent Tasks [introductory/intermediate]
(Georgia Institute of Technology)
Fundamentals of Parallel, Concurrent, and Distributed Programming [introductory]
(University of Illinois at Urbana-Champaign)
Programming Models and Run-times for High-Performance Computing [introductory]
(University of Illinois at Urbana-Champaign)
Parallel Computer Architecture Concepts [intermediate/advanced]
Todd J. Treangen
(University of Maryland, College Park)
Metagenomic Assembly and Validation [intermediate]
(University of Southampton)
Hands-on Introduction to HPC for Life Scientists [introductory]
(University of Maryland, College Park)
Parallel Algorithmic Thinking and How It Has Been Affecting Architecture [introductory/intermediate]
Parallel Programming with OpenMP, MPI, and CUDA [intermediate]
There is broad recognition within the scientific community that the ongoing deluge of scientific data is fundamentally transforming academic research. “The Fourth Paradigm” refers to the new ‘data intensive science’ and the tools and technologies needed to manipulate, analyze, visualize, and manage large amounts of research data. This talk will review the challenges posed by the growth of ‘Experimental and Observational Data’ (EOD) generated by the new generation of large-scale experimental facilities at the UK’s Harwell site near Oxford. The talk will conclude with a discussion of the use of experimental scientific ‘benchmarks’ for training the scientist users of these facilities in Machine Learning and data science technologies.
Tony Hey began his career as a theoretical physicist with a doctorate in particle physics from the University of Oxford in the UK. After a career in physics that included research positions at Caltech and CERN, and a professorship at the University of Southampton in England, he became interested in parallel computing and moved into computer science. In the 1980s he was one of the pioneers of distributed-memory message-passing computing and co-wrote the first draft of the successful MPI message-passing standard. After being both Head of Department and Dean of Engineering at Southampton, Tony Hey was appointed to lead the UK’s ground-breaking ‘eScience’ initiative in 2001. He recognized the importance of Big Data for science and wrote one of the first papers on the ‘Data Deluge’ in 2003. He joined Microsoft in 2005 as a Vice President and was responsible for Microsoft’s global university research engagements. He worked with Jim Gray and his multidisciplinary eScience research group and edited a tribute to Jim called ‘The Fourth Paradigm: Data-Intensive Scientific Discovery.’ Hey left Microsoft in 2014 and spent a year as a Senior Data Science Fellow at the eScience Institute at the University of Washington. He returned to the UK in November 2015 and is now Chief Data Scientist at the Science and Technology Facilities Council. In 1987 Tony Hey was asked by Caltech Nobel physicist Richard Feynman to write up his ‘Lectures on Computation’. These covered such unconventional topics as the thermodynamics of computing as well as an outline for a quantum computer. Feynman’s introduction to the workings of a computer in terms of the actions of a ‘dumb file clerk’ was the inspiration for Tony Hey’s attempt to write ‘The Computing Universe’, a popular book about computer science. Tony Hey is a fellow of the AAAS and of the UK's Royal Academy of Engineering. In 2005, he was awarded a CBE by Prince Charles for his ‘services to science.’
In just a little over a decade, the cost of sequencing the genome of a complex organism such as a human dwindled from the $100 million range to the sub-$1000 range. This rapid decline was brought about by the advent of a number of high-throughput sequencing technologies, collectively known as next generation sequencing. Their usage has become ubiquitous, enabling single investigators with modest budgets to carry out what could only be accomplished by a network of major sequencing centers just over a decade ago. As a result, routine biological investigations must now deal with big data sets. The rate and scale of data generation are pushing the limits of bioinformatics software. Rapid development of parallel methods, and community-driven specifications and implementations of parallel open source software libraries, can assist the bioinformatics community in developing HPC solutions to meet this need. This course will survey the state of the art and teach parallel algorithms for a variety of problems arising in bioinformatics and computational biology.
1. Introduction to computational biology and biological data sets
2. Parallel algorithms for sequence alignments
3. Parallel methods for storing and analyzing k-mers
4. Suffix Arrays, Trees, and how to construct them in parallel
5. Parallel methods for clustering and assembly
6. Parallel construction of gene regulatory networks
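Topic 3 concerns storing and analyzing k-mers in parallel. On distributed-memory systems, a common pattern is to hash each k-mer to an owner process so that every process counts a disjoint set of keys. The sketch below simulates that pattern on a single machine. `num_ranks` stands in for MPI processes, CRC-32 is an arbitrary choice of stable hash, and the function names are hypothetical. This is not the API of the Kmerind library cited below.

```python
import zlib
from collections import Counter

def kmers(seq, k):
    """Yield every length-k substring (k-mer) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def owner(kmer, num_ranks):
    """Map a k-mer to its owning rank with a stable hash (CRC-32)."""
    return zlib.crc32(kmer.encode()) % num_ranks

def partitioned_kmer_count(reads, k, num_ranks):
    # Phase 1: route each k-mer to its owner's bucket
    # (in an MPI code this would be an all-to-all exchange).
    buckets = [[] for _ in range(num_ranks)]
    for read in reads:
        for km in kmers(read, k):
            buckets[owner(km, num_ranks)].append(km)
    # Phase 2: each rank counts only the k-mers it owns; key sets are disjoint.
    return [Counter(b) for b in buckets]

reads = ["ACGTACGT", "CGTACGTT"]
local_counts = partitioned_kmer_count(reads, k=3, num_ranks=4)
total = Counter()
for c in local_counts:
    total.update(c)
```

Because ownership is decided by the hash alone, no two ranks ever hold the same k-mer. The global count is therefore just the union of the local tables, and no further reduction is needed.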
Basic understanding of sequential and parallel algorithms, and of high performance computing systems. The ideal student will be familiar with the design and analysis of basic parallel algorithms (prefix sums, matrix multiplication, etc.) and interested in learning parallel algorithms in computational biology.
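Prefix sums are listed among the expected background, and the first reference below applies prefix computations to sequence comparison. As a refresher, here is a minimal sketch of the standard work-efficient exclusive scan (Blelloch's up-sweep/down-sweep scheme), executed sequentially. The iterations of each inner `for` loop are independent, and a parallel implementation exploits this to reach O(log n) depth with O(n) total work. Padding the input to a power of two is a simplifying assumption of this formulation.

```python
def blelloch_exclusive_scan(values):
    """Exclusive prefix sum via the up-sweep/down-sweep (Blelloch) scheme."""
    n = 1
    while n < len(values):
        n *= 2
    a = list(values) + [0] * (n - len(values))  # pad with the identity element
    # Up-sweep (reduce): build partial sums in a balanced binary tree.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):  # independent iterations
            a[i] += a[i - d]
        d *= 2
    # Down-sweep: clear the root, then push prefixes back down the tree.
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):  # independent iterations
            left = a[i - d]
            a[i - d] = a[i]
            a[i] += left
        d //= 2
    return a[:len(values)]
```

For example, `blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3])` yields `[0, 3, 4, 11, 11, 15, 16, 22]`.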
1. S. Aluru, N. Futamura and K. Mehrotra, 'Parallel biological sequence comparison using prefix computations,' Journal of Parallel and Distributed Computing, Vol. 63, No. 3, pp. 264-272, 2003.
2. T. Pan, P. Flick, C. Jain, Y. Liu and S. Aluru, 'Kmerind: A flexible parallel library for k-mer indexing of biological sequences on distributed memory systems,' IEEE/ACM Transactions on Computational Biology and Bioinformatics, doi:10.1109/TCBB.2017.2760829, 15 pages, 2017.
3. P. Flick and S. Aluru, 'Parallel distributed memory construction of suffix and longest common prefix arrays,' Proc. ACM/IEEE Supercomputing Conference (SC), 10 pages, 2015.
4. P. Flick and S. Aluru, 'Parallel construction of suffix trees and the all-nearest-smaller-values problem,' Proc. 31st International Parallel and Distributed Processing Symposium (IPDPS), pp. 12-21, 2017.
5. A. Kalyanaraman, S.J. Emrich, P.S. Schnable and S. Aluru, 'Assembling genomes on large-scale parallel computers,' Journal of Parallel and Distributed Computing (special issue on IPDPS 2006 best papers), Vol. 67, pp. 1240-1255, 2007.
6. J. Zola, M. Aluru, A. Sarje and S. Aluru, 'Parallel information theory based construction of gene regulatory networks,' IEEE Transactions on Parallel and Distributed Systems, Vol. 21, No. 12, pp. 1721-1733, 2010.
Srinivas Aluru is co-Executive Director of the Interdisciplinary Research Institute in Data Engineering and Science (IDEaS), and a professor in the School of Computational Science and Engineering at Georgia Institute of Technology. He co-leads the NSF South Big Data Regional Innovation Hub and the NSF Transdisciplinary Research Institute for Advancing Data Science. Aluru conducts research in high performance computing with emphasis on parallel algorithms and applications, data science, bioinformatics and systems biology, combinatorial scientific computing, and applied algorithms. He pioneered the development of parallel methods in bioinformatics and systems biology, and contributed to the assembly of genomes and metagenomes, next generation sequencing bioinformatics, and gene network inference and analysis. He is currently serving as the Chair of the ACM Special Interest Group on Bioinformatics, Computational Biology and Biomedical Informatics (SIGBIO). He is a recipient of the NSF CAREER award, IBM faculty award, Swarnajayanti Fellowship from the Government of India, John V. Atanasoff Discovery Award from Iowa State University, and the Outstanding Senior Faculty Research award and the Dean’s award for faculty excellence at Georgia Tech. He is a Fellow of the AAAS and IEEE, and a recipient of the IEEE Computer Society Meritorious Service and Golden Core awards.
Emerging real-world graph problems include detecting community structure in large social networks, improving the resilience of the electric power grid, and detecting and preventing disease in human populations. Unlike traditional applications in computational science and engineering, solving these problems at scale often raises new challenges because of sparsity and the lack of locality in the data, the need for additional research on scalable algorithms and development of frameworks for solving these problems on high performance computers, and the need for improved models that also capture the noise and bias inherent in the torrential data streams. In this course, students will be exposed to the opportunities and challenges in massive data-intensive computing for applications in computational biology, genomics, and security.
This course will introduce students to designing high-performance and scalable algorithms for massive graph analysis. The course focuses on algorithm design, complexity analysis, experimentation, and optimization for important 'big data' graph problems. Students will develop knowledge and skills concerning:
- the design and analysis of massive-scale graph algorithms employed in real-world data-intensive applications, and
- performance optimization of applications using the best practices of algorithm engineering.
An increasingly fast-paced, digital world has produced an ever-growing volume of petabyte-sized datasets. At the same time, terabytes of new, unstructured data arrive daily. As the desire to ask more detailed questions about these massive streams has grown, parallel software and hardware have only recently begun to enable complex analytics in this non-scientific space.
In this course, we will discuss the open problems facing us with analyzing this “data deluge”. Students will learn the design and implementation of algorithms and data structures capable of analyzing spatio-temporal data at massive scale on parallel systems. Students will understand the difficulties and bottlenecks in parallel graph algorithm design on current systems and will learn how multithreaded and hybrid systems can overcome these challenges. Students will gain hands-on experience mapping large-scale graph algorithms on a variety of parallel architectures using advanced programming models.
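Breadth-first search is the standard example of an irregular, low-locality graph kernel. It is also the kernel behind the Graph500 benchmark mentioned in the biography below. The following is a minimal sequential sketch of the level-synchronous formulation that parallel implementations build on. The adjacency-dictionary input is an assumption chosen for brevity.

```python
def level_synchronous_bfs(adj, source):
    """Return the BFS level of every vertex reachable from source.

    adj maps each vertex to an iterable of its neighbours.
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # Vertices in the current frontier can be expanded independently.
        # A parallel code splits this loop across threads and resolves
        # races on `level` with an atomic compare-and-swap.
        for u in frontier:
            for v in adj[u]:
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level

graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}
levels = level_synchronous_bfs(graph, 0)
```

The frontier-at-a-time structure adds one global synchronization per level. The memory accesses in the inner loop are data-dependent, which is why locality, rather than arithmetic, tends to dominate performance for this kernel.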
Understanding of the design and analysis of algorithms
David A. Bader is Professor and Chair of the School of Computational Science and Engineering, College of Computing, at Georgia Institute of Technology. He is a Fellow of the IEEE and AAAS and served on the White House's National Strategic Computing Initiative (NSCI) panel. Dr. Bader serves as a board member of the Computing Research Association, on the NSF Advisory Committee on Cyberinfrastructure, on the Council on Competitiveness High Performance Computing Advisory Committee, on the IEEE Computer Society Board of Governors, and on the Steering Committees of the IPDPS and HiPC conferences. He is the editor-in-chief of IEEE Transactions on Parallel and Distributed Systems, and is a National Science Foundation CAREER Award recipient. Dr. Bader is a leading expert in data sciences. His interests are at the intersection of high-performance computing and real-world applications, including cybersecurity, massive-scale analytics, and computational genomics, and he has co-authored over 210 articles in peer-reviewed journals and conferences. During his career, Dr. Bader has served as PI/coPI of over $179M of competitive awards with over $41.1M of this brought into his institution. Dr. Bader has served as a lead scientist in several DARPA programs including High Productivity Computing Systems (HPCS) with IBM PERCS, Ubiquitous High Performance Computing (UHPC) with NVIDIA ECHELON, Anomaly Detection at Multiple Scales (ADAMS), Power Efficiency Revolution For Embedded Computing Technologies (PERFECT), and Hierarchical Identify Verify Exploit (HIVE). He has also served as Director of the Sony-Toshiba-IBM Center of Competence for the Cell Broadband Engine Processor. Bader is a co-founder of the Graph500 List for benchmarking 'Big Data' computing platforms. Bader is recognized as a 'RockStar' of High Performance Computing by InsideHPC and as HPCwire's People to Watch in 2012 and 2014. Dr. Bader also serves as an associate editor for several high-impact publications including IEEE Transactions on Computers, ACM Transactions on Parallel Computing, and ACM Journal of Experimental Algorithmics.
Graphs have become the de facto standard for modeling complex relations and networks on computers. As graphs grow larger and the analyses performed on them grow more complex, many software systems have been designed to leverage modern high performance computing platforms. Some of them provide a very productive programming environment; however, they cannot come close to the performance of a well-written single-threaded implementation. In this lecture series, we will present and discuss techniques for developing high-performance graph analytics on modern computer architectures.
HPC Graph Analytics: Enterprise Graph Frameworks vs HPC Graph Analytics.
Design Choices: Exact Algorithms, Approximations and Heuristics.
Graph Storage Formats.
Know your Graph: Graph Manipulations for Fast Analysis.
Sparse Matrix-based Graph Algorithms.
Compiler Optimizations (or the Lack Thereof) for Graph Kernels.
Vectorization of Graph Kernels.
Distributed Memory Graph Algorithms.
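Graph storage formats and sparse-matrix-based algorithms both appear in the outline above. Compressed Sparse Row (CSR) is the layout that links them. Each vertex's neighbours sit contiguously in one array, indexed by an offsets array that is built with a prefix sum over out-degrees. Below is a minimal sketch for a directed edge list. The function names are illustrative and not taken from any particular framework.

```python
from typing import List, Tuple

def build_csr(num_vertices: int, edges: List[Tuple[int, int]]):
    """Build CSR arrays for a directed graph.

    Returns (offsets, targets); the neighbours of u are
    targets[offsets[u]:offsets[u + 1]].
    """
    # Count the out-degree of each vertex.
    degree = [0] * num_vertices
    for u, _ in edges:
        degree[u] += 1
    # Prefix sum of degrees gives each vertex's starting offset.
    offsets = [0] * (num_vertices + 1)
    for u in range(num_vertices):
        offsets[u + 1] = offsets[u] + degree[u]
    # Scatter edge targets into their slots (the slice makes a copy).
    cursor = offsets[:-1]
    targets = [0] * len(edges)
    for u, v in edges:
        targets[cursor[u]] = v
        cursor[u] += 1
    return offsets, targets

def neighbours(offsets, targets, u):
    return targets[offsets[u]:offsets[u + 1]]

offsets, targets = build_csr(4, [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)])
```

CSR uses V + 1 + E integers and gives unit-stride scans of each adjacency list. It is also the same structure as a sparse adjacency matrix, so kernels such as sparse matrix-vector products apply directly.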
Basic understanding of the design and analysis of algorithms; good knowledge of C/C++, good understanding of modern computer architecture and compilers.
F. McSherry, M. Isard, and D. G. Murray. 'Scalability! But at what COST?' HotOS, 2015.
D. Bozdağ, A. Gebremedhin, F. Manne, E.G. Boman, and Ü.V. Çatalyürek, 'A Framework for Scalable Greedy Coloring on Distributed Memory Parallel Computers,' Journal of Parallel and Distributed Computing, Vol. 68, No. 4, pp. 515-535, Apr 2008.
A.E. Sarıyüce, E. Saule, K. Kaya, and Ü.V. Çatalyürek, 'Regularizing Graph Centrality Computations,' Journal of Parallel and Distributed Computing, Vol. 76, pp. 106-119, Feb 2015.
A.E. Sarıyüce, K. Kaya, E. Saule, and Ü.V. Çatalyürek, 'Graph Manipulations for Fast Centrality Computation,' ACM Transactions on Knowledge Discovery from Data, Vol. 11, No. 3, Apr 2017.
E. Saule and Ü.V. Çatalyürek, 'An Early Evaluation of the Scalability of Graph Algorithms on the Intel MIC Architecture,' Proceedings of 26th International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum (IPDPSW), Workshop on Multithreaded Architectures and Applications (MTAAP), May 2012.
A. Yoo, E. Chow, K. Henderson, W. McLendon, B. Hendrickson, and Ü.V. Çatalyürek, 'A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L,' Proceedings of SC2005 High Performance Computing, Networking, and Storage Conference, Nov 2005.
Ü.V. Çatalyürek, J. Feo, A.H. Gebremedhin, M. Halappanavar, and A. Pothen, 'Graph Coloring Algorithms for Multi-core and Massively Multithreaded Architectures,' Parallel Computing, Vol. 38, No. 10-11, pp. 576-594, Oct-Dec 2012.
Ümit V. Çatalyürek is currently professor and associate chair of the School of Computational Science and Engineering in the College of Computing at the Georgia Institute of Technology. He received his PhD, MS and BS in Computer Engineering and Information Science from Bilkent University, in 2000, 1994 and 1992, respectively. Professor Çatalyürek is a Fellow of the IEEE, a member of the Association for Computing Machinery (ACM) and the Society for Industrial and Applied Mathematics, and the elected chair for the IEEE’s Technical Committee on Parallel Processing for 2016-2017. He is also vice-chair for the ACM’s Special Interest Group on Bioinformatics, Computational Biology and Biomedical Informatics for the 2015-2018 term. He currently serves as the editor-in-chief for Parallel Computing, as an editorial board member for IEEE Transactions on Parallel and Distributed Systems, and on the program and organizing committees of numerous international conferences. His main research areas are in parallel computing, combinatorial scientific computing and biomedical informatics. He has co-authored more than 200 peer-reviewed articles, invited book chapters and papers. More information about Dr. Çatalyürek and his research group can be found at http://cc.gatech.edu/~umit.
Programming tools for applied mathematics and science have historically been pulled in three directions. System flexibility enables the development of complex new ideas. Programming simplicity boosts productivity. Application performance allows large problems to be solved quickly. Historically, systems for technical-but-not-computer-science programmers have failed to achieve all three goals simultaneously. Faced with systems that exhibit shortcomings on at least one of these fronts, users have been forced to make frustrating trade-offs among flexibility, productivity, and performance, often coping with the complexities of coding different parts of their application in different languages to achieve their goals. The Julia language addresses all three concerns by providing a programming language that is fast, flexible, and easy to use.
'Julia: looks like Python, feels like Lisp, runs like C'
What makes Julia Special
Julia solves the two language problem
Why multiple dispatch matters
B. Machine Learning
'No more mini-languages'
Deep Neural Nets
C. Performance Computing
'You can’t have a good parallel language until you have a good serial language'
Faster Serial Computing
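Multiple dispatch, listed above as a core Julia feature, selects a method using the runtime types of all arguments, not just the first. Code in this listing uses a single language, so the following is a toy Python emulation of the idea. The `dispatch` decorator and its registry are hypothetical and do not reflect how Julia implements dispatch. Unlike Julia, the sketch matches exact types only and ignores subtyping and method specificity.

```python
# Hypothetical toy registry: function name -> {tuple of argument types: implementation}.
_methods = {}

def dispatch(*types):
    """Register the decorated function as the method for exactly these argument types."""
    def register(fn):
        table = _methods.setdefault(fn.__name__, {})
        table[types] = fn

        def call(*args):
            # Select the method from the runtime types of *all* arguments.
            key = tuple(type(a) for a in args)
            try:
                method = table[key]
            except KeyError:
                raise TypeError(f"no method {fn.__name__} for {key}") from None
            return method(*args)

        return call
    return register

class Asteroid: ...
class Ship: ...

@dispatch(Asteroid, Asteroid)
def collide(a, b):
    return "asteroids merge"

@dispatch(Asteroid, Ship)
def collide(a, b):
    return "ship is damaged"

@dispatch(Ship, Asteroid)
def collide(a, b):
    return "ship is damaged"

@dispatch(Ship, Ship)
def collide(a, b):
    return "ships bounce"
```

With single dispatch, the `(Ship, Asteroid)` and `(Ship, Ship)` cases would both have to live inside one `Ship` method that inspects its second argument. Keying on the full type tuple keeps each combination as a separate, independently extensible method.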
It will help to download Julia to your laptop or to try juliabox.com before the lectures.
Alan Edelman is Professor of Applied Mathematics, and member of MIT's Computer Science & AI Lab. He has received many prizes for his work on computing and mathematics, including a Gordon Bell Prize, a Householder Prize, and a Charles Babbage Prize, is a fellow of IEEE, AMS, and SIAM, and is a founder of Interactive Supercomputing, Inc. and Julia Computing, Inc. Edelman's research interests include Julia, machine learning, high-performance computing, numerical computation, linear algebra and random matrix theory. He has consulted for Akamai, IBM, Pixar, and Microsoft among other corporations.
Discrete event simulation is an approach to modeling and simulation used in many applications such as manufacturing, logistics and supply chains, transportation, telecommunications, and complex systems, among others. This course is concerned with algorithms and computational issues that arise in executing discrete event simulations on parallel and distributed computers. This series of lectures will cover the fundamental principles and underlying algorithms used in parallel discrete event simulation. Distributed simulation standards, such as the High Level Architecture (HLA), that enable separately developed simulations to interoperate will also be discussed, along with current research trends and challenges.
Discrete event simulation, applications
Parallel discrete event simulation and the synchronization problem
Conservative synchronization algorithms
Advanced conservative synchronization techniques
Optimistic synchronization and Time Warp
Global Virtual Time algorithms
Advanced optimistic synchronization algorithms
Interoperability and the High Level Architecture
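The outline begins with sequential discrete event simulation and the synchronization problem. Below is a minimal sketch of a sequential event-list kernel built on Python's `heapq`. It shows the rule that a parallel simulator must preserve: events are processed in timestamp order. The `safe_processing_bound` helper is a hypothetical one-line illustration of the classic conservative (Chandy/Misra/Bryant) rule, not a complete null-message protocol.

```python
import heapq

class Simulator:
    """Minimal sequential discrete event simulation kernel."""

    def __init__(self):
        self.now = 0.0
        self._queue = []  # pending events as (timestamp, seq, handler)
        self._seq = 0     # tie-breaker: equal timestamps run in scheduling order

    def schedule(self, delay, handler):
        heapq.heappush(self._queue, (self.now + delay, self._seq, handler))
        self._seq += 1

    def run(self, until):
        # Always process the pending event with the smallest timestamp.
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler = heapq.heappop(self._queue)
            handler(self)

def safe_processing_bound(channel_clocks):
    """Conservative rule: if messages on each input channel arrive in
    nondecreasing timestamp order, a logical process may safely process
    events up to the minimum of its input-channel clocks (null messages
    carrying lookahead are what advance those clocks)."""
    return min(channel_clocks.values())

arrivals = []

def arrival(sim):
    arrivals.append(sim.now)
    sim.schedule(2.0, arrival)  # the next arrival is 2 time units later

sim = Simulator()
sim.schedule(1.0, arrival)
sim.run(until=7.0)
```

Here `arrivals` ends up as `[1.0, 3.0, 5.0, 7.0]`. In a parallel setting, each logical process runs such a loop on its own queue, and the synchronization algorithms covered in the lectures decide how far each loop may safely advance.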
Basic knowledge of computer algorithms and software; knowledge of discrete event simulation is not required.
- R. M. Fujimoto, Parallel and Distributed Simulation Systems, Wiley-Interscience 2000.
- F. Kuhl, R. Weatherly, J. Dahmann, Creating Computer Simulation Systems: An Introduction to the High Level Architecture for Simulation, Prentice Hall, 1999.
- R. M. Fujimoto, Research Challenges in Parallel and Distributed Simulation, ACM Transactions on Modeling and Computer Simulation, Vol. 24, No. 4, March 2016.
Richard Fujimoto is a Regents Professor in the School of Computational Science and Engineering at the Georgia Institute of Technology. He received his Ph.D. from the University of California at Berkeley in 1983 and two B.S. degrees from the University of Illinois at Urbana-Champaign in 1977 and 1978. He was the founding chair of the School of Computational Science and Engineering (CSE) at Georgia Tech, where he led the creation of the PhD and MS degree programs in CSE as well as two undergraduate minors. He has been an active researcher in the parallel and distributed simulation field since 1985. His publications include seven award-winning papers. He led the definition of the time management services for the High Level Architecture for Modeling and Simulation standard (IEEE 1516). Fujimoto has served as Co-Editor-in-Chief of the journal Simulation: Transactions of the Society for Modeling and Simulation International as well as a founding area editor for ACM Transactions on Modeling and Computer Simulation. He received the ACM Distinguished Contributions in Modeling and Simulation Award and is a Fellow of the ACM.
High-performance computing has revolutionized the world of computational materials science and engineering, including large-scale initiatives such as Integrated Computational Materials Engineering (ICME) and the Materials Genome Initiative (MGI). The length and time scales involved span from the nanometers and femtoseconds of electronic structure to the meters and years of structural engineering. In this course, I will survey the state of the art in this field, including key algorithmic methods, codes, and examples of their applications. Following this broad survey, I will focus specifically on scientific drivers and algorithmic and computational challenges for atomistic simulation, highlighting tradeoffs between accuracy, length, and time for classical molecular dynamics simulations of materials response to extreme mechanical and radiation environments. I will conclude with a discussion of current research efforts to tackle these challenges, specifically for the upcoming era of exascale computing.
Lecture 1: Overview of computational materials science and engineering methods, from electrons to atoms to microstructure. Methods: density functional theory, molecular dynamics, kinetic Monte Carlo, accelerated molecular dynamics, dislocation dynamics, phase field, and continuum mechanics. Codes: VASP, Abinit, Quantum Espresso, LAMMPS, SPaSM, ddcMD, ParaDIS.
Lecture 2: State-of-the-art atomistic simulation algorithms and applications. Interatomic potentials, including machine learning (GAP, SNAP, etc.) and learn-on-the-fly methods. Applications to radiation damage evolution, shock physics, and materials dynamics.
Lecture 3: Advancing the frontiers of atomistic simulation in accuracy, length, and time for the US Exascale Computing Project.
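The lectures center on classical molecular dynamics. Standard texts such as Allen & Tildesley and Rapaport (listed below) describe velocity Verlet as the usual time integrator. The following is a minimal sketch for a Lennard-Jones dimer moving on a line, in reduced units. Setting epsilon = sigma = mass = 1 and using one spatial dimension are assumptions made for illustration. Production codes such as LAMMPS add 3-D geometry, cutoffs, neighbour lists, and domain decomposition.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_force(r, epsilon=1.0, sigma=1.0):
    """F(r) = -dV/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r

def velocity_verlet_dimer(x1, x2, v1, v2, dt, steps, mass=1.0):
    """Integrate two atoms on a line (x2 > x1) interacting via Lennard-Jones."""
    def forces(a, b):
        f = lj_force(b - a)  # a repulsive force pushes atom 2 toward +x
        return -f, f         # Newton's third law
    f1, f2 = forces(x1, x2)
    for _ in range(steps):
        # Half-kick, drift, recompute forces, half-kick.
        v1 += 0.5 * dt * f1 / mass
        v2 += 0.5 * dt * f2 / mass
        x1 += dt * v1
        x2 += dt * v2
        f1, f2 = forces(x1, x2)
        v1 += 0.5 * dt * f1 / mass
        v2 += 0.5 * dt * f2 / mass
    return x1, x2, v1, v2

# Start at rest in the attractive region and let the pair oscillate.
x1, x2, v1, v2 = velocity_verlet_dimer(0.0, 1.5, 0.0, 0.0, dt=0.001, steps=2000)
```

Velocity Verlet is time-reversible and symplectic, so the total energy `lj_energy(x2 - x1) + 0.5 * (v1**2 + v2**2)` fluctuates around its initial value instead of drifting. The timestep that keeps this error small for the stiffest interaction is one of the accuracy-versus-time tradeoffs the course discusses.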
(Beginner) Michael P. Allen and Dominic J. Tildesley, 'Computer Simulation of Liquids,' second edition (Oxford University Press, 2017) – updated version of the classic 1987 reference.
(Beginner) Dennis C. Rapaport, 'The Art of Molecular Dynamics Simulation,' second edition (Cambridge University Press, 2004).
(Intermediate) LAMMPS Documentation, http://lammps.sandia.gov/doc/Manual.html
(Intermediate) A. F. Voter, F. Montalenti, and T. C. Germann, “Extending the time scale in atomistic simulation of materials,” Annual Review of Materials Research 32, 321-346 (2002).
(Advanced) R. J. Zamora, B. P. Uberuaga, D. Perez, and A. F. Voter, 'The Modern Temperature-Accelerated Dynamics Approach,' Annual Review of Chemical and Biomolecular Engineering 7, 87-110 (2016).
(Advanced) D. Perez, E. D. Cubuk, A. Waterland, E. Kaxiras, and A. F. Voter, 'Long-Time Dynamics through Parallel Trajectory Splicing,' J. Chem. Theory Comput. 12(1), 18-28 (2016).
(Advanced) M.A. Meyers, H. Jarmakani, E.M. Bringa, and B.A. Remington, 'Dislocations in Shock Compression and Release,' in Dislocations in Solids, Vol 15, J. P. Hirth and L. Kubin, eds. (North-Holland, 2009), pp. 91–197. http://meyersgroup.ucsd.edu/papers/journals/Meyers%20321.pdf
Lecture 1 is intended to be generally accessible and provide the basic background in molecular dynamics simulation techniques required for Lectures 2 and 3. Alternatively, anyone familiar with the Allen & Tildesley or Rapaport textbooks (or similar ones) would be well prepared for Lectures 2 and 3.
Timothy C. Germann is in the Physics and Chemistry of Materials Group (T-1) at Los Alamos National Laboratory (LANL), where he has worked since 1997. Dr. Germann received a Ph.D. in Chemical Physics from Harvard University in 1995, where he was a DOE Computational Science Graduate Fellow. At LANL, Tim has used large-scale classical MD simulations to investigate shock, friction, detonation, and other materials dynamics issues using leadership-class supercomputers. He was the Director of the Exascale Co-Design Center for Materials in Extreme Environments (ExMatEx) and currently directs the Co-design center for Particle Applications (CoPA), as part of the Exascale Computing Project (ECP). Dr. Germann is a Fellow of the American Physical Society (APS), past chair of the APS Division of Computational Physics (2011-15), and chair-elect of the APS Topical Group on Shock Compression of Condensed Matter (2017-20). He has received an IEEE Gordon Bell Prize, the LANL Fellows' Prize for Research, the LANL Distinguished Copyright Award, and an R&D 100 Award, and is a member of the DOE Advanced Scientific Computing Advisory Committee.
Nowadays, software developers who want to write efficient code face the major challenge of keeping pace with the increasing complexity of computing hardware. Writing optimal implementations requires an understanding of the target platform's architecture, of algorithms, and of the capabilities and limitations of compilers. A well-optimized code may outperform a loosely optimized implementation by one or two orders of magnitude.
The aim of the lectures is to give the attendees a practical introduction to performance optimization and monitoring on Linux, based on a good understanding of modern computer architectures. While the focus will be on C++ and Fortran, programmers of other languages will also benefit from the performance monitoring and hardware-related classes. Special emphasis will be given to optimizations on x86-based architectures, with examples of optimizations in real scientific computing applications.
1. Overview of modern computing hardware
2. Scalability in software and hardware
5. Understanding performance tuning
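The outline above includes scalability in software and hardware. Two standard models are a common starting point for that discussion: Amdahl's law (fixed problem size, strong scaling) and Gustafson's law (problem size grows with the machine, weak scaling). The course outline does not name these laws, and the minimal sketch below is offered only as background.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: S = 1 / ((1 - p) + p / N) for a fixed problem size."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)

def gustafson_speedup(parallel_fraction, workers):
    """Gustafson's law: scaled speedup S = (1 - p) + p * N, where p is the
    parallel fraction of the run time measured on the N-worker system."""
    return (1.0 - parallel_fraction) + parallel_fraction * workers
```

For a code that is 95% parallel, `amdahl_speedup(0.95, 16)` is about 9.14, and no number of workers can push it past 1 / 0.05 = 20. The weak-scaling estimate `gustafson_speedup(0.95, 16)` is 15.25. The gap between the two is one reason single-core optimization of the serial fraction still matters at scale.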
Good knowledge of at least one of C++ or Fortran; basic understanding of modern computer architecture and compilers; knowledge of Linux.
'Intel 64 and IA-32 Architectures Optimization Reference Manual'. Intel website, new versions are produced regularly.
'Performance Optimization of Numerically Intensive Codes'. Stefan Goedecker and Adolfy Hoisie, SIAM (2001).
'Introduction to High Performance Computing for Scientists and Engineers'. Georg Hager and Gerhard Wellein, CRC Press (2010).
I earned my Ph.D. in experimental Particle Physics at the University of Milan in 2007. During that period and in the following 3 years, I worked on a project of optimization and parallelization of data analysis software, collaborating with the HPC group at Cineca in Bologna and the ROOT team at CERN. In 2010 I joined CERN openlab with a COFUND-CERN and Marie Curie fellowship. Within openlab, I worked on the optimization and parallelization of software used in the High Energy Physics community for many-core systems, in collaboration with Intel. I developed a C++ prototype for Maximum Likelihood fitting, which was also ported to GPUs and Intel Xeon Phi accelerators. Between September 2012 and December 2014, I was an application analyst at CRAY, based at CSCS (Lugano, Switzerland). My main duty was to support the CSCS user community for the CRAY software products. Furthermore, I worked on porting applications to the new CRAY systems at CSCS (e.g. Piz Daint), especially for GPU usage. In 2015-2016, I was a postdoctoral research associate at ETH Zurich, working in the CP2K team (Department of Materials, Nanoscale Simulations group), under the Swiss PASC project. Since January 2017 I have been working at the University of Zurich (Department of Chemistry, Computational Chemistry group), where I lead the development of the DBCSR library for sparse matrix-matrix multiplication.
Since its creation by Bjarne Stroustrup in the early 1980s, C++ has steadily evolved to become a multi-paradigm programming language that fully supports the needs of modern programmers -- and of high-performance computing. This course will introduce and explain C++ holistically (rather than as an extension of C or of earlier versions of C++), which will be of interest even to experienced programmers. Taking a very careful slice of C++ focused on the needs of scientific and high performance application developers, the course will emphasize current best practices for high-performance and high-quality scientific software using modern C++ (up through C++17). A unifying theme throughout the course will be the use of abstraction for expressiveness and for performance.
o) Introduction. High-level overview of modern C++, programming principles, and C++ core guidelines.
o) Variables, built-in data types, references and values. Functions and function overloading. Passing and returning variables.
o) Classes (user defined types). Resource management, constructors, destructors, copying and moving. Some standard library types.
o) Matrices and vectors.
o) Sparse matrices. Function overloading, ad-hoc polymorphism, operator overloading.
o) Performance tuning and optimization. Abstraction penalty.
o) Multithreading. Async, futures, atomics.
o) MPI and C++. Boost.MPI.
o) C++ standard library, templates, parametric polymorphism, generic programming.
Students should have some experience in programming and linear algebra as well as basic familiarity with parallel computing concepts.
o) Bjarne Stroustrup. The C++ Programming Language. Pearson Education.
o) Bjarne Stroustrup. A Tour of C++ (C++ In-Depth Series). Pearson Education.
o) Bjarne Stroustrup. Programming: Principles and Practice Using C++. Pearson Education.
o) C++ Core Guidelines. http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines
o) ISO standard: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf
o) Peter Gottschling. Discovering Modern C++: An Intensive Course for Scientists, Engineers, and Programmers (1st ed.). Addison-Wesley Professional.
o) Alexander Stepanov and Paul McJones. Elements of Programming (1st ed.). Addison-Wesley Professional.
o) Alexander A. Stepanov and Daniel E. Rose. From Mathematics to Generic Programming (1st ed.). Addison-Wesley Professional.
o) Online C++ reference. http://en.cppreference.com/
o) University of Washington AMATH 483/583 Introduction to High-Performance Computing.
Andrew Lumsdaine is Chief Scientist at the Northwest Institute for Advanced Computing (NIAC), with dual appointments at Pacific Northwest National Laboratory and the University of Washington. Prior to joining NIAC, he was Professor of Computer Science and the Associate Dean for Research in the School of Informatics and Computing and Director of the Center for Research in Extreme Scale Technologies (CREST) at Indiana University. Lumsdaine received his Ph.D. from MIT in 1992 and was a faculty member in the Department of Computer Science and Engineering at the University of Notre Dame from 1992 to 2001. His research interests include computational science and engineering, parallel and distributed computing, parallel graph algorithms, generic programming, and computational photography. Lumsdaine is a member of ACM, IEEE, and SIAM and participated actively in the MPI Forum and the ISO C++ standards committee.
Developing practical informatics tools and decision support environments for reasoning about real-world social habitats is complicated and scientifically challenging because of their size, their co-evolutionary nature and the need to represent multiple dynamical processes simultaneously. The 2014 Ebola epidemic, global migration, the societal impacts of natural and human-initiated disasters, and the effects of climate change are examples of the many challenges faced when developing such environments. Recent quantitative changes in computing and information sciences have created new opportunities to develop innovative tools and technologies in this area that can help advance the science of cities.
The course will focus on the elements of high performance network and information science required to support policy informatics as it pertains to the study of socio-technical systems such as public health epidemiology and urban science. Understanding these systems is of interest in its own right; beyond that, they serve as excellent 'model organisms' for developing a theory of co-evolving complex networks. We will describe high performance computing methods for simulation, analytics, machine learning and decision support for such systems, and illustrate their practical usefulness through well-chosen case studies. We will end the series with a number of challenges and open questions for future research.
Lecture 1: Introduction to the topic:
-- Motivation and role of HPC in bio-social systems
-- Cover four roles of HPC: simulation, analytics, machine learning and decision making.
-- Describe real-world work going on in this domain
-- Introduce graphical models as representations for bio-social systems.
-- Introduce three components: structural analysis of networks, dynamics over networks, and control and optimization of dynamics over networks
Lecture 2: Structural analysis of networks and dynamics over networks
-- Introduce important structural analysis problems and discuss parallel algorithms
for them. Specific topics we will consider include:
(i) discovering and counting network motifs, (ii) random graph generation,
-- Discuss dynamics over networks and focus on contagion dynamics over networks.
Discuss its applicability to social, biological and epidemic science.
Discuss parallel algorithms for contagions over networks.
-- Continue dynamics over networks,
-- Forecasting, control and optimization over networks
-- Discuss two real-world case studies: spread of infectious diseases and urban science. Discuss pervasive, personalized and precision analytics in these areas that can be supported using HPC.
-- Synthetic social habitats -- a data structure that represents at-scale, realistic, high-resolution models of functioning bio-social habitats
-- Showcase tools built by researchers
-- Conclude with challenges, open questions and opportunities.
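As a rough sketch of contagion dynamics over a network, the code below assumes a deterministic threshold model. This is one common formulation of complex contagion; the lectures may use others, such as stochastic SIR-type models. The function name `threshold_contagion` is hypothetical. Each round updates all nodes from the same snapshot, so the per-node work within a round is independent. Parallel contagion algorithms exploit exactly this property when they partition nodes across processors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of synchronous threshold contagion: an inactive node becomes
// active once at least `threshold` of its neighbours are active; active nodes stay active.
// Returns the active set after every round, starting with the seed set, until nothing changes.
std::vector<std::vector<bool>> threshold_contagion(
    const std::vector<std::vector<int>>& adj,  // undirected graph as adjacency lists
    std::vector<bool> active,                  // initially active (seed) nodes
    int threshold) {
    assert(active.size() == adj.size());
    std::vector<std::vector<bool>> history(1, active);
    bool changed = true;
    while (changed) {
        changed = false;
        std::vector<bool> next = active;       // all nodes read the same snapshot this round
        for (std::size_t v = 0; v < adj.size(); ++v) {
            if (active[v]) continue;
            int count = 0;
            for (int u : adj[v]) count += active[u] ? 1 : 0;
            if (count >= threshold) { next[v] = true; changed = true; }
        }
        if (changed) { active = next; history.push_back(active); }
    }
    return history;
}
```

On the path graph 0-1-2-3 seeded at node 0, a threshold of 1 activates one new node per round, giving four snapshots in total. A threshold of 2 produces no spread at all.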
Basic undergraduate degree in computer science or equivalent training.
M Marathe, AKS Vullikanti
Communications of the ACM 56 (7), 88-96
What factors might have led to the emergence of Ebola in West Africa?
KA Alexander, CE Sanderson, M Marathe, BL Lewis, CM Rivers, ...
PLoS neglected tropical diseases 9 (6), 2015
High performance informatics for pandemic preparedness
KR Bisset, S Eubank, MV Marathe
Proceedings of the Winter Simulation Conference, 72
Fujimoto, Richard. 'Parallel and distributed simulation.'
Proceedings of the 2015 Winter Simulation Conference. IEEE Press, 2015.
Parallel algorithms for switching edges in heterogeneous graphs
H Bhuiyan, M Khan, J Chen, M Marathe
Journal of Parallel and Distributed Computing 104, 2017
A fast parallel algorithm for counting triangles in graphs using dynamic load balancing
S Arifuzzaman, M Khan, M Marathe
Big Data (Big Data), 2015 IEEE International Conference on, 1839-1847
Dynamics over networks
Agent-Based Modeling and High Performance Computing
M Alam, V Abedi, J Bassaganya-Riera, K Wendelsdorf, K Bisset, X Deng, ...
Computational Immunology: Models and Tools, 79
Modelling disease outbreaks in realistic urban social networks
S Eubank, H Guclu, VSA Kumar, MV Marathe, A Srinivasan, Z Toroczkai, ...
Nature 429 (6988), 180-184
EpiSimdemics: an efficient algorithm for simulating the spread of infectious disease over large realistic social networks
CL Barrett, KR Bisset, SG Eubank, X Feng, MV Marathe
Proceedings of the 2008 ACM/IEEE conference on Supercomputing, 2008
EpiFast: a fast algorithm for large scale realistic epidemic simulations on distributed memory systems
KR Bisset, J Chen, X Feng, VS Kumar, MV Marathe
Proceedings of the 23rd international conference on Supercomputing, 430-439
Indemics: An interactive high-performance computing framework for data-intensive epidemic modeling
KR Bisset, J Chen, S Deodhar, X Feng, Y Ma, MV Marathe
ACM Transactions on Modeling and Computer Simulation (TOMACS) 24 (1), 4
Reasoning about mobile malware using high performance computing based population scale models
K Channakeshava, K Bisset, MV Marathe, AKS Vullikanti
Proceedings of the 2014 Winter Simulation Conference, 3048-3059
Perumalla, Kalyan S., and Sudip K. Seal. 'Discrete event modeling and massively parallel
execution of epidemic outbreak phenomena.' Simulation 88.7 (2012): 768-783.
Mubarak, M., Carothers, C. D., Ross, R. B., & Carns, P. (2017).
Enabling parallel simulation of large-scale hpc network systems.
IEEE Transactions on Parallel and Distributed Systems, 28(1), 87-100.
Bast, Hannah, Daniel Delling, Andrew Goldberg, Matthias Müller-Hannemann,
Thomas Pajor, Peter Sanders, Dorothea Wagner, and Renato F. Werneck.
'Route planning in transportation networks.'
In Algorithm engineering, pp. 19-80. Springer, Cham, 2016.
Bazzan, Ana LC, and Franziska Klügl.
'A review on agent-based technology for traffic and transportation.'
The Knowledge Engineering Review 29, no. 3 (2014): 375-403.
Control and Optimization
Inhibiting diffusion of complex contagions in social networks: theoretical and experimental results
CJ Kuhlman, VSA Kumar, MV Marathe, SS Ravi, DJ Rosenkrantz
Data mining and knowledge discovery 29 (2), 423-465
Interaction-based HPC modeling of social, biological, and economic contagions over large networks
K Bisset, J Chen, CJ Kuhlman, VS Kumar, MV Marathe
Proceedings of the winter simulation conference, 2938-2952
Spatio-temporal optimization of seasonal vaccination using a metapopulation model of influenza
S Venkatramanan, J Chen, S Gupta, B Lewis, M Marathe, H Mortveit, ...
Healthcare Informatics (ICHI), 2017 IEEE International Conference on, 134-143
Madhav Marathe is the director of the Network Dynamics and Simulation Science Laboratory and professor in the department of computer science, Virginia Tech. His research interests are in computational epidemiology, network science, design and analysis of algorithms, computational complexity, communication networks and high performance computing. Before coming to Virginia Tech, he was a Team Leader in the Computer and Computational Sciences division at the Los Alamos National Laboratory (LANL) where he led the basic research programs in foundations of computing and high performance simulation science for analyzing extremely large socio-technical and critical infrastructure systems. He is a Fellow of the IEEE, ACM and AAAS.
This course prepares you, through hands-on coding experience, to analyze a sequential code for its potential benefit from parallelization. It then walks you through the steps of data and code transformations using a sample stencil code. This code is subsequently parallelized for a number of common paradigms and architectures: (1) OpenMP data parallelism across the cores of a shared-memory architecture; (2) MPI message passing over a set of nodes in a distributed architecture; (3) hybrid OpenMP+MPI; (4) CUDA data parallelism for GPUs; (5) hybrid MPI+GPU with multiple GPU nodes; (6) OpenACC for pragma-based GPU parallelism; (7) TensorFlow. Each paradigm is introduced in terms of its API before an exemplary application to the stencil code is discussed and benchmarked.
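The course's own stencil code is not given here. As an assumed stand-in, the sketch below is a sequential 2D five-point Jacobi stencil in C++; the function name `jacobi` and the row-major layout are our choices. It marks the loop nest that paradigm (1) would parallelize with a standard OpenMP `parallel for` directive. Compilers ignore this directive unless OpenMP is enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sequential starting point: a 2D five-point Jacobi stencil on an
// n x n grid stored row-major, with fixed (Dirichlet) boundary values.
// Each sweep replaces every interior point by the average of its four neighbours.
std::vector<double> jacobi(std::vector<double> u, std::size_t n, int sweeps) {
    assert(u.size() == n * n);
    std::vector<double> next = u;  // boundary values are copied once and never overwritten
    for (int s = 0; s < sweeps; ++s) {
        // Iterations are independent (read from u, write to next), which is what makes
        // the stencil amenable to data parallelism. The pragma takes effect only when
        // compiled with OpenMP support (e.g. -fopenmp); otherwise it is ignored.
        #pragma omp parallel for
        for (long i = 1; i < static_cast<long>(n) - 1; ++i)
            for (std::size_t j = 1; j + 1 < n; ++j)
                next[i * n + j] = 0.25 * (u[(i - 1) * n + j] + u[(i + 1) * n + j] +
                                          u[i * n + j - 1] + u[i * n + j + 1]);
        std::swap(u, next);  // the new grid becomes the input of the next sweep
    }
    return u;
}
```

Because each sweep reads only from `u` and writes only to `next`, the same loop nest is the natural unit to split across MPI ranks (with an exchange of boundary rows between neighbours) or across GPU threads in the later paradigms.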
By the end of the course, you should be able to (1) understand the concepts of parallel computing and parallel systems' architecture, (2) assess parallelization potentials of a sequential algorithm, (3) parallelize a sequential algorithm over a diverse set of architectures, and (4) assess the performance of the resulting code.
Parallel Programming: For Multicore and Cluster Systems by Thomas Rauber, Gudula Rünger, Springer 2010
Programming in C, Linux
Frank Mueller is a Professor in Computer Science and a member of multiple research centers at North Carolina State University. Previously, he held positions at Lawrence Livermore National Laboratory and Humboldt University Berlin, Germany. He received his Ph.D. from Florida State University in 1994. He has published papers in the areas of parallel and distributed systems, embedded and real-time systems and compilers. He is a member of ACM SIGPLAN, ACM SIGBED and a senior member of the ACM and an IEEE Fellow. He is a recipient of an NSF Career Award, an IBM Faculty Award, a Google Research Award and two Fellowships from the Humboldt Foundation.
This lecture will present general approaches to decompose a problem into independent tasks that can be solved simultaneously. We will discuss specifications of parallel algorithms, problem decomposition, methodical design steps, and examples of parallel algorithms to solve discrete and continuous computational problems.
The lectures will cover methodical design steps including partitioning, communication, and mapping. Partitioning methods discussed include: direct decomposition (for ideally/embarrassingly parallel problems); data decomposition (for problems involving similar computations applied to various data sets); functional decomposition (for applications composed of loosely coupled heavy parts); recursive decomposition (for divide-and-conquer problems); exploratory decomposition (for state-space search problems); speculative decomposition (for problems with conditional branches); and pipelined decomposition (for solving multiple instances of the same problem). Techniques for mapping tasks to processes, communication and load-balancing issues, and performance tradeoffs will be discussed. Examples of problems and the resulting parallel solution algorithms will be given.
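Recursive decomposition can be sketched in C++ with `std::async`, assuming the problem is a simple array sum. The names `parallel_sum` and its `cutoff` parameter are hypothetical. The range is split in half, one half becomes a new task, and the partial results are combined once both halves finish.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical recursive (divide-and-conquer) decomposition of a sum: split the range,
// hand the left half to a new task, recurse on the right half, then combine.
// `cutoff` stops the splitting so that tasks stay coarse-grained.
long long parallel_sum(const std::vector<int>& a, std::size_t lo, std::size_t hi,
                       std::size_t cutoff = 1024) {
    assert(lo <= hi && hi <= a.size() && cutoff > 0);
    if (hi - lo <= cutoff)  // base case: small enough to sum sequentially
        return std::accumulate(a.begin() + lo, a.begin() + hi, 0LL);
    std::size_t mid = lo + (hi - lo) / 2;
    // std::launch::async runs the left half on a new thread; std::cref avoids copying `a`
    auto left = std::async(std::launch::async, parallel_sum, std::cref(a), lo, mid, cutoff);
    long long right = parallel_sum(a, mid, hi, cutoff);  // current thread takes the right half
    return left.get() + right;                            // join: wait for the left task, combine
}
```

The `cutoff` illustrates the granularity tradeoff mentioned above: if it is too small, task-creation overhead dominates; if it is too large, there are too few tasks to balance the load.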
Appropriate references will be provided in the lecture notes (slides) for the course.
Familiarity with algorithms will be assumed, at the level typically taught in undergraduate engineering/computer science programs.
Adrian Sandu is a full professor of Computer Science at Virginia Tech, USA. His research interests include numerical algorithms, statistical computing, high performance computing, and inverse problems. Prof. Sandu has made multiple contributions to algorithms for solving large-scale problems in the simulation of the Earth system and of engineering and biological systems. He serves in editorial positions with SIAM Journal on Uncertainty Quantification, Applied Numerical Mathematics, Geoscientific Model Development, and International Journal of Computer Mathematics. Sandu is a Distinguished Scientist of the ACM and an Honorary Fellow of the European Society of Computational Methods in Science and Engineering.
This course provides a graduate-level introduction to the fundamentals of parallel, concurrent, and distributed programming. The goal is to prepare you for studying advanced topics in these areas, and for picking up any specific parallel, concurrent, or distributed programming system that you may encounter in the future. For parallelism, you will learn the fundamentals of task parallelism, functional parallelism, loop-level parallelism, and data flow parallelism. For concurrency, you will learn the fundamentals of multithreading with locks, isolation and transactions, and actors. For distribution, you will learn the fundamentals of message-passing, map-reduce, actor, and global address space frameworks, as well as the integration of multithreading with these distributed frameworks. Underlying theoretical concepts related to parallelism, locality, data races, determinism, deadlock, and livelock will be covered across all three topics. Examples will be given from popular frameworks based on C++ and Java, as well as from newer programming models, so as to provide the necessary background for further exploration of these topics.
Session 1: Fork-join task parallelism; functional parallelism with promises and futures; loop-level parallelism with barriers and reductions; data flow parallelism with task dependencies; computation graphs; ideal parallelism; data races; functional and structural determinism.
Session 2: Threads and locks; critical sections; isolation; transactional memory; actors; linearizability of concurrent data structures; progress guarantees including deadlock and livelock freedom; memory consistency models.
Session 3: Message-passing; collective operations; map-reduce frameworks; distributed actors; partitioned global address space systems; integration of multithreading with distribution.
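The data-race, critical-section and atomicity concepts from Sessions 1 and 2 can be illustrated with a shared counter. The following is a minimal C++ sketch, not taken from the course, and its function names are hypothetical. It compares an atomic read-modify-write with a mutex-protected critical section. A plain `long` incremented with `++` from several threads would instead be a data race, which is undefined behaviour in C++.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: each of `nthreads` threads increments a shared atomic counter
// `per_thread` times using an indivisible read-modify-write.
long count_with_atomic(int nthreads, long per_thread) {
    assert(nthreads > 0 && per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back([&] {
            for (long i = 0; i < per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);  // join() below provides ordering
        });
    for (auto& th : pool) th.join();
    return counter.load();
}

// Same computation, but a mutex turns the increment into a critical section.
long count_with_mutex(int nthreads, long per_thread) {
    assert(nthreads > 0 && per_thread >= 0);
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back([&] {
            for (long i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> lock(m);  // only one thread at a time past this point
                ++counter;
            }
        });
    for (auto& th : pool) th.join();  // join() also makes the threads' writes visible here
    return counter;
}
```

Both functions return `nthreads * per_thread`. The atomic version is usually cheaper per increment because it avoids acquiring and releasing a lock, although both serialize on the same shared location.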
Basic knowledge of sequential algorithms and data structures; prior knowledge of parallelism, concurrency and distribution is not required.
There is no single reference that covers the breadth of topics in this course. However, any of the following references can serve as useful further reading after the course, depending on the research directions that you plan to pursue in the future. Optionally reading part or all of any one of these references before the course may also help prepare you with questions that you may have about the fundamentals of parallelism, concurrency and distribution.
* 'Using OpenMP: Portable Shared Memory Parallel Programming', by Barbara Chapman, Gabriele Jost, and Ruud van der Pas.
* 'The Art of Multiprocessor Programming', by Maurice Herlihy and Nir Shavit.
* 'Using MPI - 2nd Edition: Portable Parallel Programming with the Message Passing Interface', by William Gropp, Ewing Lusk, and Anthony Skjellum.
* 'UPC: Distributed Shared-Memory Programming', by Tarek El-Ghazawi, William Carlson, Thomas Sterling, and Katherine Yelick.
* 'Actors in Scala', by Philipp Haller and Frank Sommers.
* 'X10: an object-oriented approach to non-uniform cluster computing', Recipient of Most Influential Paper Award for OOPSLA 2005.
* Coursera specialization on 'Parallel, Concurrent, and Distributed Programming in Java', https://www.coursera.org/specializations/pcdp, for an undergraduate-level introduction to course material. (The lectures cover general concepts, with Java-specific material primarily in optional mini-projects.)
Vivek Sarkar joined Georgia Tech in August 2017 as a Professor in the School of Computer Science, and the Stephen Fleming Chair for Telecommunications. Prior to that, he was the E.D. Butcher Chair in Engineering at Rice University during 2007 - 2017, where he also served as Chair of the Department of Computer Science during 2013 - 2016. While at Rice, Sarkar's Habanero Extreme Scale Software Research Group conducted research on programming models, compilers, and runtime systems for homogeneous and heterogeneous parallel computing, with the goal of unifying parallelism and concurrency concepts across high-end computing, multicore, and embedded systems. Earlier, Sarkar was Senior Manager of Programming Technologies at IBM's T.J. Watson Research Center. His research projects at IBM included the X10 programming language, the Jikes Research Virtual Machine open source project, the ASTI optimizer used in IBM’s XL Fortran product compilers, and the PTRAN automatic parallelization system. Sarkar received his Ph.D. from Stanford University in 1987, became a member of the IBM Academy of Technology in 1995, and was inducted as an ACM Fellow in 2008. He has been serving as a member of the US Department of Energy’s Advanced Scientific Computing Advisory Committee (ASCAC) since 2009, and on CRA’s Board of Directors since 2015.
A variety of programming models and systems have been developed to support shared-memory and distributed-memory programming. It is not always clear which model fits which types of machines and algorithms. The lecture will provide a tour of this landscape and present a taxonomy of parallel programming models that can help in evaluating their fitness for particular problems.
SIMD, SIMT, data parallelism, control parallelism, shared memory, PGAS, distributed memory, loop parallelism, task parallelism, virtualization, MPI, Shmem, OpenMP, UPC, TBB, Charm++, Legion
General knowledge of parallel architectures and parallel computing
Gropp, William and Snir, Marc, Programming for Exascale Computers, Journal of Computing in Science and Engineering, vol. 15:6, pp. 27-35, 2013.
Marc Snir is Michael Faiman Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. He currently pursues research in parallel computing. Marc Snir received a Ph.D. in Mathematics from the Hebrew University of Jerusalem in 1979, worked at NYU on the NYU Ultracomputer project in 1980-1982, and was at the Hebrew University of Jerusalem in 1982-1986, before joining IBM, where he led a team that developed the initial software of the IBM SP and Blue Gene products. Marc Snir was a major contributor to the design of the Message Passing Interface. He has published numerous papers and given many presentations on computational complexity, parallel algorithms, parallel architectures, interconnection networks, parallel languages and libraries, and parallel programming environments. Marc is an AAAS Fellow, ACM Fellow and IEEE Fellow. He has Erdős number 2 and is a mathematical descendant of Jacques Salomon Hadamard. He recently won the IEEE Award for Excellence in Scalable Computing and the IEEE Seymour Cray Computer Engineering Award.
The goal is to introduce the students to the central aspects of shared-memory computer architecture. The emphasis is on the basic ideas and the current state of the art. Ties to programming will also be emphasized.
Introduction to Architectures and Programming
Scalable Hardware Cache Coherence
Memory Consistency Models
Multiple Processors on a Chip
Speculative Parallelization and Execution
Processor and Memory Integration
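Memory consistency models are often explained through small litmus tests. The sketch below uses C++ atomics as an assumed vehicle; the course may use different notation, and `message_passing` is a hypothetical name. It shows the classic message-passing pattern, where release/acquire ordering on the flag guarantees that the consumer sees the payload. If both flag operations used `std::memory_order_relaxed`, the read of `data` would be a data race and no such guarantee would hold. On weakly ordered processors, the flag could then become visible before the payload.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Message-passing litmus test: the producer writes a plain payload, then raises a flag.
int message_passing() {
    int data = 0;                                      // ordinary, non-atomic payload
    std::atomic<bool> ready{false};                    // synchronization flag
    int seen = 0;
    std::thread producer([&] {
        data = 42;
        ready.store(true, std::memory_order_release);  // release: earlier writes are published...
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            // spin until the flag is observed
        }
        seen = data;                                   // ...to a thread whose acquire load reads true
    });
    producer.join();
    consumer.join();
    assert(seen == 42);                                // guaranteed by the release/acquire pairing
    return seen;
}
```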
Basic computer organization course
Parallel Computer Organization and Design by Dubois, Annavaram and Stenstrom, published by Cambridge University Press, 2012
D. Lenoski et al. 'The Directory-Based Cache Coherence Protocol for the DASH Multiprocessors'. ISCA 1990.
K. Gharachorloo et al. 'Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors'. ISCA 1990.
J. Goodman et al. 'Efficient Synchronization Primitives for Large Scale Cache-Coherent Multiprocessors'. ASPLOS 1989.
J. Mellor-Crummey and M. Scott. 'Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors'. ACM TOCS 1991.
D. Tullsen et al. 'Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor'. ISCA 1996.
J. Martinez et al. 'Speculative Synchronization: Applying Thread-Level Speculation to Explicitly Parallel Applications'. ASPLOS 2002.
Josep Torrellas is the Saburo Muroga Professor of Computer Science at the University of Illinois at Urbana-Champaign (UIUC). He is the Director of the Center for Programmable Extreme Scale Computing, and past Director of the Illinois-Intel Parallelism Center (I2PC). He is a Fellow of IEEE, ACM, and AAAS. He received the IEEE Computer Society 2015 Technical Achievement Award, for 'Pioneering contributions to shared-memory multiprocessor architectures and thread-level speculation', and the 2017 UIUC Campus Award for Excellence in Graduate Student Mentoring. He is a member of the Computing Research Association (CRA) Board of Directors. He has served as the Chair of the IEEE Technical Committee on Computer Architecture (TCCA) (2005-2010) and as a Council Member of CRA’s Computing Community Consortium (CCC) (2011-2014). He was a Willett Faculty Scholar at UIUC (2002-2009). As of 2017, he has graduated 36 Ph.D. students, who are now leaders in academia and industry. He received a Ph.D. from Stanford University.
The DNA data deluge is upon us thanks to the incredible advances in DNA sequencing technology. Datasets comprising billions to trillions of reads are not uncommon, and hundreds of thousands of draft and complete genomes are now available from public databases. This course will cover HPC techniques for genomic and metagenomic assembly of these massive sequencing datasets.
Session I: Introduction to genomics and metagenomics
-What is a genome?
-What is a metagenome?
-Early advances and remaining HPC challenges
Session II: HPC strategies for genomic and metagenomic assembly
-Hands-on exercise
Session III: HPC strategies for validation
-De novo genome assembly validation
-De novo metagenome assembly validation
-Hands-on exercise
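To illustrate the kind of kernel that assembly pipelines scale up, the sketch below counts k-mers (length-k substrings of reads), a common first step for de Bruijn-graph assemblers. It is a simplified sequential sketch. The name `count_kmers` is hypothetical, and the code ignores reverse complements and sequencing errors, which real assemblers handle. A common way to distribute such a table across nodes is to hash each k-mer to an owner process.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: count every length-k substring (k-mer) across a set of reads.
// Reverse complements and sequencing errors are deliberately ignored here.
std::map<std::string, int> count_kmers(const std::vector<std::string>& reads, std::size_t k) {
    assert(k > 0);
    std::map<std::string, int> counts;  // ordered map keeps the output deterministic
    for (const auto& r : reads)
        for (std::size_t i = 0; i + k <= r.size(); ++i)
            ++counts[r.substr(i, k)];
    return counts;
}
```

For the single read "ACGTACG" and k = 3, the k-mers are ACG, CGT, GTA, TAC and ACG, so ACG is counted twice and the table has four distinct entries.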
Familiarity with algorithms, computer programming, and statistics will be assumed, at the level typically taught in undergraduate engineering/computer science programs. A basic understanding of genome assembly and DNA sequencing would be helpful, but is not required.
References will be provided in the lecture notes (slides) for the course.
Dr. Todd J. Treangen is an Assistant Research Scientist at the Center for Bioinformatics and Computational Biology (CBCB) and the Assistant Director of the Center for Health-related Informatics and Bioimaging (CHIB) at the University of Maryland, College Park. Prior to joining CBCB and CHIB, he was a Principal Investigator within the Genomics group at the National Biodefense Analysis and Countermeasures Center (NBACC). He received his Ph.D. in Computer Science from the Technical University of Catalonia (Barcelona, Spain). His research interests lie at the intersection of computer science and genomics, focusing on the development of novel algorithms, methods, and software for the analysis of genomes and metagenomes.
This course provides both a general introduction to High Performance Computing (HPC) and practical experience of running simple jobs on a real computer cluster. Examples in this course are based on tools for Next Generation Sequencing; however, the course is not limited to life science students, and anybody considering HPC for their research can benefit from it.
* What is a Supercomputer or HPC? Who uses it?
* HPC Hardware - Building blocks and architectures;
* Choices - private HPC vs Clouds; Galaxy project;
* Using HPC systems - access, using command line, navigating file system;
* Parallel Computing for Pedestrians;
* Batch system, job types, job control;
* Things to consider - performance, parallel filesystems, resource allocations and costs;
Familiarity with desktop computers is presumed, but no programming, Linux or HPC experience is required. However, some experience with the Linux CLI (Command Line Interface) is highly desirable. Participants are expected to bring their own laptops for the practical exercises.
Elena Vataga is a Senior Research Computing Specialist at the University of Southampton, home to one of the biggest university-owned High Performance Computing facilities. She received her PhD in Particle Physics from Moscow State University in 1997 and for many years worked on major accelerators at international centres such as Fermilab and CERN. Supercomputers are fundamental to High Energy Physics, and during 15 years in academic research Elena gained extensive experience and built her expertise in different aspects of research computing; gradually her interests shifted from High Energy Physics to High Performance Computing. At present, as a member of the HPC team, Elena is involved in the operation and procurement of the University's research infrastructure and works actively with the local research community to facilitate their access to HPC.
Serial computing evolved from a handful of scientific applications to broad use. But how did its founders manage to build a foundation that later allowed so many programmers, with different levels of skill, to expand it to applications whose scope and breadth they never imagined? Suggesting the possible profound insights that guided the founders of serial computing, we will contrast serial computing with parallel computing, where the gap between programming and architecture appears irreconcilable. We will review: (i) a long-term vision for closing the gap, driven by parallel algorithmic thinking; (ii) comprehensive hardware and software prototyping of this vision, culminating in our 2018 paper [GVB]; and (iii) evidence that: 1. the gap is not as big as prevailing beliefs suggest, 2. some vendors have already made remarkable headway towards significantly narrowing the gap, and 3. it is feasible to close the gap. The course will provide the necessary foundations for the vision, prototyping and evidence, and as much understanding of them as its time constraints allow.
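For readers new to PRAM-style thinking, the sketch below shows a standard data-parallel algorithm: the work-efficient prefix sum on a balanced binary tree, written as a sequential C++ simulation of PRAM rounds. It is an illustration under stated assumptions (power-of-two input length, exclusive scan) and is not code from [GVB] or from the XMT/ICE toolchain. The name `exclusive_scan_pram` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: work-efficient exclusive prefix sum on a balanced binary tree,
// simulated sequentially. Within each round the inner-loop iterations touch disjoint
// elements, so a PRAM would execute them concurrently. Assumes n is a power of two.
std::vector<long> exclusive_scan_pram(std::vector<long> a) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // power-of-two length
    // Up-sweep: each internal tree node accumulates the sum of its subtree.
    for (std::size_t d = 1; d < n; d *= 2)
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d)    // one parallel round
            a[i] += a[i - d];
    // Down-sweep: push prefix sums from the root back down to the leaves.
    a[n - 1] = 0;
    for (std::size_t d = n / 2; d >= 1; d /= 2)
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d) {  // one parallel round
            long left = a[i - d];
            a[i - d] = a[i];
            a[i] += left;
        }
    return a;
}
```

For input {1, 2, 3, 4} it returns {0, 1, 3, 6}. Each inner loop corresponds to one PRAM round, which gives 2 log2 n rounds and O(n) total work.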
Familiarity with basic algorithms, data structures and computer architecture will be assumed.
F. Ghanim, U. Vishkin, R. Barua. Easy PRAM-based high-performance parallel programming with ICE. IEEE Transactions on Parallel and Distributed Systems 29:2, Feb. 2018.
U. Vishkin. Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques http://legacydirs.umiacs.umd.edu/~vishkin/PUBLICATIONS/classnotes.pdf
U. Vishkin. Using simple abstraction to reinvent computing for parallelism. Communications of the ACM (CACM) 54(1), pages 75-85, January 2011. https://dl.acm.org/citation.cfm?id=1866757
X. Wen and U. Vishkin. FPGA-based prototype of a PRAM-on-chip processor, ACM Computing Frontiers, Ischia, Italy, May 5-7, 2008.
U. Vishkin. Is Multicore Hardware for General-Purpose Parallel Processing Broken? Communications of the ACM (CACM), Volume 57, No. 4, pages 35-39, April 2014. https://cacm.acm.org/magazines/2014/4/173217-is-multicore-hardware-for-general-purpose-parallel-processing-broken/fulltext
Uzi Vishkin has been Professor at the University of Maryland Institute for Advanced Computer Studies (UMIACS) since 1988. Prior affiliations included Technion, IBM T.J. Watson, NYU, and Tel Aviv University. Per his ACM Fellow citation, he “played a leading role in forming and shaping what thinking in parallel has come to mean in the fundamental theory of Computer Science”. Later, his team’s work on his explicit multi-threaded (XMT) many-core architecture refuted the common wisdom that the richest theory of parallel algorithms, known as PRAM, is irrelevant for practice. He is an ISI-Thomson Highly Cited Researcher and a Maryland Innovator of the Year for his XMT venture.
This course will introduce students to high performance computing (HPC) on modern parallel computing systems. At the end of the course students should be aware of the important distinctions between shared and distributed memory models, and between data and task based parallelism, and know how to write simple parallel applications. After an introduction covering the motivation for using parallelism and the types of applications that benefit from parallel execution, the course will focus mainly on the practical issues of programming modern parallel computers using OpenMP, MPI, and CUDA.
Introduction: examples of parallelism and HPC applications; Flynn’s taxonomy; shared and distributed memory architectures; modern supercomputers and the Top500 list;
Programming with OpenMP 4.0: the OpenMP programming model using threads; work sharing constructs; controlling the number of threads and scheduling parameters; simple OpenMP programs.
Programming with MPI: point-to-point and collective communication routines; support for application topologies; simple applications using MPI, such as image processing and matrix multiplication; irregular applications, such as molecular dynamics simulations and cellular automata.
Programming with CUDA: the architecture of modern GPUs and their use in scientific computing; memory allocation on the GPU, and transferring data between the GPU and the host; simple applications such as vector addition and image processing; tiled algorithms and optimising CUDA applications through the use of shared memory.
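The tiling idea behind the CUDA shared-memory optimization can be previewed on the CPU. The sketch below is a cache-blocked matrix multiplication in C++ for n x n row-major matrices. The name `matmul_tiled` and the default tile size of 32 are hypothetical choices, not taken from the course. A CUDA kernel applies the same decomposition by having each thread block stage tiles of A and B in shared memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: tiled (blocked) multiplication C = A * B of n x n row-major matrices.
// Working on tile x tile blocks lets each block of A, B and C be reused while it is
// still in cache, which is the CPU analogue of staging tiles in GPU shared memory.
std::vector<double> matmul_tiled(const std::vector<double>& A, const std::vector<double>& B,
                                 std::size_t n, std::size_t tile = 32) {
    assert(A.size() == n * n && B.size() == n * n && tile > 0);
    std::vector<double> C(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t kk = 0; kk < n; kk += tile)
            for (std::size_t jj = 0; jj < n; jj += tile)
                // multiply one tile of A by one tile of B, accumulating into a tile of C
                for (std::size_t i = ii; i < std::min(ii + tile, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + tile, n); ++k) {
                        const double a_ik = A[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + tile, n); ++j)
                            C[i * n + j] += a_ik * B[k * n + j];
                    }
    return C;
}
```

Tiles need not divide n evenly, because the `std::min` bounds handle the partial tiles at the edges.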
Knowledge of the C programming language, and basic linear algebra operations.
Books (useful but not essential for the course)
- Introduction to Parallel Programming, Peter Pacheco, published by Morgan Kaufmann, 2011. ISBN 9780123742605. http://store.elsevier.com/product.jsp?isbn=9780123742605
- Using MPI, Gropp, Lusk, and Skjellum, published by MIT Press, second edition, 1999. ISBN 9780262571326. https://mitpress.mit.edu/books/using-mpi
- Programming Massively Parallel Processors, David B. Kirk and Wen-mei W. Hwu, third edition, pub. Morgan Kaufmann, 2016. ISBN 978-0-12-811986-0. https://www.elsevier.com/books/programming-massively-parallel-processors/kirk/978-0-12-811986-0
- For MPI: http://www.mcs.anl.gov/mpi/
- For OpenMP: https://computing.llnl.gov/tutorials/openMP/
- For CUDA: https://developer.nvidia.com/cuda-zone
- For information on the world’s fastest supercomputers: http://www.top500.org/
David W. Walker is Professor of High Performance Computing in the School of Computer Science and Informatics at Cardiff University. He received a B.A. (Hons) in Mathematics from Jesus College, Cambridge in 1976, an M.Sc. in Astrophysics from Queen Mary College, London, in 1979, and a Ph.D. in Physics from the same institution in 1983. Professor Walker has conducted research into parallel and distributed algorithms and applications for the past 30 years in the UK and USA, and has published over 140 papers on these subjects. Professor Walker was instrumental in initiating and guiding the development of the MPI specification for message-passing, and has co-authored a book on MPI. He also contributed to the ScaLAPACK library for parallel numerical linear algebra computations. Professor Walker’s research interests include software environments for distributed scientific computing, problem-solving environments and portals, and parallel applications and algorithms. Professor Walker is a Principal Editor of Computer Physics Communications, the co-editor of Concurrency and Computation: Practice and Experience, and serves on the editorial boards of the International Journal of High Performance Computing Applications and the Journal of Computational Science.