Course Description

Dan Gusfield (U California Davis), [introductory/intermediate, 8 hours]

ReCombinatorics: The Algorithmics and Combinatorics of Phylogenetic Networks with Recombination

The work discussed in this course falls into the emerging area of Population Genomics. I will first introduce the area and then talk about models, problems and combinatorial algorithms involved in the inference of recombination from population data.

A phylogenetic network (or Ancestral Recombination Graph) is a generalization of a tree, allowing structural properties that are not tree-like. With the growth of genomic and population data, much of which does not fit ideal tree models, and the increasing appreciation of the genomic role of such phenomena as recombination (crossing-over and gene-conversion), recurrent and back mutation, horizontal gene transfer, and mobile genetic elements, there is greater need to understand the algorithmics and combinatorics of phylogenetic networks.

In this course, I will discuss a range of recent algorithmic and mathematical results on phylogenetic networks with recombination and show applications of these results to several issues in Population Genomics. The methods involve combinatorial algorithms and graph theory; both theoretical and empirical results will be discussed.


All the papers and associated software can be accessed at wwwcsif/

Andrey Rzhetsky (U Chicago), [introductory, 6 hours]

Trees, Networks, and their Use in Systems Biology

A plethora of biomedical problems have been productively tackled in recent years through inference and computation over trees and networks. The successful applications span evolutionary biology, medical and statistical genetics, epidemiology, biochemistry, and the interface of health sciences and sociology. This course will give an overview of how computational approaches and domain problems come together in these topics, with an introduction to the relevant modeling and inference approaches.


Robert Stevens (U Manchester) & James Malone (European Bioinformatics Institute, Hinxton), [introductory/intermediate, 4 hours]


Unless we know what entities our data describe, those data are of reduced value. Biologists are good at naming the entities they investigate, but they are too good at it. As a consequence, there are too many names for the things we need to analyse in our data: the functions, processes, anatomical components, cells, diseases and so on. Making explicit what the entities we analyse are and how they relate to each other is the topic of bio-ontologies. This short course will introduce you to the area of bio-ontologies: what they are, how they are built and what can be done with them once we have them. By the end of the course, attendees will know what an ontology is, how ontologies are used in bioinformatics, which bio-ontologies are in use, and the outlines of ontology authoring.


Martin Tompa (U Washington Seattle), [introductory/intermediate, 4 hours]

Comparative Sequence Analysis in Molecular Biology

In computational molecular biology, "phylogenetic footprinting" is a standard idea that is used to predict functional regions within a biological sequence (DNA, RNA, or protein). The procedure is to find corresponding sequences from several related species, and within these to identify those regions that have mutated less than expected over the course of evolution, suggesting that these regions are under selective pressure due to biological functionality.

We will discuss various algorithms for and applications of phylogenetic footprinting and demonstrate some of these using software available on the web. We will then turn our attention to the larger problem of doing phylogenetic footprinting on a whole-genome scale, demonstrating the use of a genome browser available on the web and discussing the issue of assessing its reliability.
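
As a rough illustration of the footprinting idea described above, the following minimal Python sketch scores each column of an already-aligned set of sequences by how strongly the species agree, then reports windows whose average agreement is high. It is only a toy under stated assumptions, not one of the algorithms covered in the course: the function names, the toy alignment, the window size and the fixed threshold are all hypothetical, and a real method would compare observed mutations with the number expected under an evolutionary model rather than use a fixed cutoff.

```python
from collections import Counter


def column_conservation(alignment):
    """Per column: fraction of sequences that carry the most common residue.

    Gap characters ('-') never count as agreement.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_windows(alignment, window=5, threshold=0.9):
    """Return merged half-open (start, end) spans of well-conserved windows.

    A window qualifies when its mean column conservation is >= threshold
    (a hypothetical fixed cutoff standing in for "less mutation than expected").
    """
    scores = column_conservation(alignment)
    spans = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Overlaps or touches the previous span: extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


# Toy alignment (invented data): variable flanks around a shared "TTGACA" core.
toy_alignment = [
    "ACGTTTGACAGGCT",
    "TGCATTGACATCAG",
    "GATCTTGACACTGA",
    "CTAGTTGACAAGTC",
]

print(conserved_windows(toy_alignment))  # prints [(4, 10)]
```

On the toy data, only the columns holding the shared core are reported, which mirrors the intuition that regions under selective pressure stand out against faster-mutating surroundings.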


Alfonso Valencia (Spanish National Cancer Research Centre, Madrid), [advanced, 4 hours]

A Bioinformatics Perspective of Personalized Medicine

The rapid progress of genomics is making the use of personal genomic information a pressing daily reality. In this scenario, Bioinformatics plays a central role. The organization and analysis of individual genomes is a complex task involving data organization, integration and interpretation challenges. This task requires a blend of engineering and scientific developments at each step of the analysis, touching many areas in which the development of computational methods is very active.

In the context of the CNIO clinical setting, my group is developing, in collaboration with the clinicians, both the technical framework for interpreting the results and the science required at various levels of the analysis. Based on the experience accumulated in this new area of application, I will review the key problems in the analysis of high-throughput sequencing information, prediction of the incidence of mutation in proteins and other coding regions, analysis of splicing and splice sites, comparative analysis of affected pathways, and extraction of mutation-drug-disease relations from databases and text sources.


Limsoon Wong (National University of Singapore), [introductory/intermediate, 6 hours]

Using Biological Networks for Protein Function Prediction, Biomarker Identification, and Other Problems in Computational Biology

While sequence homology search has been the main workhorse in protein function prediction, it is not applicable to a significant portion of novel proteins that do not have informative homologs in sequence databases. Similarly, while statistical tests and learning algorithms based purely on gene expression profiles have been popular for analyzing disease samples, critical issues remain in the understanding of diseases based on the differentially expressed genes suggested by these methods. In the past decade, a large number of databases providing information on various types of biological networks have become available. These databases make it possible to tackle biological problems in novel ways. This course presents a review of biological network databases and an introduction to network-based approaches for protein function prediction, biomarker identification, and other interesting challenges in computational biology.
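
To make the idea of network-based function prediction concrete, here is a minimal Python sketch of one simple strategy, neighbour majority voting: an unannotated protein is assigned the function most common among its direct interaction partners. This is an assumption chosen for illustration, not necessarily a method presented in the course; the function name, the protein identifiers and the annotations are all hypothetical.

```python
from collections import Counter


def predict_function(protein, interactions, annotations, top_k=1):
    """Rank candidate functions for `protein` by votes from its neighbours.

    interactions: iterable of undirected (protein_a, protein_b) pairs.
    annotations:  dict mapping a protein to a set of known function labels.
    """
    neighbours = set()
    for a, b in interactions:
        if a == protein and b != protein:
            neighbours.add(b)
        elif b == protein and a != protein:
            neighbours.add(a)

    votes = Counter()
    for neighbour in neighbours:
        # Each neighbour casts one vote per function it is annotated with.
        votes.update(annotations.get(neighbour, ()))
    return [label for label, _ in votes.most_common(top_k)]


# Invented toy network: protein "X" has no annotation of its own.
toy_interactions = [("P1", "X"), ("P2", "X"), ("X", "P3"), ("P4", "P1")]
toy_annotations = {
    "P1": {"kinase"},
    "P2": {"kinase", "transport"},
    "P3": {"kinase"},
    "P4": {"transport"},
}

print(predict_function("X", toy_interactions, toy_annotations))  # prints ['kinase']
```

Because the sketch only looks at direct neighbours and ignores edge reliability, it works even when no informative sequence homolog exists, but it is far cruder than the approaches a full treatment would cover.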


Ying Xu (U Georgia), [advanced, 8 hours]

Cancer Bioinformatics

The availability of large-scale omic data for multiple types of cancers in the public domain, in conjunction with our current understanding of cancer, allows computational cancer biologists to study cancer in a comparative and more systematic manner, which makes it possible to discover previously unknown relationships among different aspects of cancer initiation, growth and metastasis. In this short course, I will present an overview of what computational researchers can do to help solve a variety of challenging cancer-related problems. I plan to cover the following topics: (a) a brief overview of cancer biology by reviewing the hallmarks of cancer; (b) a brief overview of information derivable through analyses and comparative analyses of large-scale transcriptomic data; (c) cancer classification based on transcriptomic data: examining cancers from multiple perspectives; and (d) a taste of hypothesis-driven cancer research through transcriptomic data mining.


Zohar Yakhini (Agilent Laboratories), [intermediate/advanced, 6 hours]

Algorithmics and Statistics in the Analysis of High Throughput Molecular Measurement Data

The courses will be organized as follows: