About the Book (from the Preface):
GOALS OF THIS BOOK
While attending a workshop or conference on Structural Bioinformatics you may overhear tidbits of conversations that are interspersed with phrases such as “phosphofructokinase regulation”, “singular value decomposition”, or “class instantiation”. The usage of such terminology, arising from biochemistry, mathematics, and computer science respectively, would not be surprising in this setting because these three areas of investigation have become the core of expertise required for the study of structural bioinformatics:
- Biochemistry provides realistic knowledge about protein structure and functionality.
- Mathematics helps us to build models that represent protein structure and functionality.
- Computation allows us to apply the mathematical models to detect patterns and make predictions about biological systems. Computation is also necessary for the creation of visual displays that may be used to demonstrate various aspects of protein structure and function.
This book addresses all three areas with an emphasis on computational techniques:
- Topics in the book deal primarily with protein structure and there are many exercises that are grounded in biological problems at the molecular level.
- The book encourages mathematical analysis because it provides a firm foundation for subsequent computations.
- Computational techniques are covered by providing and analyzing several Python scripts that execute within the Chimera environment. This supplemental material (over 140 scripts) is available from the book’s website and it provides solutions for the various exercises at the end of the chapters. Due attention has been given to the modularity of the scripts with the goal of providing the reader with a toolkit of Python classes that can be directly used in the reader’s own applications or used as starting scripts that provide base functionality via class inheritance.
What are the typical computations that would be done with a script? A script working with Chimera can easily use data downloaded from the PDB (Protein Data Bank). For example, if your algorithm requires protein atomic coordinates, a script can access a PDB file, relying on Chimera to parse the content and make these coordinates available as input to the algorithm. If the algorithm modifies or generates atom coordinates, then Chimera can be used to provide a visualization of the altered molecule. Moreover, Python scripts can be designed to add graphical constructs that help to describe various aspects of protein structure and function. A visual representation of an abstraction (for example, a plane that has some relevance to a biological construct) helps us to discover configurations that are made clearer with the appropriate visualization. In addition to providing visual output, changing the display can be indispensable during debugging and for final validation of the script that implements the algorithm.
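To make concrete what "parsing the content" involves, here is a minimal sketch in plain Python that pulls coordinates out of the fixed-column ATOM records of a PDB file. It is only an illustration and does not use Chimera: inside Chimera the parsing is done for you, and the function name `parse_atom_coordinates` and the sample record are hypothetical, not taken from the book's scripts.

```python
def parse_atom_coordinates(pdb_text):
    """Return a list of (atom_name, residue_name, (x, y, z)) for ATOM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK, and all other record types
        atom_name = line[12:16].strip()     # columns 13-16
        residue_name = line[17:20].strip()  # columns 18-20
        x = float(line[30:38])              # columns 31-38
        y = float(line[38:46])              # columns 39-46
        z = float(line[46:54])              # columns 47-54
        atoms.append((atom_name, residue_name, (x, y, z)))
    return atoms


# A made-up ATOM record, assembled field by field to show the column layout.
SAMPLE_LINE = "".join([
    "ATOM  ",    # columns 1-6: record name
    "    1",     # columns 7-11: atom serial number
    " ",         # column 12: blank
    " CA ",      # columns 13-16: atom name
    " ",         # column 17: alternate location indicator
    "MET",       # columns 18-20: residue name
    " ",         # column 21: blank
    "A",         # column 22: chain identifier
    "   1",      # columns 23-26: residue sequence number
    "    ",      # columns 27-30: insertion code and padding
    "  26.266",  # columns 31-38: x coordinate
    "  25.413",  # columns 39-46: y coordinate
    "   2.842",  # columns 47-54: z coordinate
])

atoms = parse_atom_coordinates(SAMPLE_LINE)
```

The resulting list of coordinate tuples is the kind of input an algorithm would receive, whichever parser produced it.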
SCRIPTING WITH CHIMERA
This book will help any reader who wishes to use the scripting capabilities of Chimera. There are over 60 exercises that involve development of Python scripts. Most of these scripts are available as supplemental material (downloadable from http://structuralbioinformatics.com). There are several approaches to the use of these scripts:
- To get experience with Python scripting in the Chimera environment, the reader may choose to work through an exercise and then compare this answer with the provided script. This “learning by doing” helps to build confidence leading to the solution of more challenging problems. Some of the exercises duplicate calculations done in research papers. Doing such an exercise is not meant to verify or double-check the result but rather to gain experience in the solving of such problems so that the reader can tackle similar problems in the future.
- In many cases, the exercises demonstrate the various applications of linear algebra. For example, eigenvectors and eigenvalues are not just theoretical mathematical abstractions; they can be used to build visual objects that are then displayed in the Chimera window. Last but not least: scripting is fun to do. There is a compelling sense of accomplishment when the execution of a script runs a graphical user interface (GUI) or produces immediate visual feedback in the Chimera window.
- The reader may use the script as a starting point for the development of a similar application. There are many situations, especially during GUI development, when a working script can show the reader how some code snippet handles a problem that is difficult because it involves some exasperating implementation issue that would otherwise be resolved only after extensive searching on the Internet.
- The reader may use classes from the StructBio toolkit in the development of an application. In addition to this, the ability to display some structural entity related to the algorithm can help in the debugging of a complex program that involves several difficult and complicated steps. I have spent several months in the design and development of these classes with the hope that they would be useful for the future scripting efforts of both myself and the readers of this book. These classes provide several tools that may be used for computations such as structure overlap, data plotting, scenographics, and the display of residue networks.
The book is mainly intended for people who wish to build Python scripts to extend the capabilities of Chimera. It can also be used by students, typically third or fourth year undergraduates, who have completed introductory courses in bioinformatics and wish to go further in the study of structural biology. Prerequisites include introductory linear algebra, elementary calculus, bioinformatics (“biology 101”, sequence analysis, etc.), computer programming and an introduction to algorithms. We assume that the reader can do elementary Python programming. An introduction to Python is not included in the book because there are several excellent sources for such information (see references at the end of Chapter 2).
The first chapter presents some introductory material on protein structure and subsequent chapters cover various selected algorithms and topics that are typically seen in structural bioinformatics. The algorithms are mainly used as a source of exercises that demonstrate scripting techniques in Chimera. As such, the chosen algorithms are not meant to provide a comprehensive coverage of the algorithms used in structural bioinformatics.
Chapter 1 introduces protein structure. This is essentially a review of material that may have been covered in earlier courses. The concepts of primary, secondary, tertiary, and quaternary structure are introduced and explained. The material serves to underscore the most salient aspects of protein structure while providing several figures that illustrate the visualization capabilities of Chimera. Exercises cover the use of Chimera menu invocations to produce various types of displays related to protein structure.
Chapter 2 introduces Python scripting for the Chimera environment and rapidly progresses to a description of the Chimera Object Hierarchy. Various objects and their attributes are covered so that the programmer can start writing scripts dealing with molecular structure including hydrogen bonds. Some elementary file I/O is discussed so that scripts can deal with multiple PDB files. A final exercise shows how polygonal surfaces can be added to a display (an introduction to “scenographics”).
Chapter 3 covers scripts that rely on the calculation of distances (most often interatomic distances). Exercises deal with distance shells, contact maps, inertial axes, and dehydrons. The Solids and Ellipsoid classes are introduced to provide more scenographics. Generation of contact maps introduces the reader to scripts that can display plots.
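As a rough idea of what a contact map computation looks like, the following sketch marks two residues as "in contact" when their representative points (for example, alpha carbons) lie within a cutoff distance. The 8.0 Å cutoff and the function names are assumptions made for this illustration; the book's scripts may use different criteria and rely on Chimera for the coordinates.

```python
import math


def distance(p, q):
    """Euclidean distance between two 3D points given as (x, y, z) tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def contact_map(coords, cutoff=8.0):
    """Return a square boolean matrix: True where two points are within cutoff."""
    n = len(coords)
    return [[distance(coords[i], coords[j]) <= cutoff for j in range(n)]
            for i in range(n)]


# Three made-up positions: the first two are close, the third is far away.
example = contact_map([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)])
```

The matrix is symmetric with a True diagonal, so a plotting script only needs to draw one triangle of it.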
Chapter 4 deals with scripts that rely on the calculation of angles (usually dihedral angles). The chapter provides more material for plotting, using Ramachandran plots as an example. Lagrange optimization (introduced in Chapter 3 for the generation of inertial axes) is applied again for the creation and display of a least squares plane. More examples are presented for GUI implementations using Tkinter classes.
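For readers who have not computed a dihedral angle before, here is a minimal sketch in plain Python using the standard atan2 formulation on four points. For a Ramachandran plot the four points would be backbone atoms: C(i-1), N, CA, C for phi and N, CA, C, N(i+1) for psi. The function names are hypothetical and the code does not use Chimera.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up points forming a right-angle twist about the z axis.
angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Using atan2 rather than an arccosine of the normals keeps the sign of the angle, which a Ramachandran plot needs.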
Chapter 5 covers the supporting theory and implementation of scripts to do structure overlap. This will duplicate current functionality of Chimera, but the idea is to learn the mathematics behind that functionality. This has intrinsic value and, more significantly, it gives the reader the skills to generate an overlap class that can be used in larger applications. For example, if you need to do several hundred structure alignments then menu invocations will not be effective – you need a script that can repeatedly call a function to do overlap.
Chapter 6 is a short discussion about energy functions. Its main importance is to provide a simple energy calculation for a script in Chapter 7.
Chapter 7 introduces rotamers and the Chimera functions that work with them. Goldstein’s dead-end elimination is covered as a prelude to side chain packing. Scripts provide examples of more extensive GUI development including facilities to do strip charts. Scripts to do plotting of 3D surfaces are introduced.
Chapter 8 covers residue networks and provides examples for the use of the Graph classes in the StructBio package. The final exercise describes a script that provides a visual display of a network for residue contact rearrangements.
Appendix A discusses the implementation of graphical user interfaces that can be used for Python applications that require a more sophisticated interaction with the user. Since the Chimera window will still be the main window of concern, these GUIs are typically modeless dialogs that set up data input for the application. The appendix gives a quick introduction to Tkinter along with several sample scripts set up as a “widget buffet”. These can be extracted, modified and placed into the reader’s applications.
Appendix B introduces the scenographics toolkit. This includes several classes that can be used to place various lines, surfaces and solids into the Chimera display. An extremely versatile set of classes under the heading of Parametric Surfaces includes classes for Frenet frame surfaces, extrusion surfaces, surfaces of revolution, and ruled surfaces. The LabelGroups class can be used to place labels into the display at specified 3D positions.
Appendix C introduces the GraphBase class that can be used to implement abstract graphs via the Vertex and Arc classes. Methods are included for the computation of shortest paths and for the extraction of various sub-graphs. A set of derived classes (Graph, Node, and Edge) is provided for the construction of 3D graphs that will be displayed in the Chimera window. Instantiation of these classes will also generate the underlying Vertex and Arc objects so that various graph algorithms can be applied to the constructed graph.
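To give a feel for the kind of shortest-path computation mentioned here, the sketch below runs a breadth-first search over an unweighted graph stored as a plain dictionary. It deliberately does not use the GraphBase, Vertex, or Arc classes, whose interfaces are not described in this preface; the function name and the small residue-like network are made up for illustration.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return a shortest path (fewest edges) from start to goal, or None."""
    previous = {start: None}  # maps each reached vertex to its predecessor
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            path = []
            while vertex is not None:  # walk the predecessors back to start
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in previous:
                previous[neighbor] = vertex
                queue.append(neighbor)
    return None


# A made-up contact network with hypothetical residue labels.
network = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
path = shortest_path(network, "A", "E")
```

A weighted network (for example, edges weighted by interatomic distance) would call for Dijkstra's algorithm instead of breadth-first search.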
Appendix D covers various scripts to do plotting including bar charts, density plots, scatter plots, plots of parametric curves, and 3D surface plots. Plots can be shown in their own windows or within a modeless dialog designed by the programmer.
Appendix E provides a review of dynamic programming to act as background material for the sequence alignment algorithms presented in Chapter 5.
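As a small reminder of how dynamic programming applies to sequence alignment, the following sketch fills a Needleman-Wunsch style table and returns the optimal global alignment score. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration, not the parameters used in the book.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b (linear gap cost)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell extends the best of three smaller subproblems.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    return score[-1][-1]


example_score = global_alignment_score("ACGT", "AGT")
```

Recovering the alignment itself, rather than just its score, requires a traceback through the same table.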