On December 16, 2015, Amy Keating, professor of Biology and Biological Engineering, told the Dean’s Breakfast audience about her “true love”—proteins. Keating studies the way proteins interact with each other and with other molecules, building computational models that predict a protein’s behavior and analyzing how proteins have evolved a diverse array of functions over time.

Biology = Information + Chemistry

Keating thinks of biology as “the intersection between information and chemistry.” DNA provides the blueprint and proteins carry out the chemistry—the interactions with DNA, small molecules, cell membranes, and other proteins that make up the business of life inside a cell.

With her background in physical organic chemistry, Keating is partial to the “chemistry” half of the equation. She is trained to look at physical structures and interactions between molecules, but “structurally,” Keating explained, “DNA is boring.” With its double helix backbone and string of four repeating nucleotides, the structure of a DNA molecule is relatively simple. Proteins, on the other hand, are wildly complicated and varied. Proteins begin as linear chains of 20 different amino acids, which are in turn folded into the modular building blocks of highly complex, specialized machines, such as “scissors” that snip DNA or “motors” that transport molecules throughout the cell.

Protein Engineering by Nature

The Keating laboratory specializes in what is called the “paralog problem,” exploring how gene duplication and mutation enable the reprogramming of proteins in nature. Families of proteins that are similar in structure can theoretically arise when genes are accidentally duplicated during cell division, or at some other point in the cell’s lifespan. With one copy making the original protein as usual, the spare copy can mutate without interrupting important cell functions. The resulting “paralog” may have a strong family resemblance to the original protein’s structure, but subtle changes to its sequence over generations have the potential to give it a new function.

In one study, Keating used a family of gene-regulating proteins called basic region-leucine zipper (bZIP) transcription factors as a model to see how protein reprogramming occurs in nature. Keating compared the protein-protein interactions across five multicellular species (human, sea squirt, fruit fly, nematode, and sea anemone) as well as two single-celled species (a choanoflagellate and the yeast Saccharomyces cerevisiae). Although a billion years of evolution separate them from their common ancestor, all seven species employ members of the bZIP family, as do eukaryotes in general.

Keating found that the metazoans had developed richer, more complex bZIP interaction networks than their distant single-celled cousins, suggesting that signaling and gene expression proteins like bZIP are important factors in the evolution of diverse species. Moreover, Keating discovered that these networks proliferated through small changes to amino acid sequences that nevertheless had large effects on protein function and interactivity. In other words, tiny changes in bZIP proteins had an outsized influence on the development of new species.

Protein Engineering by Design

Ultimately, Keating wants to be able to reprogram proteins in the laboratory, just as they are reprogrammed in nature, and works to design small molecules that interact with human bZIP proteins. The trouble is, while nature has been engineering proteins for billions of years, Keating has been engineering them for only 14 years, and we still don’t know many of the rules that govern protein interactivity.

The sheer complexity of proteins makes understanding their interactivity difficult. Keating studies bZIPs because their structure is simple when compared with other proteins. Their distinctive “coiled coils” are made up of repeating units of just seven amino acids. But even with these short sequences, Keating’s task is still daunting. The number of possible sequences to consider is sizable – about 3.44 × 10^45 total. Only some of these sequences will have interesting functional implications, and fewer still will bind reliably and accurately with the bZIP protein.
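The talk summary does not say how the figure of roughly 3.44 × 10^45 was derived. One reading that reproduces it, offered here only as an assumption, is a free choice among the 20 amino acids at each of 35 positions, that is, five seven-residue repeats. A minimal Python sketch of that arithmetic (the names and the choice of five repeats are hypothetical, not from the talk):

```python
# Sketch of the sequence-space arithmetic under a stated assumption:
# 20 standard amino acids are allowed at every position, and the peptide
# spans five seven-residue repeats (35 positions). The number of repeats
# is an assumption chosen because it reproduces the quoted figure.
AMINO_ACIDS = 20
HEPTAD_LENGTH = 7
NUM_HEPTADS = 5  # hypothetical, not stated in the article


def sequence_space_size(positions: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct sequences of the given length."""
    return alphabet ** positions


total = sequence_space_size(HEPTAD_LENGTH * NUM_HEPTADS)
print(f"{total:.2e}")
```

Under the same assumption, a single seven-residue repeat already allows 20^7, or about 1.28 billion, sequences, which gives a sense of why exhaustive testing in the laboratory is impractical.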

Keating works to uncover the rules that govern protein interactions by using computational modeling to assess and predict their interactivity. Keating relies on both structure-based scoring, which uses the principles of chemistry and physics to model and evaluate a molecule, and data-based scoring, which fits a model to a large amount of data using regression techniques from machine learning. Keating also uses a variety of biophysical assays of protein-protein interactions to help collect the enormous amount of data that her computational approach requires.
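To make the idea of data-based scoring concrete, here is a toy Python sketch of a linear regression fitted to peptide-pair data. It is not Keating's model: the one-hot encoding, the training loop, the function names, and above all the made-up scoring rule that stands in for assay measurements are all assumptions chosen for illustration.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
HEPTAD = 7  # residues per peptide in this toy example


def active_features(seq_a, seq_b):
    """Sparse one-hot encoding: indices of the features that equal 1."""
    pair = seq_a + seq_b
    indices = [pos * len(ALPHABET) + ALPHABET.index(res)
               for pos, res in enumerate(pair)]
    indices.append(len(pair) * len(ALPHABET))  # bias term, always active
    return indices


def predict(weights, indices):
    """Linear score: sum of the weights of the active features."""
    return sum(weights[i] for i in indices)


def fit(examples, n_features, epochs=50, lr=0.05):
    """Least-squares regression by plain stochastic gradient descent."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for indices, target in examples:
            error = predict(weights, indices) - target
            for i in indices:
                weights[i] -= lr * error
    return weights


def toy_score(seq_a, seq_b):
    """Made-up stand-in for a measured interaction score (not real biophysics)."""
    pair = seq_a + seq_b
    return pair.count("L") - pair.count("P")


def random_peptide(rng):
    return "".join(rng.choice(ALPHABET) for _ in range(HEPTAD))


# Synthetic training set: random peptide pairs labeled by the toy rule.
rng = random.Random(0)
pairs = [(random_peptide(rng), random_peptide(rng)) for _ in range(2000)]
training = [(active_features(a, b), toy_score(a, b)) for a, b in pairs]
weights = fit(training, n_features=2 * HEPTAD * len(ALPHABET) + 1)

# Score a pair that was not constructed from the training generator.
held_out = ("LKELEDK", "LEEPKNK")
predicted = predict(weights, active_features(*held_out))
print(round(predicted, 3), toy_score(*held_out))
```

In a real setting the labels would come from biophysical assays rather than a hand-written rule, and the features and model would likely be richer. The sketch only shows the general shape of the approach: encode sequences as features, fit weights to measured data, then score unseen pairs.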

Despite the difficulty, Keating is making advances in protein engineering. For example, Keating has built up a catalogue of synthetic coiled coil modules called SYNZIPs and their interactions – in essence, a toolbox for the protein engineer. Although protein engineering is still in its infancy, computational and technical advances – such as those in cryo-electron microscopy (cryo-EM), which have improved resolution by about one order of magnitude and made it possible to image large biological molecules in near-atomic detail – are providing exciting new opportunities to understand nature’s interactions and to design our own.