Shah Lab / Research

Gene expression is a highly stochastic process leading to variability in mRNA and protein abundances even within an isogenic population of cells grown in the same environment. This heterogeneity is commonly referred to as noise in gene expression. Biochemical processes such as transcription, translation, and mRNA and protein decay are each noisy and contribute to the overall variability in gene expression. While noise in bacteria is thought to be predominantly driven by translation, noise in eukaryotes has been attributed mainly to transcriptional processes. However, the stochastic nature of mRNA and protein decay has often been ignored when estimating noise. As a result, noise due to transcription or translation is likely overestimated.

In addition, the sequence features of an mRNA that contribute to noise remain unexplored. Using single-molecule FISH combined with flow cytometry (FISH-Flow), we are studying the variation in mRNA and protein decay in single cells. Understanding how the decay dynamics of mRNAs and proteins contribute to noise will help us model noise more accurately and provide insights into how living systems regulate noise through various biochemical processes.
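
To make the role of decay noise concrete, here is a minimal sketch that assumes the simplest possible model: constitutive transcription at a constant rate and first-order mRNA decay. It is simulated exactly with Gillespie's stochastic simulation algorithm. The function name `simulate_birth_death` and the rate values are illustrative choices, not parameters from our experiments.

```python
import random

def simulate_birth_death(k_syn, k_deg, t_end, seed=0):
    """Gillespie simulation of constitutive mRNA synthesis (rate k_syn) and
    first-order decay (rate k_deg per molecule).

    Returns the time-weighted mean and variance of the mRNA copy number,
    ignoring an initial burn-in of ~10 mRNA lifetimes."""
    rng = random.Random(seed)
    t_burn = 10.0 / k_deg
    t, n = 0.0, 0
    w_total = m1 = m2 = 0.0
    while t < t_end:
        a_syn = k_syn            # propensity of a synthesis event
        a_deg = k_deg * n        # propensity of a decay event
        a_tot = a_syn + a_deg
        dt = rng.expovariate(a_tot)
        # Weight the current state n by the time spent in it (after burn-in).
        start, stop = max(t, t_burn), min(t + dt, t_end)
        if stop > start:
            w = stop - start
            w_total += w
            m1 += w * n
            m2 += w * n * n
        t += dt
        # Choose which reaction fires, proportional to its propensity.
        if rng.random() * a_tot < a_syn:
            n += 1
        else:
            n -= 1
    mean = m1 / w_total
    return mean, m2 / w_total - mean ** 2


mean, var = simulate_birth_death(k_syn=10.0, k_deg=1.0, t_end=5000.0)
print(f"mean = {mean:.2f}, variance = {var:.2f}, Fano factor = {var / mean:.2f}")
```

For this birth-death model the stationary copy-number distribution is Poisson, so the Fano factor (variance/mean) should come out close to 1. In the linear noise approximation the steady-state variance is (k_syn + k_deg·⟨n⟩)/(2·k_deg), so synthesis and decay each contribute half of it. A model that treats decay as deterministic must therefore attribute all of the observed variance to synthesis. That is one way transcriptional noise can end up overestimated.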

Evolution of translational regulation

Studies of how organisms evolve in the lab have identified several key features of the evolutionary process, such as the dynamics of clonal interference and epistatic interactions between adaptive mutations. While the effects of individual mutations on organismal fitness have been characterized, how these mutations alter protein and DNA synthesis to produce faster growth remains a critical gap in our understanding of evolutionary dynamics. A major goal of our research program is to provide a mechanistic understanding of the phenotypic changes that occur at the transcriptional and translational levels during adaptive evolution.

We use -omics approaches such as RNA-seq and Ribo-seq to study changes in transcriptional and translational regulation at the whole-genome level. We take advantage of model systems such as the long-term evolution experiment (LTEE) in E. coli to understand how gene regulation changes over long periods of time and to determine the impact of these changes on fitness.
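
One common way to combine the two assays is translational efficiency (TE): the ribosome-footprint abundance for a gene divided by its mRNA abundance. The sketch below assumes both libraries have already been reduced to per-gene read counts and uses a simple counts-per-million normalization. The helper names and the pseudocount are illustrative and do not come from a specific pipeline.

```python
import math

def cpm(counts):
    """Scale raw read counts to counts per million mapped reads."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}

def translational_efficiency(ribo_counts, rna_counts, pseudocount=0.5):
    """log2 ratio of normalized ribosome-footprint counts to normalized mRNA counts."""
    ribo, rna = cpm(ribo_counts), cpm(rna_counts)
    shared = ribo.keys() & rna.keys()
    return {g: math.log2((ribo[g] + pseudocount) / (rna[g] + pseudocount))
            for g in sorted(shared)}

# Hypothetical read counts for two genes in one sample.
ribo_seq = {"geneA": 100, "geneB": 300}
rna_seq = {"geneA": 100, "geneB": 100}
print(translational_efficiency(ribo_seq, rna_seq))
```

Here geneB carries more ribosomes per transcript than geneA. A published analysis would usually go further, for example by counting footprints only over coding sequences and by using dedicated statistical models to test for changes in TE between conditions such as ancestral and evolved clones.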

Dynamics of protein synthesis

Translating genetic information into proteins is the most expensive process a cell undertakes and is central to all life. The basic principles of translation are simple: free-floating ribosomes bind to mRNAs and translate each transcript one codon at a time, with each codon recognized by a corresponding tRNA. Our understanding of how specific proteins and RNAs participate in translation has improved in recent years. Even so, we still lack a coherent view of which factors set the global pace of protein synthesis in a cell, and a way to integrate the many components of translation that have each been measured empirically. Systematically studying protein synthesis therefore requires high-throughput genomic tools that capture translation dynamics across an entire cell.

We have developed a mathematical and computational modeling framework that tracks every ribosome, tRNA, and mRNA molecule within a cell specified by its biophysical parameters. The framework allows us to determine how the translation of each gene is affected by all other genes. We also generate high-throughput ribosome-profiling datasets that provide detailed snapshots of translation by mapping every elongating ribosome in the cell. We use this combined modeling and experimental approach to study how cells modulate translation in different environments, including during disease. By quantifying translation dynamics, we investigate the mechanistic basis of adaptation in evolving populations and the evolution of translational differences between homologous genes in divergent species.
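
Our framework tracks ribosomes, tRNAs, and mRNAs across the whole cell, whereas the sketch below is far narrower. It simulates ribosomes on a single mRNA as a totally asymmetric simple exclusion process (TASEP), a standard toy model of translation, using Gillespie's algorithm. Codon-specific elongation rates stand in for tRNA availability. All rates, the footprint size, and the function name `simulate_tasep` are hypothetical.

```python
import random

def simulate_tasep(elong_rates, init_rate, term_rate, footprint, t_end, seed=0):
    """Gillespie simulation of ribosomes on one mRNA as a TASEP with extended particles.

    Each ribosome is tracked by the codon it is decoding and blocks the
    `footprint` codons ending at that position (extending upstream).
    elong_rates[i] is the rate of moving from codon i to codon i + 1; a ribosome
    on the last codon terminates at term_rate.
    Returns (proteins completed per unit time, ribosomes still bound at t_end)."""
    rng = random.Random(seed)
    length = len(elong_rates)
    ribosomes = []              # positions, most downstream ribosome first
    t, completed = 0.0, 0
    while True:
        events = []             # (propensity, kind, ribosome index)
        # Initiation requires the ribosome nearest the 5' end to be >= footprint codons in.
        if not ribosomes or ribosomes[-1] >= footprint:
            events.append((init_rate, "init", -1))
        for j, pos in enumerate(ribosomes):
            if pos == length - 1:
                events.append((term_rate, "term", j))
            elif j == 0 or ribosomes[j - 1] - pos > footprint:
                # Exclusion: the step must not overlap the ribosome ahead.
                events.append((elong_rates[pos], "step", j))
        total = sum(a for a, _, _ in events)
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick one event with probability proportional to its propensity.
        r = rng.random() * total
        for a, kind, j in events:
            r -= a
            if r < 0:
                break
        if kind == "init":
            ribosomes.append(0)
        elif kind == "step":
            ribosomes[j] += 1
        else:                   # termination releases a finished protein
            ribosomes.pop(j)
            completed += 1
    return completed / t_end, len(ribosomes)


# Hypothetical 100-codon transcript, with and without a cluster of slow codons.
fast = [10.0] * 100
slow = fast[:50] + [0.1] * 5 + fast[55:]
for name, rates in [("uniform", fast), ("slow cluster", slow)]:
    flux, bound = simulate_tasep(rates, init_rate=0.5, term_rate=10.0,
                                 footprint=10, t_end=500.0)
    print(f"{name}: {flux:.3f} proteins per unit time, {bound} ribosomes bound")
```

A cluster of slow codons caps protein output and makes ribosomes queue behind it. A whole-cell model extends this kind of gene-level effect by letting all transcripts compete for shared pools of ribosomes and tRNAs.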

Evolution of biased codon usage

The genetic code is highly redundant, with most amino acids encoded by multiple codons. Even before the first genome was sequenced, it became apparent that the synonymous codons of an amino acid are used unequally in coding sequences. Ever since, the relative importance of the adaptive and non-adaptive factors affecting this codon usage bias (CUB) has been actively debated. The lack of a coherent framework for testing alternative hypotheses, together with a focus on heuristic indices, has hindered our ability to disentangle the effects of different evolutionary pressures on codon bias.
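
Relative synonymous codon usage (RSCU) is one widely used heuristic index. It is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. To keep the minimal sketch below short, it covers only two amino acids from the standard genetic code. `SYNONYMOUS` and `rscu` are illustrative names.

```python
from collections import Counter

# Synonymous codon families for two amino acids only (standard genetic code);
# a real analysis would include every amino acid with more than one codon.
SYNONYMOUS = {
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}

def rscu(cds):
    """RSCU for each codon in SYNONYMOUS, computed from an in-frame coding sequence."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(codons)
    values = {}
    for family in SYNONYMOUS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue                       # amino acid absent: RSCU undefined
        expected = total / len(family)     # count per codon under uniform usage
        for c in family:
            values[c] = counts[c] / expected
    return values

print(rscu("".join(["AAA", "AAA", "AAG", "GGT", "GGT", "GGT", "GGC"])))
```

RSCU values above 1 flag over-used codons. The index alone cannot say whether a bias reflects selection, mutational bias, or drift, which is the limitation noted above.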

We are developing a framework that integrates mechanistic models of protein translation with population-genetics models to understand and distinguish the effects of the various adaptive and non-adaptive forces on the evolution of codon bias. Such models will allow us to make quantitative predictions about how codon usage changes under different mutation and selection regimes.
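
The classic two-codon mutation-selection-drift equilibrium illustrates how a population-genetics model turns mutation and selection into a codon-usage prediction. The model assumes a haploid population of size N and uses Kimura's diffusion approximation for fixation probabilities. It also assumes weak mutation, so each site carries one codon at a time and flips between the two. This is a textbook model, not our mechanistic framework, and the parameter values are hypothetical.

```python
import math

def fixation_probability(s, N):
    """Kimura's diffusion approximation for a single new mutant with selection
    coefficient s in a haploid population of size N."""
    if s == 0:
        return 1.0 / N
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-2 * N * s))

def preferred_codon_frequency(u, v, s, N):
    """Long-run frequency of the preferred codon at a two-codon site under weak mutation.

    u: mutation rate unpreferred -> preferred; v: preferred -> unpreferred;
    s: fitness advantage of the preferred codon."""
    rate_up = N * u * fixation_probability(s, N)      # substitutions U -> P
    rate_down = N * v * fixation_probability(-s, N)   # substitutions P -> U
    # Two-state Markov chain: stationary P frequency = rate_up / (rate_up + rate_down).
    return rate_up / (rate_up + rate_down)

for s in (0.0, 2.5e-4, 1e-3, 5e-3):
    p = preferred_codon_frequency(u=1e-8, v=2e-8, s=s, N=1000)
    print(f"s = {s:g}: preferred codon frequency = {p:.3f}")
```

With v = 2u, mutation favors the unpreferred codon, so at s = 0 the preferred codon sits at frequency 1/3. As N·s grows, selection pushes it toward fixation. Under these assumptions the odds p/(1 - p) equal (u/v)·exp(2(N - 1)s). For very large N·s the exponentials overflow, so a production version would work in log space.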

Role of epistasis in molecular evolution

The phenotypic effect of a mutation at one genetic site often depends on the alleles present at other sites, a phenomenon known as epistasis. As a result, the effect of a mutation is expected to be contingent on earlier mutations, and its fate depends on the evolutionary history of the population. Epistasis can therefore profoundly influence the process of evolution within populations and shape the patterns of divergence across species. Understanding the nature of epistatic interactions between sites will allow us to address basic questions in biology at the molecular scale. How large a role does history play in evolution? Do later events depend critically on specific earlier events, or do all events occur more or less independently?
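
Epistasis is often quantified on a log-fitness scale. For a wild type and single mutants a and b, ε = log w_ab - log w_a - log w_b + log w_wt, which is zero when the two effects combine multiplicatively. A stronger form, sign epistasis, occurs when a mutation is beneficial in one background and deleterious in another. This is the kind of contingency described above. The fitness values below are hypothetical.

```python
import math

def epistasis(w_wt, w_a, w_b, w_ab):
    """Epistasis on a log-fitness scale; 0 means the effects combine multiplicatively."""
    return math.log(w_ab) - math.log(w_a) - math.log(w_b) + math.log(w_wt)

def has_sign_epistasis(w_wt, w_a, w_b, w_ab):
    """True if either mutation's effect changes sign depending on the other site."""
    a_alone, a_after_b = w_a - w_wt, w_ab - w_b
    b_alone, b_after_a = w_b - w_wt, w_ab - w_a
    return a_alone * a_after_b < 0 or b_alone * b_after_a < 0

# Hypothetical fitnesses: mutation a is deleterious alone but beneficial after b.
w = dict(w_wt=1.0, w_a=0.9, w_b=1.05, w_ab=1.2)
print(round(epistasis(**w), 3), has_sign_epistasis(**w))
```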

We have explored the role of epistasis in the context of protein evolution. Models of protein evolution typically assume that each site in a sequence evolves independently. Nevertheless, the role of epistasis between sites during protein evolution is increasingly recognized as important, even though methods to infer epistasis are actively debated. By using models of protein stability within a population-genetic framework, we have shown that amino-acid substitutions are typically contingent on the presence of prior substitutions, and that substitutions that occur early in evolution become entrenched and difficult to modify as subsequent substitutions accrue. Such models provide key insights into the structure of a protein’s fitness landscape and the effects of historical contingency on protein evolution.
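
The following minimal sketch shows how stability alone can generate contingency and entrenchment. It assumes a two-state folding model in which fitness is proportional to the fraction of folded protein and mutations change the folding free energy additively. This is a simplification for illustration, not our published models. The ΔG values, the choice kT ≈ 0.593 kcal/mol, and the function names are assumptions.

```python
import math

KT = 0.593  # thermal energy (kcal/mol) at about 298 K

def fraction_folded(dG):
    """Two-state folding; dG = G_folded - G_unfolded in kcal/mol (negative = stable)."""
    return 1.0 / (1.0 + math.exp(dG / KT))

def fitness_effect(background_dG, ddG):
    """Change in log fitness from a mutation with stability effect ddG,
    assuming (hypothetically) that fitness is proportional to the folded fraction."""
    return (math.log(fraction_folded(background_dG + ddG))
            - math.log(fraction_folded(background_dG)))

# Hypothetical stabilities: ancestor at -1.0 kcal/mol; substitution X stabilizes
# by 2.0 kcal/mol; substitution Y destabilizes by 1.5 kcal/mol.
ancestor, ddG_X, ddG_Y = -1.0, -2.0, 1.5
print("Y in ancestor:       ", round(fitness_effect(ancestor, ddG_Y), 3))
print("Y after X:           ", round(fitness_effect(ancestor + ddG_X, ddG_Y), 3))
print("reverting X before Y:", round(fitness_effect(ancestor + ddG_X, -ddG_X), 3))
print("reverting X after Y: ", round(fitness_effect(ancestor + ddG_X + ddG_Y, -ddG_X), 3))
```

Stability effects here are strictly additive, yet fitness effects are not. Y is strongly deleterious in the ancestor but nearly neutral once X is present. After Y is fixed, reverting X becomes much more costly than it was before, which is the entrenchment described above.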

Diversification rate variation in phylogenies

Phylogenetic trees have become important tools for studying macroevolutionary patterns, in particular how diversification rates vary both through time and across lineages. Such inferences can reveal how specific geological or climatic events may have shaped current species diversity. They also provide insights into how specific evolutionary innovations might have led to adaptive radiations.

In collaboration with Jim Fordyce and Ben Fitzpatrick, we developed the Parametric Rate Comparison (PRC) method to identify shifts in diversification rate within phylogenetic trees. The method explicitly compares the distribution of branch lengths in a subclade to that in the remainder of the tree to identify subclades with different diversification rates. PRC is available as the R package iteRates.
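
The sketch below illustrates the core idea of comparing branch-length distributions; it is not the iteRates implementation. For simplicity it assumes that branch lengths follow an exponential distribution. It fits one rate to the whole tree and separate rates to a subclade and the remainder by maximum likelihood, then compares the two models with AIC. The branch lengths and function names are hypothetical.

```python
import math

def exp_loglik(lengths, rate):
    """Log-likelihood of branch lengths under an exponential distribution."""
    return len(lengths) * math.log(rate) - rate * sum(lengths)

def compare_rates(subclade, remainder):
    """Return (AIC with one shared rate, AIC with separate rates) for exponential fits.
    The maximum-likelihood exponential rate is n / sum(lengths)."""
    pooled = subclade + remainder
    ll_one = exp_loglik(pooled, len(pooled) / sum(pooled))
    ll_two = (exp_loglik(subclade, len(subclade) / sum(subclade))
              + exp_loglik(remainder, len(remainder) / sum(remainder)))
    return 2 * 1 - 2 * ll_one, 2 * 2 - 2 * ll_two

# Hypothetical branch lengths (time units): a subclade with short branches.
sub = [0.1, 0.2, 0.15, 0.05, 0.1, 0.12, 0.08, 0.2]
rest = [1.0, 1.5, 0.8, 2.0, 1.2, 0.9, 1.7, 1.1]
aic_one, aic_two = compare_rates(sub, rest)
print(f"AIC one rate: {aic_one:.1f}, AIC two rates: {aic_two:.1f}")
```

Shorter branches in the subclade imply faster diversification there, and the lower AIC for the two-rate model favors a rate shift. A full analysis would repeat the comparison across candidate subclades, would likely need to account for multiple testing, and could use other parametric families.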

RNA Thermometers

RNA thermometers are a class of mRNAs whose translation initiation rates change with temperature. The mechanism by which they act is fairly simple. At low temperatures, the ribosome-binding site (RBS) is sequestered in a hairpin structure that prevents ribosomes from binding the mRNA effectively. As temperature increases, these structures melt, allowing ribosomes to initiate translation. These molecules are typically associated with a special class of sigma-factor genes in bacteria that regulate heat-shock and other chaperone genes. However, little is known about how widespread this class of molecules is. We studied how RNAs melt with temperature by computationally sampling the distribution of RNA structures at various temperatures with the RNA folding software ViennaRNA. To our surprise, we found that although known RNA thermometers had a higher melting rate with temperature than non-thermometers, the difference was not statistically significant. This suggests that RNA thermometers are perhaps not a special class of structurally distinct RNA molecules.
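
ViennaRNA computes ensembles over all possible secondary structures. The sketch below replaces that with a deliberately simple two-state hairpin (folded or open) to show what a "melting rate" with temperature means. It assumes a temperature-independent folding enthalpy ΔH and entropy ΔS, so the free energy is ΔG(T) = ΔH - TΔS. The values and function names are hypothetical.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def fraction_open(T, dH, dS):
    """Equilibrium fraction of a two-state hairpin that is unfolded at temperature T (K).
    dH, dS: folding enthalpy (kcal/mol) and entropy (kcal/(mol*K)); dG = dH - T*dS."""
    dG = dH - T * dS
    return 1.0 / (1.0 + math.exp(-dG / (R * T)))

def melting_slope(T, dH, dS, h=0.01):
    """Central-difference estimate of d(fraction open)/dT, per kelvin."""
    return (fraction_open(T + h, dH, dS) - fraction_open(T - h, dH, dS)) / (2 * h)

# Two hypothetical hairpins with the same Tm (320 K) but different folding enthalpies.
for dH in (-40.0, -80.0):
    dS = dH / 320.0
    print(f"dH={dH}: open at 310 K = {fraction_open(310.0, dH, dS):.2f}, "
          f"slope at Tm = {melting_slope(320.0, dH, dS):.3f} per K")
```

At the melting temperature Tm = ΔH/ΔS, half the molecules are open, and the slope there is |ΔH|/(4RTm²), so hairpins with larger folding enthalpies switch more sharply. Comparing such slopes between known thermometers and other RNAs resembles the comparison described above, although that analysis sampled full structure ensembles.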