Conventional study of chemical reactions tracks reactant concentrations. This is not a problem if the system of interest contains large number of molecules because all the information needed to describe it is effectively condensed in concentration values. For systems with small number of molecules, concentration values are not descriptive enough. This issue is of particular importance in biochemistry because the number of molecules inside a cell is certainly small.

In these cases, chemical reactions cannot be interpreted as differential equations followed by reactant concentrations. Instead, chemical reactions are interpreted as what they actually are, namely the conversion of a molecule or groups of molecules into different ones. Since occurrence of molecular reactions is largely determined by chance, e.g., a fortuitous encounter of reactants, stochastic modeling is warranted. Further note that it is reasonable to expect the probability of a particular reaction to depend only on the current numbers of molecules for each reactant. Therefore, if we regard numbers of molecules as the state of a stochastic system, chemical reactions can be reasonably modeled as a continuous time Markov chain (CTMC).

This idea is widely used to simulate chemical reactions. The rest of this page covers the following topics

  1. Lotka-Volterra predator-prey model. A simple example of how chemical reactions are modeled using differential equations.
  2. Gillespie’s algorithm. Discuss stochastic modeling of Lotka-Volterra’s equations and introduce the workhorse Gillespie’s algorithm.
  3. Dimerization kinetics. Back and forth conversion between the a molecule and its dimer.
  4. Enzymatic reactions. Enzymes mediate the production of molecules from reactant components.
  5. A simple auto-regulatory gene network. An auto-regulatory process to control the amount of protein transcribed from a particular gene.
  6. Lac operon. The mechanism to balance the digestion of glucose and lactose.

Matlab code for the examples discussed below is in this compressed folder. In the following we make references to files in this folder. Slides containing figures and brief explanations are here. Matlab code and slides were developed by Cameron Finucane during the summer of 2009.

Lotka-Volterra predator-prey model


Lotka and Volterra’s model involves a prey species Y and a predator species X. The reactions that define the model describe the birth and death of prey and predator elements. As shorthand notation for the model we use

R_1 :Y\longrightarrow2Y
R_2 :X + Y\longrightarrow2X
R_3 :X\longrightarrow\emptyset

The first reaction R_1 denotes spontaneous reproduction of prey Y. Reaction R_2 denotes consumption of a prey molecule Y to generate a copy of the predator molecule X. Reaction R_3 denotes spontaneous death of predators X. This is a very simple model but suffices for illustration purposes. Notice that the reactions as summarized in R_1R_3 are not enough to describe the dynamic behavior of the system for which we need to specify the rate at which the reactions occur. Therefore, we need to specify rates r_1(x,y), r_2(x,y), and r_3(x,y) that determine how likely a reaction is to occur.

Classical chemodynamics starts rewriting the above equations to describe the rate of change of predator and prey concentrations. Let y(t) denote the concentration of prey and x(t) the concentration of predator at time t. The rates of change in predator and prey concentration are given by the time derivatives dx(t)/dt and dy(t)/dt. The concentration of prey y(t) increases when reaction R_1 occurs and decreases when reaction R_2 occurs. Considering that R_1 happens at a rate r_1(x,y) and R_2 at a rate r_2(x,y) the rate of change dy(t)/dt in prey concentration is

dy(t)/dt=r_1(x,y)-r_2(x,y)

Likewise, predator numbers increase upon reaction R_2 and decrease with reaction R_3. Because these reactions happen at rates r_2(x,y) and r_3(x,y) it follows that

dx(t)/dt=r_2(x,y)-r_3(x,y)

There are reasons to expect reaction rates r_i to be proportional to the concentration of reactants involved in the reaction. That is, we expect r_1(x,y)=c_1 y to be proportional to the prey’s concentration y, r_2(x,y)=c_2 xy proportional to the product of both concentrations x and y and r_3(x,y)=c_3 y to be proportional to the predator’s concentration y. Regardless of what is reasonable to expect there is abundant empirical evidence that this is true. Substituting these expressions into the rate change equations above we obtain the pair of differential equations

dy(t)/dt=c_1 y-c_2 xy
dx(t)/dt=c_2 xy-c_3 x

The above system of ordinary differential equations (ODE) are the Lotka-Volterra’s ODEs. To simulate a deterministic Lotka-Volterra system use the code in the file ssas_lotka_volterra_deterministic.m.

Gillespie’s algorithm


The differential equations in the previous section are accurate if predator and prey numbers can be accurately inferred from respective concentrations. As expressed before this is not true if the number of reactants is small. In such case randomness might yield distinctive behaviors that can only be captured through a stochastic model. To introduce such stochastic models reconsider the prey-predator model

R_1 :Y\longrightarrow2Y
R_2 :X + Y\longrightarrow2X
R_3 :X\longrightarrow\emptyset

where Reaction R_1 denotes spontaneous reproduction of prey Y, R_2 consumption of a prey molecule Y to generate a copy of the predator molecule X and R_3 spontaneous death of predators X. To complete the description we interpret X and Y as the number of molecules of each species and introduce hazard functions h_1(X,Y), h_2(X,Y), and h_3(X,Y).

The definition of the hazard function h_i(X,Y) is as follows. Given that the number of molecules are X and Y at time t, the probability of reaction R_i occurring in an infinitesimal time interval (t, t+dt] is given by h_i(X,Y)dt. As a consequence of this definition, if no other reaction occurs, it follows that the time s_i that elapses until the next occurrence of reaction R_i is exponentially distributed with parameter h_i(X,Y). This is a basic property of continuous times Markov chains. Notice, though, that other reactions are likely to occur and that as a consequence, time s_i is not exponentially distributed. However, note that the probability of any reaction occurring in the infinitesimal time interval (t, t+dt] is given by h_1(X,Y)dt + h_2(X,Y)dt + h_3(X,Y)dt. Thus, upon defining the aggregate hazard h(X,Y) = h_1(X,Y) + h_2(X,Y) + h_3(X,Y) we can model the time s until the next reaction as exponentially distributed with parameter h(X,Y). Therefore, in a stochastic model of the above predator-prey relations, times s between reactions are exponentially distributed with parameter parameters h(X,Y). Once the inter-reaction time s is determined the reaction type R_i is determined by drawing a decision variable with probabilities proportional to the ratio h_1(X,Y)/h(X,Y), h_2(X,Y)/h(X,Y) and h_3(X,Y)/h(X,Y).

To complete the model we need to specify the reaction hazards h_i(X,Y). Reaction R_1 happens spontaneously when a prey molecule reproduces. Assuming that prey molecules act independently it is reasonable to assume that the hazard h_1(X,Y)=c_1Y is proportional to the number of prey molecules Y. Similarly, Reaction R_2 happens upon the chance encounter between a prey and a predator molecule. Notice that there are XY possible pairs of prey and predator molecules. Thus, hazard h_2(X,Y)=c_2XY is reasonably modeled as proportional to the product of number of molecules X and Y. Similarly to R_1, reaction R_3 happens spontaneously, its hazard therefore modeled as h_3(X,Y)=c_3X proportional to the number of predator molecules.

Implementation of this model yields Gillespie’s algorithm for simulation of chemical reactions. The algorithm’s steps are the following

  1. Initialize t = 0, X = X(0), Y = Y(0)
  2. Calculate all hazards h_i(X,Y)
  3. Find the total hazard, h(X,Y) = h_1(X,Y)+h_2(X,Y)+h_3(X,Y)
  4. Find the time to the next reaction: s \sim Exp[h(X,Y)]</li><li>Sett = t + s</li><li>Find the index of the current reaction, where Pr[R_i] = h_i(X,Y)/h(X,Y)
  5. Update the state vector to account for this reaction. If reaction R_1 is drawn, update Y=Y+1. If reaction 2 occurs update X=X+1 and Y=Y-1. If R_3 is drawn update X=X-1.
  6. Repeat from 2

The core steps in the above algorithm are: The random drawing of the next reaction time s (step 4). Determination of the reaction type R_i (step 6). Update of the reactant numbers X and Y (step 7). The remaining steps compute values that are needed to implement these steps. To simulate an stochastic version of the Predator-prey model discussed in the previous subsection use the file ssas_lotka_volterra_stochastic.m. To run this code you need the file ssas_gillespie.m that implements Gillespie’s algorithm for a generically defined chemical equation.

Dimerization kinetics


A dimer is a molecule composed of two identical (or similar) chemical structures. Each of the pieces of the dimer is called a monomer and can exist in free form. The dimer can separate into its two component monomers and two monomers may combine to form a dimer. The schematic representation of this process is

R_1 :D\longrightarrow2M
R_2 :2M\longrightarrowD

To complete the model we need to specify either reaction rates r_1(M,D) and r_2(M,D) for a deterministic model or hazards h_1(M,D) and h_2(M,D) for a stochastic framework. Reaction R_1 happens spontaneously on dimers. Its hazard is therefore h_1(M,D)=c_1D proportional to the number of dimer molecules. For reaction R_2 we have to notice that it happens upon encounter of a pair of monomers. To form a monomer pair we have M choices for the first member of the pair and M-1 for the second member. The number of ordered pairs is then M(M-1) and the number of unordered pairs is M(M-1)/2 (there are 2 ordered pairs for each unordered pair). The hazard for reaction R_2 is then modeled as h_2(M,D) = c_2M(M-1)/2 proportional to the number of unordered monomer pairs. To simulate dimerization kinetics in a deterministic setting use the code in the file ssas_dimerization_deterministic.m. The counterpart code in a stochastic setting is implemented in the file ssas_dimerization_stochastic.m.

Enzymatic reactions


Enzymes are molecules that mediate chemical reactions. A certain set of reactants can be converted into a certain set of products with some energy expenditure. The enzyme reduces the energy threshold for the reaction to occur. The simplest enzymatic reaction involves the conversion of a substrate S into a product P. Enzymes E react with the substrate to generate an intermediate product I. The intermediate product I then liberates the enzyme E to yield the final product P. A simple model of this process is given by the following set of reactions

R_1 :S + E\longrightarrowI
R_2 :I\longrightarrowS+E
R_3 :I\longrightarrowP+E

Reaction R_1 models the combination of a substrate molecule S and an enzyme E to yield an intermediate specimen I. The latter may disassociate from the enzyme yielding either the original substrate S (Reaction R_2) or the the final product P (Reaction R_3). Hazards can be determined using the ideas in previous examples. It is ready to realize that hazards can be written as h_1(S,E,I,P)=c_1 S E, h_2(S,E,I,P)=c_2 I and h_3(S,E,I,P)=c_3 I. Deterministic simulation of enzymatic reactions is in the file ssas_enzyme_deterministic. Stochastic simulation of enzymatic reactions is in ssas_enzyme_stochastic.m.

Auto-regulatory gene network


The three cases studied so far, namely the predator prey modeldimerization kinetics and enzymatic reactions serve as introductory examples, but there are not really interesting differences between the deterministic and stochastic models. Stochastic effects can only be appreciated in full in systems that can be reasonably expected to involve just a handful of specimens.

A good example of this occurs in gene expression. Protein synthesis inside a prokaryotic cell starts with transcription of the corresponding gene into a messenger ribonucleic acid (mRNA). The mRNA then passes through a ribosome to assembly the protein. The number of mRNA molecules synthesized may well be just a few units. The mRNA is an intermediate molecule, the ultimate outcome of this process being the generation of protein. An interesting question is how is the production of protein started and halted at convenient concentrations. The answer to this question varies but in general there are self-regulating feedback loops controlling the production of protein. Roughly speaking, presence of an external stimulus triggers transcription of mRNA and subsequent protein synthesis. As the number of protein molecules increases a byproduct is generated. This byproduct has affinity to bind with the encoding gene thus repressing future transcriptions and halting protein synthesis. A simplified version of an auto-regulatory gene network is the following

Transcription:G\longrightarrowG+mRNA
Assembly:mRNA\longrightarrowmRNA+P
Dimerization:2P\longleftrightarrowD
Repression:G+D\longleftrightarrowDG
mRNA degradation:mRNA\longrightarrow\emptyset
Protein degradation:P\longrightarrow\emptyset

Transcription and assembly reactions model the two step protein synthesis. The gene G is transcribed into mRNA and the mRNA is then used to synthesize protein P. Notice that G is not altered upon transcription and that neither is mRNA upon protein synthesis. Auto-regulation is through a dimer D of the protein P. This dimer generates spontaneously from two protein molecules and may also dissociate spontaneously. This back and forth conversion is denoted by a double arrow in the schematic above. The dimer D may attach to the gene G. When a dimer is attached to G, transcription is not possible and we say that the gene is repressed. We denote this as the generation of the molecule DG. The dimer may spontaneously separate from the gene, making transcription possible once again. This is modeled as the conversion of DG into D and G. In the repression reaction the arrow points in both directions to signify binding and unbinding of the dimer D to the gene G. The model is completed by the degradation of mRNA and protein.

This mechanism auto-regulates production of protein P. As the number of protein molecules increases, so does the number of dimers. As the number of dimers increases the likelihood of gene repression increases. We then expect the transcription of mRNA to be halted most of the time. As mRNA molecules degrade the synthesis of protein slows and eventually stops. When this happens, protein degradation reduces the number of dimers. Gene repression becomes less likely, mRNA transcription resumes and consequently so does protein synthesis. The code to simulate this simple auto-regulatory gene network is in the file ssas_autoregulatory_gene_network.m.

The lac operon


Auto-regulation in biological systems typically involves group of genes, the proteins they encode for and the byproducts of the reactions of this proteins with other reactants present in the medium. These mechanisms are called auto-regulatory gene networks. One of such networks is the lac operon formed by three adjacent genes that control the metabolism of lactose in some bacteria. Cells can only use glucose to generate energy, but they can reduce lactose to glucose if the latter is unavailable. Such digestion of lactose into glucose occurs in presence of the enzyme \beta-galactosidase. A simplified model of lactose digestion and glucose consumption is therefore

Lactose digestion:\beta G + L\longrightarrow\beta G + G
Glucose consumption:G\longrightarrow\emptyset

where we have use a simplified model of enzymatic reactions (compare with the model in enzymatic reactions. The first reaction models the reduction of lactose L to glucose G mediated by the enzyme \beta G. The second reaction models the consumption of glucose molecules G.

To enable lactose digestion as per the reactions above, cells have to produce the enzyme \beta-galactosidase, which in itself requires some energy expenditure. Thus, production of \beta-galactosidase is only justified when lactose is abundant and glucose scarce. This is controlled by the lac operon.

The constitutive genes on the lac operon are the promoter, the operator and the gene that encodes for \beta-galactosidase itself. These three genes are adjacent to each other. We note that there are in fact three different kinds of \beta-galactosidase, and three corresponding encoding genes, but for the discussion here we can regard them as a single gene. Promoters are integral parts of Deoxyribonucleic acid (DNA) transcription. Transcription of DNA into mRNA necessitates mediation of the enzyme RNA polymerase (RNAP). This enzyme binds to the promoter to initiate transcription of DNA into mRNA. If RNAP does not bind to the promoter, no transcription occurs. The operator is a gene located between the promoter and the lactose encoding sequence. If there is a molecule attached to the operator, it interferes with the action of the enzyme RNAP thereby halting transcription.

A model of this process is then to consider three forms of the the lac operon, the regular operon Op, the repressed operon ROp and the activated operon AOp. At different rates, transcription of DNA is possible from each of these states, whereby we have the following set of reactions

Repressed transcription:ROp\longrightarrowROp + mRNA
Regular transcription:Op\longrightarrowOp + mRNA
Activated transcription:AOp\longrightarrowAOp + mRNA

The difference in the above reactions is their hazard. Transcription from the repressed operon ROp is much slower than transcription from the regular operon Op resulting in a much smaller hazard. Similartly, for transcription from the activated operon AOp is much larger than the hazard for transcription from the regular operon Op. Notice that to complete the modeling of \beta-galactosidase production we need to add reactions to model the synthesis of protein from mRNA molecules. These reactions are

Protein synthesis:mRNA\longrightarrowmRNA + \beta G
mRNA degradation:mRNA\longrightarrow\emptyset
Enzyme degradation:\beta G\longrightarrow\emptyset

The first reaction models enzyme synthesis while the other two model degradation of mRNA and \beta G.

Besides promoter, operator and encoding genes, the control of \beta-galactosidase production involves a promoter protein and a repressor protein. Upstream of the lac operon is a gene coding for a repressor protein R with affinity for the operator gene. This protein is always expressed. When no lactose is present, the repressor binds to the operator thus hindering mRNA transcription and resulting in low \beta-galactosidase production. When lactose is present, however, the repressor binds preferentially to lactose therefore not interfering with transcription leading to increased production of \beta-galactosidase. The reactions modeling repression of the lac operon are then

Operon repression:R + Op\longleftrightarrowROp
Repressor neutralization:R + L\longleftrightarrowRL

The first reaction models the adherence of the repressor protein R to the operon OP to yield a repressed operon ROp. This reaction can be undone as signified by the double arrow above. The second reaction models the binding and unbinding of repressor specimens R to lactose molecules L. When there is lactose present we expect the second reaction to occur more frequently than the first. This is not necessarily because the hazard is larger but due to the much larger number of lactose proteins L with respect to the number of genes Op.

The second part of the control involves the catabolite activator protein (CAP) that when bound to the operon facilitates mRNA transcription of \beta-galactosidase. The amount of CAP present is inversely proportional to the amount of glucose. Hence, when glucose level decreases, the amount of CAP increases and the operon is likely to switch to its activated version. With the operon activated, the rate of mRNA transcription increases and as a consequence so does the production of \beta-galactosidase. The reactions to model this interaction are

Operon activation:CAP + Op\longleftrightarrowAOp
CAP repression:CAP + G\longleftrightarrowCAPG

The first reaction models the activation of the operon. The second reaction models binding of CAP to glucose G so that the number of free CAP molecules changes in opposite direction to the number of glucose molecules present. We note that CAP does not actually bind to glucose, but for a preliminary model the reactions above suffice.

When lactose and glucose are present this control mechanism results in a distinctive diauxie pattern with glucose consumed first and lactose processed after glucose is depleted. Simulation of the lac operon is in the file ssas_lac_operon.m.