
SimulationData for the transcription process

TODO: add mapping of tRNA to charged tRNA if allowing more than one modified form of tRNA and separate mappings for tRNA and charged tRNA to AA

TODO: handle ppGpp and DksA-ppGpp regulation separately

class reconstruction.ecoli.dataclasses.process.transcription.Transcription(raw_data, sim_data)

Bases: object

SimulationData for the transcription process


Applies a correction to RNA-seq data for shorter genes, as required when operon structure is included in the model.

_build_attenuation(raw_data, sim_data)

Load fold changes related to transcriptional attenuation.

_build_charged_trna(raw_data, sim_data)

Loads information and creates the data structures necessary for tRNA charging.

Requires self.rna_data, so this cannot be built in translation even though some of these data structures would be more appropriate there.

_build_cistron_data(raw_data, sim_data)

Build cistron-associated simulation data from raw data. Cistrons are sections of RNAs that encode a specific polypeptide. A single RNA molecule may contain one or more cistrons.

_build_elongation_rates(raw_data, sim_data)

_build_mature_rna_data(raw_data, sim_data)

Build mature RNA-associated simulation data from raw data.

_build_new_gene_data(raw_data, sim_data)

Load baseline values for new gene expression in all simulations.

_build_oric_terc_coordinates(raw_data, sim_data)

Builds the coordinates of oriC and terC that are used when calculating genomic positions of cistrons and RNAs relative to the origin.

_build_ppgpp_regulation(raw_data, sim_data)

Determine which genes are regulated by ppGpp and store the fold change in expression associated with each gene.

Attributes set:

  • ppgpp_regulated_genes (ndarray[str]): cistron IDs of regulated genes
  • ppgpp_fold_changes (ndarray[float]): log2 fold change for each gene in ppgpp_regulated_genes
  • _ppgpp_growth_parameters: parameters for interpolate.splev to estimate growth rate from ppGpp concentration
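A minimal sketch of how spline parameters like _ppgpp_growth_parameters can be produced and then evaluated with interpolate.splev, assuming they are the (knots, coefficients, degree) tuple returned by scipy.interpolate.splrep. The concentration and growth-rate values are toy numbers for illustration, not data from the model, and growth_rate_from_ppgpp is a hypothetical helper.

```python
import numpy as np
from scipy import interpolate

# Toy (ppGpp concentration, growth rate) pairs for illustration only;
# these are NOT values from the model's raw data.
ppgpp_conc = np.array([20.0, 40.0, 60.0, 80.0, 100.0])   # hypothetical concentration units
growth_rate = np.array([1.8, 1.3, 0.9, 0.6, 0.4])        # hypothetical rate units (1/h)

# With no weights given, splrep fits an interpolating cubic spline (s=0)
# and returns a (knots, coefficients, degree) tuple usable by splev.
growth_parameters = interpolate.splrep(ppgpp_conc, growth_rate, k=3)

def growth_rate_from_ppgpp(ppgpp):
    """Estimate growth rate from ppGpp concentration using the fitted spline."""
    return interpolate.splev(ppgpp, growth_parameters)

estimate = float(growth_rate_from_ppgpp(50.0))
```

Storing only the spline tuple, rather than a function object, keeps the parameters easy to pickle alongside the rest of the simulation data.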

_build_rna_data(raw_data, sim_data)

Build RNA-associated simulation data from raw data.

_build_transcription(raw_data, sim_data)

Build transcription-associated simulation data from raw data.


Returns the genomic coordinates of a given gene coordinate relative to the origin of replication.
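A minimal sketch of one way to express positions relative to oriC on a circular genome, assuming absolute coordinates and a terC coordinate numerically smaller than the oriC coordinate. The helper relative_coordinates, its arguments, and the sign convention (positive on one replichore, negative on the other, with terC as the wrap point) are assumptions for illustration, not necessarily the model's exact formula.

```python
import numpy as np

def relative_coordinates(coordinates, oric, terc, genome_length):
    """Hypothetical helper: map absolute genome positions to positions relative to oriC.

    terC is used as the wrap point, so results fall in
    [terc - oric, terc - oric + genome_length).
    """
    coordinates = np.asarray(coordinates)
    # Shift so terC is at zero, wrap around the circular genome,
    # shift back, then measure the distance from oriC.
    return (coordinates - terc) % genome_length + terc - oric

# Toy genome of length 100 with oriC at 80 and terC at 30.
example = relative_coordinates([90, 70, 25, 35, 80], oric=80, terc=30, genome_length=100)
```

In this toy genome, positions just past oriC (90) and past the wrap at 0 (25) come out positive, while positions just before oriC (70, 35) come out negative.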


Normalize both free and ppGpp-bound expression values to 1.

_solve_ppgpp_km(raw_data, sim_data)

Solves for general expression rates for bound and free RNAP and a KM for ppGpp binding to RNAP based on global cellular measurements. Parameters are solved for at different doubling times using a gradient descent method that minimizes the difference between the expression of stable RNA and the measured RNA in a cell. Assumes a Hill coefficient of 2 for ppGpp binding to RNAP.

Attributes set:

  • _fit_ppgpp_fc (float): log2 fold change in stable RNA expression from a fast doubling time to a slow doubling time, based on the rates of bound and free RNAP expression found
  • _ppgpp_km_squared (float): squared, unitless KM value, stored to limit the computation needed for the fraction bound
  • ppgpp_km (float with mol / volume units): KM for ppGpp binding to RNAP



Adjusts ppGpp expression based on a least-squares fit to ribosome and RNAP physiological constraints across three conditions with different growth rates and ppGpp concentrations.

Modifies attributes:

  • exp_ppgpp (ndarray[float]): expression for each gene when RNAP is bound to ppGpp, adjusted for necessary RNAP and ribosome expression, normalized to 1
  • exp_free (ndarray[float]): expression for each gene when RNAP is not bound to ppGpp, adjusted for necessary RNAP and ribosome expression, normalized to 1
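A minimal sketch of a least-squares fit across three conditions for a single gene, assuming that the expression needed in each condition is a linear mix of the bound and free profiles weighted by the fraction of RNAP bound to ppGpp. The fractions and target values are made up so that an exact solution exists; the model's actual constraints and the final renormalization to 1 are not reproduced here.

```python
import numpy as np

# Hypothetical fraction of RNAP bound to ppGpp in three growth conditions.
f_bound = np.array([0.2, 0.5, 0.8])

# Hypothetical expression of one gene (e.g. an RNAP or ribosome subunit)
# needed to satisfy physiological constraints in each condition.
target = np.array([0.013, 0.010, 0.007])

# Each row encodes: target_i = f_i * exp_ppgpp + (1 - f_i) * exp_free
A = np.column_stack([f_bound, 1.0 - f_bound])

# lstsq returns (solution, residuals, rank, singular_values);
# unpack the two-element solution into the bound and free expression.
(exp_ppgpp, exp_free), *_ = np.linalg.lstsq(A, target, rcond=None)
```

With more conditions than unknowns, lstsq returns the pair that minimizes the squared error when the targets cannot be matched exactly.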


See docs/processes/transcription_regulation.pdf for a description of the math used in this section.


Adjusts ppGpp-regulated expression so that expression with and without ppGpp regulation matches in the basal condition, taking into account the effect that transcription factors will have.

calculate_attenuation(sim_data, cell_specs)

Calculate constants for each attenuated gene.


Creates a stoichiometric matrix from i, j, v arrays.

Returns a 2D array with a row for each metabolite and a column for each tRNA charging reaction.
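A minimal sketch of building a dense matrix from coordinate-format (i, j, v) triplets with NumPy, assuming i indexes metabolites (rows), j indexes charging reactions (columns), and v holds stoichiometric coefficients. Summing duplicate (i, j) entries is a choice made here; whether the model sums or rejects duplicates is not stated. The helper name and the toy reaction (a generic aminoacylation, tRNA + amino acid + ATP -> charged tRNA + AMP + PPi) are illustrative.

```python
import numpy as np

def stoich_matrix_from_ijv(i, j, v, n_metabolites, n_reactions):
    """Hypothetical helper: dense metabolite-by-reaction stoichiometric matrix.

    i: row (metabolite) indexes, j: column (reaction) indexes,
    v: coefficients (negative = consumed, positive = produced).
    """
    S = np.zeros((n_metabolites, n_reactions))
    # np.add.at accumulates repeated (i, j) pairs instead of overwriting them.
    np.add.at(S, (np.asarray(i), np.asarray(j)), np.asarray(v, dtype=float))
    return S

# Metabolite order: tRNA, amino acid, ATP, charged tRNA, AMP, PPi
S = stoich_matrix_from_ijv(
    i=[0, 1, 2, 3, 4, 5],
    j=[0, 0, 0, 0, 0, 0],
    v=[-1, -1, -1, 1, 1, 1],
    n_metabolites=6,
    n_reactions=1,
)
```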


Returns the indexes of transcription units containing the given RNA cistron given the ID of the cistron.
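A minimal sketch of this lookup, assuming the cistron-to-transcription-unit relationship is stored as a binary matrix with one row per cistron and one column per TU. All IDs, the matrix, and both helpers are hypothetical. The inverse lookup (cistrons that make up a TU) reads the same matrix along the other axis.

```python
import numpy as np

# Hypothetical IDs and binary cistron-by-TU matrix:
# entry [c, t] is 1 if cistron c is contained in transcription unit t.
cistron_ids = ["geneA", "geneB", "geneC"]
tu_ids = ["TU_A", "TU_AB", "TU_BC"]
cistron_tu_matrix = np.array([
    [1, 1, 0],  # geneA is in TU_A and TU_AB
    [0, 1, 1],  # geneB is in TU_AB and TU_BC
    [0, 0, 1],  # geneC is in TU_BC only
])
cistron_index = {cid: k for k, cid in enumerate(cistron_ids)}
tu_index = {tid: k for k, tid in enumerate(tu_ids)}

def tu_indexes_for_cistron(cistron_id):
    """Indexes of TUs containing the cistron (nonzero entries in its row)."""
    return np.flatnonzero(cistron_tu_matrix[cistron_index[cistron_id], :])

def cistron_indexes_for_tu(tu_id):
    """Indexes of cistrons making up the TU (nonzero entries in its column)."""
    return np.flatnonzero(cistron_tu_matrix[:, tu_index[tu_id]])
```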


Calculates the expression of each gene at a given concentration of ppGpp.


Parameters:

ppgpp (float with or without mol / volume units) – concentration of ppGpp; if unitless, it should be expressed in PPGPP_CONC_UNITS

Returns:

normalized expression for each gene

Return type:


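A minimal sketch of computing expression at a given ppGpp level, assuming expression is a linear mix of the ppGpp-bound and free RNAP profiles weighted by the fraction of RNAP bound to ppGpp, then normalized to 1. The helper takes that fraction as an input rather than computing it, and its name and the toy profiles are hypothetical.

```python
import numpy as np

def expression_from_fraction_bound(f_bound, exp_ppgpp, exp_free):
    """Hypothetical helper: mix bound and free expression profiles.

    f_bound: fraction of RNAP bound to ppGpp (between 0 and 1).
    exp_ppgpp, exp_free: per-gene expression when RNAP is / is not bound.
    """
    exp = f_bound * np.asarray(exp_ppgpp) + (1.0 - f_bound) * np.asarray(exp_free)
    return exp / exp.sum()  # normalize so expression sums to 1

expression = expression_from_fraction_bound(0.5, [0.1, 0.9], [0.7, 0.3])
```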

Calculates the expression of RNA transcription units that best fits the given expression levels of cistrons using nonnegative least squares.


Calculates the expression of tRNA transcription units that best fits the given expression levels of tRNA cistrons using nonnegative least squares.
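Both fits above can be sketched with scipy.optimize.nnls, assuming a matrix A with one row per cistron and one column per transcription unit (1 if the cistron lies in that TU) and a vector b of measured cistron expression. The matrix and expression values here are toy numbers chosen so an exact nonnegative solution exists.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical cistron-by-TU matrix (1 = cistron is contained in the TU).
cistron_tu_matrix = np.array([
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
])
# Hypothetical measured expression of each cistron.
cistron_expression = np.array([0.5, 0.7, 0.3])

# Find TU expression x >= 0 minimizing ||A x - b||_2;
# nnls returns the solution and the residual norm.
tu_expression, residual = nnls(cistron_tu_matrix, cistron_expression)
```

The nonnegativity constraint matters when the unconstrained least-squares solution would assign negative expression to some TUs, which has no physical meaning.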


Calculates the fraction of RNAP expected to be bound to ppGpp at a given concentration of ppGpp.


Parameters:

ppgpp (float with or without mol / volume units) – concentration of ppGpp; if unitless, it should be expressed in PPGPP_CONC_UNITS

Returns:

fraction of RNAP that will be bound to ppGpp

Return type:


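A minimal sketch of the fraction bound, assuming a Hill equation with the coefficient of 2 stated for _solve_ppgpp_km and unitless inputs in the same concentration units. The helper name is hypothetical; squaring KM once and reusing it is the kind of saving the _ppgpp_km_squared attribute is described as providing.

```python
def fraction_rnap_bound(ppgpp, km):
    """Hypothetical helper: fraction of RNAP bound to ppGpp (Hill coefficient 2).

    ppgpp and km must be unitless values in the same concentration units.
    """
    ppgpp_sq = ppgpp ** 2
    return ppgpp_sq / (km ** 2 + ppgpp_sq)

half_bound = fraction_rnap_bound(10.0, 10.0)  # ppGpp equal to KM
mostly_bound = fraction_rnap_bound(30.0, 10.0)
```

At ppGpp equal to KM, half of RNAP is bound; with a Hill coefficient of 2, tripling ppGpp relative to KM gives 9 / (1 + 9) = 0.9.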

Calculate the probability of a transcript stopping early due to attenuation.


Calculates expected RNA subgroup mass fractions based on ppGpp concentration. If ppGpp expression has not been set yet, uses default measured fractions.


Parameters:

ppgpp (float with or without mol / volume units) – concentration of ppGpp; if unitless, it should be expressed in PPGPP_CONC_UNITS

Returns:

mass fraction for each subgroup mass, values sum to 1

Return type:

dict[str, float]
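A minimal sketch of turning per-RNA expression into subgroup mass fractions, assuming expression is proportional to RNA abundance (degradation is ignored) and that each RNA belongs to exactly one subgroup. The helper, the subgroup labels, and all numbers are hypothetical.

```python
import numpy as np

def subgroup_mass_fractions(expression, rna_mass, subgroup_masks):
    """Hypothetical helper: expected mass fraction of each RNA subgroup.

    expression: relative abundance of each RNA
    rna_mass: mass of each RNA
    subgroup_masks: dict of subgroup name -> boolean mask over RNAs
        (assumed disjoint and covering every RNA, so fractions sum to 1)
    """
    mass = np.asarray(expression, dtype=float) * np.asarray(rna_mass, dtype=float)
    total = mass.sum()
    return {name: float(mass[np.asarray(mask)].sum() / total)
            for name, mask in subgroup_masks.items()}

fractions = subgroup_mass_fractions(
    expression=[0.5, 0.3, 0.2],
    rna_mass=[1000.0, 100.0, 200.0],
    subgroup_masks={
        "rRNA": [True, False, False],
        "tRNA": [False, True, False],
        "mRNA": [False, False, True],
    },
)
```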

make_elongation_rates(random, base, time_step, variable_elongation=False)

Returns the indexes of cistrons that constitute the given transcription unit given the ID of the RNA transcription unit.


Called during the parca to determine the expression of each transcription unit for ppGpp-bound and free RNAP.

Attributes set:

  • exp_ppgpp (ndarray[float]): expression for each TU when RNAP is bound to ppGpp
  • exp_free (ndarray[float]): expression for each TU when RNAP is not bound to ppGpp

set_ppgpp_kinetics_parameters(init_container, constants)

synth_prob_from_ppgpp(ppgpp, copy_number, balanced_rRNA_prob=True)

Calculates the synthesis probability of each gene at a given concentration of ppGpp.

Parameters:

  • ppgpp (float with mol / volume units) – concentration of ppGpp

  • copy_number (Callable[float, int]) – function that gives the expected copy number given a doubling time and gene replication coordinate

  • balanced_rRNA_prob (bool) – if True, set synthesis probabilities of rRNA promoters equal to one another


Returns:

  • prob (ndarray[float]): normalized synthesis probability for each gene
  • factor (ndarray[float]): factor to adjust expression to probability for each gene


Note: copy_number should be sim_data.process.replication.get_average_copy_number, but saving that function handle as a class attribute would prevent sim_data from being pickled without additional handling, so it is passed in as an argument instead.
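A minimal sketch of converting expression into synthesis probabilities, assuming a steady state in which each RNA's synthesis (probability times gene copy number) balances its loss by degradation and dilution, so probability is proportional to expression × loss rate / copy number. The helper, its inputs, and equalizing rRNA probabilities by averaging them are assumptions for illustration, not necessarily the model's exact formula.

```python
import numpy as np

def synth_prob_from_expression(expression, loss_rate, copy_number, rrna_mask=None):
    """Hypothetical helper: per-gene synthesis probabilities from expression.

    Steady state: prob * copies = loss * abundance, so prob ~ abundance * loss / copies.
    rrna_mask: optional boolean mask; if given, rRNA probabilities are set
        to their mean so they are equal to one another.
    """
    expression = np.asarray(expression, dtype=float)
    prob = expression * np.asarray(loss_rate, dtype=float) / np.asarray(copy_number, dtype=float)
    if rrna_mask is not None:
        mask = np.asarray(rrna_mask, dtype=bool)
        if mask.any():
            prob[mask] = prob[mask].mean()
    prob = prob / prob.sum()  # normalize probabilities to 1
    # Factor that converts expression into probability; 0 where expression is 0.
    factor = np.divide(prob, expression, out=np.zeros_like(prob), where=expression > 0)
    return prob, factor

prob, factor = synth_prob_from_expression([0.5, 0.25, 0.25], [1.0, 2.0, 2.0], [1.0, 1.0, 2.0])
balanced, _ = synth_prob_from_expression(
    [0.5, 0.25, 0.25], [1.0, 1.0, 2.0], [1.0, 1.0, 2.0], rrna_mask=[True, True, False])
```

Passing copy numbers as an array keeps the sketch free of stored function handles, in the spirit of the pickling note above.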

exception reconstruction.ecoli.dataclasses.process.transcription.TranscriptionDirectionError

Bases: Exception