reconstruction.ecoli.dataclasses.process.transcription

SimulationData for transcription process

TODO: add mapping of tRNA to charged tRNA if allowing more than one modified form of tRNA, and separate mappings for tRNA and charged tRNA to AA

TODO: handle ppGpp and DksA-ppGpp regulation separately

class reconstruction.ecoli.dataclasses.process.transcription.Transcription(raw_data, sim_data)[source]

Bases: object

SimulationData for the transcription process

_apply_rnaseq_correction()[source]

Applies correction to RNAseq data for shorter genes as required when operon structure is included in the model.

_build_attenuation(raw_data, sim_data)[source]

Load fold changes related to transcriptional attenuation.

_build_charged_trna(raw_data, sim_data)[source]

Loads information and creates the data structures necessary for charging of tRNA.

Note

Requires self.rna_data, so it cannot be built in translation even if some data structures would be more appropriate there.

_build_cistron_data(raw_data, sim_data)[source]

Build cistron-associated simulation data from raw data. Cistrons are sections of RNAs that encode a specific polypeptide. A single RNA molecule may contain one or more cistrons.

_build_elongation_rates(raw_data, sim_data)[source]
_build_mature_rna_data(raw_data, sim_data)[source]

Build mature RNA-associated simulation data from raw data.

_build_oric_terc_coordinates(raw_data, sim_data)[source]

Builds the coordinates of oriC and terC that are used when calculating genomic positions of cistrons and RNAs relative to the origin.

_build_ppgpp_regulation(raw_data, sim_data)[source]

Determine which genes are regulated by ppGpp and store the fold change in expression associated with each gene.

Attributes set:

  • ppgpp_regulated_genes (ndarray[str]): cistron ID of regulated genes

  • ppgpp_fold_changes (ndarray[float]): log2 fold change for each gene in ppgpp_regulated_genes

  • _ppgpp_growth_parameters: parameters for interpolate.splev to estimate growth rate from ppGpp concentration

_build_rna_data(raw_data, sim_data)[source]

Build RNA-associated simulation data from raw data.

_build_transcription(raw_data, sim_data)[source]

Build transcription-associated simulation data from raw data.

_get_relative_coordinates(coordinates)[source]

Returns the genomic coordinates of a given gene coordinate relative to the origin of replication.
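As a rough illustration of what "relative to the origin" can mean on a circular chromosome, the sketch below computes a signed offset from oriC and wraps it at the half-genome mark. The function name, the wrap-at-half-genome convention, and the example numbers are all assumptions made for illustration; the method itself may place the wrap at terC and treat signs differently.

```python
def relative_to_origin(coordinate, oric, genome_length):
    """Signed offset of `coordinate` from `oric` on a circular genome.

    Hypothetical helper, not part of the Transcription class. Positive
    values lie on one side of the origin, negative values on the other.
    """
    # Python's % always returns a value in [0, genome_length) here.
    offset = (coordinate - oric) % genome_length
    # Assumed convention: positions more than half a genome away are
    # reported as negative offsets on the opposite arm.
    if offset > genome_length // 2:
        offset -= genome_length
    return offset


# Toy genome of length 100 with the origin at position 90.
print(relative_to_origin(5, oric=90, genome_length=100))   # 15
print(relative_to_origin(80, oric=90, genome_length=100))  # -10
```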

_normalize_ppgpp_expression()[source]

Normalize both free and ppGpp-bound expression values to 1.

_solve_ppgpp_km(raw_data, sim_data)[source]

Solves for general expression rates for bound and free RNAP and a KM for ppGpp to RNAP based on global cellular measurements. Parameters are solved for at different doubling times using a gradient descent method to minimize the difference in expression of stable RNA compared to the measured RNA in a cell. Assumes a Hill coefficient of 2 for ppGpp binding to RNAP.

Attributes set:

  • _fit_ppgpp_fc (float): log2 fold change in stable RNA expression from a fast doubling time to a slow doubling time based on the rates of bound and free RNAP expression found

  • _ppgpp_km_squared (float): squared and unitless KM value, stored to limit the computation needed for fraction bound

  • ppgpp_km (float with mol / volume units): KM for ppGpp binding to RNAP

adjust_polymerizing_ppgpp_expression(sim_data)[source]

Adjust ppGpp expression to satisfy ribosome and RNAP physiological constraints, using a least-squares fit across 3 conditions with different growth rates and ppGpp concentrations.

Modifies attributes:

  • exp_ppgpp (ndarray[float]): expression for each gene when RNAP is bound to ppGpp, adjusted for necessary RNAP and ribosome expression, normalized to 1

  • exp_free (ndarray[float]): expression for each gene when RNAP is not bound to ppGpp, adjusted for necessary RNAP and ribosome expression, normalized to 1

Note

See docs/processes/transcription_regulation.pdf for a description of the math used in this section.

adjust_ppgpp_expression_for_tfs(sim_data)[source]

Adjusts ppGpp-regulated expression so that expression with and without ppGpp regulation matches in the basal condition, taking into account the effect that transcription factors will have.

calculate_attenuation(sim_data, cell_specs)[source]

Calculate constants for each attenuated gene.

charging_stoich_matrix()[source]

Creates the stoichiometric matrix from i, j, v arrays (row indexes, column indexes, and values).

Returns a 2D array with one row per metabolite and one column per tRNA charging reaction.
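The i, j, v description corresponds to the usual coordinate format for sparse matrices. The following is a minimal sketch of that assembly step, assuming each (row, column) pair appears at most once; the function name and the toy metabolite layout are hypothetical and are not taken from the class.

```python
import numpy as np


def stoich_matrix_from_ijv(i, j, v, shape=None):
    """Build a dense stoichiometric matrix from coordinate-format arrays.

    i: row (metabolite) index of each nonzero entry
    j: column (reaction) index of each nonzero entry
    v: stoichiometric coefficient of each nonzero entry
    """
    i = np.asarray(i)
    j = np.asarray(j)
    v = np.asarray(v)
    if shape is None:
        shape = (i.max() + 1, j.max() + 1)
    matrix = np.zeros(shape, dtype=np.float64)
    # Assumes unique (i, j) pairs; repeated pairs would overwrite, not sum.
    matrix[i, j] = v
    return matrix


# Toy layout (made up): rows 0-1 are two free tRNAs, row 2 is their amino
# acid, rows 3-4 are the matching charged tRNAs. One reaction per tRNA.
i = [0, 2, 3, 1, 2, 4]
j = [0, 0, 0, 1, 1, 1]
v = [-1, -1, 1, -1, -1, 1]
S = stoich_matrix_from_ijv(i, j, v)
print(S.shape)  # (5, 2)
```

Each column then reads as one charging reaction: reactants carry negative coefficients and the charged tRNA a positive one.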

cistron_id_to_rna_indexes(cistron_id)[source]

Returns the indexes of transcription units containing the given RNA cistron given the ID of the cistron.

expression_from_ppgpp(ppgpp)[source]

Calculates the expression of each gene at a given concentration of ppGpp.

Parameters:

ppgpp (float with or without mol / volume units) – concentration of ppGpp; if unitless, the value should be a concentration expressed in PPGPP_CONC_UNITS

Returns:

normalized expression for each gene

Return type:

ndarray[float]
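The formula is not spelled out here. Given the exp_free and exp_ppgpp attributes and fraction_rnap_bound_ppgpp, one plausible reading is a weighted average of the two expression profiles, renormalized to sum to 1. The sketch below illustrates that reading only; the function name and the numbers are hypothetical, and the bound fraction is passed in directly rather than computed from a concentration.

```python
import numpy as np


def blended_expression(exp_free, exp_ppgpp, fraction_bound):
    """Mix free and ppGpp-bound expression profiles and renormalize.

    Assumed form: a linear blend weighted by the fraction of RNAP that is
    bound to ppGpp. This is an illustration, not the class's own code.
    """
    exp_free = np.asarray(exp_free, dtype=float)
    exp_ppgpp = np.asarray(exp_ppgpp, dtype=float)
    mixed = exp_free * (1 - fraction_bound) + exp_ppgpp * fraction_bound
    return mixed / mixed.sum()


# Three made-up genes, each profile already normalized to sum to 1.
exp_free = [0.7, 0.2, 0.1]
exp_ppgpp = [0.2, 0.3, 0.5]
half_bound = blended_expression(exp_free, exp_ppgpp, 0.5)
print(half_bound)
```

At a bound fraction of 0 the result reduces to exp_free, and at 1 it reduces to exp_ppgpp.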

fit_rna_expression(cistron_expression)[source]

Calculates the expression of RNA transcription units that best fits the given expression levels of cistrons using nonnegative least squares.
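To make the nonnegative least squares step concrete, here is a small sketch using scipy.optimize.nnls. The cistron-to-transcription-unit mapping matrix and the expression values are invented for illustration, and whether the method uses this exact solver or any extra normalization is an assumption.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical mapping: rows are cistrons, columns are transcription units.
# TU 0 carries cistrons 0 and 1 (an operon), TU 1 carries cistron 1 alone,
# and TU 2 carries cistron 2 alone.
cistron_tu_mapping = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Made-up target expression level for each cistron.
cistron_expression = np.array([0.3, 0.5, 0.2])

# Solve min ||mapping @ x - cistron_expression|| subject to x >= 0.
tu_expression, residual_norm = nnls(cistron_tu_mapping, cistron_expression)
print(tu_expression, residual_norm)
```

This toy system has an exact nonnegative solution, so the residual is essentially zero. With real operon structures the fit is generally approximate, and the nonnegativity constraint keeps transcription unit expression from going below zero.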

fit_trna_expression(tRNA_cistron_expression)[source]

Calculates the expression of tRNA transcription units that best fits the given expression levels of tRNA cistrons using nonnegative least squares.

fraction_rnap_bound_ppgpp(ppgpp)[source]

Calculates the fraction of RNAP expected to be bound to ppGpp at a given concentration of ppGpp.

Parameters:

ppgpp (float with or without mol / volume units) – concentration of ppGpp; if unitless, the value should be a concentration expressed in PPGPP_CONC_UNITS

Returns:

fraction of RNAP that will be bound to ppGpp

Return type:

float
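Since _solve_ppgpp_km assumes a Hill coefficient of 2 for ppGpp binding to RNAP, the bound fraction presumably follows a Hill-type saturation curve. The sketch below shows that curve for plain floats. The function name is hypothetical, unit handling is omitted, and the exact expression used by the method is an assumption.

```python
def fraction_bound_hill2(ppgpp, km):
    """Fraction bound under a Hill curve with coefficient 2.

    Both arguments are plain floats in the same concentration units.
    Hypothetical helper, not part of the Transcription class.
    """
    if ppgpp < 0 or km <= 0:
        raise ValueError("ppgpp must be >= 0 and km must be > 0")
    ppgpp_squared = ppgpp ** 2
    # The class documents a precomputed _ppgpp_km_squared attribute;
    # this sketch squares km inline to keep the formula visible.
    return ppgpp_squared / (km ** 2 + ppgpp_squared)


print(fraction_bound_hill2(0.0, 50.0))   # 0.0
print(fraction_bound_hill2(50.0, 50.0))  # 0.5
```

Under this form, half of RNAP is bound when the ppGpp concentration equals KM, and the fraction approaches 1 as the concentration grows.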

get_attenuation_stop_probabilities(trna_conc)[source]

Calculate the probability of a transcript stopping early due to attenuation.

get_rna_fractions(ppgpp)[source]

Calculates expected RNA subgroup mass fractions based on ppGpp concentration. If ppGpp expression has not been set yet, uses default measured fractions.

Parameters:

ppgpp (float with or without mol / volume units) – concentration of ppGpp; if unitless, the value should be a concentration expressed in PPGPP_CONC_UNITS

Returns:

mass fraction for each RNA subgroup; values sum to 1

Return type:

dict[str, float]

get_rnap_active_fraction_from_ppGpp(ppgpp)[source]
make_elongation_rates(random, base, time_step, variable_elongation=False)[source]
rna_id_to_cistron_indexes(rna_id)[source]

Returns the indexes of cistrons that constitute the given transcription unit given the ID of the RNA transcription unit.

set_ppgpp_expression(sim_data)[source]

Called during the parca to determine expression of each transcription unit for ppGpp bound and free RNAP.

Attributes set:

  • exp_ppgpp (ndarray[float]): expression for each TU when RNAP is bound to ppGpp

  • exp_free (ndarray[float]): expression for each TU when RNAP is not bound to ppGpp

set_ppgpp_kinetics_parameters(init_container, constants)[source]
synth_prob_from_ppgpp(ppgpp, copy_number)[source]

Calculates the synthesis probability of each gene at a given concentration of ppGpp.

Parameters:
  • ppgpp (float with mol / volume units) – concentration of ppGpp

  • copy_number (Callable[float, int]) – function that gives the expected copy number given a doubling time and gene replication coordinate

Returns:

  • prob (ndarray[float]): normalized synthesis probability for each gene

  • factor (ndarray[float]): factor to adjust expression to probability for each gene

Note

copy_number should be sim_data.process.replication.get_average_copy_number, but saving the function handle as a class attribute prevents pickling of sim_data without additional handling.

exception reconstruction.ecoli.dataclasses.process.transcription.TranscriptionDirectionError[source]

Bases: Exception