`reconstruction.ecoli.fit_sim_data_1`

The parca, aka parameter calculator.

TODO: establish a controlled language for function behaviors (i.e. create* set* fit*)
TODO: functionalize so that values are not both set and returned from some methods

reconstruction.ecoli.fit_sim_data_1.adjust_promoters(sim_data, cell_specs, **kwargs)[source]

reconstruction.ecoli.fit_sim_data_1.apply_updates(func, args, labels, dest, cpus)[source]

Use multiprocessing (if cpus > 1) to apply args to a function to get dictionary updates for a destination dictionary.

Parameters:

func (Callable[[...], dict]) – function to call with args
args (list[tuple]) – list of args to apply to func
labels (list[str]) – label for each set of args for exception information
dest (dict) – destination dictionary that will be updated with results from each function call
cpus (int) – number of cpus to use
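The dispatch pattern described above can be sketched as follows. This is a minimal illustration, not the module's implementation: `apply_updates_sketch` and `square_entry` are hypothetical names, and the error-wrapping behavior is an assumption based on the description of `labels`.

```python
from multiprocessing import Pool


def apply_updates_sketch(func, args, labels, dest, cpus):
    """Hypothetical stand-in: merge the dict returned by func(*a) into dest for each a."""
    if cpus > 1:
        # Run the calls in worker processes; func and its args must be picklable.
        with Pool(processes=cpus) as pool:
            pending = [
                (label, pool.apply_async(func, a)) for label, a in zip(labels, args)
            ]
            for label, result in pending:
                try:
                    dest.update(result.get())
                except Exception as e:
                    # The label identifies which set of args failed (assumed behavior).
                    raise RuntimeError(f"Error while processing {label}") from e
    else:
        # Serial fallback: same updates, no worker processes.
        for label, a in zip(labels, args):
            try:
                dest.update(func(*a))
            except Exception as e:
                raise RuntimeError(f"Error while processing {label}") from e


def square_entry(key, value):
    """Toy function returning a one-entry dictionary update."""
    return {key: value ** 2}


results = {}
apply_updates_sketch(
    square_entry, [("a", 2), ("b", 3)], ["first", "second"], results, cpus=1
)
print(results)  # {'a': 4, 'b': 9}
```

With `cpus=1` the calls run in order in the current process, which keeps debugging simple; with more CPUs the same updates are collected from worker processes.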

reconstruction.ecoli.fit_sim_data_1.basal_specs(sim_data, cell_specs, disable_ribosome_capacity_fitting=False, disable_rnapoly_capacity_fitting=False, variable_elongation_transcription=True, variable_elongation_translation=False, **kwargs)[source]

reconstruction.ecoli.fit_sim_data_1.buildBasalCellSpecifications(sim_data, variable_elongation_transcription=True, variable_elongation_translation=False, disable_ribosome_capacity_fitting=False, disable_rnapoly_capacity_fitting=False)[source]

Creates cell specifications for the basal condition by fitting expression. Relies on expressionConverge() to set the expression and update masses.

Inputs

disable_ribosome_capacity_fitting (bool) - if True, ribosome expression is not fit
disable_rnapoly_capacity_fitting (bool) - if True, RNA polymerase expression is not fit

Requires

Metabolite concentrations based on ‘minimal’ nutrients
‘basal’ RNA expression
‘basal’ doubling time

Modifies

Average mass values of the cell
cistron expression
RNA expression and synthesis probabilities

returns:

dict {‘basal’: dict} with the following keys in the dict from key ‘basal’:

‘concDict’ {metabolite_name (str): concentration (float with units)} - dictionary of concentrations for each metabolite with a concentration
‘fit_cistron_expression’ (array of floats) - hypothetical expression for each RNA cistron post-fit, total normalized to 1, if all transcription units were monocistronic
‘expression’ (array of floats) - expression for each RNA, total normalized to 1
‘doubling_time’ (float with units) - cell doubling time
‘synthProb’ (array of floats) - synthesis probability for each RNA, total normalized to 1
‘avgCellDryMassInit’ (float with units) - average initial cell dry mass
‘fitAvgSolubleTargetMolMass’ (float with units) - the adjusted dry mass of the soluble fraction of a cell
bulkContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’ for expected counts based on expression of all bulk molecules

Notes

TODO - sets sim_data attributes and returns values - change to only return values

reconstruction.ecoli.fit_sim_data_1.buildCombinedConditionCellSpecifications(sim_data, cell_specs, variable_elongation_transcription=True, variable_elongation_translation=False, disable_ribosome_capacity_fitting=False, disable_rnapoly_capacity_fitting=False)[source]

Creates cell specifications for sets of transcription factors being active. These sets include conditions like ‘with_aa’ or ‘no_oxygen’ where multiple transcription factors will be active at the same time.

Inputs

cell_specs {condition (str): dict} - information about each individual transcription factor condition
disable_ribosome_capacity_fitting (bool) - if True, ribosome expression is not fit
disable_rnapoly_capacity_fitting (bool) - if True, RNA polymerase expression is not fit

Requires

Metabolite concentrations based on nutrients for the condition
Adjusted ‘basal’ RNA expression
Doubling time for the combined condition
Fold changes in expression for each gene given the TF

Modifies

cell_specs dictionary for each combined condition
RNA expression and synthesis probabilities for each combined condition

Notes

TODO - determine how to handle fold changes when multiple TFs change the same gene because multiplying both fold changes together might not be appropriate

reconstruction.ecoli.fit_sim_data_1.buildTfConditionCellSpecifications(sim_data, tf, variable_elongation_transcription=True, variable_elongation_translation=False, disable_ribosome_capacity_fitting=False, disable_rnapoly_capacity_fitting=False)[source]

Creates cell specifications for a given transcription factor by fitting expression. Sets specifications for both the active and inactive TF conditions. Relies on expressionConverge() to set the expression and masses. Uses fold change data relative to the ‘basal’ condition to determine expression for a given TF.

Inputs

tf (str) - label for the transcription factor to fit (e.g. ‘CPLX-125’)
disable_ribosome_capacity_fitting (bool) - if True, ribosome expression is not fit
disable_rnapoly_capacity_fitting (bool) - if True, RNA polymerase expression is not fit

Requires

Metabolite concentrations based on nutrients for the TF
Adjusted ‘basal’ cistron expression
Doubling time for the TF
Fold changes in expression for each gene given the TF

returns:

dict {tf + ‘__active’/‘__inactive’: dict} with the following keys in each dict:

‘concDict’ {metabolite_name (str): concentration (float with units)} - dictionary of concentrations for each metabolite with a concentration
‘expression’ (array of floats) - expression for each RNA, total normalized to 1
‘doubling_time’ (float with units) - cell doubling time
‘synthProb’ (array of floats) - synthesis probability for each RNA, total normalized to 1
‘cistron_expression’ (array of floats) - hypothetical expression for each RNA cistron, calculated from basal cistron expression levels and fold change data
‘fit_cistron_expression’ (array of floats) - hypothetical expression for each RNA cistron post-fit, total normalized to 1, if all transcription units were monocistronic
‘avgCellDryMassInit’ (float with units) - average initial cell dry mass
‘fitAvgSolubleTargetMolMass’ (float with units) - the adjusted dry mass of the soluble fraction of a cell
bulkContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’ for expected counts based on expression of all bulk molecules

reconstruction.ecoli.fit_sim_data_1.calculateBulkDistributions(sim_data, expression, concDict, avgCellDryMassInit, doubling_time)[source]

Finds a distribution of copy numbers for macromolecules. While RNA and protein expression can be approximated using well-described statistical distributions, complexes require absolute copy numbers. To get these distributions, this function instantiates many cells with a reduced set of molecules, forms complexes, and iterates through equilibrium and two-component system processes until metabolite counts reach a steady-state. It then computes the resulting statistical distributions.

Requires

N_SEEDS (int) - the number of instantiated cells

Inputs

expression (array of floats) - expression for each RNA, normalized to 1
concDict {metabolite (str): concentration (float with units of mol/volume)} - dictionary for concentrations of each metabolite with location tag
avgCellDryMassInit (float with units of mass) - initial dry cell mass
doubling_time (float with units of time) - doubling time for condition

returns:

- bulkAverageContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’ for the mean of the counts of all bulk molecules
- bulkDeviationContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’ for the standard deviation of the counts of all bulk molecules
- proteinMonomerAverageContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’ for the mean of the counts of all protein monomers
- proteinMonomerDeviationContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’ for the standard deviation of the counts of all protein monomers

reconstruction.ecoli.fit_sim_data_1.calculateMinPolymerizingEnzymeByProductDistribution(productLengths, elongationRates, netLossRate, productCounts)[source]

Compute the number of ribosomes required to maintain steady state.

dP/dt = production rate - loss rate
dP/dt = e_r * (1/L) * R - (k_loss * P)

At steady state: dP/dt = 0
R = sum over i ((L_i / e_r) * k_loss_i * P_i)

Multiplying both sides by volume gives an equation in terms of counts.

P = protein concentration
e_r = polypeptide elongation rate per ribosome
L = protein length
R = ribosome concentration
k_loss = net protein loss rate
i = ith protein

Inputs

productLengths (array of ints with units of amino_acids) - L, protein lengths
elongationRates (array of ints with units of amino_acid/time) - e_r, polypeptide elongation rate
netLossRate (array of floats with units of 1/time) - k_loss, protein loss rate
productCounts (array of floats) - P, protein counts

returns:

- float with dimensionless units for the number of ribosomes required to
maintain steady state
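As a worked instance of the steady-state sum above, the following sketch evaluates R = sum over i ((L_i / e_r) * k_loss_i * P_i) with NumPy. All numbers are made up for illustration and units are dropped; the real function operates on unit-carrying arrays.

```python
import numpy as np

# Hypothetical values, units omitted for brevity.
product_lengths = np.array([300.0, 450.0])     # L_i, amino acids
elongation_rates = np.array([15.0, 15.0])      # e_r, amino acids per second per ribosome
net_loss_rate = np.array([1e-3, 5e-4])         # k_loss_i, 1 / second
product_counts = np.array([1000.0, 2000.0])    # P_i, counts

# Time to make one protein (L_i / e_r) times how many must be replaced per second.
n_ribosomes = np.sum(product_lengths / elongation_rates * net_loss_rate * product_counts)
print(n_ribosomes)
```

Each term is the number of ribosomes kept busy by one protein species: 20 for the first and 30 for the second in this toy case, so about 50 in total.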

reconstruction.ecoli.fit_sim_data_1.calculateMinPolymerizingEnzymeByProductDistributionRNA(productLengths, elongationRates, netLossRate)[source]

Compute the number of RNA polymerases required to maintain steady state of mRNA.

dR/dt = production rate - loss rate
dR/dt = e_r * (1/L) * RNAp - k_loss

At steady state: dR/dt = 0
RNAp = sum over i ((L_i / e_r) * k_loss_i)

Multiplying both sides by volume gives an equation in terms of counts.

R = mRNA transcript concentration
e_r = transcript elongation rate per RNAp
L = transcript length
RNAp = RNAp concentration
k_loss = net transcript loss rate (unit: concentration / time)
i = ith transcript

Inputs

productLengths (array of ints with units of nucleotides) - L, transcript lengths
elongationRates (array of ints with units of nucleotide/time) - e_r, transcript elongation rate
netLossRate (array of floats with units of 1/time) - k_loss, transcript loss rate

returns:

- float with dimensionless units for the number of RNA polymerases required to
maintain steady state

reconstruction.ecoli.fit_sim_data_1.calculatePromoterBoundProbability(sim_data, cell_specs)[source]

Calculate the probability that a transcription factor is bound to its associated promoter for all simulated growth conditions. The bulk average concentrations calculated for TFs and their ligands are used to compute the probabilities based on the type (0CS, 1CS, 2CS) of the TF.

Requires

Bulk average counts of transcription factors and associated ligands for each condition (in cell_specs)

returns:

- pPromoterBound - probability that a transcription factor is bound to its promoter, per growth condition and TF. Each probability is indexed by pPromoterBound[condition][TF].

reconstruction.ecoli.fit_sim_data_1.calculateRnapRecruitment(sim_data, cell_specs)[source]

Constructs the basal_prob vector and delta_prob matrix from values of r. The basal_prob vector holds the basal transcription probabilities of each transcription unit. The delta_prob matrix holds the differences in transcription probabilities when transcription factors bind to the promoters of each transcription unit. Both values are stored in sim_data.

Requires

cell_specs[‘basal’]:
- [‘r_vector’]: Fit parameters on how the recruitment of a TF affects the expression of a gene. High (positive) values of r indicate that the TF binding increases the probability that the gene is expressed.
- [‘r_columns’]: mapping of column name to index in r

Modifies

Rescales values in basal_prob such that all values are positive
Adds basal_prob and delta_prob arrays to sim_data

reconstruction.ecoli.fit_sim_data_1.calculateTranslationSupply(sim_data, doubling_time, bulkContainer, avgCellDryMassInit)[source]

Returns the supply rates of all amino acids to translation given the desired doubling time. This creates a limit on the polypeptide elongation process, and thus on growth. The amino acid supply rate is found by calculating the concentration of amino acids per gram dry cell weight and multiplying by the loss to dilution given doubling time.

Inputs

doubling_time (float with units of time) - measured doubling times given the condition
bulkContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
for count of all bulk molecules
avgCellDryMassInit (float with units of mass) - the average initial cell dry mass

Notes

The supply of amino acids should not be based on a desired doubling time, but should come from a more mechanistic basis. This would allow simulations of environmental shifts in which the doubling time is unknown.

reconstruction.ecoli.fit_sim_data_1.crc32(*arrays, initial=0)[source]

Return a CRC32 checksum of the given ndarrays.

Parameters:

arrays (ndarray)
initial (int)

Return type:

int
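A checksum like this can be built by chaining a standard CRC32 over the bytes of each array. The sketch below uses `zlib.crc32` and also hashes each array's shape; both choices are assumptions for illustration, and `crc32_sketch` is a hypothetical name rather than the module's function.

```python
import zlib

import numpy as np


def crc32_sketch(*arrays, initial=0):
    """Chain a CRC32 over the shape and raw bytes of each array, in order."""
    checksum = initial
    for array in arrays:
        # Including the shape is an assumption; it separates reshaped copies of the same data.
        checksum = zlib.crc32(str(array.shape).encode(), checksum)
        checksum = zlib.crc32(array.tobytes(), checksum)
    return checksum


a = np.arange(6, dtype=np.int64)
b = np.linspace(0.0, 1.0, 4)
print(crc32_sketch(a, b))
```

Because the running value is passed forward, hashing `a` and then `b` in one call matches hashing `b` with `initial` set to the checksum of `a`.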

reconstruction.ecoli.fit_sim_data_1.createBulkContainer(sim_data, expression, doubling_time)[source]

Creates a container that tracks the counts of all bulk molecules. Relies on totalCountIdDistributionRNA and totalCountIdDistributionProtein to set the counts and IDs of all RNAs and proteins.

Inputs

expression (array of floats) - relative frequency distribution of RNA expression
doubling_time (float with units of time) - measured doubling time given the condition

returns:

- bulkContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’ for count of all bulk molecules

reconstruction.ecoli.fit_sim_data_1.expressionConverge(sim_data, expression, concDict, doubling_time, Km=None, conditionKey=None, variable_elongation_transcription=True, variable_elongation_translation=False, disable_ribosome_capacity_fitting=False, disable_rnapoly_capacity_fitting=False)[source]

Iteratively fits synthesis probabilities for RNA. Calculates initial expression based on gene expression data and makes adjustments to match physiological constraints for ribosome and RNAP counts. Relies on fitExpression() to converge.

Inputs

expression (array of floats) - expression for each RNA, normalized to 1
concDict {metabolite (str): concentration (float with units of mol/volume)} - dictionary for concentrations of each metabolite with location tag
doubling_time (float with units of time) - doubling time
Km (array of floats with units of mol/volume) - Km for each RNA associated with RNases
disable_ribosome_capacity_fitting (bool) - if True, ribosome expression is not fit
disable_rnapoly_capacity_fitting (bool) - if True, RNA polymerase expression is not fit

Requires

MAX_FITTING_ITERATIONS (int) - number of iterations to adjust expression before an exception is raised
FITNESS_THRESHOLD (float) - acceptable change from one iteration to break the fitting loop

returns:

- expression (array of floats) - adjusted expression for each RNA,
normalized to 1
- synthProb (array of floats) - synthesis probability for each RNA which
accounts for expression and degradation rate, normalized to 1
- avgCellDryMassInit (float with units of mass) - expected initial dry cell mass
- fitAvgSolubleTargetMolMass (float with units of mass) - the adjusted dry mass
of the soluble fraction of a cell
- bulkContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’ for expected counts based on expression of all bulk molecules
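The role of MAX_FITTING_ITERATIONS and FITNESS_THRESHOLD can be illustrated with a generic fixed-point loop. This is only a sketch of the control flow: the constant values, `converge_sketch`, and `toy_fit_step` are hypothetical, and the toy update stands in for the real fitting step.

```python
import numpy as np

MAX_FITTING_ITERATIONS = 100   # assumed value, for illustration only
FITNESS_THRESHOLD = 1e-9       # assumed value, for illustration only


def converge_sketch(expression, fit_step):
    """Apply fit_step repeatedly until the normalized expression stops changing."""
    for iteration in range(MAX_FITTING_ITERATIONS):
        new_expression = fit_step(expression)
        new_expression = new_expression / new_expression.sum()  # keep total at 1
        change = np.linalg.norm(new_expression - expression, np.inf)
        expression = new_expression
        if change < FITNESS_THRESHOLD:
            return expression, iteration + 1
    raise RuntimeError("Fitting did not converge")


def toy_fit_step(expression):
    """Toy update that moves halfway toward a fixed target distribution."""
    target = np.array([0.2, 0.3, 0.5])
    return 0.5 * (expression + target)


result, n_iter = converge_sketch(np.full(3, 1.0 / 3.0), toy_fit_step)
print(result, n_iter)
```

Raising an exception when the iteration budget runs out, rather than returning the last estimate, makes a non-converged fit impossible to miss.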

reconstruction.ecoli.fit_sim_data_1.expressionFromConditionAndFoldChange(transcription, condPerturbations, tfFCs)[source]

Adjusts expression of RNA based on fold changes from basal for a given condition. Since fold changes are reported for individual RNA cistrons, the changes are applied to the basal expression levels of each cistron and the resulting vector is mapped back to RNA expression through nonnegative least squares. For genotype perturbations, the expression of all RNAs that include the given cistron are set to the given value.

Inputs

transcription: Instance of the Transcription class from
reconstruction.ecoli.dataclasses.process.transcription
condPerturbations {cistron ID (str): fold change (float)} -
dictionary of fold changes for cistrons based on the given condition
tfFCs {cistron ID (str): fold change (float)} -
dictionary of fold changes for cistrons based on transcription factors in the given condition

returns:

- expression (array of floats) - adjusted expression for each RNA,
normalized to 1

Notes

TODO (Travis) - Might not properly handle if an RNA is adjusted from both a perturbation and a transcription factor, currently RNA self regulation is not included in tfFCs

reconstruction.ecoli.fit_sim_data_1.final_adjustments(sim_data, cell_specs, **kwargs)[source]

reconstruction.ecoli.fit_sim_data_1.fitCondition(sim_data, spec, condition)[source]

Takes a given condition and returns the predicted bulk average, bulk deviation, protein monomer average, protein monomer deviation, and amino acid supply to translation. This relies on calculateBulkDistributions and calculateTranslationSupply.

Inputs

condition (str) - condition to fit (e.g. ‘CPLX0-7705__active’)
spec {property (str): property values} - cell specifications for the given condition.

This function uses the specs “expression”, “concDict”, “avgCellDryMassInit”, and “doubling_time”

returns:

- A dictionary {condition (str): spec (dict)} containing the updated spec dictionary, in which the following values are updated:
- bulkAverageContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
  for the mean of the counts of all bulk molecules
- bulkDeviationContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
  for the standard deviation of the counts of all bulk molecules
- proteinMonomerAverageContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
  for the mean of the counts of all protein monomers
- proteinMonomerDeviationContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
  for the standard deviation of the counts of all protein monomers
- translation_aa_supply (array with units of mol/(mass.time)) - the supply rates
for each amino acid to translation

reconstruction.ecoli.fit_sim_data_1.fitExpression(sim_data, bulkContainer, doubling_time, avgCellDryMassInit, Km=None)[source]

Determines expression and synthesis probabilities for RNA molecules to fit protein levels and RNA degradation rates. Assumes a steady state analysis where the RNA synthesis probability will be the same as the degradation rate. If no Km is given, RNA degradation is assumed to be linear; otherwise, degradation is calculated based on saturation with RNases.

Inputs

bulkContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
for expected count based on expression of all bulk molecules
doubling_time (float with units of time) - doubling time
avgCellDryMassInit (float with units of mass) - expected initial dry cell mass
Km (array of floats with units of mol/volume) - Km for each RNA associated with RNases

Modifies

bulkContainer counts of RNA and proteins

returns:

- expression (array of floats) - adjusted expression for each RNA,
normalized to 1
- synth_prob (array of floats) - synthesis probability for each RNA which
accounts for expression and degradation rate, normalized to 1
- fit_cistron_expression (array of floats) - target expression levels of
each cistron (gene) used to calculate RNA expression levels
- cistron_expression_res (array of floats) - the residuals of the NNLS
problem solved to calculate RNA expression levels

Notes

TODO - sets bulkContainer counts and returns values - change to only return values

reconstruction.ecoli.fit_sim_data_1.fitLigandConcentrations(sim_data, cell_specs)[source]

Using the fit values of pPromoterBound, updates the set concentrations of ligand metabolites and the kd’s of the ligand-TF binding reactions.

Requires

Fitted pPromoterBound: probabilities that a TF will bind to its promoter, fit by function fitPromoterBoundProbability().

Inputs

cell_specs {condition (str): dict} - information about each condition

Modifies

Set concentrations of metabolites that are ligands in 1CS
kd’s of equilibrium reactions in 1CS

reconstruction.ecoli.fit_sim_data_1.fitMaintenanceCosts(sim_data, bulkContainer)[source]

Fits the growth-associated maintenance (GAM) cost associated with metabolism.

The energetic costs associated with growth have been estimated utilizing flux-balance analysis and are used with FBA to obtain accurate growth predictions. In the whole-cell model, some of these costs are explicitly associated with the energetic costs of translation, a biomass assembly process. Consequently we must estimate the amount of energy utilized by translation per unit of biomass (i.e. dry mass) produced, and subtract that quantity from reported GAM to acquire the modified GAM that we use in the metabolic submodel.

Requires

amino acid counts associated with protein monomers
average initial dry mass
energetic (GTP) cost of translation (per amino acid polymerized)
observed growth-associated maintenance (GAM)

In dimensions of ATP or ATP equivalents consumed per biomass

Modifies

the “dark” ATP, i.e. the modified GAM

Notes

As more non-metabolic submodels account for energetic costs, this function should be extended to subtract those costs off the observed GAM.

There also exists, in contrast, non-growth-associated-maintenance (NGAM), which is relative to total biomass rather than the biomass accumulation rate. As the name would imply, this accounts for the energetic costs of maintaining the existing biomass. It is also accounted for in the metabolic submodel.

TODO (John): Rewrite as a true function.

reconstruction.ecoli.fit_sim_data_1.fitPromoterBoundProbability(sim_data, cell_specs)[source]

Calculates the probabilities (P) that each transcription factor will bind to its target RNA. This function initially calculates these probabilities from the bulk average counts of the TFs and ligands calculated from previous steps. Then, values of parameters alpha and r in the equation below are fit such that the computed RNA synthesis probabilities converge to the measured RNA synthesis probabilities.

v_{synth, j} = alpha_j + sum_{i} P_{T,i}*r_{ij}

Due to constraints applied in the optimization, both v and P need to be shifted from their initial values.
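The equation above is a matrix-vector product plus an offset. The sketch below evaluates it for two TFs and three RNAs with made-up values; the array names and the orientation of r (rows for TFs, columns for RNAs) are assumptions chosen to match the indices in the formula.

```python
import numpy as np

alpha = np.array([0.10, 0.20, 0.05])       # alpha_j, one entry per RNA j
p_bound = np.array([0.5, 0.2])             # P_{T,i}, one entry per TF i
r = np.array([
    [0.04, 0.00, -0.02],                   # r_{ij} for TF 0 across RNAs j
    [0.00, 0.10, 0.05],                    # r_{ij} for TF 1 across RNAs j
])

# v_{synth, j} = alpha_j + sum_i P_{T,i} * r_{ij}
v_synth = alpha + p_bound @ r
print(v_synth)
```

A negative r_{ij} lowers the synthesis probability of RNA j when TF i is bound, which is how repression appears in this form.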

Requires

Bulk average counts of transcription factors and associated ligands for each condition (in cell_specs)

Inputs

cell_specs {condition (str): dict} - information about each condition

Modifies

Probabilities of TFs binding to their promoters
RNA synthesis probabilities
cell_specs[‘basal’][‘r_vector’]: Fit parameters on how the recruitment of a TF affects the expression of a gene. High (positive) values of r indicate that the TF binding increases the probability that the gene is expressed.
cell_specs[‘basal’][‘r_columns’]: mapping of column name to index in r

Notes

See supplementary materials on transcription regulation for details on the parameters being fit.

reconstruction.ecoli.fit_sim_data_1.fitSimData_1(raw_data, **kwargs)[source]

Fits parameters necessary for the simulation based on the knowledge base.

Inputs:

raw_data (KnowledgeBaseEcoli) - knowledge base consisting of the necessary raw data
cpus (int) - number of processes to use (if > 1, use multiprocessing)
debug (bool) - if True, fit only one arbitrarily-chosen transcription factor in order to speed up a debug cycle (should not be used for an actual simulation)
save_intermediates (bool) - if True, save the state (sim_data and cell_specs) to disk in intermediates_directory after each Parca step
intermediates_directory (str) - path to the directory to save intermediate sim_data and cell_specs files to
load_intermediate (str) - the function name of the Parca step to load sim_data and cell_specs from; functions prior to and including this will be skipped but all following functions will run
variable_elongation_transcription (bool) - enable variable elongation for transcription
variable_elongation_translation (bool) - enable variable elongation for translation
disable_ribosome_capacity_fitting (bool) - if True, ribosome expression is not fit to protein synthesis demands
disable_rnapoly_capacity_fitting (bool) - if True, RNA polymerase expression is not fit to protein synthesis demands
cache_dir (str) - path to the directory to save cached data for affinities of RNAs binding to endoRNases

reconstruction.ecoli.fit_sim_data_1.fit_condition(sim_data, cell_specs, cpus=1, **kwargs)[source]

reconstruction.ecoli.fit_sim_data_1.initialize(sim_data, cell_specs, raw_data=None, **kwargs)[source]

reconstruction.ecoli.fit_sim_data_1.input_adjustments(sim_data, cell_specs, debug=False, **kwargs)[source]

reconstruction.ecoli.fit_sim_data_1.mRNADistributionFromProtein(distribution_protein, translation_efficiencies, netLossRate)[source]

dP_i / dt = k * M_i * e_i - P_i * Loss_i

At steady state: M_i = Loss_i * P_i / (k * e_i)

Fraction of protein for ith gene is defined as: f_i = P_i / P_total

Substituting in: M_i = Loss_i * f_i * P_total / (k * e_i)

Normalizing M_i by summing over all i cancels out k and P_total assuming a constant translation rate.

Inputs

distribution_protein (array of floats) - distribution for each protein, normalized to 1
translation_efficiencies (array of floats) - translational efficiency for each mRNA, normalized to 1
netLossRate (array of floats with units of 1/time) - rate of loss for each protein

rtype:

array of floats for the distribution of each mRNA, normalized to 1
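The derivation above reduces to one element-wise expression followed by a normalization. A minimal sketch, using plain floats in place of unit-carrying arrays and the hypothetical name `mrna_distribution_sketch`:

```python
import numpy as np


def mrna_distribution_sketch(distribution_protein, translation_efficiencies, net_loss_rate):
    """M_i is proportional to Loss_i * f_i / e_i; k and P_total cancel on normalizing."""
    unnormalized = net_loss_rate * distribution_protein / translation_efficiencies
    return unnormalized / unnormalized.sum()


protein_fractions = np.array([0.5, 0.5])   # f_i
efficiencies = np.array([0.25, 0.75])      # e_i
loss_rates = np.array([1.0, 3.0])          # Loss_i, arbitrary time units

mrna_fractions = mrna_distribution_sketch(protein_fractions, efficiencies, loss_rates)
print(mrna_fractions)
```

In this toy case the second protein is lost three times faster but is also translated three times more efficiently, so both mRNAs end up at the same fraction.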

reconstruction.ecoli.fit_sim_data_1.netLossRateFromDilutionAndDegradationProtein(doublingTime, degradationRates)[source]

Compute total loss rate (summed contributions of degradation and dilution).

Inputs

doublingTime (float with units of time) - doubling time of the cell
degradationRates (array of floats with units of 1/time) - protein degradation rate

rtype:

array of floats with units of 1/time for the total loss rate for each protein
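A minimal numeric sketch of this sum, assuming exponential growth so that the dilution rate is ln(2) divided by the doubling time. That formula is the standard assumption and is not quoted from the source; the values are hypothetical and units are dropped.

```python
import numpy as np

doubling_time = 3600.0                       # seconds, hypothetical
degradation_rates = np.array([0.0, 1e-4])    # 1 / second, hypothetical

# Dilution by growth acts on every protein equally; degradation is protein-specific.
dilution_rate = np.log(2) / doubling_time
net_loss_rate = dilution_rate + degradation_rates
print(net_loss_rate)
```

A protein with no active degradation is still lost at the dilution rate, so the first entry is nonzero.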

reconstruction.ecoli.fit_sim_data_1.netLossRateFromDilutionAndDegradationRNA(doublingTime, totalEndoRnaseCountsCapacity, Km, rnaConc, countsToMolar)[source]

Compute total loss rate (summed impact of degradation and dilution). Returns the loss rate in units of (counts/time) in preparation for use in the steady state analysis in fitExpression() and setRNAPCountsConstrainedByPhysiology() (see calculateMinPolymerizingEnzymeByProductDistributionRNA()).

Derived from steady state analysis of Michaelis-Menten enzyme kinetics with competitive inhibition: for a given RNA, all other RNAs compete for RNase.

v_i = k_cat * [ES_i]
v_i = k_cat * [E]0 * ([S_i]/Km_i) / (1 + sum over j genes([S_j] / Km_j))

Inputs

doublingTime (float with units of time) - doubling time of the cell
totalEndoRnaseCountsCapacity (float with units of 1/time) - total kinetic capacity of all RNases in the cell
Km (array of floats with units of mol/volume) - Michaelis-Menten constant for each RNA
rnaConc (array of floats with units of mol/volume) - concentration for each RNA
countsToMolar (float with units of mol/volume) - conversion between counts and molar

rtype:

array of floats with units of 1/time for the total loss rate for each RNA
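The competitive-inhibition rate law in this entry can be evaluated directly. The sketch below covers only the degradation term v_i, not the dilution contribution or the conversion to counts, and uses made-up numbers with units dropped.

```python
import numpy as np

capacity = 100.0                     # k_cat * [E]0, hypothetical total RNase capacity
rna_conc = np.array([1.0, 2.0])      # [S_i], hypothetical concentrations
km = np.array([1.0, 1.0])            # Km_i, hypothetical constants

# Every RNA contributes to the shared denominator, so all RNAs compete for the enzyme.
saturation = rna_conc / km
degradation = capacity * saturation / (1.0 + saturation.sum())
print(degradation)
```

With equal Km values, the RNA at twice the concentration is degraded at twice the rate, and the total stays below the enzyme capacity.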

reconstruction.ecoli.fit_sim_data_1.netLossRateFromDilutionAndDegradationRNALinear(doublingTime, degradationRates, rnaCounts)[source]

Compute total loss rate (summed contributions of degradation and dilution). Returns the loss rate in units of (counts/time) in preparation for use in the steady state analysis in fitExpression() and setRNAPCountsConstrainedByPhysiology() (see calculateMinPolymerizingEnzymeByProductDistributionRNA()).

Requires

doublingTime (float with units of time) - doubling time of the cell
degradationRates (array of floats with units of 1/time) - degradation rate for each RNA
rnaCounts (array of floats) - counts for each RNA

rtype:

array of floats with units of 1/time for the total loss rate for each RNA

reconstruction.ecoli.fit_sim_data_1.promoter_binding(sim_data, cell_specs, **kwargs)[source]

reconstruction.ecoli.fit_sim_data_1.proteinDistributionFrommRNA(distribution_mRNA, translation_efficiencies, netLossRate)[source]

dP_i / dt = k * M_i * e_i - P_i * Loss_i

At steady state: P_i = k * M_i * e_i / Loss_i

Fraction of mRNA for ith gene is defined as: f_i = M_i / M_total

Substituting in: P_i = k * f_i * e_i * M_total / Loss_i

Normalizing P_i by summing over all i cancels out k and M_total assuming constant translation rate.

Inputs

distribution_mRNA (array of floats) - distribution for each mRNA, normalized to 1
translation_efficiencies (array of floats) - translational efficiency for each mRNA, normalized to 1
netLossRate (array of floats with units of 1/time) - rate of loss for each protein

rtype:

array of floats for the distribution of each protein, normalized to 1
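The steady-state relation above can be sketched in a few lines. `protein_distribution_sketch` is a hypothetical name, and plain floats stand in for unit-carrying arrays.

```python
import numpy as np


def protein_distribution_sketch(distribution_mrna, translation_efficiencies, net_loss_rate):
    """P_i is proportional to f_i * e_i / Loss_i; k and M_total cancel on normalizing."""
    unnormalized = distribution_mrna * translation_efficiencies / net_loss_rate
    return unnormalized / unnormalized.sum()


mrna_fraction = np.array([0.2, 0.8])       # f_i
efficiency = np.array([0.5, 0.5])          # e_i
loss_rate = np.array([1.0, 4.0])           # Loss_i, arbitrary time units

protein_fraction = protein_distribution_sketch(mrna_fraction, efficiency, loss_rate)
print(protein_fraction)
```

Here the second mRNA is four times more abundant but its protein is lost four times faster, so the two proteins reach equal fractions.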

reconstruction.ecoli.fit_sim_data_1.rescaleMassForSolubleMetabolites(sim_data, bulkMolCntr, concDict, doubling_time)[source]

Adjust the cell’s mass to accommodate target small molecule concentrations.

Inputs

bulkMolCntr (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
for count of all bulk molecules
concDict (dict) - a dictionary of metabolite ID (string) : concentration (unit’d number, dimensions of concentration) pairs
doubling_time (float with units of time) - measured doubling times given the condition

Requires

Cell mass fraction data at a given doubling time.
Average cell density.
The conversion factor for transforming from the size of an average cell to the size of a cell immediately following division.
Avogadro’s number.
Concentrations of small molecules (including both dry mass components and water).

Modifies

Adds small molecule counts to bulkMolCntr.

returns:

- newAvgCellDryMassInit, the adjusted dry mass of a cell immediately following division.
- fitAvgSolubleTargetMolMass, the adjusted dry mass of the soluble fraction of a cell

reconstruction.ecoli.fit_sim_data_1.save_state(func)[source]

Wrapper for functions called in fitSimData_1() to allow saving and loading of sim_data and cell_specs at different points in the parameter calculation pipeline. This is useful for development in order to skip time intensive steps that are not required to recalculate in order to work with the desired stage of parameter calculation.

This wrapper expects arguments in the kwargs passed into a wrapped function:

save_intermediates (bool): if True, the state (sim_data and cell_specs) will be saved to disk in intermediates_directory
intermediates_directory (str): path to the directory to save intermediate sim_data and cell_specs files to
load_intermediate (str): the name of the function to load sim_data and cell_specs from; functions prior to and including this will be skipped but all following functions will run
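A decorator of this kind might look like the sketch below. It is a simplified illustration under stated assumptions: the pickle file naming and `save_state_sketch` are hypothetical, and it only handles loading the named step itself, not skipping the steps that precede it.

```python
import functools
import os
import pickle
import tempfile


def save_state_sketch(func):
    """Hypothetical wrapper that pickles or restores (sim_data, cell_specs) around a step."""
    @functools.wraps(func)
    def wrapper(sim_data, cell_specs, **kwargs):
        name = func.__name__
        directory = kwargs.get("intermediates_directory", "")
        path = os.path.join(directory, f"{name}.pkl")  # assumed file layout

        # Restore the saved state instead of recomputing when this step is requested.
        if kwargs.get("load_intermediate") == name:
            with open(path, "rb") as f:
                return pickle.load(f)

        sim_data, cell_specs = func(sim_data, cell_specs, **kwargs)

        if kwargs.get("save_intermediates", False):
            if directory:
                os.makedirs(directory, exist_ok=True)
            with open(path, "wb") as f:
                pickle.dump((sim_data, cell_specs), f)
        return sim_data, cell_specs
    return wrapper


@save_state_sketch
def double_value(sim_data, cell_specs, **kwargs):
    """Toy pipeline step operating on plain dictionaries."""
    return {"value": sim_data["value"] * 2}, cell_specs


with tempfile.TemporaryDirectory() as directory:
    saved = double_value(
        {"value": 2}, {}, save_intermediates=True, intermediates_directory=directory
    )
    # The wrapped function is skipped here and the pickled state is returned instead.
    loaded = double_value(
        {"value": 100}, {}, load_intermediate="double_value",
        intermediates_directory=directory,
    )
print(saved, loaded)
```

Passing the options through kwargs lets every step share one signature, so the pipeline can forward the same keyword arguments to each wrapped function.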

reconstruction.ecoli.fit_sim_data_1.setInitialRnaExpression(sim_data, expression, doubling_time)[source]

Creates a container with the initial count and ID of each RNA, calculated based on the mass fraction, molecular weight, and expression distribution of each RNA. For rRNA the counts are set based on mass, while for tRNA and mRNA the counts are set based on mass and relative abundance. Relies on the math function totalCountFromMassesAndRatios.

Requires

Needs information from the knowledge base about the mass fraction, molecular weight, and distribution of each RNA species.

Inputs

expression (array of floats) - expression for each RNA, normalized to 1
doubling_time (float with units of time) - doubling time for condition

returns:

- expression (array of floats) - contains the adjusted RNA expression,
normalized to 1

Notes

Now rnaData[“synthProb”] does not match “expression”

reconstruction.ecoli.fit_sim_data_1.setKmCooperativeEndoRNonLinearRNAdecay(sim_data, bulkContainer, cache_dir)[source]

Fits the affinities (Michaelis-Menten constants) for RNAs binding to endoRNAses.

EndoRNAses perform the first step of RNA decay by cleaving a whole RNA somewhere inside its extent. This results in RNA fragments, which are then digested into monomers by exoRNAses. To model endoRNAse activity, we need to determine an affinity (Michaelis-Menten constant) for each RNA that is consistent with experimentally observed half-lives. The Michaelis-Menten constants must be determined simultaneously, as the RNAs must compete for the active site of the endoRNAse. (See the RnaDegradation Process class for more information about the dynamical model.) The parameters are estimated using a root solver (scipy.optimize.fsolve). (See the sim_data.process.rna_decay.kmLossFunction method for more information about the optimization problem.)

Requires

cell density, dry mass fraction, and average initial dry mass (used to calculate the cell volume, which in turn is used to calculate concentrations)
observed RNA degradation rates (half-lives)
endoRNAse counts
endoRNAse catalytic rate constants
RNA counts
boolean options that enable sensitivity analyses (see Notes below)

Modifies

Michaelis-Menten constants for first-order decay (initially set to zeros)
Several optimization-related values:
Sensitivity analyses (optional, see Notes below)
Terminal values for optimization-related functions

rtype:

endoRNAse Km values, in units of M

Notes

If certain options are set, a sensitivity analysis will be performed using a range of metaparameters. Outputs will be cached and utilized instead of running the optimization if possible. The function that generates the optimization functions is defined under sim_data but has no dependency on sim_data, and therefore could be moved here or elsewhere. (TODO)

TODO (John): Refactor as a pure function.

TODO (John): Why is this function called ‘cooperative’? It seems to instead assume and model competitive binding.

TODO (John): Determine what part (if any) of the ‘linear’ parameter fitting should be retained.
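As a rough illustration of why the Michaelis-Menten constants have to be solved for simultaneously, the sketch below evaluates the residuals of a textbook competitive-binding rate law. This is an assumption made for illustration: the actual loss function lives in sim_data.process.rna_decay.kmLossFunction and may differ, and the name km_residuals and all numbers are hypothetical. A root solver such as scipy.optimize.fsolve would search for the Km vector that drives these residuals to zero.

```python
def km_residuals(km, rna_conc, k_deg, kcat, endo_conc):
    """Residuals between a competitive Michaelis-Menten decay rate and the
    observed first-order decay rate, one entry per RNA.

    Assumed (textbook) rate law, not taken from the parca source:
        rate_i = kcat * E * (c_i / Km_i) / (1 + sum_j c_j / Km_j)
    Target first-order rate:
        k_deg_i * c_i
    """
    saturation = [c / k for c, k in zip(rna_conc, km)]
    # Every RNA appears in the shared denominator: this is the competition
    # for the endoRNase active site that couples all Km values together.
    denom = 1.0 + sum(saturation)
    return [
        kcat * endo_conc * s / denom - kd * c
        for s, kd, c in zip(saturation, k_deg, rna_conc)
    ]


# Toy system with two RNAs (all values hypothetical, arbitrary units).
rna_conc = [1.0, 2.0]
k_deg = [0.1, 0.2]
kcat, endo_conc = 1.0, 1.0

# Hand-picked Km values for which both residuals vanish in this toy system.
solved = km_residuals([5.0, 2.5], rna_conc, k_deg, kcat, endo_conc)

# Changing only the first Km also shifts the second residual away from zero.
perturbed = km_residuals([2.5, 2.5], rna_conc, k_deg, kcat, endo_conc)
```

Because each residual depends on every Km through the shared denominator, the constants cannot be fit one RNA at a time.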

reconstruction.ecoli.fit_sim_data_1.setProteinDegRates(sim_data)[source]

This function’s goal is to set the degradation rates for a subset of proteins. It first gathers the index of the proteins it wants to modify, then changes the degradation rates of those proteins. These adjustments were made so that the simulation could run.

Requires

For each protein that needs to be modified, it takes in an adjustment factor.

Modifies

This function modifies the protein degradation rates for the chosen proteins in sim_data.

It takes their current degradation rate and multiplies them by the factor specified in adjustments.

reconstruction.ecoli.fit_sim_data_1.setRNADegRates(sim_data)[source]

This function’s goal is to adjust the degradation rates for a subset of metabolic RNAs. It first gathers the indexes of the RNAs it wants to modify, then changes the degradation rates of those RNAs. If the specified ID is that of an RNA cistron, the degradation rates of all RNA molecules containing the cistron are adjusted. (Note: since RNA concentrations are assumed to be in equilibrium, increasing the degradation rate increases the synthesis rates of these RNAs.)

Requires

For each RNA that needs to be modified, it takes in an adjustment factor

Modifies

This function modifies the RNA degradation rates for the chosen RNAs in sim_data. It takes their current degradation rate and multiplies them by the factor specified in adjustments.
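The note about equilibrium can be made concrete with a small sketch. It assumes, for illustration only, that loss is first-order degradation plus dilution by growth at rate ln(2)/doubling time. The helper names, RNA IDs, and numbers are hypothetical and are not part of fit_sim_data_1.

```python
import math


def adjust_deg_rates(deg_rates, adjustments):
    """Multiply the degradation rate of each listed RNA by its factor."""
    return {
        rna_id: rate * adjustments.get(rna_id, 1.0)
        for rna_id, rate in deg_rates.items()
    }


def steady_state_synthesis_rate(count, deg_rate, doubling_time):
    """Synthesis rate needed to hold `count` constant at steady state.

    Assumption: loss = degradation + dilution, dilution = ln(2) / doubling_time.
    """
    dilution_rate = math.log(2) / doubling_time
    return (deg_rate + dilution_rate) * count


# Hypothetical rates in 1/s; only rnaA receives an adjustment factor.
deg_rates = {"rnaA": 0.002, "rnaB": 0.004}
adjusted = adjust_deg_rates(deg_rates, {"rnaA": 2.0})

# Holding the count fixed, a faster degradation rate demands more synthesis.
before = steady_state_synthesis_rate(10.0, deg_rates["rnaA"], 3600.0)
after = steady_state_synthesis_rate(10.0, adjusted["rnaA"], 3600.0)
```

Under these assumptions, doubling the degradation rate of rnaA raises its required synthesis rate while rnaB is left unchanged.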

reconstruction.ecoli.fit_sim_data_1.setRNAExpression(sim_data)[source]

This function’s goal is to set expression levels for a subset of RNAs. It first gathers the indexes of the RNAs it wants to modify, then changes the expression levels of those RNAs, within sim_data, based on the specified adjustment factor. If the specified ID is an RNA cistron, the expression levels of all RNA molecules containing the cistron are adjusted.

Requires

For each RNA that needs to be modified, it takes in an adjustment factor.

Modifies

This function modifies the basal RNA expression levels set in sim_data, for the chosen RNAs. It takes their current basal expression and multiplies them by the factor specified in adjustments.

After updating the basal expression levels for the given RNAs, the function normalizes all the basal expression levels.
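The adjust-then-normalize behavior, including the cistron case, might look roughly like the following. This is a minimal sketch with hypothetical names (adjust_and_normalize, cistron_to_rnas, the TU IDs) and plain Python lists. The real function operates on arrays stored in sim_data.

```python
def adjust_and_normalize(expression, rna_ids, adjustments, cistron_to_rnas):
    """Scale selected RNAs by their factors, then renormalize to sum to 1.

    If an ID in `adjustments` is a cistron, every RNA containing that
    cistron is scaled by the same factor.
    """
    index = {rna_id: i for i, rna_id in enumerate(rna_ids)}
    adjusted = list(expression)
    for mol_id, factor in adjustments.items():
        # Fall back to treating the ID as an RNA when it is not a cistron.
        for rna_id in cistron_to_rnas.get(mol_id, [mol_id]):
            adjusted[index[rna_id]] *= factor
    total = sum(adjusted)
    return [x / total for x in adjusted]


# Hypothetical data: cistronX is carried by two transcription units.
rna_ids = ["TU1", "TU2", "TU3"]
expression = [0.5, 0.25, 0.25]
new_expression = adjust_and_normalize(
    expression,
    rna_ids,
    adjustments={"cistronX": 2.0},
    cistron_to_rnas={"cistronX": ["TU2", "TU3"]},
)
```

Note that normalization changes the expression of every RNA, not only the adjusted ones: here TU1 drops from 0.5 to one third even though it was not targeted.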

reconstruction.ecoli.fit_sim_data_1.setRNAPCountsConstrainedByPhysiology(sim_data, bulkContainer, doubling_time, avgCellDryMassInit, variable_elongation_transcription, Km=None)[source]

Set counts of RNA polymerase based on two constraints:

(1) Number of RNAP subunits required to maintain steady state of mRNAs
(2) Expected RNAP subunit counts based on (mRNA) distribution recorded in bulkContainer

Inputs

bulkContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’ for count of all bulk molecules
doubling_time (float with units of time) - doubling time given the condition
avgCellDryMassInit (float with units of mass) - expected initial dry cell mass
Km (array of floats with units of mol/volume) - Km for each RNA associated with RNases

Modifies

bulkContainer (np.ndarray object) - the counts of RNA polymerase subunits are set according to Constraint 1

Notes

Constraint 2 is not being used – see final line of this function.

reconstruction.ecoli.fit_sim_data_1.setRibosomeCountsConstrainedByPhysiology(sim_data, bulkContainer, doubling_time, variable_elongation_translation)[source]

Set counts of ribosomal protein subunits based on three constraints:

(1) Expected protein distribution doubles in one cell cycle
(2) Measured rRNA mass fractions
(3) Expected ribosomal protein subunit counts based on RNA expression data

Inputs

bulkContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’ for count of all bulk molecules
doubling_time (float with units of time) - doubling time given the condition
variable_elongation_translation (bool) - whether there is variable elongation for translation

Modifies

counts of ribosomal protein subunits in bulkContainer

reconstruction.ecoli.fit_sim_data_1.setTranslationEfficiencies(sim_data)[source]

This function’s goal is to set translation efficiencies for a subset of metabolic proteins. It first gathers the index of the proteins it wants to modify, then changes the monomer translation efficiencies based on the adjustment that is specified. These adjustments were made so that the simulation could run.

Requires

For each protein that needs to be modified, it takes in an adjustment factor.

Modifies

This function modifies, for a subset of proteins, their translational efficiencies in sim_data.

It takes their current efficiency and multiplies them by the factor specified in adjustments.

reconstruction.ecoli.fit_sim_data_1.set_balanced_translation_efficiencies(sim_data)[source]

Sets the translation efficiencies of a group of proteins to be equal to the mean value of all proteins within the group.

Requires

List of proteins that should have balanced translation efficiencies.

Modifies

Translation efficiencies of proteins within each specified group.
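The balancing step amounts to replacing each member's efficiency with the group mean. A minimal sketch, assuming efficiencies are held in a plain dict and groups are lists of protein IDs (both hypothetical representations, as are the names used):

```python
def balance_efficiencies(efficiencies, groups):
    """Set every protein in a group to the mean efficiency of that group."""
    balanced = dict(efficiencies)
    for group in groups:
        # The mean is taken over the original, unbalanced values.
        mean = sum(efficiencies[p] for p in group) / len(group)
        for protein_id in group:
            balanced[protein_id] = mean
    return balanced


# Hypothetical proteins: protA and protB form one group, protC is ungrouped.
efficiencies = {"protA": 1.0, "protB": 3.0, "protC": 0.5}
balanced = balance_efficiencies(efficiencies, [["protA", "protB"]])
```

Proteins outside every group keep their original efficiency.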

reconstruction.ecoli.fit_sim_data_1.set_conditions(sim_data, cell_specs, **kwargs)[source]

reconstruction.ecoli.fit_sim_data_1.tf_condition_specs(sim_data, cell_specs, cpus=1, disable_ribosome_capacity_fitting=False, disable_rnapoly_capacity_fitting=False, variable_elongation_transcription=True, variable_elongation_translation=False, **kwargs)[source]

reconstruction.ecoli.fit_sim_data_1.totalCountFromMassesAndRatios(totalMass, individualMasses, distribution)[source]

Function to determine the expected total counts for a group of molecules in order to achieve a total mass with a given distribution of individual molecules.

Math:

Total mass = dot(mass, count)

Fraction of i: f = count / Total counts

Substituting:

Total mass = dot(mass, f * Total counts)

Total mass = Total counts * dot(mass, f)

Total counts = Total mass / dot(mass, f)

Requires

totalMass (float with mass units): total mass of the group of molecules
individualMasses (array of floats with mass units): mass for individual molecules in the group
distribution (array of floats): distribution of individual molecules, normalized to 1

returns:

- counts (float) - total counts (does not need to be a whole number)
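The derivation can be checked numerically with a short sketch. It uses plain floats and a hypothetical snake_case name. The real function works with unit-bearing quantities, which are omitted here.

```python
def total_count_from_masses_and_ratios(total_mass, individual_masses, distribution):
    """Return total counts = total_mass / dot(individual_masses, distribution)."""
    if len(individual_masses) != len(distribution):
        raise ValueError("individual_masses and distribution must have the same length")
    # Mass of an "average" molecule drawn from the distribution: dot(mass, f).
    mean_mass = sum(m * f for m, f in zip(individual_masses, distribution))
    return total_mass / mean_mass


# Toy numbers (hypothetical): two species with masses 2 and 4, equally abundant.
masses = [2.0, 4.0]
fractions = [0.5, 0.5]
total_counts = total_count_from_masses_and_ratios(30.0, masses, fractions)

# Per-species counts follow from the fractions; their masses sum back to 30.
counts = [total_counts * f for f in fractions]
recovered_mass = sum(m * c for m, c in zip(masses, counts))
```

Here dot(mass, f) is 3, so the total count is 10, split as 5 and 5, and the recovered mass of 2*5 + 4*5 = 30 matches the input.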

reconstruction.ecoli.fit_sim_data_1.totalCountIdDistributionProtein(sim_data, expression, doubling_time)[source]

Calculates the total counts of proteins from the relative expression of RNA, individual protein mass, and total protein mass. Relies on the math functions netLossRateFromDilutionAndDegradationProtein, proteinDistributionFrommRNA, totalCountFromMassesAndRatios.

Inputs

expression (array of floats) - relative frequency distribution of RNA expression
doubling_time (float with units of time) - measured doubling time given the condition

returns:

- total_count_protein (float) - total number of proteins
- ids_protein (array of str) - name of each protein with location tag
- distribution_protein (array of floats) - distribution for each protein, normalized to 1

reconstruction.ecoli.fit_sim_data_1.totalCountIdDistributionRNA(sim_data, expression, doubling_time)[source]

Calculates the total counts of RNA from their relative expression, individual mass, and total RNA mass. Relies on the math function totalCountFromMassesAndRatios.

Inputs

expression (array of floats) - relative frequency distribution of RNA expression
doubling_time (float with units of time) - measured doubling time given the condition

returns:

- total_count_RNA (float) - total number of RNAs
- ids_rnas (array of str) - name of each RNA with location tag
- distribution_RNA (array of floats) - distribution for each RNA, normalized to 1
