reconstruction.ecoli.fit_sim_data_1
The parca, aka parameter calculator.
TODO: establish a controlled language for function behaviors (i.e. create* set* fit*)
TODO: functionalize so that values are not both set and returned from some methods
- reconstruction.ecoli.fit_sim_data_1.apply_updates(func, args, labels, dest, cpus)[source]
Use multiprocessing (if cpus > 1) to apply args to a function to get dictionary updates for a destination dictionary.
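The dispatch described above can be sketched as follows. This is a minimal illustration, not the module's implementation: it assumes each entry of args is a tuple of positional arguments, that func returns a dictionary, and that labels are only used to identify a failed task. The names apply_updates_sketch and scale are hypothetical.

```python
import multiprocessing


def apply_updates_sketch(func, args, labels, dest, cpus):
    """Merge the dict returned by func(*a), for each a in args, into dest."""
    if cpus > 1:
        # Parallel path: func and its arguments must be picklable.
        with multiprocessing.Pool(processes=cpus) as pool:
            pending = {
                label: pool.apply_async(func, a)
                for label, a in zip(labels, args)
            }
            for label, result in pending.items():
                try:
                    dest.update(result.get())
                except Exception as exc:
                    raise RuntimeError(f"task {label!r} failed") from exc
    else:
        # Serial path: call func directly in this process.
        for a in args:
            dest.update(func(*a))


def scale(name, value):
    """Hypothetical worker that returns a one-entry update dictionary."""
    return {name: 2 * value}


results = {}
apply_updates_sketch(scale, [("a", 1), ("b", 2)], ["a", "b"], results, cpus=1)
```

Running with cpus=1 keeps everything in one process, which is usually easier to debug than the pooled path.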
- reconstruction.ecoli.fit_sim_data_1.basal_specs(sim_data, cell_specs, disable_ribosome_capacity_fitting=False, disable_rnapoly_capacity_fitting=False, variable_elongation_transcription=True, variable_elongation_translation=False, **kwargs)[source]
- reconstruction.ecoli.fit_sim_data_1.buildBasalCellSpecifications(sim_data, variable_elongation_transcription=True, variable_elongation_translation=False, disable_ribosome_capacity_fitting=False, disable_rnapoly_capacity_fitting=False)[source]
Creates cell specifications for the basal condition by fitting expression. Relies on expressionConverge() to set the expression and update masses.
Inputs
disable_ribosome_capacity_fitting (bool) - if True, ribosome expression is not fit
disable_rnapoly_capacity_fitting (bool) - if True, RNA polymerase expression is not fit
Requires
Metabolite concentrations based on ‘minimal’ nutrients
‘basal’ RNA expression
‘basal’ doubling time
Modifies
Average mass values of the cell
cistron expression
RNA expression and synthesis probabilities
- returns:
dict {‘basal’: dict} with the following keys in the dict from key ‘basal’:
- ‘concDict’ {metabolite_name (str): concentration (float with units)} - dictionary of concentrations for each metabolite with a concentration
- ‘fit_cistron_expression’ (array of floats) - hypothetical expression for each RNA cistron post-fit, total normalized to 1, if all transcription units were monocistronic
- ‘expression’ (array of floats) - expression for each RNA, total normalized to 1
- ‘doubling_time’ (float with units) - cell doubling time
- ‘synthProb’ (array of floats) - synthesis probability for each RNA, total normalized to 1
- ‘avgCellDryMassInit’ (float with units) - average initial cell dry mass
- ‘fitAvgSolubleTargetMolMass’ (float with units) - the adjusted dry mass of the soluble fraction of a cell
- ‘bulkContainer’ (np.ndarray object) - Two columns: ‘id’ for name and ‘count’ for expected counts based on expression of all bulk molecules
Notes
TODO - sets sim_data attributes and returns values - change to only return values
- reconstruction.ecoli.fit_sim_data_1.buildCombinedConditionCellSpecifications(sim_data, cell_specs, variable_elongation_transcription=True, variable_elongation_translation=False, disable_ribosome_capacity_fitting=False, disable_rnapoly_capacity_fitting=False)[source]
Creates cell specifications for sets of transcription factors being active. These sets include conditions like ‘with_aa’ or ‘no_oxygen’ where multiple transcription factors will be active at the same time.
Inputs
cell_specs {condition (str): dict} - information about each individual transcription factor condition
disable_ribosome_capacity_fitting (bool) - if True, ribosome expression is not fit
disable_rnapoly_capacity_fitting (bool) - if True, RNA polymerase expression is not fit
Requires
Metabolite concentrations based on nutrients for the condition
Adjusted ‘basal’ RNA expression
Doubling time for the combined condition
Fold changes in expression for each gene given the TF
Modifies
cell_specs dictionary for each combined condition
RNA expression and synthesis probabilities for each combined condition
Notes
TODO - determine how to handle fold changes when multiple TFs change the
same gene because multiplying both fold changes together might not be appropriate
- reconstruction.ecoli.fit_sim_data_1.buildTfConditionCellSpecifications(sim_data, tf, variable_elongation_transcription=True, variable_elongation_translation=False, disable_ribosome_capacity_fitting=False, disable_rnapoly_capacity_fitting=False)[source]
Creates cell specifications for a given transcription factor by fitting expression. Will set for the active and inactive TF condition. Relies on expressionConverge() to set the expression and masses. Uses fold change data relative to the ‘basal’ condition to determine expression for a given TF.
Inputs
tf (str) - label for the transcription factor to fit (eg. ‘CPLX-125’)
disable_ribosome_capacity_fitting (bool) - if True, ribosome expression is not fit
disable_rnapoly_capacity_fitting (bool) - if True, RNA polymerase expression is not fit
Requires
Metabolite concentrations based on nutrients for the TF
Adjusted ‘basal’ cistron expression
Doubling time for the TF
Fold changes in expression for each gene given the TF
- returns:
dict {tf + ‘__active’/‘__inactive’: dict} with the following keys in each dict:
- ‘concDict’ {metabolite_name (str): concentration (float with units)} - dictionary of concentrations for each metabolite with a concentration
- ‘expression’ (array of floats) - expression for each RNA, total normalized to 1
- ‘doubling_time’ (float with units) - cell doubling time
- ‘synthProb’ (array of floats) - synthesis probability for each RNA, total normalized to 1
- ‘cistron_expression’ (array of floats) - hypothetical expression for each RNA cistron, calculated from basal cistron expression levels and fold change data
- ‘fit_cistron_expression’ (array of floats) - hypothetical expression for each RNA cistron post-fit, total normalized to 1, if all transcription units were monocistronic
- ‘avgCellDryMassInit’ (float with units) - average initial cell dry mass
- ‘fitAvgSolubleTargetMolMass’ (float with units) - the adjusted dry mass of the soluble fraction of a cell
- ‘bulkContainer’ (np.ndarray object) - Two columns: ‘id’ for name and ‘count’ for expected counts based on expression of all bulk molecules
- reconstruction.ecoli.fit_sim_data_1.calculateBulkDistributions(sim_data, expression, concDict, avgCellDryMassInit, doubling_time)[source]
Finds a distribution of copy numbers for macromolecules. While RNA and protein expression can be approximated using well-described statistical distributions, complexes require absolute copy numbers. To get these distributions, this function instantiates many cells with a reduced set of molecules, forms complexes, and iterates through equilibrium and two-component system processes until metabolite counts reach a steady-state. It then computes the resulting statistical distributions.
Requires
N_SEEDS (int) - the number of instantiated cells
Inputs
expression (array of floats) - expression for each RNA, normalized to 1
concDict {metabolite (str): concentration (float with units of mol/volume)} - dictionary for concentrations of each metabolite with location tag
avgCellDryMassInit (float with units of mass) - initial dry cell mass
doubling_time (float with units of time) - doubling time for condition
- returns:
- bulkAverageContainer (np.ndarray object) - Two columns (‘id’ for name and ‘count’) – for the mean of the counts of all bulk molecules
- bulkDeviationContainer (np.ndarray object) - Two columns (‘id’ for name and ‘count’) – for the standard deviation of the counts of all bulk molecules
- proteinMonomerAverageContainer (np.ndarray object) - Two columns (‘id’ for name and ‘count’) – for the mean of the counts of all protein monomers
- proteinMonomerDeviationContainer (np.ndarray object) - Two columns (‘id’ for name and ‘count’) – for the standard deviation of the counts of all protein monomers
- reconstruction.ecoli.fit_sim_data_1.calculateMinPolymerizingEnzymeByProductDistribution(productLengths, elongationRates, netLossRate, productCounts)[source]
Compute the number of ribosomes required to maintain steady state.
dP/dt = production rate - loss rate
dP/dt = e_r * (1/L) * R - (k_loss * P)
At steady state: dP/dt = 0
R = sum over i ((L_i / e_r) * k_loss_i * P_i)
Multiplying both sides by volume gives an equation in terms of counts.
P = protein concentration
e_r = polypeptide elongation rate per ribosome
L = protein length
R = ribosome concentration
k_loss = net protein loss rate
i = ith protein
Inputs
productLengths (array of ints with units of amino_acids) - L, protein lengths
elongationRates (array of ints with units of amino_acid/time) - e_r, polypeptide elongation rate
netLossRate (array of floats with units of 1/time) - k_loss, protein loss rate
productCounts (array of floats) - P, protein counts
- returns:
- float with dimensionless units for the number of ribosomes required to
maintain steady state
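The steady-state sum above can be illustrated with a short worked example. This is a sketch under simplifying assumptions: a single scalar elongation rate is used for every protein, plain floats stand in for unit-tagged arrays, and the function name min_ribosomes and all input values are hypothetical.

```python
def min_ribosomes(product_lengths, elongation_rate, net_loss_rates, product_counts):
    """R = sum over i of (L_i / e_r) * k_loss_i * P_i, with one shared e_r."""
    return sum(
        (length / elongation_rate) * k_loss * count
        for length, k_loss, count in zip(
            product_lengths, net_loss_rates, product_counts
        )
    )


# Two hypothetical proteins: lengths in amino acids, e_r in amino acids/s,
# loss rates in 1/s, and counts per cell.
ribosomes = min_ribosomes([300, 450], 15.0, [0.001, 0.002], [1000, 500])
```

With these inputs the first protein needs about 20 ribosomes and the second about 30, so roughly 50 ribosomes in total are needed to hold both at steady state.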
- reconstruction.ecoli.fit_sim_data_1.calculateMinPolymerizingEnzymeByProductDistributionRNA(productLengths, elongationRates, netLossRate)[source]
Compute the number of RNA polymerases required to maintain steady state of mRNA.
dR/dt = production rate - loss rate
dR/dt = e_r * (1/L) * RNAp - k_loss
At steady state: dR/dt = 0
RNAp = sum over i ((L_i / e_r) * k_loss_i)
Multiplying both sides by volume gives an equation in terms of counts.
R = mRNA transcript concentration
e_r = transcript elongation rate per RNAp
L = transcript length
RNAp = RNAp concentration
k_loss = net transcript loss rate (unit: concentration / time)
i = ith transcript
Inputs
productLengths (array of ints with units of nucleotides) - L, transcript lengths
elongationRates (array of ints with units of nucleotide/time) - e_r, transcript elongation rate
netLossRate (array of floats with units of 1/time) - k_loss, transcript loss rate
- returns:
- float with dimensionless units for the number of RNA polymerases required to
maintain steady state
- reconstruction.ecoli.fit_sim_data_1.calculatePromoterBoundProbability(sim_data, cell_specs)[source]
Calculate the probability that a transcription factor is bound to its associated promoter for all simulated growth conditions. The bulk average concentrations calculated for TFs and their ligands are used to compute the probabilities based on the type (0CS, 1CS, 2CS) of the TF.
Requires
Bulk average counts of transcription factors and associated ligands
for each condition (in cell_specs)
- returns:
- pPromoterBound: Probability that a transcription factor is bound to its promoter, per growth condition and TF. Each probability is indexed by pPromoterBound[condition][TF].
- reconstruction.ecoli.fit_sim_data_1.calculateRnapRecruitment(sim_data, cell_specs)[source]
Constructs the basal_prob vector and delta_prob matrix from values of r. The basal_prob vector holds the basal transcription probabilities of each transcription unit. The delta_prob matrix holds the differences in transcription probabilities when transcription factors bind to the promoters of each transcription unit. Both values are stored in sim_data.
Requires
- cell_specs[‘basal’]:
[‘r_vector’]: Fit parameters on how the recruitment of a TF affects the expression of a gene. High (positive) values of r indicate that the TF binding increases the probability that the gene is expressed.
[‘r_columns’]: mapping of column name to index in r
Modifies
Rescales values in basal_prob such that all values are positive
Adds basal_prob and delta_prob arrays to sim_data
- reconstruction.ecoli.fit_sim_data_1.calculateTranslationSupply(sim_data, doubling_time, bulkContainer, avgCellDryMassInit)[source]
Returns the supply rates of all amino acids to translation given the desired doubling time. This creates a limit on the polypeptide elongation process, and thus on growth. The amino acid supply rate is found by calculating the concentration of amino acids per gram dry cell weight and multiplying by the loss to dilution given doubling time.
Inputs
doubling_time (float with units of time) - measured doubling times given the condition
- bulkContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
for count of all bulk molecules
avgCellDryMassInit (float with units of mass) - the average initial cell dry mass
Notes
The supply of amino acids should not be based on a desired doubling time,
but should come from a more mechanistic basis. This would allow simulations of environmental shifts in which the doubling time is unknown.
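The supply calculation described above can be sketched as follows. This is an illustration only: it assumes exponential growth (so the dilution rate is ln(2) / doubling time), takes amino acid counts as a plain list instead of reading them from bulkContainer, and uses plain floats instead of unit-tagged values. The name translation_supply and the inputs are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mol


def translation_supply(aa_counts, dry_mass_g, doubling_time_s):
    """Supply rate in mol/(g*s) for each amino acid, from dilution alone."""
    dilution_rate = math.log(2) / doubling_time_s  # 1/s
    return [
        (count / AVOGADRO / dry_mass_g) * dilution_rate
        for count in aa_counts
    ]


# Hypothetical counts for three amino acids in a cell of 4e-13 g dry mass
# doubling every 44 minutes.
supply = translation_supply([2.0e8, 1.5e8, 0.5e8], 4e-13, 44 * 60)
```

Halving the doubling time doubles every supply rate in this sketch, which is the sense in which the supply limits growth.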
- reconstruction.ecoli.fit_sim_data_1.crc32(*arrays, initial=0)[source]
Return a CRC32 checksum of the given ndarrays.
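One way such a checksum can be built is by chaining the standard library's running CRC32 over each array's raw bytes. The sketch below assumes that approach and uses array.array as a stand-in for ndarrays. Whether the real function also hashes shapes or dtypes is not stated here, and the name crc32_sketch is hypothetical.

```python
import binascii
from array import array


def crc32_sketch(*buffers, initial=0):
    """Fold binascii.crc32 over the buffers, seeding each call with the last."""
    checksum = initial
    for buf in buffers:
        checksum = binascii.crc32(bytes(buf), checksum)
    return checksum


a = array("d", [1.0, 2.0])
b = array("d", [3.0])
combined = crc32_sketch(a, b)
```

Because the checksum is a running one, the result equals the CRC32 of the concatenated bytes, and the initial argument lets a caller continue from an earlier checksum.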
- reconstruction.ecoli.fit_sim_data_1.createBulkContainer(sim_data, expression, doubling_time)[source]
Creates a container that tracks the counts of all bulk molecules. Relies on totalCountIdDistributionRNA and totalCountIdDistributionProtein to set the counts and IDs of all RNAs and proteins.
Inputs
expression (array of floats) - relative frequency distribution of RNA expression
doubling_time (float with units of time) - measured doubling time given the condition
- returns:
- bulkContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’ for count of all bulk molecules
- reconstruction.ecoli.fit_sim_data_1.expressionConverge(sim_data, expression, concDict, doubling_time, Km=None, conditionKey=None, variable_elongation_transcription=True, variable_elongation_translation=False, disable_ribosome_capacity_fitting=False, disable_rnapoly_capacity_fitting=False)[source]
Iteratively fits synthesis probabilities for RNA. Calculates initial expression based on gene expression data and makes adjustments to match physiological constraints for ribosome and RNAP counts. Relies on fitExpression() to converge.
Inputs
expression (array of floats) - expression for each RNA, normalized to 1
concDict {metabolite (str): concentration (float with units of mol/volume)} - dictionary for concentrations of each metabolite with location tag
doubling_time (float with units of time) - doubling time
Km (array of floats with units of mol/volume) - Km for each RNA associated with RNases
disable_ribosome_capacity_fitting (bool) - if True, ribosome expression is not fit
disable_rnapoly_capacity_fitting (bool) - if True, RNA polymerase expression is not fit
Requires
MAX_FITTING_ITERATIONS (int) - number of iterations to adjust expression before an exception is raised
FITNESS_THRESHOLD (float) - acceptable change from one iteration to break the fitting loop
- returns:
- expression (array of floats) - adjusted expression for each RNA,
normalized to 1
- synthProb (array of floats) - synthesis probability for each RNA which
accounts for expression and degradation rate, normalized to 1
- avgCellDryMassInit (float with units of mass) - expected initial dry cell mass
- fitAvgSolubleTargetMolMass (float with units of mass) - the adjusted dry mass
of the soluble fraction of a cell
- bulkContainer (np.ndarray object) - Two columns (‘id’ for name and ‘count’) – for expected counts based on expression of all bulk molecules
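The role of MAX_FITTING_ITERATIONS and FITNESS_THRESHOLD can be illustrated with a generic fixed-point loop. This is only a sketch of the control flow: the constant values, the change metric (sum of absolute differences), and the toy_refit stand-in for fitExpression() are all assumptions, not the module's actual choices.

```python
MAX_FITTING_ITERATIONS = 100  # hypothetical value
FITNESS_THRESHOLD = 1e-9      # hypothetical value


def converge(expression, refit):
    """Repeat refit until the expression vector stops changing."""
    for iteration in range(MAX_FITTING_ITERATIONS):
        new_expression = refit(expression)
        change = sum(abs(n - o) for n, o in zip(new_expression, expression))
        expression = new_expression
        if change < FITNESS_THRESHOLD:
            return expression, iteration + 1
    raise RuntimeError("Fitting did not converge")


def toy_refit(expression):
    """Hypothetical stand-in for fitExpression(): move halfway to a target."""
    target = [0.7, 0.3]
    moved = [0.5 * (e + t) for e, t in zip(expression, target)]
    total = sum(moved)
    return [m / total for m in moved]  # keep the vector normalized to 1


fitted, n_iterations = converge([0.5, 0.5], toy_refit)
```

The loop either returns once successive iterations agree to within the threshold or raises after the iteration budget is spent, mirroring the exception mentioned under Requires.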
- reconstruction.ecoli.fit_sim_data_1.expressionFromConditionAndFoldChange(transcription, condPerturbations, tfFCs)[source]
Adjusts expression of RNA based on fold changes from basal for a given condition. Since fold changes are reported for individual RNA cistrons, the changes are applied to the basal expression levels of each cistron and the resulting vector is mapped back to RNA expression through nonnegative least squares. For genotype perturbations, the expression of all RNAs that include the given cistron are set to the given value.
Inputs
- transcription: Instance of the Transcription class from
reconstruction.ecoli.dataclasses.process.transcription
- condPerturbations {cistron ID (str): fold change (float)} -
dictionary of fold changes for cistrons based on the given condition
- tfFCs {cistron ID (str): fold change (float)} -
dictionary of fold changes for cistrons based on transcription factors in the given condition
- returns:
- expression (array of floats) - adjusted expression for each RNA,
normalized to 1
Notes
TODO (Travis) - Might not properly handle if an RNA is adjusted from both a
perturbation and a transcription factor, currently RNA self regulation is not included in tfFCs
- reconstruction.ecoli.fit_sim_data_1.fitCondition(sim_data, spec, condition)[source]
Takes a given condition and returns the predicted bulk average, bulk deviation, protein monomer average, protein monomer deviation, and amino acid supply to translation. This relies on calculateBulkDistributions and calculateTranslationSupply.
Inputs
condition (str) - condition to fit (eg ‘CPLX0-7705__active’)
spec {property (str): property values} - cell specifications for the given condition.
This function uses the specs “expression”, “concDict”, “avgCellDryMassInit”, and “doubling_time”
- returns:
- A dictionary {condition (str): spec (dict)} containing the updated spec dictionary, with the following values updated:
- bulkAverageContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
for the mean of the counts of all bulk molecules
- bulkDeviationContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
for the standard deviation of the counts of all bulk molecules
- proteinMonomerAverageContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
for the mean of the counts of all protein monomers
- proteinMonomerDeviationContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
for the standard deviation of the counts of all protein monomers
translation_aa_supply (array with units of mol/(mass.time)) - the supply rates
for each amino acid to translation
- reconstruction.ecoli.fit_sim_data_1.fitExpression(sim_data, bulkContainer, doubling_time, avgCellDryMassInit, Km=None)[source]
Determines expression and synthesis probabilities for RNA molecules to fit protein levels and RNA degradation rates. Assumes a steady state analysis where the RNA synthesis probability will be the same as the degradation rate. If no Km is given, RNA degradation is assumed to be linear; otherwise, degradation is calculated based on saturation with RNases.
Inputs
- bulkContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
for expected count based on expression of all bulk molecules
doubling_time (float with units of time) - doubling time
avgCellDryMassInit (float with units of mass) - expected initial dry cell mass
Km (array of floats with units of mol/volume) - Km for each RNA associated
with RNases
Modifies
bulkContainer counts of RNA and proteins
- returns:
- expression (array of floats) - adjusted expression for each RNA,
normalized to 1
- synth_prob (array of floats) - synthesis probability for each RNA which
accounts for expression and degradation rate, normalized to 1
- fit_cistron_expression (array of floats) - target expression levels of
each cistron (gene) used to calculate RNA expression levels
- cistron_expression_res (array of floats) - the residuals of the NNLS
problem solved to calculate RNA expression levels
Notes
TODO - sets bulkContainer counts and returns values - change to only return values
- reconstruction.ecoli.fit_sim_data_1.fitLigandConcentrations(sim_data, cell_specs)[source]
Using the fit values of pPromoterBound, updates the set concentrations of ligand metabolites and the kd’s of the ligand-TF binding reactions.
Requires
Fitted pPromoterBound: probabilities that a TF will bind to its promoter,
fit by function fitPromoterBoundProbability().
Inputs
cell_specs {condition (str): dict} - information about each condition
Modifies
Set concentrations of metabolites that are ligands in 1CS
kd’s of equilibrium reactions in 1CS
- reconstruction.ecoli.fit_sim_data_1.fitMaintenanceCosts(sim_data, bulkContainer)[source]
Fits the growth-associated maintenance (GAM) cost associated with metabolism.
The energetic costs associated with growth have been estimated utilizing flux-balance analysis and are used with FBA to obtain accurate growth predictions. In the whole-cell model, some of these costs are explicitly associated with the energetic costs of translation, a biomass assembly process. Consequently we must estimate the amount of energy utilized by translation per unit of biomass (i.e. dry mass) produced, and subtract that quantity from reported GAM to acquire the modified GAM that we use in the metabolic submodel.
Requires
amino acid counts associated with protein monomers
average initial dry mass
energetic (GTP) cost of translation (per amino acid polymerized)
observed growth-associated maintenance (GAM)
In dimensions of ATP or ATP equivalents consumed per biomass
Modifies
the “dark” ATP, i.e. the modified GAM
Notes
As more non-metabolic submodels account for energetic costs, this function should be extended to subtract those costs off the observed GAM.
There also exists, in contrast, non-growth-associated-maintenance (NGAM), which is relative to total biomass rather than the biomass accumulation rate. As the name would imply, this accounts for the energetic costs of maintaining the existing biomass. It is also accounted for in the metabolic submodel.
TODO (John): Rewrite as a true function.
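The subtraction described above can be sketched as simple arithmetic. This illustration assumes plain floats in place of unit-tagged values and expresses all costs in mmol per gram dry mass. The function name dark_atp and every number in the example are hypothetical placeholders, not fitted or measured values.

```python
AVOGADRO = 6.02214076e23  # molecules per mol


def dark_atp(gam_mmol_per_g, aa_count, gtp_per_aa, dry_mass_g):
    """Observed GAM minus the explicit translation cost, in mmol/g dry mass."""
    # Amino acids polymerized per gram of dry mass, converted to mmol/g.
    aa_mmol_per_g = aa_count / AVOGADRO * 1e3 / dry_mass_g
    translation_cost = gtp_per_aa * aa_mmol_per_g
    return gam_mmol_per_g - translation_cost


# Hypothetical inputs: GAM of 60 mmol/g, 1e9 amino acids in protein,
# 2 GTP per amino acid polymerized, and 4e-13 g of dry mass.
modified_gam = dark_atp(60.0, 1.0e9, 2.0, 4e-13)
```

As more submodels account for their own energy use, further terms would be subtracted in the same way.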
- reconstruction.ecoli.fit_sim_data_1.fitPromoterBoundProbability(sim_data, cell_specs)[source]
Calculates the probabilities (P) that each transcription factor will bind to its target RNA. This function initially calculates these probabilities from the bulk average counts of the TFs and ligands calculated from previous steps. Then, values of parameters alpha and r in the equation below are fit such that the computed RNA synthesis probabilities converge to the measured RNA synthesis probabilities.
v_{synth, j} = alpha_j + sum_{i} P_{T,i}*r_{ij}
Due to constraints applied in the optimization, both v and P need to be shifted from their initial values.
Requires
Bulk average counts of transcription factors and associated ligands
for each condition (in cell_specs)
Inputs
cell_specs {condition (str): dict} - information about each condition
Modifies
Probabilities of TFs binding to their promoters
RNA synthesis probabilities
cell_specs[‘basal’][‘r_vector’]: Fit parameters on how the recruitment of a TF affects the expression of a gene. High (positive) values of r indicate that the TF binding increases the probability that the gene is expressed.
cell_specs[‘basal’][‘r_columns’]: mapping of column name to index in r
Notes
See supplementary materials on transcription regulation for details on the parameters being fit.
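The equation v_{synth, j} = alpha_j + sum_{i} P_{T,i}*r_{ij} can be evaluated directly for a small case. The sketch below only computes v from given parameters and does not perform the fit. The function name and the example values are hypothetical.

```python
def synthesis_probabilities(alpha, p_bound, r):
    """v_j = alpha_j + sum over i of P_i * r[i][j], for each RNA j."""
    return [
        alpha_j + sum(p_i * r_i[j] for p_i, r_i in zip(p_bound, r))
        for j, alpha_j in enumerate(alpha)
    ]


alpha = [0.1, 0.2]      # basal term for two RNAs
p_bound = [0.5, 1.0]    # promoter-bound probability for two TFs
r = [
    [0.2, 0.0],         # TF 0 activates RNA 0
    [0.0, -0.1],        # TF 1 represses RNA 1
]
v = synthesis_probabilities(alpha, p_bound, r)
```

Here RNA 0 rises from 0.1 to about 0.2 because its activator is bound half the time, while RNA 1 falls from 0.2 to about 0.1 because its repressor is always bound.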
- reconstruction.ecoli.fit_sim_data_1.fitSimData_1(raw_data, **kwargs)[source]
Fits parameters necessary for the simulation based on the knowledge base
- Inputs:
- raw_data (KnowledgeBaseEcoli) - knowledge base consisting of the
necessary raw data
- cpus (int) - number of processes to use (if > 1, use multiprocessing)
- debug (bool) - if True, fit only one arbitrarily-chosen transcription factor in order to speed up a debug cycle (should not be used for an actual simulation)
- save_intermediates (bool) - if True, save the state (sim_data and cell_specs)
to disk in intermediates_directory after each Parca step
- intermediates_directory (str) - path to the directory to save intermediate
sim_data and cell_specs files to
- load_intermediate (str) - the function name of the Parca step to load
sim_data and cell_specs from; functions prior to and including this will be skipped but all following functions will run
- variable_elongation_transcription (bool) - enable variable elongation
for transcription
- variable_elongation_translation (bool) - enable variable elongation for
translation
- disable_ribosome_capacity_fitting (bool) - if True, ribosome expression
is not fit to protein synthesis demands
- disable_rnapoly_capacity_fitting (bool) - if True, RNA polymerase
expression is not fit to protein synthesis demands
- reconstruction.ecoli.fit_sim_data_1.initialize(sim_data, cell_specs, raw_data=None, **kwargs)[source]
- reconstruction.ecoli.fit_sim_data_1.input_adjustments(sim_data, cell_specs, debug=False, **kwargs)[source]
- reconstruction.ecoli.fit_sim_data_1.mRNADistributionFromProtein(distribution_protein, translation_efficiencies, netLossRate)[source]
dP_i / dt = k * M_i * e_i - P_i * Loss_i
At steady state: M_i = Loss_i * P_i / (k * e_i)
Fraction of protein for ith gene is defined as: f_i = P_i / P_total
Substituting in: M_i = Loss_i * f_i * P_total / (k * e_i)
Normalizing M_i by summing over all i cancels out k and P_total assuming a constant translation rate.
Inputs
distribution_protein (array of floats) - distribution for each protein, normalized to 1
translation_efficiencies (array of floats) - translational efficiency for each mRNA, normalized to 1
netLossRate (array of floats with units of 1/time) - rate of loss for each protein
- rtype:
array of floats for the distribution of each mRNA, normalized to 1
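The derivation above reduces to a one-line proportionality followed by normalization. A minimal sketch with plain lists instead of unit-tagged arrays (the snake_case function name and the inputs are hypothetical):

```python
def mrna_distribution_from_protein(
    distribution_protein, translation_efficiencies, net_loss_rates
):
    """M_i is proportional to Loss_i * f_i / e_i; normalize the total to 1."""
    unnormalized = [
        loss * f / e
        for f, e, loss in zip(
            distribution_protein, translation_efficiencies, net_loss_rates
        )
    ]
    total = sum(unnormalized)
    return [m / total for m in unnormalized]


# Two proteins at equal abundance and equal loss rate, but the first mRNA
# is translated three times less efficiently than the second.
mrna = mrna_distribution_from_protein([0.5, 0.5], [0.25, 0.75], [0.001, 0.001])
```

The less efficiently translated mRNA has to be three times as abundant to sustain the same protein level, so it takes about 0.75 of the distribution and the other about 0.25.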
- reconstruction.ecoli.fit_sim_data_1.netLossRateFromDilutionAndDegradationProtein(doublingTime, degradationRates)[source]
Compute total loss rate (summed contributions of degradation and dilution).
Inputs
doublingTime (float with units of time) - doubling time of the cell
degradationRates (array of floats with units of 1/time) - protein degradation rate
- rtype:
array of floats with units of 1/time for the total loss rate for each protein
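A minimal sketch of the summed loss rate, assuming exponential growth so that dilution contributes ln(2) / doublingTime to every protein. Plain floats replace unit-tagged values, and the function name and inputs are hypothetical.

```python
import math


def net_loss_rate_protein(doubling_time, degradation_rates):
    """Dilution rate ln(2)/doubling_time plus each degradation rate (1/time)."""
    dilution = math.log(2) / doubling_time
    return [dilution + k_deg for k_deg in degradation_rates]


# Hypothetical: 44 minute doubling time, one stable and one unstable protein.
loss_rates = net_loss_rate_protein(44 * 60, [0.0, 1e-3])
```

For a stable protein the loss is pure dilution, so its rate is set entirely by the doubling time.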
- reconstruction.ecoli.fit_sim_data_1.netLossRateFromDilutionAndDegradationRNA(doublingTime, totalEndoRnaseCountsCapacity, Km, rnaConc, countsToMolar)[source]
Compute total loss rate (summed impact of degradation and dilution). Returns the loss rate in units of (counts/time) in preparation for use in the steady state analysis in fitExpression() and setRNAPCountsConstrainedByPhysiology() (see calculateMinPolymerizingEnzymeByProductDistributionRNA()).
Derived from steady state analysis of Michaelis-Menten enzyme kinetics with competitive inhibition: for a given RNA, all other RNAs compete for RNase.
v_i = k_cat * [ES_i]
v_i = k_cat * [E]0 * ([S_i]/Km_i) / (1 + sum over j genes([S_j] / Km_j))
Inputs
doublingTime (float with units of time) - doubling time of the cell
totalEndoRnaseCountsCapacity (float with units of 1/time) - total kinetic capacity of all RNases in the cell
Km (array of floats with units of mol/volume) - Michaelis-Menten constant for each RNA
rnaConc (array of floats with units of mol/volume) - concentration for each RNA
countsToMolar (float with units of mol/volume) - conversion between counts and molar
- rtype:
array of floats with units of 1/time for the total loss rate for each RNA
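The competitive-inhibition expression above can be sketched as follows. This assumes that totalEndoRnaseCountsCapacity plays the role of k_cat * [E]0 expressed in counts per time, that dilution follows ln(2) / doublingTime, and that plain floats stand in for unit-tagged arrays. The function name and test inputs are hypothetical.

```python
import math


def net_loss_rate_rna(doubling_time, rnase_capacity, km, rna_conc, counts_to_molar):
    """Per-RNA loss in counts/time: dilution plus saturable RNase degradation."""
    dilution = math.log(2) / doubling_time
    # Every RNA competes for the same RNase pool, so the denominator is shared.
    total_saturation = sum(s / k for s, k in zip(rna_conc, km))
    rates = []
    for s, k in zip(rna_conc, km):
        frac_saturated = (s / k) / (1 + total_saturation)
        counts = s / counts_to_molar
        rates.append(dilution * counts + rnase_capacity * frac_saturated)
    return rates


rates = net_loss_rate_rna(
    doubling_time=math.log(2),
    rnase_capacity=3.0,
    km=[1.0, 1.0],
    rna_conc=[1.0, 1.0],
    counts_to_molar=0.5,
)
```

In this toy case each RNA occupies one third of the RNase capacity, so each loses about 1 count per unit time to degradation and 2 to dilution.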
- reconstruction.ecoli.fit_sim_data_1.netLossRateFromDilutionAndDegradationRNALinear(doublingTime, degradationRates, rnaCounts)[source]
Compute total loss rate (summed contributions of degradation and dilution). Returns the loss rate in units of (counts/time) in preparation for use in the steady state analysis in fitExpression() and setRNAPCountsConstrainedByPhysiology() (see calculateMinPolymerizingEnzymeByProductDistributionRNA()).
Requires
doublingTime (float with units of time) - doubling time of the cell
degradationRates (array of floats with units of 1/time) - degradation rate for each RNA
rnaCounts (array of floats) - counts for each RNA
- rtype:
array of floats with units of 1/time for the total loss rate for each RNA
- reconstruction.ecoli.fit_sim_data_1.proteinDistributionFrommRNA(distribution_mRNA, translation_efficiencies, netLossRate)[source]
dP_i / dt = k * M_i * e_i - P_i * Loss_i
At steady state: P_i = k * M_i * e_i / Loss_i
Fraction of mRNA for ith gene is defined as: f_i = M_i / M_total
Substituting in: P_i = k * f_i * e_i * M_total / Loss_i
Normalizing P_i by summing over all i cancels out k and M_total assuming constant translation rate.
Inputs
distribution_mRNA (array of floats) - distribution for each mRNA, normalized to 1
translation_efficiencies (array of floats) - translational efficiency for each mRNA, normalized to 1
netLossRate (array of floats with units of 1/time) - rate of loss for each protein
- rtype:
array of floats for the distribution of each protein, normalized to 1
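The steady-state relation above can be sketched in a few lines, using plain lists instead of unit-tagged arrays (the snake_case function name and the inputs are hypothetical):

```python
def protein_distribution_from_mrna(
    distribution_mrna, translation_efficiencies, net_loss_rates
):
    """P_i is proportional to f_i * e_i / Loss_i; normalize the total to 1."""
    unnormalized = [
        f * e / loss
        for f, e, loss in zip(
            distribution_mrna, translation_efficiencies, net_loss_rates
        )
    ]
    total = sum(unnormalized)
    return [p / total for p in unnormalized]


# The first mRNA is three times as abundant but translated three times less
# efficiently; both proteins are lost at the same rate.
protein = protein_distribution_from_mrna(
    [0.75, 0.25], [0.25, 0.75], [0.001, 0.001]
)
```

Abundance and efficiency cancel in this example, so both proteins end up at about half of the total.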
- reconstruction.ecoli.fit_sim_data_1.rescaleMassForSolubleMetabolites(sim_data, bulkMolCntr, concDict, doubling_time)[source]
Adjust the cell’s mass to accommodate target small molecule concentrations.
Inputs
- bulkMolCntr (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
for count of all bulk molecules
concDict (dict) - a dictionary of metabolite ID (string) : concentration (unit’d number, dimensions of concentration) pairs
doubling_time (float with units of time) - measured doubling times given the condition
Requires
Cell mass fraction data at a given doubling time.
Average cell density.
The conversion factor for transforming from the size of an average cell to the size of a cell immediately following division.
Avogadro’s number.
Concentrations of small molecules (including both dry mass components and water).
Modifies
Adds small molecule counts to bulkMolCntr.
- returns:
- newAvgCellDryMassInit, the adjusted dry mass of a cell immediately following division.
- fitAvgSolubleTargetMolMass, the adjusted dry mass of the soluble fraction of a cell
- reconstruction.ecoli.fit_sim_data_1.save_state(func)[source]
Wrapper for functions called in fitSimData_1() to allow saving and loading of sim_data and cell_specs at different points in the parameter calculation pipeline. This is useful for development in order to skip time intensive steps that are not required to recalculate in order to work with the desired stage of parameter calculation.
- This wrapper expects arguments in the kwargs passed into a wrapped function:
- save_intermediates (bool): if True, the state (sim_data and cell_specs)
will be saved to disk in intermediates_directory
- intermediates_directory (str): path to the directory to save intermediate
sim_data and cell_specs files to
- load_intermediate (str): the name of the function to load sim_data and
cell_specs from, functions prior to and including this will be skipped but all following functions will run
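The load, skip, and save behavior described above can be sketched with a decorator. This is an illustration under stated assumptions, not the module's implementation: the file naming scheme, the use of pickle, the convention that skipped steps receive and return None, and the names save_state_sketch, step_one, and step_two are all hypothetical.

```python
import functools
import os
import pickle
import tempfile


def save_state_sketch(func):
    """Decorator that can load, skip, or save a pipeline step's state."""
    @functools.wraps(func)
    def wrapper(sim_data, cell_specs, **kwargs):
        directory = kwargs.get("intermediates_directory", "")
        path = os.path.join(directory, f"state_{func.__name__}.pickle")

        if kwargs.get("load_intermediate") == func.__name__:
            # Load this step's saved output instead of running it.
            with open(path, "rb") as f:
                sim_data, cell_specs = pickle.load(f)
        elif sim_data is not None and cell_specs is not None:
            sim_data, cell_specs = func(sim_data, cell_specs, **kwargs)
        # Otherwise this step precedes the one being loaded, so it is skipped.

        if kwargs.get("save_intermediates", False) and sim_data is not None:
            with open(path, "wb") as f:
                pickle.dump((sim_data, cell_specs), f)
        return sim_data, cell_specs
    return wrapper


@save_state_sketch
def step_one(sim_data, cell_specs, **kwargs):
    """Hypothetical Parca step."""
    return sim_data + ["step_one"], cell_specs


@save_state_sketch
def step_two(sim_data, cell_specs, **kwargs):
    """Hypothetical Parca step."""
    return sim_data + ["step_two"], cell_specs


with tempfile.TemporaryDirectory() as directory:
    # First run: execute both steps and save the state after each one.
    options = {"save_intermediates": True, "intermediates_directory": directory}
    state = step_one([], {}, **options)
    state = step_two(*state, **options)

    # Second run: start with no state, load step_one's output, then continue.
    resume = {"load_intermediate": "step_one", "intermediates_directory": directory}
    resumed = step_one(None, None, **resume)
    resumed = step_two(*resumed, **resume)
```

In the second run step_one does no work of its own, and step_two runs normally on the loaded state, so both runs end in the same state.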
- reconstruction.ecoli.fit_sim_data_1.setInitialRnaExpression(sim_data, expression, doubling_time)[source]
Creates a container with the initial count and ID of each RNA, calculated based on the mass fraction, molecular weight, and expression distribution of each RNA. For rRNA the counts are set based on mass, while for tRNA and mRNA the counts are set based on mass and relative abundance. Relies on the math function totalCountFromMassesAndRatios.
Requires
Needs information from the knowledge base about the mass fraction,
molecular weight, and distribution of each RNA species.
Inputs
expression (array of floats) - expression for each RNA, normalized to 1
doubling_time (float with units of time) - doubling time for condition
- returns:
- expression (array of floats) - contains the adjusted RNA expression,
normalized to 1
Notes
Now rnaData[“synthProb”] does not match “expression”
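The mass-to-count conversion this function relies on can be sketched as follows. The sketch assumes the total count is the group's total mass divided by the abundance-weighted mean mass per molecule. The signature, plain-float units, and example values are hypothetical and may not match totalCountFromMassesAndRatios exactly.

```python
AVOGADRO = 6.02214076e23  # molecules per mol


def total_count_from_masses_and_ratios(total_mass_g, molecular_weights, distribution):
    """Number of molecules whose combined mass equals total_mass_g."""
    assert abs(sum(distribution) - 1.0) < 1e-9, "distribution must sum to 1"
    mean_mass_per_molecule = sum(
        (mw / AVOGADRO) * fraction
        for mw, fraction in zip(molecular_weights, distribution)
    )
    return total_mass_g / mean_mass_per_molecule


# Hypothetical RNA group: 1e-14 g in total, two species with molecular
# weights in g/mol and relative abundances summing to 1.
distribution = [0.2, 0.8]
total = total_count_from_masses_and_ratios(1e-14, [5e5, 2.5e4], distribution)
counts = [total * fraction for fraction in distribution]
```

Multiplying the total by each species' fraction gives per-species counts whose masses add back up to the input mass.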
- reconstruction.ecoli.fit_sim_data_1.setKmCooperativeEndoRNonLinearRNAdecay(sim_data, bulkContainer)[source]
Fits the affinities (Michaelis-Menten constants) for RNAs binding to endoRNAses.
EndoRNAses perform the first step of RNA decay by cleaving a whole RNA somewhere inside its extent. This results in RNA fragments, which are then digested into monomers by exoRNAses. To model endoRNAse activity, we need to determine an affinity (Michaelis-Menten constant) for each RNA that is consistent with experimentally observed half-lives. The Michaelis-Menten constants must be determined simultaneously, as the RNAs must compete for the active site of the endoRNAse. (See the RnaDegradation Process class for more information about the dynamical model.) The parameters are estimated using a root solver (scipy.optimize.fsolve). (See the sim_data.process.rna_decay.kmLossFunction method for more information about the optimization problem.)
Requires
- cell density, dry mass fraction, and average initial dry mass
Used to calculate the cell volume, which in turn is used to calculate concentrations.
observed RNA degradation rates (half-lives)
endoRNAse counts
endoRNAse catalytic rate constants
RNA counts
boolean options that enable sensitivity analyses (see Notes below)
Modifies
Michaelis-Menten constants for first-order decay (initially set to zeros)
- Several optimization-related values
Sensitivity analyses (optional, see Notes below) Terminal values for optimization-related functions
- rtype:
endoRNAse Km values, in units of M
Notes
If certain options are set, a sensitivity analysis will be performed using a range of metaparameters. Outputs will be cached and utilized instead of running the optimization if possible. The function that generates the optimization functions is defined under sim_data but has no dependency on sim_data, and therefore could be moved here or elsewhere. (TODO)
TODO (John): Refactor as a pure function.
TODO (John): Why is this function called ‘cooperative’? It seems to instead assume and model competitive binding.
TODO (John): Determine what part (if any) of the ‘linear’ parameter fitting should be retained.
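The simultaneous fit described above can be illustrated with a toy problem. The sketch below is not the kmLossFunction used by the parca; it assumes a plain competitive Michaelis-Menten rate law, three made-up RNA species, and a single lumped capacity `vmax` (catalytic rate constant times endoRNAse concentration). All names and numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import fsolve

# Toy inputs; every value here is made up for illustration.
rna_conc = np.array([1.0, 2.0, 3.0])  # RNA concentrations
k_deg = np.array([0.5, 0.2, 0.1])     # observed first-order decay rates
vmax = 10.0                           # kcat * endoRNAse concentration (assumed)


def residual(km):
    """Competitive Michaelis-Menten decay rate minus the observed decay rate."""
    saturation = rna_conc / km
    # All RNAs share the same active site, so every species sees the
    # same denominator; this is what couples the Km values together.
    predicted = vmax * saturation / (1.0 + saturation.sum())
    return predicted - k_deg * rna_conc


# Initial guess: the Km each RNA would need if the enzyme were far from saturated.
km_guess = vmax / k_deg
km_fit, info, status, message = fsolve(residual, km_guess, full_output=True)
```

Because the denominator depends on every Km, changing one constant shifts the predicted rate of all other species, which is why the constants cannot be fit one at a time.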
- reconstruction.ecoli.fit_sim_data_1.setProteinDegRates(sim_data)[source]
This function’s goal is to set the degradation rates for a subset of proteins. It first gathers the index of the proteins it wants to modify, then changes the degradation rates of those proteins. These adjustments were made so that the simulation could run.
Requires
For each protein that needs to be modified, it takes in an adjustment factor.
Modifies
This function modifies the protein degradation rates for the chosen proteins in sim_data.
It takes their current degradation rates and multiplies them by the factor specified in adjustments.
- reconstruction.ecoli.fit_sim_data_1.setRNADegRates(sim_data)[source]
This function’s goal is to adjust the degradation rates for a subset of metabolic RNAs. It first gathers the index of the RNAs it wants to modify, then changes the degradation rates of those RNAs. If the specified ID is that of an RNA cistron, the degradation rates of all RNA molecules containing the cistron are adjusted. (Note: since RNA concentrations are assumed to be in equilibrium, increasing the degradation rate increases the synthesis rates of these RNAs.)
Requires
For each RNA that needs to be modified, it takes in an adjustment factor
Modifies
This function modifies the RNA degradation rates for the chosen RNAs in sim_data. It takes their current degradation rates and multiplies them by the factor specified in adjustments.
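The equilibrium note in the description can be made concrete with a small worked example. This is a sketch under an assumed steady-state balance in which synthesis offsets both degradation and dilution by growth; the function name, the balance equation, and all numbers are illustrative and are not taken from the parca code.

```python
import numpy as np


def steady_state_synthesis_rate(count, k_deg, doubling_time):
    """Synthesis rate that holds `count` constant (assumed balance).

    Loss is modeled as first-order degradation plus dilution by growth,
    with dilution rate ln(2) / doubling_time.
    """
    return (k_deg + np.log(2) / doubling_time) * count


count = 100.0                # RNA molecules per cell (hypothetical)
doubling_time = 44.0 * 60.0  # seconds (hypothetical)
k_deg = np.log(2) / 120.0    # per second, i.e. a 2 minute half-life (hypothetical)
adjustment = 2.0             # hypothetical adjustment factor

before = steady_state_synthesis_rate(count, k_deg, doubling_time)
after = steady_state_synthesis_rate(count, adjustment * k_deg, doubling_time)
```

With the count held fixed, doubling the degradation rate raises the required synthesis rate by exactly `k_deg * count`.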
- reconstruction.ecoli.fit_sim_data_1.setRNAExpression(sim_data)[source]
This function’s goal is to set expression levels for a subset of RNAs. It first gathers the index of the RNAs it wants to modify, then changes the expression levels of those RNAs, within sim_data, based on the specified adjustment factor. If the specified ID is an RNA cistron, the expression levels of all RNA molecules containing the cistron are adjusted.
Requires
For each RNA that needs to be modified, it takes in an adjustment factor.
Modifies
This function modifies the basal RNA expression levels set in sim_data, for the chosen RNAs. It takes their current basal expression and multiplies them by the factor specified in adjustments.
After updating the basal expression levels for the given RNAs, the function normalizes all the basal expression levels.
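The multiply-then-normalize pattern described above can be sketched as follows. The RNA IDs, the expression values, and the `adjustments` dictionary are hypothetical stand-ins for the data held in sim_data; cistron-to-RNA mapping is left out.

```python
import numpy as np

# Hypothetical stand-ins for data held in sim_data.
rna_ids = np.array(["rnaA", "rnaB", "rnaC", "rnaD"])
expression = np.array([0.4, 0.3, 0.2, 0.1])  # basal expression, sums to 1
adjustments = {"rnaB": 2.0, "rnaD": 10.0}    # RNA ID -> adjustment factor

adjusted = expression.copy()
for rna_id, factor in adjustments.items():
    # Gather the index of the RNA to modify, then scale its expression.
    idx = np.where(rna_ids == rna_id)[0]
    adjusted[idx] *= factor

# Renormalize so that the expression levels sum to 1 again.
adjusted /= adjusted.sum()
```

Note that renormalizing lowers the expression of every RNA that was not adjusted, so the effective fold change relative to the rest of the transcriptome is preserved while the absolute values are not.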
- reconstruction.ecoli.fit_sim_data_1.setRNAPCountsConstrainedByPhysiology(sim_data, bulkContainer, doubling_time, avgCellDryMassInit, variable_elongation_transcription, Km=None)[source]
Set counts of RNA polymerase based on two constraints:
(1) Number of RNAP subunits required to maintain steady state of mRNAs
(2) Expected RNAP subunit counts based on (mRNA) distribution recorded in bulkContainer
Inputs
- bulkContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
for count of all bulk molecules
doubling_time (float with units of time) - doubling time given the condition
avgCellDryMassInit (float with units of mass) - expected initial dry cell mass
Km (array of floats with units of mol/volume) - Km for each RNA associated
with RNases
Modifies
- bulkContainer (np.ndarray object) - the counts of RNA polymerase
subunits are set according to Constraint 1
Notes
Constraint 2 is not being used – see final line of this function.
- reconstruction.ecoli.fit_sim_data_1.setRibosomeCountsConstrainedByPhysiology(sim_data, bulkContainer, doubling_time, variable_elongation_translation)[source]
Set counts of ribosomal protein subunits based on three constraints:
(1) Expected protein distribution doubles in one cell cycle
(2) Measured rRNA mass fractions
(3) Expected ribosomal protein subunit counts based on RNA expression data
Inputs
- bulkContainer (np.ndarray object) - Two columns: ‘id’ for name and ‘count’
for count of all bulk molecules
doubling_time (float with units of time) - doubling time given the condition
variable_elongation_translation (bool) - whether there is variable elongation for translation
Modifies
counts of ribosomal protein subunits in bulkContainer
- reconstruction.ecoli.fit_sim_data_1.setTranslationEfficiencies(sim_data)[source]
This function’s goal is to set translation efficiencies for a subset of metabolic proteins. It first gathers the index of the proteins it wants to modify, then changes the monomer translation efficiencies based on the adjustment that is specified. These adjustments were made so that the simulation could run.
Requires
For each protein that needs to be modified, it takes in an adjustment factor.
Modifies
This function modifies, for a subset of proteins, their translational efficiencies in sim_data.
It takes their current efficiencies and multiplies them by the factor specified in adjustments.
- reconstruction.ecoli.fit_sim_data_1.set_balanced_translation_efficiencies(sim_data)[source]
Sets the translation efficiencies of a group of proteins to be equal to the mean value of all proteins within the group.
Requires
List of proteins that should have balanced translation efficiencies.
Modifies
Translation efficiencies of proteins within each specified group.
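A minimal sketch of the balancing step, assuming the efficiencies live in a NumPy array indexed in the same order as a list of protein IDs. The IDs, values, and the `balanced_groups` structure are hypothetical.

```python
import numpy as np

# Hypothetical stand-ins for data held in sim_data.
protein_ids = ["protA", "protB", "protC", "protD"]
efficiencies = np.array([1.0, 3.0, 0.5, 2.0])
balanced_groups = [["protA", "protB"]]  # each inner list is one group

index_of = {protein_id: i for i, protein_id in enumerate(protein_ids)}
for group in balanced_groups:
    idx = [index_of[protein_id] for protein_id in group]
    # Replace every member's efficiency with the group mean.
    efficiencies[idx] = efficiencies[idx].mean()
```

Proteins outside any group keep their original values.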
- reconstruction.ecoli.fit_sim_data_1.tf_condition_specs(sim_data, cell_specs, cpus=1, disable_ribosome_capacity_fitting=False, disable_rnapoly_capacity_fitting=False, variable_elongation_transcription=True, variable_elongation_translation=False, **kwargs)[source]
- reconstruction.ecoli.fit_sim_data_1.totalCountFromMassesAndRatios(totalMass, individualMasses, distribution)[source]
Function to determine the expected total counts for a group of molecules in order to achieve a total mass with a given distribution of individual molecules.
- Math:
Total mass = dot(mass, count)
Fraction of i: f = count / Total counts
Substituting:
Total mass = dot(mass, f * Total counts)
Total mass = Total counts * dot(mass, f)
Total counts = Total mass / dot(mass, f)
Requires
totalMass (float with mass units): total mass of the group of molecules
individualMasses (array of floats with mass units): mass for individual molecules in the group
distribution (array of floats): distribution of individual molecules, normalized to 1
- returns:
- counts (float) - total counts (does not need to be a whole number)
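The derivation above reduces to a single division. The sketch below uses plain NumPy floats rather than the unit-carrying values the real function receives; the snake_case function name and the numbers are illustrative only.

```python
import numpy as np


def total_count_from_masses_and_ratios(total_mass, individual_masses, distribution):
    """Total counts = Total mass / dot(mass, f), with unitless floats."""
    return total_mass / np.dot(individual_masses, distribution)


individual_masses = np.array([2.0, 4.0])  # mass per molecule of each species
distribution = np.array([0.5, 0.5])       # fractions, normalized to 1
total_mass = 30.0

total_counts = total_count_from_masses_and_ratios(
    total_mass, individual_masses, distribution
)
# Per-species counts follow from the distribution.
counts = total_counts * distribution
```

As a consistency check, `np.dot(individual_masses, counts)` recovers the total mass that was passed in.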
- reconstruction.ecoli.fit_sim_data_1.totalCountIdDistributionProtein(sim_data, expression, doubling_time)[source]
Calculates the total counts of proteins from the relative expression of RNA, individual protein mass, and total protein mass. Relies on the math functions netLossRateFromDilutionAndDegradationProtein, proteinDistributionFrommRNA, totalCountFromMassesAndRatios.
Inputs
expression (array of floats) - relative frequency distribution of RNA expression
doubling_time (float with units of time) - measured doubling time given the condition
- returns:
- total_count_protein (float) - total number of proteins
- ids_protein (array of str) - name of each protein with location tag
- distribution_protein (array of floats) - distribution for each protein,
normalized to 1
- reconstruction.ecoli.fit_sim_data_1.totalCountIdDistributionRNA(sim_data, expression, doubling_time)[source]
Calculates the total counts of RNA from their relative expression, individual mass, and total RNA mass. Relies on the math function totalCountFromMassesAndRatios.
Inputs
- expression (array of floats) - relative frequency distribution of RNA
expression
- doubling_time (float with units of time) - measured doubling time given
the condition
- returns:
- total_count_RNA (float) - total number of RNAs
- ids_rnas (array of str) - name of each RNA with location tag
- distribution_RNA (array of floats) - distribution for each RNA, normalized to 1