validation.ecoli.validation_data

ValidationData for Ecoli

Raw data processed into forms convenient for validation and analysis.

class validation.ecoli.validation_data.Constants(raw_data)[source]

Bases: object

_build_constants(raw_data)[source]
class validation.ecoli.validation_data.EssentialGenes(validation_data_raw)[source]

Bases: object

_load_essential_genes(validation_data_raw)[source]
class validation.ecoli.validation_data.GeneFunctions(validation_data_raw)[source]

Bases: object

_loadGeneFunctions(validation_data_raw)[source]
class validation.ecoli.validation_data.GetterFunctions(raw_data, sim_data)[source]

Bases: object

getterFunctions

_build_all_masses(raw_data, sim_data)[source]

Builds dictionary of molecular weights keyed with the IDs of molecules. Depending on the type of the molecule, weights are either pulled directly from raw data, or dynamically calculated from the existing data.

_build_compartments(raw_data, sim_data)[source]
_build_full_chromosome_mass(raw_data, sim_data)[source]

Calculates the mass of the full chromosome from its sequence and the weights of polymerized dNTPs.

_build_genomic_coordinates(raw_data)[source]

Builds a dictionary of genomic coordinates of DNA sites. Keys are the IDs of the sites, and the values are tuples of the left-end coordinate and the right-end coordinate of the site. Sites whose types are included in IGNORED_DNA_SITE_TYPES are ignored.
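
The filtering described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the row layout (`id`, `type`, `left_end_pos`, `right_end_pos`) and the contents of `IGNORED_DNA_SITE_TYPES` are assumptions made for the example.

```python
# Hypothetical contents; the real set of ignored site types is not listed here.
IGNORED_DNA_SITE_TYPES = {"example-ignored-type"}


def build_genomic_coordinates(dna_sites):
    """Map site ID -> (left end, right end), skipping ignored site types."""
    coordinates = {}
    for site in dna_sites:
        if site["type"] in IGNORED_DNA_SITE_TYPES:
            continue  # sites of an ignored type are left out entirely
        coordinates[site["id"]] = (site["left_end_pos"], site["right_end_pos"])
    return coordinates


# Assumed row layout, for illustration only.
example_sites = [
    {"id": "SITE-1", "type": "promoter", "left_end_pos": 100, "right_end_pos": 140},
    {"id": "SITE-2", "type": "example-ignored-type", "left_end_pos": 500, "right_end_pos": 520},
]
example_coordinates = build_genomic_coordinates(example_sites)
```

A lookup such as `example_coordinates["SITE-1"]` then plays the role that get_genomic_coordinates(site_id) plays in the class.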

_build_modified_protein_masses(raw_data, sim_data)[source]

Builds dictionary of molecular weights of modified proteins keyed with the molecule IDs. Molecular weights are calculated from the stoichiometries of the modification reactions.

_build_modified_rna_masses(raw_data)[source]

Builds dictionary of molecular weights of modified RNAs keyed with the molecule IDs. Molecular weights are calculated from the stoichiometries of the modification reactions.

_build_polymerized_subunit_masses(sim_data)[source]

Builds dictionary of molecular weights of polymerized subunits (NTPs, dNTPs, and amino acids) by subtracting the end weights from the weights of original subunits.

_build_protein_complex_compartments(raw_data, all_compartments_sorted)[source]

Builds dictionary of compartment tags for protein complexes keyed with the molecule IDs. Each complex is assigned to the compartment that contains a subunit of the complex and has the highest complexation priority. Compartment tags of subunits that are also protein complexes are determined recursively. Compartments of metabolite subunits are not considered since metabolites are assigned to all compartments that are being modeled.
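
A rough sketch of the recursive, priority-based assignment described above. The data structures are hypothetical, and the sketch assumes that earlier entries in the sorted compartment list have higher complexation priority.

```python
def complex_compartment(complex_id, subunits, monomer_compartment,
                        metabolites, compartments_sorted):
    """Return the highest-priority compartment among a complex's subunits.

    Assumption: compartments_sorted lists compartments from highest to
    lowest complexation priority.
    """
    candidates = set()
    for subunit in subunits[complex_id]:
        if subunit in metabolites:
            continue  # metabolites exist in every modeled compartment
        if subunit in subunits:
            # The subunit is itself a complex, so resolve it recursively.
            candidates.add(complex_compartment(
                subunit, subunits, monomer_compartment,
                metabolites, compartments_sorted))
        else:
            candidates.add(monomer_compartment[subunit])
    return min(candidates, key=compartments_sorted.index)


# Hypothetical example data.
priority = ["m", "i", "c"]
monomers = {"A": "c", "B": "i", "C": "m"}
complexes = {"AB": ["A", "B", "ATP"], "ABC": ["AB", "C"]}
small_molecules = {"ATP"}

tag_ab = complex_compartment("AB", complexes, monomers, small_molecules, priority)
tag_abc = complex_compartment("ABC", complexes, monomers, small_molecules, priority)
```

The sketch raises a ValueError for a complex made only of metabolites; how the real method treats that case is not stated in this reference.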

_build_protein_complex_masses(raw_data)[source]

Builds dictionary of molecular weights of protein complexes keyed with the molecule IDs. Molecular weights are calculated from the stoichiometries of the complexation/equilibrium reactions. For complexes whose subunits are also complexes, the molecular weights are calculated recursively.
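
The recursive calculation can be illustrated with a small sketch. The stoichiometry layout (complex ID mapped to subunit counts) and the mass values are assumptions for the example, not the format used in raw_data.

```python
def complex_mass(mol_id, base_masses, stoichiometry):
    """Sum subunit masses weighted by stoichiometry, recursing into subcomplexes."""
    if mol_id in base_masses:
        return base_masses[mol_id]  # monomer or other molecule with a known mass
    return sum(
        count * complex_mass(subunit, base_masses, stoichiometry)
        for subunit, count in stoichiometry[mol_id].items()
    )


# Illustrative values only.
base_masses = {"A": 10.0, "B": 5.0}
stoichiometry = {
    "A2B": {"A": 2, "B": 1},      # complex of monomers
    "A2B-dimer": {"A2B": 2},      # complex whose subunit is itself a complex
}

mass_a2b = complex_mass("A2B", base_masses, stoichiometry)
mass_dimer = complex_mass("A2B-dimer", base_masses, stoichiometry)
```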

_build_protein_masses(raw_data, sim_data)[source]

Builds dictionary of molecular weights of protein monomers keyed with the protein IDs. Molecular weights are calculated from the protein sequence and the weights of polymerized amino acids.
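
As a sketch of the idea, the mass of a monomer can be approximated by summing the polymerized weight of each residue in its sequence. The weights below are rounded placeholders rather than values from the model, and any correction for the chain termini is left out.

```python
from collections import Counter

# Placeholder polymerized amino acid weights (not the model's values).
POLYMERIZED_AA_WEIGHTS = {"G": 57.0, "A": 71.0, "S": 87.0}


def protein_mass(sequence, residue_weights=POLYMERIZED_AA_WEIGHTS):
    """Sum polymerized residue weights over a one-letter amino acid sequence."""
    counts = Counter(sequence)
    return sum(residue_weights[aa] * n for aa, n in counts.items())


example_mass = protein_mass("GAS")
```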

_build_protein_sequences(raw_data)[source]

Builds the amino acid sequences of each protein monomer using sequences in raw_data.

_build_rna_masses(raw_data, sim_data)[source]

Builds dictionary of molecular weights of RNAs keyed with the RNA IDs. Molecular weights are calculated from the RNA sequence and the weights of polymerized NTPs.

_build_rna_sequences(raw_data)[source]

Builds nucleotide sequences of each transcription unit using the genome sequence and the left and right end positions.

_build_sequences(raw_data)[source]

Builds sequences of RNAs and proteins.

_build_submass_array(mw, submass_name)[source]

Converts a scalar molecular weight value to an array of submasses, given the name of the submass category that the molecule belongs to.

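Parameters:

mw – scalar molecular weight value

submass_name – name of the submass category that the molecule belongs to

Return type: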

ndarray[Any, dtype[float64]]
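
One way to picture the conversion: the scalar weight is placed in the slot of its submass category and every other slot stays zero. The category names and their order below are hypothetical.

```python
import numpy as np

# Hypothetical submass categories and ordering.
SUBMASS_NAME_TO_INDEX = {"rRNA": 0, "tRNA": 1, "mRNA": 2, "protein": 3, "metabolite": 4}


def build_submass_array(mw, submass_name):
    """Return a float64 array that holds mw in the slot for submass_name."""
    submasses = np.zeros(len(SUBMASS_NAME_TO_INDEX), dtype=np.float64)
    submasses[SUBMASS_NAME_TO_INDEX[submass_name]] = mw
    return submasses


protein_submasses = build_submass_array(215.0, "protein")
```

Under this layout the total mass of a molecule is the sum of its submass array.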

_compartment_tag = re.compile('\\[[a-z]]')
get_all_valid_molecules()[source]

Returns a list of all molecule IDs with assigned submass arrays and compartments.

Return type:

list[str]

get_compartment(mol_id)[source]

Returns the list of one-letter codes for the compartments that the molecule with the given ID can exist in.

Parameters:

mol_id (str)

Return type:

list[list[str]]

get_compartment_tag(id_)[source]

Look up a molecule id and return a compartment suffix tag like ‘[c]’.

Parameters:

id_ (str)

Return type:

str
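
A short sketch of how a tag such as ‘[c]’ might be built and then recognized with the pattern stored in _compartment_tag. The lookup table and the choice of the first listed compartment are assumptions made for the example.

```python
import re

# Same pattern as the _compartment_tag class attribute.
COMPARTMENT_TAG = re.compile(r"\[[a-z]]")

# Hypothetical lookup of one-letter compartment codes per molecule.
example_compartments = {"WATER": ["c", "p"]}


def compartment_tag(mol_id):
    """Format the first listed compartment as a suffix tag (assumed rule)."""
    return f"[{example_compartments[mol_id][0]}]"


tagged_id = "WATER" + compartment_tag("WATER")
found_tag = COMPARTMENT_TAG.search(tagged_id).group()
untagged_id = COMPARTMENT_TAG.sub("", tagged_id)
```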

get_compartments(ids)[source]

Returns a list of lists of one-letter compartment codes, one inner list for each of the given molecule IDs, giving the compartments that the molecule can exist in.

Parameters:

ids (list[str] | ndarray)

Return type:

list[list[str]]

get_genomic_coordinates(site_id)[source]

Returns the genomic coordinates of the left and right ends of a DNA site given the ID of the site.

Parameters:

site_id (str)

Return type:

tuple[int, int]

get_mass(mol_id)[source]

Return the total mass of the molecule with a given ID.

Parameters:

mol_id (str)

get_masses(mol_ids)[source]

Return an array of total masses of the molecules with the given IDs.

Parameters:

mol_ids (list | ndarray)

get_miscrnas_with_singleton_tus()[source]

Returns a list of all miscRNA IDs with corresponding single-gene transcription units.

Return type:

list[str]

get_sequences(ids)[source]

Return a list of sequences of the molecules with the given IDs.

Parameters:

ids (list[str] | ndarray)

Return type:

list[str]

get_singleton_tu_id(rna_id)[source]

Returns the ID of the single-gene transcription unit corresponding to the given miscRNA ID, if such a transcription unit exists. This is necessary to replace some references to miscRNA IDs in complexation or equilibrium reactions with their corresponding TU IDs, which are the molecules that are actually transcribed.

Parameters:

rna_id (str)

Return type:

str

get_submass_array(mol_id)[source]

Return the submass array of the molecule with a given ID.

Parameters:

mol_id (str)

is_valid_molecule(mol_id)[source]

Returns True if the molecule with the given ID is a valid molecule (has both a submass array and a compartment tag).

Parameters:

mol_id (str)

Return type:

bool

class validation.ecoli.validation_data.GrowthRateParameters(raw_data, sim_data)[source]

Bases: object

_fit_ribosome_elongation_rate_by_ppgpp(ppgpp, rate)[source]
get_fraction_active_ribosome(doubling_time)[source]
get_fraction_active_rnap(doubling_time)[source]
get_ppGpp_conc(doubling_time)[source]
get_ribosome_elongation_rate(doubling_time)[source]
get_ribosome_elongation_rate_by_ppgpp(ppgpp, max_rate=None)[source]
get_rnap_elongation_rate(doubling_time)[source]
class validation.ecoli.validation_data.InternalState(raw_data, sim_data)[source]

Bases: object

Internal State

_build_bulk_molecule_specs(sim_data, molecule_ids)[source]

Builds a list of molecule IDs with compartment tags and a corresponding array of molecular masses to add to the bulk state.

Parameters:

molecule_ids (List[str]) – List of molecule IDs without compartment tags

Returns:

molecule_ids_with_compartments (List[str]): List of molecule IDs with compartment tags

masses (np.ndarray): Array of molecular masses divided into submasses

_build_bulk_molecules(sim_data)[source]

Add data (IDs and mass) for all classes of bulk molecules.

_build_compartments(raw_data, sim_data)[source]
_build_unique_molecules(sim_data)[source]

Add data (name, mass, and attribute data structure) for all classes of unique molecules.

class validation.ecoli.validation_data.KnowledgeBaseEcoli(operons_on, remove_rrna_operons, remove_rrff, stable_rrna, new_genes_option='off')[source]

Bases: object

Parameters:
  • operons_on (bool)

  • remove_rrna_operons (bool)

  • remove_rrff (bool)

  • stable_rrna (bool)

  • new_genes_option (str)

_check_new_gene_ids(nested_attr)[source]

Check to ensure each new gene, RNA, and protein id starts with NG.

_get_new_gene_sequence(nested_attr)[source]

Determine genome sequence for insertion using the sequences and relative locations of the new genes.

_join_data()[source]

Add rows that are specified in additional files. Data will only be added if all the loaded columns from both datasets match.

_load_parameters(dir_name, file_name)[source]
_load_sequence(file_path)[source]
_load_tsv(dir_name, file_name)[source]
_modify_data()[source]

Modify entries in rows that are specified to be modified. Rows must be identified by their entries in the first column (usually the ID column).

_prune_data()[source]

Remove rows that are specified to be removed. Data will only be removed if all data in a row in the file specifying rows to be removed matches the same columns in the raw data file.
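
The matching rule can be sketched as follows: a raw row is dropped only when every column given in a removal row has the same value in the raw row. Rows are modeled as plain dicts here, which is an assumption about the loaded data.

```python
def prune_rows(raw_rows, rows_to_remove):
    """Drop raw rows that match a removal row on all of that row's columns."""
    def matches(raw_row, removal_row):
        return all(raw_row.get(col) == val for col, val in removal_row.items())

    return [
        row for row in raw_rows
        if not any(matches(row, removal) for removal in rows_to_remove)
    ]


# Illustrative rows only.
raw_rows = [
    {"id": "G1", "symbol": "abcA"},
    {"id": "G2", "symbol": "abcB"},
]
removals = [
    {"id": "G1", "symbol": "abcA"},   # matches on every column: removed
    {"id": "G2", "symbol": "other"},  # symbol differs: G2 is kept
]
pruned_rows = prune_rows(raw_rows, removals)
```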

_update_gene_insertion_location(nested_attr)[source]

Update insertion location of new genes to prevent conflicts.

_update_gene_locations(nested_attr, insert_pos)[source]

Modify positions of original genes based upon the insertion location of new genes. Returns end position of the gene insertion.

_update_global_coordinates(data, insert_pos, insert_len)[source]

Updates the left and right end positions for all elements in data if their positions will be impacted by the new gene insertion.

Parameters:
  • data – Data attribute to update

  • insert_pos – Location of new gene insertion

  • insert_len – Length of new gene insertion
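
A minimal sketch of the coordinate shift, assuming each element is a dict with `left_end_pos` and `right_end_pos` fields and that any position at or beyond the insertion point moves by the insertion length. The field names and the boundary rule are assumptions.

```python
def update_global_coordinates(data, insert_pos, insert_len):
    """Shift end positions at or after insert_pos by insert_len, in place."""
    for row in data:
        for key in ("left_end_pos", "right_end_pos"):
            if row[key] >= insert_pos:
                row[key] += insert_len


# Illustrative elements only.
elements = [
    {"id": "upstream", "left_end_pos": 10, "right_end_pos": 50},
    {"id": "downstream", "left_end_pos": 200, "right_end_pos": 260},
]
update_global_coordinates(elements, insert_pos=100, insert_len=30)
```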

class validation.ecoli.validation_data.Mass(raw_data, sim_data)[source]

Bases: object

_build_CD_periods(raw_data, sim_data)[source]
_build_constants(raw_data, sim_data)[source]
_build_dependent_constants()[source]
_build_submasses(raw_data, sim_data)[source]
_build_trna_data(raw_data, sim_data)[source]
_calculateGrowthRateDependentDnaMass(doubling_time)[source]
_clipTau_d(doubling_time)[source]
_getFitParameters(dry_mass_composition, mass_fraction_name)[source]
getBiomassAsConcentrations(doubling_time, rp_ratio=None)[source]
get_avg_cell_dry_mass(doubling_time)[source]

Gets the dry mass for an average cell at the given doubling time.

Parameters:

doubling_time (Unum) – expected doubling time

Returns:

average cell dry mass

Return type:

Unum

get_basal_rna_fractions()[source]

Returns the measured RNA subgroup mass fractions. These fractions are expected to change with growth rate in other conditions (see transcription.get_rna_fractions()).

get_component_masses(doubling_time)[source]
get_dna_critical_mass(doubling_time)[source]

Returns the critical mass for replication initiation. Faster-growing cells maintain a consistent initiation mass, but slower-growing cells are smaller and would never reach it, so the critical mass is adjusted downward for them.

Parameters:

doubling_time (Unum) – expected doubling time of cell

Returns:

Critical mass for DNA replication initiation

Return type:

Unum

get_mass_fractions(doubling_time)[source]
get_mass_fractions_from_rna_protein_ratio(ratio)[source]
get_trna_distribution(doubling_time)[source]
class validation.ecoli.validation_data.MoleculeGroups(raw_data, sim_data)[source]

Bases: object

Helper class to extract molecule IDs of “special” groups of molecules. All values returned are lists of strings.

_build_molecule_groups(raw_data, sim_data)[source]
class validation.ecoli.validation_data.MoleculeIds(raw_data, sim_data)[source]

Bases: object

Helper class to extract molecule IDs of “special” molecules. All values returned are strings.

_buildMoleculeIds()[source]
class validation.ecoli.validation_data.Process(raw_data, sim_data)[source]

Bases: object

class validation.ecoli.validation_data.Protein(validation_data_raw, knowledge_base_raw)[source]

Bases: object

_loadHouser2015Counts(validation_data_raw)[source]
_loadSchmidt2015Counts(validation_data_raw)[source]
_loadTaniguchi2010Counts(validation_data_raw)[source]
_loadWisniewski2014Counts(validation_data_raw, knowledge_base_raw)[source]
_load_li(validation_data_raw)[source]
class validation.ecoli.validation_data.ReactionFlux(validation_data_raw, knowledge_base_raw)[source]

Bases: object

_loadToya2010Fluxes(validation_data_raw)[source]
class validation.ecoli.validation_data.Relation(raw_data, sim_data)[source]

Bases: object

_build_RNA_to_tf_mapping(raw_data, sim_data)[source]

Builds a dictionary that maps RNA IDs to a list of all transcription factor IDs that regulate the given RNA. All TFs that target any of the constituent cistrons in the RNA are added to each list.

_build_cistron_to_monomer_mapping(raw_data, sim_data)[source]

Builds an index vector that, when used as an index array, maps a vector describing a property of RNA cistrons to a vector describing the same property for the corresponding monomers. Assumes that each monomer maps to a single RNA cistron (a single RNA can map to multiple monomers).

e.g. monomer_property = RNA_cistron_property[sim_data.relation.cistron_to_monomer_mapping]
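
The indexing pattern can be tried on toy arrays. The values and the mapping below are made up; only the use of an integer index array mirrors the description.

```python
import numpy as np

# One value per RNA cistron (illustrative numbers).
rna_cistron_property = np.array([120.0, 300.0, 45.0])

# Entry i is the index of the cistron that encodes monomer i.
# Cistron 0 appears twice: one RNA can map to multiple monomers.
cistron_to_monomer_mapping = np.array([2, 0, 0, 1])

# Fancy indexing yields one value per monomer.
monomer_property = rna_cistron_property[cistron_to_monomer_mapping]
```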

_build_monomer_to_mRNA_cistron_mapping(raw_data, sim_data)[source]

Builds a sparse matrix that can map vectors that describe a property for protein monomers into a vector that describes the same property for the corresponding mRNA cistrons if multiplied to the right of the original vector. The transformed property must be additive (i.e. if two proteins map to the same cistron, the values given for the two proteins are added to yield a value for the cistron).

The full matrix can be returned by calling monomer_to_mRNA_cistron_mapping().

_build_monomer_to_tu_mapping(raw_data, sim_data)[source]

Builds a dictionary that maps monomer IDs to a list of all transcription unit IDs that the monomer can be translated from.

_build_tf_to_RNA_mapping(raw_data, sim_data)[source]

Builds a dictionary that maps transcription factor IDs to a list of all RNA IDs that are targeted by the given TF. All RNA transcription units that contain any of the cistrons regulated by the TF are added to each list.

monomer_to_mRNA_cistron_mapping()[source]

Returns the full version of the sparse matrix built by _build_monomer_to_mRNA_cistron_mapping().

e.g. mRNA_property = sim_data.relation.monomer_to_mRNA_cistron_mapping().T.dot(monomer_property)
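
The additive mapping can be illustrated with a small dense stand-in for the matrix. The shape (monomers by cistrons) is inferred from the expression above, and the values are made up; the real matrix is stored in sparse form.

```python
import numpy as np

# Index of the mRNA cistron for each monomer (illustrative).
cistron_index_of_monomer = np.array([0, 0, 1])
n_monomers, n_cistrons = 3, 2

# Dense stand-in: entry [i, j] is 1 if monomer i maps to cistron j.
mapping = np.zeros((n_monomers, n_cistrons))
mapping[np.arange(n_monomers), cistron_index_of_monomer] = 1.0

monomer_property = np.array([10.0, 5.0, 2.0])

# Values for monomers that share a cistron are summed.
mrna_property = mapping.T.dot(monomer_property)
```

Because monomers 0 and 1 share cistron 0, their values are added together, which is why the transformed property must be additive.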

class validation.ecoli.validation_data.ValidationDataEcoli[source]

Bases: object

_add_amino_acid_growth_rates(validation_data_raw)[source]

Loads growth rates with single amino acids supplemented in media.

amino_acid_media_growth_rates: dict with data from 4 replicates
{
    media ID (str): {
        'mean': mean max growth rate from 4 replicates (float with units per time)
        'std': standard deviation for max growth rate from 4 replicates (float with units per time)
    }
}

amino_acid_media_dose_dependent_growth_rates: dict with data from single measurements from 4 concentrations
{
    media ID (str): {
        'conc': concentration of the amino acid in media (np.ndarray[float] with units mol/volume)
        'growth': max growth rate corresponding to each media concentration (float with units per time)
    }
}

initialize(validation_data_raw, knowledge_base_raw)[source]