validation.ecoli.validation_data
ValidationData for Ecoli
Raw data processed into forms convenient for validation and analysis
- class validation.ecoli.validation_data.GetterFunctions(raw_data, sim_data)[source]
Bases:
object
getterFunctions
- _build_all_masses(raw_data, sim_data)[source]
Builds dictionary of molecular weights keyed with the IDs of molecules. Depending on the type of the molecule, weights are either pulled directly from raw data, or dynamically calculated from the existing data.
- _build_full_chromosome_mass(raw_data, sim_data)[source]
Calculates the mass of the full chromosome from its sequence and the weights of polymerized dNTPs.
- _build_genomic_coordinates(raw_data)[source]
Builds a dictionary of genomic coordinates of DNA sites. Keys are the IDs of the sites, and the values are tuples of the left-end coordinate and the right-end coordinate of the site. Sites whose types are included in IGNORED_DNA_SITE_TYPES are ignored.
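A minimal sketch of the dictionary shape described above. The row field names (`id`, `type`, `left_end_pos`, `right_end_pos`), the contents of `IGNORED_DNA_SITE_TYPES`, and the sample rows are hypothetical and are not taken from the raw data files.

```python
# Hypothetical set of ignored site types; the real contents are not shown here.
IGNORED_DNA_SITE_TYPES = {"phantom-site"}

# Hypothetical rows standing in for the DNA site entries in raw_data.
dna_sites = [
    {"id": "SITE-1", "type": "promoter", "left_end_pos": 100, "right_end_pos": 140},
    {"id": "SITE-2", "type": "phantom-site", "left_end_pos": 500, "right_end_pos": 520},
    {"id": "SITE-3", "type": "terminator", "left_end_pos": 900, "right_end_pos": 930},
]


def build_genomic_coordinates(sites, ignored_types):
    """Map site ID -> (left end, right end), skipping ignored site types."""
    return {
        site["id"]: (site["left_end_pos"], site["right_end_pos"])
        for site in sites
        if site["type"] not in ignored_types
    }


genomic_coordinates = build_genomic_coordinates(dna_sites, IGNORED_DNA_SITE_TYPES)
```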
- _build_modified_protein_masses(raw_data, sim_data)[source]
Builds dictionary of molecular weights of modified proteins keyed with the molecule IDs. Molecular weights are calculated from the stoichiometries of the modification reactions.
- _build_modified_rna_masses(raw_data)[source]
Builds dictionary of molecular weights of modified RNAs keyed with the molecule IDs. Molecular weights are calculated from the stoichiometries of the modification reactions.
- _build_polymerized_subunit_masses(sim_data)[source]
Builds dictionary of molecular weights of polymerized subunits (NTPs, dNTPs, and amino acids) by subtracting the end weights from the weights of original subunits.
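The subtraction described above can be sketched as follows. The weights and the single shared `end_weight` are hypothetical illustration values; the actual method may use a different end weight for each subunit class.

```python
# Hypothetical weights (g/mol) of free subunits; not values from the model.
subunit_weights = {"ATP": 500.0, "GTP": 520.0}

# Hypothetical weight of the end group lost when a subunit is polymerized.
end_weight = 170.0


def build_polymerized_subunit_masses(weights, end_weight):
    """Return the weight of each subunit after removing the end group."""
    return {mol_id: mw - end_weight for mol_id, mw in weights.items()}


polymerized = build_polymerized_subunit_masses(subunit_weights, end_weight)
```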
- _build_protein_complex_compartments(raw_data, all_compartments_sorted)[source]
Builds dictionary of compartment tags for protein complexes keyed with the molecule IDs. Each complex is assigned to the compartment that contains a subunit of the complex and has the highest complexation priority. Compartment tags of subunits that are also protein complexes are determined recursively. Compartments of metabolite subunits are not considered since metabolites are assigned to all compartments that are being modeled.
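A rough sketch of the recursive assignment described above, under the assumption that `all_compartments_sorted` lists compartments from highest to lowest complexation priority. The molecule IDs, compartment codes, and the helper name `complex_compartment` are hypothetical.

```python
# Assumed ordering: highest complexation priority first (hypothetical values).
all_compartments_sorted = ["p", "i", "c"]

# Hypothetical monomer compartments and complex compositions.
monomer_compartments = {"PROT-A": "c", "PROT-B": "i", "PROT-C": "p"}
complex_subunits = {
    "CPLX-AB": ["PROT-A", "PROT-B"],
    "CPLX-ABC": ["CPLX-AB", "PROT-C", "ATP"],  # ATP stands in for a metabolite
}


def complex_compartment(complex_id):
    """Pick the highest-priority compartment among a complex's protein subunits."""
    candidates = set()
    for subunit in complex_subunits[complex_id]:
        if subunit in complex_subunits:
            # Subunit is itself a complex: resolve its compartment recursively.
            candidates.add(complex_compartment(subunit))
        elif subunit in monomer_compartments:
            candidates.add(monomer_compartments[subunit])
        # Anything else is treated as a metabolite and not considered.
    return min(candidates, key=all_compartments_sorted.index)


compartment_ab = complex_compartment("CPLX-AB")
compartment_abc = complex_compartment("CPLX-ABC")
```

Here `CPLX-AB` lands in `i` because `i` outranks `c`, and `CPLX-ABC` lands in `p` because `PROT-C` contributes a higher-priority compartment than the nested complex does.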
- _build_protein_complex_masses(raw_data)[source]
Builds dictionary of molecular weights of protein complexes keyed with the molecule IDs. Molecular weights are calculated from the stoichiometries of the complexation/equilibrium reactions. For complexes whose subunits are also complexes, the molecular weights are calculated recursively.
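The recursive calculation can be sketched like this. The masses, the stoichiometry table, and the function name `complex_mass` are hypothetical; the sketch only illustrates summing subunit masses weighted by stoichiometric coefficients.

```python
# Hypothetical monomer masses and complex stoichiometries (subunit -> count).
monomer_masses = {"PROT-A": 10.0, "PROT-B": 20.0}
complex_stoich = {
    "CPLX-AB": {"PROT-A": 1, "PROT-B": 1},
    "CPLX-AB2": {"CPLX-AB": 2},  # a complex built from another complex
}


def complex_mass(mol_id, cache=None):
    """Return the mass of a monomer or complex, recursing into sub-complexes."""
    cache = {} if cache is None else cache
    if mol_id in monomer_masses:
        return monomer_masses[mol_id]
    if mol_id not in cache:
        cache[mol_id] = sum(
            coeff * complex_mass(subunit, cache)
            for subunit, coeff in complex_stoich[mol_id].items()
        )
    return cache[mol_id]


mass_ab = complex_mass("CPLX-AB")
mass_ab2 = complex_mass("CPLX-AB2")
```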
- _build_protein_masses(raw_data, sim_data)[source]
Builds dictionary of molecular weights of protein monomers keyed with the protein IDs. Molecular weights are calculated from the protein sequence and the weights of polymerized amino acids.
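As an illustration, a sequence-based mass can be computed by counting residues and summing their polymerized weights. The residue weights below are rounded hypothetical values, and the sketch applies no terminal-group correction; whether the actual method applies one is not stated here.

```python
from collections import Counter

# Hypothetical polymerized amino acid weights (g/mol), rounded for illustration.
polymerized_aa_weights = {"A": 71.0, "G": 57.0, "M": 131.0}


def sequence_mass(sequence, residue_weights):
    """Sum residue weights over a sequence using per-residue counts."""
    counts = Counter(sequence)
    return sum(residue_weights[residue] * n for residue, n in counts.items())


protein_mass = sequence_mass("MAGA", polymerized_aa_weights)
```

The same counting approach applies to an RNA sequence with polymerized NTP weights.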
- _build_protein_sequences(raw_data)[source]
Builds the amino acid sequences of each protein monomer using sequences in raw_data.
- _build_rna_masses(raw_data, sim_data)[source]
Builds dictionary of molecular weights of RNAs keyed with the RNA IDs. Molecular weights are calculated from the RNA sequence and the weights of polymerized NTPs.
- _build_rna_sequences(raw_data)[source]
Builds nucleotide sequences of each transcription unit using the genome sequence and the left and right end positions.
- _build_submass_array(mw, submass_name)[source]
Converts a scalar molecular weight value to an array of submasses, given the name of the submass category that the molecule belongs to.
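A minimal sketch of the conversion, assuming a fixed ordering of submass categories. The category names in `SUBMASS_NAMES` are hypothetical, and a plain list stands in for the array.

```python
# Hypothetical submass categories; the real names and order are not shown here.
SUBMASS_NAMES = ["rRNA", "tRNA", "mRNA", "protein", "metabolite"]


def build_submass_array(mw, submass_name):
    """Place the scalar weight in the slot of its category; zeros elsewhere."""
    submasses = [0.0] * len(SUBMASS_NAMES)
    submasses[SUBMASS_NAMES.index(submass_name)] = mw
    return submasses


protein_submasses = build_submass_array(42.0, "protein")
```

Summing the array recovers the total mass, which is the kind of value `get_mass` is described as returning.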
- _compartment_tag = re.compile('\\[[a-z]]')
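The pattern above matches a single lowercase letter in square brackets, such as `[c]`. The following sketch shows what it does and does not match; the helper `find_tag` and the sample IDs are hypothetical, and how the class itself uses the pattern is not shown here.

```python
import re

# Same pattern as the class attribute above.
_compartment_tag = re.compile('\\[[a-z]]')


def find_tag(mol_id):
    """Return the first compartment tag found in mol_id, or None."""
    match = _compartment_tag.search(mol_id)
    return match.group(0) if match else None


tag_present = find_tag("ATP[c]")
tag_missing = find_tag("ATP")
tag_uppercase = find_tag("ATP[C]")
```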
- get_all_valid_molecules()[source]
Returns a list of all molecule IDs with assigned submass arrays and compartments.
- get_compartment(mol_id)[source]
Returns the list of one-letter codes for the compartments that the molecule with the given ID can exist in.
- get_compartment_tag(id_)[source]
Look up a molecule id and return a compartment suffix tag like ‘[c]’.
- get_compartments(ids)[source]
Returns a list of lists of one-letter compartment codes, one inner list for each of the given molecule IDs, giving the compartments that molecule can exist in.
- get_genomic_coordinates(site_id)[source]
Returns the genomic coordinates of the left and right ends of a DNA site given the ID of the site.
- get_mass(mol_id)[source]
Return the total mass of the molecule with a given ID.
- Parameters:
mol_id (str)
- get_miscrnas_with_singleton_tus()[source]
Returns a list of all miscRNA IDs with corresponding single-gene transcription units.
- get_singleton_tu_id(rna_id)[source]
Returns the ID of the single-gene transcription unit corresponding to the given miscRNA ID, if such a transcription unit exists. This is necessary to replace some references to miscRNA IDs in complexation or equilibrium reactions with their corresponding TU IDs, which are the molecules that are actually transcribed.
- class validation.ecoli.validation_data.GrowthRateParameters(raw_data, sim_data)[source]
Bases:
object
- class validation.ecoli.validation_data.InternalState(raw_data, sim_data)[source]
Bases:
object
Internal State
- _build_bulk_molecule_specs(sim_data, molecule_ids)[source]
Builds a list of molecule IDs with compartment tags and a corresponding array of molecular masses to add to the bulk state.
- Parameters:
molecule_ids (List[str]) – List of molecule IDs w/o compartment tags
- Returns:
- molecule_ids_with_compartments (List[str]): List of molecule IDs with compartment tags
- masses (np.ndarray): Array of molecular masses divided into submasses
- class validation.ecoli.validation_data.KnowledgeBaseEcoli(operons_on, remove_rrna_operons, remove_rrff, stable_rrna, new_genes_option='off')[source]
Bases:
object
- Parameters:
- _check_new_gene_ids(nested_attr)[source]
Check to ensure each new gene, RNA, and protein id starts with NG.
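A small sketch of the prefix check. The function name `invalid_new_gene_ids` and the sample IDs are hypothetical, and this version returns the offending IDs rather than raising, which may differ from the actual method.

```python
def invalid_new_gene_ids(ids, prefix="NG"):
    """Return every ID that does not start with the required prefix."""
    return [id_ for id_ in ids if not id_.startswith(prefix)]


# Hypothetical gene, RNA, and protein IDs for one new gene plus one bad entry.
bad_ids = invalid_new_gene_ids(["NG001", "NG001_RNA", "NG001-MONOMER", "EG10001"])
```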
- _get_new_gene_sequence(nested_attr)[source]
Determine genome sequence for insertion using the sequences and relative locations of the new genes.
- _join_data()[source]
Add rows that are specified in additional files. Data will only be added if all the loaded columns from both datasets match.
- _modify_data()[source]
Modify entries in rows that are specified to be modified. Rows must be identified by their entries in the first column (usually the ID column).
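One way to picture this step is as an update keyed on the ID column. The function name `modify_rows`, the column names, and the rows are hypothetical, and this sketch ignores modifications whose ID is not present, which the actual method may treat differently.

```python
def modify_rows(rows, modifications, id_column):
    """Overwrite fields of rows whose ID matches a modification row."""
    mods_by_id = {mod[id_column]: mod for mod in modifications}
    return [{**row, **mods_by_id.get(row[id_column], {})} for row in rows]


# Hypothetical raw data rows and a modification for one of them.
rows = [
    {"id": "RNA-1", "half_life": 120.0},
    {"id": "RNA-2", "half_life": 300.0},
]
modifications = [{"id": "RNA-2", "half_life": 45.0}]

modified = modify_rows(rows, modifications, id_column="id")
```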
- _prune_data()[source]
Remove rows that are specified to be removed. Data will only be removed if all data in a row in the file specifying rows to be removed matches the same columns in the raw data file.
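The matching rule can be sketched as follows: a row is dropped only when every column given in a removal entry has the same value in that row. The function name `prune_rows` and the sample data are hypothetical.

```python
def prune_rows(rows, rows_to_remove):
    """Drop rows that match a removal entry on all of that entry's columns."""

    def matches(row, spec):
        return all(row.get(column) == value for column, value in spec.items())

    return [
        row for row in rows
        if not any(matches(row, spec) for spec in rows_to_remove)
    ]


# Hypothetical raw data rows and removal entries.
rows = [
    {"id": "RXN-1", "direction": "forward"},
    {"id": "RXN-2", "direction": "reverse"},
]
rows_to_remove = [
    {"id": "RXN-1", "direction": "forward"},  # matches on all columns: removed
    {"id": "RXN-2", "direction": "forward"},  # direction differs: row is kept
]

pruned = prune_rows(rows, rows_to_remove)
```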
- _update_gene_insertion_location(nested_attr)[source]
Update insertion location of new genes to prevent conflicts.
- _update_gene_locations(nested_attr, insert_pos)[source]
Modify positions of original genes based upon the insertion location of new genes. Returns end position of the gene insertion.
- _update_global_coordinates(data, insert_pos, insert_len)[source]
Updates the left and right end positions for all elements in data if their positions will be impacted by the new gene insertion.
- Parameters:
data – Data attribute to update
insert_pos – Location of new gene insertion
insert_len – Length of new gene insertion
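A simplified sketch of the coordinate shift, assuming rows with `left_end_pos` and `right_end_pos` fields and assuming that an element is shifted by `insert_len` when its left end lies at or after `insert_pos`. The field names, the boundary condition, and the exact offset are assumptions, and elements spanning the insertion point are not handled here.

```python
def update_global_coordinates(data, insert_pos, insert_len):
    """Shift, in place, every element located at or after the insertion."""
    for row in data:
        if row["left_end_pos"] >= insert_pos:
            row["left_end_pos"] += insert_len
            row["right_end_pos"] += insert_len


# Hypothetical elements on either side of an insertion at position 300.
genes = [
    {"id": "GENE-1", "left_end_pos": 100, "right_end_pos": 200},
    {"id": "GENE-2", "left_end_pos": 500, "right_end_pos": 600},
]
update_global_coordinates(genes, insert_pos=300, insert_len=50)
```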
- class validation.ecoli.validation_data.Mass(raw_data, sim_data)[source]
Bases:
object
- get_avg_cell_dry_mass(doubling_time)[source]
Gets the dry mass for an average cell at the given doubling time.
- Parameters:
doubling_time (Unum) – expected doubling time
- Returns:
average cell dry mass
- Return type:
Unum
- get_basal_rna_fractions()[source]
Measured RNA subgroup mass fractions. Fractions should change in other conditions with growth rate (see transcription.get_rna_fractions()).
- get_dna_critical_mass(doubling_time)[source]
Returns the critical mass for replication initiation. Faster-growing cells maintain a consistent initiation mass, but slower-growing cells are smaller and never reach this mass, so it is adjusted lower for them.
- Parameters:
doubling_time (Unum) – expected doubling time of cell
- Returns:
Critical mass for DNA replication initiation
- Return type:
Unum
- class validation.ecoli.validation_data.MoleculeGroups(raw_data, sim_data)[source]
Bases:
object
Helper class to extract molecule IDs of “special” groups of molecules. All values returned are lists of strings.
- class validation.ecoli.validation_data.MoleculeIds(raw_data, sim_data)[source]
Bases:
object
Helper class to extract molecule IDs of “special” molecules. All values returned are strings.
- class validation.ecoli.validation_data.Protein(validation_data_raw, knowledge_base_raw)[source]
Bases:
object
- class validation.ecoli.validation_data.ReactionFlux(validation_data_raw, knowledge_base_raw)[source]
Bases:
object
- class validation.ecoli.validation_data.Relation(raw_data, sim_data)[source]
Bases:
object
- _build_RNA_to_tf_mapping(raw_data, sim_data)[source]
Builds a dictionary that maps RNA IDs to a list of all transcription factor IDs that regulate the given RNA. All TFs that target any of the constituent cistrons in the RNA are added to each list.
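A sketch of how such a mapping could be assembled: for each RNA, take the union of the transcription factors that target any of its cistrons. The input dictionaries, IDs, and function name are hypothetical, and sorting the result is only for a stable order.

```python
def build_rna_to_tf_mapping(rna_to_cistrons, cistron_to_tfs):
    """Map each RNA ID to all TFs regulating any of its constituent cistrons."""
    return {
        rna_id: sorted({
            tf
            for cistron in cistrons
            for tf in cistron_to_tfs.get(cistron, [])
        })
        for rna_id, cistrons in rna_to_cistrons.items()
    }


# Hypothetical operon structure and TF targets.
rna_to_cistrons = {"TU-1": ["CIS-A", "CIS-B"], "TU-2": ["CIS-C"]}
cistron_to_tfs = {"CIS-A": ["TF-1"], "CIS-B": ["TF-1", "TF-2"]}

rna_to_tf = build_rna_to_tf_mapping(rna_to_cistrons, cistron_to_tfs)
```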
- _build_cistron_to_monomer_mapping(raw_data, sim_data)[source]
Build a vector that can map vectors that describe a property for RNA cistrons into a vector that describes the same property for the corresponding monomers if used as an index array. Assumes that each monomer maps to a single RNA cistron (A single RNA can map to multiple monomers).
e.g. monomer_property = RNA_cistron_property[sim_data.relation.cistron_to_monomer_mapping]
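The index-array idea can be illustrated with plain Python lists. The values are hypothetical; with NumPy arrays the same result comes from indexing the property array directly with the mapping array, as in the expression above.

```python
# Hypothetical property with one value per RNA cistron.
cistron_property = [1.5, 0.2, 3.0]

# Entry i is the index of the cistron that encodes monomer i.
# Monomers 1 and 2 share cistron 0; no monomer maps to cistron 1.
cistron_to_monomer_mapping = [2, 0, 0]

# Gather the cistron value for each monomer.
monomer_property = [cistron_property[i] for i in cistron_to_monomer_mapping]
```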
- _build_monomer_to_mRNA_cistron_mapping(raw_data, sim_data)[source]
Builds a sparse matrix that can map vectors that describe a property for protein monomers into a vector that describes the same property for the corresponding mRNA cistrons if multiplied to the right of the original vector. The transformed property must be additive (i.e. if two proteins map to the same cistron, the values given for the two proteins are added to yield a value for the cistron).
The full matrix can be returned by calling monomer_to_mRNA_cistron_mapping().
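The additive behaviour can be sketched without a sparse matrix by accumulating monomer values into their cistrons, which gives the same result as multiplying by a 0/1 mapping matrix. The index list and values are hypothetical.

```python
# Hypothetical property with one value per protein monomer.
monomer_values = [2.0, 3.0, 5.0]

# Entry i is the index of the mRNA cistron that monomer i is translated from.
monomer_to_cistron_index = [0, 1, 0]
n_cistrons = 2

# Values of monomers sharing a cistron are added together.
cistron_values = [0.0] * n_cistrons
for monomer_idx, cistron_idx in enumerate(monomer_to_cistron_index):
    cistron_values[cistron_idx] += monomer_values[monomer_idx]
```

Monomers 0 and 2 both map to cistron 0, so their values are summed, which is why the transformed property has to be additive.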
- _build_monomer_to_tu_mapping(raw_data, sim_data)[source]
Builds a dictionary that maps monomer IDs to a list of all transcription unit IDs that the monomer can be translated from.
- class validation.ecoli.validation_data.ValidationDataEcoli[source]
Bases:
object
- _add_amino_acid_growth_rates(validation_data_raw)[source]
Loads growth rates with single amino acids supplemented in media.
- amino_acid_media_growth_rates: dict with data from 4 replicates
- {
media ID (str): {
‘mean’: mean max growth rate from 4 replicates (float with units per time)
‘std’: standard deviation for max growth rate from 4 replicates (float with units per time)
}
}
- amino_acid_media_dose_dependent_growth_rates: dict with data from single measurements from 4 concentrations
- {
media ID (str): {
‘conc’: concentration of the amino acid in media (np.ndarray[float] with units mol/volume)
‘growth’: max growth rate corresponding to each media concentration (float with units per time)
}
}
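To make the two structures concrete, here is a sketch with hypothetical numbers. The media ID, the values, and the use of the population standard deviation are assumptions, and plain floats and lists stand in for the unit-bearing values and arrays described above.

```python
import statistics

# Hypothetical max growth rates from 4 replicates, keyed by media ID.
replicates = {"minimal_plus_ALA": [0.5, 0.6, 0.7, 0.8]}

amino_acid_media_growth_rates = {
    media_id: {
        "mean": statistics.mean(rates),
        "std": statistics.pstdev(rates),  # population std dev (an assumption)
    }
    for media_id, rates in replicates.items()
}

# Hypothetical single measurements at 4 concentrations.
amino_acid_media_dose_dependent_growth_rates = {
    "minimal_plus_ALA": {
        "conc": [0.1, 0.5, 1.0, 5.0],
        "growth": [0.52, 0.61, 0.66, 0.70],
    }
}
```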