Stores

Stores are upgraded dictionaries that store the simulation state. The descriptions for each type of store below are prefaced by a series of relevant attributes that consist of:

Path:

Location of the store for use in process topologies

Updater:

Function used to apply updates to the store

Divider:

Function used to split the store during cell division

Serializer:

Instance of vivarium.core.registry.Serializer used to serialize store data before being emitted

Schema:

Helper function for store in ports_schema methods

Helpers:

Other useful helper functions

Bulk Molecules

Path:

('bulk',)

Updater:

ecoli.library.schema.bulk_numpy_updater()

Divider:

ecoli.library.schema.divide_bulk()

Serializer:

ecoli.library.schema.get_bulk_counts

Schema:

ecoli.library.schema.numpy_schema()

Helpers:

ecoli.library.schema.bulk_name_to_idx(), ecoli.library.schema.counts()

Warning

vivarium-core does not copy store values so processes must be careful not to unintentionally modify mutable values. The bulk molecules array is read-only for extra protection (see WRITEABLE flag in numpy.ndarray.flags).

Note

Bulk molecules are named as such because they represent species for which all molecules are treated as interchangeable (e.g. water).

The bulk molecules store consists of a structured Numpy array with the following named fields:

  1. id (str): Names of bulk molecules as pulled from EcoCyc

    Each ends with a bracketed “location tag” (e.g. [c]) containing one of the abbreviations defined in the reconstruction/ecoli/flat/compartments.tsv file (see Cell Component Ontology)

  2. count (numpy.int64): Counts of bulk molecules

    Instead of the full structured array, the evolve_state method of partitioned processes receives a one-dimensional array of partitioned counts (see Partitioning).

  3. {}_submass (numpy.float64): Field for each submass

    The nine submasses are rRNA, tRNA, mRNA, miscRNA, nonspecific_RNA, protein, metabolite, water, and DNA
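
The layout described above can be sketched with plain Numpy. This is a minimal illustration, not the model's own initialization code: the molecule IDs, counts, string width, and the mass value are made up for the example, and only the field names follow the description above.

```python
import numpy as np

# Submass names as listed in the field description above.
SUBMASSES = ["rRNA", "tRNA", "mRNA", "miscRNA", "nonspecific_RNA",
             "protein", "metabolite", "water", "DNA"]

# One "id" field, one "count" field, and one "{}_submass" field per submass.
# The "U50" string width is an assumption made for this sketch.
dtype = [("id", "U50"), ("count", np.int64)] + [
    (f"{name}_submass", np.float64) for name in SUBMASSES
]

bulk = np.zeros(3, dtype=dtype)
bulk["id"] = ["WATER[c]", "ATP[c]", "GLT[c]"]  # hypothetical IDs
bulk["count"] = [1000, 250, 40]                # hypothetical counts
bulk["water_submass"][0] = 18.0                # illustrative value only

# Mark the array read-only, mirroring the WRITEABLE flag mentioned above.
bulk.flags.writeable = False

try:
    bulk["count"][1] -= 10   # in-place edit of a read-only array
    write_blocked = False
except ValueError:
    write_blocked = True     # Numpy refuses the assignment
```

Field views such as bulk["count"] inherit the read-only flag, so an accidental in-place edit raises an error instead of silently changing the shared state.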

Initialization

To create the initial value for this store, the model will go through the following three options in order:

  1. Load custom initial state

    Set initial_state option for ecoli_master_sim

  2. Load from saved state JSON

    Set initial_state_file option for ecoli_master_sim

  3. Generate from sim_data

    generate_initial_state() uses the sim_data object generated by the ParCa to calculate initial state

Partitioning

Motivation

To support the use of independent sub-models for different biological processes (e.g. FBA for metabolism, Gillespie for complexation, etc.), the model allows processes to run mostly independently. At a high level, over the course of a single simulation step, each process will see the simulation state as it was before any other process has run. Each process will then calculate an update to apply to the simulation state and all updates will be simultaneously applied once all processes have run.

This setup has a potential problem: two processes may both decide to deplete the count of the same molecule, resulting in a final count that is negative. To prevent this from happening, the model forces processes to communicate their bulk molecule requests to a special allocator process (Allocator). The allocator process will divide the bulk molecules so that each process sees a functional count that is proportional to its request.

For example, if process A requests 100 of molecule X and process B requests 400 of molecule X but the cell only has 400 molecules of X, the allocator will divide the molecules as follows:

  • Process A: \(\frac{100}{100 + 400} \times 400 = 80\) molecules of X

  • Process B: \(\frac{400}{100 + 400} \times 400 = 320\) molecules of X
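
The arithmetic above can be reproduced with a short sketch. The function name allocate_proportionally is hypothetical, and the rounding rule (integer floor division) is an assumption of this example; the actual Allocator may handle rounding, priorities, and leftover molecules differently.

```python
import numpy as np

def allocate_proportionally(requests, available):
    """Split `available` molecules among processes in proportion to requests.

    Hypothetical helper for illustration only.
    """
    requests = np.asarray(requests, dtype=np.int64)
    total = requests.sum()
    if total <= available:
        # Enough molecules for everyone: grant each request in full.
        return requests.copy()
    # Oversubscribed: scale each request by available / total, rounding down.
    return (requests * available) // total

# Process A requests 100, process B requests 400, the cell has 400.
allocation = allocate_proportionally([100, 400], 400)
```

Here allocation holds 80 for process A and 320 for process B, matching the worked example.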

Note

Processes in the model are more dependent on one another than in this simplified example.

For example, since molecule binding and complexation events occur on timescales much shorter than the default 1 second simulation timestep, we run TfUnbinding, update the simulation state, then run Equilibrium and TwoComponentSystem, update the simulation state, and finally run TfBinding, and update the simulation state. This allows transcription factors that are currently bound to promoters a chance to form complexes or participate in other reactions, better reflecting the transient binding dynamics of real cells.

Steps and Flows

To allow processes to run with a pre-specified order within each timestep, we can make use of a special subclass of the typical Vivarium Process class: Step. All “processes” in the model are actually instances of Step. These Steps are configured to run in user-configured “execution layers” by way of a flow that is included in the simulation configuration (see ecoli_master_sim).

A flow is a dictionary that specifies the dependencies for each Step. For example, if a user wants Step B to run only after Step A has updated the simulation state, the user can include Step A as a dependency of Step B:

{
    "Step B": [("Step A",)]
}

Note

Dependencies must be in the form of paths like those that you would find in a topology.

Vivarium will parse the flow to construct a directed acyclic graph and figure out the order in which to run steps by stratifying them into “execution layers”. For example, consider the following flow:

{
    "Step B": [("Step A",)],
    "Step C": [("Step A",)],
    "Step D": [("Step C",)]
}

Vivarium will parse this into the following sequence of execution layers:

  1. Step A

  2. Step B and Step C (order does not matter)

  3. Step D

Each timestep, Step A will run and update the simulation state, Steps B and C will run with a view of the state that was updated by Step A, and finally Step D will run with a view of the state that was updated by every other step.
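
The stratification described above can be sketched as a layered topological sort. This is an illustrative stand-in, not Vivarium's implementation: the function execution_layers is hypothetical, and it assumes that the last element of each dependency path is the name of the step depended upon.

```python
def execution_layers(flow):
    """Group steps into layers so each step runs after its dependencies."""
    # Map each step to the set of step names it depends on.
    deps = {step: {path[-1] for path in paths} for step, paths in flow.items()}
    # Steps that only appear as dependencies have no dependencies themselves.
    for paths in flow.values():
        for path in paths:
            deps.setdefault(path[-1], set())

    layers, done = [], set()
    while len(done) < len(deps):
        # A step is ready once all of its dependencies have already run.
        ready = sorted(s for s, d in deps.items() if s not in done and d <= done)
        if not ready:
            raise ValueError("flow contains a cycle")
        layers.append(ready)
        done.update(ready)
    return layers

flow = {
    "Step B": [("Step A",)],
    "Step C": [("Step A",)],
    "Step D": [("Step C",)],
}
layers = execution_layers(flow)
```

For the flow above, layers contains Step A first, then Step B and Step C together, then Step D.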

Implementation

All partitioned processes are instances of the PartitionedProcess class. This both serves to identify the processes that require partitioning and also implements a standard next_update method that allows these processes to be run on their own (as in migration tests).

Warning

In instances of PartitionedProcess, all ports connected to the bulk molecule store MUST be called bulk to be properly partitioned. Conversely, ports that are not meant to be partitioned should NEVER be called bulk in any PartitionedProcess.

In the model, each partitioned process is used to create two separate steps: a Requester and an Evolver. For each execution layer in the flow given to EcoliSim, Ecoli will arrange the requesters and evolvers into four execution layers in the final model:

  1. Requesters:

    Each will call the calculate_request() method of a PartitionedProcess in said layer and write its requests to a process-specific request store

  2. Allocator:

    An instance of Allocator that reads all request stores for processes in the execution layer, proportionally allocates bulk molecules to processes according to requests, and writes allocated counts to process-specific allocate stores

  3. Evolvers:

    Each will replace all views into the bulk store with the counts allocated to its corresponding PartitionedProcess in its allocate store, call the evolve_state() method of its PartitionedProcess, update the bulk molecule counts, and send unique molecule updates to be accumulated by each unique molecule updater (see UniqueNumpyUpdater)

  4. Unique updater:

    An instance of UniqueUpdate that tells unique molecule updaters to apply accumulated updates (see UniqueNumpyUpdater for details)

Note

The Requester and Evolver for each partitioned process share the same PartitionedProcess instance. This allows instance variables (see aa_supply for an example in PolypeptideElongation) to be updated and shared between the calculate_request() and evolve_state() methods of each PartitionedProcess.
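
The request, allocate, and evolve sequence can be illustrated with a toy sketch for a single bulk molecule. Everything here is hypothetical (ToyPartitionedProcess, run_layer, and the floor-division allocation rule); the sketch omits stores, ports, and the unique molecule update layer, and only shows how one shared instance serves both phases.

```python
class ToyPartitionedProcess:
    """Hypothetical stand-in for a PartitionedProcess subclass."""

    def __init__(self, name, demand):
        self.name = name
        self.demand = demand
        self.last_request = None  # instance state shared by both phases

    def calculate_request(self, bulk_count):
        # Request up to `demand` molecules based on the pre-update count.
        self.last_request = min(self.demand, bulk_count)
        return self.last_request

    def evolve_state(self, allocated):
        # Consume everything allocated; return the change in bulk count.
        return -allocated


def run_layer(processes, bulk_count):
    # 1. Requesters: every process sees the same pre-update count.
    requests = {p.name: p.calculate_request(bulk_count) for p in processes}
    # 2. Allocator: scale requests down proportionally if oversubscribed.
    total = sum(requests.values())
    oversubscribed = total > bulk_count
    allocated = {
        name: (req * bulk_count) // total if oversubscribed else req
        for name, req in requests.items()
    }
    # 3. Evolvers: each process only sees its own allocated count.
    updates = [p.evolve_state(allocated[p.name]) for p in processes]
    return bulk_count + sum(updates), allocated


proc_a = ToyPartitionedProcess("A", 100)
proc_b = ToyPartitionedProcess("B", 400)
final_count, allocated = run_layer([proc_a, proc_b], 400)
```

Because the same object handles both phases, a value stored during calculate_request (here last_request) is still available when evolve_state runs.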

Accessing Non-partitioned Counts

Certain processes require access to the total, non-partitioned count of certain bulk molecules. For example, Metabolism needs to know the total counts of all amino acids to accurately implement tRNA charging. To give these processes access to non-partitioned counts, an additional port is added to their ports_schema methods and topologies that is also connected to the bulk molecules store. By convention, this port is called bulk_total to differentiate it from the partitioned bulk port. Evolvers will overwrite the partitioned bulk port with the allocated bulk molecule counts while leaving the bulk_total port untouched, giving their associated PartitionedProcess instances access to the unpartitioned bulk molecule counts in their evolve_state() methods.
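
As a minimal sketch of this convention, a topology for such a process might wire both ports to the same store path. The dictionary below is illustrative only and is not copied from any process in the model.

```python
# Hypothetical topology fragment: both ports point at the bulk store.
topology = {
    "bulk": ("bulk",),        # replaced with allocated counts by the Evolver
    "bulk_total": ("bulk",),  # left untouched, so it shows total counts
}
```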

Indexing

Processes typically use the ecoli.library.schema.bulk_name_to_idx() helper function to get the indices for a set of molecules (e.g. all NTPs). These indices are typically cached as instance attributes (e.g. self.ntp_idx) in the next_update method of a process.

Though counts can be directly retrieved from the Numpy structured array (e.g. states['bulk']['count'][self.ntp_idx]), partitioned processes do not have access to the Numpy structured array in their evolve_state methods due to how partitioning was implemented in the model (see Implementation). To standardize count access across processes, the helper function ecoli.library.schema.counts() can handle both of these scenarios and also guarantees that the returned counts can be safely edited without unintentionally mutating the source array.
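
The two access patterns can be sketched with simplified stand-ins for the helpers. The functions name_to_idx and safe_counts below are hypothetical and only approximate the behavior described above; they are not the implementations in ecoli.library.schema.

```python
import numpy as np

def name_to_idx(names, all_ids):
    """Return the array index of each requested molecule name."""
    lookup = {mol_id: i for i, mol_id in enumerate(all_ids)}
    return np.array([lookup[name] for name in names])

def safe_counts(data, idx):
    """Return an editable copy of counts from either kind of input."""
    if data.dtype.names is not None:
        # Full structured array: read the "count" field.
        return data["count"][idx].copy()
    # Partitioned view: already a one-dimensional array of counts.
    return data[idx].copy()

# Hypothetical read-only bulk array with two fields.
bulk = np.array(
    [("ATP[c]", 5), ("GTP[c]", 7), ("WATER[c]", 100)],
    dtype=[("id", "U20"), ("count", np.int64)],
)
bulk.flags.writeable = False

ntp_idx = name_to_idx(["ATP[c]", "GTP[c]"], bulk["id"])  # cache once
full = safe_counts(bulk, ntp_idx)                        # structured input
partitioned = safe_counts(np.array([3, 4, 50]), ntp_idx) # 1D input
full[0] += 1  # edits the copy, not the store
```

The cached indices work for both inputs because the partitioned array keeps the same molecule ordering as the structured array.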

Unique Molecules

Path:

('unique',)

Updater:

ecoli.library.schema.UniqueNumpyUpdater.updater()

Dividers:

See ecoli.library.schema.UNIQUE_DIVIDERS

Serializer:

ecoli.library.schema.get_unique_fields

Schema:

ecoli.library.schema.numpy_schema()

Helpers:

ecoli.library.schema.attrs()

Warning

vivarium-core does not copy store values so processes must be careful not to unintentionally modify mutable values. Each unique molecule array is read-only for extra protection (see WRITEABLE flag in numpy.ndarray.flags).

Note

Unique molecules are named as such because they represent species for which individual molecules are not treated as interchangeable (e.g. different RNA molecules may have different sequences).

The unique molecules store contains a substore for each unique molecule (e.g. RNA, active RNAP, etc.). Each unique molecule substore contains a structured Numpy array with a variety of named fields, each representing an attribute of interest for that class of unique molecules (e.g. coordinates for a gene unique molecule). All unique molecules will have the following named fields:

  1. unique_index (int): Unique identifier for each unique molecule

    When processes add new unique molecules, the helper function ecoli.library.schema.create_unique_indexes() is used to generate unique indices for each molecule to be added.

  2. _entryState (numpy.int8): 1 for active row, 0 for inactive row

    When unique molecules are deleted (e.g. RNA degradation), all of their data, including the _entryState field, is set to 0. When unique molecules are added (e.g. RNA transcription), the updater places the data for these new molecules into the rows that are identified as inactive by the helper function ecoli.library.schema.get_free_indices(), which also grows the array if necessary.

  3. massDiff_{} (numpy.float64): Field for each dynamic submass

    The nine submasses are rRNA, tRNA, mRNA, miscRNA, nonspecific_RNA, protein, metabolite, water, and DNA. An example of a dynamic submass is the constantly changing protein mass of the polypeptide associated with an actively translating ribosome.
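
The row reuse described for _entryState can be sketched as follows. The function free_rows is a hypothetical simplification of the idea behind get_free_indices(); the growth strategy (append exactly as many rows as needed) and the example field values are assumptions of this sketch.

```python
import numpy as np

def free_rows(array, n_needed):
    """Find inactive rows, growing the array if there are too few."""
    free = np.flatnonzero(array["_entryState"] == 0)
    if free.size < n_needed:
        # Append zeroed (inactive) rows to make up the shortfall.
        extra = np.zeros(n_needed - free.size, dtype=array.dtype)
        start = array.size
        array = np.concatenate([array, extra])
        free = np.concatenate([free, np.arange(start, array.size)])
    return array, free[:n_needed]

dtype = [("unique_index", np.int64), ("_entryState", np.int8),
         ("massDiff_protein", np.float64)]
rnas = np.zeros(3, dtype=dtype)
rnas["unique_index"] = [10, 11, 12]
rnas["_entryState"] = 1

# Delete the second molecule: every field, including _entryState, becomes 0.
rnas[1:2] = 0

# Add two molecules: one reuses the freed row, one needs a new row.
rnas, rows = free_rows(rnas, 2)
rnas["unique_index"][rows] = [13, 14]
rnas["_entryState"][rows] = 1
```

Reusing inactive rows avoids reallocating the array on every addition, at the cost of rows not being ordered by creation time.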

Initialization

See Initialization.

Accessing

Processes use the ecoli.library.schema.attrs() helper function to access any number of attributes for all active (_entryState is 1) unique molecules of a given type (e.g. RNA, active RNAP, etc.).
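
A minimal sketch of this access pattern is shown below. The function active_attrs is a hypothetical stand-in for attrs(), and the length field is invented for the example.

```python
import numpy as np

def active_attrs(unique_array, names):
    """Return the requested fields for active rows only."""
    active = unique_array["_entryState"] == 1
    # Boolean indexing returns copies, one array per requested attribute.
    return [unique_array[name][active] for name in names]

rnas = np.zeros(4, dtype=[("unique_index", np.int64),
                          ("_entryState", np.int8),
                          ("length", np.int64)])  # "length" is hypothetical
rnas["unique_index"] = [0, 1, 2, 3]
rnas["_entryState"] = [1, 0, 1, 1]
rnas["length"] = [120, 0, 300, 45]

unique_index, length = active_attrs(rnas, ["unique_index", "length"])
```

The inactive second row is skipped, so both returned arrays have three entries that line up row for row.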