Stores
Stores are upgraded dictionaries that store the simulation state. The descriptions for each type of store below are prefaced by a series of relevant attributes that consist of:
- Path: Location of the store for use in process topologies
- Updater: Function used to apply updates to the store
- Divider: Function used to split the store during cell division
- Serializer: Instance of vivarium.core.registry.Serializer used to serialize store data before it is emitted
- Schema: Helper function for the store in ports_schema methods
- Helpers: Other useful helper functions
Bulk Molecules
- Path: ('bulk',)
- Updater:
- Divider:
- Serializer:
- Schema:
- Helpers: ecoli.library.schema.bulk_name_to_idx(), ecoli.library.schema.counts()
Warning
vivarium-core does not copy store values, so processes must be careful not to unintentionally modify mutable values. The bulk molecules array is read-only for extra protection (see the WRITEABLE flag in numpy.ndarray.flags).
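The following minimal sketch shows what the read-only protection means in practice. A small toy NumPy array stands in for the bulk store (the real store is a structured array, and how vivarium-core sets the flag is not shown here). It illustrates that in-place writes to a read-only array raise ValueError and that a process wanting to edit counts should work on a copy.

```python
import numpy as np

# Hypothetical stand-in for bulk counts; the real store is a structured array.
counts = np.array([10, 20, 30], dtype=np.int64)
counts.flags.writeable = False  # equivalent to clearing the WRITEABLE flag

try:
    counts[0] -= 1  # an in-place edit of shared state is rejected
except ValueError as err:
    print("blocked:", err)

# A copy is writeable again, so edits stay local to the process.
local = counts.copy()
local[0] -= 1
print(local, counts)
```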
Note
Bulk molecules are named as such because they represent species for which all molecules are treated as interchangeable (e.g. water).
The bulk molecules store consists of a structured Numpy array with the following named fields:
- id (str): Names of bulk molecules as pulled from EcoCyc. Each ends with a bracketed “location tag” (e.g. [c]) containing one of the abbreviations defined in the reconstruction/ecoli/flat/compartments.tsv file (see Cell Component Ontology).
- count (numpy.int64): Counts of bulk molecules. Instead of the full structured array, the evolve_state method of partitioned processes receives a one-dimensional array of partitioned counts (see Partitioning).
- {}_submass (numpy.float64): Field for each submass. The nine submasses are rRNA, tRNA, mRNA, miscRNA, nonspecific_RNA, protein, metabolite, water, and DNA.
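As a rough illustration of this layout, the sketch below builds a tiny structured array with the fields listed above. The molecule ids, the counts, and the U40 string dtype for id are assumptions chosen for illustration; the real store is built from sim_data and its exact dtypes may differ. Submass values are simply left at zero.

```python
import numpy as np

# The nine submasses listed above; each becomes a "<name>_submass" field.
SUBMASSES = ["rRNA", "tRNA", "mRNA", "miscRNA", "nonspecific_RNA",
             "protein", "metabolite", "water", "DNA"]

bulk_dtype = np.dtype(
    [("id", "U40"), ("count", np.int64)]
    + [(f"{name}_submass", np.float64) for name in SUBMASSES]
)

# Hypothetical toy store with three molecules (submass fields stay zero).
bulk = np.zeros(3, dtype=bulk_dtype)
bulk["id"] = ["WATER[c]", "ATP[c]", "GLC[p]"]
bulk["count"] = [1_000_000, 5_000, 200]

# The bracketed suffix of each id is its location tag.
location_tags = [name[name.rindex("["):] for name in bulk["id"]]

# A partitioned process's evolve_state() would see plain 1-D counts like these.
plain_counts = bulk["count"].copy()
```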
Initialization
To create the initial value for this store, the model will go through the following three options in order:
- Load custom initial state: Set the initial_state option for ecoli_master_sim
- Load from saved state JSON: Set the initial_state_file option for ecoli_master_sim
- Generate from sim_data: generate_initial_state() uses the sim_data object generated by the ParCa to calculate the initial state
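The three-step fallback order can be sketched as follows. This is a hypothetical helper, not the code in ecoli_master_sim: it assumes the configuration is a plain dict with the initial_state and initial_state_file keys named above, and it treats initial_state_file as a direct path to a JSON file (how the real option is resolved into a path may differ). The generate callable stands in for generate_initial_state().

```python
import json
from typing import Callable


def resolve_initial_state(config: dict, generate: Callable[[], dict]) -> dict:
    """Hypothetical sketch of the initialization fallback order."""
    # 1. A custom initial state supplied directly in the config wins.
    if config.get("initial_state") is not None:
        return config["initial_state"]
    # 2. Otherwise load a saved state (assumed here to be a JSON file path).
    if config.get("initial_state_file"):
        with open(config["initial_state_file"]) as f:
            return json.load(f)
    # 3. Finally, fall back to generating a state from sim_data.
    return generate()
```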
Partitioning
Motivation
To support the use of independent sub-models for different biological processes (e.g. FBA for metabolism, Gillespie for complexation, etc.), the model allows processes to run mostly independently. At a high level, over the course of a single simulation step, each process will see the simulation state as it was before any other process has run. Each process will then calculate an update to apply to the simulation state and all updates will be simultaneously applied once all processes have run.
This setup has a potential problem: two processes may both decide to deplete the count of the same molecule, resulting in a final count that is negative. To prevent this, the model forces processes to communicate their bulk molecule requests to a special allocator process (Allocator). The allocator divides the bulk molecules so that each process sees a functional count that is proportional to its request.
For example, if process A requests 100 of molecule X and process B requests 400 of molecule X but the cell only has 400 molecules of X, the allocator will divide the molecules as follows:
Process A: \(\frac{100}{100 + 400} \times 400 = 80\) molecules of X
Process B: \(\frac{400}{100 + 400} \times 400 = 320\) molecules of X
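The arithmetic above generalizes to many processes and molecules. The sketch below is a simplified proportional split, not the real Allocator: it uses integer floor division so allocations never exceed the available count, and it may differ from the model in rounding and other details. Rows of requests are processes; columns are molecules.

```python
import numpy as np


def allocate(available, requests):
    """Split available counts among processes in proportion to their requests.

    available: shape (n_molecules,)
    requests:  shape (n_processes, n_molecules)
    """
    available = np.asarray(available, dtype=np.int64)
    requests = np.asarray(requests, dtype=np.int64)
    total = requests.sum(axis=0)
    # Integer proportional share, rounded down so the sum never exceeds supply.
    scaled = requests * available // np.maximum(total, 1)
    # Only scale molecules whose total request exceeds what is available.
    return np.where(total > available, scaled, requests)


print(allocate([400], [[100], [400]]))  # process A gets 80, process B gets 320
```

When supply covers every request, each process simply receives what it asked for.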
Note
Processes in the model are more dependent on one another than in this simplified example. For example, since molecule binding and complexation events occur on timescales much shorter than the default 1 second simulation timestep, we run TfUnbinding, update the simulation state, then run Equilibrium and TwoComponentSystem, update the simulation state again, and finally run TfBinding and update the simulation state. This gives transcription factors that are currently bound to promoters a chance to form complexes or participate in other reactions, better reflecting the transient binding dynamics of real cells.
Steps and Flows
To allow processes to run in a pre-specified order within each timestep, we can make use of a special subclass of the typical Vivarium Process class: Step. All “processes” in the model are actually instances of Step. These Steps are configured to run in user-configured “execution layers” by way of a flow that is included in the simulation configuration (see ecoli_master_sim).
A flow is a dictionary that specifies the dependencies for each Step. For example, if a user wants Step B to run only after Step A has updated the simulation state, the user can include Step A as a dependency of Step B:
{
"Step B": [("Step A",)]
}
Note
Dependencies must be in the form of paths like those that you would find in a topology.
Vivarium will parse the flow to construct a directed acyclic graph and figure out the order in which to run steps by stratifying them into “execution layers”. For example, consider the following flow:
{
"Step B": [("Step A",)],
"Step C": [("Step A",)],
"Step D": [("Step C",)]
}
Vivarium will parse this into the following sequence of execution layers:
Step A
Step B and Step C (order does not matter)
Step D
Each timestep, Step A will run and update the simulation state, Steps B and C will run with a view of the state that was updated by Step A, and finally Step D will run with a view of the state that was updated by every other step.
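The stratification can be sketched as a layered topological sort: a step joins the first layer in which all of its dependencies have already run. This is an illustration of the idea, not Vivarium's implementation, and it assumes each dependency path has a single element naming a top-level step, as in the examples above.

```python
def execution_layers(flow):
    """Stratify a flow (step -> list of dependency paths) into execution layers."""
    # Assumption: each path names a top-level step, so its last element is the step name.
    deps = {step: {path[-1] for path in paths} for step, paths in flow.items()}
    # Steps that only appear as dependencies have no dependencies of their own.
    for parents in list(deps.values()):
        for parent in parents:
            deps.setdefault(parent, set())

    layers, done = [], set()
    while len(done) < len(deps):
        # A step is ready once all of its dependencies ran in earlier layers.
        ready = sorted(s for s, parents in deps.items()
                       if s not in done and parents <= done)
        if not ready:
            raise ValueError("flow contains a cycle")
        layers.append(ready)
        done.update(ready)
    return layers


flow = {"Step B": [("Step A",)], "Step C": [("Step A",)], "Step D": [("Step C",)]}
print(execution_layers(flow))  # [['Step A'], ['Step B', 'Step C'], ['Step D']]
```

Within a layer, the sketch sorts names only to make the output deterministic; as noted above, order within a layer does not matter.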
Implementation
All partitioned processes are instances of the PartitionedProcess class. This both serves to identify the processes that require partitioning and implements a standard next_update method that allows these processes to be run on their own (as in migration tests).
Warning
In instances of PartitionedProcess, all ports connected to the bulk molecule store MUST be called bulk to be properly partitioned. Conversely, ports that are not meant to be partitioned should NEVER be called bulk in any PartitionedProcess.
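The idea of a standard next_update that lets a partitioned process run on its own can be sketched without vivarium. Everything here is hypothetical: PartitionedProcessSketch is a stand-in, not the real PartitionedProcess; the method names calculate_request and evolve_state come from this page, but the (timestep, states) arguments, the plain-array request format, and capping requests at the available count are assumptions.

```python
import numpy as np


class PartitionedProcessSketch:
    """Hypothetical stand-in for PartitionedProcess (no vivarium dependency)."""

    def calculate_request(self, timestep, states):
        raise NotImplementedError

    def evolve_state(self, timestep, states):
        raise NotImplementedError

    def next_update(self, timestep, states):
        # Standalone run: nothing competes, so grant each request up to availability.
        request = self.calculate_request(timestep, states)
        allocated = np.minimum(request, states["bulk"])
        # evolve_state sees only its allocation on the partitioned "bulk" port.
        return self.evolve_state(timestep, {**states, "bulk": allocated})


class ToyConsumer(PartitionedProcessSketch):
    """Hypothetical process that tries to consume 10 copies of molecule 0 per step."""

    def calculate_request(self, timestep, states):
        request = np.zeros_like(states["bulk"])
        request[0] = 10
        return request

    def evolve_state(self, timestep, states):
        # Use up everything that was allocated.
        return {"bulk": -states["bulk"]}


update = ToyConsumer().next_update(1.0, {"bulk": np.array([4, 7])})
```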
In the model, each partitioned process is used to create two separate steps: a Requester and an Evolver. For each execution layer in the flow given to EcoliSim, Ecoli will arrange the requesters and evolvers into four execution layers in the final model:
- Requesters: Each calls the calculate_request() method of a PartitionedProcess in that layer and writes its requests to a process-specific request store
- Allocator: An instance of Allocator that reads all request stores for processes in the execution layer, proportionally allocates bulk molecules to processes according to requests, and writes allocated counts to process-specific allocate stores
- Evolvers: Each replaces all views into the bulk store with the counts allocated to its corresponding PartitionedProcess in its allocate store, calls the evolve_state() method of its PartitionedProcess, updates the bulk molecule counts, and sends unique molecule updates to be accumulated by each unique molecule updater (see UniqueNumpyUpdater)
- Unique updater: An instance of UniqueUpdate that tells unique molecule updaters to apply accumulated updates (see UniqueNumpyUpdater for details)
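The requester, allocator, and evolver layers can be sketched for a single execution layer as below. This is a simplified model under stated assumptions: ToyProcess and run_layer are hypothetical, bulk is a plain count array rather than the structured store, allocation uses floor-rounded proportional shares, and the unique updater layer is omitted.

```python
import numpy as np


class ToyProcess:
    """Hypothetical partitioned process that wants `want` copies of molecule 0."""

    def __init__(self, want):
        self.want = want

    def calculate_request(self, timestep, states):
        request = np.zeros_like(states["bulk"])
        request[0] = self.want
        return request

    def evolve_state(self, timestep, states):
        # Consume everything that was allocated.
        return -states["bulk"]


def run_layer(processes, bulk):
    """Sketch of one partitioned execution layer (unique molecules omitted)."""
    # Requesters: every process sees the same pre-layer state.
    requests = np.array([p.calculate_request(1, {"bulk": bulk}) for p in processes])
    # Allocator: scale down oversubscribed molecules proportionally (floor rounding).
    total = requests.sum(axis=0)
    scaled = requests * bulk // np.maximum(total, 1)
    allocated = np.where(total > bulk, scaled, requests)
    # Evolvers: each process sees only its own allocation as "bulk".
    updates = [p.evolve_state(1, {"bulk": a}) for p, a in zip(processes, allocated)]
    # All bulk updates are applied together once the layer finishes.
    return bulk + sum(updates)


new_bulk = run_layer([ToyProcess(100), ToyProcess(400)], np.array([400, 50]))
```

With the numbers from the earlier example, the two processes receive 80 and 320 copies, and the final count of molecule 0 is zero rather than negative.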
Note
The Requester and Evolver for each partitioned process share the same PartitionedProcess instance. This allows instance variables (see aa_supply for an example in PolypeptideElongation) to be updated and shared between the calculate_request() and evolve_state() methods of each PartitionedProcess.
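Why sharing one instance matters can be shown with a small sketch. RequesterSketch and EvolverSketch are hypothetical wrappers (not the model's Requester and Evolver classes), and SharedStateProcess is a toy process that caches a value in calculate_request() and reads it back in evolve_state(), loosely in the spirit of the aa_supply example.

```python
class SharedStateProcess:
    """Hypothetical process that caches a value while requesting."""

    def calculate_request(self, timestep, states):
        # Cache an intermediate result on the instance for later in the timestep.
        self.supply = states["bulk"] * 2
        return states["bulk"]

    def evolve_state(self, timestep, states):
        # Reads the attribute set by calculate_request on the same instance.
        return {"supply_seen": self.supply}


class RequesterSketch:
    def __init__(self, process):
        self.process = process

    def run(self, timestep, states):
        return self.process.calculate_request(timestep, states)


class EvolverSketch:
    def __init__(self, process):
        self.process = process

    def run(self, timestep, states):
        return self.process.evolve_state(timestep, states)


proc = SharedStateProcess()
requester, evolver = RequesterSketch(proc), EvolverSketch(proc)  # same instance
requester.run(1, {"bulk": 5})
result = evolver.run(1, {"bulk": 3})
```

If each wrapper held its own copy of the process, evolve_state() would fail because supply would never have been set on its instance.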
Accessing Non-partitioned Counts
There are certain processes that require access to the total, non-partitioned count of certain bulk molecules. For example, Metabolism needs to know the total counts of all amino acids to accurately implement tRNA charging. To give these processes access to non-partitioned counts, an additional port that is also connected to the bulk molecules store is added to their ports_schema methods and topologies. By convention, this port is called bulk_total to differentiate it from the partitioned bulk port. Evolvers overwrite the partitioned bulk port with the allocated bulk molecule counts while leaving the bulk_total port untouched, giving their associated PartitionedProcess instances access to the unpartitioned bulk molecule counts in their evolve_state() methods.
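The difference between the two ports can be sketched with a hypothetical helper that builds the states an Evolver might hand to evolve_state(). For simplicity both ports hold plain count arrays here; in the model, bulk_total would still be a view of the structured bulk store.

```python
import numpy as np


def evolver_states(states, allocated):
    """Hypothetical helper: the view an Evolver hands to evolve_state()."""
    view = dict(states)        # shallow copy; other ports pass through unchanged
    view["bulk"] = allocated   # partitioned port: allocated counts only
    # "bulk_total" is deliberately left pointing at the full, unpartitioned counts.
    return view


total = np.array([400, 50])
view = evolver_states({"bulk": total, "bulk_total": total}, np.array([80, 0]))
```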
Indexing
Processes typically use the ecoli.library.schema.bulk_name_to_idx() helper function to get the indices for a set of molecules (e.g. all NTPs). These indices are typically cached as instance attributes (e.g. self.ntp_idx) in the next_update method of a process.
Though counts can be directly retrieved from the Numpy structured array (e.g. states['bulk']['count'][self.ntp_idx]), partitioned processes do not have access to the Numpy structured array in their evolve_state methods due to how partitioning was implemented in the model (see Implementation). To standardize count access across processes, the helper function ecoli.library.schema.counts() can handle both of these scenarios and also guarantees that the returned counts can be safely edited without unintentionally mutating the source array.
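The behavior described for these two helpers can be approximated as follows. name_to_idx_sketch and counts_sketch are hypothetical stand-ins, not the library functions, and their signatures are assumptions. The first maps names to row positions with a sorted search (it assumes every name is present); the second accepts either the structured array or a plain 1-D count array and always returns a copy.

```python
import numpy as np


def name_to_idx_sketch(names, bulk_ids):
    """Hypothetical stand-in for bulk_name_to_idx(); assumes all names exist."""
    bulk_ids = np.asarray(bulk_ids)
    sorter = np.argsort(bulk_ids)
    pos = np.searchsorted(bulk_ids, names, sorter=sorter)
    return sorter[pos]


def counts_sketch(bulk, idx):
    """Hypothetical stand-in for counts(): structured or plain count arrays."""
    if bulk.dtype.names is not None:   # full structured array
        source = bulk["count"]
    else:                              # 1-D allocated counts, as in evolve_state
        source = bulk
    # Return a copy so callers can edit the result without touching the store.
    return source[idx].copy()


bulk = np.array([("ATP[c]", 5), ("GTP[c]", 7), ("CTP[c]", 3), ("UTP[c]", 2)],
                dtype=[("id", "U20"), ("count", np.int64)])
# Computed once and cached, e.g. as self.ntp_idx in a real process.
ntp_idx = name_to_idx_sketch(["ATP[c]", "CTP[c]", "GTP[c]", "UTP[c]"], bulk["id"])
ntp = counts_sketch(bulk, ntp_idx)
ntp[0] = 0  # editing the copy leaves the store unchanged
```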
Unique Molecules
- Path: ('unique',)
- Updater:
- Dividers:
- Serializer:
- Schema:
- Helpers:
Warning
vivarium-core does not copy store values, so processes must be careful not to unintentionally modify mutable values. Each unique molecule array is read-only for extra protection (see the WRITEABLE flag in numpy.ndarray.flags).
Note
Unique molecules are named as such because they represent species for which individual molecules are not treated as interchangeable (e.g. different RNA molecules may have different sequences).
The unique molecules store contains a substore for each unique molecule (e.g. RNA, active RNAP, etc.). Each unique molecule substore contains a structured Numpy array with a variety of named fields, each representing an attribute of interest for that class of unique molecules (e.g. coordinates for a gene unique molecule). All unique molecules will have the following named fields:
- unique_index (int): Unique identifier for each unique molecule. When processes add new unique molecules, the helper function ecoli.library.schema.create_unique_indexes() is used to generate unique indices for each molecule to be added.
- _entryState (numpy.int8): 1 for active row, 0 for inactive row. When unique molecules are deleted (e.g. RNA degradation), all of their data, including the _entryState field, is set to 0. When unique molecules are added (e.g. RNA transcription), the updater places the data for these new molecules into the rows that are identified as inactive by the helper function ecoli.library.schema.get_free_indices(), which also grows the array if necessary.
- massDiff_{} (numpy.float64): Field for each dynamic submass. The nine submasses are rRNA, tRNA, mRNA, miscRNA, nonspecific_RNA, protein, metabolite, water, and DNA. An example of a dynamic submass is the constantly changing protein mass of the polypeptide associated with an actively translating ribosome.
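The row-recycling scheme described for _entryState can be sketched on a toy array. The dtype (including the extra length attribute), the hard-coded unique indices, and free_rows_sketch are hypothetical; in the model, indices come from create_unique_indexes() and free rows from get_free_indices(), whose exact growth policy is not reproduced here.

```python
import numpy as np

# Hypothetical RNA-like array: the shared fields plus one extra attribute.
dtype = [("unique_index", np.int64), ("_entryState", np.int8),
         ("massDiff_mRNA", np.float64), ("length", np.int64)]


def free_rows_sketch(arr, n):
    """Hypothetical stand-in for get_free_indices(): n inactive rows, growing if needed."""
    free = np.flatnonzero(arr["_entryState"] == 0)
    if free.size < n:
        # Append just enough zeroed (inactive) rows to fit the new molecules.
        start = len(arr)
        arr = np.concatenate([arr, np.zeros(n - free.size, dtype=arr.dtype)])
        free = np.concatenate([free, np.arange(start, len(arr))])
    return arr, free[:n]


rnas = np.zeros(2, dtype=dtype)
# Add two molecules into the empty rows.
rnas, rows = free_rows_sketch(rnas, 2)
rnas["unique_index"][rows] = [101, 102]
rnas["_entryState"][rows] = 1
# Deleting a molecule zeroes its whole row, including _entryState.
rnas[0] = (0, 0, 0.0, 0)
# Adding three more reuses row 0 and then grows the array.
rnas, rows = free_rows_sketch(rnas, 3)
```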
Initialization
See Initialization.
Accessing
Processes use the ecoli.library.schema.attrs() helper function to access any number of attributes for all active (_entryState is 1) unique molecules of a given type (e.g. RNA, active RNAP, etc.).
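A minimal approximation of this access pattern is a boolean mask on _entryState followed by field selection. attrs_sketch is a hypothetical stand-in whose signature is an assumption, and the gene array below is toy data.

```python
import numpy as np


def attrs_sketch(arr, names):
    """Hypothetical stand-in for attrs(): requested fields for active rows only."""
    active = arr["_entryState"] == 1
    # Boolean indexing returns copies, so callers cannot mutate the store by accident.
    return [arr[name][active] for name in names]


dtype = [("unique_index", np.int64), ("_entryState", np.int8),
         ("coordinates", np.int64)]
genes = np.array([(7, 1, 1500), (0, 0, 0), (9, 1, -320)], dtype=dtype)
idx, coords = attrs_sketch(genes, ["unique_index", "coordinates"])
```

The inactive middle row is skipped, so only the two active genes are returned.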