======
Stores
======

Stores are upgraded dictionaries that hold the simulation state. The 
description of each type of store below begins with a list of the 
following attributes:

:Path: Location of the store for use in process topologies
:Updater: Function used to apply updates to the store
:Divider: Function used to split the store during cell division
:Serializer: Instance of :py:class:`vivarium.core.registry.Serializer` 
    used to serialize store data before being emitted
:Schema: Helper function for store in ``ports_schema`` methods
:Helpers: Other useful helper functions

.. _bulk:

--------------
Bulk Molecules
--------------

:Path: ``('bulk',)``
:Updater: :py:func:`ecoli.library.schema.bulk_numpy_updater`
:Divider: :py:func:`ecoli.library.schema.divide_bulk`
:Serializer: :py:class:`ecoli.library.schema.get_bulk_counts`
:Schema: :py:func:`ecoli.library.schema.numpy_schema`
:Helpers: :py:func:`ecoli.library.schema.bulk_name_to_idx`,
    :py:func:`ecoli.library.schema.counts`

.. WARNING::
    ``vivarium-core`` **does not** copy store values so processes must
    be careful not to unintentionally modify mutable values. The bulk 
    molecules array is read-only for extra protection (see ``WRITEABLE`` 
    flag in :py:attr:`numpy.ndarray.flags`).

.. note::
    Bulk molecules are named as such because they represent species for 
    which all molecules are treated as interchangeable (e.g. water).

The bulk molecules store consists of a 
`structured Numpy array <https://numpy.org/doc/stable/user/basics.rec.html>`_ 
with the following named fields:

    1. ``id`` (:py:class:`str`): Names of bulk molecules as pulled from `EcoCyc <https://ecocyc.org/>`_
        Each name ends with a bracketed "location tag" (e.g. ``[c]``) containing
        one of the abbreviations defined in the 
        ``reconstruction/ecoli/flat/compartments.tsv`` file (see
        `Cell Component Ontology <http://brg.ai.sri.com/CCO/downloads/cco.html>`_)
    2. ``count`` (:py:attr:`numpy.int64`): Counts of bulk molecules
        Instead of the full structured array, the ``evolve_state`` method of a partitioned
        process receives a one-dimensional array of partitioned counts (see :ref:`partitioning`).
    3. ``{}_submass`` (:py:attr:`numpy.float64`): Field for each submass
        The nine submasses are rRNA, tRNA, mRNA, miscRNA, nonspecific_RNA, protein, metabolite, water, and DNA.
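
As a rough illustration of the layout described above, the following sketch 
builds a small structured array with an ``id`` field, a ``count`` field, and 
two of the submass fields, then clears its ``WRITEABLE`` flag. The molecule 
names and mass values are invented for this example; the real array is built 
by the model, not by this code.

```python
import numpy as np

# Hypothetical molecules and masses, for illustration only.
dtype = [
    ("id", "U20"),
    ("count", np.int64),
    ("protein_submass", np.float64),
    ("water_submass", np.float64),
]
bulk = np.array(
    [
        ("WATER[c]", 1000, 0.0, 18.0),
        ("EXAMPLE-PROTEIN[c]", 5, 25000.0, 0.0),
    ],
    dtype=dtype,
)

# Mimic the read-only protection by clearing the WRITEABLE flag.
bulk.flags.writeable = False


def try_in_place_edit(array):
    """Return True if an in-place edit succeeds, False if NumPy rejects it."""
    try:
        array["count"][0] += 1
        return True
    except ValueError:
        return False


edit_succeeded = try_in_place_edit(bulk)
water_count = int(bulk["count"][bulk["id"] == "WATER[c]"][0])
```

Because the array is read-only, the attempted in-place edit is rejected and 
the stored count is left unchanged.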

.. _initialization:

Initialization
==============
To create the initial value for this store, the model will go through 
the following three options in order:

    1. Load custom initial state
        Set ``initial_state`` option for 
        :py:mod:`~ecoli.experiments.ecoli_master_sim`

    2. Load from saved state JSON
        Set ``initial_state_file`` option for 
        :py:mod:`~ecoli.experiments.ecoli_master_sim`

    3. Generate from ``sim_data``
        :py:meth:`~ecoli.library.sim_data.LoadSimData.generate_initial_state` 
        uses the ``sim_data`` object generated by the ParCa to calculate 
        initial state
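
The fallback order above can be sketched as a simple chain of checks. The 
function below is a hypothetical stand-in, not the model's actual loading 
code: ``config`` is assumed to be a plain dictionary, and the two callables 
are placeholders for loading a saved state and for generating a state from 
``sim_data``.

```python
def choose_initial_state(config, load_saved_state, generate_from_sim_data):
    """Return an initial state using the first option that is configured.

    All three arguments are hypothetical stand-ins: ``config`` is a plain
    dict, ``load_saved_state`` takes the ``initial_state_file`` value, and
    ``generate_from_sim_data`` takes no arguments.
    """
    # 1. A custom initial state takes precedence.
    if config.get("initial_state") is not None:
        return config["initial_state"]
    # 2. Otherwise, load a saved state if a file was named.
    if config.get("initial_state_file") is not None:
        return load_saved_state(config["initial_state_file"])
    # 3. Otherwise, fall back to generating the state from sim_data.
    return generate_from_sim_data()


def fake_load(name):
    # Placeholder loader used only to demonstrate the ordering.
    return {"source": "file", "name": name}


def fake_generate():
    # Placeholder generator used only to demonstrate the ordering.
    return {"source": "sim_data"}


custom = choose_initial_state(
    {"initial_state": {"source": "custom"}, "initial_state_file": "saved_state"},
    fake_load,
    fake_generate,
)
saved = choose_initial_state(
    {"initial_state_file": "saved_state"}, fake_load, fake_generate
)
generated = choose_initial_state({}, fake_load, fake_generate)
```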


.. _partitioning:

Partitioning
============

Motivation
----------
To support the use of independent sub-models for different biological processes 
(e.g. FBA for metabolism, Gillespie for complexation, etc.), the model allows 
processes to run mostly independently. At a high level, over the course of a 
single simulation step, each process will see the simulation state as it was 
before any other process has run. Each process will then calculate an update 
to apply to the simulation state and all updates will be simultaneously 
applied once all processes have run. 

This setup has a potential problem: two processes may both decide to deplete 
the count of the same molecule, resulting in a final count that is negative. 
To prevent this from happening, the model forces processes to communicate 
their bulk molecule requests to a special allocator process 
(:py:class:`~ecoli.processes.allocator.Allocator`). The allocator process 
will divide the bulk molecules so that each process sees a functional count 
that is proportional to its request.

For example, if process A requests 100 of molecule X and process B requests 
400 of molecule X but the cell only has 400 molecules of X, the allocator 
will divide the molecules as follows:

- Process A: :math:`\frac{100}{100 + 400} \times 400 = 80` molecules of X 
- Process B: :math:`\frac{400}{100 + 400} \times 400 = 320` molecules of X
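
The arithmetic in this example can be reproduced with a few lines of Python. 
This is a minimal sketch of proportional allocation only; the function name 
and the use of integer floor division are assumptions of the sketch, and the 
actual allocator process may handle rounding and other details differently.

```python
def allocate_proportionally(requests, available):
    """Split ``available`` molecules among processes in proportion to requests.

    ``requests`` maps a process name to its requested count. If the total
    request fits within ``available``, every process gets what it asked for.
    Floor division is an assumption of this sketch and may leave a few
    molecules unallocated.
    """
    total = sum(requests.values())
    if total <= available:
        return dict(requests)
    return {name: req * available // total for name, req in requests.items()}


allocation = allocate_proportionally({"A": 100, "B": 400}, available=400)
```

Here ``allocation`` maps process A to 80 molecules and process B to 320, 
matching the figures above.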

.. note::
    Processes in the model are more dependent on one another than in this 
    simplified example.

For example, since molecule binding and complexation 
events occur on timescales much shorter than the default 1 second 
simulation timestep, we run :py:class:`~ecoli.processes.tf_unbinding.TfUnbinding`, 
update the simulation state, then run 
:py:class:`~ecoli.processes.equilibrium.Equilibrium` and 
:py:class:`~ecoli.processes.two_component_system.TwoComponentSystem`, 
update the simulation state, and finally run 
:py:class:`~ecoli.processes.tf_binding.TfBinding`, 
and update the simulation state. This gives transcription factors that are 
currently bound to promoters a chance to form complexes or participate in 
other reactions, better reflecting the transient binding dynamics of real cells.

Steps and Flows
---------------
To allow processes to run with a pre-specified order within 
each timestep, we can make use of a special subclass of the typical Vivarium 
:py:class:`~vivarium.core.process.Process` class: 
:py:class:`~vivarium.core.process.Step`. All "processes" in the model 
are actually instances of :py:class:`~vivarium.core.process.Step`. These Steps 
are configured to run in user-configured "execution layers" by way of a ``flow`` 
that is included in the simulation configuration (see 
:py:mod:`~ecoli.experiments.ecoli_master_sim`).

A ``flow`` is a dictionary that specifies the dependencies for each Step. For 
example, if a user wants Step B to run only after Step A has updated the 
simulation state, the user can include Step A as a dependency of Step B::

    {
        "Step B": [("Step A",)]
    }

.. note::
    Dependencies must be in the form of paths like those that you would find 
    in a topology.

Vivarium will parse the ``flow`` to construct a directed acyclic graph  
and figure out the order in which to run steps by stratifying them into 
"execution layers". For example, consider the following ``flow``::

    {
        "Step B": [("Step A",)],
        "Step C": [("Step A",)],
        "Step D": [("Step C",)]
    }

Vivarium will parse this into the following sequence of execution layers: 

1. Step A
2. Step B and Step C (order does not matter)
3. Step D

Each timestep, Step A will run and update the simulation state, Steps B and C 
will run with a view of the state that was updated by Step A, and finally 
Step D will run with a view of the state that was updated by every other step.
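
To make the stratification concrete, here is a minimal sketch that turns a 
``flow`` like the one above into execution layers. It is not Vivarium's 
implementation; ``execution_layers`` is a hypothetical helper that assumes the 
last element of each dependency path is the name of the step it refers to.

```python
def execution_layers(flow):
    """Group steps into layers so each step runs after its dependencies."""
    # Map each step name to the set of step names it depends on.
    deps = {step: {path[-1] for path in paths} for step, paths in flow.items()}
    # Steps that only appear as dependencies have no dependencies themselves.
    for names in list(deps.values()):
        for name in names:
            deps.setdefault(name, set())

    layers = []
    done = set()
    while len(done) < len(deps):
        # A step is ready once all of its dependencies have already run.
        ready = sorted(
            step for step, needed in deps.items()
            if step not in done and needed <= done
        )
        if not ready:
            raise ValueError("flow contains a cycle")
        layers.append(ready)
        done.update(ready)
    return layers


flow = {
    "Step B": [("Step A",)],
    "Step C": [("Step A",)],
    "Step D": [("Step C",)],
}
layers = execution_layers(flow)
```

For this ``flow``, the sketch yields three layers: Step A, then Steps B and C, 
then Step D.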

.. _implementation:

Implementation
--------------
All partitioned processes are instances of the 
:py:class:`~ecoli.processes.partition.PartitionedProcess` class. This class both 
identifies the processes that require partitioning and implements 
a standard ``next_update`` method that allows these processes to be run on 
their own (as in 
`migration tests <https://github.com/CovertLab/vivarium-ecoli/tree/master/migration>`_).

.. WARNING::
    In instances of :py:class:`~ecoli.processes.partition.PartitionedProcess`, 
    all ports connected to the bulk molecule store **MUST** be called 
    ``bulk`` to be properly partitioned. Conversely, ports that are not meant 
    to be partitioned should **NEVER** be called ``bulk`` in any 
    :py:class:`~ecoli.processes.partition.PartitionedProcess`.

In the model, each partitioned process is used to create two separate steps: 
a :py:class:`~ecoli.processes.partition.Requester` and an 
:py:class:`~ecoli.processes.partition.Evolver`. For each execution layer 
in the ``flow`` given to :py:class:`~ecoli.experiments.ecoli_master_sim.EcoliSim`, 
:py:class:`~ecoli.composites.ecoli_master.Ecoli` will arrange the requesters and 
evolvers into four execution layers in the final model: 

1. Requesters: 
    Each will call the 
    :py:meth:`~ecoli.processes.partition.PartitionedProcess.calculate_request`
    method of a :py:class:`~ecoli.processes.partition.PartitionedProcess` 
    in said layer and write its requests to a process-specific ``request`` store

2. Allocator: 
    An instance of :py:class:`~ecoli.processes.allocator.Allocator` 
    that reads all ``request`` stores for processes in the execution layer, 
    proportionally allocates bulk molecules to processes according to requests, 
    and writes allocated counts to process-specific ``allocate`` stores

3. Evolvers: 
    Each will replace all views into the ``bulk`` store with the counts allocated 
    to its corresponding :py:class:`~ecoli.processes.partition.PartitionedProcess` 
    in its ``allocate`` store, call the 
    :py:meth:`~ecoli.processes.partition.PartitionedProcess.evolve_state` 
    method of its :py:class:`~ecoli.processes.partition.PartitionedProcess`, 
    update the bulk molecule counts, and send unique molecule updates 
    to be accumulated by each unique molecule updater 
    (see :py:class:`~ecoli.library.schema.UniqueNumpyUpdater`)

4. Unique updater: 
    An instance of 
    :py:class:`~ecoli.processes.unique_update.UniqueUpdate` that tells 
    unique molecule updaters to apply accumulated updates 
    (see :py:class:`~ecoli.library.schema.UniqueNumpyUpdater` for details)

.. note::
    The :py:class:`~ecoli.processes.partition.Requester` and 
    :py:class:`~ecoli.processes.partition.Evolver` for each partitioned process 
    share the same :py:class:`~ecoli.processes.partition.PartitionedProcess` 
    instance. This allows instance variables  
    (see 
    :py:data:`~ecoli.processes.polypeptide_elongation.PolypeptideElongation.aa_supply`
    for an example in 
    :py:class:`~ecoli.processes.polypeptide_elongation.PolypeptideElongation`)
    to be updated and shared between the 
    :py:meth:`~ecoli.processes.partition.PartitionedProcess.calculate_request` 
    and :py:meth:`~ecoli.processes.partition.PartitionedProcess.evolve_state` 
    methods of each :py:class:`~ecoli.processes.partition.PartitionedProcess`.
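
The request, allocate, and evolve sequence, along with the shared-instance 
behavior noted above, can be sketched with a toy example. Everything here is 
hypothetical: ``ToyPartitionedProcess`` and ``run_layer`` are stand-ins that 
track a single bulk molecule, and the unique molecule update layer is left out.

```python
class ToyPartitionedProcess:
    """Hypothetical stand-in for a partitioned process (not the real class)."""

    def __init__(self, name, demand):
        self.name = name
        self.demand = demand
        self.last_request = None  # instance state set while requesting
        self.shortfall = None  # instance state set while evolving

    def calculate_request(self, bulk_count):
        # Remember the request so evolve_state can refer to it later.
        self.last_request = self.demand
        return self.last_request

    def evolve_state(self, allocated):
        # Use the value stored by calculate_request on the same instance.
        self.shortfall = self.last_request - allocated
        return -allocated  # change in the bulk count


def run_layer(processes, bulk_count):
    # 1. Requesters: every process sees the same starting count.
    requests = {p.name: p.calculate_request(bulk_count) for p in processes}
    # 2. Allocator: scale requests down proportionally if they exceed supply.
    total = sum(requests.values())
    scale_needed = total > bulk_count
    allocations = {
        name: (req * bulk_count // total if scale_needed else req)
        for name, req in requests.items()
    }
    # 3. Evolvers: each process sees only its allocation; the updates are
    #    collected and applied together afterwards.
    updates = [p.evolve_state(allocations[p.name]) for p in processes]
    return bulk_count + sum(updates)


proc_a = ToyPartitionedProcess("A", demand=100)
proc_b = ToyPartitionedProcess("B", demand=400)
final_count = run_layer([proc_a, proc_b], bulk_count=400)
```

Because both phases run on the same object, ``evolve_state`` can compare its 
allocation against the request recorded earlier in the same timestep.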

Accessing Non-partitioned Counts
--------------------------------
There are certain processes that require access to the total, non-partitioned 
count of certain bulk molecules. For example, 
:py:class:`~ecoli.processes.metabolism.Metabolism` needs to know the total 
counts of all amino acids to accurately implement tRNA charging. To give these 
processes access to non-partitioned counts, an additional port is added to 
their ``ports_schema`` methods and topologies that is also connected to the 
bulk molecules store. By convention, this port is called ``bulk_total`` to 
differentiate it from the partitioned ``bulk`` port. Evolvers will overwrite 
the partitioned ``bulk`` port with the allocated bulk molecule counts while 
leaving the ``bulk_total`` port untouched, giving their associated 
:py:class:`~ecoli.processes.partition.PartitionedProcess` instances access to 
the unpartitioned bulk molecule counts in their 
:py:meth:`~ecoli.processes.partition.PartitionedProcess.evolve_state` methods. 
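
A minimal sketch of this behavior, assuming states are passed as a plain 
dictionary keyed by port name: only the ``bulk`` entry is swapped for the 
allocated counts. The ``evolver_view`` function is hypothetical, and plain 
lists stand in for the arrays used by the model.

```python
def evolver_view(states, allocated_counts):
    """Return the states a partitioned process would see while evolving.

    Only the ``bulk`` port is replaced with the allocated counts. Every
    other port, including ``bulk_total``, is passed through unchanged.
    """
    view = dict(states)  # shallow copy so the caller's dict is not modified
    view["bulk"] = allocated_counts
    return view


states = {"bulk": [400, 50], "bulk_total": [400, 50]}
view = evolver_view(states, allocated_counts=[80, 50])
```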


Indexing
========
Processes typically use the :py:func:`ecoli.library.schema.bulk_name_to_idx` helper function 
to get the indices for a set of molecules (e.g. all NTPs). These indices are usually cached 
as instance attributes (e.g. ``self.ntp_idx``) in the ``next_update`` method of a process.

Though counts can be directly retrieved from the Numpy structured array (e.g. 
``states['bulk']['count'][self.ntp_idx]``), partitioned processes do not have access to the 
Numpy structured array in their ``evolve_state`` methods due to how partitioning was 
implemented in the model (see :ref:`implementation`). To standardize count 
access across processes, the helper function 
:py:func:`ecoli.library.schema.counts` can handle both of these scenarios and 
also guarantees that the returned counts can be safely edited without 
unintentionally mutating the source array.
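
The two access scenarios can be illustrated with stand-in helpers. The 
functions ``name_to_idx`` and ``safe_counts`` below are hypothetical and only 
approximate the behavior described above; they are not the implementations in 
``ecoli.library.schema``, and the molecule counts are invented.

```python
import numpy as np


def name_to_idx(names, all_ids):
    """Hypothetical stand-in: look up array indices for molecule names."""
    lookup = {mol_id: i for i, mol_id in enumerate(all_ids)}
    return np.array([lookup[name] for name in names])


def safe_counts(bulk, idx):
    """Hypothetical stand-in: return an editable copy of selected counts.

    Accepts either the structured array (with a ``count`` field) or a plain
    one-dimensional array of partitioned counts.
    """
    if bulk.dtype.names is not None:
        return bulk["count"][idx].copy()
    return bulk[idx].copy()


bulk = np.array(
    [("ATP[c]", 10), ("GTP[c]", 20), ("WATER[c]", 30)],
    dtype=[("id", "U20"), ("count", np.int64)],
)
bulk.flags.writeable = False
ntp_idx = name_to_idx(["ATP[c]", "GTP[c]"], bulk["id"])

# Structured array, as seen outside of evolve_state.
from_structured = safe_counts(bulk, ntp_idx)
# One-dimensional partitioned counts, as seen inside evolve_state.
from_partitioned = safe_counts(np.array([4, 8, 30]), ntp_idx)

from_structured[0] = 0  # edits the copy, not the read-only source
```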


----------------
Unique Molecules
----------------

:Path: ``('unique',)``
:Updater: :py:meth:`ecoli.library.schema.UniqueNumpyUpdater.updater`
:Dividers: See :py:data:`ecoli.library.schema.UNIQUE_DIVIDERS`
:Serializer: :py:class:`ecoli.library.schema.get_unique_fields`
:Schema: :py:func:`ecoli.library.schema.numpy_schema`
:Helpers: :py:func:`ecoli.library.schema.attrs`

.. WARNING::
    ``vivarium-core`` **does not** copy store values so processes must
    be careful not to unintentionally modify mutable values. Each unique 
    molecule array is read-only for extra protection (see ``WRITEABLE`` 
    flag in :py:attr:`numpy.ndarray.flags`).

.. note::
    Unique molecules are named as such because they represent species for 
    which individual molecules are not treated as interchangeable (e.g. 
    different RNA molecules may have different sequences).

The unique molecules store contains a substore for each unique molecule (e.g. 
RNA, active RNAP, etc.). Each unique molecule substore contains a 
`structured Numpy array <https://numpy.org/doc/stable/user/basics.rec.html>`_ 
with a variety of named fields, each representing an attribute of interest 
for that class of unique molecules (e.g. ``coordinates`` for a ``gene`` unique 
molecule). All unique molecules will have the following named fields:

    1. ``unique_index`` (:py:class:`int`): Unique identifier for each unique molecule
        When processes add new unique molecules, the helper function 
        :py:func:`ecoli.library.schema.create_unique_indexes` is used to generate 
        unique indices for each molecule to be added.
    2. ``_entryState`` (:py:attr:`numpy.int8`): 1 for active row, 0 for inactive row
        When unique molecules are deleted (e.g. RNA degradation), all of their data, 
        including the ``_entryState`` field, is set to 0. When unique molecules are 
        added (e.g. RNA transcription), the updater places the data for these new 
        molecules into the rows that are identified as inactive by the helper function 
        :py:func:`ecoli.library.schema.get_free_indices`, which also grows the array 
        if necessary. 
    3. ``massDiff_{}`` (:py:attr:`numpy.float64`): Field for each dynamic submass
        The nine submasses are rRNA, tRNA, mRNA, miscRNA, nonspecific_RNA, protein, 
        metabolite, water, and DNA. An example of a dynamic submass is the constantly
        changing protein mass of the polypeptide associated with an actively 
        translating ribosome.
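
The row reuse described for ``_entryState`` can be sketched as follows. The 
``delete_rows`` and ``add_rows`` functions are hypothetical and work on copies 
for clarity; the ``length`` field, the index values, and the choice to grow 
the array by exactly the number of missing rows are assumptions of this 
sketch, not a description of the model's updater.

```python
import numpy as np

dtype = [("unique_index", np.int64), ("_entryState", np.int8), ("length", np.int64)]
rnas = np.array(
    [(100, 1, 50), (101, 1, 75), (102, 1, 20), (0, 0, 0)],
    dtype=dtype,
)


def delete_rows(array, rows):
    """Return a copy with the given rows zeroed, which clears _entryState."""
    result = array.copy()
    result[rows] = np.zeros(len(rows), dtype=array.dtype)
    return result


def add_rows(array, new_rows):
    """Return a copy with new_rows placed into inactive rows, growing if needed."""
    result = array.copy()
    free = np.flatnonzero(result["_entryState"] == 0)
    shortage = len(new_rows) - len(free)
    if shortage > 0:
        # Not enough inactive rows: append empty rows and look again.
        result = np.concatenate([result, np.zeros(shortage, dtype=result.dtype)])
        free = np.flatnonzero(result["_entryState"] == 0)
    result[free[: len(new_rows)]] = new_rows
    return result


after_delete = delete_rows(rnas, [1])
new_molecules = np.array([(103, 1, 10), (104, 1, 12), (105, 1, 14)], dtype=dtype)
grown = add_rows(after_delete, new_molecules)
```

In this run, two new molecules fill the inactive rows and the third forces 
the array to grow by one row.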

Initialization
==============
See :ref:`initialization`.

Accessing
=========
Processes use the :py:func:`ecoli.library.schema.attrs` helper function to access 
any number of attributes for all active (``_entryState`` is 1) unique molecules 
of a given type (e.g. RNA, active RNAP, etc.).
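
As an illustration of what this kind of access involves, the sketch below 
filters a structured array down to its active rows and returns the requested 
fields. The ``active_attrs`` function is a hypothetical stand-in, not the 
library's ``attrs`` helper, and the ``length`` field is invented for the 
example.

```python
import numpy as np


def active_attrs(array, names):
    """Hypothetical stand-in: return the named fields for active rows only."""
    active = array["_entryState"] == 1
    return [array[name][active] for name in names]


rnas = np.array(
    [(100, 1, 50), (0, 0, 0), (102, 1, 20)],
    dtype=[("unique_index", np.int64), ("_entryState", np.int8), ("length", np.int64)],
)
unique_index, length = active_attrs(rnas, ["unique_index", "length"])
```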