Experiments
EcoliSim
is the primary
interface for configuring and running single-cell simulations. We refer
to simulations as experiments, and all simulations (or batches
of simulations run in a single workflow, see Workflows) are
identified via a unique experiment ID.
Warning
If data is being persisted to disk (see Parquet Emitter), simulations or workflows will overwrite data from any past simulations or workflows with the same experiment ID.
When running workflows with runscripts.workflow
(see Workflows),
users are prevented from accidentally overwriting data by nextflow
, the software
used to run the workflow. Specifically, nextflow generates an HTML execution report
in the output folder for a given experiment ID (see Output)
and will refuse to run another workflow with the same experiment ID unless
that execution report is renamed, moved, or deleted.
Configuration
EcoliSim
offers three methods
for configuring simulations:
Using a JSON configuration file via the command-line option
--config
Using the object-oriented interface
Using command-line options
In general, we recommend that you use the JSON configuration interface as much
as possible. This is because the JSON configuration format is standardized across
all of the main interfaces for the model (the scripts in runscripts
and ecoli_master_sim
). The object-oriented interface
allows users to programatically set simulation options and is mainly intended for
small ad-hoc test simulations or for creating your own experiment file (see
Create Your Own). The command-line interface is much more limited than
the other two and only offers access to a few key configuration options. It is
mainly intended for internal use (e.g. in Nextflow workflow scripts).
JSON Config Files
The EcoliSim
class relies upon
the helper SimConfig
class to load
configuration options from JSON files and merge them with options specified via
the command line. The configuration options are always loaded in the following order,
with options loaded later on overriding those from earlier sources:
The options in the default JSON config file (located at
default_config_path
)The options in the JSON config file specified via
--config
in the command line.The options specified via the command line.
In most cases, configuration options that appear in more than one
of the above sources are successively overriden in their entirety. The sole
exceptions are configuration options listed in
LIST_KEYS_TO_MERGE
. These
options hold lists of values that are concatenated with one another instead
of being wholly overriden.
Notice that the options in the default JSON config file are always loaded
first. This means that if you would like to run a simulation or workflow
that leaves some of these options alone, you can simply omit those options
from the JSON config file that you create and pass to your runscript of choice
via --config
.
Below is an annotated copy of the default simulation-related configuration
options from the default JSON config file (see the file located at
default_config_path
for the most up-to-date defaults). Note that JSON configuration files passed
as input to the scripts in runscripts
accept additional keys that are
documented in Workflows.
{
# List of string filenames in the ecoli/composites/ecoli_configs directory
# (include .json extension). These files are loaded in order and merged
# into the configuration of this file. Avoid overly complex inheritance
# chains if possible.
"inherit_from": [],
# String that uniquely identifies simulation (or workflow if passed
# as input to runscripts/workflow.py). Special characters and spaces
# are not allowed (hyphens are OK).
"experiment_id": "experiment_id_one"
# Whether to append date and time to experiment ID in the following format
# experiment_id_%Y%m%d-%H%M%S.
"suffix_time": true,
# Optional string description of simulation
"description": "",
# Whether to display vivarium-core progress bar
"progress_bar" : true,
# Path to pickle file output from parameter calculator (runscripts/parca.py).
# Only used for single sim run with ecoli/experiments/ecoli_master_sim.py.
# Ignored when run with runscripts/workflow.py because each simulation is
# automatically run with the appropriate variant/baseline simulation data.
"sim_data_path": "reconstruction/sim_data/kb/simData.cPickle",
# Pick between "timeseries" to save simulation output in-memory (good
# for single-cell ad-hoc analysis) or "parquet" to save output persistently
# to Parquet files on disk (good for workflows and more in-depth analyses)
"emitter" : "timeseries",
# If choosing "parquet" emitter, must provide "out_dir" with path (relative
# or absolute) to output folder OR "out_uri" with URI for Google Cloud Storage
# bucket. Only provide one of the above.
"emitter_arg": {"out_dir": "out"},
# See API documentation on vivarium-core for vivarium.core.engine.Engine.
# Can usually leave as false.
"emit_topology" : false,
"emit_processes" : false,
"emit_config" : false,
# Whether to save process updates to log_update stores. Should only be used
# if choosing "timeseries" emitter. See "Log Updates" heading in "Composites"
# documentation for more information.
"log_updates" : false,
# Controls output format for ecoli.experiments.ecoli_master_sim.EcoliSim.query.
# Should only be used if choosing "timeseries" emitter. See API documentation
# for the query function for more information.
"raw_output" : true,
# Initial seed used to generate the seeds that are used to initialize
# the psuedorandom number generators in the model. Only used for single
# simulations run using ecoli/experiments/ecoli_master_sim.py. Workflows
# run with runscripts/workflow.py generate initial seeds using the value
# of a different configuration option named "lineage_seed".
"seed": 0,
# Special flags to enable mechanisms related to antibiotic resistance.
# See API documentation for ecoli.library.sim_data.LoadSimData for more
# information.
"mar_regulon": false,
"amp_lysis": false,
# String name of file inside "data" folder containing saved JSON initial
# state (omit .json extension). See "Initialization" headings in "Store"
# documentation and ecoli.composites.ecoli_master.Ecoli.initial_state
# documentation for more details.
"initial_state_file": "",
# List of string file names inside "data" folder (can be nested like
# "data/overrides/*") containing manual overrides for targeted values
# in initial state (whether that initial state came from "initial_state"
# or "initial_state_file"). Omit .json extension. See API documentation
# for ecoli.composites.ecoli_master.Ecoli.initial_state.
"initial_state_overrides": [],
# Dictionary of values to populate initial state with. Supersedes any file
# names specified in "initial_state_file". See API documentation
# for ecoli.composites.ecoli_master.Ecoli.initial_state for more details,
# including what happens if neither "initial_state" nor "initial_state_file"
# are provided (as is the case here).
"initial_state": {},
# Global time step for all simulation processes. See "Time Step" heading
# in "Processes" documentation for more details, including extra steps that
# one must take to add a process with a different time step. MUST BE FLOAT.
"time_step": 1.0,
# Maximum time to run simulation for. By default, we only run simulations
# until reaching division with ecoli/experiments/ecoli_master_sim.py
# and runscripts/workflow.py. Most of the time, division occurs well before
# 10800 seconds have elapsed. However, if this is not the case, this time
# sets a hard stopping point for the simulation. MUST BE FLOAT.
"total_time": 10800.0,
# The value to initialize the ("global_time",) store with. Mainly used for
# simulations run with runscripts/workflow.py, which frequently entail
# simulating daughter cells after a mother cell divides. MUST BE FLOAT.
# Note that the "total_time" option is applied on top of this value.
# For example, for an "initial_global_time" of 3000.0 and a "total_time"
# of 10000.0, the simulation will have a hard stopping point at 13000.0 s.
"initial_global_time": 0.0,
# Whether to raise ecoli.experiments.ecoli_master_sim.TimeLimitError when
# a simulation reaches the hard stopping point or to gracefully stop with
# no error raised.
"fail_at_total_time": false,
# String identifier for single cell simulation. For workflows run with
# runscripts/workflow.py, subsequent generations will append "0" and "1"
# to this initial agent ID for each daughter cell (only "0" if not
# simulating both daughter cells, see "Workflow" documentation).
"agent_id": "0",
# Whether to add processes and associated topologies for cell
# division. See "Division Modifications" heading in "Composites" docs.
"divide": true,
# Local or absolute path to directory where initial states for daughter
# cells are saved as JSONs named ``daughter_state_0.json`` and
# ``daughter_state_1.json``. These can be moved to the ``data``
# folder and passed as ``initial_state_file`` to run simulations
# of the daughter cells.
"daughter_outdir": "out",
# Whether to add process and associated topology for triggering division
# after a D period has elapsed following the completion of chromosome
# replication. If False, division is triggered when the store located
# at the path for "division_variable" reaches "division_threshold".
"d_period": true,
# Threshold that "division_variable" must reach in order for division
# to be triggered. When "d_period" is True, this must be set to True
# and "division_variable" must be set to ["divide"] because the
# ecoli.processes.cell_division.MarkDPeriod process sets the ["divide"]
# store to True one D period after chromosome replication finishes.
"division_threshold": true,
# Path to store containing value that triggers division upon reaching
# "division_threshold".
"division_variable": ["divide"],
# Path to store containing full chromosome unique molecules. Used by
# division process to ensure that a cell contains two complete
# chromosomes before replicating (can occur when "d_period" is False
# and "division_variable" is cell mass for example). Will wait for
# there to be two complete full chromosomes before dividing even
# if "division_variable" hits "division_threshold".
"chromosome_path": ["unique", "full_chromosome"],
# Whether to simulate cell inside a binned 2D spatial environment
# with support for reaction diffusion. See API documentation for
# ecoli.composites.environment.lattice.Lattice composite. This is
# mainly useful for colony simulations.
"spatial_environment": false,
# Configuration options for Lattice composite. See the JSON config
# file at ecoli/composites/ecoli_configs/spatial.json for an example.
"spatial_environment_config": {},
# Whether to serialize the simulation state to JSON and save it to
# files at the times listed in "save_times". See the API documentation
# for ecoli.experiments.ecoli_master_sim.EcoliSim.save_states. This can
# be useful to save and reload the simulation at a certain time for
# debugging purposes.
"save": false,
"save_times": [],
# List of process names to add to model on top of defaults.
"add_processes" : [],
# List of process names to remove from defaults (or processes added
# by other JSONs in the "inherit_from" hierarchy).
"exclude_processes" : [],
# Mapping of process names to names of processes to replace them with.
# For example, {"ecoli-metabolism" : "ecoli-metabolism-redux-classic"}
# replaces the default metabolism process with one registered in
# ecoli/processes/__init__.py as "ecoli-metabolism-redux-classic"
"swap_processes" : {},
# Whether to print profiling statistics for simulation run.
# TODO: Check whether this still works.
"profile": false,
# List of names of processes to include in model. The blank lines between
# process names here indicate the boundaries between successive execution
# layers as described in the "Steps and Flows" sub-heading in the "Stores"
# documentation (with the exception of "global_clock" which inherits from
# Process and not Step). You can verify that this is the case by working
# through the dependencies in the "flow" below.
"processes": [
"post-division-mass-listener", # Run and apply update
"bulk-timeline", # Once layer above finishes, run and
"media_update", # apply updates in arbitrary order
"exchange_data",
"ecoli-tf-unbinding", # Once layer above finishes, run and update
"ecoli-equilibrium", # Once layer above finishes, run Requesters,
"ecoli-two-component-system", # then Allocator, then Evolvers,
"ecoli-rna-maturation", # then UniqueUpdate (see "Partitioning")
"ecoli-tf-binding",
"ecoli-transcript-initiation",
"ecoli-polypeptide-initiation",
"ecoli-chromosome-replication",
"ecoli-protein-degradation",
"ecoli-rna-degradation",
"ecoli-complexation",
"ecoli-transcript-elongation",
"ecoli-polypeptide-elongation",
"ecoli-chromosome-structure",
"ecoli-metabolism",
"ecoli-mass-listener",
"RNA_counts_listener",
"rna_synth_prob_listener",
"monomer_counts_listener",
"dna_supercoiling_listener",
"replication_data_listener",
"rnap_data_listener",
"unique_molecule_counts",
"ribosome_data_listener",
"global_clock"
],
# Mapping of process names to dictionaries of parameters to override
# defaults with, if any. Processes that do not have a registered
# function in ecoli.library.sim_data.LoadSimData.get_config_by_name
# MUST specify either "default" or a dictionary of parameters here.
# See ecoli.composites.ecoli_master.Ecoli.generate_processes_and_steps
# for more details.
"process_configs": {
"global_clock": {},
"replication_data_listener": {"time_step": 1}
},
# Mapping of process names to topology dictionaries. Processes that
# did not register their topology in ecoli.processes.registry.topology_registry
# by importing it and calling topology_registry.register(NAME, TOPOLOGY)
# MUST specify a topology dictionary here.
"topology": {
"bulk-timeline": {
"bulk": ["bulk"],
"global": ["timeline"],
"media_id": ["environment", "media_id"]
},
"global_clock": {
"global_time": ["global_time"],
"next_update_time": ["next_update_time"]
}
},
# Mapping of Step names to paths to Step dependencies. See the
# "Steps and Flows" sub-heading in the "Stores" documentation.
"flow": {
"post-division-mass-listener": [],
"media_update": [["post-division-mass-listener"]],
"exchange_data": [["media_update"]],
"ecoli-tf-unbinding": [["media_update"]],
"ecoli-equilibrium": [["ecoli-tf-unbinding"]],
"ecoli-two-component-system": [["ecoli-tf-unbinding"]],
"ecoli-rna-maturation": [["ecoli-tf-unbinding"]],
"ecoli-tf-binding": [["ecoli-equilibrium"]],
"ecoli-transcript-initiation": [["ecoli-tf-binding"]],
"ecoli-polypeptide-initiation": [["ecoli-tf-binding"]],
"ecoli-chromosome-replication": [["ecoli-tf-binding"]],
"ecoli-protein-degradation": [["ecoli-tf-binding"]],
"ecoli-rna-degradation": [["ecoli-tf-binding"]],
"ecoli-complexation": [["ecoli-tf-binding"]],
"ecoli-transcript-elongation": [["ecoli-complexation"]],
"ecoli-polypeptide-elongation": [["ecoli-complexation"]],
"ecoli-chromosome-structure": [["ecoli-polypeptide-elongation"]],
"ecoli-metabolism": [["ecoli-chromosome-structure"]],
"ecoli-mass-listener": [["ecoli-metabolism"]],
"RNA_counts_listener": [["ecoli-metabolism"]],
"rna_synth_prob_listener": [["ecoli-metabolism"]],
"monomer_counts_listener": [["ecoli-metabolism"]],
"dna_supercoiling_listener": [["ecoli-metabolism"]],
"replication_data_listener": [["ecoli-metabolism"]],
"rnap_data_listener": [["ecoli-metabolism"]],
"unique_molecule_counts": [["ecoli-metabolism"]],
"ribosome_data_listener": [["ecoli-metabolism"]]
}
}
Here are some general rules to remember when writing your own JSON config files:
Strings must be enclosed in double quotes (not single quotes)
Booleans are lowercase
None values are written as (unquoted)
null
Trailing commas are not allowed
Comments are not allowed
Tuples (e.g. in topologies or flows) are written as lists (
["bulk"]
instead of("bulk",)
)
Note
It is strongly recommended that fail_at_total_time
be set to True
when running multi-generation workflows. If a simulation reaches total time
without dividing, this results in a more informative error message instead
of a Nextflow error about missing daughter cell states.
Output
If emitter
was set to parquet
, then folders containing the simulation output are
created as described in Parquet Emitter.
If division
is set to True, ecoli_master_sim
will
save the initial states of the two daughter cells resulting from cell division
in daughter_outdir
as JSON files. These files can be moved to the data
folder and passed as initial_state_file
to simulate the daughter cells.
Additionally, the file division_time.sh
will be created in the folder where
you started the simulation. This script, when run, sets the environment variable
division_time
to the time at which the cell divided. It is intended for internal
use when running a simulation workflow with runscripts.workflow
, allowing
Nextflow to correctly set the initial_global_time
for daughter cell simulations.
Schema Overrides
One powerful feature of the JSON configuration approach is the ability to override the port schemas
specified by processes. To do so, one simply adds a _schema
key to the config for a process
under the process_configs
option. In the following example, we have overridden the schema for
how the “ecoli-mass-listener” process divides the cell mass.
"process_configs": {
"ecoli-mass-listener": {
"_schema": {
"listeners": {
"mass": {"cell_mass": {"_divider": "set"}}
}
}
}
}
Another use of schema overrides is to emit data that would normally not be emitted
by setting _emit
to True
.
"process_configs": {
"ecoli-mass-listener": {
"_schema": {
"unique": {
"active_ribosome": {"_emit": true}
}
}
}
}
Warning
Vivarium includes internal checks to ensure that all ports connected to a store give the same or compatible (no conflicting keys) schemas for that store. This means that if you would like to override the schema for a store with many connecting ports, you will need to override the schemas for all the relevant ports.
Colony Simulations
While EcoliSim
was only designed
to handle simulation of single cells in isolation,
ecoli_engine_process
was made to simulate
multi-cell colonies in shared, dynamic spatial environments.
Engine Process
In simple terms, instances of EngineProcess
wrap an entire Vivarium simulation as a process that can be incremented time step by
time step and interact bidirectionally with the outer simulation. Refer to the API
documentation for ecoli_engine_process
for more details.
Configuring Colony Simulations
All of the configuration
options listed above still apply to simulations started with
ecoli_engine_process
. There are only three new options:
engine_process_reports
: List of paths (e.g.["bulk"]
for bulk store) inside each cell to save in final colony output.emit_paths
: List of paths in outer simulation (e.g. locations of each cell in spatial environment) to save in final colony output.parallel
: Inecoli_engine_process
, each simulated cell is contained within a single process (specifically, an instance ofEngineProcess
). Therefore, assuming cells only need to communicate a tiny amount of information between one another, interprocess overhead is low and running these cells in parallel can greatly speed up the colony simulation.
In addition to these new configuration options, several previously mentioned options become much more useful in the context of colony simulations:
save
andsave_times
can be used to create snapshots of the colony state to start many colony simulations from, for example, a 16-cell state usinginitial_state_file
without having to wait for 16 generations every time. The names of the files saved can be given an optional prefix configured via thecolony_save_prefix
option.spatial_environment
andspatial_environment_config
: The benefit of running simulations inside a shared, dynamic spatial environment is only fully realized when many cells are interacting with one another inside this environment.
Create Your Own
For more control over a simulation than what is provided by the default
ecoli_master_sim
experiment (as well as the
workflow runscript runscripts.workflow
, see Workflows),
you can create your own experiment file. Some examples of custom experiment
files in the ecoli/experiments
folder include:
tet_amp_sim
: Modifies the initial state to add new bulk molecules (see Bulk Molecules) for antibiotics-related molecules and adds two transcription factor binding sites to all promoters for MarA and MarR. Also adds command-line options for external concentration of tetracycline and ampicillin.metabolism_redux_sim
: Replaces the default metabolism process (Metabolism
) with experimental alternatives (e.g.MetabolismReduxClassic
). Makes use of the object-oriented interface for sim configuration mentioned in Configuration (e.g.sim.total_time = 100
).