Experiments

EcoliSim is the primary interface for configuring and running single-cell simulations. We refer to simulations as experiments, and all simulations (or batches of simulations run in a single workflow, see Workflows) are identified via a unique experiment ID.

Warning

If data is being persisted to disk (see Parquet Emitter), simulations or workflows will overwrite data from any past simulations or workflows with the same experiment ID.

When running workflows with runscripts.workflow (see Workflows), users are prevented from accidentally overwriting data by Nextflow, the software used to run the workflow. Specifically, Nextflow generates an HTML execution report in the output folder for a given experiment ID (see Output) and will refuse to run another workflow with the same experiment ID unless that execution report is renamed, moved, or deleted.
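
As a rough illustration of how an experiment ID can be kept unique and filesystem-friendly, the sketch below combines the timestamp format documented for the suffix_time option with the quote_plus quoting mentioned in the annotated configuration further down. The helper name build_experiment_id is hypothetical and is not part of EcoliSim; the real implementation may differ.

```python
from datetime import datetime
from urllib.parse import quote_plus


def build_experiment_id(experiment_id, suffix_time=True, now=None):
    """Hypothetical helper: append a timestamp and quote the result."""
    if suffix_time:
        now = now or datetime.now()
        # Same format string as documented for the "suffix_time" option
        experiment_id = f"{experiment_id}_{now.strftime('%d-%m-%Y_%H-%M-%S')}"
    # Special characters are percent-encoded, which hurts readability
    return quote_plus(experiment_id)


fixed_time = datetime(2024, 3, 5, 14, 30, 0)
plain_id = build_experiment_id("glucose_test", now=fixed_time)
odd_id = build_experiment_id("glucose test/1", suffix_time=False)
print(plain_id)
print(odd_id)
```

With a time suffix, two runs started at different times get different IDs, so neither overwrites the other's output. The second ID shows why special characters are best avoided: the space and slash are replaced by encoded forms.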

Configuration

EcoliSim offers three methods for configuring simulations:

  1. Using a JSON configuration file via the command-line option --config

  2. Using the object-oriented interface

  3. Using command-line options

In general, we recommend that you use the JSON configuration interface as much as possible. This is because the JSON configuration format is standardized across all of the main interfaces for the model (the scripts in runscripts and ecoli_master_sim). The object-oriented interface allows users to programmatically set simulation options and is mainly intended for small ad-hoc test simulations or for creating your own experiment file (see Create Your Own). The command-line interface is much more limited than the other two and only offers access to a few key configuration options. It is mainly intended for internal use (e.g. in Nextflow workflow scripts).

JSON Config Files

The EcoliSim class relies upon the helper SimConfig class to load configuration options from JSON files and merge them with options specified via the command line. The configuration options are always loaded in the following order, with options loaded later on overriding those from earlier sources:

  1. The options in the default JSON config file (located at default_config_path).

  2. The options in the JSON config file specified via --config in the command line.

  3. The options specified via the command line.

In most cases, configuration options that appear in more than one of the above sources are successively overridden in their entirety. The sole exceptions are configuration options listed in LIST_KEYS_TO_MERGE. These options hold lists of values that are concatenated with one another instead of being wholly overridden.
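
The loading order and the list-merging exception can be sketched as follows. This is a minimal illustration and not the SimConfig implementation: the merge_configs function is hypothetical, and the contents of LIST_KEYS_TO_MERGE shown here are an assumed stand-in for the real constant.

```python
import copy

# Assumed stand-in for the real LIST_KEYS_TO_MERGE constant
LIST_KEYS_TO_MERGE = ("add_processes", "exclude_processes")


def merge_configs(*sources):
    """Merge config dicts in order; later sources override earlier ones."""
    merged = {}
    for source in sources:
        for key, value in source.items():
            if key in LIST_KEYS_TO_MERGE and key in merged:
                # Listed keys are concatenated instead of replaced
                merged[key] = merged[key] + list(value)
            else:
                # Everything else is overridden in its entirety
                merged[key] = copy.deepcopy(value)
    return merged


defaults = {"total_time": 10800.0, "emitter": "timeseries", "add_processes": []}
config_file = {"emitter": "parquet", "add_processes": ["process-a"]}
cli_options = {"total_time": 100.0, "add_processes": ["process-b"]}

merged = merge_configs(defaults, config_file, cli_options)
print(merged)
```

Here the command-line value of total_time wins over the default, the config file value of emitter wins over the default, and the two add_processes lists are joined in loading order. The process names are placeholders.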

Notice that the options in the default JSON config file are always loaded first. This means that if you would like to run a simulation or workflow that leaves some of these options alone, you can simply omit those options from the JSON config file that you create and pass to your runscript of choice via --config.
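
For instance, a config file that only shortens the simulation could be written as in the sketch below. The file name, the chosen values, and the temporary directory are illustrative assumptions; every option left out keeps its default value.

```python
import json
import tempfile
from pathlib import Path

# Only the options that differ from the defaults are listed here
minimal_config = {
    "experiment_id": "short_test",
    "suffix_time": False,
    "total_time": 100.0,
}

config_path = Path(tempfile.mkdtemp()) / "short_test.json"
config_path.write_text(json.dumps(minimal_config, indent=4))

# The file could then be passed to a runscript via --config, e.g.
#   python ecoli/experiments/ecoli_master_sim.py --config <config_path>
print(config_path.read_text())
```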

Below is an annotated copy of the default simulation-related configuration options from the default JSON config file (see the file located at default_config_path for the most up-to-date defaults). Note that JSON configuration files passed as input to the scripts in runscripts accept additional keys that are documented in Workflows.

{
    # List of string filenames in the ecoli/composites/ecoli_configs directory
    # (include .json extension). These files are loaded in order and merged
    # into the configuration of this file. Avoid overly complex inheritance
    # chains if possible.
    "inherit_from": [],
    # String that uniquely identifies simulation (or workflow if passed
    # as input to runscripts/workflow.py). Avoid special characters as we
    # quote experiment IDs using urllib.parse.quote_plus, which may make
    # experiment IDs with special characters hard to decipher later.
    "experiment_id": "experiment_id_one",
    # Whether to append date and time to experiment ID in the following format
    # experiment_id_%d-%m-%Y_%H-%M-%S.
    "suffix_time": true,
    # Optional string description of simulation
    "description": "",
    # Whether to display vivarium-core progress bar
    "progress_bar" : true,
    # Path to pickle file output from parameter calculator (runscripts/parca.py).
    # Only used for single sim run with ecoli/experiments/ecoli_master_sim.py.
    # Ignored when run with runscripts/workflow.py because each simulation is
    # automatically run with the appropriate variant/baseline simulation data.
    "sim_data_path": "reconstruction/sim_data/kb/simData.cPickle",
    # Pick between "timeseries" to save simulation output in-memory (good
    # for single-cell ad-hoc analysis) or "parquet" to save output persistently
    # to Parquet files on disk (good for workflows and more in-depth analyses)
    "emitter" : "timeseries",
    # If choosing "parquet" emitter, must provide "out_dir" with path (relative
    # or absolute) to output folder OR "out_uri" with URI for Google Cloud Storage
    # bucket. Only provide one of the above.
    "emitter_arg": {"out_dir": "out"},
    # See API documentation on vivarium-core for vivarium.core.engine.Engine.
    # Can usually leave as false.
    "emit_topology" : false,
    "emit_processes" : false,
    "emit_config" : false,
    # Whether to save process updates to log_update stores. Should only be used
    # if choosing "timeseries" emitter. See "Log Updates" heading in "Composites"
    # documentation for more information.
    "log_updates" : false,
    # Controls output format for ecoli.experiments.ecoli_master_sim.EcoliSim.query.
    # Should only be used if choosing "timeseries" emitter. See API documentation
    # for the query function for more information.
    "raw_output" : true,
    # Initial seed used to generate the seeds that are used to initialize
    # the pseudorandom number generators in the model. Only used for single
    # simulations run using ecoli/experiments/ecoli_master_sim.py. Workflows
    # run with runscripts/workflow.py generate initial seeds using the value
    # of a different configuration option named "lineage_seed".
    "seed": 0,
    # Special flags to enable mechanisms related to antibiotic resistance.
    # See API documentation for ecoli.library.sim_data.LoadSimData for more
    # information.
    "mar_regulon": false,
    "amp_lysis": false,
    # String name of file inside "data" folder containing saved JSON initial
    # state (omit .json extension). See "Initialization" headings in "Store"
    # documentation and ecoli.composites.ecoli_master.Ecoli.initial_state
    # documentation for more details.
    "initial_state_file": "",
    # List of string file names inside "data" folder (can be nested like
    # "data/overrides/*") containing manual overrides for targeted values
    # in initial state (whether that initial state came from "initial_state"
    # or "initial_state_file"). Omit .json extension. See API documentation
    # for ecoli.composites.ecoli_master.Ecoli.initial_state.
    "initial_state_overrides": [],
    # Dictionary of values to populate initial state with. Supersedes any file
    # names specified in "initial_state_file". See API documentation
    # for ecoli.composites.ecoli_master.Ecoli.initial_state for more details,
    # including what happens if neither "initial_state" nor "initial_state_file"
    # are provided (as is the case here).
    "initial_state": {},
    # Global time step for all simulation processes. See "Time Step" heading
    # in "Processes" documentation for more details, including extra steps that
    # one must take to add a process with a different time step. MUST BE FLOAT.
    "time_step": 1.0,
    # Maximum time to run simulation for. By default, we only run simulations
    # until reaching division with ecoli/experiments/ecoli_master_sim.py
    # and runscripts/workflow.py. Most of the time, division occurs well before
    # 10800 seconds have elapsed. However, if this is not the case, this time
    # sets a hard stopping point for the simulation. MUST BE FLOAT.
    "total_time": 10800.0,
    # The value to initialize the ("global_time",) store with. Mainly used for
    # simulations run with runscripts/workflow.py, which frequently entail
    # simulating daughter cells after a mother cell divides. MUST BE FLOAT.
    # Note that the "total_time" option is applied on top of this value.
    # For example, for an "initial_global_time" of 3000.0 and a "total_time"
    # of 10000.0, the simulation will have a hard stopping point at 13000.0 s.
    "initial_global_time": 0.0,
    # Whether to raise ecoli.experiments.ecoli_master_sim.TimeLimitError when
    # a simulation reaches the hard stopping point or to gracefully stop with
    # no error raised.
    "fail_at_total_time": false,
    # String identifier for single cell simulation. For workflows run with
    # runscripts/workflow.py, subsequent generations will append "0" and "1"
    # to this initial agent ID for each daughter cell (only "0" if not
    # simulating both daughter cells, see "Workflow" documentation).
    "agent_id": "0",
    # Run each Process in parallel. This incurs a lot of overhead and most
    # processes in our model are Steps anyway. Keep at default: False.
    "parallel": false,
    # Whether to add processes and associated topologies for cell
    # division. See "Division Modifications" heading in "Composites" docs.
    "divide": true,
    # Relative or absolute path to directory where initial states for daughter
    # cells are saved as JSONs named ``daughter_state_0.json`` and
    # ``daughter_state_1.json``. These can be moved to the ``data``
    # folder and passed as ``initial_state_file`` to run simulations
    # of the daughter cells.
    "daughter_outdir": "out",
    # Whether to add process and associated topology for triggering division
    # after a D period has elapsed following the completion of chromosome
    # replication. If False, division is triggered when the store located
    # at the path for "division_variable" reaches "division_threshold".
    "d_period": true,
    # Threshold that "division_variable" must reach in order for division
    # to be triggered. When "d_period" is True, this must be set to True
    # and "division_variable" must be set to ["divide"] because the
    # ecoli.processes.cell_division.MarkDPeriod process sets the ["divide"]
    # store to True one D period after chromosome replication finishes.
    "division_threshold": true,
    # Path to store containing value that triggers division upon reaching
    # "division_threshold".
    "division_variable": ["divide"],
    # Path to store containing full chromosome unique molecules. Used by
    # division process to ensure that a cell contains two complete
    # chromosomes before dividing (premature division can otherwise occur
    # when "d_period" is False and "division_variable" is cell mass, for
    # example). Division waits until there are two complete chromosomes
    # even if "division_variable" hits "division_threshold".
    "chromosome_path": ["unique", "full_chromosome"],
    # Whether to simulate cell inside a binned 2D spatial environment
    # with support for reaction diffusion. See API documentation for
    # ecoli.composites.environment.lattice.Lattice composite. This is
    # mainly useful for colony simulations.
    "spatial_environment": false,
    # Configuration options for Lattice composite. See the JSON config
    # file at ecoli/composites/ecoli_configs/spatial.json for an example.
    "spatial_environment_config": {},
    # Whether to serialize the simulation state to JSON and save it to
    # files at the times listed in "save_times". See the API documentation
    # for ecoli.experiments.ecoli_master_sim.EcoliSim.save_states. This can
    # be useful to save and reload the simulation at a certain time for
    # debugging purposes.
    "save": false,
    "save_times": [],
    # List of process names to add to model on top of defaults.
    "add_processes" : [],
    # List of process names to remove from defaults (or processes added
    # by other JSONs in the "inherit_from" hierarchy).
    "exclude_processes" : [],
    # Mapping of process names to names of processes to replace them with.
    # For example, {"ecoli-metabolism" : "ecoli-metabolism-redux-classic"}
    # replaces the default metabolism process with one registered in
    # ecoli/processes/__init__.py as "ecoli-metabolism-redux-classic"
    "swap_processes" : {},
    # Whether to print profiling statistics for simulation run.
    # TODO: Check whether this still works.
    "profile": false,
    # List of names of processes to include in model. The blank lines between
    # process names here indicate the boundaries between successive execution
    # layers as described in the "Steps and Flows" sub-heading in the "Stores"
    # documentation (with the exception of "global_clock" which inherits from
    # Process and not Step). You can verify that this is the case by working
    # through the dependencies in the "flow" below.
    "processes": [
        "post-division-mass-listener", # Run and apply update

        "bulk-timeline", # Once layer above finishes, run and
        "media_update", # apply updates in arbitrary order
        "exchange_data",

        "ecoli-tf-unbinding", # Once layer above finishes, run and update

        "ecoli-equilibrium", # Once layer above finishes, run Requesters,
        "ecoli-two-component-system", # then Allocator, then Evolvers,
        "ecoli-rna-maturation", # then UniqueUpdate (see "Partitioning")

        "ecoli-tf-binding",

        "ecoli-transcript-initiation",
        "ecoli-polypeptide-initiation",
        "ecoli-chromosome-replication",
        "ecoli-protein-degradation",
        "ecoli-rna-degradation",
        "ecoli-complexation",

        "ecoli-transcript-elongation",
        "ecoli-polypeptide-elongation",

        "ecoli-chromosome-structure",

        "ecoli-metabolism",

        "ecoli-mass-listener",
        "RNA_counts_listener",
        "rna_synth_prob_listener",
        "monomer_counts_listener",
        "dna_supercoiling_listener",
        "replication_data_listener",
        "rnap_data_listener",
        "unique_molecule_counts",
        "ribosome_data_listener",

        "global_clock"
    ],
    # Mapping of process names to dictionaries of parameters to override
    # defaults with, if any. Processes that do not have a registered
    # function in ecoli.library.sim_data.LoadSimData.get_config_by_name
    # MUST specify either "default" or a dictionary of parameters here.
    # See ecoli.composites.ecoli_master.Ecoli.generate_processes_and_steps
    # for more details.
    "process_configs": {
        "global_clock": {},
        "replication_data_listener": {"time_step": 1}
    },
    # Mapping of process names to topology dictionaries. Processes that
    # did not register their topology in ecoli.processes.registry.topology_registry
    # by importing it and calling topology_registry.register(NAME, TOPOLOGY)
    # MUST specify a topology dictionary here.
    "topology": {
        "bulk-timeline": {
            "bulk": ["bulk"],
            "global": ["timeline"],
            "media_id": ["environment", "media_id"]
        },
        "global_clock": {
            "global_time": ["global_time"],
            "next_update_time": ["next_update_time"]
        }
    },
    # Mapping of Step names to paths to Step dependencies. See the
    # "Steps and Flows" sub-heading in the "Stores" documentation.
    "flow": {
        "post-division-mass-listener": [],
        "media_update": [["post-division-mass-listener"]],
        "exchange_data": [["media_update"]],

        "ecoli-tf-unbinding": [["media_update"]],

        "ecoli-equilibrium": [["ecoli-tf-unbinding"]],
        "ecoli-two-component-system": [["ecoli-tf-unbinding"]],
        "ecoli-rna-maturation": [["ecoli-tf-unbinding"]],

        "ecoli-tf-binding": [["ecoli-equilibrium"]],

        "ecoli-transcript-initiation": [["ecoli-tf-binding"]],
        "ecoli-polypeptide-initiation": [["ecoli-tf-binding"]],
        "ecoli-chromosome-replication": [["ecoli-tf-binding"]],
        "ecoli-protein-degradation": [["ecoli-tf-binding"]],
        "ecoli-rna-degradation": [["ecoli-tf-binding"]],
        "ecoli-complexation": [["ecoli-tf-binding"]],

        "ecoli-transcript-elongation": [["ecoli-complexation"]],
        "ecoli-polypeptide-elongation": [["ecoli-complexation"]],

        "ecoli-chromosome-structure": [["ecoli-polypeptide-elongation"]],

        "ecoli-metabolism": [["ecoli-chromosome-structure"]],

        "ecoli-mass-listener": [["ecoli-metabolism"]],
        "RNA_counts_listener": [["ecoli-metabolism"]],
        "rna_synth_prob_listener": [["ecoli-metabolism"]],
        "monomer_counts_listener": [["ecoli-metabolism"]],
        "dna_supercoiling_listener": [["ecoli-metabolism"]],
        "replication_data_listener": [["ecoli-metabolism"]],
        "rnap_data_listener": [["ecoli-metabolism"]],
        "unique_molecule_counts": [["ecoli-metabolism"]],
        "ribosome_data_listener": [["ecoli-metabolism"]]
    }
}
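
The execution layers mentioned in the comment on the "processes" option can be approximated by working through the "flow" dependencies programmatically. The sketch below uses a simple longest-path grouping on an abbreviated excerpt of the flow; the execution_layers function is hypothetical and is not vivarium-core's scheduler, so its grouping may not match the real one in every case.

```python
def execution_layers(flow):
    """Group steps into layers where each step follows its dependencies."""
    depth = {}

    def layer_of(step):
        if step not in depth:
            # Each dependency is a path (list of strings); its last
            # element is taken to be the name of the step depended upon
            deps = [path[-1] for path in flow.get(step, [])]
            depth[step] = 1 + max((layer_of(dep) for dep in deps), default=-1)
        return depth[step]

    layers = {}
    for step in flow:
        layers.setdefault(layer_of(step), []).append(step)
    return [layers[index] for index in sorted(layers)]


# Abbreviated excerpt of the "flow" option shown above
flow = {
    "post-division-mass-listener": [],
    "media_update": [["post-division-mass-listener"]],
    "ecoli-tf-unbinding": [["media_update"]],
    "ecoli-equilibrium": [["ecoli-tf-unbinding"]],
    "ecoli-two-component-system": [["ecoli-tf-unbinding"]],
    "ecoli-rna-maturation": [["ecoli-tf-unbinding"]],
    "ecoli-tf-binding": [["ecoli-equilibrium"]],
}

layers = execution_layers(flow)
for index, layer in enumerate(layers):
    print(index, layer)
```

Each step is placed in the earliest layer that its declared dependencies allow, so steps that share the same dependency end up in the same layer.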

Here are some general rules to remember when writing your own JSON config files:

  • Strings must be enclosed in double quotes (not single quotes)

  • Booleans are lowercase

  • None values are written as (unquoted) null

  • Trailing commas are not allowed

  • Comments are not allowed

  • Tuples (e.g. in topologies or flows) are written as lists (["bulk"] instead of ("bulk",))
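
One way to follow these rules without memorizing them is to build the configuration as a Python dictionary and let the standard json module serialize it. The sketch below shows the conversions and checks that a strict parser rejects a trailing comma, a comment, and single quotes. The key optional_value is hypothetical and only serves to show how None is written.

```python
import json

python_values = {
    "emitter": "parquet",  # str -> double-quoted string
    "divide": True,  # True -> true
    "optional_value": None,  # None -> null (hypothetical key)
    "division_variable": ("divide",),  # tuple -> list
}

json_text = json.dumps(python_values)
print(json_text)

# A trailing comma, a comment, and single quotes are all invalid JSON
rejected = []
for bad_text in ('{"seed": 0,}', '{"seed": 0} # comment', "{'seed': 0}"):
    try:
        json.loads(bad_text)
    except json.JSONDecodeError:
        rejected.append(bad_text)
print(rejected)
```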

Note

It is strongly recommended that fail_at_total_time be set to True when running multi-generation workflows. If a simulation reaches total time without dividing, this results in a more informative error message instead of a Nextflow error about missing daughter cell states.

Output

If emitter was set to parquet, then folders containing the simulation output are created as described in Parquet Emitter.

If divide is set to True, ecoli_master_sim will save the initial states of the two daughter cells resulting from cell division in daughter_outdir as JSON files. These files can be moved to the data folder and passed as initial_state_file to simulate the daughter cells. Additionally, the file division_time.sh will be created in the folder where you started the simulation. This script, when run, sets the environment variable division_time to the time at which the cell divided. It is intended for internal use when running a simulation workflow with runscripts.workflow, allowing Nextflow to correctly set the initial_global_time for daughter cell simulations.
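
The manual steps for simulating a daughter cell might look like the sketch below. It runs in a temporary directory with a placeholder daughter state file standing in for real simulation output; the directory layout, the experiment ID, and the config file name are assumptions made for illustration.

```python
import json
import shutil
import tempfile
from pathlib import Path

# Temporary stand-ins for the repository layout (assumed for this sketch)
repo_root = Path(tempfile.mkdtemp())
daughter_outdir = repo_root / "out"
data_dir = repo_root / "data"
daughter_outdir.mkdir()
data_dir.mkdir()

# Placeholder for a file that a real simulation would have written
(daughter_outdir / "daughter_state_0.json").write_text("{}")

# Move the saved daughter state into the "data" folder
shutil.move(
    str(daughter_outdir / "daughter_state_0.json"),
    str(data_dir / "daughter_state_0.json"),
)

# Reference the file by name, omitting the .json extension
daughter_config = {
    "experiment_id": "daughter_0",
    "initial_state_file": "daughter_state_0",
}
config_path = repo_root / "daughter_0.json"
config_path.write_text(json.dumps(daughter_config, indent=4))
print(config_path.read_text())
```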

Schema Overrides

One powerful feature of the JSON configuration approach is the ability to override the port schemas specified by processes. To do so, one simply adds a _schema key to the config for a process under the process_configs option. In the following example, we have overridden the schema for how the “ecoli-mass-listener” process divides the cell mass.

"process_configs": {
    "ecoli-mass-listener": {
        "_schema": {
            "listeners": {
                "mass": {"cell_mass": {"_divider": "set"}}
            }
        }
    }
}

Another use of schema overrides is to set _emit to True for data that would not normally be emitted.

"process_configs": {
    "ecoli-mass-listener": {
        "_schema": {
            "unique": {
                "active_ribosome": {"_emit": true}
            }
        }
    }
}

Warning

Vivarium includes internal checks to ensure that all ports connected to a store give the same or compatible (no conflicting keys) schemas for that store. This means that if you would like to override the schema for a store with many connecting ports, you will need to override the schemas for all the relevant ports.
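
When several ports need the same override, generating the process_configs entries in a loop avoids copy-paste mistakes. The sketch below is one possible approach: the add_schema_override helper is hypothetical, and the list of processes is an assumption that should be checked against the actual topologies. It also assumes that any existing process_configs entries are dictionaries rather than the string "default".

```python
import copy
import json


def add_schema_override(config, process_names, schema):
    """Attach the same "_schema" override to several process configs."""
    updated = copy.deepcopy(config)
    process_configs = updated.setdefault("process_configs", {})
    for name in process_names:
        process_config = process_configs.setdefault(name, {})
        process_config["_schema"] = copy.deepcopy(schema)
    return updated


cell_mass_schema = {"listeners": {"mass": {"cell_mass": {"_divider": "set"}}}}
# Assumed list of processes with ports on this store; check the topologies
processes_to_patch = ["ecoli-mass-listener", "post-division-mass-listener"]

config = add_schema_override({}, processes_to_patch, cell_mass_schema)
print(json.dumps(config, indent=4))
```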

Colony Simulations

While EcoliSim was only designed to handle simulation of single cells in isolation, ecoli_engine_process was made to simulate multi-cell colonies in shared, dynamic spatial environments.

Engine Process

In simple terms, instances of EngineProcess wrap an entire Vivarium simulation as a process that can be incremented time step by time step and interact bidirectionally with the outer simulation. Refer to the API documentation for ecoli_engine_process for more details.

Configuring Colony Simulations

All of the configuration options listed above still apply to simulations started with ecoli_engine_process. There are only two new options:

  • engine_process_reports: List of paths (e.g. ["bulk"] for bulk store) inside each cell to save in final colony output.

  • emit_paths: List of paths in outer simulation (e.g. locations of each cell in spatial environment) to save in final colony output.

In addition to these new configuration options, several previously mentioned options become much more useful in the context of colony simulations:

  • save and save_times can be used to create snapshots of the colony state. Many colony simulations can then be started from a snapshot (for example, a 16-cell state) using initial_state_file, without having to re-simulate the preceding generations every time. The saved files can be given an optional name prefix via the colony_save_prefix option.

  • parallel: In ecoli_engine_process, each simulated cell is contained within a single process (specifically, an instance of EngineProcess). Therefore, assuming cells only need to communicate a tiny amount of information between one another, interprocess overhead is low and running these cells in parallel can greatly speed up the colony simulation.

  • spatial_environment and spatial_environment_config: The benefit of running simulations inside a shared, dynamic spatial environment is only fully realized when many cells are interacting with one another inside this environment.
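
A set of colony-specific overrides could be assembled as in the sketch below. All values are illustrative assumptions: the emit_paths entry is a made-up placeholder for a real path in the outer simulation, and the save times and prefix are arbitrary.

```python
import json

colony_overrides = {
    "experiment_id": "colony_test",
    # Paths inside each cell to keep in the final colony output
    "engine_process_reports": [["bulk"]],
    # Hypothetical path in the outer simulation; replace with a real one
    "emit_paths": [["example_outer_store"]],
    # Run each cell's EngineProcess in parallel
    "parallel": True,
    # Snapshot the colony state at these times (values chosen arbitrarily)
    "save": True,
    "save_times": [500, 1000],
    "colony_save_prefix": "colony_test",
}

colony_json = json.dumps(colony_overrides, indent=4)
print(colony_json)
```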

Create Your Own

For more control over a simulation than what is provided by the default ecoli_master_sim experiment (as well as the workflow runscript runscripts.workflow, see Workflows), you can create your own experiment file. Some examples of custom experiment files in the ecoli/experiments folder include:

  • tet_amp_sim: Modifies the initial state to add antibiotics-related bulk molecules (see Bulk Molecules) and adds two transcription factor binding sites, for MarA and MarR, to all promoters. Also adds command-line options for the external concentrations of tetracycline and ampicillin.

  • metabolism_redux_sim: Replaces the default metabolism process (Metabolism) with experimental alternatives (e.g. MetabolismReduxClassic). Makes use of the object-oriented interface for sim configuration mentioned in Configuration (e.g. sim.total_time = 100).