runscripts.analysis

runscripts.analysis.ANALYSIS_TYPES = {'multidaughter': ['experiment_id', 'variant', 'lineage_seed', 'generation'], 'multiexperiment': [], 'multigeneration': ['experiment_id', 'variant', 'lineage_seed'], 'multiseed': ['experiment_id', 'variant'], 'multivariant': ['experiment_id'], 'parca': [], 'single': ['experiment_id', 'variant', 'lineage_seed', 'generation', 'agent_id']}

Mapping of each analysis type to the identifiers whose combination of values must be unique to each subset of the data passed to that analysis type as input.
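The grouping implied by this mapping can be sketched as follows. This is a minimal illustration, not the module's implementation: the dictionary is copied from the documented value above, while `split_into_subsets` and the sample rows are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Copied from the documented value of runscripts.analysis.ANALYSIS_TYPES.
ANALYSIS_TYPES = {
    "multidaughter": ["experiment_id", "variant", "lineage_seed", "generation"],
    "multiexperiment": [],
    "multigeneration": ["experiment_id", "variant", "lineage_seed"],
    "multiseed": ["experiment_id", "variant"],
    "multivariant": ["experiment_id"],
    "parca": [],
    "single": ["experiment_id", "variant", "lineage_seed", "generation", "agent_id"],
}


def split_into_subsets(rows, analysis_type):
    """Hypothetical helper: group rows by the identifiers for analysis_type."""
    keys = ANALYSIS_TYPES[analysis_type]
    subsets = defaultdict(list)
    for row in rows:
        # Rows that share the same identifier values land in the same subset.
        subsets[tuple(row[key] for key in keys)].append(row)
    return dict(subsets)


# Made-up rows, for illustration only.
rows = [
    {"experiment_id": "exp", "variant": 0, "lineage_seed": 0, "generation": 1, "agent_id": "0"},
    {"experiment_id": "exp", "variant": 0, "lineage_seed": 1, "generation": 1, "agent_id": "0"},
    {"experiment_id": "exp", "variant": 1, "lineage_seed": 0, "generation": 1, "agent_id": "0"},
]

multiseed_subsets = split_into_subsets(rows, "multiseed")
multiexperiment_subsets = split_into_subsets(rows, "multiexperiment")
single_subsets = split_into_subsets(rows, "single")
```

With these rows, a `multiseed` analysis would receive one subset per `(experiment_id, variant)` pair, while `multiexperiment` (an empty identifier list) would receive all rows as a single subset.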

runscripts.analysis.FILTERS = {'agent_id': <class 'str'>, 'experiment_id': <class 'str'>, 'generation': <class 'int'>, 'lineage_seed': <class 'int'>, 'variant': <class 'int'>}

Mapping of data filters to data type.
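One plausible use of such a mapping is converting raw string values (for example, from a command line) to the type each filter expects. The sketch below assumes that use; the dictionary is copied from the documented value above, and `cast_filter_values` is a hypothetical helper that is not part of `runscripts.analysis`.

```python
# Copied from the documented value of runscripts.analysis.FILTERS.
FILTERS = {
    "agent_id": str,
    "experiment_id": str,
    "generation": int,
    "lineage_seed": int,
    "variant": int,
}


def cast_filter_values(raw):
    """Hypothetical helper: convert raw string values to each filter's type."""
    unknown = set(raw) - set(FILTERS)
    if unknown:
        raise KeyError(f"Unknown filters: {sorted(unknown)}")
    # Look up the documented type for each filter and apply it to every value.
    return {
        name: [FILTERS[name](value) for value in values]
        for name, values in raw.items()
    }


casted = cast_filter_values({"variant": ["0", "2"], "experiment_id": ["exp"]})
```

Here `variant` values become integers and `experiment_id` values stay strings; a filter name that is not a key of `FILTERS` raises `KeyError`.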

runscripts.analysis.create_duckdb_conn(out_uri, gcs_bucket, n_cpus=None)

runscripts.analysis.main()

runscripts.analysis.parse_variant_data_dir(experiment_id, variant_data_dir)

For each experiment ID and its corresponding variant sim data directory, load the variant metadata JSON and parse the variant sim data file names. The results are used to build three mappings: experiment ID to variant ID to variant metadata, experiment ID to variant ID to variant sim_data path, and experiment ID to variant name.

Parameters:
  • experiment_id (list[str]) – List of experiment IDs

  • variant_data_dir (list[str]) – List of directories containing output from create_variants.py, one per experiment ID, in the same order as experiment_id

Returns:

Tuple containing three dictionaries:

(
    {experiment_id: {variant_id: variant_metadata, ...}, ...},
    {experiment_id: {variant_id: variant_sim_data_path, ...}, ...},
    {experiment_id: variant_name, ...}
)

Return type:

tuple[dict[str, dict[int, Any]], dict[str, dict[int, str]], dict[str, str]]
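To make the return shape concrete, the sketch below walks three dictionaries that mirror the documented return type. It does not call `parse_variant_data_dir`; every value is a placeholder invented for illustration (including the experiment ID, variant name, metadata, and paths), and `describe_variants` is a hypothetical helper.

```python
from typing import Any

# Placeholder values shaped like the documented return type; not real output.
variant_metadata: dict[str, dict[int, Any]] = {
    "exp_a": {0: "placeholder_metadata_0", 1: {"placeholder_param": 1}},
}
variant_sim_data_paths: dict[str, dict[int, str]] = {
    "exp_a": {0: "placeholder/path/0", 1: "placeholder/path/1"},
}
variant_names: dict[str, str] = {"exp_a": "placeholder_variant_name"}


def describe_variants(metadata, paths, names):
    """Hypothetical helper: flatten the three mappings into one record per variant."""
    records = []
    for experiment_id, variants in metadata.items():
        for variant_id in sorted(variants):
            records.append(
                {
                    "experiment_id": experiment_id,
                    "variant": variant_id,
                    "variant_name": names[experiment_id],
                    "metadata": variants[variant_id],
                    "sim_data_path": paths[experiment_id][variant_id],
                }
            )
    return records


records = describe_variants(variant_metadata, variant_sim_data_paths, variant_names)
```

The first two dictionaries are keyed by experiment ID and then by integer variant ID, while the third holds a single variant name per experiment, so each record combines one entry from each.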