runscripts.analysis

runscripts.analysis.ANALYSIS_TYPES = {'multidaughter': ['experiment_id', 'variant', 'lineage_seed', 'generation'], 'multiexperiment': [], 'multigeneration': ['experiment_id', 'variant', 'lineage_seed'], 'multiseed': ['experiment_id', 'variant'], 'multivariant': ['experiment_id'], 'parca': [], 'single': ['experiment_id', 'variant', 'lineage_seed', 'generation', 'agent_id']}

Mapping of each possible analysis type to the combination of identifiers that uniquely defines each subset of the data passed to that analysis type as input.
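One way to read this mapping is as a list of group-by keys. The following is a minimal sketch of that reading, not the module's actual implementation: `group_by_analysis_type` and the sample rows are hypothetical, and only the `ANALYSIS_TYPES` literal is copied from above.

```python
from collections import defaultdict

# Copied from the documented value of runscripts.analysis.ANALYSIS_TYPES.
ANALYSIS_TYPES = {
    "multidaughter": ["experiment_id", "variant", "lineage_seed", "generation"],
    "multiexperiment": [],
    "multigeneration": ["experiment_id", "variant", "lineage_seed"],
    "multiseed": ["experiment_id", "variant"],
    "multivariant": ["experiment_id"],
    "parca": [],
    "single": ["experiment_id", "variant", "lineage_seed", "generation", "agent_id"],
}


def group_by_analysis_type(rows, analysis_type):
    """Hypothetical helper: group row dicts by the identifiers for analysis_type."""
    keys = ANALYSIS_TYPES[analysis_type]
    groups = defaultdict(list)
    for row in rows:
        # Rows sharing the same identifier combination land in the same subset.
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)


# Hypothetical sample data: three cells from one experiment.
rows = [
    {"experiment_id": "exp", "variant": 0, "lineage_seed": 0, "generation": 1, "agent_id": "0"},
    {"experiment_id": "exp", "variant": 0, "lineage_seed": 1, "generation": 1, "agent_id": "0"},
    {"experiment_id": "exp", "variant": 1, "lineage_seed": 0, "generation": 1, "agent_id": "0"},
]
multiseed_groups = group_by_analysis_type(rows, "multiseed")
```

In this sketch, "multiseed" yields one subset per (experiment_id, variant) pair, while an analysis type with an empty identifier list places every row in a single subset keyed by the empty tuple.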

runscripts.analysis.FILTERS = {'agent_id': <class 'str'>, 'experiment_id': <class 'str'>, 'generation': <class 'int'>, 'lineage_seed': <class 'int'>, 'variant': <class 'int'>}

Mapping of each data filter to the Python type of its values.
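A type mapping like this can be used to coerce raw filter values (for example, strings read from a command line) before they are compared against the data. The sketch below assumes each filter holds a list of values; `coerce_filters` is a hypothetical helper, and how the real script parses its filters is not shown in this reference.

```python
# Copied from the documented value of runscripts.analysis.FILTERS.
FILTERS = {
    "agent_id": str,
    "experiment_id": str,
    "generation": int,
    "lineage_seed": int,
    "variant": int,
}


def coerce_filters(raw):
    """Hypothetical helper: cast known filter values to their FILTERS type."""
    coerced = {}
    for name, caster in FILTERS.items():
        if name in raw:
            # Assumption: each filter is given as a list of values.
            coerced[name] = [caster(value) for value in raw[name]]
    return coerced  # keys not listed in FILTERS are ignored


coerced = coerce_filters({"variant": ["0", "2"], "agent_id": [0], "cpus": 4})
```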

runscripts.analysis.build_duckdb_filter(config)[source]

Build a DuckDB WHERE clause from config filters.

Parameters:

config (dict) – Configuration dictionary with filter values

Returns:

DuckDB WHERE clause string

Return type:

str
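To illustrate the general idea, here is a minimal sketch of turning filter values into a SQL condition. It is not the source of `build_duckdb_filter`: `sketch_where_clause` is hypothetical, the config layout (a list of values per filter name) is an assumption, and the sketch returns only the condition expression without the `WHERE` keyword.

```python
# Copied from the documented value of runscripts.analysis.FILTERS.
FILTERS = {
    "agent_id": str,
    "experiment_id": str,
    "generation": int,
    "lineage_seed": int,
    "variant": int,
}


def sketch_where_clause(config):
    """Hypothetical: join one IN (...) condition per configured filter with AND."""
    conditions = []
    for name, caster in FILTERS.items():
        values = config.get(name)
        if not values:
            continue  # filter not set, so it does not constrain the query
        if caster is str:
            # Quote string literals and escape embedded single quotes.
            rendered = ["'" + str(v).replace("'", "''") + "'" for v in values]
        else:
            rendered = [str(int(v)) for v in values]
        conditions.append(f"{name} IN ({', '.join(rendered)})")
    return " AND ".join(conditions)


clause = sketch_where_clause({"experiment_id": ["exp_a"], "variant": [0, 1]})
```

With the hypothetical config shown, the sketch yields the condition `experiment_id IN ('exp_a') AND variant IN (0, 1)`, and an empty string when no filters are set.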

runscripts.analysis.build_query_strings(analysis_type, duckdb_filter, config_sql, history_sql, success_sql, outdir, conn)[source]

Build query strings for a given analysis type.

Parameters:
  • analysis_type (str) – Type of analysis (e.g., “multivariant”, “single”)

  • duckdb_filter (str) – DuckDB WHERE clause

  • config_sql (str) – SQL query for config data

  • history_sql (str) – SQL query for history data

  • success_sql (str) – SQL query for success data

  • outdir (str) – Output directory path

  • conn – DuckDB connection

Returns:

Dictionary mapping filter strings to tuples of (history_query, config_query, success_query, output_dir, variant_set)

Return type:

dict[str, tuple[str, str, str, str, set]]

runscripts.analysis.filter_variant_dicts(variant_set, variant_metadata, sim_data_dict, variant_names)[source]

Filter variant dictionaries to only include variants in the given set.

Parameters:
  • variant_set (set[tuple[str, int]]) – Set of (experiment_id, variant_id) tuples to keep

  • variant_metadata (dict[str, dict[int, Any]]) – Full variant metadata dictionary

  • sim_data_dict (dict[str, dict[int, str]]) – Full sim_data dictionary

  • variant_names (dict[str, str]) – Variant names dictionary

Returns:

Tuple of (filtered_variant_metadata, filtered_sim_data_dict, filtered_variant_names)

Return type:

tuple[dict[str, dict[int, Any]], dict[str, dict[int, str]], dict[str, str]]
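The filtering described above can be sketched as follows, using the documented parameter and return shapes. This is an illustration under assumptions, not the module's code: `sketch_filter_variant_dicts`, the sample values, and the file paths are all hypothetical, and the sketch drops the name of any experiment left with no variants.

```python
def sketch_filter_variant_dicts(variant_set, variant_metadata, sim_data_dict, variant_names):
    """Hypothetical: keep only the (experiment_id, variant_id) pairs in variant_set."""
    kept_metadata, kept_sim_data = {}, {}
    for exp_id, variant_id in variant_set:
        kept_metadata.setdefault(exp_id, {})[variant_id] = variant_metadata[exp_id][variant_id]
        kept_sim_data.setdefault(exp_id, {})[variant_id] = sim_data_dict[exp_id][variant_id]
    # Assumption: names are kept only for experiments that still have variants.
    kept_names = {e: n for e, n in variant_names.items() if e in kept_metadata}
    return kept_metadata, kept_sim_data, kept_names


# Hypothetical sample inputs.
metadata = {"exp_a": {0: {"k": 1}, 1: {"k": 2}}, "exp_b": {0: {"k": 3}}}
sim_data = {"exp_a": {0: "a/0.cPickle", 1: "a/1.cPickle"}, "exp_b": {0: "b/0.cPickle"}}
names = {"exp_a": "variant_a", "exp_b": "variant_b"}

filtered = sketch_filter_variant_dicts({("exp_a", 1)}, metadata, sim_data, names)
```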

runscripts.analysis.load_variant_metadata(config)[source]

Load variant metadata from configured sources.

Parameters:

config (dict) – Configuration dictionary

Returns:

Tuple of (variant_metadata, sim_data_dict, variant_names)

Raises:
  • KeyError – If experiment_id is not in config

  • AssertionError – If multiple experiment IDs are given without a proper variant_data_dir

Return type:

tuple[dict[str, dict[int, Any]], dict[str, dict[int, str]], dict[str, str]]

runscripts.analysis.main()[source]

runscripts.analysis.make_sim_data_dict(exp_id, variants, sim_data_path)[source]

runscripts.analysis.parse_cpu_arg()[source]

runscripts.analysis.parse_variant_data_dir(experiment_id, variant_data_dir)[source]

For each experiment ID and corresponding variant sim data directory, load the variant metadata JSON and parse the variant sim data file names to construct mappings from experiments to variants to variant metadata and variant sim_data paths.

Parameters:
  • experiment_id (list[str]) – List of experiment IDs

  • variant_data_dir (list[str]) – List of directories containing output from create_variants.py, one for each experiment ID, in order

Returns:

Tuple containing three dictionaries:

(
    {experiment_id: {variant_id: variant_metadata, ...}, ...},
    {experiment_id: {variant_id: variant_sim_data_path, ...}, ...},
    {experiment_id: variant_name, ...}
)

Return type:

tuple[dict[str, dict[int, Any]], dict[str, dict[int, str]], dict[str, str]]
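As a usage sketch, the three returned dictionaries share experiment IDs (and, for the first two, variant IDs) as keys, so they can be walked together. The dictionaries below are hand-written stand-ins with hypothetical values and file names rather than real output of `parse_variant_data_dir`, and `iter_variants` is a hypothetical helper.

```python
# Hypothetical stand-ins shaped like the documented return value.
variant_metadata = {"exp_a": {0: {"note": "baseline"}, 1: {"note": "perturbed"}}}
sim_data_dict = {"exp_a": {0: "variants/0.cPickle", 1: "variants/1.cPickle"}}
variant_names = {"exp_a": "example_variant"}


def iter_variants(variant_metadata, sim_data_dict, variant_names):
    """Yield (experiment_id, variant_name, variant_id, metadata, sim_data_path)."""
    for exp_id, by_variant in variant_metadata.items():
        for variant_id in sorted(by_variant):
            yield (
                exp_id,
                variant_names[exp_id],
                variant_id,
                by_variant[variant_id],
                sim_data_dict[exp_id][variant_id],
            )


records = list(iter_variants(variant_metadata, sim_data_dict, variant_names))
```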

runscripts.analysis.run_analysis_loop(config, conn, history_sql, config_sql, success_sql, duckdb_filter, variant_metadata, sim_data_dict, variant_names)[source]

Run the main analysis loop for all configured analysis types.

Parameters:
  • config (dict) – Configuration dictionary with analysis_types and options

  • conn – DuckDB connection

  • history_sql (str) – SQL query for history data

  • config_sql (str) – SQL query for config data

  • success_sql (str) – SQL query for success data

  • duckdb_filter (str) – DuckDB WHERE clause for filtering data

  • variant_metadata (dict) – Variant metadata dictionary

  • sim_data_dict (dict) – Sim data dictionary

  • variant_names (dict) – Variant names dictionary

Returns:

Dictionary with statistics about analyses run, of the form {"total_runs": N, "skipped": M, "errors": K}

Return type:

dict