ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps

Plot one value per index as a heatmap for the new_gene_expression_and_translation_efficiency variant.

Possible Plots:

  • Percent of sims that successfully reached a given generation number

  • Average doubling time

  • Average cell volume, mass, dry cell mass, mRNA mass, protein mass

  • Average translation efficiency, weighted by cistron count

  • Average mRNA count, monomer count, mRNA mass fraction, protein mass fraction, RNAP portion, and ribosome portion for a capacity gene to measure burden on overall host expression

  • Average new gene copy number

  • Average new gene mRNA count

  • Average new gene mRNA mass fraction

  • Average new gene mRNA counts fraction

  • Average new gene NTP mass fraction

  • Average new gene protein count

  • Average new gene protein mass fraction

  • Average new gene protein counts fraction

  • Average new gene initialization rate for RNAP and ribosomes

  • Average new gene initialization probabilities for RNAP and ribosomes

  • Average count and portion of new gene ribosome initialization events per time step

  • Average number and proportion of RNAP on new genes at a given time step

  • Average number and proportion of ribosomes on new gene mRNAs at a given time step

  • Average number and proportion of RNAP making rRNAs at a given time step

  • Average number and proportion of RNAP and ribosomes making RNAP subunits at a given time step

  • Average number and proportion of RNAP and ribosomes making ribosomal proteins at a given time step

  • Average fraction of time new gene is overcrowded by RNAP and ribosomes

  • Average overcrowding probability ratio for new gene RNA synthesis and polypeptide initiation

  • Average max_p probabilities for RNA synthesis and polypeptide initiation

  • Average number of overcrowded genes for RNAP and ribosomes

  • Average number of total, active, and free ribosomes

  • Average number of ribosomes initialized at each time step

  • Average number of total, active, and free RNA polymerases

  • Average ppGpp concentration

  • Average rate of glucose consumption

  • Average new gene monomer yields - per hour and per fg of glucose

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.COUNT_INDEX = 23

Count the number of sims that reach this generation (remember that index 7 corresponds to generation 8).

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.DASHBOARD_FLAG = 2

Dashboard Flag

  • 0: Separate Only (Each plot is its own file)

  • 1: Dashboard Only (One file with all plots)

  • 2: Both Dashboard and Separate

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.FONT_SIZE = 9

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.GENE_COUNTS_SQL = '\n    WITH unnested_counts AS (\n        SELECT unnest(gene_counts) AS gene_counts,\n            generate_subscripts(gene_counts, 1)\n            AS gene_idx, experiment_id, variant, lineage_seed, generation,\n            agent_id\n        FROM ({subquery})\n    ),\n    avg_per_cell AS (\n        SELECT avg(gene_counts) AS avg_count,\n            experiment_id, variant, gene_idx\n        FROM unnested_counts\n        GROUP BY experiment_id, variant, lineage_seed,\n            generation, agent_id, gene_idx\n    ),\n    avg_per_variant AS (\n        SELECT log10(avg(avg_count) + 1) AS avg_count,\n            log10(stddev(avg_count) + 1) AS std_count,\n            experiment_id, variant, gene_idx\n        FROM avg_per_cell\n        GROUP BY experiment_id, variant, gene_idx\n    )\n    SELECT variant, list(avg_count ORDER BY gene_idx) AS mean,\n        list(std_count ORDER BY gene_idx) AS std,\n    FROM avg_per_variant\n    GROUP BY experiment_id, variant\n    '

Generic SQL query that calculates the average of a 1D-array column per cell, then aggregates that per variant into log10(mean + 1) and log10(std + 1) columns.

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.MAX_CELL_INDEX = 24

Plot data from generations [MIN_CELL_INDEX, MAX_CELL_INDEX). Note that early generations may not be representative of dynamics due to how they are initialized.

HEATMAPS_TO_MAKE_LIST specifies which subset of heatmaps should be made. The completed_gens heatmap is always made, because it is used to create the other heatmaps, and should not be included in the list. The order of the list will be the order of the heatmaps in the dashboard plot.

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.STD_DEV_FLAG = True

Standard Deviations Flag

  • True: Plot an additional copy of all plots with standard deviation displayed instead of the average

  • False: Plot no additional plots
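The two-stage aggregation in GENE_COUNTS_SQL (average each gene per cell, then take log10(mean + 1) and log10(std + 1) across cells of a variant) can be illustrated in plain Python. This is a minimal sketch, not the module's implementation: the rows layout and the helper name log_mean_and_std are hypothetical, and statistics.stdev (sample standard deviation) is assumed to correspond to the SQL stddev aggregate.

```python
import math
import statistics
from collections import defaultdict


def log_mean_and_std(rows):
    """rows: iterable of (variant, cell_id, gene_counts), one per time step.

    Returns {variant: (mean_list, std_list)} with one entry per gene.
    """
    # Step 1: collect the per-time-step count lists of every cell.
    per_cell = defaultdict(list)
    for variant, cell_id, gene_counts in rows:
        per_cell[(variant, cell_id)].append(gene_counts)

    # Step 2: average each gene over time within a cell.
    per_variant = defaultdict(list)
    for (variant, _cell_id), series in per_cell.items():
        per_variant[variant].append([statistics.fmean(c) for c in zip(*series)])

    # Step 3: aggregate the cell averages per variant on a log10(x + 1) scale.
    result = {}
    for variant, cell_avgs in per_variant.items():
        means, stds = [], []
        for col in zip(*cell_avgs):
            means.append(math.log10(statistics.fmean(col) + 1))
            # A sample std needs at least two cells; report None otherwise.
            std = statistics.stdev(col) if len(col) > 1 else None
            stds.append(None if std is None else math.log10(std + 1))
        result[variant] = (means, stds)
    return result


# Hypothetical data: one variant, two cells, two genes.
rows = [
    (0, "cell_a", [9, 0]),
    (0, "cell_a", [9, 0]),
    (0, "cell_b", [9, 0]),
]
summary = log_mean_and_std(rows)
```

Adding 1 before taking the logarithm keeps genes with an average count of zero finite on the log scale.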

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.avg_1d_array_over_scalar_sql(array_column, scalar_column)[source]

Create generic SQL query that calculates the average per cell of each element in a 1D array column divided by a scalar column, and aggregates those ratios per variant into mean and std columns.

Parameters:
  • array_column (str) – Name of 1D list column to aggregate

  • scalar_column (str) – Name of scalar column to divide array_column cell averages by

Return type:

str

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.avg_1d_array_sql(column)[source]

Create generic SQL query that calculates the average per cell of each element in a 1D array column and aggregates that per variant into mean and std columns.

Parameters:

column (str) – Name of 1D list column to aggregate

Return type:

str

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.avg_ratio_of_1d_arrays_sql(numerator, denominator)[source]

Create generic SQL query that calculates the average per cell of each element in two 1D list columns divided elementwise and aggregates those ratios per variant into mean and std columns.

Parameters:
  • numerator (str) – Name of 1D list column that will be numerator in ratio

  • denominator (str) – Name of 1D list column that will be denominator in ratio

Return type:

str
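As a rough illustration of the elementwise ratio, the sketch below works on plain Python lists for a single cell. It assumes the ratio is taken at each time step before averaging, which the description above does not state explicitly, and it assumes nonzero denominators. avg_elementwise_ratio is a hypothetical helper, not part of the module.

```python
import statistics


def avg_elementwise_ratio(numerators, denominators):
    """Average, over time steps, the elementwise ratio of two list columns.

    Both arguments hold one list per time step for a single cell.
    """
    ratios = [
        [n / d for n, d in zip(num_row, den_row)]
        for num_row, den_row in zip(numerators, denominators)
    ]
    # zip(*ratios) regroups the ratios by element position.
    return [statistics.fmean(column) for column in zip(*ratios)]


# Two time steps, two genes (made-up values).
cell_avg = avg_elementwise_ratio(
    numerators=[[1, 4], [3, 4]],
    denominators=[[2, 8], [2, 4]],
)
```

The per-variant mean and std would then be computed over these per-cell lists, one element position at a time.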

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.avg_sum_1d_array_over_scalar_sql(array_column, scalar_column)[source]

Create generic SQL query that calculates the average per cell of the sum of elements in a 1D array column divided by a scalar column, and aggregates those ratios per variant as mean and std columns.

Parameters:
  • array_column (str) – Name of 1D list column to aggregate

  • scalar_column (str) – Name of scalar column to divide array_column cell averages by

Return type:

str

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.avg_sum_1d_array_sql(column)[source]

Create generic SQL query that calculates the average per cell of the sum of elements in a 1D array column and aggregates that per variant into mean and std columns.

Parameters:

column (str) – Name of 1D list column to aggregate

Return type:

str
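The sum-then-average aggregation described above can be sketched in plain Python. The rows layout and the name avg_sum_per_variant are hypothetical, and the sample standard deviation is an assumption about what the std column holds.

```python
import statistics
from collections import defaultdict


def avg_sum_per_variant(rows):
    """rows: iterable of (variant, cell_id, values) tuples, one per time step."""
    # Sum the list in every row, grouped by cell.
    sums_by_cell = defaultdict(list)
    for variant, cell_id, values in rows:
        sums_by_cell[(variant, cell_id)].append(sum(values))

    # Average the row sums over time within each cell.
    cell_avgs = defaultdict(list)
    for (variant, _cell_id), sums in sums_by_cell.items():
        cell_avgs[variant].append(statistics.fmean(sums))

    # Mean and sample std of the cell averages for each variant.
    return {
        variant: (
            statistics.fmean(avgs),
            statistics.stdev(avgs) if len(avgs) > 1 else None,
        )
        for variant, avgs in cell_avgs.items()
    }


rows = [
    (1, "a", [1, 2, 3]),  # row sum 6
    (1, "a", [2, 2, 4]),  # row sum 8, so cell "a" averages 7
    (1, "b", [4, 4, 1]),  # row sum 9, so cell "b" averages 9
]
mean, std = avg_sum_per_variant(rows)[1]
```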

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_gene_count_fraction_sql(gene_indices, column, index_type)[source]

Construct generic SQL query that gets the average per cell of a select set of indices from a 1D list column divided by the total of all elements per row of that list column, and aggregates those ratios per variant into mean and std columns.

Parameters:
  • gene_indices (list[int] | list[list[int]]) – Indices to extract from 1D list column to get ratios for

  • column (str) – Name of 1D list column

  • index_type (str) – Can be either monomer or mRNA. For monomer, the function works exactly as described above. For mRNA, gene_indices will be a list of lists of mRNA indices. This is because one gene can map to multiple mRNAs (transcription units). Therefore, we sum the elements corresponding to each gene before proceeding (see get_rnas_combined_as_genes_projection()).

Return type:

str
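The difference between the two index_type modes can be shown for a single row of the list column. This is a sketch in plain Python rather than the generated SQL. gene_fractions is a hypothetical name, and the indices are 1-based to mirror DuckDB lists.

```python
def gene_fractions(counts, gene_indices, index_type):
    """Fraction of one row's total count held by each selected gene.

    counts: one row of the list column.
    gene_indices: 1-based indices; a list of ints for "monomer",
        a list of lists of ints for "mRNA".
    """
    total = sum(counts)
    if index_type == "monomer":
        selected = [counts[i - 1] for i in gene_indices]
    elif index_type == "mRNA":
        # Sum all mRNAs (transcription units) belonging to the same gene.
        selected = [sum(counts[i - 1] for i in group) for group in gene_indices]
    else:
        raise ValueError(f"unsupported index_type: {index_type!r}")
    return [value / total for value in selected]


row = [10, 30, 40, 20]
monomer_fracs = gene_fractions(row, [2, 4], "monomer")
mrna_fracs = gene_fractions(row, [[1, 2], [3]], "mRNA")
```

In the mRNA call, the first gene is covered by two transcription units (positions 1 and 2), so their counts are added before dividing by the row total.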

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_gene_mass_prod_func(sim_data, index_type, gene_ids)[source]

Create a function to be passed as the post_func argument to get_mean_and_std_matrices() which multiplies the average and standard deviation 1D array columns by the mass of the gene ID for each element.

Parameters:
  • sim_data (SimulationDataEcoli) – Simulation data

  • index_type (str) – Either mRNA or monomer. If mRNA, gene_ids is list of lists of mRNA IDs, where inner lists correspond to mRNAs for each gene. Therefore, we sum the masses for the mRNAs of each inner list and multiply the input mean and std by this sum per gene.

  • gene_ids (list[str] | list[list[str]]) – IDs of genes in the order they appear in the 1D arrays of the query result

Return type:

Callable[[Table], Table]
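The closure pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: a plain dict of lists stands in for the Table type, a hypothetical masses lookup stands in for sim_data, and make_mass_prod_func is an invented name.

```python
def make_mass_prod_func(masses, index_type, gene_ids):
    """Build a post-processing function that scales per-gene values by mass.

    masses: hypothetical {molecule ID: mass} lookup standing in for sim_data.
    """
    if index_type == "mRNA":
        # One gene may map to several mRNAs, so add up their masses.
        gene_masses = [sum(masses[m] for m in group) for group in gene_ids]
    else:
        gene_masses = [masses[g] for g in gene_ids]

    def post_func(table):
        # table: {"mean": [...], "std": [...]}, one entry per gene.
        return {
            key: [value * mass for value, mass in zip(table[key], gene_masses)]
            for key in ("mean", "std")
        }

    return post_func


masses = {"rna_1": 2.0, "rna_2": 3.0, "rna_3": 4.0}
post = make_mass_prod_func(masses, "mRNA", [["rna_1", "rna_2"], ["rna_3"]])
scaled = post({"mean": [10.0, 1.0], "std": [2.0, 0.5]})
```

Computing the per-gene masses once, outside post_func, means the lookup is not repeated for every query result the function is applied to.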

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_indexes(conn, config_sql, index_type, ids)[source]

Retrieve DuckDB indices of a given type for a set of IDs. Note that DuckDB lists are 1-indexed.

Parameters:
  • conn (DuckDBPyConnection) – DuckDB database connection

  • config_sql (str) – DuckDB SQL query for sim config data (see get_dataset_sql())

  • index_type (str) – Type of indices to return (one of cistron, RNA, mRNA, or monomer)

  • ids (list[str] | list[list[str]]) – List of IDs to get indices for (must be monomer IDs if index_type is monomer, else mRNA IDs)

Returns:

List of requested indexes

Return type:

list[int | None] | list[list[int | None]]
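The 1-based convention and the None entries in the return type can be illustrated with an in-memory lookup. This sketch does not query DuckDB. lookup_indexes and the ID values are hypothetical, and treating None as "ID not found" is an assumption based on the return type.

```python
def lookup_indexes(all_ids, ids):
    """Return 1-based positions of ids in all_ids, or None when missing.

    ids may be a flat list of IDs or a list of lists of IDs.
    """
    position = {mol_id: pos for pos, mol_id in enumerate(all_ids, start=1)}
    if ids and isinstance(ids[0], list):
        return [[position.get(mol_id) for mol_id in group] for group in ids]
    return [position.get(mol_id) for mol_id in ids]


monomer_ids = ["monA", "monB", "monC"]
flat = lookup_indexes(monomer_ids, ["monC", "monX"])
nested = lookup_indexes(monomer_ids, [["monA", "monB"], ["monX"]])
```

Because the positions start at 1, they must be shifted by one before being used to index a Python list or NumPy array.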

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_mRNA_ids_from_monomer_ids(sim_data, target_monomer_ids)[source]

Map monomer IDs back to the mRNA IDs that they were translated from.

Parameters:
Returns:

List of mRNA ID lists, one for each monomer ID

Return type:

list[list[str]]

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_mean_and_std_matrices(conn, variant_mapping, variant_matrix_shape, history_sql, columns, projections=None, remove_first=False, func=None, order_results=False, custom_sql=None, post_func=None, num_digits_rounding=None, default_value=None)[source]

Reads one or more columns and calculates mean and std. dev. for each variant. If no custom SQL query is provided, this defaults to averaging per cell, then calculating the averages and standard deviations of all cells per variant.

Parameters:
Returns:

Tuple of Numpy matrices with first two dimensions variant_matrix_shape. Each cell in first matrix has the mean for that variant. Each cell in the second matrix has the std. dev. for that variant. These values can be Numpy arrays instead of scalar values (e.g. when calculating aggregates for many genes at once), in which case the matrices have shapes variant_matrix_shape + (num_genes,)

Return type:

tuple[ndarray, ndarray]
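The layout of the two returned matrices can be pictured with nested lists. The sketch assumes that each variant maps to a (row, column) position and that default_value fills positions with no data. Both are guesses from the parameter names, and fill_matrices is a hypothetical helper.

```python
def fill_matrices(variant_to_row_col, shape, stats, default_value=None):
    """Place per-variant (mean, std) pairs into two row-major grids."""
    n_rows, n_cols = shape
    mean_grid = [[default_value] * n_cols for _ in range(n_rows)]
    std_grid = [[default_value] * n_cols for _ in range(n_rows)]
    for variant, (mean, std) in stats.items():
        row, col = variant_to_row_col[variant]
        mean_grid[row][col] = mean
        std_grid[row][col] = std
    return mean_grid, std_grid


# Three simulated variants on a 2 x 2 grid; position (1, 1) has no data.
variant_to_row_col = {0: (0, 0), 1: (0, 1), 2: (1, 0)}
stats = {0: (1.0, 0.1), 1: (2.0, 0.2), 2: (3.0, 0.3)}
mean_grid, std_grid = fill_matrices(variant_to_row_col, (2, 2), stats)
```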

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_new_gene_mRNA_NTP_fraction_sql(sim_data, new_gene_mRNA_idx, ntp_ids)[source]

Construct SQL query that gets, for each NTP, the fraction used by the mRNAs of each new gene, averages that per cell, and aggregates those fractions per variant into mean and std columns where each row is a 2D list with shape (# NTPs, # new genes).

Parameters:
  • sim_data (SimulationDataEcoli) – Simulation data

  • new_gene_mRNA_idx (list[list[int]]) – List of lists of mRNA indices for each new gene

  • ntp_ids (list[str]) – IDs for NTPs in same order that they appear in sim_data.process.transcription.rna_data["counts_ACGU"]

Return type:

str

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_overcrowding_sql(target_col, actual_col)[source]

Create generic SQL query that calculates the average number of genes that are overcrowded for each cell, then aggregates that per variant into mean and std columns.

Parameters:
  • target_col (str) – Name of 1D list column with target values

  • actual_col (str) – Name of 1D list column with actual values. If the per-cell average of an element in target_col is greater than the per-cell average of the corresponding element in actual_col, we say that the gene for that element is overcrowded.

Return type:

str
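The overcrowding criterion given for actual_col can be sketched for a single cell in plain Python. count_overcrowded and the sample values are hypothetical.

```python
import statistics


def count_overcrowded(target_rows, actual_rows):
    """Count genes whose time-averaged target exceeds the time-averaged actual.

    Each argument holds one list per time step for a single cell.
    """
    target_avg = [statistics.fmean(col) for col in zip(*target_rows)]
    actual_avg = [statistics.fmean(col) for col in zip(*actual_rows)]
    return sum(t > a for t, a in zip(target_avg, actual_avg))


# Two time steps, three genes; only the first gene falls short of its target.
n_overcrowded = count_overcrowded(
    target_rows=[[1.0, 5.0, 2.0], [3.0, 5.0, 2.0]],
    actual_rows=[[1.0, 5.0, 2.0], [1.0, 5.0, 2.0]],
)
```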

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_ribosome_counts_projection(sim_data, bulk_ids)[source]

Return SQL projection to selectively read bulk inactive ribosome count (defined as the minimum of free 30S and 50S subunits at any given moment).

Parameters:
Return type:

str
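The definition of the inactive ribosome count amounts to an elementwise minimum over the two free subunit counts. A minimal sketch with made-up numbers follows. inactive_ribosome_counts is a hypothetical name and the function does not build the SQL projection itself.

```python
def inactive_ribosome_counts(free_30s, free_50s):
    """Inactive ribosome count at each time step.

    A complete ribosome needs one 30S and one 50S subunit, so the scarcer
    subunit limits how many could be assembled.
    """
    return [min(s30, s50) for s30, s50 in zip(free_30s, free_50s)]


inactive = inactive_ribosome_counts(
    free_30s=[120, 80, 95],
    free_50s=[100, 90, 95],
)
```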

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_rnap_counts_projection(sim_data, bulk_ids)[source]

Return SQL projection to selectively read bulk inactive RNAP count.

Parameters:
Return type:

str

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_rnas_combined_as_genes_projection(column, rna_idx, name, cast_type=None)[source]

Create generic SQL projection that evaluates to a list column where each element is the sum of a subset of elements from the original list column. This is mainly used to sum up all RNA data that corresponds to a single gene / cistron / monomer.

Parameters:

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_variant_mask(conn, config_sql, variant_to_row_col, variant_matrix_shape)[source]

Get a boolean matrix where the rows represent the different translation efficiencies and the columns represent the different expression factors that were used to create variants. The matrix is True for each combination that was actually simulated and False otherwise.

Parameters:
Return type:

ndarray[Any, dtype[bool]]
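A minimal sketch of how such a mask could be assembled, using nested lists instead of a NumPy array. The structure of variant_to_row_col (variant index to (row, column)) is an assumption based on the parameter name, and build_variant_mask is a hypothetical helper.

```python
def build_variant_mask(variant_to_row_col, simulated_variants, shape):
    """Boolean grid: rows are translation efficiencies, columns expression factors."""
    n_rows, n_cols = shape
    mask = [[False] * n_cols for _ in range(n_rows)]
    for variant in simulated_variants:
        row, col = variant_to_row_col[variant]
        mask[row][col] = True
    return mask


variant_to_row_col = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}
# Variant 3 was defined but never simulated.
mask = build_variant_mask(variant_to_row_col, simulated_variants=[0, 1, 2], shape=(2, 2))
```

A mask like this lets the plotting code leave cells for missing variants blank instead of drawing a misleading value.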

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.plot(params, conn, history_sql, config_sql, sim_data_dict, validation_data_paths, outdir, variant_metadata, variant_names)[source]

Create either a single multi-heatmap plot or one or more separate heatmaps of data for a grid of new gene variant simulations with varying expression and translation efficiencies.

Parameters:

ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.plot_heatmaps(heatmap_data, heatmap_details, new_gene_cistron_ids, ntp_ids, capacity_gene_common_names, total_heatmaps_to_make, is_dashboard, variant_mask, heatmap_x_label, heatmap_y_label, new_gene_expression_factors, new_gene_translation_efficiency_values, summary_statistic, figsize_x, figsize_y, plotOutDir, plot_suffix)[source]

Plots all heatmaps in order given by HEATMAPS_TO_MAKE_LIST.

Parameters:
  • is_dashboard – Boolean flag for whether we are creating a dashboard of heatmaps or a number of individual heatmaps

  • variant_mask – np.array of dimension (len(new_gene_translation_efficiency_values), len(new_gene_expression_factors)) with entries set to True if variant was run, False otherwise.

  • heatmap_x_label – Label for x axis of heatmap

  • heatmap_y_label – Label for y axis of heatmap

  • new_gene_expression_factors – New gene expression factors used in these variants

  • new_gene_translation_efficiency_values – New gene translation efficiency values used in these variants

  • summary_statistic – Specifies whether average (mean) or standard deviation (std_dev) should be displayed on the heatmaps

  • figsize_x – Horizontal size of each heatmap

  • figsize_y – Vertical size of each heatmap

  • plotOutDir – Output directory for plots

  • plot_suffix – Suffix to add to plot file names, usually specifying which generations were plotted