ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps
Plot one value per index via heatmap for new_gene_expression_and_translation_efficiency variant.
Possible Plots:
Percent of sims that successfully reached a given generation number
Average doubling time
Average cell volume, mass, dry cell mass, mRNA mass, protein mass
Average translation efficiency, weighted by cistron count
Average mRNA count, monomer count, mRNA mass fraction, protein mass fraction, RNAP portion, and ribosome portion for a capacity gene to measure burden on overall host expression
Average new gene copy number
Average new gene mRNA count
Average new gene mRNA mass fraction
Average new gene mRNA counts fraction
Average new gene NTP mass fraction
Average new gene protein count
Average new gene protein mass fraction
Average new gene protein counts fraction
Average new gene initialization rate for RNAP and ribosomes
Average new gene initialization probabilities for RNAP and ribosomes
Average count and portion of new gene ribosome initialization events per time step
Average number and proportion of RNAP on new genes at a given time step
Average number and proportion of ribosomes on new gene mRNAs at a given time step
Average number and proportion of RNAP making rRNAs at a given time step
Average number and proportion of RNAP and ribosomes making RNAP subunits at a given time step
Average number and proportion of RNAP and ribosomes making ribosomal proteins at a given time step
Average fraction of time new gene is overcrowded by RNAP and Ribosomes
Average overcrowding probability ratio for new gene RNA synthesis and polypeptide initiation
Average max_p probabilities for RNA synthesis and polypeptide initiation
Average number of overcrowded genes for RNAP and Ribosomes
Average number of total, active, and free ribosomes
Average number of ribosomes initialized at each time step
Average number of total, active, and free RNA polymerases
Average ppGpp concentration
Average rate of glucose consumption
Average new gene monomer yields - per hour and per fg of glucose
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.COUNT_INDEX = 23
Count the number of sims that reach this generation (remember that index 7 corresponds to generation 8).
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.DASHBOARD_FLAG = 2
Dashboard Flag
0: Separate Only (Each plot is its own file)
1: Dashboard Only (One file with all plots)
2: Both Dashboard and Separate
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.FONT_SIZE = 9
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.GENE_COUNTS_SQL = '\n WITH unnested_counts AS (\n SELECT unnest(gene_counts) AS gene_counts,\n generate_subscripts(gene_counts, 1)\n AS gene_idx, experiment_id, variant, lineage_seed, generation,\n agent_id\n FROM ({subquery})\n ),\n avg_per_cell AS (\n SELECT avg(gene_counts) AS avg_count,\n experiment_id, variant, gene_idx\n FROM unnested_counts\n GROUP BY experiment_id, variant, lineage_seed,\n generation, agent_id, gene_idx\n ),\n avg_per_variant AS (\n SELECT log10(avg(avg_count) + 1) AS avg_count,\n log10(stddev(avg_count) + 1) AS std_count,\n experiment_id, variant, gene_idx\n FROM avg_per_cell\n GROUP BY experiment_id, variant, gene_idx\n )\n SELECT variant, list(avg_count ORDER BY gene_idx) AS mean,\n list(std_count ORDER BY gene_idx) AS std,\n FROM avg_per_variant\n GROUP BY experiment_id, variant\n '
Generic SQL query that calculates the average of a 1D array column per cell, then aggregates those averages per variant into log10(mean + 1) and log10(std + 1) columns.
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.MAX_CELL_INDEX = 24
Plot data from generations [MIN_CELL_INDEX, MAX_CELL_INDEX). Note that early generations may not be representative of dynamics due to how they are initialized.
HEATMAPS_TO_MAKE_LIST specifies which subset of heatmaps should be made. The completed_gens heatmap is always made, because it is used to create the other heatmaps, and should not be included in the list. The order listed there is the order of the heatmaps in the dashboard plot.
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.STD_DEV_FLAG = True
Standard Deviations Flag
True: Plot an additional copy of all plots with the standard deviation displayed instead of the average
False: Plot no additional plots
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.avg_1d_array_over_scalar_sql(array_column, scalar_column)[source]
Create generic SQL query that calculates the average per cell of each element in a 1D array column divided by a scalar column, and aggregates those ratios per variant into mean and std columns.
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.avg_1d_array_sql(column)[source]
Create generic SQL query that calculates the average per cell of each element in a 1D array column and aggregates that per variant into mean and std columns.
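The two-stage aggregation described above (average per cell first, then mean and std. dev. across cells of a variant) can be sketched in plain Python. This is only an illustration of the arithmetic, not the SQL the function generates: the helper name, the `(variant, cell_id, values)` row layout, and the use of the sample standard deviation are all assumptions.

```python
from statistics import mean, stdev


def mean_and_std_per_variant(rows):
    """Hypothetical helper mirroring the per-cell, then per-variant aggregation.

    rows: iterable of (variant, cell_id, values), one entry per time step,
    where values is a 1D list with one element per gene.
    """
    per_cell = {}
    for variant, cell_id, values in rows:
        per_cell.setdefault((variant, cell_id), []).append(values)

    # Stage 1: element-wise average over time for each cell
    cell_avgs = {}
    for (variant, _cell_id), series in per_cell.items():
        avg = [mean(column) for column in zip(*series)]
        cell_avgs.setdefault(variant, []).append(avg)

    # Stage 2: mean and sample std. dev. of the per-cell averages
    result = {}
    for variant, avgs in cell_avgs.items():
        columns = list(zip(*avgs))
        means = [mean(column) for column in columns]
        # A sample std. dev. needs at least two cells; use None otherwise
        stds = [stdev(column) if len(column) > 1 else None for column in columns]
        result[variant] = (means, stds)
    return result


rows = [
    (0, "cell_a", [1.0, 10.0]),
    (0, "cell_a", [3.0, 30.0]),
    (0, "cell_b", [4.0, 40.0]),
    (0, "cell_b", [6.0, 60.0]),
]
summary = mean_and_std_per_variant(rows)
```

Averaging per cell before aggregating keeps long-lived cells from dominating the variant-level statistics, since every cell contributes exactly one value per element.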
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.avg_ratio_of_1d_arrays_sql(numerator, denominator)[source]
Create generic SQL query that calculates the average per cell of each element in two 1D list columns divided elementwise and aggregates those ratios per variant into mean and std columns.
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.avg_sum_1d_array_over_scalar_sql(array_column, scalar_column)[source]
Create generic SQL query that calculates the average per cell of the sum of elements in a 1D array column divided by a scalar column, and aggregates those ratios per variant as mean and std columns.
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.avg_sum_1d_array_sql(column)[source]
Create generic SQL query that calculates the average per cell of the sum of elements in a 1D array column and aggregates that per variant into mean and std columns.
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_gene_count_fraction_sql(gene_indices, column, index_type)[source]
Construct generic SQL query that gets the average per cell of a select set of indices from a 1D list column divided by the total of all elements per row of that list column, and aggregates those ratios per variant into mean and std columns.
- Parameters:
gene_indices (list[int] | list[list[int]]) – Indices to extract from 1D list column to get ratios for
column (str) – Name of 1D list column
index_type (str) – Either monomer or mRNA. For monomer, the function works exactly as described above. For mRNA, gene_indices must be a list of lists of mRNA indices, because one gene can have multiple mRNAs (transcription units). The elements corresponding to each gene are summed before proceeding (see get_rnas_combined_as_genes_projection()).
- Return type:
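To make the difference between the two index_type modes concrete, here is a minimal sketch of the per-row ratio in plain Python. It is not the generated SQL; the function name is hypothetical, and indices are taken to be 1-based to mirror DuckDB list indexing.

```python
def gene_count_fractions(counts, gene_indices, index_type):
    """Hypothetical illustration of the per-row fraction calculation.

    counts: one row of the 1D list column, as a plain Python list.
    gene_indices: 1-based indices. A flat list for "monomer"; a list of
        lists for "mRNA", where each inner list holds the mRNA indices
        of one gene.
    """
    total = sum(counts)
    if index_type == "monomer":
        selected = [counts[i - 1] for i in gene_indices]
    elif index_type == "mRNA":
        # Sum all transcription units of a gene before taking the ratio
        selected = [sum(counts[i - 1] for i in group) for group in gene_indices]
    else:
        raise ValueError(f"Unknown index_type: {index_type}")
    return [value / total for value in selected]


row = [10, 30, 40, 20]
monomer_fractions = gene_count_fractions(row, [2, 4], "monomer")
mrna_fractions = gene_count_fractions(row, [[1, 2], [4]], "mRNA")
```

In the mRNA case the first gene owns two transcription units (indices 1 and 2), so its fraction is based on their combined count.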
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_gene_mass_prod_func(sim_data, index_type, gene_ids)[source]
Create a function to be passed as the post_func argument to get_mean_and_std_matrices() which multiplies the average and standard deviation 1D array columns by the mass of the gene ID for each element.
- Parameters:
sim_data (SimulationDataEcoli) – Simulation data
index_type (str) – Either mRNA or monomer. If mRNA, gene_ids is a list of lists of mRNA IDs, where inner lists correspond to mRNAs for each gene. Therefore, we sum the masses for the mRNAs of each inner list and multiply the input mean and std by this sum per gene.
gene_ids (list[str] | list[list[str]]) – IDs of genes in the order they appear in the 1D arrays of the query result
- Return type:
Callable[[Table], Table]
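The closure pattern described here can be sketched without PyArrow or sim_data. In the sketch below a table is modelled as a plain dict of columns and masses come from a hypothetical mass_by_id dict; both are stand-ins chosen for illustration, and the function name is not part of the module.

```python
def make_mass_scaler(mass_by_id, index_type, gene_ids):
    """Hypothetical stand-in that returns a post-processing function."""
    if index_type == "monomer":
        masses = [mass_by_id[gene_id] for gene_id in gene_ids]
    elif index_type == "mRNA":
        # One gene may map to several mRNAs, so sum their masses
        masses = [sum(mass_by_id[rna_id] for rna_id in group) for group in gene_ids]
    else:
        raise ValueError(f"Unknown index_type: {index_type}")

    def scale(table):
        # table is a dict of columns standing in for a PyArrow table
        def scale_rows(rows):
            return [[value * mass for value, mass in zip(row, masses)] for row in rows]

        return {
            "variant": table["variant"],
            "mean": scale_rows(table["mean"]),
            "std": scale_rows(table["std"]),
        }

    return scale


mass_by_id = {"rnaA": 2.0, "rnaB": 3.0, "rnaC": 1.5}
post_func = make_mass_scaler(mass_by_id, "mRNA", [["rnaA", "rnaB"], ["rnaC"]])
scaled = post_func({"variant": [0], "mean": [[1.0, 2.0]], "std": [[0.5, 1.0]]})
```

Scaling the standard deviation by the same factor as the mean is valid because the mass is a constant multiplier for each gene.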
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_indexes(conn, config_sql, index_type, ids)[source]
Retrieve DuckDB indices of a given type for a set of IDs. Note that DuckDB lists are 1-indexed.
- Parameters:
conn (DuckDBPyConnection) – DuckDB database connection
config_sql (str) – DuckDB SQL query for sim config data (see get_dataset_sql())
index_type (str) – Type of indices to return (one of cistron, RNA, mRNA, or monomer)
ids (list[str] | list[list[str]]) – List of IDs to get indices for (must be monomer IDs if index_type is monomer, else mRNA IDs)
- Returns:
List of requested indexes
- Return type:
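Because DuckDB lists are 1-indexed while Python lists are 0-indexed, an off-by-one shift is needed when converting positions. The sketch below shows that conversion for flat and nested ID lists; the helper name and the all_ids argument are hypothetical, and the real function reads the ID ordering from the sim config data instead.

```python
def one_based_indexes(all_ids, ids):
    """Hypothetical helper: map IDs to 1-based positions within all_ids."""
    position = {id_: i + 1 for i, id_ in enumerate(all_ids)}

    def lookup(item):
        # Nested lists (e.g. several mRNAs per gene) keep their structure
        if isinstance(item, list):
            return [lookup(sub_item) for sub_item in item]
        return position[item]

    return [lookup(item) for item in ids]


all_ids = ["rnaA", "rnaB", "rnaC"]
flat = one_based_indexes(all_ids, ["rnaC", "rnaA"])
nested = one_based_indexes(all_ids, [["rnaA", "rnaB"], ["rnaC"]])
```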
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_mRNA_ids_from_monomer_ids(sim_data, target_monomer_ids)[source]
Map monomer IDs back to the mRNA IDs that they were translated from.
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_mean_and_std_matrices(conn, variant_mapping, variant_matrix_shape, history_sql, columns, projections=None, remove_first=False, func=None, order_results=False, custom_sql=None, post_func=None, num_digits_rounding=None, default_value=None)[source]
Reads one or more columns and calculates mean and std. dev. for each variant. If no custom SQL query is provided, this defaults to averaging per cell, then calculating the averages and standard deviations of all cells per variant.
- Parameters:
conn (DuckDBPyConnection) – DuckDB connection
variant_mapping (dict[int, tuple[int, int]]) – Mapping of variant IDs to row and column in matrix of new gene translation efficiency and expression factor variants
variant_matrix_shape (tuple[int, int]) – Number of rows and columns in variant matrix
history_sql (str) – SQL subquery from ecoli.library.parquet_emitter.get_dataset_sql()
columns (list[str]) – See ecoli.library.parquet_emitter.read_stacked_columns()
projections (list[str] | None) – See ecoli.library.parquet_emitter.read_stacked_columns()
remove_first (bool) – See ecoli.library.parquet_emitter.read_stacked_columns()
func (Callable | None) – See ecoli.library.parquet_emitter.read_stacked_columns()
order_results (bool) – See ecoli.library.parquet_emitter.read_stacked_columns()
custom_sql (str | None) – SQL string containing a placeholder with name subquery where the result of read_stacked_columns will be placed. Final query result must only have two columns in order: variant and a value for each variant. If not provided, defaults to average of averages
post_func (Callable | None) – Function that is called on PyArrow table resulting from query. Should return a PyArrow table with exactly three columns: variant for the variant IDs, mean for some mean aggregate value (can be N-D list column), and std for some standard deviation aggregate.
num_digits_rounding (int | None) – Number of decimal places to round to
default_value (Any | None) – Default value to put in output variant matrices if variant ID not included in query result (e.g. if variant failed in first generation and had no completed sims)
new_gene_NTP_fraction – Set to True for NTP fraction heatmap so query output is properly handled
- Returns:
Tuple of Numpy matrices with first two dimensions variant_matrix_shape. Each cell in the first matrix has the mean for that variant. Each cell in the second matrix has the std. dev. for that variant. These values can be Numpy arrays instead of scalar values (e.g. when calculating aggregates for many genes at once), in which case the matrices have shape variant_matrix_shape + (num_genes,)
- Return type:
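The way per-variant results are placed into a matrix via variant_mapping, with default_value filling the gaps, can be sketched as follows. Nested Python lists are used in place of Numpy matrices to keep the example dependency-free, and the helper name is hypothetical.

```python
def fill_variant_matrix(values_by_variant, variant_mapping, shape, default_value=None):
    """Hypothetical helper: place one value per variant at its (row, col)."""
    n_rows, n_cols = shape
    matrix = [[default_value] * n_cols for _ in range(n_rows)]
    for variant, (row, col) in variant_mapping.items():
        # Variants missing from the query result keep the default value
        if variant in values_by_variant:
            matrix[row][col] = values_by_variant[variant]
    return matrix


variant_mapping = {1: (0, 0), 2: (0, 1), 3: (1, 0), 4: (1, 1)}
means = {1: 2.5, 2: 3.0, 4: 1.0}  # variant 3 produced no completed sims
mean_matrix = fill_variant_matrix(means, variant_mapping, (2, 2), default_value=-1)
```

The same placement would be applied a second time with the std. dev. values to obtain the second matrix of the returned tuple.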
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_new_gene_mRNA_NTP_fraction_sql(sim_data, new_gene_mRNA_idx, ntp_ids)[source]
Construct SQL query that gets, for each NTP, the fraction used by the mRNAs of each new gene, averages that per cell, and aggregates those fractions per variant into mean and std columns where each row is a 2D list with shape (# NTPs, # new genes).
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_overcrowding_sql(target_col, actual_col)[source]
Create generic SQL query that calculates the average number of genes that are overcrowded in each cell, then aggregates that per variant into mean and std columns.
- Parameters:
target_col (str) – Name of 1D list column with target values
actual_col (str) – Name of 1D list column with actual values. If the per-cell average of an element in target_col is greater than the per-cell average of the corresponding element in actual_col, we say that the gene for that element is overcrowded.
- Return type:
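The overcrowding criterion above compares per-cell averages element by element. A minimal sketch of the count for a single cell, assuming each series is a list of per-time-step lists and using a hypothetical function name:

```python
from statistics import mean


def count_overcrowded(target_series, actual_series):
    """Hypothetical helper: number of overcrowded genes in one cell.

    Each series holds one 1D list per time step. A gene counts as
    overcrowded when its time-averaged target value exceeds its
    time-averaged actual value.
    """
    avg_target = [mean(column) for column in zip(*target_series)]
    avg_actual = [mean(column) for column in zip(*actual_series)]
    return sum(target > actual for target, actual in zip(avg_target, avg_actual))


target_series = [[0.25, 0.5, 0.125], [0.75, 0.5, 0.125]]
actual_series = [[0.25, 0.5, 0.125], [0.25, 0.5, 0.125]]
n_overcrowded = count_overcrowded(target_series, actual_series)
```

Only the first gene is counted here: its average target (0.5) exceeds its average actual value (0.25), while the other two match exactly. The per-cell counts would then be aggregated per variant into mean and std columns.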
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_ribosome_counts_projection(sim_data, bulk_ids)[source]
Return SQL projection to selectively read bulk inactive ribosome count (defined as minimum of free 30S and 50S subunits at any given moment)
- Parameters:
sim_data (SimulationDataEcoli) – Simulation data
- Return type:
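The definition of the inactive ribosome count as the minimum of free 30S and 50S subunits follows from each complete ribosome needing one of each. A small sketch of that rule applied per time step, with a hypothetical function name and made-up counts:

```python
def inactive_ribosome_counts(free_30s, free_50s):
    """Hypothetical helper: assemblable ribosomes at each time step.

    The scarcer subunit limits how many complete ribosomes could form.
    """
    return [min(n_30s, n_50s) for n_30s, n_50s in zip(free_30s, free_50s)]


inactive = inactive_ribosome_counts([5, 2, 7], [3, 4, 7])
```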
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_rnap_counts_projection(sim_data, bulk_ids)[source]
Return SQL projection to selectively read bulk inactive RNAP count.
- Parameters:
sim_data (SimulationDataEcoli) – Simulation data
- Return type:
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_rnas_combined_as_genes_projection(column, rna_idx, name, cast_type=None)[source]
Create generic SQL projection that evaluates to a list column where each element is the sum of a subset of elements from the original list column. This is mainly used to sum up all RNA data that corresponds to a single gene / cistron / monomer.
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.get_variant_mask(conn, config_sql, variant_to_row_col, variant_matrix_shape)[source]
Get a boolean matrix where the rows represent the different translation efficiencies and the columns represent the different expression factors that were used to create variants. The matrix is True for each combination that was actually simulated and False otherwise.
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.plot(params, conn, history_sql, config_sql, sim_data_dict, validation_data_paths, outdir, variant_metadata, variant_names)[source]
Create either a single multi-heatmap plot or 1+ separate heatmaps of data for a grid of new gene variant simulations with varying expression and translation efficiencies.
- ecoli.analysis.multivariant.new_gene_translation_efficiency_heatmaps.plot_heatmaps(heatmap_data, heatmap_details, new_gene_cistron_ids, ntp_ids, capacity_gene_common_names, total_heatmaps_to_make, is_dashboard, variant_mask, heatmap_x_label, heatmap_y_label, new_gene_expression_factors, new_gene_translation_efficiency_values, summary_statistic, figsize_x, figsize_y, plotOutDir, plot_suffix)[source]
Plots all heatmaps in order given by HEATMAPS_TO_MAKE_LIST.
- Parameters:
is_dashboard – Boolean flag for whether we are creating a dashboard of heatmaps or a number of individual heatmaps
variant_mask – np.array of dimension (len(new_gene_translation_efficiency_values), len(new_gene_expression_factors)) with entries set to True if variant was run, False otherwise.
heatmap_x_label – Label for x axis of heatmap
heatmap_y_label – Label for y axis of heatmap
new_gene_expression_factors – New gene expression factors used in these variants
new_gene_translation_efficiency_values – New gene translation efficiency values used in these variants
summary_statistic – Specifies whether average (mean) or standard deviation (std_dev) should be displayed on the heatmaps
figsize_x – Horizontal size of each heatmap
figsize_y – Vertical size of each heatmap
plotOutDir – Output directory for plots
plot_suffix – Suffix to add to plot file names, usually specifying which generations were plotted