wholecell.utils.fitting

wholecell.utils.fitting.calcProteinCounts(sim_data, monomerMass)
wholecell.utils.fitting.calcProteinDistribution(sim_data)
wholecell.utils.fitting.calcProteinTotalCounts(sim_data, monomerMass, monomerExpression)
wholecell.utils.fitting.cosine_similarity(samples)

Finds the cosine similarity between samples.

samples is a matrix of size (n_samples, sample_size).

The output is a matrix of size (n_samples, n_samples).

The cosine similarity is the normalized dot product between two vectors. The name originates from the fact that the normalized dot product between two vectors is equal to the cosine of the angle formed by the two vectors.
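The normalized dot product described above can be illustrated with a minimal NumPy sketch. This is not the library's implementation; the function name cosine_similarity_sketch and the sample data are assumptions made for illustration, and the sketch assumes no row is all zeros.

```python
import numpy as np


def cosine_similarity_sketch(samples):
    """Pairwise cosine similarity between the rows of `samples`.

    Hypothetical helper for illustration; assumes no row is all zeros.
    """
    samples = np.asarray(samples, dtype=float)
    # Scale every row to unit Euclidean length.
    norms = np.linalg.norm(samples, axis=1, keepdims=True)
    unit = samples / norms
    # Dot products of unit vectors are the cosines of the angles between them.
    return unit @ unit.T


# Three samples of size two: (n_samples, sample_size) = (3, 2).
samples = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
similarity = cosine_similarity_sketch(samples)
```

The result has shape (n_samples, n_samples), with ones on the diagonal because every vector forms a zero angle with itself.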

wholecell.utils.fitting.countsFromMassAndExpression(mass, mws, relativeExpression, nAvogadro)
Parameters:
  • mass (float) – Total mass you want counts to sum to

  • mws (ndarray[Any, dtype[float64]]) – Molecular weights of each species

  • relativeExpression (ndarray[Any, dtype[float64]]) – Relative expression of each species

  • nAvogadro (float) – Avogadro’s number

Returns:

Counts of each molecule

Return type:

ndarray[Any, dtype[float64]]
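One way to satisfy the documented contract (counts proportional to relative expression, with total mass equal to mass) is sketched below. The function name, the normalization of the expression vector, and the example values are assumptions; the actual implementation may differ.

```python
import numpy as np


def counts_from_mass_and_expression_sketch(mass, mws, relative_expression, n_avogadro):
    """Hypothetical sketch: counts proportional to expression whose mass sums to `mass`."""
    mws = np.asarray(mws, dtype=float)
    rel = np.asarray(relative_expression, dtype=float)
    # Assumption: normalize so the expression vector sums to one.
    rel = rel / rel.sum()
    # Expression-weighted average mass of a single molecule.
    mean_mass_per_molecule = np.dot(mws, rel) / n_avogadro
    # Total number of molecules that accounts for the requested mass.
    total_count = mass / mean_mass_per_molecule
    return total_count * rel


n_avogadro = 6.02214076e23  # per mol
mass = 1e-15  # g (illustrative value)
mws = np.array([1e4, 2e4])  # g/mol (illustrative values)
counts = counts_from_mass_and_expression_sketch(mass, mws, [0.75, 0.25], n_avogadro)
```

Multiplying the returned counts by mws / n_avogadro and summing recovers the input mass, which is a quick consistency check for any implementation of this contract.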

wholecell.utils.fitting.fit_linearized_transforms(x, y, x_fun=None, y_fun=None, r_tol=0.99, p_tol=0.01, verbose=None)

Transforms x and y data with each function in a set of candidates and finds the pair of transforms that gives the best linear fit of the data. The return values can be passed as arguments to interpolate_linearized_fit to interpolate at new x values.

Parameters:
  • x (ndarray) – x data to fit

  • y (ndarray) – y data to fit

  • x_fun (List | None) – list of functions to try for transforming x data

  • y_fun (List | None) – list of functions to try for transforming y data

  • r_tol (float) – best fit r value needs to be higher than this value

  • p_tol (float) – best fit p value needs to be lower than this value

  • verbose (float | None) – if given, prints the r and p value for each function pair that results in an r value higher than this

Returns:

4-element tuple containing

  • x_transform: name of the best transformation function for x data

  • y_transform: name of the best transformation function for y data

  • slope: best fit slope for the transformed data

  • intercept: best fit intercept for the transformed data

Return type:

Tuple[str, str, float, float]
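The search over transform pairs can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name, the dictionary of named transforms, and the choice to raise when r_tol is not met are all hypothetical, and the p-value check (p_tol) is omitted.

```python
import numpy as np


def fit_linearized_sketch(x, y, transforms, r_tol=0.99):
    """Hypothetical sketch: try every (x, y) transform pair, keep the most linear one."""
    best = None
    for x_name, x_fun in transforms.items():
        for y_name, y_fun in transforms.items():
            tx, ty = x_fun(x), y_fun(y)
            # Pearson correlation of the transformed data.
            r = np.corrcoef(tx, ty)[0, 1]
            if best is None or abs(r) > abs(best[0]):
                # Degree-1 polyfit returns (slope, intercept).
                slope, intercept = np.polyfit(tx, ty, 1)
                best = (r, x_name, y_name, slope, intercept)
    if abs(best[0]) < r_tol:
        # Assumption: the real function's behavior on a poor fit is not documented here.
        raise ValueError("no transform pair reached the required r value")
    return best[1:]


# Exponential data, y = 2 * exp(0.5 * x), is linear after taking log(y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * np.exp(0.5 * x)
transforms = {"identity": lambda v: v, "log": np.log}
x_transform, y_transform, slope, intercept = fit_linearized_sketch(x, y, transforms)
```

For this data the identity transform on x paired with the log transform on y wins, with a slope of 0.5 and an intercept of log(2).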

wholecell.utils.fitting.interpolate_linearized_fit(x, x_transform, y_transform, slope, intercept)

Interpolate one or more values based on the linearized fit parameters.
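A plausible reading of this step is: transform x, apply the fitted line, then invert the y transform. The sketch below illustrates that reading for scalar inputs. The lookup tables, the transform names "identity" and "log", and the function name are assumptions, since the names accepted by the real function are not listed here.

```python
import math

# Hypothetical lookup tables mapping transform names to functions and inverses.
FORWARD = {"identity": lambda v: v, "log": math.log}
INVERSE = {"identity": lambda v: v, "log": math.exp}


def interpolate_sketch(x, x_transform, y_transform, slope, intercept):
    """Hypothetical sketch: evaluate the fitted line, then undo the y transform."""
    # Value of the fitted line in transformed coordinates.
    linear = slope * FORWARD[x_transform](x) + intercept
    # Map back from transformed y to the original y scale.
    return INVERSE[y_transform](linear)


# Parameters of a fit where log(y) = 0.5 * x + log(2), i.e. y = 2 * exp(0.5 * x).
y_new = interpolate_sketch(3.0, "identity", "log", 0.5, math.log(2.0))
```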

wholecell.utils.fitting.masses_and_counts_for_homeostatic_target(dry_mass_of_non_small_molecules, concentrations, weights, cell_density, avogadros_number)

Computes the dry mass fractions and counts associated with small molecules to maintain concentrations consistent with targets. (Also includes water.)

The cell is composed of a number of ‘mass fractions’, i.e. DNA, RNA, protein, water, and the less specific “small molecules”, which include both inorganic and organic molecular species that form part of a cell. While we take many of the former mass fractions as ground truth, we chose to recompute the small molecule mass fraction from per-molecule observations of small molecule concentrations (compiled from various sources).

However, this creates a potential issue: we need the small molecule mass to compute the volume, and the volume in turn is used to compute the counts (and therefore masses) of the small molecules. We denote the first small molecule mass as Ms, and the second as Ms’.

The total mass of the cell, Mt, is the sum of the small and non-small molecule masses:

Mt = Ms + Mns

The volume of the cell V times the density of the cell rho is Mt, and therefore

rho * V = Ms + Mns

Ms = rho * V - Mns

This gives us our first calculation of the small molecule mass. For the second calculation, we first find the abundance of each small molecule species (count n_i) as

n_i = V * c_i

where c_i is the concentration of each species. Then the mass associated with each species is

m_i = V * w_i * c_i

where w_i is the molecular weight of a given species. Finally, the total small molecule mass, estimated from small molecule counts, is

Ms’ = sum_i m_i = V * w^T c

where w^T c is the dot-product between the two vectors.

Equating Ms’ and Ms gives rho * V - Mns = V * w^T c. Solving for V:

V = Mns / (rho - w^T c)

This allows us to compute the new volume, from which we can also compute and return all n_i and m_i.
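The derivation above can be checked numerically with a short sketch that uses plain floats instead of Unum quantities. The function name, the unit choices (g, L, mol/L, g/mol), and the example numbers are assumptions for illustration. Note that V * c_i is an amount in moles, so the sketch multiplies by Avogadro's number to obtain molecule counts.

```python
def homeostatic_target_sketch(m_ns, rho, weights, concentrations, n_avogadro):
    """Hypothetical sketch of V = Mns / (rho - w^T c) with unitless floats."""
    # Dot product w^T c: small molecule mass per unit volume (g/L).
    w_dot_c = sum(w * c for w, c in zip(weights, concentrations))
    if rho <= w_dot_c:
        raise ValueError("small molecules alone would exceed the cell density")
    volume = m_ns / (rho - w_dot_c)  # L
    moles = [volume * c for c in concentrations]  # mol
    masses = [w * n for w, n in zip(weights, moles)]  # g, m_i = V * w_i * c_i
    counts = [n * n_avogadro for n in moles]  # molecules
    return volume, masses, counts


m_ns = 3e-13  # g, non-small-molecule dry mass (illustrative value)
rho = 1100.0  # g/L, cell density (illustrative value)
weights = [18.0, 100.0]  # g/mol: water plus one generic metabolite
concentrations = [40.0, 0.5]  # mol/L (illustrative values)
volume, masses, counts = homeostatic_target_sketch(
    m_ns, rho, weights, concentrations, 6.02214076e23
)
```

With these inputs, rho * volume equals m_ns plus the summed small molecule masses, which is the consistency condition Mt = Ms + Mns that the derivation starts from.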

Parameters:
  • dry_mass_of_non_small_molecules (Unum) – float unit’d scalar, dimensions of mass. The total mass of the cell, minus the ‘wet’ mass (water) and the dry mass of other small molecules.

  • concentrations (Unum) – 1-D float unit’d array, with dimensions of concentration. The target concentrations of the small molecules.

  • weights (Unum) – 1-D float unit’d array, with dimensions of mass per mol. The molecular weights of the small molecules.

  • cell_density (Unum) – float unit’d scalar, dimensions of mass per volume. The total density of the cell (wet and dry mass).

  • avogadros_number (Unum) – float unit’d scalar, dimensions of per mol. The number of molecules per mole.

Returns:

2-element tuple containing

  • masses: The mass associated with each molecular species

  • counts: The counts associated with each molecular species

Return type:

tuple[Unum, Unum]

wholecell.utils.fitting.normalize(array)