wholecell.utils.polymerize

polymerize.py

Polymerizes sequences based on monomer and energy limitations.

Run kernprof -lv wholecell/tests/utils/profile_polymerize.py to get a line profile. That script applies the @profile decorator to polymerize().

TODO: Document the algorithm and corner cases (should already exist somewhere…)

wholecell.utils.polymerize.buildSequences(base_sequences, indexes, positions, elongation_rates)
wholecell.utils.polymerize.computeMassIncrease(sequences, elongations, monomerMasses)
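The two module-level helpers above are listed without descriptions. The sketch below is one plausible reading of their signatures, and that reading is an assumption throughout. buildSequences is taken to copy, for each polymer i, up to elongation_rates[i] monomers of base_sequences[indexes[i]] starting at positions[i] into a PAD_VALUE-padded array. computeMassIncrease is taken to sum monomerMasses over the first elongations[i] monomers of each row. The names build_sequences_sketch and compute_mass_increase_sketch are hypothetical.

```python
import numpy as np

PAD_VALUE = -1  # same value as polymerize.PAD_VALUE


def build_sequences_sketch(base_sequences, indexes, positions, elongation_rates):
    """Hypothetical stand-in for buildSequences; the semantics are assumed.

    Row i holds up to elongation_rates[i] monomers of base_sequences[indexes[i]],
    starting at positions[i]; unused slots are PAD_VALUE.
    """
    n = len(indexes)
    width = int(np.max(elongation_rates)) if n else 0
    sequences = np.full((n, width), PAD_VALUE, dtype=np.int64)
    for i in range(n):
        start = positions[i]
        chunk = base_sequences[indexes[i], start:start + elongation_rates[i]]
        sequences[i, :len(chunk)] = chunk
    return sequences


def compute_mass_increase_sketch(sequences, elongations, monomer_masses):
    """Hypothetical stand-in for computeMassIncrease; the semantics are assumed.

    Returns, per sequence, the summed mass of its first elongations[i] monomers.
    """
    masses = np.zeros(sequences.shape[0])
    for i, n_added in enumerate(elongations):
        added = sequences[i, :n_added]
        added = added[added != PAD_VALUE]  # never index masses with the pad value
        masses[i] = monomer_masses[added].sum()
    return masses
```

For example, with base = np.array([[0, 1, 2, PAD_VALUE], [2, 2, 1, 0]]), calling build_sequences_sketch(base, np.array([0, 1]), np.array([1, 0]), np.array([3, 2])) yields [[1, 2, -1], [2, 2, -1]].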
class wholecell.utils.polymerize.polymerize(sequences, monomerLimits, reactionLimit, randomState, elongation_rates, variable_elongation=False)

Bases: object

Polymerize the given DNA/RNA/protein sequences as far as possible within the given limits.

Parameters:
  • sequences – ndarray of integer, shape (num_sequences, num_steps), the sequences of needed monomer types, containing PAD_VALUE for all steps after sequence completion.

  • monomerLimits – ndarray of integer, shape (num_monomers,), the available number of each monomer type.

  • reactionLimit – maximum number of reactions (monomers to use), i.e. the energy limit.

  • randomState – random number generator to pick winners in shortages.
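As a concrete illustration of the parameter shapes, the sketch below packs variable-length lists of monomer indexes into a (num_sequences, num_steps) array padded with PAD_VALUE, and builds matching monomerLimits, reactionLimit and randomState values. The helper pad_sequences is hypothetical, and using numpy.random.RandomState for randomState is an assumption. The elongation_rates and variable_elongation arguments are not described here and are left out.

```python
import numpy as np

PAD_VALUE = -1  # same value as polymerize.PAD_VALUE


def pad_sequences(raw_sequences, num_steps):
    """Hypothetical helper: pack monomer-index lists into a padded 2-D array."""
    sequences = np.full((len(raw_sequences), num_steps), PAD_VALUE, dtype=np.int64)
    for i, seq in enumerate(raw_sequences):
        seq = list(seq)[:num_steps]        # truncate anything past num_steps
        sequences[i, :len(seq)] = seq      # steps after completion stay PAD_VALUE
    return sequences


# Three sequences over two monomer types (0 and 1).
sequences = pad_sequences([[0, 1, 1], [1], [0, 0, 1, 0]], num_steps=4)
monomer_limits = np.array([3, 2])      # available copies of monomer 0 and monomer 1
reaction_limit = 4                     # energy limit: at most 4 monomers in total
random_state = np.random.RandomState(0)
```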

Returns:

  • sequenceElongation – ndarray of integer, shape (num_sequences,), indicating how far the sequences proceeded.

  • monomerUsages – ndarray of integer, shape (num_monomers,), counting how many monomers of each type got used.

  • nReactions – total number of reactions (monomers used).

  • sequences_limited_elongation – ndarray of bool, shape (num_sequences,), mask indicating whether the sequences were actually elongated to the max lengths expected from the current step.
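To make the inputs and outputs concrete, here is a deliberately simplified, step-by-step sketch of polymerizing under monomer and reaction limits. It is not the class's implementation: the method list suggests the real code first elongates in bulk as far as the limits allow and then resolves shortages. The name polymerize_sketch is hypothetical, it returns only sequenceElongation, monomerUsages and nReactions, and its random-winner rule is an assumed stand-in for how randomState is used.

```python
import numpy as np

PAD_VALUE = -1  # same value as polymerize.PAD_VALUE


def polymerize_sketch(sequences, monomer_limits, reaction_limit, random_state):
    """Simplified, illustrative polymerization loop (not the module's algorithm).

    At each step every active sequence requests its next monomer. If a monomer
    type or the reaction budget runs short, random winners are drawn and the
    losers stop elongating for the rest of the call.
    """
    n_sequences, n_steps = sequences.shape
    remaining = np.array(monomer_limits, dtype=np.int64)
    reactions_left = int(reaction_limit)
    elongation = np.zeros(n_sequences, dtype=np.int64)
    usages = np.zeros_like(remaining)
    active = np.ones(n_sequences, dtype=bool)

    for step in range(n_steps):
        # Finished sequences (padding reached) drop out.
        active &= sequences[:, step] != PAD_VALUE
        if not active.any() or reactions_left == 0:
            break
        # Monomer types are served in index order, so the shared reaction
        # budget favors low indexes; a simplification of this sketch.
        for monomer in range(len(remaining)):
            wanting = np.flatnonzero(active & (sequences[:, step] == monomer))
            budget = min(int(remaining[monomer]), reactions_left)
            if wanting.size > budget:
                # Shortage: keep a random subset of size `budget`.
                winners = random_state.permutation(wanting)[:budget]
                active[np.setdiff1d(wanting, winners)] = False
                wanting = winners
            elongation[wanting] += 1
            remaining[monomer] -= wanting.size
            usages[monomer] += wanting.size
            reactions_left -= wanting.size

    return elongation, usages, int(usages.sum())
```

With the three example sequences [[0, 1, 1, -1], [1, -1, -1, -1], [0, 0, 1, 0]], monomer limits [3, 2] and a reaction limit of 4, the reaction budget binds during the second step, giving elongations [1, 1, 2].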

PAD_VALUE = -1
_clamp_elongation_to_sequence_length()

A post-iteration clean-up operation. Restricts the elongation of a sequence to at most its total (unpadded) length.
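The clamp itself fits in one vectorized expression. A minimal sketch, assuming padding only appears after a sequence's last real monomer (as the sequences parameter states); clamp_elongation_sketch is a hypothetical name.

```python
import numpy as np

PAD_VALUE = -1  # same value as polymerize.PAD_VALUE


def clamp_elongation_sketch(sequences, elongation):
    """Cap each sequence's elongation at its unpadded length."""
    # Count non-pad entries per row; with trailing padding this is the length.
    unpadded_lengths = np.count_nonzero(sequences != PAD_VALUE, axis=1)
    return np.minimum(elongation, unpadded_lengths)
```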

TODO: explain why we do this here instead of during each iteration

_elongate()

Iteratively elongates sequences up to resource limits.

_elongate_to_limit()

Elongate as far as possible without hitting any resource limitations.
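One way to read "as far as possible without hitting any resource limitations" is to advance all unfinished sequences in lock-step and stop before the first step whose cumulative demand would exceed a monomer limit or the reaction limit. The sketch below computes that step count from cumulative sums; this interpretation and the name steps_within_limits are assumptions, not taken from the module.

```python
import numpy as np


def steps_within_limits(sequences, monomer_limits, reaction_limit):
    """Number of lock-step steps all sequences can take before any limit binds."""
    monomer_limits = np.asarray(monomer_limits)
    # demand[m, s]: how many sequences need monomer m at step s (pads match nothing).
    demand = np.stack([(sequences == m).sum(axis=0)
                       for m in range(len(monomer_limits))])
    cumulative = np.cumsum(demand, axis=1)        # monomers used through step s
    reactions = cumulative.sum(axis=0)            # reactions used through step s
    fits = np.all(cumulative <= monomer_limits[:, None], axis=0)
    fits &= reactions <= reaction_limit
    # Cumulative demand never decreases, so `fits` is True...True, False...False.
    if fits.all():
        return sequences.shape[1]
    return int(np.argmin(fits))
```

With the example sequences [[0, 1, 1, -1], [1, -1, -1, -1], [0, 0, 1, 0]], limits [3, 2] and a reaction limit of 4, the second step would need 5 reactions, so only one step fits.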

_finalize()

Clean up iteration results.

_finalize_resource_limited_elongations()
_gather_input_dimensions()

Collect information about the size of the inputs.

_gather_sequence_data()

Collect static data about the input sequences.
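A natural piece of static data to precompute is a boolean mask recording, for every monomer type, which sequence steps need it. This matches the bool[monomer #, sequence #] layout that sum_monomers_reference_implementation documents, extended here with a step axis. Whether _gather_sequence_data builds exactly this array is an assumption, and sequence_monomer_masks is a hypothetical name.

```python
import numpy as np


def sequence_monomer_masks(sequences, n_monomers):
    """bool[monomer, sequence, step]: True where that step needs that monomer."""
    monomer_ids = np.arange(n_monomers)[:, None, None]
    # Broadcast (1, S, T) against (M, 1, 1). PAD_VALUE (-1) equals no monomer
    # index, so padded steps are False for every monomer.
    return sequences[None, :, :] == monomer_ids
```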

_prepare_outputs()

Set up the running values that ultimately compose the output of the ‘polymerize’ operation.

_prepare_running_values()

Sets up the variables that will change throughout iteration, including both intermediate calculations and outputs.

_sanitize_inputs()

Enforce array typing, and copy input arrays to prevent side-effects.

_setup()

Extended initialization procedures.

_update_elongation_resource_demands()

After updating the active sequences (initialization and culling), recalculate resource demands for the remaining steps given what sequences remain.
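Under the same assumed bool[monomer, sequence, step] mask layout, recomputing demands after culling amounts to restricting the mask to the surviving sequences and the steps not yet taken, then taking cumulative sums over steps. A hedged sketch; remaining_demands and its argument names are hypothetical.

```python
import numpy as np


def remaining_demands(sequence_monomers, active_indexes, current_step):
    """Cumulative per-monomer demand of the active sequences from current_step on.

    sequence_monomers: bool[monomer, sequence, step] (assumed layout).
    Column k of the result counts the monomers of each type needed to advance
    every active sequence through k + 1 more steps.
    """
    remaining = sequence_monomers[:, active_indexes, current_step:]
    per_step = remaining.sum(axis=1)          # int[monomer, step]
    return np.cumsum(per_step, axis=1)
```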

wholecell.utils.polymerize.sum_monomers_reference_implementation(sequenceMonomers, activeSequencesIndexes)

Sum up the total number of monomers of each type needed to continue building the active sequences through currentStep. (This is the Python reference implementation, compiled by Cython.)

Arguments:

  • sequenceMonomers – bool[monomer #, sequence #] indicating whether a given monomer gets used in a step of a sequence.

  • activeSequencesIndexes – an array of indexes of the sequences that are still active, i.e. have not yet run out of source monomers.

Result: count[monomer #] indicating how many of each monomer will be needed by the combined active sequences in the currentStep.
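Given the documented argument layout, the computation reduces to selecting the active columns and summing across sequences. A minimal NumPy sketch under that reading; the name sum_monomers_sketch is hypothetical, and this is not the module's Cython source.

```python
import numpy as np


def sum_monomers_sketch(sequence_monomers, active_sequences_indexes):
    """count[monomer]: monomers needed by the active sequences at this step.

    sequence_monomers: bool[monomer, sequence] for the current step.
    active_sequences_indexes: integer indexes of sequences still elongating.
    """
    # Keep only active sequences, then count True entries per monomer row.
    return sequence_monomers[:, active_sequences_indexes].sum(axis=1)
```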