wholecell.utils.parallelization

Parallelization utilities.

class wholecell.utils.parallelization.ApplyResult(result)[source]

Bases: object

A substitute for multiprocessing.ApplyResult(), returned by InlinePool.apply_async(). It is created only after the function call has succeeded, so ready() and successful() are always True.

get(timeout=None)[source]
ready()[source]
successful()[source]
wait(timeout=None)[source]
class wholecell.utils.parallelization.InlinePool[source]

Bases: object

A substitute for multiprocessing.Pool() that runs the work inline in the current process. This is important because (1) a regular Pool worker cannot construct a nested Pool (even with processes=1) since daemon processes are not allowed to have children, and (2) it’s easier to debug code running in the main process.

apply_async(func, args=(), kwds=None, callback=None)[source]

Apply the function to the args serially (not asynchronously, since only one process is available).

Return type:

ApplyResult

close()[source]
join()[source]
map(func, iterable, chunksize=None)[source]

Map the function over the iterable.

Return type:

list

terminate()[source]
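The inline-pool idea described above can be sketched as follows. This is a minimal illustration, not the module's actual source: the names SketchApplyResult and SketchInlinePool are hypothetical, and invoking the callback right after the call is an assumption modeled on multiprocessing.Pool.

```python
class SketchApplyResult:
    """Hypothetical stand-in for ApplyResult: wraps an already-computed value."""

    def __init__(self, result):
        self._result = result

    def ready(self):
        return True  # the work finished before this object was created

    def successful(self):
        return True  # a failed call would have raised instead of returning

    def wait(self, timeout=None):
        pass  # nothing to wait for

    def get(self, timeout=None):
        return self._result


class SketchInlinePool:
    """Hypothetical stand-in for InlinePool: runs all work in the caller's process."""

    def map(self, func, iterable, chunksize=None):
        # chunksize is accepted for API compatibility and ignored
        return [func(item) for item in iterable]

    def apply_async(self, func, args=(), kwds=None, callback=None):
        result = func(*args, **(kwds or {}))  # runs immediately, serially
        if callback is not None:
            callback(result)  # assumption: mirrors multiprocessing.Pool
        return SketchApplyResult(result)

    def close(self):
        pass

    def terminate(self):
        pass

    def join(self):
        pass
```

Because the function runs before apply_async() returns, any exception it raises propagates directly to the caller, which is part of what makes in-process debugging easier.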
class wholecell.utils.parallelization.NoDaemonContext[source]

Bases: ForkContext

Process

alias of NoDaemonProcess

class wholecell.utils.parallelization.NoDaemonPool(*args, **kwargs)[source]

Bases: Pool

A substitute for multiprocessing.Pool() that creates a pool that is not a daemonic process. This allows for nesting pool calls that would otherwise be prevented with an assertion error (AssertionError: daemonic processes are not allowed to have children).

class wholecell.utils.parallelization.NoDaemonProcess(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)[source]

Bases: Process

property daemon
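The three NoDaemon classes follow a widely used recipe for nestable pools, which can be sketched as below. This is an illustration under assumptions, not the module's source: the Sketch* names are hypothetical, and the context base class is taken from the platform's current default context, whereas the documentation above lists ForkContext as the base.

```python
import multiprocessing
import multiprocessing.pool


class SketchNoDaemonProcess(multiprocessing.Process):
    """Hypothetical process class whose daemon flag always reads False."""

    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        # Pool sets daemon=True on its workers; silently ignore that
        pass


# Assumption: subclass whatever context type is the default on this platform
class SketchNoDaemonContext(type(multiprocessing.get_context())):
    Process = SketchNoDaemonProcess


class SketchNoDaemonPool(multiprocessing.pool.Pool):
    """Hypothetical pool whose workers may create child processes."""

    def __init__(self, *args, **kwargs):
        kwargs["context"] = SketchNoDaemonContext()
        super().__init__(*args, **kwargs)
```

Non-daemon workers are not terminated automatically when the parent exits, so a pool built this way should be closed and joined explicitly.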
wholecell.utils.parallelization.cpus(requested_num_processes=None)[source]

Return the usable number of worker processes for a multiprocessing Pool, up to requested_num_processes (default = max available), considering SLURM and any other environment-specific limitations. 1 means do all work in-process rather than forking subprocesses.

On SLURM: This reads the environment variable ‘SLURM_CPUS_PER_TASK’, which contains the number of CPUs requested per task. Since that variable is only set if the --cpus-per-task option was specified, this falls back to ‘SLURM_JOB_CPUS_PER_NODE’, which contains the number of processors available to the job on this node.

By default, srun sets:

SLURM_CPUS_ON_NODE=1 SLURM_JOB_CPUS_PER_NODE=1

srun -p mcovert --cpus-per-task=2:

SLURM_CPUS_PER_TASK=2 SLURM_CPUS_ON_NODE=2 SLURM_JOB_CPUS_PER_NODE=2

srun --ntasks-per-node=3 --cpus-per-task=4:

SLURM_CPUS_PER_TASK=4 SLURM_CPUS_ON_NODE=12 SLURM_JOB_CPUS_PER_NODE=12

srun --ntasks-per-node=3:

SLURM_CPUS_ON_NODE=3 SLURM_JOB_CPUS_PER_NODE=3

Parameters:

requested_num_processes (int | None) – the requested number of worker processes; None or 0 means return the max usable number

Returns:

the usable number of worker processes for a Pool, as limited by the hardware, OS, SLURM, and requested_num_processes.

==> 1 means DO NOT CREATE WORKER SUBPROCESSES.

Return type:

int

See also pool().

See https://slurm.schedmd.com/sbatch.html

See https://github.com/CovertLab/wcEcoli/issues/392
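The SLURM fallback described for cpus() can be sketched roughly as follows. This is a hedged illustration, not the actual implementation: sketch_cpus and its environ and available parameters are hypothetical (added so the logic can be exercised without a SLURM allocation), and taking the minimum of the OS count and the SLURM value is an assumption.

```python
import os
import re


def sketch_cpus(requested_num_processes=None, environ=None, available=None):
    """Hypothetical sketch: usable worker count, limited by SLURM and the request."""
    environ = os.environ if environ is None else environ
    if available is None:
        available = os.cpu_count() or 1

    # Prefer the per-task count; fall back to the per-node count for this job
    value = (environ.get("SLURM_CPUS_PER_TASK")
             or environ.get("SLURM_JOB_CPUS_PER_NODE"))
    if value:
        # Assumption: use the leading integer, so a value such as "12(x2)" gives 12
        match = re.match(r"\d+", value)
        if match:
            available = min(available, int(match.group()))

    # None or 0 means "as many as are usable"
    if requested_num_processes:
        available = min(available, requested_num_processes)

    return max(available, 1)
```

With the srun examples listed above, `--ntasks-per-node=3 --cpus-per-task=4` would yield 4 in this sketch because SLURM_CPUS_PER_TASK takes precedence, while plain `--ntasks-per-node=3` would yield 3 from SLURM_JOB_CPUS_PER_NODE.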

wholecell.utils.parallelization.is_macos()[source]

Return True if this is running on macOS.

Return type:

bool

wholecell.utils.parallelization.pool(num_processes=None, nestable=False)[source]

Return an InlinePool if cpus(num_processes) == 1, else a multiprocessing Pool(cpus(num_processes)), as suitable for the current runtime environment.

This uses the ‘spawn’ process start method to create a fresh python interpreter process, avoiding threading problems and cross-platform inconsistencies.

If nestable is True, this can create a pool of non-daemon worker processes that can spawn nested processes and have nested pools.

See cpus() for how the number of usable processes is determined. See InlinePool for why running in-process is important.

Parameters:
  • num_processes (int | None)

  • nestable (bool)

Return type:

Pool | InlinePool