`wholecell.io.ingestion`

Utilities for ingesting experimental data (e.g. RNA-seq transcriptomes) using the canonical Pandera schemas in wholecell.io.schemas.

This module is intentionally narrow for now:

- Load TSVs into pandas DataFrames.
- Validate them against the RNA-seq schemas.
- Provide a small convenience wrapper to fetch a single transcriptome given a manifest and dataset_id.

wholecell.io.ingestion._read_tsv(path)[source]

Read a tab-delimited file into a DataFrame.

Parameters: path (str | Path)
Return type: DataFrame
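
As a rough illustration of what reading a tab-delimited file into a DataFrame involves, the sketch below uses `pandas.read_csv` with a tab separator. It is not the module's actual implementation: the helper name `read_tsv_sketch` and the `gene_id` and `tpm` columns are hypothetical, and the real `_read_tsv` may pass additional options.

```python
import io

import pandas as pd


def read_tsv_sketch(source):
    """Read tab-delimited text into a DataFrame (hypothetical stand-in for _read_tsv)."""
    # sep="\t" tells pandas' CSV parser to split fields on tabs.
    return pd.read_csv(source, sep="\t")


# In-memory example so the sketch runs without touching the filesystem.
# The column names here are assumptions, not the real schema.
example = io.StringIO("gene_id\ttpm\ngeneA\t12.5\ngeneB\t0.0\n")
df = read_tsv_sketch(example)
```

The same call accepts a `str` or `Path` in place of the in-memory buffer.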

wholecell.io.ingestion.ingest_rnaseq_manifest(path)[source]

Load and validate an RNA-seq samples manifest.

Relative file_path entries are resolved relative to the manifest directory for convenience.

Parameters: path (str | Path) – Path to the manifest TSV file.
Returns: Validated manifest with file_path normalized to absolute paths.
Return type: pandas.DataFrame
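
The path rule described above (relative file_path entries are anchored at the manifest's directory) can be sketched with `pathlib` alone. The helper `resolve_manifest_entry` and the example paths are hypothetical. How the real function treats entries that are already absolute is not documented here, so keeping them unchanged is an assumption.

```python
from pathlib import Path


def resolve_manifest_entry(manifest_path, file_path):
    """Return file_path as an absolute path, anchored at the manifest's directory."""
    manifest_dir = Path(manifest_path).resolve().parent
    entry = Path(file_path)
    # Assumption: absolute entries are kept as-is; relative ones are joined
    # onto the directory that contains the manifest.
    return entry if entry.is_absolute() else manifest_dir / entry


manifest = Path("/data/rnaseq/manifest.tsv")  # hypothetical location
resolved = resolve_manifest_entry(manifest, "tables/sample_01.tsv")
```

Anchoring at the manifest directory rather than the current working directory means a manifest and its tables can be moved together without editing the file_path column.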

wholecell.io.ingestion.ingest_rnaseq_tpm_table(path)[source]

Load and validate a single RNA-seq TPM table.

Parameters: path (str | Path) – Path to a TSV file with columns matching RnaseqTpmTableSchema.
Returns: Validated DataFrame; extra columns are preserved but only the required/optional schema columns are validated.
Return type: pandas.DataFrame

wholecell.io.ingestion.ingest_transcriptome(manifest_path, dataset_id)[source]

Ingest a single transcriptome (TPM table) specified by dataset_id.

This is a convenience wrapper that:

1) Validates the manifest.
2) Looks up the row with the given dataset_id.
3) Loads and validates the corresponding TPM table.

Parameters:

manifest_path (str | Path) – Path to the RNA-seq samples manifest TSV.
dataset_id (str) – Identifier of the dataset to load (must match a dataset_id row).

Returns:

Validated TPM table for the requested dataset.
Metadata dict for the selected manifest row.

Return type:

(pandas.DataFrame, dict)

Raises:

KeyError – If dataset_id is not found in the manifest.
ValueError – If multiple rows share the same dataset_id.
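
The lookup step and its two error conditions might look roughly like the sketch below. It only mirrors the documented behaviour (KeyError when there is no match, ValueError when there are several) and is not the module's source. The helper `select_manifest_row` and the example manifest contents are hypothetical.

```python
import pandas as pd


def select_manifest_row(manifest, dataset_id):
    """Return the single manifest row for dataset_id as a plain dict."""
    matches = manifest[manifest["dataset_id"] == dataset_id]
    if matches.empty:
        raise KeyError(f"dataset_id {dataset_id!r} not found in manifest")
    if len(matches) > 1:
        raise ValueError(f"dataset_id {dataset_id!r} matches {len(matches)} rows")
    # Exactly one match: convert the row to a metadata dict.
    return matches.iloc[0].to_dict()


# Hypothetical manifest: "ds2" is deliberately duplicated.
manifest = pd.DataFrame(
    {
        "dataset_id": ["ds1", "ds2", "ds2"],
        "file_path": ["/tmp/ds1.tsv", "/tmp/ds2a.tsv", "/tmp/ds2b.tsv"],
    }
)
row = select_manifest_row(manifest, "ds1")
```

Under this sketch, requesting `"ds2"` raises ValueError and requesting an id that is absent raises KeyError, so callers can tell a missing dataset apart from an ambiguous manifest.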
