wholecell.io.ingestion
Utilities for ingesting experimental data (e.g. RNA-seq transcriptomes) using the canonical Pandera schemas in wholecell.io.schemas.
This module is intentionally narrow for now:
- Load TSVs into pandas DataFrames.
- Validate them against the RNA-seq schemas.
- Provide a small convenience wrapper to fetch a single transcriptome given a manifest and dataset_id.
- wholecell.io.ingestion.ingest_rnaseq_manifest(path)
Load and validate an RNA-seq samples manifest.
Relative file_path entries are resolved relative to the manifest directory for convenience.
- Parameters:
path – Path to the RNA-seq samples manifest.
- Returns:
Validated manifest with file_path normalized to absolute paths.
- Return type:
pandas.DataFrame
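The path normalization described above can be sketched with plain pandas and pathlib. This is a minimal illustration, not the module's implementation. The helper name `resolve_manifest_paths` is hypothetical, and Pandera validation is left out. Using `Path.resolve()` as the normalization step is also an assumption: it collapses `..` segments and follows symlinks.

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd


def resolve_manifest_paths(manifest: pd.DataFrame, manifest_path: str | Path) -> pd.DataFrame:
    """Hypothetical helper: return a copy of ``manifest`` with absolute ``file_path`` entries.

    Relative entries are joined onto the directory containing the manifest;
    every entry is then normalized with Path.resolve().
    """
    manifest_dir = Path(manifest_path).resolve().parent

    def to_absolute(entry: str) -> str:
        path = Path(entry)
        if not path.is_absolute():
            path = manifest_dir / path  # relative to the manifest, not the CWD
        return str(path.resolve())

    resolved = manifest.copy()  # leave the caller's DataFrame untouched
    resolved["file_path"] = resolved["file_path"].map(to_absolute)
    return resolved


if __name__ == "__main__":
    manifest = pd.DataFrame(
        {"dataset_id": ["a", "b"], "file_path": ["tpm/a.tsv", "/shared/b.tsv"]}
    )
    print(resolve_manifest_paths(manifest, "/data/run1/manifest.tsv"))
```

Resolving against the manifest's own directory means the manifest and its TPM files can be moved together without rewriting any paths.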
- wholecell.io.ingestion.ingest_rnaseq_tpm_table(path)
Load and validate a single RNA-seq TPM table.
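Loading a TPM table amounts to reading a tab-separated file and checking it against a schema. The sketch below is an assumption-heavy stand-in. It assumes hypothetical `gene_id` and `tpm` columns, and it replaces the Pandera schema in wholecell.io.schemas (whose actual columns and rules are not documented here) with a few hand-written pandas checks. `load_tpm_table` is an illustrative name, not part of the module.

```python
import io

import pandas as pd

# Hypothetical column names; the real schema lives in wholecell.io.schemas.
REQUIRED_COLUMNS = ("gene_id", "tpm")


def load_tpm_table(source) -> pd.DataFrame:
    """Read a tab-separated TPM table and run basic stand-in checks.

    ``source`` is anything pandas.read_csv accepts (a path or a file-like object).
    """
    table = pd.read_csv(source, sep="\t")

    missing = [col for col in REQUIRED_COLUMNS if col not in table.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    if table["gene_id"].duplicated().any():
        raise ValueError("gene_id values must be unique")

    # Non-numeric entries make pd.to_numeric raise ValueError.
    tpm = pd.to_numeric(table["tpm"], errors="raise").astype(float)
    if tpm.isna().any():
        raise ValueError("tpm column contains missing values")
    if (tpm < 0).any():
        raise ValueError("TPM values must be non-negative")
    return table.assign(tpm=tpm)


if __name__ == "__main__":
    tsv = "gene_id\ttpm\ngeneA\t12.5\ngeneB\t0\n"
    print(load_tpm_table(io.StringIO(tsv)))
```

A declarative Pandera schema expresses the same kinds of rules as column definitions and checks, which keeps them in one place rather than scattered through the loader.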
- wholecell.io.ingestion.ingest_transcriptome(manifest_path, dataset_id)
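Ingest a single transcriptome (TPM table) specified by dataset_id.
This is a convenience wrapper that:
1) Validates the manifest.
2) Looks up the row with the given dataset_id.
3) Loads and validates the corresponding TPM table.
- Parameters:
manifest_path – Path to the RNA-seq samples manifest.
dataset_id – Identifier of the manifest row (dataset) to ingest.
- Returns:
A tuple containing the validated TPM table for the requested dataset, followed by the metadata dict for the selected manifest row.
- Return type:
tuple[pandas.DataFrame, dict]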
- Raises:
KeyError – If dataset_id is not found in the manifest.
ValueError – If multiple rows share the same dataset_id.
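Steps 2) and 3), together with the documented KeyError/ValueError contract, might look roughly like the sketch below. It assumes the manifest identifies datasets through a `dataset_id` column. `select_manifest_row` and `ingest_transcriptome_sketch` are hypothetical names. The table loader is passed in as a callable so the lookup logic can be exercised without files on disk.

```python
from __future__ import annotations

from typing import Any, Callable

import pandas as pd


def select_manifest_row(manifest: pd.DataFrame, dataset_id: str) -> dict[str, Any]:
    """Return the single manifest row matching ``dataset_id`` as a dict.

    Raises KeyError if no row matches and ValueError if more than one does,
    mirroring the error contract documented for ingest_transcriptome.
    """
    matches = manifest.loc[manifest["dataset_id"] == dataset_id]
    if matches.empty:
        raise KeyError(f"dataset_id {dataset_id!r} not found in manifest")
    if len(matches) > 1:
        raise ValueError(
            f"dataset_id {dataset_id!r} appears {len(matches)} times in manifest"
        )
    return matches.iloc[0].to_dict()


def ingest_transcriptome_sketch(
    manifest: pd.DataFrame,
    dataset_id: str,
    load_table: Callable[[str], Any],
) -> tuple[Any, dict[str, Any]]:
    """Hypothetical wrapper: look up the row, then load the file it points to."""
    metadata = select_manifest_row(manifest, dataset_id)
    table = load_table(metadata["file_path"])
    return table, metadata


if __name__ == "__main__":
    manifest = pd.DataFrame(
        {"dataset_id": ["a", "b"], "file_path": ["/data/a.tsv", "/data/b.tsv"]}
    )
    # A stand-in loader that only reports which path it was given.
    table, meta = ingest_transcriptome_sketch(manifest, "b", lambda p: f"<table from {p}>")
    print(table, meta)
```

Checking for duplicates explicitly, rather than silently taking the first match, surfaces manifest errors early. This matches the ValueError documented above.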