Turn bulk RNA-seq cohorts into synthetic single-cell datasets using omicverse's Bulk2Single workflow for cell fraction estimation, beta-VAE generation, and quality control comparisons against reference scRNA-seq.
After installing, this skill will be available to your AI coding assistant.
Verify installation with `skills list`.
name: bulk-rna-seq-deconvolution-with-bulk2single
title: Bulk RNA-seq deconvolution with Bulk2Single
description: Turn bulk RNA-seq cohorts into synthetic single-cell datasets using omicverse's Bulk2Single workflow for cell fraction estimation, beta-VAE generation, and quality control comparisons against reference scRNA-seq.
Bulk RNA-seq deconvolution with Bulk2Single
Overview
Use this skill when a user wants to reconstruct single-cell profiles from bulk RNA-seq together with a matched reference scRNA-seq atlas. It follows `t_bulk2single.ipynb`, which demonstrates how to harmonise mouse dentate gyrus bulk replicates (`dg_d_1` to `dg_d_3`), train the beta-VAE generator, and benchmark the generated cells against dentate gyrus scRNA-seq.
Instructions
- Load libraries and data
  - Import `omicverse as ov`, `scanpy as sc`, `scvelo as scv`, `anndata`, and `matplotlib.pyplot as plt`, then call `ov.plot_set()` to apply omicverse plot styling.
  - Read the bulk counts table with `ov.read(...)`/`ov.utils.read(...)` and harmonise gene identifiers via `ov.bulk.Matrix_ID_mapping(<df>, 'genesets/pair_GRCm39.tsv')`.
  - Load the reference scRNA-seq AnnData (e.g., `scv.datasets.dentategyrus()`) and confirm that the cluster labels are stored in `adata.obs['clusters']`.
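Conceptually, harmonising identifiers means renaming the bulk matrix's gene IDs through a two-column lookup table. The sketch below illustrates that idea in plain Python; it is not omicverse's `Matrix_ID_mapping` implementation. The tab-separated `gene_id<TAB>symbol` layout of the pair file, the dict-of-lists stand-in for a counts table, the choice to sum counts when two IDs map to one symbol, and the helper names `load_id_pairs`/`map_gene_ids` are all assumptions.

```python
import csv
import io


def load_id_pairs(handle):
    """Read a tab-separated `gene_id<TAB>symbol` table into a dict.

    The two-column layout is an assumption about the pair file; a header row,
    if present, is harmless because no gene is literally called "gene_id".
    """
    pairs = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        pairs[row[0]] = row[1]
    return pairs


def map_gene_ids(counts, pairs):
    """Rename gene IDs to symbols, drop unmapped IDs, sum duplicate symbols.

    `counts` maps gene_id -> list of per-sample counts (a hypothetical layout
    standing in for the rows of a bulk counts DataFrame).
    """
    mapped = {}
    for gene_id, values in counts.items():
        symbol = pairs.get(gene_id)
        if symbol is None:
            continue  # no mapping for this ID: drop the row
        if symbol in mapped:
            # Two IDs resolved to one symbol: sum their counts (an assumed policy).
            mapped[symbol] = [a + b for a, b in zip(mapped[symbol], values)]
        else:
            mapped[symbol] = list(values)
    return mapped


# Toy inputs; IDs and symbols are made up for illustration.
pair_file = io.StringIO(
    "gene_id\tsymbol\n"
    "TOY0001\tGeneA\n"
    "TOY0002\tGeneB\n"
    "TOY0003\tGeneB\n"
)
pairs = load_id_pairs(pair_file)
toy_counts = {
    "TOY0001": [10, 12, 9],
    "TOY0002": [3, 0, 1],
    "TOY0003": [5, 5, 5],
    "TOY9999": [1, 1, 1],  # not in the pair table, so it is dropped
}
mapped = map_gene_ids(toy_counts, pairs)
```

In a real session the omicverse call handles this on a DataFrame; the sketch only makes the renaming and dropping behaviour explicit.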
- Initialise the Bulk2Single model
  - Instantiate `ov.bulk2single.Bulk2Single(bulk_data=bulk_df, single_data=adata, celltype_key='clusters', bulk_group=['dg_d_1', 'dg_d_2', 'dg_d_3'], top_marker_num=200, ratio_num=1, gpu=0)`.
  - Explain GPU selection (`gpu=-1` forces CPU) and that every name in `bulk_group` must match a column ID in the bulk matrix.
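Because `bulk_group` names must match the bulk matrix's column IDs exactly, a pre-flight check before instantiating the model can catch typos early. This is a minimal sketch using the standard library's `difflib` to suggest the closest existing column; `check_bulk_group` is a hypothetical helper, and in practice you would pass `bulk_df.columns` as the first argument.

```python
import difflib


def check_bulk_group(bulk_columns, bulk_group):
    """Map each bulk_group name missing from the bulk matrix to its closest column ID.

    An empty result means every name in bulk_group matches a column exactly.
    """
    columns = list(bulk_columns)
    report = {}
    for name in bulk_group:
        if name not in columns:
            # Suggest the most similar existing column ID, if any is close enough.
            report[name] = difflib.get_close_matches(name, columns, n=1)
    return report


# Hypothetical bulk column IDs and a bulk_group containing one typo.
toy_columns = ["dg_d_1", "dg_d_2", "dg_d_3", "ca1_d_1"]
report = check_bulk_group(toy_columns, ["dg_d_1", "dg_d_2", "dg-d-3"])
```

Running this check first turns an opaque alignment error inside the model into a readable list of suspect names.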
- Estimate cell fractions
  - Call `model.predicted_fraction()` to run the integrated TAPE estimator, then plot stacked bar charts per sample to validate the proportions.
  - Encourage saving the fraction table for downstream reporting (`df.to_csv(...)`).
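Before saving, it can help to confirm that each sample's fractions sum to roughly one. The stdlib-only sketch below assumes the fractions are available as a mapping of sample to `{cell type: fraction}`; the actual return type of `predicted_fraction()` may differ, and with a pandas DataFrame you would simply call `df.to_csv(...)`. The helpers `check_fractions` and `fractions_to_csv`, the tolerance, and the toy values are all hypothetical.

```python
import csv
import io
import math


def check_fractions(fractions, tol=1e-3):
    """Return {sample: total} for samples whose fractions do not sum to 1 within tol."""
    bad = {}
    for sample, by_type in fractions.items():
        total = math.fsum(by_type.values())
        if abs(total - 1.0) > tol:
            bad[sample] = total
    return bad


def fractions_to_csv(fractions, handle):
    """Write one row per sample and one column per cell type (missing types as 0)."""
    cell_types = sorted({ct for by_type in fractions.values() for ct in by_type})
    writer = csv.writer(handle)
    writer.writerow(["sample", *cell_types])
    for sample, by_type in fractions.items():
        writer.writerow([sample, *(by_type.get(ct, 0.0) for ct in cell_types)])


# Toy values; dg_d_2 deliberately sums to 0.9 to show the check firing.
toy_fractions = {
    "dg_d_1": {"Granule mature": 0.6, "Astrocytes": 0.3, "Microglia": 0.1},
    "dg_d_2": {"Granule mature": 0.5, "Astrocytes": 0.3, "Microglia": 0.1},
}
problems = check_fractions(toy_fractions)
buffer = io.StringIO()
fractions_to_csv(toy_fractions, buffer)
```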
- Preprocess for beta-VAE
  - Execute `model.bulk_preprocess_lazy()`, `model.single_preprocess_lazy()`, and `model.prepare_input()` to produce matched feature spaces.
  - Clarify that the lazy preprocessing expects raw counts; if the user has already log-normalised the data, skip it and provide aligned matrices manually instead.
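Deciding whether the lazy path applies comes down to whether the inputs are still raw counts, and manual alignment starts from the genes both datasets share. Both helpers below are hypothetical, not part of omicverse: `looks_like_raw_counts` is a heuristic (non-negative whole numbers) that can flag clearly normalised data but cannot prove data are raw, and `shared_genes` is one possible alignment rule.

```python
def looks_like_raw_counts(values):
    """Heuristic: raw counts are non-negative whole numbers.

    Log-normalised or scaled data usually contain fractional values, so this
    returns False for them. It cannot prove the data are raw, only flag data
    that clearly are not.
    """
    return all(v >= 0 and float(v).is_integer() for v in values)


def shared_genes(bulk_genes, single_genes):
    """Return genes present in both datasets, keeping the bulk matrix's order."""
    single = set(single_genes)
    return [g for g in bulk_genes if g in single]


raw_ok = looks_like_raw_counts([0, 3, 12, 7, 0])
lognorm_ok = looks_like_raw_counts([0.0, 1.386, 2.565])  # toy log1p-style values
common = shared_genes(["Snap25", "Gfap", "Prox1"], ["Prox1", "Snap25", "Aqp4"])
```

Subsetting both matrices to the same ordered gene list is what "matched feature spaces" means in practice.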
- Train or load the beta-VAE
  - Train with `model.train(batch_size=512, learning_rate=1e-4, hidden_size=256, epoch_num=3500, vae_save_dir='...', vae_save_name='dg_vae', generate_save_dir='...', generate_save_name='dg')`.
  - Mention early stopping via `patience`, and how to resume by reloading weights with `model.load('.../dg_vae.pth')`.
  - Use `model.plot_loss()` to monitor convergence.
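The `patience` setting refers to early stopping. The sketch below shows the generic rule: stop once the monitored loss has not improved for `patience` consecutive epochs. It illustrates the technique and is not Bulk2Single's training loop; the `early_stop_epoch` helper and its `min_delta` threshold are assumptions.

```python
def early_stop_epoch(losses, patience, min_delta=0.0):
    """Return the epoch index at which training would stop, or None if it never does.

    An epoch counts as an improvement when its loss beats the best so far
    by more than `min_delta`.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0  # reset the counter on improvement
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Toy loss curve: improves, plateaus, improves once more, then plateaus.
toy_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.69, 0.70, 0.70, 0.70]
stop_at = early_stop_epoch(toy_losses, patience=3)
```

A larger `patience` tolerates longer plateaus, at the cost of more epochs spent after the loss has levelled off.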
- Generate and filter synthetic cells
  - Produce an AnnData with `model.generate()` and reduce noise with `model.filtered(generate_adata, leiden_size=25)`.
  - Store the filtered AnnData (`.write_h5ad`) for reuse, noting that it contains PCA embeddings in `obsm['X_pca']`.
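`model.filtered(...)` treats small Leiden clusters as noise. Assuming `leiden_size` acts as a minimum cluster size (cells in clusters with fewer members are dropped), the core filtering step looks like the sketch below. Clustering is assumed to have already produced one label per cell, and `drop_small_clusters` is a hypothetical helper rather than the omicverse code.

```python
from collections import Counter


def drop_small_clusters(labels, min_size):
    """Return (kept cell indices, noisy cluster labels).

    Clusters with fewer than `min_size` cells are treated as noise.
    """
    sizes = Counter(labels)
    noisy = sorted(label for label, n in sizes.items() if n < min_size)
    noisy_set = set(noisy)
    kept = [i for i, label in enumerate(labels) if label not in noisy_set]
    return kept, noisy


# Toy Leiden labels: two sizeable clusters and one 5-cell cluster.
toy_labels = ["0"] * 40 + ["1"] * 30 + ["2"] * 5
kept, noisy = drop_small_clusters(toy_labels, min_size=25)
```

On an AnnData the kept indices would correspond to subsetting rows, e.g. keeping cells whose `obs['leiden']` label is not in the noisy list.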
- Benchmark against the reference atlas
  - Plot cell-type compositions with `ov.bulk2single.bulk2single_plot_cellprop(...)` for both the generated and reference data.
  - Assess correlation with `ov.bulk2single.bulk2single_plot_correlation(single_data, generate_adata, celltype_key='clusters')`.
  - Embed with `generate_adata.obsm['X_mde'] = ov.utils.mde(generate_adata.obsm['X_pca'])` and visualise via `ov.utils.embedding(..., color=['clusters'], palette=ov.utils.pyomic_palette())`.
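As a rough numeric companion to the composition plots, the sketch below compares per-cluster proportions between reference and generated labels and reports a Pearson correlation across cell types. It is not what `bulk2single_plot_cellprop` or `bulk2single_plot_correlation` compute internally; the helpers `proportions`, `pearson`, and `compare_composition` and the toy labels are hypothetical, standing in for `adata.obs['clusters']` and `generate_adata.obs['clusters']`.

```python
import math
from collections import Counter


def proportions(labels):
    """Fraction of cells per cell type."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either vector is constant."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def compare_composition(reference_labels, generated_labels):
    """Per-type proportion differences (generated - reference) and their correlation."""
    ref = proportions(reference_labels)
    gen = proportions(generated_labels)
    cell_types = sorted(set(ref) | set(gen))  # types missing on one side count as 0
    ref_vec = [ref.get(ct, 0.0) for ct in cell_types]
    gen_vec = [gen.get(ct, 0.0) for ct in cell_types]
    diffs = {ct: g - r for ct, r, g in zip(cell_types, ref_vec, gen_vec)}
    return diffs, pearson(ref_vec, gen_vec)


# Toy cluster labels; counts are made up for illustration.
reference = ["Granule mature"] * 50 + ["Astrocytes"] * 30 + ["Microglia"] * 20
generated = ["Granule mature"] * 45 + ["Astrocytes"] * 35 + ["Microglia"] * 20
diffs, r = compare_composition(reference, generated)
```

A high correlation with small per-type differences suggests the generator reproduced the reference composition; large differences for a single cluster point to where to look in the plots.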
- Troubleshooting tips
  - If marker selection fails, increase `top_marker_num` or provide a curated marker list.
  - Alignment errors typically stem from mismatched `bulk_group` names; double-check the column IDs in the bulk matrix.
  - Training on CPU can take several hours; advise switching `gpu` to an available CUDA device for speed.
Examples
- "Estimate cell fractions for PDAC bulk replicates and generate synthetic scRNA-seq using Bulk2Single."
- "Load a pre-trained Bulk2Single model, regenerate cells, and compare cluster proportions to the dentate gyrus atlas."
- "Plot correlation heatmaps between generated cells and reference clusters after filtering noisy synthetic cells."
References
- Tutorial notebook: `t_bulk2single.ipynb`
- Example data and weights: `omicverse_guide/docs/Tutorials-bulk2single/data/`
- Quick copy/paste commands: `reference.md`