Quick-reference sheet for OmicVerse tutorials spanning MOFA, GLUE pairing, SIMBA integration, TOSICA transfer, and StaVIA cartography.
name: single-cell-multi-omics-integration
title: Single-cell multi-omics integration
description: Quick-reference sheet for OmicVerse tutorials spanning MOFA, GLUE pairing, SIMBA integration, TOSICA transfer, and StaVIA cartography.
Single-Cell Multi-Omics Tutorials Cheat Sheet
This skill walk-through summarizes the OmicVerse notebooks that cover paired and unpaired multi-omic integration, multi-batch embedding, reference transfer, and trajectory cartography.
MOFA on paired scRNA + scATAC (t_mofa.ipynb)
- Data preparation: Load preprocessed AnnData objects for RNA (`rna_p_n_raw.h5ad`) and ATAC (`atac_p_n_raw.h5ad`) with `ov.utils.read`, and initialise `pyMOFA` with matching `omics` and `omics_name` lists.
- Model training: Call `mofa_preprocess()` to select highly variable features and run the factor model with `mofa_run(outfile=...)`, which exports the learned MOFA+ factors to an HDF5 model file.
- Result inspection: Reload downstream AnnData, append factor scores via `ov.single.factor_exact`, and explore factor–cluster associations using `factor_correlation`, `get_weights`, and the plotting helpers in `pyMOFAART` (`plot_r2`, `plot_cor`, `plot_factor`, `plot_weights`, etc.).
- Export workflow: Persist factors and weights through the MOFA HDF5 artifact and reuse them by instantiating `pyMOFAART(model_path=...)` for later annotation or visualisation sessions.
- Dependencies & hardware: Requires `mofapy2`; plots optionally rely on `pymde`/`scvi-tools` but run on CPU.
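The training steps above can be sketched as a single helper. This is a minimal sketch rather than the notebook's code: it assumes `pyMOFA` is exposed as `ov.single.pyMOFA` and that `"RNA"` and `"ATAC"` are acceptable `omics_name` labels, and the helper name `run_paired_mofa` and its arguments are hypothetical.

```python
from pathlib import Path


def run_paired_mofa(rna_path, atac_path, outfile):
    """Train MOFA on paired RNA/ATAC AnnData files and return the model path.

    Hypothetical helper: the omicverse calls mirror the steps listed above,
    but the namespace ov.single.pyMOFA is an assumption.
    """
    # Imported lazily so this module can be loaded without omicverse installed.
    import omicverse as ov

    # Load the preprocessed modalities.
    rna = ov.utils.read(rna_path)
    atac = ov.utils.read(atac_path)

    # One AnnData per modality, with a matching list of modality names.
    mofa = ov.single.pyMOFA(omics=[rna, atac], omics_name=["RNA", "ATAC"])

    # Select highly variable features, then fit and export the factor model.
    mofa.mofa_preprocess()
    outfile = Path(outfile)
    outfile.parent.mkdir(parents=True, exist_ok=True)
    mofa.mofa_run(outfile=str(outfile))
    return outfile
```

The returned path is the HDF5 artifact that `pyMOFAART(model_path=...)` is described as reopening in later sessions.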
MOFA after GLUE pairing (t_mofa_glue.ipynb)
- Data preparation: Start from GLUE-derived embeddings (`rna-emb.h5ad`, `atac.emb.h5ad`), build a `GLUE_pair` object, and run `correlation()` to align unpaired cells before subsetting to highly variable features.
- Model training: Instantiate `pyMOFA` with the aligned AnnData objects, run `mofa_preprocess()`, and save the joint factors through `mofa_run(outfile='models/chen_rna_atac.hdf5')`.
- Result inspection: Use `pyMOFAART` plus the AnnData that now contains the GLUE embeddings to compute factors (`get_factors`) and visualise variance explained, factor–cluster correlations, and ranked feature weights.
- Export workflow: Reuse the saved MOFA HDF5 model for downstream inspection; GLUE embeddings can be projected to 2D with `scvi.model.utils.mde` (GPU-accelerated MDE is optional, `sc.tl.umap` works on CPU).
- Dependencies & hardware: Requires both `mofapy2` and the GLUE tooling (`scglue`, `scvi-tools`, `pymde`); GPU acceleration only affects the optional MDE visualisation.
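To make the pairing step above more concrete, here is a toy, pure-Python illustration of matching unpaired cells by the correlation of their embeddings. It is not how `GLUE_pair.correlation()` is implemented; the greedy one-to-one matching, the function names, and the example vectors are all assumptions chosen for illustration.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pair_cells(rna_emb, atac_emb):
    """Greedily pair each RNA cell with its most correlated unused ATAC cell.

    Both arguments map a cell name to its embedding vector in a shared space.
    """
    # Score every RNA/ATAC combination, best correlation first.
    scored = sorted(
        (
            (pearson(r_vec, a_vec), r_name, a_name)
            for r_name, r_vec in rna_emb.items()
            for a_name, a_vec in atac_emb.items()
        ),
        reverse=True,
    )
    pairs, used_atac = {}, set()
    for _, r_name, a_name in scored:
        # Skip cells that were already matched by a higher-scoring pair.
        if r_name in pairs or a_name in used_atac:
            continue
        pairs[r_name] = a_name
        used_atac.add(a_name)
    return pairs


rna = {"r1": [1.0, 2.0, 3.0], "r2": [3.0, 1.0, 2.0]}
atac = {"a1": [3.1, 0.9, 2.2], "a2": [0.9, 2.1, 3.0]}
print(pair_cells(rna, atac))
```

Once cells are paired this way, the two modalities can be treated as if they were measured jointly, which is what allows MOFA to be trained on them.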
SIMBA batch integration (t_simba.ipynb)
- Data preparation: Fetch the concatenated AnnData (`simba_adata_raw.h5ad`) derived from multiple pancreas studies and pass it, alongside a results directory, to `pySIMBA`.
- Model training: Execute `preprocess(...)` to bin features and build a SIMBA-compatible graph, then call `gen_graph()` followed by `train(num_workers=...)` to launch PyTorch-BigGraph optimisation (can scale with CPU workers) and `load(...)` to resume trained checkpoints.
- Result inspection: Apply `batch_correction()` to obtain the harmonised AnnData with SIMBA embeddings (`X_simba`) and visualise using `mde`/`sc.tl.umap` coloured by cell type or batch.
- Export workflow: Training outputs reside in the workdir (e.g., `result_human_pancreas/pbg/graph0`); reuse them with `simba_object.load(...)` for later analyses.
- Dependencies & hardware: Requires installing `simba` and `simba_pbg` (PyTorch BigGraph backend). GPU is optional; make sure adequate CPU threads and memory are available for graph training.
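The call order described above might be wrapped as follows. This is a sketch under several assumptions: `pySIMBA` is taken to live at `ov.single.pySIMBA` and to accept the AnnData and workdir positionally, the `preprocess` arguments are left to the caller because the summary elides them, and the helper names `pick_num_workers` and `run_simba` are hypothetical.

```python
import os


def pick_num_workers(reserve=1):
    """Return a worker count that leaves `reserve` CPU cores free (at least 1)."""
    total = os.cpu_count() or 1
    return max(1, total - reserve)


def run_simba(adata, workdir, preprocess_kwargs, num_workers=None, checkpoint_dir=None):
    """Run preprocess -> gen_graph -> train -> (load) -> batch_correction.

    Hypothetical helper; ov.single.pySIMBA and its positional arguments
    are assumptions, not a confirmed signature.
    """
    # Imported lazily so this module can be loaded without omicverse installed.
    import omicverse as ov

    simba_object = ov.single.pySIMBA(adata, workdir)
    # Binning and graph-construction options are supplied by the caller.
    simba_object.preprocess(**preprocess_kwargs)
    simba_object.gen_graph()
    # Graph training scales with CPU workers, so default to most of the cores.
    simba_object.train(num_workers=num_workers or pick_num_workers())
    if checkpoint_dir is not None:
        # Point the object at the trained output, e.g. <workdir>/pbg/graph0.
        simba_object.load(checkpoint_dir)
    # Returns the harmonised AnnData carrying the SIMBA embedding.
    return simba_object.batch_correction()
```

Leaving one core free by default is only a convention; on a shared machine a smaller explicit `num_workers` is usually the safer choice.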
TOSICA reference transfer (t_tosica.ipynb)
- Data preparation: Download demo AnnData references (`demo_train.h5ad`, `demo_test.h5ad`) and required gene-set GMT files via `ov.utils.download_tosica_gmt()`; confirm datasets are log-normalised before training.
- Model training: Create `pyTOSICA` with the reference AnnData, chosen pathway mask, label key, project directory, and batch size; train with `train(epochs=...)`, then persist weights with `save()` and optionally reload via `load()`.
- Result inspection: Generate predictions on query AnnData through `predicted(pre_adata=...)`, embed with OmicVerse preprocessing and GPU-enabled `mde` (UMAP fallback available), and explore pathway attention to interpret transformer heads.
- Export workflow: Saved project folder keeps model checkpoints and attention summaries; reuse the exported assets to annotate future datasets without retraining from scratch.
- Dependencies & hardware: Needs TOSICA (PyTorch transformer) plus downloaded gene-set masks; avoid setting `depth=2` if memory is constrained. GPU acceleration improves embedding (`mde`) but training runs on standard PyTorch (CPU/GPU depending on environment).
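The train-then-predict loop above could look roughly like this. Treat it as a sketch: the `ov.single.pyTOSICA` location and the keyword names `gmt_path`, `label_name`, `project_path`, and `batch_size` are assumptions, the default values are placeholders, and `transfer_labels` is a hypothetical helper name.

```python
def transfer_labels(ref_adata, query_adata, gmt_path, label_name,
                    project_path, batch_size=8, epochs=5, depth=1):
    """Train TOSICA on a reference and annotate a query AnnData.

    Hypothetical helper; the constructor keyword names are assumptions.
    depth=1 is the default here because depth=2 needs more memory.
    """
    # Imported lazily so this module can be loaded without omicverse installed.
    import omicverse as ov

    model = ov.single.pyTOSICA(
        adata=ref_adata,            # log-normalised reference
        gmt_path=gmt_path,          # pathway mask (gene-set GMT file)
        depth=depth,
        label_name=label_name,      # column holding the reference labels
        project_path=project_path,  # folder for checkpoints and attention output
        batch_size=batch_size,
    )
    model.train(epochs=epochs)
    # Persist weights so later sessions can reload instead of retraining.
    model.save()
    return model.predicted(pre_adata=query_adata)
```

Because `save()` writes into the project folder, a later session can rebuild the object with the same arguments and call `load()` before `predicted(...)` rather than training again.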
StaVIA trajectory cartography (t_stavia.ipynb)
- Data preparation: Load example dentate gyrus velocity data via `scvelo.datasets.dentategyrus()`, preprocess with OmicVerse (`preprocess`, `scale`, `pca`, neighbours, UMAP) to populate the AnnData matrices used by VIA.
- Model training: Configure VIA hyperparameters (components, neighbours, seeds, root selection) and instantiate/run `VIA.core.VIA` on the chosen representation (`adata.obsm['scaled|original|X_pca']`).
- Result inspection: Store outputs such as pseudotime (`single_cell_pt_markov`), cluster graph abstractions, trajectory curves, atlas views, and stream plots through VIA plotting helpers.
- Export workflow: Persist derived visualisations and animations (e.g., `animate_streamplot_ov`, `animate_atlas`) to files (`.gif`) for reporting; recompute edge bundles via `make_edgebundle_milestone` when needed.
- Dependencies & hardware: Relies on `scvelo`, `pyVIA`, and OmicVerse plotting; computations are CPU-bound though producing large stream/animation outputs benefits from ample memory.
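The model-training and pseudotime steps above can be sketched as below. This sketch imports the standalone `pyVIA` package rather than the `VIA.core.VIA` path named in the summary, which is an assumption, as are the helper name `run_via`, the `pt_via` column name, and the choice to forward all hyperparameters (neighbours, seed, root selection) as caller-supplied keyword arguments.

```python
def run_via(adata, label_key, n_components,
            rep_key="scaled|original|X_pca", **via_kwargs):
    """Run VIA on a PCA representation and store pseudotime in adata.obs.

    Hypothetical helper; hyperparameters are forwarded untouched via
    via_kwargs because their names are not confirmed by the summary.
    """
    # Imported lazily so this module can be loaded without pyVIA installed.
    import pyVIA.core as via

    # Keep only the leading components of the chosen representation.
    embedding = adata.obsm[rep_key][:, :n_components]

    model = via.VIA(
        data=embedding,
        true_label=list(adata.obs[label_key]),
        **via_kwargs,
    )
    model.run_VIA()

    # Per-cell pseudotime computed by VIA, kept for plotting and export.
    adata.obs["pt_via"] = model.single_cell_pt_markov
    return model
```

Returning the fitted model keeps it available for the plotting and animation helpers, which need the VIA object rather than just the pseudotime column.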