Methods & Models

Comprehensive overview of models, datasets, and evaluation methodology

Systematic evaluation framework covering 12 models plus 3 baselines, 25 datasets, and 24 metrics

Evaluated Models

Gene Regulatory Network-Based Models

GEARS

Strong R² Performance

Graph-enhanced autoencoder using regulatory information to predict perturbation effects.

  • Graph autoencoder architecture
  • Gene regulatory network integration
  • Best R² = 0.866 (Task 1)

scELMo

Consistent Performance

Ensemble learning approach leveraging gene regulatory priors.

  • Gene regulatory network integration
  • Ensemble learning method
  • Reliable baseline performance

Foundation Models

scGPT

Transformer-Based

Large language model adapted for single-cell genomics with gene tokenization.

  • GPT architecture for genomics
  • Gene-level tokenization
  • Pre-trained on large corpora

scFoundation

Large-Scale Pretraining

Foundation model trained on massive single-cell datasets with variance compression effects.

  • Large-scale pretraining
  • Shows variance compression
  • Mixed performance across tasks

Deep Learning & Generative Models

scGen

Best Cell Transfer

Variational autoencoder that transfers perturbation responses across cell types via latent-space vector arithmetic.
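The latent-space transfer idea commonly attributed to scGen can be sketched in a few lines: estimate the perturbation as a shift between mean latent codes on a source cell type, then apply that shift to control cells of an unseen target. The NumPy sketch below is illustrative only; the encoder/decoder are omitted and `latent_arithmetic_transfer` is a hypothetical helper name, not scGen's API.

```python
import numpy as np

def latent_arithmetic_transfer(z_ctrl_src, z_pert_src, z_ctrl_tgt):
    """scGen-style transfer sketch: estimate the perturbation as a
    mean latent-space shift on a source cell type, then apply that
    shift to control cells of an unseen target cell type."""
    delta = z_pert_src.mean(axis=0) - z_ctrl_src.mean(axis=0)
    return z_ctrl_tgt + delta  # predicted perturbed latent codes

# Toy example: the perturbation shifts every latent dimension by +1
z_ctrl_src = np.zeros((3, 2))
z_pert_src = np.ones((3, 2))
z_ctrl_tgt = np.array([[2.0, 2.0]])
pred = latent_arithmetic_transfer(z_ctrl_src, z_pert_src, z_ctrl_tgt)
```

In the full model the arithmetic happens in the VAE's latent space and the decoder maps the shifted codes back to expression space.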

trVAE

Transfer Learning

Conditional variational autoencoder with MMD regularization for cross-context generalization.

CPA

Compositional

Compositional perturbation autoencoder for interaction modeling.

Other Deep Learning Models

PerturbNet

Best DE Recovery

Distribution-aware architecture capturing the full distribution of perturbation effects rather than only the mean response. Achieves 56% DE overlap accuracy.

  • Conditional normalizing flow
  • Perturbation-cell embedding separation
  • Excels at differential expression recovery

scPreGAN

Generative Approach

Generative adversarial network for perturbation response prediction.

  • GAN-based architecture
  • Variable performance across tasks
  • Evaluated primarily in Task 3

Additional Models

Biolord

Best Delta Accuracy

Biologically informed representation learning. Achieves 83.9% delta accuracy in Task 1.

scVIDR

Cell Transfer

Variational inference for differential response. Evaluated in Task 3.

scPRAM

Transfer Learning

Perturbation response analysis method. Evaluated in Task 3.

Benchmark Datasets

25 carefully curated datasets spanning different cell types, perturbation types, and experimental conditions

High-Quality References

  • Kang et al. - Immune cell perturbations
  • Hagai et al. - Cross-species responses
  • Srivatsan et al. - Chemical transcriptomics (sci-Plex)
  • Perturb_KHP - Kinase inhibitors

Diverse Contexts

  • Cell Types: Immune, stem, cancer, primary
  • Perturbations: Genetic, chemical, optical
  • Scales: Single-cell to population-level
  • Conditions: Various culture and treatment protocols

Evaluation Framework

Absolute Accuracy

  • MSE: Mean Squared Error
  • RMSE: Root Mean Squared Error
  • MAE: Mean Absolute Error
  • L2: L2 norm distance

Relative Effect Capture

  • Pearson Correlation: Linear relationships
  • R²: Variance explained
  • Cosine Similarity: Direction alignment
  • Delta Agreement Accuracy: Direction prediction accuracy
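The relative-effect metrics operate on the perturbation delta (perturbed minus control expression) as well as on the raw profiles. The sketch below uses one common convention for each metric; in particular, delta agreement accuracy is taken here as the fraction of genes whose predicted change direction matches the observed one, which may differ from the benchmark's exact definition.

```python
import numpy as np

def relative_effect_metrics(ctrl, true_pert, pred_pert):
    """Relative-effect metrics comparing a predicted perturbation
    response against the observed one, given a control profile."""
    ctrl = np.asarray(ctrl, float)
    t = np.asarray(true_pert, float)
    p = np.asarray(pred_pert, float)
    d_true, d_pred = t - ctrl, p - ctrl
    # Pearson correlation of predicted vs. observed profiles
    pearson = np.corrcoef(t, p)[0, 1]
    # R^2: variance in the observed profile explained by the prediction
    r2 = 1.0 - np.sum((t - p) ** 2) / np.sum((t - t.mean()) ** 2)
    # Cosine similarity: direction alignment of the two deltas
    cosine = d_true @ d_pred / (np.linalg.norm(d_true) * np.linalg.norm(d_pred))
    # Delta agreement: fraction of genes with matching change direction
    delta_acc = np.mean(np.sign(d_true) == np.sign(d_pred))
    return {"pearson": pearson, "r2": r2, "cosine": cosine, "delta_acc": delta_acc}

ctrl = np.array([1.0, 1.0, 1.0, 1.0])
m = relative_effect_metrics(ctrl, [2.0, 0.0, 1.0, 3.0], [1.5, 0.5, 1.0, 2.0])
```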

Distribution Similarity

  • MMD: Maximum Mean Discrepancy
  • Wasserstein: Earth mover's distance
  • Pearson Correlation Delta: Correlation of expression changes
  • Cosine Similarity Delta: Direction of expression changes
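MMD and the Wasserstein distance compare whole cell populations rather than mean profiles. The sketch below is a minimal illustration: a single fixed RBF bandwidth `gamma` (benchmark implementations often use multi-scale kernels), and a 1-D Wasserstein distance simplified to equal-sized samples, where it reduces to the mean absolute difference of sorted values.

```python
import numpy as np

def mmd_rbf(x, y, gamma=1.0):
    """Squared Maximum Mean Discrepancy between two cell populations
    (rows = cells, columns = genes) under an RBF kernel."""
    def kernel(a, b):
        # Pairwise squared Euclidean distances, then the RBF kernel
        d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-gamma * d2)
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

def wasserstein_1d(u, v):
    """Earth mover's distance between two equal-sized 1-D samples:
    mean absolute difference of the sorted values."""
    return np.mean(np.abs(np.sort(u) - np.sort(v)))

# Identical populations give MMD = 0; well-separated ones give MMD > 0
x = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([[5.0, 5.0], [6.0, 6.0]])
```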

Gene Subset Analysis

Evaluation across different gene sets to assess model focus and biological relevance

Differentially Expressed Genes

  • Top20 DE: Most significantly changed
  • Top50 DE: Expanded significant set
  • Top100 DE: Broader response genes

Comprehensive Analysis

  • All Genes: Genome-wide predictions
  • Common Genes: Shared across datasets
  • Combined Sets: Union of significant genes
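Selecting a Top-k DE gene set can be sketched by ranking genes on the magnitude of their mean expression change between perturbed and control cells. This is only a proxy for significance; benchmark pipelines typically rank by a statistical test (e.g. Wilcoxon) instead, and `top_k_de_genes` is a hypothetical helper name.

```python
import numpy as np

def top_k_de_genes(ctrl, pert, k=20):
    """Rank genes by absolute mean expression change between
    perturbed and control cells; return the top-k gene indices."""
    delta = np.asarray(pert, float).mean(axis=0) - np.asarray(ctrl, float).mean(axis=0)
    order = np.argsort(-np.abs(delta))  # descending by |change|
    return order[:k]

# Toy example: 3 cells x 5 genes, with per-gene mean changes
# of [0, +2, -3, +1, +0.5] relative to control
ctrl = np.zeros((3, 5))
pert = np.tile([0.0, 2.0, -3.0, 1.0, 0.5], (3, 1))
top = top_k_de_genes(ctrl, pert, k=2)
```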

Evaluation Methodology

Data Preprocessing

  • Normalization: Library size and log transformation
  • Quality Control: Cell and gene filtering
  • Batch Correction: Technical artifact removal
  • Feature Selection: Highly variable genes
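The normalization and feature-selection steps above can be illustrated end to end in NumPy. This is a minimal sketch, not the benchmark's exact pipeline (which likely relies on a toolkit such as scanpy): here HVG selection is simply top post-log variance, and `preprocess` is a hypothetical helper name.

```python
import numpy as np

def preprocess(counts, target_sum=1e4, n_top_genes=2000):
    """Library-size normalization, log1p transform, and selection of
    highly variable genes (here simply by post-log variance)."""
    counts = np.asarray(counts, dtype=float)
    # Normalize each cell (row) to the same total count
    lib = counts.sum(axis=1, keepdims=True)
    logged = np.log1p(counts / lib * target_sum)
    # Highly variable genes: columns with the largest variance
    hvg = np.argsort(-logged.var(axis=0))[:n_top_genes]
    return logged[:, hvg], hvg

# Toy example: 2 cells x 3 genes; gene 2 is constant across cells
# after normalization, so the two variable genes are selected
X, hvg = preprocess([[10, 0, 10], [0, 20, 20]], n_top_genes=2)
```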

Training Setup

  • Cross-Validation: Stratified splits by perturbation
  • Hyperparameters: Grid search optimization
  • Early Stopping: Validation-based convergence
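One common reading of "stratified splits by perturbation" is holding out entire perturbations, so every test condition is unseen during training (another reading is proportional stratification). The pure-Python sketch below illustrates the hold-out reading; `split_by_perturbation` is a hypothetical helper name, and the fixed seed ties in with the reproducibility notes below.

```python
import random

def split_by_perturbation(cell_labels, test_frac=0.25, seed=0):
    """Hold out whole perturbations: every cell of a held-out
    perturbation goes to the test set, so models are evaluated
    only on unseen conditions."""
    perts = sorted(set(cell_labels))
    rng = random.Random(seed)      # fixed seed for reproducible splits
    rng.shuffle(perts)
    n_test = max(1, int(len(perts) * test_frac))
    test_perts = set(perts[:n_test])
    train_idx = [i for i, p in enumerate(cell_labels) if p not in test_perts]
    test_idx = [i for i, p in enumerate(cell_labels) if p in test_perts]
    return train_idx, test_idx

# Toy example: 4 perturbations x 2 cells each; one perturbation held out
labels = ["A", "A", "B", "B", "C", "C", "D", "D"]
tr, te = split_by_perturbation(labels)
```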

Statistical Analysis

  • Effect Size: Perturbation magnitude analysis
  • Significance Testing: Multiple comparison correction
  • Confidence Intervals: Bootstrap-based estimates
  • Ranking: Multi-metric performance scores

Reproducibility

  • Random Seeds: Fixed for consistent results
  • Code Availability: Open-source implementations
  • Data Sharing: Standardized benchmark datasets

Limitations & Future Directions

Current Limitations

  • Limited temporal dynamics modeling
  • Single-cell vs. population-level effects
  • Dataset-specific biases and artifacts
  • Computational scalability challenges

Future Improvements

  • Time-series perturbation responses
  • Multi-modal data integration
  • Causal inference frameworks
  • Real-time prediction systems