Predicting cellular responses to perturbations not seen during training
This task evaluates how well models generalize by testing their predictions on perturbations that never appear in the training data.
In Task 1, models are trained on a subset of perturbations and then asked to predict responses to the held-out ones. This mirrors a critical real-world scenario in which researchers need to predict the effects of new drugs or genetic modifications.
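The held-out setup described above can be sketched as a split by perturbation label rather than by cell. This is a minimal illustration, not the benchmark's actual splitting code: the function name `split_by_perturbation`, the `"control"` label, the 40% test fraction, and the toy labels are all assumptions made for the example.

```python
import random

def split_by_perturbation(labels, test_fraction=0.2, seed=0, control="control"):
    """Split sample indices so that test perturbations never occur in training.

    `labels` holds one perturbation label per sample. Control samples
    (hypothetical label "control") always stay in the training set.
    """
    perturbations = sorted({p for p in labels if p != control})
    rng = random.Random(seed)
    rng.shuffle(perturbations)
    n_test = max(1, round(len(perturbations) * test_fraction))
    held_out = set(perturbations[:n_test])
    train_idx = [i for i, p in enumerate(labels) if p not in held_out]
    test_idx = [i for i, p in enumerate(labels) if p in held_out]
    return train_idx, test_idx, held_out

# Toy labels, one per cell (illustrative only).
labels = ["control", "KO_A", "KO_A", "KO_B", "control",
          "KO_C", "KO_D", "KO_E", "KO_B"]
train_idx, test_idx, held_out = split_by_perturbation(labels, test_fraction=0.4)
print("held-out perturbations:", sorted(held_out))
```

Splitting on labels instead of on individual cells is what makes the test perturbations truly unseen: a random split over cells would leak every perturbation into training.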
Figure 2: Task 1 performance results showing model comparison across different metrics and effect sizes
GEARS: R² = 0.866 average
PerturbNet: 56% DE overlap accuracy (paper)
Biolord: 83.9% direction accuracy
Pearson Correlation Delta vs Delta Agreement Accuracy
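The two metrics named here can be sketched as follows. The definitions are assumptions, since the page does not spell them out: "Pearson delta" is taken to be the Pearson correlation between predicted and observed expression changes relative to control, and "delta agreement accuracy" the fraction of genes whose predicted change has the same sign as the observed change. The expression values are made up for illustration.

```python
import math

def deltas(expression, control):
    """Per-gene change relative to the control profile."""
    return [e - c for e, c in zip(expression, control)]

def pearson(x, y):
    """Plain Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)

def direction_accuracy(pred_delta, true_delta):
    """Fraction of genes with a nonzero observed change whose predicted
    change has the same sign (assumed definition)."""
    pairs = [(p, t) for p, t in zip(pred_delta, true_delta) if t != 0]
    if not pairs:
        return float("nan")
    return sum(1 for p, t in pairs if p * t > 0) / len(pairs)

# Toy mean expression profiles over four genes (illustrative values).
control = [1.0, 2.0, 3.0, 4.0]
observed = [2.0, 1.0, 3.5, 4.0]
predicted = [1.8, 1.5, 2.9, 4.2]

true_delta = deltas(observed, control)
pred_delta = deltas(predicted, control)
pearson_delta = pearson(pred_delta, true_delta)
agreement = direction_accuracy(pred_delta, true_delta)
print(f"Pearson delta: {pearson_delta:.3f}, direction accuracy: {agreement:.3f}")
```

Working on deltas rather than raw expression removes the baseline that all conditions share, so a model cannot score well simply by reproducing the control profile.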
Performance correlates strongly with perturbation effect size: stronger perturbations are easier to predict accurately.
Graph-based models (GEARS, PerturbNet) outperform simpler baselines.
Performance varies substantially across datasets, highlighting the importance of dataset characteristics.
Models that perform well on correlation metrics often score poorly on absolute-error measures.
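The last finding follows from how the two kinds of metric are defined: correlation ignores the scale and offset of the predictions, while absolute error does not. A minimal sketch with made-up numbers, in which a hypothetical model overestimates every change by a factor of three:

```python
import math

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def mean_absolute_error(pred, true):
    """Average absolute difference between prediction and observation."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

# Observed expression changes for four genes (illustrative values).
true_delta = [0.5, -1.0, 2.0, -0.2]
# Hypothetical model: correct direction and ranking, but 3x too large.
pred_delta = [3.0 * t for t in true_delta]

r = pearson(pred_delta, true_delta)
mae = mean_absolute_error(pred_delta, true_delta)
print(f"Pearson r = {r:.3f}, MAE = {mae:.3f}")
```

Here the correlation is essentially perfect while the mean absolute error exceeds the average magnitude of the observed changes, which is why the two kinds of metric are best reported together.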