The importance of small data in drug discovery
One of the most technically interesting biology and AI stories of the last two weeks is a Nature Machine Intelligence paper, published in 2026, on a system called PrePR-CT. It focuses on a problem drug discovery keeps running into: how to predict cell-type-specific responses to small molecules when the data are limited, uneven, and full of distribution shifts. The paper frames this as a “small data regime” problem and proposes a graph-based deep learning approach that uses cell-type-specific co-expression networks as an inductive bias, rather than relying only on scale and brute-force pattern extraction. That matters because a lot of the current AI conversation in biology still assumes that bigger is always better: bigger models, bigger pretraining corpora, bigger perturbation atlases. But much of real translational biology does not look like that. In practice, researchers often care about a specific cell type, a specific disease context, or a specific perturba...
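To make the idea of a co-expression network as an inductive bias more concrete, here is a minimal sketch in plain Python. It is not the paper's actual pipeline: the gene names, the expression values, the dictionary layout, and the 0.8 correlation threshold are all made up for illustration. It only shows the general pattern of building a separate gene graph per cell type from pairwise correlations, so that the same genes end up wired differently depending on the cell type.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.8):
    """Build an undirected gene graph for one cell type.

    expression: dict mapping gene name -> list of expression values,
                one value per cell of that cell type (hypothetical layout).
    threshold:  keep an edge when |correlation| >= threshold (arbitrary choice).
    """
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Toy data: made-up values for two hypothetical cell types.
t_cells = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],  # rises with GENE_A
    "GENE_C": [5.0, 1.0, 4.0, 2.0],  # weakly related to the others
}
monocytes = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [3.0, 1.0, 4.0, 2.0],  # no longer tracks GENE_A
    "GENE_C": [8.0, 6.0, 4.0, 2.0],  # falls as GENE_A rises
}

t_cell_graph = coexpression_edges(t_cells)
monocyte_graph = coexpression_edges(monocytes)
```

In this toy setup the two graphs share the same genes but not the same edges. A graph-based model that passes information along those edges is constrained by each cell type's own wiring, which is one way prior structure can stand in for data volume when samples are scarce.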