
Abstract
In small-molecule drug discovery, early lead selection and optimization can benefit from a holistic assessment of the biological activity and potential liabilities of the compounds of interest. Such an assessment complements traditional assay cascades, which are designed around a limited number of targets and risks, and in vivo testing, which is limited to a small number of compounds.
Janssen set up the Biosignature platform to characterize entire compound libraries in information-rich, target-agnostic assays. High-content imaging and high-throughput transcriptomics are applied to a standardized panel of cell lines, primary cells, and stem-cell-derived models. Cloud-based workflows process the resulting large data volumes. Strict quality control ensures that these biosignatures, which contain hundreds to thousands of feature columns, remain stable over time.
The resulting high-dimensional feature space is not readily interpretable, so we apply supervised and unsupervised machine learning techniques. Multitask supervised learning on large data sets builds predictive models for all assays for which we have sufficient experimental data, including pre-clinical safety assays and project-related assays. Inference with the project-related models can be used to enrich for chemical matter active against a target. Unsupervised clustering of small and medium-sized data sets permits a target-agnostic assessment of biological activity. Integrating both approaches with chemistry information provides therapeutic area project teams with an actionable assessment of the biological activity of their compounds.
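
To make the unsupervised step more concrete, the sketch below groups compounds whose biosignatures are strongly correlated. It is an illustration only and does not describe the platform's actual implementation: the function names (pearson, cluster_signatures), the 0.8 correlation threshold, the single-linkage grouping rule, and the four-feature toy signatures are all assumptions chosen for brevity. Real biosignatures contain hundreds to thousands of features.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat signature carries no correlation information
    return cov / sqrt(var_a * var_b)


def cluster_signatures(signatures, min_corr=0.8):
    """Group compounds by single linkage on signature correlation.

    Two compounds share a cluster if they are connected by a chain of
    pairs whose correlation is at least min_corr (hypothetical threshold).
    """
    names = list(signatures)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if pearson(signatures[first], signatures[second]) >= min_corr:
                parent[find(first)] = find(second)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy signatures (invented values): A and B share one profile, C and D another.
toy = {
    "cpd_A": [2.1, -0.3, 1.8, 0.1],
    "cpd_B": [1.9, -0.1, 2.0, 0.0],
    "cpd_C": [-0.2, 1.7, 0.1, -1.9],
    "cpd_D": [0.0, 2.0, -0.1, -2.2],
}
clusters = cluster_signatures(toy)
print(clusters)
```

Because the grouping uses only the signatures themselves, no target annotation is needed, which is the sense in which such an assessment is target-agnostic. In practice a library routine for hierarchical or density-based clustering would likely replace this hand-written grouping.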