
Abstract
Machine learning (ML) has been used for decades to leverage data generated early in a project to inform later decisions. In pharmaceutical discovery, advances in ML have helped to leverage all Janssen-accessible data to inform scientists on the most attractive small molecules to test, or to make and test.
This implies, first, consolidating data of conventional types (such as small-molecule structures and their dose-response activities in validated assays) and mining them with multi-task and transfer learning; then, unlocking less conventional data types, such as HTS data points, images and transcriptomics; and finally, leveraging data across collaborators in a privacy-preserving context (which is being studied at an unprecedented data scale by ten pharma partners and seven solution and knowledge partners in the IMI project MELLODDY).
The resulting empowered models are being applied to portfolio projects, where they inform hit identification, extension and triaging, provide guidance during hit-to-lead and, in combination with generative modelling, enable more efficient, intelligent sampling of chemical space to enrich for more attractive molecules during lead optimization.