Why AI and In Silico Models Will Not Revolutionize Drug Discovery
The rise of artificial intelligence (AI) has sparked tremendous enthusiasm, with many predicting profound disruption across industries, including healthcare and drug discovery. Much of this excitement centers on large language models (LLMs) such as ChatGPT, which have gained widespread attention. AI has long been a key tool for researchers, engineers, and scientists, and the arrival of LLMs has catalyzed advancements in computing hardware and efficiency. Despite this potential, LLMs and AI in general are unlikely to revolutionize drug discovery in the way some anticipate.
LLMs are trained on vast amounts of textual data, often drawing on a large share of the publicly available text on the internet. Their success has led to the belief that similar models could transform drug discovery by shortening development timelines, reducing reliance on animal testing, and accelerating clinical trials. While these ideas are promising, they oversimplify the realities of biology and drug development. Here are three fundamental reasons why the application of AI to drug discovery faces significant limitations:
1. Defined Rules
Language operates within well-defined and predictable patterns. For example, with just 26 letters and a handful of special characters, the structure of English allows for high predictability (e.g., filling in the blank: “The cat in the ___”). This inherent order makes language relatively straightforward to model.
Biology, on the other hand, is vastly more complex. Biological systems involve intricate interdependencies among countless variables, which makes their behavior far harder to predict. Modeling this complexity is much more challenging than modeling linguistic patterns, because biological processes do not follow a compact, well-characterized set of rules the way language does.
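To make the language side of this contrast concrete, here is a minimal sketch of how far simple counting can go on the fill-in-the-blank example. The four-sentence corpus is invented for illustration, and the function names `train_trigram_counts` and `predict_next` are hypothetical. This is a toy word-level counter, not how an LLM works internally.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented purely for illustration.
CORPUS = [
    "the cat in the hat",
    "the cat sat on the mat",
    "the cat in the hat came back",
    "the dog in the hat",
]

def train_trigram_counts(sentences):
    """Count which word follows each pair of consecutive words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for first, second, third in zip(words, words[1:], words[2:]):
            counts[(first, second)][third] += 1
    return counts

def predict_next(counts, context):
    """Return the most frequent continuation of the last two words, or None."""
    words = context.lower().split()
    if len(words) < 2:
        return None
    followers = counts.get((words[-2], words[-1]))
    if not followers:
        return None
    return followers.most_common(1)[0][0]

counts = train_trigram_counts(CORPUS)
print(predict_next(counts, "The cat in the"))  # prints "hat"
```

Even with four sentences and no notion of grammar, counting word pairs is enough to fill in the blank. It is hard to write an equally short sketch for predicting a cell's response to a new compound, which is the asymmetry this section describes.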
2. Data Availability
LLMs thrive on enormous quantities of data, which are readily available for language. Modeling biological systems, however, requires far more data, because the number of possible system states grows exponentially with the number of interacting variables. For instance, predicting how a single cell responds to a new drug involves understanding the interplay of countless molecular processes. The amount of data required to accurately model such interactions is orders of magnitude greater than what is needed for language models.
Compounding this issue are the scarcity and poor quality of biological data. Unlike language, where errors (e.g., typos) can be easily identified and corrected, biological data is often noisy, incomplete, and difficult to interpret. Even with access to all proprietary data from pharmaceutical companies and contract research organizations (CROs), the available data would likely be insufficient for comprehensive modeling.
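The data-requirement argument in this section can be illustrated with a back-of-the-envelope count. The sketch below assumes, as a deliberate oversimplification, that each biological variable is either "on" or "off" and that every combination would need to be observed at least once. The function names and sample sizes are hypothetical and are not estimates of any real dataset.

```python
import math

def state_count(num_variables, levels_per_variable=2):
    """Number of distinct joint states when each variable takes a fixed number of levels."""
    return levels_per_variable ** num_variables

def coverage_fraction(num_samples, num_variables, levels_per_variable=2):
    """Upper bound on the fraction of joint states that num_samples measurements can cover."""
    total = state_count(num_variables, levels_per_variable)
    return min(1.0, num_samples / total)

# Assumed variable counts, chosen only to show how quickly the space grows.
for n in (10, 20, 50, 100):
    total = state_count(n)
    print(f"{n:>3} binary variables -> about 10^{math.log10(total):.1f} joint states")

# Even a hypothetical billion measurements cover a negligible share of 100 variables.
print(coverage_fraction(10**9, 100))
```

Under these assumptions, each added variable doubles the space to be covered, so collecting more data of the same kind falls behind quickly. Real systems are not fully combinatorial, so this is an upper-bound intuition rather than a measurement.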
3. Unknown Rules
While we have made significant progress in understanding biochemical pathways and developing tools like structure-activity relationship (SAR) models, much of biology remains a mystery. Many fundamental processes within cells are not fully understood or discretely defined. This creates a fundamental challenge: how can we model systems we do not fully comprehend? AI can provide incremental insights and improvements, but it cannot replace foundational knowledge or solve problems that remain poorly understood.
AI and in silico models offer valuable tools to support drug discovery, but their impact will likely be incremental rather than transformative. While substantial improvements in AI models, hardware, and software are paving the way for advancements, the complexity of biology sets inherent limits on what can be achieved. As we embrace the potential of AI, it is crucial to temper expectations and remain grounded in the realities of drug discovery.