Deep neural networks (DNNs) have emerged as highly promising tools for analyzing vast amounts of data, with the potential to significantly expedite research across many scientific fields. In recent years, computer scientists have leveraged DNNs to analyze chemical data and identify prospective chemicals for diverse applications. Researchers at the Massachusetts Institute of Technology (MIT) conducted a study investigating the neural scaling behavior of large DNN-based models trained to generate advantageous chemical compositions and to learn interatomic potentials, shedding light on how these models’ performance improves as their size and training data grow. The study aimed to explore how far “neural scaling” extends to models trained on chemistry data, particularly in the domain of drug discovery.
The research group, led by Nathan Frey, began this project in 2021, before the release of prominent AI platforms such as ChatGPT and DALL-E 2. Although the benefits of scaling up DNNs were already apparent in other fields, the team felt that studies examining their scaling in the physical and life sciences were lacking. The project therefore investigated the neural scaling of two types of models used for chemical data analysis: a large language model (LLM) and a graph neural network (GNN)-based model. These models generate chemical compositions and learn the interatomic potentials of chemical substances, respectively.
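As an illustration of how a generative chemical language model is trained, the sketch below shows the next-token objective on a molecular string. The character-level tokenization and the example molecule are assumptions for clarity; real systems use richer vocabularies (e.g., SELFIES or SMILES tokens), and this is not the study's actual pipeline.

```python
# Hypothetical sketch: the next-token prediction objective on a molecular
# string. Each training pair is (context tokens, token to predict next).
smiles = "CCO"  # ethanol, written as a SMILES string
tokens = list(smiles)  # naive character-level tokenization (an assumption)

# Build (context, target) pairs: predict token i from tokens 0..i-1.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in pairs:
    print("".join(context), "->", target)
```

A trained model assigns a probability to each possible next token given the context; generation then samples tokens one at a time until a complete molecule is produced.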
Frey and his colleagues examined how model size and dataset size affect relevant performance metrics in order to assess the scalability of the ChemGPT model and of GNNs, determining the rate at which these models improve as they scale up and receive more data. ChemGPT, inspired by ChatGPT, was designed to predict the next token in a string representation of a molecule, while the GNNs were trained to predict a molecule’s energy and forces. The analysis revealed intriguing “neural scaling behavior” for chemical models, akin to the scaling behavior found in LLMs and vision models across diverse applications. Moreover, the researchers found that there is still substantial room for investigation, as no fundamental limit to scaling chemical models has been reached. An especially exciting finding was that incorporating physics into GNNs through “equivariance,” a symmetry property of the architecture, dramatically improves scaling efficiency, since algorithms that alter scaling behavior are hard to find.
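Neural scaling behavior of the kind described above is typically summarized by fitting a power law relating test loss to model size, L(N) = a · N^(−α). The sketch below fits such a curve to synthetic data; the numbers are invented for illustration and are not results from the MIT study.

```python
import numpy as np

def fit_power_law(sizes, losses):
    """Fit L(N) = a * N**(-alpha) by least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    return np.exp(intercept), -slope  # prefactor a, exponent alpha

# Synthetic "measurements" following L(N) = 5 * N**(-0.3) exactly
# (hypothetical values, chosen only to demonstrate the fit).
sizes = np.array([1e5, 1e6, 1e7, 1e8, 1e9])   # parameter counts
losses = 5.0 * sizes ** -0.3                   # test loss per model

a, alpha = fit_power_law(sizes, losses)
print(f"prefactor a = {a:.2f}, scaling exponent alpha = {alpha:.3f}")
```

The exponent α quantifies how quickly performance improves with scale: a straight line on a log-log plot means doubling model size buys a predictable fractional drop in loss, which is the signature of neural scaling behavior.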
The findings from this study offer crucial insights into the potential of two distinct types of AI models to advance chemistry research. They show the considerable performance gains achievable by scaling up these models and suggest that further exploration is needed to harness their full capabilities. The results can also inform future studies of the promise of, and opportunities to improve, these models and other DNN-based techniques for specific scientific applications.
DNNs have become a promising tool in chemistry research, facilitating the analysis of large datasets and expediting advances in the field. The MIT team’s work sheds light on the scalability of DNN-based models for chemical data analysis and highlights the gains achieved by increasing model size and training data. These findings serve as a launchpad for further investigation and underscore the potential for significant improvements in these models’ performance. As the scientific community delves deeper into the capabilities of DNNs, such models are likely to play a pivotal role across scientific domains, reshaping research practices and uncovering new insights.