Trust Me, There’s a Method to This Madness

Why I work on LLMs in a chemistry department
Published January 5, 2025

Even though I work in a chemistry department, much of my team's recent work has focused on Large Language Models (LLMs) - or, more generally, frontier models. This isn't a departure from chemistry; rather, we believe these models could be crucial building blocks for solving some of the most fundamental problems in chemistry and materials science.

Sam Rodrigues put it best (as he so often does): science is about doing things for the first time. What's remarkable is that recent frontier models show sparks of an ability to perform impressive tasks they weren't explicitly trained for. More importantly, they're showing promising capabilities in developing what scientists have long considered crucial: good taste in choosing what is interesting (Zhang et al. 2024). This intuition, traditionally developed through years of experience, can now be augmented by models that have synthesized patterns from vast amounts of scientific literature and data.

One of the most striking inefficiencies in academic research is how knowledge dissipates: when a PhD student leaves after four years in the lab, their accumulated experience often vanishes with them. Imagine if we could capture and share all this tacit knowledge - the failed experiments, the subtle technique adjustments, the unwritten rules - through training models on lab notes and conversations (Jablonka, Patiny, and Smit 2022).
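To make this a bit more concrete, here is a minimal sketch of what "training on lab notes" could look like in practice: turning free-form notebook entries into chat-style fine-tuning records. The file name and record schema are hypothetical placeholders, not a description of an actual pipeline.

```python
# Minimal sketch: converting free-form lab-notebook entries into
# chat-style fine-tuning records (JSONL). File name and schema are
# hypothetical placeholders.
import json

# Each entry pairs an experimental question with what actually happened -
# including failures, the tacit knowledge that usually leaves with the student.
entries = [
    {
        "question": "Why did the MOF synthesis yield an amorphous product?",
        "note": "The heating ramp was too fast; slowing it to 2 K/min gave crystals.",
    },
]

with open("lab_notes.jsonl", "w") as f:
    for entry in entries:
        record = {
            "messages": [
                {"role": "user", "content": entry["question"]},
                {"role": "assistant", "content": entry["note"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```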

While recent research suggests that language isn't necessarily used for reasoning (Fedorenko, Piantadosi, and Gibson 2024), its flexibility makes it an unparalleled tool for communicating ideas, methods, and observations (just look at how synthesis protocols are reported). Yes, schemas, figures, and equations are crucial, but language remains our most versatile medium - and with multimodal approaches, we're pushing to combine the best of all worlds (Alampara et al. 2024). And there are plenty of things for which we will need to go beyond naively treating everything as text (Alampara, Miret, and Jablonka 2024).

The practical impact is already visible: tasks that once required a PhD thesis can now be accomplished within a Master's project. When I was a PhD student, training a model for a novel application without an existing dataset would have consumed the entire thesis; now, our team routinely collects custom datasets for new applications (Schilling-Wilhelmi et al. 2025). This scalability is crucial because science is inherently long-tailed: breakthrough innovations often emerge from unexpected corners of research, and there are so many different instruments, techniques, and questions that only a scalable approach has a shot at capturing even a fraction of them.
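As an illustration of why such dataset collection has become routine, here is a minimal sketch of LLM-based data extraction in the spirit of Schilling-Wilhelmi et al. (2025), using the OpenAI Python client. The model name, prompt, schema, and example paragraph are all illustrative assumptions, not our actual pipeline.

```python
# Minimal sketch: extracting a structured record from a synthesis
# paragraph with an instruction-tuned LLM. Model name, schema, and the
# example paragraph are illustrative; assumes OPENAI_API_KEY is set.
import json

from openai import OpenAI

client = OpenAI()

paragraph = (
    "ZIF-8 was synthesized by dissolving 0.29 g of zinc nitrate hexahydrate "
    "and 0.65 g of 2-methylimidazole in 25 mL of methanol and stirring for "
    "24 h at room temperature."
)

prompt = (
    "Extract the following fields from the synthesis paragraph as JSON: "
    "product, precursors (list), solvent, time, temperature.\n\n"
    f"Paragraph: {paragraph}"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any capable instruction-tuned model would do
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},  # ask for parseable JSON
)
record = json.loads(response.choices[0].message.content)
print(record)
```

Run over thousands of papers, a loop like this is how a custom dataset for a new application can be assembled in weeks rather than years.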

Similarly, tons of effort has gone into developing ontologies, defining APIs, and specifying how different systems should talk to each other - and I have been involved in some of those efforts. But I increasingly believe that, at least for the long tail, we might be better off simply letting models figure out how to talk to different systems and build new tools along the way. Tools are how science progresses. As Sydney Brenner noted, “Progress in science depends on new techniques, new discoveries and new ideas, probably in that order” (Robertson 1980; Dyson 2012).
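What "letting models figure out how to talk to things" can mean concretely is tool calling: expose an instrument wrapper as a function and let the model decide when and how to invoke it. Below is a minimal sketch using the OpenAI tool-calling API; the run_xrd() wrapper and the sample ID are hypothetical stand-ins for a real instrument interface.

```python
# Minimal sketch: a model drives a (hypothetical) instrument wrapper via
# tool calling, instead of a hand-built ontology or bespoke middleware.
import json

from openai import OpenAI

client = OpenAI()

def run_xrd(sample_id: str) -> dict:
    """Hypothetical wrapper around a diffractometer's control software."""
    return {"sample_id": sample_id, "status": "queued"}

tools = [{
    "type": "function",
    "function": {
        "name": "run_xrd",
        "description": "Queue a powder XRD measurement for a sample.",
        "parameters": {
            "type": "object",
            "properties": {"sample_id": {"type": "string"}},
            "required": ["sample_id"],
        },
    },
}]

messages = [{"role": "user", "content": "Please measure XRD on sample A-17."}]
response = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages, tools=tools
)

# Assumes the model chose to call the tool; production code would check.
call = response.choices[0].message.tool_calls[0]
result = run_xrd(**json.loads(call.function.arguments))
print(result)
```

The appeal for the long tail is that the function schema is all the "ontology" the model needs: adding a new instrument means adding one more function, not renegotiating a standard.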

However, working with these models daily also raises concerns. While there's significant potential upside, those of us who develop these tools bear responsibility for ensuring they benefit society. Beyond immediate concerns about bio- and chemical weapons (Peppin et al. 2025), I worry about information overflow and the proliferation of bullshit (Frankfurt 2005) and disinformation of all sorts (Europol 2023), along with the possibility of further increasing inequalities (with some dominant players accumulating nation-state-like power and an Orwellian centralization of “truth”).

The relative lack of investment by some governments in building AI expertise is concerning, as is the potential erosion of critical thinking skills in some quarters. “We live in a society exquisitely dependent on science and technology, in which hardly anyone knows anything about science and technology” (Sagan 1990). And, clearly, the challenge reaches beyond knowing things about science and technology - which perhaps makes a general liberal arts education more valuable than ever.

For progress there is no cure. Any attempt to find automatically safe channels for the present explosive variety of progress must lead to frustration. The only safety possible is relative, and it lies in an intelligent exercise of day-to-day judgement… these transformations are not a priori predictable and… most contemporary “first guesses” concerning them are wrong…

From “Can We Survive Technology?” by John von Neumann

References

Alampara, Nawaf, Santiago Miret, and Kevin Maik Jablonka. 2024. “MatText: Do Language Models Need More Than Text & Scale for Materials Modeling?” https://arxiv.org/abs/2406.17295.
Alampara, Nawaf, Mara Schilling-Wilhelmi, Martiño Ríos-García, Indrajeet Mandal, Pranav Khetarpal, Hargun Singh Grover, N. M. Anoop Krishnan, and Kevin Maik Jablonka. 2024. “Probing the Limitations of Multimodal Language Models for Chemistry and Materials Research.” https://arxiv.org/abs/2411.16955.
Dyson, Freeman J. 2012. “Is Science Mostly Driven by Ideas or by Tools?” Science 338 (6113): 1426–27. https://doi.org/10.1126/science.1232773.
Europol. 2023. “Criminal Use of ChatGPT: A Cautionary Tale about Large Language Models.” https://www.europol.europa.eu/media-press/newsroom/news/criminal-use-of-chatgpt-cautionary-tale-about-large-language-models.
Fedorenko, Evelina, Steven T. Piantadosi, and Edward A. F. Gibson. 2024. “Language Is Primarily a Tool for Communication Rather Than Thought.” Nature 630 (8017): 575–86. https://doi.org/10.1038/s41586-024-07522-w.
Frankfurt, Harry G. 2005. On Bullshit. Princeton University Press.
Jablonka, Kevin Maik, Luc Patiny, and Berend Smit. 2022. “Making the Collective Knowledge of Chemistry Open and Machine Actionable.” Nature Chemistry 14 (4): 365–76. https://doi.org/10.1038/s41557-022-00910-7.
Peppin, Aidan, Anka Reuel, Stephen Casper, Elliot Jones, Andrew Strait, Usman Anwar, Anurag Agrawal, et al. 2025. “The Reality of AI and Biorisk.” https://arxiv.org/abs/2412.01946.
Robertson, Miranda. 1980. “Biology in the 1980s, Plus or Minus a Decade.” Nature 285 (5764): 358–59. https://doi.org/10.1038/285358a0.
Sagan, Carl. 1990. “Why We Need to Understand Science.” Skeptical Inquirer 14 (3).
Schilling-Wilhelmi, Mara, Martiño Ríos-García, Sherjeel Shabih, María Victoria Gil, Santiago Miret, Christoph T. Koch, José A. Márquez, and Kevin Maik Jablonka. 2025. “From Text to Insight: Large Language Models for Chemical Data Extraction.” Chemical Society Reviews. https://doi.org/10.1039/d4cs00913d.
Zhang, Jenny, Joel Lehman, Kenneth Stanley, and Jeff Clune. 2024. “OMNI: Open-Endedness via Models of Human Notions of Interestingness.” https://arxiv.org/abs/2306.01711.