Team portrait, from left to right: Manu Suvarna, Prof. Javier Pérez-Ramírez, Dr. Sharon Mitchell, Dr. Teodoro Laino, Dr. Alain Claude Vaucher. © Javier Pérez-Ramírez
In a recent collaborative publication, a team of experimental and computational experts from two NCCR Catalysis member groups has explored language models and protocol standardization guidelines to speed up synthesis planning for heterogeneous catalysts. Learn more about the work from Manu Suvarna, Dr. Sharon Mitchell and Prof. Javier Pérez-Ramírez (aCe lab, ETHZ) and Dr. Alain Vaucher and Dr. Teodoro Laino (IBM Research Zurich)!
Can you briefly introduce the concept of the project and your role?
Manu: Our project aimed to develop an automated machine learning (ML) pipeline for extracting experimental synthesis procedures of single-atom heterogeneous catalysts (SACs) from the published literature. We used the compiled data to statistically analyze trends in SAC synthesis procedures. Finally, based on what we learned during the project, we identified current limitations in the way SAC synthesis procedures are reported, and provided guidelines for writing them effectively in machine-readable formats. As a PhD researcher, I wore various hats during the project, but my most important role was to translate a catalysis-oriented task into an ML problem, and to serve as a bridge for the exchange of insights and inferences between the two communities.
How did the collaboration between ML experts and experimental researchers come about?
Teo: The collaboration between ML experts and experimental researchers was a seamless interaction of individuals fluent in different languages: algorithms and empirical experimentation. This partnership thrived on a constant feedback loop, where ML insights informed experimental design and vice versa, elevating the quality of our research. Manu played a pivotal role by fostering curiosity and openness to learning from the ML perspective, creating an environment where diverse viewpoints flourished. This collaboration exemplifies the power of NCCR Catalysis, and of interdisciplinary teamwork more broadly, in driving innovative breakthroughs at the intersection of technology and empirical research.
How did your experience working with experimental researchers help advance your text mining model?
Alain: The model's success was determined by its usefulness for the experimental researchers, not merely by its numerical performance. As such, we focused on tuning the model to effectively extract what is relevant in SAC syntheses. With Manu's valuable insights, we identified the model's weaknesses and iteratively addressed them, ensuring it evolved to meet the precise needs of experimental researchers in this domain.
Were there any unexpected challenges that arose during the collaboration, and how did you address them?
Manu: From a technical point of view, the biggest challenge we faced was the impact of the lack of data standards on model performance. Based on our experience in this project, we believe that the current catalysis and chemistry literature may not yet be ready to capitalize on all the benefits of language models and ML tools. In our study, we address these limitations by recommending guidelines for protocol standardization to improve machine readability. As Teo mentioned earlier, this was made possible by working in an environment where we seamlessly exchanged knowledge and shared our learning curves, arriving at an innovative solution by leveraging our respective expertise in experimental catalysis and ML.
What message or insight do you hope readers take away from your collaborative work?
Sharon: A significant takeaway is the need for experimentalists to reconsider traditional approaches to reporting synthetic protocols. Given how crucial the reproducibility of experimental results is, embracing machine-readable formats, like tables submitted as data files alongside articles, could greatly facilitate the integration of ML approaches to accelerate research.
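To make the idea of a protocol table submitted as a data file concrete, here is a minimal sketch in Python. It assumes one row per synthesis step; the column names and all values are hypothetical placeholders for illustration, not the fields recommended in the publication and not real experimental data.

```python
import csv
import io

# Hypothetical column names; the guidelines in the publication may differ.
FIELDS = ["step", "action", "material", "temperature_C", "duration_h", "atmosphere"]

# Placeholder steps for illustration only (not real experimental data).
protocol = [
    {"step": 1, "action": "mix", "material": "metal precursor + carrier",
     "temperature_C": 25, "duration_h": 2, "atmosphere": "air"},
    {"step": 2, "action": "dry", "material": "wet solid",
     "temperature_C": 80, "duration_h": 12, "atmosphere": "air"},
    {"step": 3, "action": "pyrolyze", "material": "dried solid",
     "temperature_C": 800, "duration_h": 1, "atmosphere": "N2"},
]


def to_csv(steps):
    """Serialize protocol steps to CSV text, one row per step."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(steps)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv(protocol)
parsed = from_csv(csv_text)

# With a fixed schema, simple questions no longer require text mining,
# for example: which step runs at the highest temperature?
hottest = max(parsed, key=lambda row: float(row["temperature_C"]))
print(hottest["action"])
```

Because every step shares the same columns, a file like this can be read directly by a script, without first extracting the procedure from free text.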
Where do you see the intersection of ML and experimental research heading in the field of heterogeneous catalysis?
Teo & Manu: ML algorithms are increasingly being employed to predict catalytic properties, optimize catalyst designs, and accelerate the discovery of novel materials. The intersection of the two fields will lead to the creation of better datasets and predictive models, which will help researchers identify promising catalysts and reduce the time and costs involved in experimentation. However, the field is still in its infancy, and there is significant potential for greater adoption of data-driven workflows in routine experimental work, with enormous implications for the chemical industry and the environment.
What role has NCCR Catalysis played in developing this work?
Javier: When we conceived NCCR Catalysis at the end of the last decade, we dreamed of studies like this one, at the interface of experimental and digital catalysis. With a cutting-edge program embracing experts in different domains, the dream has become a reality.
Language models and protocol standardization guidelines for accelerating synthesis planning in heterogeneous catalysis. M. Suvarna, A.C. Vaucher, S. Mitchell, T. Laino, J. Pérez-Ramírez. Nat. Commun. 2023, 14, 7964. DOI: 10.1038/s41467-023-43836-5.