Empirical principles and structure–property relations derived from chemical intuition have for centuries driven the design and synthesis of materials and molecules. In recent years, thanks to the compilation of curated experimental and computational databases of compounds and reactions and to the general advances in the application of machine-learning techniques to all fields of science, the design of molecules and materials has been increasingly augmented or even led by data-driven approaches.
In the field of chemical physics, there has been an increasing number of novel methods and breakthrough applications to design chemical compounds and materials with improved properties and synthetic routes to obtain them. These have been achieved by screening existing databases, by actively exploring chemical space, and by combining computational approaches with automated chemistry. These advances have touched upon, but are not limited to, the discovery and computational or experimental characterization of catalysts and materials for energy storage and production, and novel synthetic routes for molecules and materials.
In this special issue, 29 articles describe recent advances ranging from machine learned potentials for the efficient exploration of chemical and materials space to the development of new algorithms for quantified uncertainty and feature selection in machine learning. A significant number of articles describe new approaches and algorithms applied to the discovery of new materials. Still others demonstrate how multi-fidelity modeling and novel descriptors can be used to improve models for the accurate prediction of properties otherwise challenging to obtain from physics-based models, including properties in solution and catalytic activation energies. Finally, a critical domain is the synthetic accessibility of designed materials, and several works touch upon how to realize computationally designed materials.
Much more work is required before machine learning approaches are as universally applicable as some physics-based models. Nevertheless, present developments in machine learning are enabling the design and discovery of new molecules and materials that have the potential to be experimentally verified. These techniques are also inspiring the construction of novel workflows. It is almost certainly true that as developments continue at ever-increasing speeds over the next few years, the face of chemical physics will continue to transform as machine learning and artificial intelligence approaches mature into useful tools alongside more traditional modeling to enable chemical design by artificial intelligence. Below, we summarize the contributions in the 29 articles in this special issue.
SUMMARY OF AREAS COVERED
Machine learning models have the promise of enlarging the space of compounds that can be feasibly screened by direct calculation. Highlighting this promise, Ramprasad and co-workers1 trained data-driven models on literature data regarding ionic liquid (IL) conductivity, greatly reducing the time required to screen for new ILs. Diaz and co-workers2 developed quantitative-structure–property relationships to predict the tensile strength of polymers. Ramakrishnan and co-workers3 developed data-driven models for the accelerated screening of excited state properties of substituted dyes. Flanagan and Cole explored a database of 4591 optically absorbing organic molecules and clustered those molecules using bespoke hierarchical fingerprinting schemes, identifying chemical and structural trends in the data.4 Froudakis and co-workers used machine learning to identify top-performing MOFs for methane storage, with an emphasis on the generation of artificial MOFs beyond those reported to ensure it is possible to find MOFs that are higher-performing than those already known.5 Often the design of molecules and materials necessitates identification of properties in the condensed phase. Corminboeuf and co-workers report an advancement on the identification and accurate description of non-covalent interactions by developing baselined neural network potentials to reproduce hybrid DFT energies and forces in a solute-chloride–THF mixture.6
Accelerated molecular dynamics and global optimization have been among the most common themes in the recent application of machine learning. Nevertheless, much remains to be developed in order to ensure that machine learned potentials are predictive and efficient to train. Barati Farimani and co-workers presented a graph neural network accelerated molecular dynamics model that predicts atomic forces without explicit calculation of the energy and demonstrated its use on Lennard-Jones particles and water.7 Eckhoff and Behler developed a neural network potential to allow simulation of lithium manganese oxide–water interfaces several orders of magnitude faster than DFT, providing insight into key reactions that influence performance and durability of batteries and catalysts.8 Peterson and co-workers report robust machine learning potentials that require minimal single-point calculations to enable the calculation of much larger molecules.9 Liu and co-workers report a method for the automated search of optimal surface phases facilitated by the stochastic surface walking global optimization based on a global neural network potential. This methodology was applied to silver surface oxide.10
The need to identify both selective and active catalysts as well as the high computational cost of traditional electronic structure modeling in reaction pathway elucidation has motivated data-driven model development to address these multiple trade-offs. Sunoj and co-workers showcase a machine learning-based workflow to predict enantioselectivity for a relay Heck reaction and make predictions on thousands of new reactions.11 Habershon and co-workers explore the performance of machine-learned activation energies for the kinetic simulation of formamide decomposition, aldol reactions, and decomposition of 3-hydroperoxypropanol, providing guidelines for where ML-activation energies are reliable and where they are not.12 von Lilienfeld and co-workers report a machine learning model that predicts organic reaction activation energies and transition state geometries from a molecular graph of the reactant and the information of the reaction type.13 Transfer learning to generalize from one domain to a new domain remains a difficult task in machine learning for the design of structures. To examine domain-to-domain transfer learning, Ulissi and co-workers examined the transfer of a machine learning model trained on the Open Catalyst Dataset to a new domain involving the *CO adsorbate dataset.14
A growing area of importance for machine learning is in understanding how materials that are discovered can also be realized synthetically. Nguyen and Tsuda developed generative models to create reaction trees for synthesis planning.15 Schrier and co-workers16 devised a meta-learning approach to predicting optimal crystallization conditions for perovskites.
For machine learning models to be predictive, especially in the low data regime, accurate feature selection is key, and an understanding of the most essential features also informs why models work the way they do. Senftle and co-workers17 introduced a novel strategy for combined feature selection and feature engineering to reveal the most essential features for a given problem and demonstrate it in catalysis. Kulik and co-workers18 devised new representations that naturally encode relationships among elements in the same group in the Periodic Table, improving model transferability across the Periodic Table. To enable more effective prediction of molecular properties, Ceriotti and co-workers generalize the atom-centered density correlations framework to include multi-centered information.19 To bridge the gap between structure-based and descriptor-based representations, Stuyver and Coley examined the prediction of regioselectivity (e.g., organic SN2 vs E2 mechanisms) activation barriers using quantum mechanics-augmented graph neural network methods.20 Simon and co-workers used support vector machines to classify the experimentally known toxicity of pesticides to honey bees, testing two different representations and interpreting which subgraph patterns in the molecules contribute to their toxicity classification.21
Outstanding challenges remain in quantifying uncertainty in machine learning model predictions as well as in addressing the quality of the data (i.e., typically from low-cost electronic structure calculations) used to train machine learning models. Thiyagalingam and co-workers presented an active learning approach based on convolution-fed Gaussian processes to accelerate the rate of property prediction for materials alongside a measure of the uncertainty in the property.22 Morgan and co-workers reported a multi-fidelity graph network model for bandgap prediction, with their data published in a cloud-based computing environment.23 Walsh and co-workers used machine learning to calibrate a high throughput technique (xTB-sTDA) for excited state calculations against higher accuracy TDDFT, with a sixfold decrease in in-domain error and a threefold decrease in out-of-domain error.24
Machine learning also motivates the development of tools and workflows that can rapidly optimize or predict properties of materials and molecules. Mallikarjun Sharada and co-workers25 demonstrated how a genetic algorithm can be used to search for new organic catalysts for photoredox CO2 reduction. Liu and co-workers introduced26 the AutoSolvate toolkit as an automated approach for screening solvated systems and introduced a machine learning model to aid in structure generation. Jensen and co-workers developed a strategy27 to screen a space of 10^25 substituted molecules for solar thermal energy storage. In work that relies on predictions from machine learning models, Hutchison and co-workers used genetic algorithms to discover unfused non-fullerene acceptors for use in organic solar cells.28 Understanding and learning from 3D chemical structure models has historically been a major part of designing new molecules. To facilitate this effort, Martinez and co-workers combined augmented reality, machine learning, and computational chemistry to create an open-source mobile application for visualizing and interacting with molecules.29
This special issue has provided a broad survey of the many ways in which machine learning is poised to overhaul chemical design of new materials and molecules. These advances have come about through the development of novel representations, quantified uncertainty, and multi-fidelity modeling as well as through application to ever more complex materials spaces and materials design problems. One critical role for the chemical physics community will be in deciding how to best infer guiding chemical principles from intrinsically data-fitting techniques. It will also be necessary to develop strategies to validate machine learned potentials as they are applied to physical system sizes intractable for the traditional techniques that they are fit to reproduce. Given the breathtaking progress that the community has seen over the past decade, we have no doubt that researchers will rise to these challenges and devise ever more creative and innovative approaches to fully realize chemical design by artificial intelligence.
The editors of this Special Topic issue thank all the authors who contributed and the staff who assisted.