Designing polymers with high intrinsic thermal conductivity (TC) is critically important for the thermal management of organic electronics and photonics. However, this is a challenging task owing to the diversity of the chemical space and the barriers to advanced synthetic experiments/characterization techniques for polymers. In this Tutorial, the fundamentals and implementation of combining classical molecular dynamics simulation and machine learning (ML) for the development of polymers with high TC are comprehensively introduced. We begin by describing the core components of a universal ML framework, involving polymer data sets, property calculators, feature engineering, and informatics algorithms. Then, the process of constructing interpretable regression algorithms for TC prediction is introduced, aiming to extract the underlying relationships between microstructures and TCs for polymers. We also explore the design of sequence-ordered polymers with high TC using lightweight and mainstream active learning algorithms. Lastly, we conclude by addressing the current limitations and suggesting potential avenues for future research on this topic.

Polymers have attracted extensive attention in various fields such as energy,1,2 environment,3,4 electronics,5 biology,6 medicine,7 and engineering8 thanks to their light weight, low cost, excellent mechanical ductility, superior biocompatibility, and good chemical and thermal stability.9–11 However, intrinsic bulk polymers are thermally insulating, with a low thermal conductivity (TC) in the narrow range of 0.1–0.3 W m−1 K−1,12–14 which restricts heat dissipation in organic devices and severely obstructs the miniaturization and integration of flexible electronic and optoelectronic devices.15–22 Achieving high TC in polymers for industrial applications is an urgent demand, and some progress has been made recently.23,24 Fabrication techniques such as micromechanical stretching,25–28 electrostatic spinning,29–31 and nanoscale templating32,33 effectively improve the ordering and crystallinity of the polymer chains and thus yield high TC. Moreover, constructing thermal networks by engineering interchain interactions such as hydrogen bonding,34 π–π stacking,35 and side-chain modification36 in polymer blends and copolymers has also been shown to enhance TC. Taking polyethylene (PE) as an example, the TC of PE films27 and nanofibers26 produced by mechanical stretching was found to be as high as ∼62 and ∼104 W m−1 K−1, respectively, two to three orders of magnitude greater than that of typical polymers.

Although engineered polymers with increased TC can be produced experimentally, doing so requires a strong chemical background and is limited by the available processing and characterization instruments. Further, the applicability of individual techniques is restricted; e.g., micromechanical stretching is unsuitable for brittle polymers.37 With the evolution of high-performance computers and multiscale simulation methodology, in silico experiments are playing an important role in the study of thermal transport in polymers.38–42 Computational approaches including first-principles calculations43,44 and molecular dynamics (MD) simulations45 have led the way in revealing the effect of polymer nanostructures on TC. The first-principles calculation of TC is based on the computation of interatomic force constants via density-functional theory (DFT). On this basis, all relevant phonon properties can be calculated using lattice dynamics and the Boltzmann transport equation.44 This method has been successfully applied to molecular crystals such as PE,43 polyvinylidene fluoride,43 and polythiophene,44 but is challenging to apply to amorphous systems, owing to quantum nuclear motion and their complex primitive cells.

MD simulations employ classical force fields combined with Newtonian mechanics and statistical physics to derive the macroscopic properties of systems.46 They can handle polymer systems containing tens of thousands of atoms and are widely used in polymer thermal transport studies, not only to predict the TC of hierarchical structures but also to probe the links between microstructures and TCs. In MD simulations, an individual PE chain exhibits a very high TC of 350 W m−1 K−1, which even diverges in some cases.47–49 These results are encouraging and inspire researchers to make further efforts to develop polymers with high TC. Moreover, separate MD studies have examined how TC depends on polymer properties, such as molecular weight,50,51 chain length,52–56 side chains,36,57–59 and chain conformation;51,60–63 on intra-chain effects, such as bonds,64–66 angles,67,68 and dihedrals;69–71 and on inter-chain behaviors, such as molecular cross-linking,72–77 hydrogen bonding networks,78–81 and π–π stacking.58,82–84 However, current investigations of the thermal transport mechanisms of polymers have mainly focused on common polymers such as PE, polytetrafluoroethylene (PTFE), polyvinylidene fluoride, and conjugated structures.85

For a long time, researchers have been working on exploring quantitative structure–activity relationships (QSAR) from chemical data, which, in turn, has enabled the rational design of innovative materials, including polymers and inorganic and biological components.86,87 QSAR models reveal empirical, linear, or non-linear relationships between descriptors extracted from chemical structures and computational/experimental properties or activities.88 As modern research methods have driven a proliferation of chemical data, the data-driven research paradigm has become critically important for QSAR modeling.89–91 For polymers, the chemical space is enormous:92 the number of potential small organic molecules is estimated to be as large as 10^60, whereas more than 10^8 known organic compounds are recorded in the PubChem database.93–95 Moreover, virtual chemical reactions96 or generative algorithms97 for small molecules can create a nearly infinite chemical space. Polymer informatics is a data-centric technology equipped with artificial intelligence and machine learning (ML) as powerful engines to accelerate the optimization of organic materials and the development of novel macromolecules.98–101 Polymer informatics has been effective in facilitating polymer innovation and has achieved a series of successful applications involving optical,102–104 electrical,105–108 thermal,109–112 mechanical,113,114 and other properties.115–118 Over the past five years, several efforts have been made to apply ML to the development of polymers with high TC (TC > 0.30 W m−1 K−1), making essential contributions to expanding the pool of candidates and revealing the underlying physical mechanisms.85,94,119–124

In this Tutorial, we introduce development paradigms that combine high-throughput MD simulations and ML for high-TC polymers, in the hope of inspiring new researchers interested in this field. We start in Sec. II with a description of the core components of polymer informatics: polymer data sets, TC simulation methods, and polymer representations. Following this, interpretable regression models are constructed in Sec. III for mapping polymer microstructures to TCs. Next, active optimization algorithms, covering single- and multi-objective cases, are used in Sec. IV to design polymers with high TC. Our conclusions and outlook for this area are provided in Sec. V.

The principle of polymer informatics is to establish patterns from a sufficient amount of existing or generated polymer data, thus facilitating the design/discovery of new functional polymers with improved target properties.92  Figure 1 illustrates a mainstream informatics framework for the development of polymers with high TC, consisting of four elements: (1) polymer data sets; (2) polymer modeling and TC calculations; (3) feature engineering; and (4) informatics algorithms. In the following, we explain the implementation of the design framework in these four aspects.

FIG. 1.

Schematic of machine learning for high thermal conductivity polymer exploitation.


Well-organized and clean data are a fundamental prerequisite for high-fidelity informatics algorithms. To promote the openness and sharing of scientific data, the findable, accessible, interoperable, and reusable (FAIR) principle has been proposed for data storage and management,125–127 which has also greatly contributed to the progress of material genetic engineering.128 Inorganic materials have more extensive and accessible databases than polymeric materials, such as Materials Project,129 Atomly,130 ICSD,131 and AFLOW.132 These inorganic databases usually follow the FAIR guidelines and provide application programming interfaces (APIs) for access and download. In contrast, polymer databases have developed relatively slowly, and access to them is mostly restricted. This is attributed to the fact that large polymer primitive cells make experimentation/computation difficult and costly,133 and that the empirical nomenclature134 and diverse forms of expression (strings and graphs) of polymers prevent text-mining techniques from obtaining property data from the published scientific literature.135

A few representative databases include PoLyInfo,136 Khazana,137 Polymers: A Property Database,138 Polymer Property Predictor and Database,139 CAMPUS,140 and PI1M,97 which are listed in Table I. PoLyInfo136 is one of the largest polymer experimental databases, with over 18 000 homopolymers and 7000 copolymers collected from about 20 000 literature sources, including hundreds of properties such as refractive index, dielectric constant, and glass transition temperature. Nevertheless, the coverage of different properties is quite variable; for example, more than 8000 homopolymers have a recorded glass transition temperature, while only 84 homopolymers report TC (accessed 14 December 2023). Moreover, PoLyInfo is provided by the National Institute for Materials Science of Japan, and bulk downloading of its data is prohibited. To address this concern, Hayashi et al.141 selected 1070 amorphous polymers from the PoLyInfo database and calculated 15 associated properties, including TC, using all-atom classical MD simulations. Ma and Luo97 trained a recurrent neural network on ∼12 000 polymers collected from PoLyInfo and then generated ∼1 × 10^6 polymers to form the benchmark database PI1M, which covers a chemical space similar to that of the training data set.

TABLE I.

Available organic databases, including mainstream polymer and small molecule data sets.

| No. | Name | Description | Type |
| --- | --- | --- | --- |
| 1 | PoLyInfo136 | PoLyInfo provides various data collected from scientific literature for polymeric material design, including more than 18 000 homopolymers and 7000 copolymers | Polymer data sets |
| 2 | Khazana137 | Computational materials knowledgebase to store structures and property data created by atomistic simulations | Polymer data sets |
| 3 | Polymers: A Property Database138 | A scientific and commercial information platform for polymers, which contains almost 1000 polymers and 1500 monomers with various mechanical, electrical, thermal, and other properties | Polymer data sets |
| 4 | Polymer Property Predictor and Database139 | Polymer structural and property data sets collected from the literature through an automated information extraction pipeline | Polymer data sets |
| 5 | CAMPUS140 | High-quality and comparable material information database with online datasheets for resins from participating material producers | Polymer data sets |
| 6 | PI1M97 | PI1M contains ∼1 million polymers with structural information generated by a recurrent neural network model, which was trained on ∼12 000 polymers from the PoLyInfo database | Polymer data sets |
| 7 | PubChem93 | One of the world's largest open chemical databases, which mostly contains small molecules from hundreds of data sources | Molecule data sets |
| 8 | eMolecules143 | An open search-and-fulfilment platform for commercial chemical and biological reagents, which covers over 50 × 10^6 unique structures and 76 × 10^6 part numbers | Molecule data sets |
| 9 | Materials Project144 | A well-known material database, which recently integrated more than 170 000 molecules studied using density-functional theory and can be queried through an OpenAPI-compliant application programming interface | Molecule data sets |

Databases of computational properties of polymers are rarer than experimental databases; Khazana137 is a typical one. The Khazana database is supported by Ramprasad's group, includes DFT-computed refractive index, dielectric constant, and bandgap, and was used as the training data set to build the open informatics platform Polymer Genome.142 In Table I, we also describe three databases containing small molecule data sets, namely, PubChem,93 eMolecules,143 and Materials Project.144 With a grasp of the laws of chemical reactions, virtual reactions can convert small molecules into polymers through ML145,146 or specific grammar rules.96,116,147 Yang et al.116 established a polymer data set containing more than 8 × 10^6 hypothetical polyimides formed by known dianhydride and diamine/diisocyanate pairs from PubChem.93 Kim et al.96 released a vast polymer database known as the Open Macromolecular Genome, which began with approximately 24 × 10^6 potential reactant molecules from eMolecules143 and then formed synthesizable polymer chemistries compatible with 17 polymerization reactions that cover a variety of step growth, chain growth, ring opening, and metathesis polymerization reactions.

Polymer simulation includes modeling, force field assignment, equilibrium simulation, and TC calculations. While standalone software packages such as RDKit,148 Materials Studio,149 CHARMM-GUI,150 Packmol,151 Moltemplate,152 Pysimm,153 Polymer Structure Predictor (PSP),154 Enhanced Monte Carlo (EMC),155 and Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS)156 can realize these steps when used in combination, open-source toolkits that facilitate the building of the entire workflow are of great significance in generating the training data required for polymer informatics. The Polymer Molecular Dynamics (PMD) package157 integrates PSP, EMC, and LAMMPS to realize polymer modeling and high-throughput MD simulations for various properties, including glass transition temperature, viscosity, and TC. RadonPy141 is a robust open-source Python library that fully automates the calculation of various properties of polymers using all-atom classical MD simulations. The entire simulation process, including modeling and equilibrium and non-equilibrium MD simulations, can be automated by taking only the simplified molecular-input line-entry system (SMILES)158 string of the polymer repeating unit as input. Furthermore, a computational database has been released containing more than 1000 unique amorphous polymers with various thermophysical properties calculated using RadonPy. Given the powerful and convenient capability of RadonPy, the polymer simulation and ML training data for this Tutorial rely primarily on RadonPy and its derived computational data set.

1. Polymer modeling

The polymer modeling procedure is illustrated in Fig. 2(a). A SMILES string, in which two asterisks denote the connection points, serves as a unique identifier for each polymer structure and is the input to RadonPy. The repeating unit is then linked by a self-avoiding random-walk algorithm to form an individual polymer chain.159,160 The degree of polymerization is controlled by the total number of atoms in the chain, which was uniformly set to around 1000 so that the dependence of the physical properties on the molecular weight can be neglected. After that, the second-generation General AMBER Force Field (GAFF2) was assigned to the polymer chain, which is expressed as161–165
$$E = \sum_{\mathrm{bonds}} K_b (r - r_0)^2 + \sum_{\mathrm{angles}} K_a (\theta - \theta_0)^2 + \sum_{\mathrm{dihedrals}} K_d \left[ 1 + \cos(n_d \varphi - \delta) \right] + \sum_{\mathrm{impropers}} K_i (\chi - \chi_0)^2 + \sum_{i,j} \frac{q_i q_j}{4 \pi \varepsilon_0 r_{ij}} + \sum_{i,j} 4 \varepsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right], \tag{1}$$
where $K_b$, $K_a$, $K_d$, and $K_i$ are the force constants of the bond, bond angle, dihedral angle, and improper angle, respectively; $r$, $\theta$, $\varphi$, and $\chi$ are the bond length, bond angle, dihedral angle, and improper angle, respectively; $r_0$, $\theta_0$, and $\chi_0$ are the equilibrium structural parameters of the bond, bond angle, and improper angle, respectively; $n_d$ is the multiplicity, and $\delta$ is the phase angle; $q_i$ and $q_j$ are the charges of the i-th and j-th atoms; $\varepsilon_0$ is the vacuum permittivity; $r_{ij}$ is the distance between atoms i and j; and $\varepsilon_{ij}$ and $\sigma_{ij}$ are the depth of the potential well and the characteristic distance of the Lennard–Jones potential, respectively. In RadonPy, if the bond angle parameters $K_a$ and $\theta_0$ are missing from the predefined parameter set, they are automatically estimated in the same manner as in GAFF2.
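To make the individual terms of Eq. (1) concrete, the following minimal Python sketch evaluates each contribution for a single interaction. The function names, the kcal/mol, angstrom, and elementary-charge unit choice, and the Coulomb prefactor (the value used by the LAMMPS "real" unit style) are assumptions made for illustration; this is not the RadonPy or GAFF2 implementation.

```python
import math

# Assumed units: energies in kcal/mol, lengths in angstrom, charges in e.
# 1/(4*pi*eps0) expressed in these units (value used by LAMMPS "real" units).
COULOMB_PREFACTOR = 332.06371


def harmonic_energy(k, x, x0):
    """Harmonic term K (x - x0)^2, shared by bonds, angles, and impropers."""
    return k * (x - x0) ** 2


def dihedral_energy(k_d, n_d, phi, delta):
    """Dihedral term K_d [1 + cos(n_d * phi - delta)], angles in radians."""
    return k_d * (1.0 + math.cos(n_d * phi - delta))


def coulomb_energy(q_i, q_j, r_ij):
    """Pairwise Coulomb term q_i q_j / (4 pi eps0 r_ij)."""
    return COULOMB_PREFACTOR * q_i * q_j / r_ij


def lennard_jones_energy(eps_ij, sigma_ij, r_ij):
    """Pairwise 12-6 Lennard-Jones term."""
    sr6 = (sigma_ij / r_ij) ** 6
    return 4.0 * eps_ij * (sr6 * sr6 - sr6)
```

In this form, the Lennard–Jones term vanishes at $r_{ij} = \sigma_{ij}$ and reaches its minimum of $-\varepsilon_{ij}$ at $r_{ij} = 2^{1/6} \sigma_{ij}$, which is a quick sanity check for any parameter set.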
FIG. 2.

Molecular dynamics simulation for polymer TC calculations. (a) Polymer modeling pipeline. (b) Example snapshot of the relaxed amorphous system. (c) and (d) Heat flux autocorrelation function and TC curves calculated using the Green−Kubo formula. (e) Simulation setup for TC calculation in the non-equilibrium molecular dynamics simulations, where the relaxed system was triple replicated along the heat transport direction and was then divided equally into N slabs, with the red and blue slabs corresponding to the hottest and coldest region. (f) Temperature profile.


To obtain a simulation cell, the single polymer chain was duplicated into ten copies, which were translated and rotated to avoid mutual overlap and placed in a large box at a density of ∼0.05 g/cm3. A packing simulation was then performed to raise the density of the amorphous system to a suitable level. An NVT (constant number of atoms, volume, and temperature) simulation with a Nosé–Hoover thermostat was applied to the system in three sequential stages, namely, holding at 300 K, heating from 300 to 700 K, and holding at 700 K, under periodic boundary conditions (PBC) with a time step of 1 fs. Each of these stages took 1 ns, and all the bonds and angles were constrained by the SHAKE algorithm, resulting in a packed cell with a density of around 0.80 g/cm3.

Equilibrium simulation was executed for the structural relaxation of amorphous polymers, following the 21-step compression/relaxation scheme.166 During the simulation, NVT and NPT (constant number of atoms, pressure, and temperature) simulations with a Nosé–Hoover thermostat were combined: the temperature was repeatedly raised to 600 K and lowered to 300 K for about 1.5 ns while the system was compressed to 50 000 atm and then depressurized to 1 atm. In RadonPy, the amorphous system is considered to be in equilibrium when it satisfies the following conditions: the relative standard deviations (RSDs) of the fluctuations in the total, kinetic, bonding, bond angle, dihedral, van der Waals, and long-range Coulomb energies are less than 0.05%, 0.05%, 0.1%, 0.1%, 0.2%, 0.2%, and 0.1%, respectively. At the same time, the RSDs of the fluctuations in density and the radius of gyration must be less than 0.1% and 1%, respectively. The NPT simulation was run at 300 K and 1 atm with a time step of 1 fs. The simulated system was checked for equilibrium every 50 ns until the equilibrium requirements were met.141
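The equilibrium criteria listed above amount to a set of threshold checks on time series. A minimal sketch follows, assuming each monitored quantity is available as a list of sampled values; the dictionary keys and function names are hypothetical, and the use of the population standard deviation is an assumption rather than a documented RadonPy choice.

```python
from statistics import mean, pstdev

# RSD limits in percent, taken from the criteria quoted in the text.
# The key names are hypothetical labels chosen for this sketch.
RSD_LIMITS_PERCENT = {
    "total_energy": 0.05,
    "kinetic_energy": 0.05,
    "bond_energy": 0.1,
    "angle_energy": 0.1,
    "dihedral_energy": 0.2,
    "vdw_energy": 0.2,
    "coulomb_long_energy": 0.1,
    "density": 0.1,
    "radius_of_gyration": 1.0,
}


def rsd_percent(series):
    """Relative standard deviation (in percent) of a sampled time series."""
    avg = mean(series)
    return 100.0 * pstdev(series) / abs(avg)


def is_equilibrated(traces, limits=RSD_LIMITS_PERCENT):
    """Return True only if every monitored quantity is below its RSD limit."""
    return all(rsd_percent(traces[name]) < limit for name, limit in limits.items())
```

A series whose mean is exactly zero would need special handling, since the RSD is undefined in that case.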

The density $\rho$ of the equilibrated system can be denoted as
$$\rho = m / V, \tag{2}$$
where $m$ is the sum of atomic masses and $V$ is the time-averaged system volume.
The number density $n$ is calculated using the number of atoms $N$ and the volume $V$ of the equilibrated system,
$$n = N / V. \tag{3}$$
The radius of gyration $R_g$ is given as
$$R_g = \sqrt{\frac{1}{\Lambda} \sum_{i=1}^{\Lambda} (\mathbf{r}_i - \mathbf{r}_m)^2}, \tag{4}$$
where $\mathbf{r}_i$ is the position of a repeating unit, $\Lambda$ is the number of repeating units in the polymer chain, and $\mathbf{r}_m$ is the mean position of these repeating units.
The persistence length $\xi$ can be further obtained as53
$$\xi = \frac{R_g^2 \times 6}{2 (2p - 1) \times l} + \frac{l}{2}, \tag{5}$$
where $p$ is the degree of polymerization of a polymer chain and $l$ is approximated as the length of the repeating unit.
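As a worked illustration of Eqs. (2), (4), and (5), the sketch below computes the three quantities from raw inputs. It reads Eq. (5) as $\xi = 6 R_g^2 / [2 (2p - 1) l] + l/2$; the function names and the amu and angstrom input units are assumptions for this example.

```python
import math

AMU_PER_A3_TO_G_PER_CM3 = 1.66053906660  # 1 amu/angstrom^3 expressed in g/cm^3


def density_g_cm3(total_mass_amu, volume_a3):
    """Eq. (2): mass density, converted from amu/angstrom^3 to g/cm^3."""
    return AMU_PER_A3_TO_G_PER_CM3 * total_mass_amu / volume_a3


def radius_of_gyration(positions):
    """Eq. (4): root-mean-square distance of repeating units from their mean position."""
    count = len(positions)
    center = [sum(p[k] for p in positions) / count for k in range(3)]
    mean_sq = sum(
        sum((p[k] - center[k]) ** 2 for k in range(3)) for p in positions
    ) / count
    return math.sqrt(mean_sq)


def persistence_length(r_g, p, l):
    """Eq. (5), read as 6 R_g^2 / [2 (2p - 1) l] + l / 2."""
    return 6.0 * r_g ** 2 / (2.0 * (2 * p - 1) * l) + l / 2.0
```

For example, two repeating units at (−1, 0, 0) and (1, 0, 0) give $R_g = 1$ in the same length unit as the coordinates.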

2. Equilibrium molecular dynamics methods

The equilibrium molecular dynamics (EMD) simulation is executed in an equilibrium state without temperature gradients, so reasonable structural configurations, careful relaxation, and optimization are essential for the accuracy of TC estimation. The TC of polymers in EMD simulation is calculated by the Green–Kubo formalism,47,167–171
$$\lambda = \frac{V}{k_B T^2} \int_0^{\infty} \langle J_x(0) J_x(\tau) \rangle \, d\tau, \tag{6}$$
where $V$ is the volume of the amorphous system, $k_B$ is the Boltzmann constant, $T$ is the temperature, $J_x$ is the heat flux in the x-direction, $\tau$ is the correlation delay time, and $\langle J_x(0) J_x(\tau) \rangle$ denotes the heat flux autocorrelation function (HACF). Figures 2(b)–2(d) provide a case study of the TC calculation for PTFE using EMD. The optimized system contains ∼10 000 atoms, and an additional NVE (constant number of atoms, volume, and energy) simulation of 20 ps was performed to obtain ten HACFs with a sampling interval of 2 fs. After the HACFs decayed to 0, the TC of PTFE also stabilized at an average value of 0.27 W m−1 K−1.
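The Green–Kubo integral in Eq. (6) is evaluated numerically in practice. The sketch below, in plain Python with SI units, estimates an HACF from a sampled heat flux series and accumulates the running integral, whose plateau corresponds to the TC curve described for Fig. 2(d). The function names are hypothetical, and the trapezoidal rule is one common choice rather than a statement of what RadonPy or LAMMPS does internally.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def autocorrelation(series, max_lag):
    """Time-averaged autocorrelation <J(0) J(tau)> for lags 0 .. max_lag - 1."""
    n = len(series)
    return [
        sum(series[t] * series[t + lag] for t in range(n - lag)) / (n - lag)
        for lag in range(max_lag)
    ]


def green_kubo_running_tc(hacf, dt, volume, temperature):
    """Running TC lambda(tau) from Eq. (6) using the trapezoidal rule.

    hacf: HACF samples in (W/m^2)^2, dt: sampling interval in s,
    volume in m^3, temperature in K. Returns a list in W m^-1 K^-1.
    """
    prefactor = volume / (K_B * temperature ** 2)
    running, integral = [0.0], 0.0
    for i in range(1, len(hacf)):
        integral += 0.5 * (hacf[i - 1] + hacf[i]) * dt
        running.append(prefactor * integral)
    return running
```

The TC is read from the last values of the running integral once the HACF has decayed to zero; averaging several independent HACFs, as done for PTFE above, reduces the statistical noise.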

3. Non-equilibrium molecular dynamics methods

Non-equilibrium molecular dynamics (NEMD) methods either impose thermostats on two regions (normal NEMD) or swap the kinetic energy of atoms between two regions (reverse NEMD), so that a temperature gradient and a heat flux are established. Once the system reaches a steady state, the TC can be derived from Fourier's law,
$$\lambda = -\frac{J}{dT/dx}, \tag{7}$$
where $J$ is the heat flux and $dT/dx$ is the temperature gradient in the thermal transport direction.
In RadonPy, the TC was calculated by the reverse NEMD method proposed by Müller-Plathe.172 As shown in Fig. 2(e), the equilibrated system was replicated three times along the x-direction and then divided into N (N = 20) slabs. By exchanging the velocities of the coldest atom in slab N/2 and the hottest atom in slab 0, a temperature gradient was formed and recorded for TC evaluation. To prevent temperature shifts after cell replication, the system was initially run at 300 K for 2 ps. After that, the reverse NEMD simulation was run at 300 K for 1 ns with a velocity swapping interval of 200 fs.141 From the exchanged energy $\Delta E$ obtained using the Müller-Plathe algorithm and the temperature gradient $dT/dx$ output by the reverse NEMD simulation [Fig. 2(f)], the TC $\lambda$ can be expressed as
$$\lambda = \frac{\Delta E}{2 A \Delta t \, |dT/dx|}, \tag{8}$$
where $A$ is the cross-sectional area, $\Delta t$ is the simulation time, and $|dT/dx|$ is the magnitude of the temperature gradient; the factor of 2 accounts for heat flowing in both directions away from the hot slab under PBC.
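Evaluating Eq. (8) requires a temperature gradient, which is usually taken from a linear fit to the slab temperature profile in the region between the hot and cold slabs. The following sketch, in SI units, shows this post-processing step; the function names are hypothetical, and the choice of which slabs to include in the fit is left to the user.

```python
def linear_slope(xs, ys):
    """Least-squares slope of ys versus xs (e.g., temperature versus position)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / s_xx


def rnemd_thermal_conductivity(delta_e, area, elapsed_time, grad_t):
    """Eq. (8): TC from the exchanged energy and the temperature gradient.

    delta_e in J, area in m^2, elapsed_time in s, grad_t in K/m.
    The magnitude of the gradient is used so that the result is positive.
    """
    return delta_e / (2.0 * area * elapsed_time * abs(grad_t))
```

Fitting only the linear part of the profile, away from the slabs where velocities are swapped, avoids the nonlinear temperature jumps that typically appear there.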
Additionally, an NEMD simulation was run for 100 ps for a thermal conductivity decomposition analysis. The energy flux $J_u$ along the direction of the unit vector $u$ can be expressed as the sum of a convection contribution (first term) and an interatomic-interaction contribution (second term),161,173
$$J_u = \frac{1}{V} \left\{ \sum_{i \in V} e_i v_{i,u} + \sum_{i \in V} (\mathbf{S}_i \mathbf{v}_i)_u \right\}, \tag{9}$$
where $\mathbf{v}_i$ is the velocity of atom $i$, $e_i$ is the sum of its potential and kinetic energies, $i$ is the index of atoms, and $\mathbf{S}_i$ is the stress tensor. For components $a$ and $b$, the stress tensor $S_{ab}$ can be detailed as141
$$S_{ab} = \sum_{n=1}^{N_p} r_{i0,a} F_{i,b} + \sum_{n=1}^{N_b} r_{i0,a} F_{i,b} + \sum_{n=1}^{N_a} r_{i0,a} F_{i,b} + \sum_{n=1}^{N_d} r_{i0,a} F_{i,b} + \sum_{n=1}^{N_i} r_{i0,a} F_{i,b} + \mathrm{Kspace}(r_{i,a}, F_{i,b}). \tag{10}$$
The six terms of the stress tensor are the pairwise, bond, angle, dihedral, improper, and K-space contributions, respectively, where $F_i$ is the force acting on atom $i$ due to the interaction, $r_{i0}$ is the position of atom $i$ relative to the geometric center of its interacting atoms, and $N_p$, $N_b$, $N_a$, $N_d$, and $N_i$ are the numbers of pair, bond, bond angle, dihedral angle, and improper angle interactions, respectively. The normalized TC contribution $\eta$ of each heat flux term is
$$\eta = J_{par} / J_{tot}, \tag{11}$$
where $J_{tot}$ is the total heat flux calculated by Eq. (9) and $J_{par}$ is the partial heat flux of a single contribution obtained from Eqs. (9) and (10).

Polymer descriptors translate structural and chemical information about polymers into a machine-readable form for ML model training.174 Successful descriptors must express polymer information uniquely, completely, and minimally, and they are a prerequisite for high accuracy of ML models. To date, many polymer representations have emerged for polymer informatics, which can be grouped into three categories:175 string-based descriptors, graph-based descriptors, and physical-based descriptors, as illustrated in Fig. 3.

FIG. 3.

Feature engineering for polymer informatics. It shows the three different options for polymer representation, including string-based, graph-based, and physical-based descriptors.


1. String-based descriptors

String-based descriptors are efficient and convenient line notations, such as SMILES158,176 and self-referencing embedded strings (SELFIES).177,178 SMILES is a popular polymer representation that encodes the atoms, bonds, rings, and branches of polymer monomers as an ASCII string. SMILES allows a relatively uniform expression of polymers, does not depend on a strong chemical background, and is readable by both humans and computers. Thus, SMILES is widely used in data storage,142 modeling,141,153,154,179 and ML174,180 of polymers and has become a standard tool in computational chemistry. However, with the progress of polymer informatics, SMILES has also exposed some limitations. For instance, in some inverse design tasks based on evolutionary or deep learning algorithms, many of the generated SMILES strings do not correspond to valid polymers.181 An improved version of SELFIES96 was proposed to address this issue, which can represent every polymer and be directly applied in arbitrary ML models without adaptation of the models. Moreover, SMILES strings usually cannot be fed directly into regression models and need to be further transformed through one-hot encoding110 or chemical language models.182–184
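As an illustration of the one-hot encoding step mentioned above, the following minimal sketch converts SMILES strings of repeating units into fixed-size binary matrices. It tokenizes character by character, which is a simplification (two-letter atoms such as Cl or Br would be split), and the function names and the two example strings are chosen only for this sketch.

```python
def build_vocabulary(smiles_list):
    """Map every character that occurs in the data set to a column index."""
    characters = sorted({ch for smiles in smiles_list for ch in smiles})
    return {ch: index for index, ch in enumerate(characters)}


def one_hot_encode(smiles, vocabulary, max_length):
    """Return a max_length x len(vocabulary) matrix of 0/1 entries.

    Rows beyond the end of the string stay all-zero (padding); strings
    longer than max_length are truncated.
    """
    matrix = [[0] * len(vocabulary) for _ in range(max_length)]
    for position, ch in enumerate(smiles[:max_length]):
        matrix[position][vocabulary[ch]] = 1
    return matrix


# Example repeating units with two asterisks marking the connection points.
examples = ["*CC*", "*C(F)(F)C(F)(F)*"]
vocabulary = build_vocabulary(examples)
encoded = one_hot_encode("*CC*", vocabulary, max_length=6)
```

Flattening each matrix row by row gives a fixed-length input vector for a regression model; a chemistry-aware tokenizer would be the natural refinement.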

2. Graph-based descriptors

Graph-based descriptors are organic chemical representations based on topological information, involving substructure statistics, interatomic connections, and relative positional relationships. Molecular access system (MACCS) keys185 are among the most commonly used structural keys and have been integrated into open-source cheminformatics software, including RDKit,148 CDK,186 and OpenBabel.187 In RDKit, the generated MACCS keys contain 167 bits: the first bit (index 0) is an unused placeholder so that the bit indices match the key numbers, and each of the remaining 166 bits records whether a predefined substructure is present (1) or absent (0). Morgan fingerprints, also known as extended-connectivity fingerprints, are adapted from the Morgan algorithm and are among the most popular fingerprints in cheminformatics.188–190 Generating the Morgan fingerprint of a polymer monomer requires three steps.191 (1) Initialization: each atom is assigned an integer identifier. (2) Iteration: in each iteration, the identifier of each atom is updated to a combination of its own and its neighbors' identifiers. Once all atoms have been given new identifiers, the old ones are replaced and the new identifiers are added to the fingerprint set. (3) Post-processing: after a prespecified number of iterations, duplicate identifiers are removed, and the retained identifiers are hashed to yield a fixed-length bit vector that forms the Morgan fingerprint. Morgan fingerprints for polymer representation have the advantages of high efficiency, convenience, and the absence of a pre-training process, but their feature vectors are high-dimensional and sparse and may suffer from bit collisions caused by the hashing process. Mol2vec192 is an unsupervised method inspired by natural language processing techniques, which treats compound substructures derived from the Morgan algorithm as words and compounds as sentences.
Ma et al.120,193 trained the Mol2vec model on more than 1 × 10^6 monomer structures from a combination of the PoLyInfo and PI1M databases and obtained a pre-trained model called polymer embedding. Polymer embedding is a 300-dimensional continuous-valued vector and has been successfully used for the prediction of polymer density, melting temperature, glass transition temperature, and TC. Morgan fingerprint with frequency (MFF)116 is an extension of Morgan fingerprints that captures the frequency of chemical moieties (substructures) present in monomers. Across the entire explored chemical space, substructures (identifiers) whose frequency of occurrence exceeds a predefined threshold are retained to compose the descriptor vectors. MFF has much lower feature dimensions than Morgan fingerprints, which can effectively suppress the overfitting of ML models, and is widely used in the prediction of various polymer properties.114,115,194
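The three steps of Morgan fingerprint generation can be illustrated on a toy molecular graph without any cheminformatics library. The sketch below is a deliberately simplified, Morgan-style neighborhood hashing: it uses only the element symbol and the number of neighbors as the initial atom invariant, ignores bond orders, and omits the duplicate-substructure bookkeeping of real implementations such as the one in RDKit. All function names are hypothetical.

```python
import hashlib


def _hash_to_int(obj):
    """Deterministic 32-bit integer hash of a small Python object."""
    digest = hashlib.sha1(repr(obj).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def morgan_style_identifiers(atoms, bonds, radius=2):
    """atoms: list of element symbols; bonds: list of (i, j) index pairs."""
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # (1) Initialization: one integer identifier per atom.
    ids = [_hash_to_int((symbol, len(neighbors[i]))) for i, symbol in enumerate(atoms)]
    collected = set(ids)
    # (2) Iteration: combine each atom's identifier with its neighbors' identifiers.
    for layer in range(1, radius + 1):
        ids = [
            _hash_to_int((layer, ids[i], tuple(sorted(ids[j] for j in neighbors[i]))))
            for i in range(len(atoms))
        ]
        collected.update(ids)
    # (3) Post-processing: the set has already removed duplicate identifiers.
    return collected


def fold_to_bits(identifiers, n_bits=2048):
    """Fold integer identifiers into a fixed-length bit vector (collisions possible)."""
    bits = [0] * n_bits
    for identifier in identifiers:
        bits[identifier % n_bits] = 1
    return bits
```

Because neighbor identifiers are sorted before hashing, the result does not depend on the order in which the atoms are numbered. Counting how often each identifier occurs across a data set, and keeping only those above a threshold, would be the corresponding first step toward an MFF-like descriptor.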

3. Physical-based descriptors

Exploring collections of physically independent descriptors for characterizing molecular structures is important for building quantitative structure–property relationships and provides a more intuitive guide to molecular performance assessment.195 Benefiting from advances in the feature engineering of drug-like molecules, several cheminformatics packages196–198 are available for the automatic calculation of molecular descriptors. For example, Mordred198 is a mainstream descriptor calculator that can compute more than 1800 descriptors, including constitutional descriptors, 2D topological descriptors, and 3D geometric descriptors.174 However, polymer monomers differ from small molecules in that they carry connection-site information, which prevents the creation of 3D structures and makes some geometric descriptors unobtainable. To compensate for the lack of 3D information, we incorporated molecular force field (FF) parameters as additional descriptors and designed an automated physical feature engineering procedure for polymer descriptor optimization in our previous work.85,124 The initial descriptors are the union of those calculated by Mordred and those extracted from the parameters of the polymer structural data file after FF assignment. Afterward, feature down-selection is employed to obtain the optimized descriptors, in three stages.

  1. Evaluate the numerical fluctuations of each physical descriptor itself using a variance indicator and remove features with low variance, since these features have less impact on target properties. This is beneficial in reducing the complexity and improving the performance of ML models. For a specific physical descriptor, the variance can be denoted as
    $$S^2 = \frac{1}{\mathrm{num}-1}\sum_{i=1}^{\mathrm{num}}\left(x_i-\bar{x}\right)^2, \tag{12}$$
    where num is the number of candidates, $x_i$ is the value of the feature for candidate $i$, and $\bar{x}$ is the average of the feature values.
  2. Filter out features that correlate poorly with the target property, using four correlation metrics: the Pearson, Spearman, and distance correlation coefficients and the maximum information coefficient (MIC). For two variables x and y, the Pearson coefficient is given by
    $$\mathrm{Pea}_{x,y} = \frac{\sum_{i=1}^{\mathrm{num}}\left(x_i-\mu_x\right)\left(y_i-\mu_y\right)}{(\mathrm{num}-1)\,S_x S_y}, \tag{13}$$
    $$\mu_k = \frac{1}{\mathrm{num}}\sum_{i=1}^{\mathrm{num}} k_i, \tag{14}$$
    $$S_k = \sqrt{\frac{1}{\mathrm{num}-1}\sum_{i=1}^{\mathrm{num}}\left(k_i-\bar{k}\right)^2}, \tag{15}$$
    where $\mu_x$ and $\mu_y$ are the mean values of the two variables x and y, respectively, and $S_x$ and $S_y$ are their standard deviations. The index $k$ denotes either of the variables x and y, and $\bar{k} = \mu_k$.
    The Spearman coefficient is defined as
    $$r_s = 1 - \frac{6\sum_{i=1}^{\mathrm{num}} d_i^2}{\mathrm{num}\left(\mathrm{num}^2-1\right)}, \tag{16}$$
    where $d_i$ is the difference in the ranks of the corresponding variables, i.e., the difference in the positions (ranks) of pairs of variables after the two variables have been individually ranked. In addition, the distance correlation coefficient199 and MIC200 are used to measure possibly nonlinear correlations between variables.
  3. Feature optimization using ML models to better match model performance. Recursive feature elimination (RFE)201 is a widely employed technique for feature selection, which selects features by removing features with smaller absolute weights while ensuring accuracy in repetitive training of the ML model.202 
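
The first two stages can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the workflow of Refs. 85 and 124: the function names, the toy data, and the correlation threshold are hypothetical, only the Pearson and Spearman metrics are included, and the rank formula of Eq. (16) is applied under the assumption that there are no tied values.

```python
from statistics import mean, variance  # variance() is the sample variance of Eq. (12)

def pearson(x, y):
    """Pearson coefficient following Eqs. (13)-(15)."""
    mx, my = mean(x), mean(y)
    sx, sy = variance(x) ** 0.5, variance(y) ** 0.5
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / ((len(x) - 1) * sx * sy)

def ranks(values):
    """Rank positions (1 = smallest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman coefficient following Eq. (16)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def down_select(features, target, var_min, corr_min):
    """Stage 1: variance filter. Stage 2: correlation filter."""
    kept = []
    for name, column in features.items():
        if variance(column) < var_min:
            continue  # nearly constant feature
        if abs(pearson(column, target)) < corr_min:
            continue
        if abs(spearman(column, target)) < corr_min:
            continue
        kept.append(name)
    return kept

# Hypothetical toy data: three descriptors for five candidates.
target = [0.1, 0.2, 0.3, 0.4, 0.5]
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],   # varies and tracks the target
    "b": [1.0, 1.0, 1.0, 1.0, 1.01],  # almost constant
    "c": [3.0, 1.0, 5.0, 2.0, 4.0],   # varies but correlates weakly
}
selected = down_select(features, target, var_min=0.10, corr_min=0.5)
```

The variance filter runs first so that nearly constant features never reach the correlation step, where a vanishing standard deviation would make Eq. (13) ill-defined.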

Establishing the mapping from microstructures to TCs using interpretable ML contributes to the understanding of the intrinsic thermal transport mechanism and guides the design of novel promising structures. Shapley additive explanations (SHAP)203 is a powerful tool for explaining ML models and alleviating the black-box problem; it connects the optimal credit allocation among the model's input features with local explanations. In addition, the symbolic regression (SR)204–207 technique can construct analytical models of key physical parameters for TC prediction and helps to capture the underlying physical correlations intuitively. Open-source SR frameworks such as gplearn,205 PySR,208 and SISSO209 facilitate the discovery of optimal combinations of features and arithmetic symbols, enabling the creation of interpretable explicit mathematical models. Moreover, these tools are user-friendly and provide helpful step-by-step guidance.210–212 Here, we present three case studies of interpretable TC prediction models constructed using the SHAP or SR approaches.
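
To make the idea of credit allocation concrete, the sketch below computes exact Shapley values for a small toy model by enumerating all feature subsets. It is not how the SHAP package works internally (SHAP relies on model-specific and sampling-based approximations and on a background data set); here a single baseline point replaces the background, and the model, the input, and the function names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value, the rest the baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical toy model with one interaction term.
def toy_model(z):
    return 2.0 * z[0] - 1.0 * z[1] + z[0] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The linear terms are credited entirely to their own feature, the interaction term is shared equally between the two features involved, and the attributions add up to the difference between the prediction and the baseline prediction. The enumeration scales exponentially with the number of features, which is why approximate schemes are used for real models.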

The training of the regression model started with 1051 polymers sourced from a computational database, all of which were labeled with TCs computed by NEMD simulations in RadonPy.141 We then calculated 325 initial descriptors, of which 294 are Mordred-based descriptors and 31 are MD-based descriptors. The down-selection process is displayed in Fig. 4(a). The threshold for the variance assessment was 0.10, and a total of 202 descriptors were retained. A weight assignment mechanism was developed for the correlation filtering: each of the four metrics was assigned a factor of 0.25, and only descriptors with a cumulative weighting factor of 1 were retained. That is, a descriptor is valid only if all four correlation coefficients reach their corresponding thresholds. Using thresholds of 0.050, 0.050, 0.213, and 0.186 for the Pearson, Spearman, distance, and maximum information coefficients, respectively, 53 descriptors reached a cumulative factor of 1 and were kept. Finally, a random forest (RF) model combined with RFE, implemented in Scikit-learn,213 was employed for descriptor optimization, and 25 descriptors were selected based on the model's accuracy, as listed in Table II.
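
The weight assignment mechanism can be sketched as below. The four thresholds are those quoted above; the function name, the dictionary keys, and the descriptor scores are hypothetical and serve only to illustrate the rule.

```python
# Thresholds quoted in the text for the four correlation metrics.
THRESHOLDS = {"pearson": 0.050, "spearman": 0.050, "distance": 0.213, "mic": 0.186}

def cumulative_weight(scores, thresholds=THRESHOLDS, weight=0.25):
    """Add `weight` for each metric whose absolute score reaches its threshold.

    Returns the cumulative factor and whether the descriptor is retained,
    i.e., whether every metric passed (cumulative factor of 1).
    """
    total = sum(weight for metric, limit in thresholds.items()
                if abs(scores[metric]) >= limit)
    keep = abs(total - weight * len(thresholds)) < 1e-9
    return total, keep

# Hypothetical scores for two descriptors.
descriptor_1 = {"pearson": 0.30, "spearman": -0.28, "distance": 0.35, "mic": 0.22}
descriptor_2 = {"pearson": 0.08, "spearman": 0.06, "distance": 0.15, "mic": 0.20}

result_1 = cumulative_weight(descriptor_1)
result_2 = cumulative_weight(descriptor_2)
```

The second descriptor passes three of the four metrics but misses the distance-correlation threshold, so its cumulative factor is 0.75 and it is discarded.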

FIG. 4.

Interpretable machine learning model on physical descriptors and TC. (a) The polymer descriptor down-selection process. The dimensionality of the initial descriptors (Init.) is reduced by removing features with low variance (Var.), filtering by correlation coefficients (Cor.), and recursive feature elimination to obtain the optimized (Opt.) descriptors. (b) Comparison of MD-calculated and RF-predicted TC. (c) Average SHAP importances for the optimized descriptors. The blue and red bars indicate positive and negative importance. (d) Impact of each optimized descriptor on TC. (e) and (f) SHAP values for the MW_ratio and K_bond_ave of the training-set polymers as functions of the descriptor value. The MW_ratio represents the ratio of the molecular weight of the backbone to the total molecular weight of the monomer, and the K_bond_ave indicates the average of different bond force constants. Reproduced with permission from Huang et al., J. Mater. Chem. A 11(38), 20539–20548 (2023). Copyright 2023 Royal Society of Chemistry.124

TABLE II.

Description of 25 optimized descriptors. Reproduced with permission from Huang et al., J. Mater. Chem. A 11(38), 20539–20548 (2023). Copyright 2023 Royal Society of Chemistry.124 

| No. | Label | Description | Source |
| --- | --- | --- | --- |
| 1 | BCUTZ-1h | First highest eigenvalue of Burden matrix weighted by atomic number | Mordred |
| 2 | AATS0d | Averaged Moreau–Broto autocorrelation of lag 0 weighted by sigma electrons | Mordred |
| 3 | MW_ratio | Ratio of mainchain molecular weight to monomer molecular weight | MD |
| 4 | K_bond_ave | Average of different bond force constants in monomer | MD |
| 5 | BCUTd-1h | First highest eigenvalue of Burden matrix weighted by sigma electrons | Mordred |
| 6 | AATS0Z | Averaged Moreau–Broto autocorrelation of lag 0 weighted by atomic number | Mordred |
| 7 | Mass_max | Maximum atomic mass in a monomer | MD |
| 8 | Monomer_length | Monomer length after relaxation | MD |
| 9 | Mor02 | 3D-MoRSE (distance = 2) | Mordred |
| 10 | ATSC5Z | Centered Moreau–Broto autocorrelation of lag 5 weighted by atomic number | Mordred |
| 11 | nHBDon | Number of hydrogen bond donors | Mordred |
| 12 | Mor19 | 3D-MoRSE (distance = 19) | Mordred |
| 13 | Kier3 | Kappa shape index 3 | Mordred |
| 14 | ATSC2Z | Centered Moreau–Broto autocorrelation of lag 2 weighted by atomic number | Mordred |
| 15 | Mor14 | 3D-MoRSE (distance = 14) | Mordred |
| 16 | Mass_ave | Average atomic mass in a monomer | MD |
| 17 | AATSC2Z | Averaged and centered Moreau–Broto autocorrelation of lag 2 weighted by atomic number | Mordred |
| 18 | K_ang_ave | Average of different bond angle force constants in monomer | MD |
| 19 | AATSC0Z | Averaged and centered Moreau–Broto autocorrelation of lag 0 weighted by atomic number | Mordred |
| 20 | SMR_VSA3 | MOE MR VSA Descriptor 3 (1.82 <= x < 2.24) | Mordred |
| 21 | MIC0 | 0-ordered modified information content | Mordred |
| 22 | SMR_VSA1 | MOE MR VSA Descriptor 1 (-inf < x < 1.29) | Mordred |
| 23 | VSA_EState4 | VSA EState Descriptor 4 (5.41 <= x < 5.74) | Mordred |
| 24 | MIC1 | 1-ordered modified information content | Mordred |
| 25 | nH | Number of H atoms | Mordred |

The 1051 polymers were represented by the optimized physical descriptors and randomly split into training and test sets at a ratio of 80%/20%. We constructed an RF model using those optimized descriptors, where the hyperparameters of the RF were tuned by Bayesian optimization with R2 as the target.214 Gaussian process regression and an acquisition function were used, with ten random parameter combinations for the initial training, and the ideal parameters for each ML model were determined after 100 optimization iterations.215 Figure 4(b) compares the TCs predicted by the RF with those calculated by NEMD simulations, with training and test R2 of 0.88 and 0.78, respectively. Moreover, we conducted 20 evaluations of RF models with the optimized descriptors. In each evaluation, the training and test data were randomly sampled from the 1051 benchmark data at a ratio of 80%/20%. The test R2 of the 20 RFs is 0.72 ± 0.02 (mean value and one standard deviation). The prediction error of the RF model in the high-TC region (TC > 0.40 W m−1 K−1) is relatively large, since such candidates make up only a small fraction (∼2.28%) of the 1051 polymers.124 In addition, we verified the robust performance of the optimized descriptors on other ML models, such as multilayer perceptron and kernel ridge regression, and confirmed their superiority over the graph descriptors of MACCS keys and Morgan fingerprints in our previous work.124

The trained RF model was explained by the SHAP approach, and the importance of the features was analyzed based on the output SHAP values. Figure 4(c) shows the eight most important physical descriptors, which relate to properties such as atomic number, atomic mass, bond connection, bond strength, sigma electrons, and monomer length. Combined with the distribution of SHAP values for these eight descriptors over the training candidates in Fig. 4(d), the promoting or inhibiting effect of each key descriptor on TC can be recognized in general. The most significant descriptor is BCUTZ-1h, which is the first highest eigenvalue of the Burden matrix weighted by atomic number and is associated with atomic and bonding properties.216 BCUTZ-1h was observed to have a strong positive correlation with the maximum atomic mass (Mass_max) in the monomer.124 Typically, the presence of heavy atoms such as chlorine, bromine, and iodine in the system suppresses lattice vibrations, resulting in small phonon group velocities and low TC. We also analyzed the two MD-inspired descriptors MW_ratio and K_bond_ave in detail, as shown in Figs. 4(e) and 4(f). The MW_ratio is the ratio of the molecular weight of the backbone to the total molecular weight of the repeating unit, and the K_bond_ave indicates the average of different bond force constants. Both descriptors have a positive effect on TC, as polymers with fewer side chains and greater chain stiffness favor efficient thermal transport.

Using the substructures of polymers as descriptors is more intuitive than using physical descriptors for revealing structure–TC relationships, and it facilitates the design of new structures. Here, we give a guide to building a deep neural network (DNN) model using 1144 polymer data with MFF. Among the 1144 polymers, the TCs of most of the structures were collected from a computational database,141 and the rest were calculated in RadonPy using the same setup.124 The polymers were expressed as SMILES strings and converted to MFFs. The MFF has 194 dimensions, which map to the counts of the 194 most frequent substructures in the entire training data set. For DNN model training, the hyperparameters were optimized with the Keras Tuner217 toolkit, using the Adam optimizer and mean squared error loss in TensorFlow218 and a training/testing ratio of 80%/20% for the 1144 polymers. After hyperparameter optimization, the trained DNN model has four hidden layers with 416, 256, 244, and 256 nodes, respectively; ReLU activation; and a dropout of 0.5.

The pairs of DNN-predicted and MD-calculated TCs are plotted in Fig. 5(a), with good consistency and a test root mean square error (RMSE) of 0.040 W m−1 K−1 and test R2 of 0.79. We performed an additional fivefold cross-validation (CV) to evaluate the accuracy of the DNN model. In the fivefold CV, the test R2 of the DNN models is 0.72 ± 0.03 (mean value and one standard deviation), reflecting that the trained DNN model is reliable.219 Further, the trained DNN model was interpreted by SHAP, and the role of the most important 16 substructures on TC is shown in Fig. 5(b). When the descriptor values follow the same trend as the SHAP values, it suggests that the substructure is contributing to the realization of high TC. Thus, eight substructures were found to have a positive effect on TC, while six substructures suppressed TC, which are marked in Fig. 5(c) in blue and red text, respectively. The insights that can be extracted from these substructures coincide with the conclusions previously gained from the RF model trained with physical descriptors in Sec. III A, namely, that conjugated, linear side-chain-free polymers are beneficial for maintaining large chain stiffness and high TC. Additionally, when the polymer system contains heavy atoms such as fluorine (F), it hinders the efficient heat flux transport, preventing the achievement of high TC.

FIG. 5.

ML model performance and feature importance evaluation. (a) Predicted results for DNN. (b) The interpretations of the DNN model for TC prediction by the SHAP evaluation. (c) The key sub-structures that act on TC, where blue text indicates a positive effect and red indicates an inhibitory effect. Reproduced with permission from Huang et al., Mater. Today Phys. 44, 101438 (2024). Copyright 2024 Elsevier.219 


SR is another interpretable ML method, which discovers explicit mathematical expressions that fit a data set.206 SR simultaneously searches for a set of parameters and the optimal mathematical form of a function.220 SR does not require massive amounts of data, but reliable training data are critical. We calculated the TC of 104 promising amorphous polymers recommended by the RF model124 and found that their radii of gyration (Rg), extracted from the equilibrated systems, have a strong positive linear correlation with the TCs, as shown in Fig. 6(a). Besides the radius of gyration, we acquired ten additional parameters from the equilibrated amorphous systems for SR, as listed in Table III. The SR was implemented in gplearn,205 and the hyperparameter settings are listed in Table IV. gplearn is a proven tool based on genetic programming that provides a scheme for optimizing mathematical expressions using genetic algorithms. Thus, the main input parameters of the genetic algorithm in gplearn include the number of generations, the number of formulas produced in each generation, the crossover probability, and the probability of mutation at each node. The formulas were selected based on the criterion of simultaneously having low complexity and high fitting accuracy. The complexity was defined as the length of the formula, with each constant, variable, or operation symbol counted as one unit of length. Figure 6(b) shows a density plot of the 3364 mathematical formulas with complexity within 15 and R2 over 0.70. The six formulas F1–F6 at the Pareto front are marked by stars, and their corresponding analytic functions are listed in Table V. The Pareto front is the total set of all feasible and Pareto optimal solutions and is often considered the optimal trade-off between various objectives.221,222 Overall, formulas with higher complexity are more likely to yield higher prediction accuracy.
Of the six formulas, the smallest complexity is only 5 and the highest R2 is 0.876, and the TCs predicted by the corresponding two formulas are in good agreement with those calculated by NEMD, as depicted in Figs. 6(c) and 6(d). All six formulas capture the positive correlation between Rg and TC, and three of them reflect the fact that heavy atoms in the system are unfavorable for TC. Rg was calculated to express the spatial extent of the molecular chains. When the molecular chains in the amorphous system have a high Rg, large rigid chain segments are maintained and heat transport along the chain backbone through intra-chain bonding interactions is enhanced, thus increasing the TC.60,124 In addition, it is worth mentioning that some of the variables used for SR, such as Rg, were extracted from the equilibrated amorphous systems and are closely related to the TC, which results in a higher prediction accuracy for the obtained analytic models than for the RF model in Fig. 4(b).
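
The two selection ingredients described above, the length-based complexity and the Pareto front over complexity and accuracy, can be sketched as follows. The expression-tree format (nested tuples), the function names, and the candidate list are hypothetical; they are not the gplearn data structures.

```python
def complexity(expr):
    """Formula length: every constant, variable, or operator counts as one unit.

    An expression is either a leaf (constant or variable name) or a tuple
    (operator, argument, ...).
    """
    if isinstance(expr, tuple):
        return 1 + sum(complexity(arg) for arg in expr[1:])
    return 1

def pareto_front(candidates):
    """Return the non-dominated (complexity, r2) pairs, sorted by complexity.

    A candidate is dominated if another one is at least as simple and at
    least as accurate, and strictly better in one of the two.
    """
    front = []
    for c, r in candidates:
        dominated = any(
            c2 <= c and r2 >= r and (c2 < c or r2 > r)
            for c2, r2 in candidates
        )
        if not dominated:
            front.append((c, r))
    return sorted(set(front))

# A product of one constant and two variables, c * n * Rg, has length 5.
product_formula = ("mul", 0.132, ("mul", "n", "Rg"))
length = complexity(product_formula)

# Hypothetical (complexity, R2) pairs of candidate formulas.
candidates = [(5, 0.70), (5, 0.775), (7, 0.76), (8, 0.84), (11, 0.83), (13, 0.876)]
front = pareto_front(candidates)
```

In this toy list, the formula with complexity 7 is discarded because a simpler formula is already more accurate, which is the trade-off that the stars in Fig. 6(b) represent.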

FIG. 6.

Symbolic regression for analytic model construction. (a) The relationships between the radius of gyration R g and TC of 104 polymers in this work. (b) Accuracy R2 vs complexity of 3364 mathematical formulas shown via density plot. The six points of F1–F6 were picked by Pareto search. (c) and (d) Calculated TC vs fitted TC of formulas F1 and F6, respectively. Reproduced with permission from Huang et al., J. Mater. Chem. A 11(38), 20539–20548 (2023). Copyright 2023 Royal Society of Chemistry.124 

TABLE III.

Symbol of parameters in symbolic regression. Reproduced with permission from Huang et al., J. Mater. Chem. A 11(38), 20539–20548 (2023). Copyright 2023 Royal Society of Chemistry.124 

| No. | Description | Symbol |
| --- | --- | --- |
| x0 | MW_ratio | M |
| x1 | K_bond_ave | kbavg |
| x2 | K_ang_ave | kaavg |
| x3 | Mass_max | mmax |
| x4 | nHBDon | nH |
| x5 | Density | ρ |
| x6 | Number density | n |
| x7 | Radius of gyration | Rg |
| x8 | Persistence length | ξ |
| x9 | Specific heat capacity at constant pressure | CP |
| x10 | Specific heat capacity at constant volume | CV |
TABLE IV.

Hyperparameters setup for symbolic regression. Reproduced with permission from Huang et al., J. Mater. Chem. A 11(38), 20539–20548 (2023). Copyright 2023 Royal Society of Chemistry.124 

| Parameter | Value |
| --- | --- |
| Generations | 300 |
| Population size in every generation | 5000 |
| Probability of crossover (pc) | [0.30, 0.90], step = 0.05 |
| Probability of subtree mutation (ps) | [(1-pc)/3, (1-pc)/2], step = 0.01 |
| Probability of hoist mutation (ph) | [(1-pc)/3, (1-pc)/2], step = 0.01 |
| Probability of point mutation (pp) | 1-pc-ps-ph |
| Function set | [+, −, ×, ÷, √x, ln x, \|x\|, −x, 1/x] |
| Init_depth | [2, 6], [4, 8], [6, 10], [2, 10] |
| Parsimony coefficient | 0.003, 0.005 |
| Metric | R2 |
| Stopping criteria | 0.900 |
| Random_state | 0, 1, 2, 3, 4 |
TABLE V.

The six mathematical formulas at the Pareto front in Fig. 6(b). Reproduced with permission from Huang et al., J. Mater. Chem. A 11(38), 20539–20548 (2023). Copyright 2023 Royal Society of Chemistry.124 The key materials parameters include the K_bond_ave (kbavg), Mass_max (mmax), nHBDon (nH), number density (n), radius of gyration (Rg), and persistence length (ξ).

| Number | Formula | R2 | Complexity |
| --- | --- | --- | --- |
| F1 | 0.132 n Rg | 0.775 | 5 |
| F2 | 0.01 Rg + 0.598 / ξ | 0.837 | |
| F3 | 0.043 Rg ln(ξ / 0.27) | 0.843 | |
| F4 | 0.085 Rg / (mmax ξ) | 0.852 | |
| F5 | 0.081 Rg mmax ξ nH | 0.867 | 11 |
| F6 | Rg mmax 0.024 kbavg ξ nH | 0.876 | 13 |

Active learning is oriented to the design of new polymers driven by target properties, which breaks through the limitations of regression tasks restricted to a fixed exploration chemical space. The inverse design of polymers with high TC can be achieved by some lightweight and smart optimization algorithms, such as genetic algorithm, Bayesian optimization, and quantum annealing. In this section, we not only present some cases of polymer design with a single target of high TC, but also additionally consider the synthesizability of polymers in multi-objective optimization trials.

Inspired by the insights from the interpretable DNN model in Sec. III B and by known high-TC polymers, we constructed a library containing 32 polymer motifs, as listed in Table VI. To identify each fragment uniquely, the motifs were binary coded from [00000] to [11111]. Theoretically, an infinite number of multi-block polymers can be constructed by controlling the number and order of fragments. However, in consideration of the computational cost and hardware capabilities, we built a benchmark data set of triblock polymers. Figure 7(a) depicts the process of producing a triblock polymer, which is characterized by a binary sequence of 15 bits. The 15-bit binary sequence is divided into three equal parts, each corresponding to a fragment. The composition of the polymer fragments is directionless, i.e., two polymers consisting of the fragments [0, 2, 29] and [29, 2, 0] are equivalent. The entire benchmark data set contains 16 896 triblock polymers, which were classified into 13 categories following the same classification method as PoLyInfo, such as polyolefins, polyethers, and polyimides [in Fig. 7(c)]. The TC of the generated polymers was estimated by the high-fidelity DNN model trained in Sec. III B and ranges from 0.16 to 1.03 W m−1 K−1, with 42.6% of the polymers having a TC greater than 0.40 W m−1 K−1 [in Fig. 7(d)]. Moreover, the synthesizability of these polymers was evaluated by the synthetic accessibility (SA) score.223 The SA score, which ranges from 1 (easy) to 10 (hard), is calculated by considering both fragment contributions and complexity penalties, and it is used to evaluate the synthesizability of molecules or polymer repeating units. The SA scores of the polymers lie within a range of 2.28–6.21, with 6.3% of them having scores less than 3.0. Out of all 16 896 generated polymers, only 4.5% meet the predefined criteria for both TC and SA; these are referred to as ideal polymers.
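
The encoding and the size of the benchmark data set can be checked with a short sketch. The function names are hypothetical; the only inputs taken from the text are the 32 fragments, the 5-bit code per fragment, and the rule that a sequence and its reverse describe the same polymer.

```python
from itertools import product

N_FRAGMENTS = 32  # fragments 0..31, coded [00000]..[11111]
BITS = 5          # bits per fragment

def encode(fragments):
    """Concatenate the 5-bit codes of the fragments into one bit string."""
    return "".join(format(f, f"0{BITS}b") for f in fragments)

def decode(bits):
    """Split a bit string into 5-bit chunks and return the fragment numbers."""
    return tuple(int(bits[i:i + BITS], 2) for i in range(0, len(bits), BITS))

def canonical(fragments):
    """Directionless representation: a sequence and its reverse are identified."""
    forward = tuple(fragments)
    return min(forward, forward[::-1])

# Enumerate all ordered triples and merge each one with its reverse.
unique_triblocks = {canonical(p) for p in product(range(N_FRAGMENTS), repeat=3)}

code = encode((0, 2, 29))
```

Of the 32^3 = 32 768 ordered triples, the 32^2 = 1024 palindromic ones are their own reverse and the remaining ones pair up, which gives (32 768 + 1024)/2 = 16 896 distinct triblock polymers, in agreement with the size of the benchmark data set.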

FIG. 7.

Construction of the triblock polymer data set. (a) Example of the generation of a triblock polymer. (b) SA score vs TC of all 16 896 triblock polymers, where stars indicate candidates at the Pareto front. (c) and (d) Distributions of the TC and SA for all triblock polymers. The gray strips highlight the statistics of polymers with TC > 0.4 W m−1 K−1 or SA < 3.0. Reproduced with permission from Huang et al., Mater. Today Phys. 44, 101438 (2024). Copyright 2024 Elsevier.219

TABLE VI.

Polymer fragments as basic units for high thermal conductivity polymer design. Each fragment was binary encoded according to serial numbers (No.). Reproduced with permission from Huang et al., Mater. Today Phys. 44, 101438 (2024). Copyright 2024 Elsevier.219 

| No. | SMILES of fragment | Code |
| --- | --- | --- |
| 0 | [*]C=C[*] | [00000] |
| 1 | [*]CCCCCC[*] | [00001] |
| 2 | [*]C#CC=C[*] | [00010] |
| 3 | [*]c1ccc([*])cc1 | [00011] |
| 4 | [*]c1ccc([*])[nH]1 | [00100] |
| 5 | [*]c1ccc2cc([*])ccc2c1 | [00101] |
| 6 | [*]c1ccc-2c(Cc3cc([*])ccc-23)c1 | [00110] |
| 7 | [*]CO[*] | [00111] |
| 8 | [*]OC([*])=O | [01000] |
| 9 | [*]c1ccc([*])o1 | [01001] |
| 10 | [*]C(=O)C=CC([*])=O | [01010] |
| 11 | [*]C(=O)c1ccc(cc1)C([*])=O | [01011] |
| 12 | [*]c1cnc([*])nc1 | [01100] |
| 13 | [*]Nc1ccc(N[*])cc1 | [01101] |
| 14 | [*]c1nc2cc([*])ccc2[nH]1 | [01110] |
| 15 | [*]c1nc2ccc([*])cc2[nH]1 | [01111] |
| 16 | [*]c1nc2cc3nc([*])[nH]c3cc2[nH]1 | [10000] |
| 17 | [*]CC(=O)N[*] | [10001] |
| 18 | [*]CNC(=O)N[*] | [10010] |
| 19 | [*]C(=O)NNC([*])=O | [10011] |
| 20 | [*]NNC(=O)C([*])=O | [10100] |
| 21 | [*]c1ccc2oc([*])nc2c1 | [10101] |
| 22 | [*]c1nc2ccc([*])cc2o1 | [10110] |
| 23 | [*]NC(=O)C=CC(=O)N[*] | [10111] |
| 24 | [*]C(=O)C=CC(=O)N-[*] | [11000] |
| 25 | [*]NC(=O)c1ccc([*])cc1 | [11001] |
| 26 | [*]Nc1ccc(C([*])=O)cc1 | [11010] |
| 27 | [*]N1C(=O)c2ccc([*])cc2C1=O | [11011] |
| 28 | [*]NC(=O)c1ccc(cc1)C([*])=O | [11100] |
| 29 | [*]C(=O)Nc1ccc(NC([*])=O)cc1 | [11101] |
| 30 | [*]n1c(=O)c2cc3c(cc2c1=O)c(=O)n([*])c3=O | [11110] |
| 31 | [*]N1C(=O)c2cc3cc4Cc5cc6cc7C(=O)N([*])C(=O)c7cc6cc5Cc4cc3cc2C1=O | [11111] |

1. Genetic algorithm

The genetic algorithm (GA) is a heuristic search algorithm that is inspired by the biological evolutionary process and based on the mechanics of natural genetics and selection.224–226 A GA run proceeds through the stages of population initialization, fitness evaluation, selection, crossover, and mutation. The core operators of crossover and mutation are illustrated in Fig. 8(a). Once the parents are selected according to the fitness function, the crossover operator combines them into one or several offspring. After that, mutation is executed with a predefined probability to increase the diversity of the population.

FIG. 8.

Performance evaluation of genetic algorithm (GA). (a) Illustration of crossover and mutation behaviors in GA. (b) and (c) Convergence of GAs for a single and 20 parallel runs.


In our case, the GA was realized in the pymoo227 package with simulated binary crossover and polynomial mutation.228 Figure 8(b) depicts the convergence curve of the GA in a single optimization run, using ten initial random structures, 200 iterations × 10 candidates per batch, and crossover and mutation probabilities of 100% each. Thanks to the inheritance of excellent fragments from the parents, the GA can efficiently find optimized polymers by simulating only a few candidates (64 non-repeating structures). To probe the effect of the initial structures on the convergence capacity, 20 GAs with different initial structures were executed, and the results are shown in Fig. 8(c). The TC of the best structures of the 20 runs ranges from 0.72 to 1.03 W m−1 K−1, within the top 0.3% of all possible candidates. Six of the 20 runs yielded polymers with the globally optimal TC, which indicates that the initial population has a significant influence on the optimization performance of the GA.
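
The GA stages can be illustrated with a plain-Python sketch on 15-bit sequences. Note the substitutions: this sketch uses tournament selection, one-point crossover, and bit-flip mutation instead of the simulated binary crossover and polynomial mutation used with pymoo, and the fitness function is a hypothetical stand-in for the DNN-predicted TC. All names and default settings are assumptions made for illustration.

```python
import random

def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two parents at a random cut point."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

def bit_flip_mutation(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def tournament(population, fitness, rng):
    """Pick two random individuals and return the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def run_ga(fitness, n_bits=15, pop_size=10, generations=50, rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]  # best fitness found so far, per generation
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            c1, c2 = one_point_crossover(p1, p2, rng)
            children.append(bit_flip_mutation(c1, rate, rng))
            children.append(bit_flip_mutation(c2, rate, rng))
        population = children[:pop_size]
        challenger = max(population, key=fitness)
        if fitness(challenger) > fitness(best):
            best = challenger
        history.append(fitness(best))
    return best, history

# Hypothetical stand-in for the property predictor: the number of 1 bits.
def toy_fitness(bits):
    return sum(bits)

best, history = run_ga(toy_fitness)
```

Running the function with several values of `seed` mimics the 20 parallel runs with different initial structures; the best-so-far curve in `history` corresponds to the convergence curves in Figs. 8(b) and 8(c).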

2. Bayesian optimization

Bayesian optimization (BO) is a method for finding the optimal solution to black-box functions by using sequential design strategies that rely on probabilistic surrogate models and acquisition functions.229,230 We previously released a tutorial on the design of thermal functional materials by coupling thermal transport calculations and BO.231 Moreover, additional instruction is available to assist in understanding the core components of BO more intuitively.232 The general process of BO can be described as225 (1) training a surrogate probabilistic model using several structures with the labeled property; (2) fitting the function using the surrogate model and recommending new structures by the acquisition function; (3) observing the property of the recommended structures and adding them to the training data set; and (4) repeating the above process until the predefined number of iterations is reached.

As shown in Fig. 9(a), single-task Bayesian optimization with a Gaussian process regression model and the Monte Carlo acquisition function qEI was implemented in BoTorch.233 Figures 9(b) and 9(c) show the TC convergence curves for a single BO run and for 20 parallel BO runs, respectively, using a hyperparameter set of initial random structures and 200 iterations × 10 candidates per batch. As the iterations proceed, BO does find optimized structures with enhanced TCs. However, many low-TC candidates are also selected without benefit, owing to the uncertainty of Monte Carlo sampling. The best candidates in the 20 runs have TCs ranging from 0.77 to 1.03 W m−1 K−1, and 14 of the runs reach the global optimal structure.
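
To make the role of the acquisition function concrete, the following sketch compares the closed-form expected improvement (EI) for a single candidate with a Monte Carlo estimate of the same quantity. It is a simplified illustration rather than the BoTorch qEI routine, which scores a batch of q candidates jointly; the function names and the posterior mean, standard deviation, and incumbent values are hypothetical.

```python
import math
import random


def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization under a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


def mc_expected_improvement(mu, sigma, best, n_samples=20000, seed=0):
    """Monte Carlo EI: average of max(y - best, 0) over posterior samples y."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += max(rng.gauss(mu, sigma) - best, 0.0)
    return total / n_samples


# Hypothetical candidate: predicted TC of 0.9 with uncertainty 0.2, incumbent best 1.0.
ei_exact = expected_improvement(0.9, 0.2, 1.0)
ei_mc = mc_expected_improvement(0.9, 0.2, 1.0)
```

Because the Monte Carlo estimate is built from random posterior samples, a candidate whose predicted mean lies below the incumbent can still receive a positive score when its uncertainty is large, which is one way to read the selection of low-TC candidates noted above.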

FIG. 9.

Performance evaluation of Bayesian optimization (BO) algorithm. (a) Gaussian process regression and acquisition functions in BO. (b) and (c) Convergence of BO algorithms for a single and 20 parallel runs.

3. Quantum annealing in a quantum virtual machine

Quantum annealing (QA) is an optimization algorithm that uses Ising machines to search for the global minimum of a given problem over a given set of candidate solutions (candidate states);234,235 such machines are implemented with superconducting qubits, ASICs, GPUs, and so on.236 QA has been successfully applied to the design of inorganic and organic materials.236,237 Ising machines were developed specifically for solving quadratic unconstrained binary optimization (QUBO) problems, a form well suited to the binary sequence encoding of polymers. A QUBO with N bits is described as238
$$H = \sum_{i=1}^{N} \sum_{j=1}^{N} Q_{ij} q_i q_j, \tag{17}$$
where $q_i \in \{0, 1\}$ are the binary variables and $Q_{ij}$ are real-valued parameters with $Q_{ij} = Q_{ji}$. As shown in Fig. 10(a), a factorization machine (FM) was employed as the surrogate model, and the optimal binary solution of the trained FM was identified with a quantum annealer. The form of the FM is given by238
$$f(q) = \sum_{i=1}^{N} w_i q_i + \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{K} v_{ik} v_{jk} q_i q_j, \tag{18}$$
where $w_i$ and $v_{ik}$ are real-valued parameters, and the rank K was set to 8 as in the previous literature.238 Due to hardware limitations, we used a sampler from the dimod239 Python package instead of Ising machines to simulate real quantum annealing. As shown in Figs. 10(b) and 10(c), although the simulated QA increases the TC of the polymers, the TC of the optimal polymers stays below 0.80 W m−1 K−1, and the algorithm fails to find the global optimal structure in any of the 20 parallel optimizations with various initial candidates.
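
The link between Eqs. (17) and (18) can be checked numerically. The sketch below, which is an illustration rather than the code used in this work, folds the linear FM weights onto the diagonal of the QUBO matrix (using the fact that a binary variable equals its own square) and verifies the mapping by exhaustive enumeration. The random parameters, the small size N = 6, and the helper names are assumptions; exhaustive search stands in for the annealer.

```python
import itertools
import random


def fm_predict(q, w, v):
    """Factorization machine of Eq. (18) for a binary vector q."""
    n, k = len(w), len(v[0])
    linear = sum(w[i] * q[i] for i in range(n))
    pair = sum(v[i][r] * v[j][r] * q[i] * q[j]
               for i in range(n) for j in range(n) for r in range(k))
    return linear + pair


def fm_to_qubo(w, v):
    """Build Q of Eq. (17) so that sum_ij Q_ij q_i q_j equals fm_predict(q)."""
    n, k = len(w), len(v[0])
    Q = [[sum(v[i][r] * v[j][r] for r in range(k)) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        Q[i][i] += w[i]  # q_i * q_i == q_i for binary q_i
    return Q


def qubo_energy(q, Q):
    """Evaluate H of Eq. (17) for a binary vector q."""
    n = len(q)
    return sum(Q[i][j] * q[i] * q[j] for i in range(n) for j in range(n))


rng = random.Random(0)
N, K = 6, 8  # small N (assumption) so exhaustive search is feasible; K = 8 as in the text
w = [rng.uniform(-1, 1) for _ in range(N)]
v = [[rng.uniform(-1, 1) for _ in range(K)] for _ in range(N)]
Q = fm_to_qubo(w, v)

# Exhaustive search stands in for the annealer on this toy problem.
# An annealer minimizes H, so maximizing the predicted TC would mean passing -Q.
best_q = max(itertools.product((0, 1), repeat=N), key=lambda q: qubo_energy(q, Q))
```

For realistic sequence lengths, exhaustive enumeration is no longer possible, which is where an annealer or a simulated sampler would take over the search over `Q`.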
FIG. 10.

Performance evaluation of quantum annealing (QA) algorithm in a quantum virtual machine. (a) Factorization machine and Ising machine. (b) and (c) Convergence of simulated QA algorithms for a single and 20 parallel runs.

4. Performance comparison of three optimization algorithms

Figure 11(a) compares the optimization performance of the three algorithms based on the TCs averaged over 20 separate optimization runs with different random states; the shaded bands correspond to one standard deviation. Since BO combines a robust Gaussian kernel with an acquisition function that weighs both the predicted TC and the uncertainty of the candidates, it has the strongest global optimization capability and the best overall performance. GA inherits fragments that have a positive effect on TC, which enables a rapid increase in the TC of the optimized structures in the early optimization stages, but its ability is constrained by the initial population. Limited by the accuracy of the FM surrogate model and by the hardware of Ising machines, the simulated QA performs worse than the other two algorithms. However, owing to the stochasticity and uncertainty of the Monte Carlo sampler, BO simulated significantly more structures (after de-duplication) in a single optimization run than GA and QA [Fig. 11(b)].
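
The statistics behind such a comparison (best-so-far curves, their mean and standard deviation across runs, and the number of unique candidates) can be computed as in the following sketch. The TC traces are made-up numbers for three short runs, the helper names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify which estimator was used.

```python
import statistics


def best_so_far(values):
    """Running maximum of the TC values evaluated along one optimization run."""
    curve, best = [], float("-inf")
    for v in values:
        best = max(best, v)
        curve.append(best)
    return curve


def mean_and_std(curves):
    """Pointwise mean and sample standard deviation across runs of equal length."""
    means = [statistics.mean(step) for step in zip(*curves)]
    stds = [statistics.stdev(step) for step in zip(*curves)]
    return means, stds


def count_unique(candidates):
    """Number of distinct structures after de-duplication."""
    return len(set(candidates))


# Hypothetical TC traces for three short runs (illustrative numbers only).
runs = [[0.3, 0.5, 0.4, 0.9], [0.2, 0.6, 0.7, 0.7], [0.4, 0.4, 0.8, 1.0]]
curves = [best_so_far(r) for r in runs]
means, stds = mean_and_std(curves)
```

Plotting `means` with a band of plus or minus `stds` gives a curve of the kind shown in Fig. 11(a), while `count_unique` applied to the structures of each run gives the quantity summarized in Fig. 11(b).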

FIG. 11.

Comparison of different optimization algorithms. (a) Convergence curves of the three optimization algorithms; each curve is the average over 20 runs, and the shading indicates one standard deviation. (b) Number of candidates designed by the three optimization algorithms after de-duplication in 20 runs, where the diamonds are the mean values and the error bars represent one standard deviation.

Two state-of-the-art multi-objective optimization algorithms, the unified non-dominated sorting genetic algorithm III (U-NSGA-III) and q-noisy expected hypervolume improvement (qNEHVI), were employed to design triblock polymers with both high TC and synthetic feasibility. As depicted in Fig. 12(a), U-NSGA-III is a multi-objective evolutionary algorithm (MOEA) that updates NSGA-III.240,241 It generalizes to problems with different numbers of objectives by increasing the selection pressure through a scalar selection operator. U-NSGA-III was implemented in pymoo227 with all hyperparameters kept at their default values. qNEHVI is a multi-objective Bayesian optimization (MOBO) algorithm that extends the expected improvement acquisition function by using hypervolume (HV) as the objective. It evaluates samples drawn by a quasi-Monte Carlo (QMC) sampler from the model posterior and identifies the candidate with the largest objective value. qNEHVI was run in BoTorch,233 with the base and raw sampling set at 256 and 128, respectively, to enhance computational efficiency.
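
Both algorithms rest on the notion of Pareto dominance between candidates. The sketch below extracts the non-dominated set from a handful of hypothetical (TC, SA) pairs, assuming that a higher TC and a lower SA score are both preferred; it illustrates only the dominance test, not the reference-direction niching of U-NSGA-III or the pymoo implementation.

```python
def dominates(a, b):
    """True if a is at least as good as b in both objectives and better in one.

    Each point is (tc, sa); higher tc and lower sa are assumed to be better.
    """
    tc_a, sa_a = a
    tc_b, sa_b = b
    return tc_a >= tc_b and sa_a <= sa_b and (tc_a > tc_b or sa_a < sa_b)


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (TC in W m^-1 K^-1, SA score) pairs, for illustration only.
candidates = [(0.45, 2.8), (0.40, 2.5), (0.30, 3.5), (0.45, 3.2), (0.20, 2.0)]
front = pareto_front(candidates)
```

In this toy set, the third and fourth candidates are dominated by the first, so only three points remain on the front.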

FIG. 12.

Evaluation of multi-objective optimization algorithms. (a) Core components for U-NSGA-III and qNEHVI. (b) and (c) Optimization trajectories for a single run of MOEA and MOBO with ten random initial structures and 200 iterations × 10 candidates per batch. Reproduced with permission from Huang et al., Mater. Today Phys. 44, 101438 (2024). Copyright 2024 Elsevier.219

Figures 12(b) and 12(c) illustrate the optimization trajectories for a single run of MOEA and MOBO, respectively, with ten random initial structures and 200 iterations × 10 candidates per batch. The nine gray stars mark the global optimal polymers, and the polymer dots are color-coded by generation. The non-duplicated polymer structures searched in a MOBO run are distributed much more densely than those in a MOEA run. qNEHVI integrates HV into the expected improvement acquisition function as the objective for evaluating the randomized QMC samples drawn from the model posterior, and thus generates non-duplicated candidates in almost every generation. This enables the model to escape local optima and further increase the HV. In contrast, the optimization strategy of U-NSGA-III is inspired by the crossover and mutation of genes during biological evolution. The optimal polymers are designed by randomly selecting parents for mating and introducing a tournament operator. However, the performance of U-NSGA-III depends on the initial polymer structures, because the optimization primarily accumulates previously found polymer units with positive contributions, which makes it prone to becoming trapped in local optima.

To obtain statistical outcomes, we performed 20 runs each of the MOEA and MOBO algorithms with different initial candidates. The HV convergence curves are displayed in Figs. 13(a) and 13(b). The HVs of U-NSGA-III rise rapidly to a certain level within 20 generations but rarely increase further in subsequent generations. In contrast, three qNEHVI runs identified the nine global optimal polymers within 200 generations, and almost all of the HVs experienced a secondary boost after first reaching a plateau. This difference in enhancement stems from the stochastic nature of QMC sampling.221 In all runs of both algorithms, the HV reaches a reference value calculated from the five ideal global optimal Pareto polymers (TC > 0.4 W m−1 K−1 and SA < 3.0) and the reference point ([0, −10] for TC and SA), although the mean HV of MOBO is greater than that of MOEA [see Figs. 13(c) and 13(d)].
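
For two objectives, the HV is simply the area dominated by the current candidates and bounded by the reference point. The sketch below computes it with a sweep over sorted points. It assumes that both objectives are cast as maximization, i.e., the pair (TC, −SA) with the reference point (0, −10); this reading of the reference point, the function name, and the example values are assumptions for illustration.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded below by `ref` (both objectives maximized)."""
    rx, ry = ref
    area, y_max = 0.0, ry
    # Sweep from the largest first objective; each improving point adds a new strip.
    for x, y in sorted(points, reverse=True):
        if x > rx and y > y_max:
            area += (x - rx) * (y - y_max)
            y_max = y
    return area


# Hypothetical (TC, -SA) pairs; the last point is dominated and adds no area.
points = [(0.45, -2.8), (0.40, -2.5), (0.20, -2.0), (0.30, -3.5)]
hv = hypervolume_2d(points, ref=(0.0, -10.0))
```

A new candidate raises the HV only if it extends the Pareto front, so a plateau in the HV curve means that recent generations produced no new non-dominated polymers.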

FIG. 13.

Convergence of multi-objective evolutionary algorithm (MOEA) and multi-objective Bayesian optimization (MOBO) in the inverse design of triblock high-TC polymers. (a) and (b) Convergence curves for 20 runs of MOEA and MOBO. Each optimization run uses ten random initial structures and 200 iterations × 10 candidates per batch. (c) and (d) Mean hypervolume curves for 20 MOEA and MOBO runs. The upper edge of the blue strip or the red dashed line corresponds to the global optimal HV, and the lower edge of the blue strip or the blue dashed line indicates the HV computed from the five ideal global optimal Pareto polymers with the reference point. Reproduced with permission from Huang et al., Mater. Today Phys. 44, 101438 (2024). Copyright 2024 Elsevier.219

Figures 14(a) and 14(b) present the number of unique (de-duplicated) polymers explored in the 20 MOEA and MOBO runs, respectively. The number of unique polymers generated per MOEA run is significantly lower than that of MOBO: its mean value is approximately 77, which is less than 5.0% of the average for MOBO. To address this issue, an effective approach is to design high-TC polymers through multiple parallel MOEA runs with different random states, thereby reducing the influence of the initial structures. Furthermore, we calculated the TC of 20 MOEA-designed polymers (red dots) using NEMD, as shown in Fig. 14(c); these polymers indeed improve the Pareto front (marked by stars) formed by the 1144 raw polymers (blue dots). The MD validation of the polymers with the top three TCs is displayed in Fig. 14(d). These polymers are all combinations of benzene and 3,5-dihydroimidazo[4,5-f]benzimidazole fragments, with conjugated structures and high backbone stiffness. Consequently, intra-chain interactions of bonds, angles, and dihedrals dominate the contribution to TC.
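
As a rough consistency check on these counts, the evaluation budget per run bounds the number of unique candidates. The arithmetic below assumes that the ten initial structures count toward the budget, which is an assumption made here for illustration, so that each run evaluates at most 10 + 200 × 10 structures.

```latex
\begin{align*}
N_{\max} &= 10 + 200 \times 10 = 2010, \\
\frac{77}{N_{\max}} &\approx 3.8\% \quad \text{(mean unique fraction per MOEA run)}, \\
\frac{77}{\bar{N}_{\mathrm{MOBO}}} < 5.0\% \;&\Rightarrow\; \bar{N}_{\mathrm{MOBO}} > 1540 \approx 0.77\, N_{\max}.
\end{align*}
```

Under this assumption, more than three quarters of the candidates proposed in an average MOBO run are new structures, whereas a typical MOEA run revisits the same few dozen polymers.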

FIG. 14.

Statistics on the outcomes of inverse design algorithms. (a) and (b) Number of candidates designed by MOEA and MOBO after de-duplication in 20 runs. (c) Pareto front improvement over the 1144 raw training data after adding 20 MOEA-optimized candidates with MD-calculated TC. (d) Quantitative decomposition of TC into contributions from convection and different types of interactions of three high TC polymers. Reproduced with permission from Huang et al., Mater. Today Phys. 44, 101438 (2024). Copyright 2024 Elsevier.219 

Over the past few years, data-driven informatics algorithms have contributed to a revolution in the materials development paradigm, greatly facilitating the design of polymers and enhancing our understanding of their underlying mechanisms. In this Tutorial, we discuss the basic principles and implementation of ML for the development of high thermal conductivity polymers, covering polymer datasets, polymer modeling and TC calculation, feature engineering, and informatics algorithms. We begin by describing the construction of interpretable regression models based on physical or graph descriptors, which reveal the mapping between polymer microstructures and TCs. Based on the trained surrogate prediction model and the knowledge derived from ML, we create a library containing 32 motifs and employ lightweight active learning algorithms to design sequence-ordered triblock polymers with high TCs. We not only design polymers with the single optimization target of high TC using GA, BO, and simulated QA, but also consider the synthetic feasibility of polymers in multi-objective optimization trials realized by two state-of-the-art algorithms, U-NSGA-III and qNEHVI.

Although ML has facilitated the development of macromolecules with high TC, there is still a large gap in satisfying the various demands of realistic engineering applications, which also provides great opportunities for future investigations.

  1. Sufficient high-quality polymer data are a fundamental prerequisite for polymer informatics. Accessible polymer databases are rare compared to inorganic databases, and the recorded data are rather sparse with strict acquisition rules. Developing and maintaining publicly accessible polymer databases that adhere to the FAIR principles and encompass a wide range of properties requires collaboration and consensus among chemical researchers. In addition, APIs for automated batch downloading of data would benefit polymer informatics.

  2. Several open-source software packages141,157 enable automated modeling and TC calculations of polymers through classical MD simulation, which promotes the development of polymer informatics. It is crucial to ensure the reliability of the TCs obtained in this way. Therefore, efforts are being made at the computational level to enhance the accuracy of force fields for MD and to develop first-principles methods that can be applied efficiently and economically to macromolecular systems.

  3. Most current work on ML in polymer science focuses on computed TC, given its convenience and consistency. Looking ahead, with the success of applying ML to automated chemistry experiments,242–244 we look forward to the emergence of automated platforms that integrate polymer literature mining, polymer synthesis, TC measurement, data storage and analysis, as well as novel structure generation and TC evaluation. This will enable the expansion of reliable polymer experimental data and the identification of promising polymers with high TCs.

  4. State-of-the-art informatics algorithms are always sought after by the polymer community. Deep learning algorithms such as transfer learning,94 recurrent neural networks, and reinforcement learning121 have been successfully applied to the development of polymers with high TC. On the one hand, more intelligent molecular generation algorithms are required for the design of polymers with high TC; on the other hand, efforts are being made to apply large language models and multitask learning to the design of multifunctional polymers with enhanced TC.

Last but not least, with the rapid advancement of artificial intelligence and automated experiments, we foresee that ML will become a powerful driving force in accelerating the design of advanced polymers to meet the immense demand in various fields. The appeal of ML for polymers extends beyond TC to other properties, such as optical, electrical, and mechanical properties.

This work was supported by the Shanghai Key Fundamental Research Grant (No. 21JC1403300), the Shanghai Pujiang Program (No. 20PJ1407500), the National Natural Science Foundation of China (NSFC) (Nos. 92366203 and 52006134), and the SJTU Global Strategic Partnership Fund (2022 SJTU-Warwick).

The authors have no conflicts to disclose.

Xiang Huang: Investigation (lead); Methodology (lead); Writing – original draft (lead). Shenghong Ju: Conceptualization (lead); Funding acquisition (lead); Investigation (equal); Resources (lead); Supervision (lead); Writing – review & editing (lead).

Implementation of polymer physical feature engineering and the creation of interpretable machine learning models are available at https://github.com/SJTU-MI/APFEforPI, and the polymer inverse design cases are available at https://github.com/SJTU-MI/Inverse_Design_of_Polymers. More details can be found in our previous publications124,219 or on reasonable request from the corresponding author.

1. S. Han, P. Wen, H. Wang, Y. Zhou, Y. Gu, L. Zhang, Y. Shao-Horn, X. Lin, and M. Chen, Nat. Mater. 22, 1515 (2023).
2. C. Zhao, S. Ju, Y. Xue, T. Ren, Y. Ji, and X. Chen, Carbon Neutrality 1, 7 (2022).
3. K. L. Law and R. Narayan, Nat. Rev. Mater. 7, 104 (2022).
4. Y. Wu, L. Ma, Z. Song, S. Dong, Z. Guo, J. Wang, and Y. Zhou, Carbon Neutrality 2, 1 (2023).
5. B.-G. Kim, E. J. Jeong, J. W. Chung, S. Seo, B. Koo, and J. Kim, Nat. Mater. 12, 659 (2013).
6. N. Li, Y. Li, Z. Cheng, Y. Liu, Y. Dai, S. Kang, S. Li, N. Shan, S. Wai, A. Ziaja, Y. Wang, J. Strzalka, W. Liu, C. Zhang, X. Gu, J. A. Hubbell, B. Tian, and S. Wang, Science 381, 686 (2023).
7. A. Suberi, M. K. Grun, T. Mao, B. Israelow, M. Reschke, J. Grundler, L. Akhtar, T. Lee, K. Shin, A. S. Piotrowski-Daspit, R. J. Homer, A. Iwasaki, H.-W. Suh, and W. M. Saltzman, Sci. Transl. Med. 15, eabq0603 (2023).
8. L. Gao, L. Wang, J. Lin, and L. Du, Engineering 27, 31 (2023).
9. Y. Guo, Y. Zhou, and Y. Xu, Polymer 233, 124168 (2021).
10. X. Xu, J. Zhou, and J. Chen, Adv. Funct. Mater. 30, 1904704 (2020).
11. X. Yang, C. Liang, T. Ma, Y. Guo, J. Kong, J. Gu, M. Chen, and J. Zhu, Adv. Compos. Hybrid Mater. 1, 207 (2018).
12. S. S. Akhtar, Polymers 13, 807 (2021).
13. P. Zhang, J. Zeng, S. Zhai, Y. Xian, D. Yang, and Q. Li, Macromol. Mater. Eng. 302, 1700068 (2017).
14. B. Liu, Y. Zhou, L. Dong, Q. Lu, and X. Xu, iScience 25, 105451 (2022).
15. A. Facchetti, Chem. Mater. 23, 733 (2011).
16. B. Han, B. Liu, G. Wang, Q. Qiu, Z. Wang, Y. Xi, Y. Cui, S. Ma, B. Xu, and H.-Y. Hsu, Adv. Funct. Mater. 33, 2300570 (2023).
17. J. Liao, D. Zhang, and Z. Li, Opto-Electron. Eng. 49, 210388 (2022).
18. J. Zhou, R. Li, and T. Luo, npj Comput. Mater. 9, 212 (2023).
19. S. Lin, X. Huang, Z. Bu, L. Yu, T. Dai, Z. Lin, and L. Wang, ECS J. Solid State Sci. Technol. 8, N93 (2019).
20. Z. Xu, B. Zhu, X. Liu, T. Lan, Y. Huang, Y. Zhang, and D. Wu, Chem. Eng. J. 477, 147246 (2023).
21. W. Yigen, D. Shuai, L. Xiaojuan, W. Liguo, S. Hongwei, L. Mengjiao, L. Xin, Z. Yang, Z. Guolong, Z. Jianyi, and W. Dezhi, Soft Sci. 3, 33 (2023).
22. Y. Liu, H. Liang, L. Yang, G. Yang, H. Yang, S. Song, Z. Mei, G. Csányi, and B. Cao, Adv. Mater. 35, 2210873 (2023).
23. X. Wei, Z. Wang, Z. Tian, and T. Luo, J. Heat Transfer 143, 072101 (2021).
24. X. Wang, W. Wang, C. Yang, D. Han, H. Fan, and J. Zhang, J. Appl. Phys. 130, 170902 (2021).
25. M. F. Mina, A. K. M. M. Alam, M. N. K. Chowdhury, S. K. Bhattacharia, and F. J. Baltá Calleja, Polym. Plast. Technol. Eng. 44, 523 (2005).
26. S. Shen, A. Henry, J. Tong, R. Zheng, and G. Chen, Nat. Nanotechnol. 5, 251 (2010).
27. Y. Xu, D. Kraemer, B. Song, Z. Jiang, J. Zhou, J. Loomis, J. Wang, M. Li, H. Ghasemi, X. Huang, X. Li, and G. Chen, Nat. Commun. 10, 1771 (2019).
28. R. Shrestha, P. Li, B. Chatterjee, T. Zheng, X. Wu, Z. Liu, T. Luo, S. Choi, K. Hippalgaonkar, M. P. de Boer, and S. Shen, Nat. Commun. 9, 1664 (2018).
29. W. Kong, Z. Zhang, X. Zhao, and L. Ye, Polymer 290, 126499 (2024).
30. J. Ma, Q. Zhang, A. Mayo, Z. Ni, H. Yi, Y. Chen, R. Mu, L. M. Bellan, and D. Li, Nanoscale 7, 16899 (2015).
31. C. Lu, S. W. Chiang, H. Du, J. Li, L. Gan, X. Zhang, X. Chu, Y. Yao, B. Li, and F. Kang, Polymer 115, 52 (2017).
32. V. Singh, T. L. Bougher, A. Weathers, Y. Cai, K. Bi, M. T. Pettes, S. A. McMenamin, W. Lv, D. P. Resler, T. R. Gattuso, D. H. Altman, K. H. Sandhage, L. Shi, A. Henry, and B. A. Cola, Nat. Nanotechnol. 9, 384 (2014).
33. B.-Y. Cao, Y.-W. Li, J. Kong, H. Chen, Y. Xu, K.-L. Yung, and A. Cai, Polymer 52, 1711 (2011).
34. G.-H. Kim, D. Lee, A. Shanker, L. Shao, M. S. Kwon, D. Gidley, J. Kim, and K. P. Pipe, Nat. Mater. 14, 295 (2015).
35. J. Chen, Y. Zhou, X. Huang, C. Yu, D. Han, A. Wang, Y. Zhu, K. Shi, Q. Kang, P. Li, P. Jiang, X. Qian, H. Bao, S. Li, G. Wu, X. Zhu, and Q. Wang, Nature 615, 62 (2023).
36. Z. Guo, D. Lee, Y. Liu, F. Sun, A. Sliwinski, H. Gao, P. C. Burns, L. Huang, and T. Luo, Phys. Chem. Chem. Phys. 16, 7764 (2014).
37. R. Adhikari and G. H. Michler, Prog. Polym. Sci. 29, 949 (2004).
38. S. Lin, Z. Cai, Y. Wang, L. Zhao, and C. Zhai, npj Comput. Mater. 5, 126 (2019).
39. Y. Jaluria, Appl. Therm. Eng. 111, 1574 (2017).
40. B. Zhang, P. Mao, Y. Liang, Y. He, W. Liu, and Z. Liu, ES Energy Environ. 5, 37 (2019).
41. N. Mehra, L. Mu, T. Ji, X. Yang, J. Kong, J. Gu, and J. Zhu, Appl. Mater. Today 12, 92 (2018).
42. X. Qian, J. Zhou, and G. Chen, Nat. Mater. 20, 1188 (2021).
43. K. Utimula, T. Ichibha, R. Maezono, and K. Hongo, Chem. Mater. 31, 4649 (2019).
44. P. Cheng, N. Shulumba, and A. J. Minnich, Phys. Rev. B 100, 094306 (2019).
45. X. Wei, Z. Wang, Z. Tian, and T. Luo, J. Heat Transfer 143, 072101 (2021).
46. S. Chmiela, H. E. Sauceda, K.-R. Müller, and A. Tkatchenko, Nat. Commun. 9, 3887 (2018).
47. A. Henry and G. Chen, Phys. Rev. Lett. 101, 235502 (2008).
48. G. Chen, Nat. Rev. Phys. 3, 555 (2021).
49. W. Lv, R. M. Winters, F. DeAngelis, G. Weinberg, and A. Henry, J. Phys. Chem. A 121, 5586 (2017).
50. A. Kiessling, D. N. Simavilla, G. G. Vogiatzis, and D. C. Venerus, Polymer 228, 123881 (2021).
51. T. Feng, J. He, A. Rai, D. Hun, J. Liu, and S. S. Shrestha, Phys. Rev. Appl. 14, 044023 (2020).
52. J. Liu and R. Yang, Phys. Rev. B 86, 104307 (2012).
53. X. Wei and T. Luo, Phys. Chem. Chem. Phys. 21, 15523 (2019).
54. A. Crnjar, C. Melis, and L. Colombo, Phys. Rev. Mater. 2, 015603 (2018).
55. J. Zhao, J.-W. Jiang, N. Wei, Y. Zhang, and T. Rabczuk, J. Appl. Phys. 113, 184304 (2013).
56. A. Chen, Y. Wu, S. Zhou, W. Xu, W. Jiang, Y. Lv, W. Guo, K. Chi, Q. Sun, T. Fu, T. Xie, Y. Zhu, and X. Liang, Mater. Adv. 1, 1996 (2020).
57. D. Luo, C. Huang, and Z. Huang, J. Heat Transfer 140, 031302 (2017).
58. X. Wei and T. Luo, Phys. Chem. Chem. Phys. 24, 10272 (2022).
59. H. Ma and Z. Tian, Appl. Phys. Lett. 110, 091903 (2017).
60. X. Wei, T. Zhang, and T. Luo, Phys. Chem. Chem. Phys. 18, 32146 (2016).
61. R. Muthaiah and J. Garg, J. Appl. Phys. 124, 105102 (2018).
62. R. Ma, D. Huang, T. Zhang, and T. Luo, Chem. Phys. Lett. 704, 49 (2018).
63. H. Ma and Z. Tian, Appl. Phys. Lett. 107, 073111 (2015).
64. A. B. Robbins and A. J. Minnich, Appl. Phys. Lett. 107, 201908 (2015).
65. X. Xie, K. Yang, D. Li, T.-H. Tsai, J. Shin, P. V. Braun, and D. G. Cahill, Phys. Rev. B 95, 035406 (2017).
66. E. Lussetti, T. Terao, and F. Müller-Plathe, J. Phys. Chem. B 111, 11516 (2007).
67. H. Subramanyan, W. Zhang, J. He, K. Kim, X. Li, and J. Liu, J. Appl. Phys. 125, 095104 (2019).
68. H. Ma and Z. Tian, J. Mater. Res. 34, 126 (2019).
69. T. Zhang, X. Wu, and T. Luo, J. Phys. Chem. C 118, 21148 (2014).
70. T. Zhang and T. Luo, J. Phys. Chem. B 120, 803 (2016).
71. T. Luo, K. Esfarjani, J. Shiomi, A. Henry, and G. Chen, J. Appl. Phys. 109, 074321 (2011).
72. G. Kikugawa, T. G. Desai, P. Keblinski, and T. Ohara, J. Appl. Phys. 114, 034302 (2013).
73. X. Xiong, M. Yang, C. Liu, X. Li, and D. Tang, J. Appl. Phys. 122, 035104 (2017).
74. X. Wan, B. Demir, M. An, T. R. Walsh, and N. Yang, Int. J. Heat Mass Transfer 180, 121821 (2021).
75. X. Liu and Z. Rao, Comput. Mater. Sci. 172, 109298 (2020).
76. M. K. Maurya, J. Wu, M. K. Singh, and D. Mukherji, ACS Macro Lett. 11, 925 (2022).
77. Z. Zhang and B. Cao, Sci. China Phys. Mech. Astron. 65, 117003 (2022).
78. L. Zhang, M. Ruesch, X. Zhang, Z. Bai, and L. Liu, RSC Adv. 5, 87981 (2015).
79. H. Zheng, K. Wu, Y. Zhan, K. Wang, and J. Shi, J. Polym. Sci. 61, 1622 (2023).
80. N. Mehra, L. Mu, and J. Zhu, Compos. Sci. Technol. 148, 97 (2017).
81. W. Li, J. Ma, S. Wu, J. Zhang, and J. Cheng, Polym. Test. 101, 107275 (2021).
82. H. Zheng, K. Wu, W. Chen, B. Nan, Z. Qu, and M. Lu, Macromol. Chem. Phys. 222, 2000418 (2021).
83. W. Shi, Z. Shuai, and D. Wang, Adv. Funct. Mater. 27, 1702847 (2017).
84. M. Sangkhawasi, T. Remsungnen, A. S. Vangnai, R. P. Poo-arporn, and T. Rungrotmongkol, Polymers 14, 1161 (2022).
85. X. Huang, S. Ma, C. Y. Zhao, H. Wang, and S. Ju, npj Comput. Mater. 9, 191 (2023).
86. E. N. Muratov, J. Bajorath, R. P. Sheridan, I. V. Tetko, D. Filimonov, V. Poroikov, T. I. Oprea, I. I. Baskin, A. Varnek, A. Roitberg, O. Isayev, S. Curtalolo, D. Fourches, Y. Cohen, A. Aspuru-Guzik, D. A. Winkler, D. Agrafiotis, A. Cherkasov, and A. Tropsha, Chem. Soc. Rev. 49, 3525 (2020).
87. P. Zhang, T. Zhang, J. Zhang, H. Liu, C. Chicaiza-Ortiz, J. T. E. Lee, Y. He, Y. Dai, and Y. W. Tong, Carbon Neutrality 3, 2 (2024).
88. H. R. Allcock, Science 255, 1106 (1992).
89. A. Agrawal and A. Choudhary, APL Mater. 4, 053208 (2016).
90. S. Hellberg, M. Sjoestroem, B. Skagerberg, and S. Wold, J. Med. Chem. 30, 1126 (1987).
91. T. I. Oprea and J. Gottfries, J. Combinatorial Chem. 3, 157 (2001).
92. L. Chen, G. Pilania, R. Batra, T. D. Huan, C. Kim, C. Kuenneth, and R. Ramprasad, Mater. Sci. Eng.: R: Rep. 144, 100595 (2021).
93. S. Kim, P. A. Thiessen, E. E. Bolton, J. Chen, G. Fu, A. Gindulyte, L. Han, J. He, S. He, B. A. Shoemaker, J. Wang, B. Yu, J. Zhang, and S. H. Bryant, Nucleic Acids Res. 44, D1202 (2016).
94. S. Wu, Y. Kondo, M. Kakimoto, B. Yang, H. Yamada, I. Kuwajima, G. Lambard, K. Hongo, Y. Xu, J. Shiomi, C. Schick, J. Morikawa, and R. Yoshida, npj Comput. Mater. 5, 66 (2019).
95. C. M. Dobson, Nature 432, 824 (2004).
96. S. Kim, C. M. Schroeder, and N. E. Jackson, ACS Polym. Au 3, 318 (2023).
97. R. Ma and T. Luo, J. Chem. Inform. Model. 60, 4684 (2020).
98. W. Sha, Y. Li, S. Tang, J. Tian, Y. Zhao, Y. Guo, W. Zhang, X. Zhang, S. Lu, Y.-C. Cao, and S. Cheng, InfoMat 3, 353 (2021).
99. K. Hatakeyama-Sato, Polym. J. 55, 117 (2023).
100. K. Sattari, Y. Xie, and J. Lin, Soft Matter 17, 7607 (2021).
101. Y. Shang, Z. Xiong, K. An, J. A. Hauch, C. J. Brabec, and N. Li, Mater. Genome Eng. Adv. 2, e28 (2024).
102. J. P. Lightstone, L. Chen, C. Kim, R. Batra, and R. Ramprasad, J. Appl. Phys. 127, 215105 (2020).
103. A. Mishra, P. Rajak, A. Irie, S. Fukushima, R. K. Kalia, A. Nakano, K. Nomura, F. Shimojo, and P. Vashishta, Appl. Phys. Lett. 123, 121901 (2023).
104. M. A. F. Afzal, C. Cheng, and J. Hachmann, J. Chem. Phys. 148, 241712 (2018).
105. Y. Wang, T. Xie, A. France-Lanord, A. Berkley, J. A. Johnson, Y. Shao-Horn, and J. C. Grossman, Chem. Mater. 32, 4144 (2020).
106. A. Mannodi-Kanakkithodi, G. Pilania, T. D. Huan, T. Lookman, and R. Ramprasad, Sci. Rep. 6, 20952 (2016).
107. L. Chen, C. Kim, R. Batra, J. P. Lightstone, C. Wu, Z. Li, A. A. Deshmukh, Y. Wang, H. D. Tran, P. Vashishta, G. A. Sotzing, Y. Cao, and R. Ramprasad, npj Comput. Mater. 6, 61 (2020).
108. B. K. Wheatle, E. F. Fuentes, N. A. Lynd, and V. Ganesan, Macromolecules 53, 9449 (2020).
109. R. Bhowmik, S. Sihn, R. Pachter, and J. P. Vernon, Polymer 220, 123558 (2021).
110. L. Tao, V. Varshney, and Y. Li, J. Chem. Inform. Model. 61, 5395 (2021).
111. A. Alesadi, Z. Cao, Z. Li, S. Zhang, H. Zhao, X. Gu, and W. Xia, Cell Rep. Phys. Sci. 3, 100911 (2022).
112. K. K. Bejagam, J. Lalonde, C. N. Iverson, B. L. Marrone, and G. Pilania, J. Phys. Chem. B 126, 934 (2022).
113. T. Yue, J. He, L. Tao, and Y. Li, J. Chem. Theory Comput. 19, 4641 (2023).
114. L. Tao, J. He, N. E. Munyaneza, V. Varshney, W. Chen, G. Liu, and Y. Li, Chem. Eng. J. 465, 142949 (2023).
115. L. Tao, J. He, T. Arbaugh, J. R. McCutcheon, and Y. Li, J. Membr. Sci. 665, 121131 (2023).
116. J. Yang, L. Tao, J. He, J. R. McCutcheon, and Y. Li, Sci. Adv. 8, eabn9545 (2022).
117. S. M. McDonald, E. K. Augustine, Q. Lanners, C. Rudin, L. Catherine Brinson, and M. L. Becker, Nat. Commun. 14, 4838 (2023).
118. A. J. Gormley and M. A. Webb, Nat. Rev. Mater. 6, 642–644 (2021).
119. M.-X. Zhu, H.-G. Song, Q.-C. Yu, J.-M. Chen, and H.-Y. Zhang, Int. J. Heat Mass Transfer 162, 120381 (2020).
120. R. Ma, H. Zhang, J. Xu, L. Sun, Y. Hayashi, R. Yoshida, J. Shiomi, J. Wang, and T. Luo, Mater. Today Phys. 28, 100850 (2022).
121. R. Ma, H. Zhang, and T. Luo, ACS Appl. Mater. Interfaces 14, 15587 (2022).
122. A. Nagoya, N. Kikkawa, N. Ohba, T. Baba, S. Kajita, K. Yanai, and T. Takeno, Macromolecules 55, 3384 (2022).
123. T. Zhou, Z. Wu, H. K. Chilukoti, and F. Müller-Plathe, J. Chem. Theory Comput. 17, 3772 (2021).
124. X. Huang, S. Ma, Y. Wu, C. Wan, C. Y. Zhao, H. Wang, and S. Ju, J. Mater. Chem. A 11, 20539 (2023).
125. M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J.-W. Boiten, L. B. da Silva Santos, P. E. Bourne, J. Bouwman, A. J. Brookes, T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C. T. Evelo, R. Finkers, A. Gonzalez-Beltran, A. J. G. Gray, P. Groth, C. Goble, J. S. Grethe, J. Heringa, P. A. C. 't Hoen, R. Hooft, T. Kuhn, R. Kok, J. Kok, S. J. Lusher, M. E. Martone, A. Mons, A. L. Packer, B. Persson, P. Rocca-Serra, M. Roos, R. van Schaik, S.-A. Sansone, E. Schultes, T. Sengstag, T. Slater, G. Strawn, M. A. Swertz, M. Thompson, J. van der Lei, E. van Mulligen, J. Velterop, A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, and B. Mons, Sci. Data 3, 160018 (2016).
126. B. Mons, C. Neylon, J. Velterop, M. Dumontier, L. O. B. da Silva Santos, and M. D. Wilkinson, Inform. Services Use 37, 49 (2017).
127. Y. Rao, Y. Lu, L. Zhang, S. Ju, N. Yu, A. Zhang, L. Chen, and H. Wang, J. Mater. Inform. 2, 17 (2022).
128. H. Wang, X.-D. Xiang, and L. Zhang, Engineering 6, 609 (2020).
129. A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, and K. A. Persson, APL Mater. 1, 011002 (2013).
130. See https://atomly.net for “Atomly.”
131. A. Belsky, M. Hellenbrandt, V. L. Karen, and P. Luksch, Acta Crystallogr. Sect. B Struct. Sci. 58, 364–369 (2002).
132. S. Curtarolo, W. Setyawan, G. L. W. Hart, M. Jahnatek, R. V. Chepulskii, R. H. Taylor, S. Wang, J. Xue, K. Yang, O. Levy, M. J. Mehl, H. T. Stokes, D. O. Demchenko, and D. Morgan, Comput. Mater. Sci. 58, 218–226 (2012).
133. L. Gao, L. Wang, J. Lin, and L. Du, Engineering 27, 31 (2023).
134. P. Hodge, K.-H. Hellwich, R. C. Hiorns, R. G. Jones, J. Kahovec, C. K. Luscombe, M. D. Purbrick, and E. S. Wilks, Pure and Applied Chemistry 92, 797 (2020).
135. C. Lin, P.-H. Wang, Y. Hsiao, Y.-T. Chan, A. C. Engler, J. W. Pitera, D. P. Sanders, J. Cheng, and Y. J. Tseng, ACS Appl. Polym. Mater. 2, 3107 (2020).
136. S. Otsuka, I. Kuwajima, J. Hosoya, Y. Xu, and M. Yamazaki, PoLyInfo: Polymer Database for Polymeric Materials Design (Conference, 2011), p. 22.
137. Khazana, see https://khazana.gatech.edu/ for “Materials data and tools from the Ramprasad Group.”
138. B. Ellis and R. Smith, Polymers: A Property Database (CRC Press, 2008).
139. See https://pppdb.uchicago.edu/ for “Polymer Property Predictor and Database.”
140.
See https://www.campusplastics.com/ for “CAMPUS—A Material Information System for the Plastics Industry.”
141.
Y.
Hayashi
,
J.
Shiomi
,
J.
Morikawa
, and
R.
Yoshida
,
npj Comput. Mater.
8
,
222
(
2022
).
142.
C.
Kim
,
A.
Chandrasekaran
,
T. D.
Huan
,
D.
Das
, and
R.
Ramprasad
,
J. Phys. Chem. C
122
,
17575
(
2018
).
143.
See https://www.emolecules.com for “eMolecules.”
144.
E. W. C.
Spotte-Smith
,
O. A.
Cohen
,
S. M.
Blau
,
J. M.
Munro
,
R.
Yang
,
R. D.
Guha
,
H. D.
Patel
,
S.
Vijay
,
P.
Huck
,
R.
Kingsbury
,
M. K.
Horton
, and
K. A.
Persson
,
Digital Discov.
2
, 1862 (
2023
).
145.
M.
Meuwly
,
Chem. Rev.
121
,
10218
(
2021
).
146.
P.
Shetty
and
R.
Ramprasad
,
J. Chem. Inform. Model.
61
,
5377
(
2021
).
147.
M.
Ohno
,
Y.
Hayashi
,
Q.
Zhang
,
Y.
Kaneko
, and
R.
Yoshida
,
J. Chem. Inform. Model.
63
,
5539
(
2023
).
148.
G.
Landrum
, see https://www.rdkit.org/ for “RDKit software (2020).”
149.
R. L. C.
Akkermans
,
N. A.
Spenley
, and
S. H.
Robertson
,
Mol. Simul.
39
,
1153
(
2013
).
150.
S.
Jo
,
T.
Kim
,
V. G.
Iyer
, and
W.
Im
,
J. Comput. Chem.
29
,
1859
(
2008
).
151.
L.
Martínez
,
R.
Andrade
,
E. G.
Birgin
, and
J. M.
Martínez
,
J. Comput. Chem.
30
,
2157
(
2009
).
152.
A. I.
Jewett
,
D.
Stelter
,
J.
Lambert
,
S. M.
Saladi
,
O. M.
Roscioni
,
M.
Ricci
,
L.
Autin
,
M.
Maritan
,
S. M.
Bashusqeh
,
T.
Keyes
,
R. T.
Dame
,
J.-E.
Shea
,
G. J.
Jensen
, and
D. S.
Goodsell
,
J. Mol. Biol.
433
,
166841
(
2021
).
153.
M. E.
Fortunato
and
C. M.
Colina
,
SoftwareX
6
,
7
(
2017
).
154.
H.
Sahu
,
K.-H.
Shen
,
J. H.
Montoya
,
H.
Tran
, and
R.
Ramprasad
,
J. Chem. Theory Comput.
18
,
2737
(
2022
).
155.
P. J.
in ‘t Veld
and
G. C.
Rutledge
,
Macromolecules
36
,
7358
(
2003
).
156.
A. P.
Thompson
,
H. M.
Aktulga
,
R.
Berger
,
D. S.
Bolintineanu
,
W. M.
Brown
,
P. S.
Crozier
,
P. J.
in ‘t Veld
,
A.
Kohlmeyer
,
S. G.
Moore
,
T. D.
Nguyen
,
R.
Shan
,
M. J.
Stevens
,
J.
Tranchida
,
C.
Trott
, and
S. J.
Plimpton
,
Comput. Phys. Commun.
271
,
108171
(
2022
).
157.
See https://polymer-molecular-dynamics.netlify.app/ for “Polymer Molecular Dynamics (PMD).”
158.
D.
Weininger
,
J. Chem. Inform. Comput. Sci.
28
,
31
(
1988
).
159.
M. E.
Fisher
,
J. Chem. Phys.
44
,
616
(
1966
).
160.
D.
Chowdhury
and
B. K.
Chakrabarti
,
J. Phys. A Math. General
18
,
L377
(
1985
).
161.
D.
Surblys
,
H.
Matsubara
,
G.
Kikugawa
, and
T.
Ohara
,
Phys. Rev. E
99
,
051301
(
2019
).
162.
P.
Boone
,
H.
Babaei
, and
C. E.
Wilmer
,
J. Chem. Theory Comput.
15
,
5579
(
2019
).
163.
J.
Wang
,
R. M.
Wolf
,
J. W.
Caldwell
,
P. A.
Kollman
, and
D. A.
Case
,
J. Comput. Chem.
25
,
1157
(
2004
).
164.
X.
He
,
V. H.
Man
,
W.
Yang
,
T. S.
Lee
, and
J.
Wang
,
J. Chem. Phys.
153
, 114502 (
2020
).
165.
X.
He
,
B.
Walker
,
V. H.
Man
,
P.
Ren
, and
J.
Wang
,
Curr. Opin. Struct. Biol.
72
,
187
(
2022
).
166.
G. S.
Larsen
,
P.
Lin
,
K. E.
Hart
, and
C. M.
Colina
,
Macromolecules
44
,
6944
(
2011
).
167.
X.
Yang
,
Q.
Liu
,
X.
Zhang
,
C.
Ji
, and
B.
Cao
,
Fluid Phase Equilib.
562
,
113566
(
2022
).
168.
P. K.
Schelling
,
S. R.
Phillpot
, and
P.
Keblinski
,
Phys. Rev. B
65
,
144306
(
2002
).
169.
Z.
Fan
,
H.
Dong
,
A.
Harju
, and
T.
Ala-Nissila
,
Phys. Rev. B
99
,
064308
(
2019
).
170.
F.
DeAngelis
,
M. G.
Muraleedharan
,
J.
Moon
,
H. R.
Seyf
,
A. J.
Minnich
,
A. J. H.
McGaughey
, and
A.
Henry
,
Nanoscale Microscale Thermophys. Eng.
23
,
81
(
2019
).
171.
J. E.
Turney
,
E. S.
Landry
,
A. J. H.
McGaughey
, and
C. H.
Amon
,
Phys. Rev. B
79
,
064301
(
2009
).
172.
F.
Müller-Plathe
,
J. Chem. Phys.
106
,
6082
(
1997
).
173.
D.
Torii
,
T.
Nakano
, and
T.
Ohara
,
J. Chem. Phys.
128
,
044504
(
2008
).
174.
Y.
Zhao
,
R. J.
Mulder
,
S.
Houshyar
, and
T. C.
Le
,
Polym. Chem.
14
,
3325
(
2023
).
175.
S.
Stuart
,
J.
Watchorn
, and
F. X.
Gu
,
npj Comput. Mater.
9
,
102
(
2023
).
176.
D.
Weininger
,
A.
Weininger
, and
J. L.
Weininger
,
J. Chem. Inform. Comput. Sci.
29
,
97
(
1989
).
177.
M.
Krenn
,
F.
Häse
,
A.
Nigam
,
P.
Friederich
, and
A.
Aspuru-Guzik
,
Mach. Learn. Sci. Technol.
1
,
045024
(
2020
).
178.
A.
Yüksel
,
E.
Ulusoy
,
A.
Ünlü
, and
T.
Doğan
,
Mach. Learn. Sci. Technol.
4
,
025035
(
2023
).
179.
L.
Turcani
,
E.
Berardo
, and
K. E.
Jelfs
,
J. Comput. Chem.
39
,
1931
(
2018
).
180.
M. M.
Cencer
,
J. S.
Moore
, and
R. S.
Assary
,
Polym. Int.
71
,
537
(
2022
).
181.
R.-R.
Griffiths
and
J. M.
Hernández-Lobato
,
Chem. Sci.
11
,
577
(
2020
).
182.
C.
Kuenneth
and
R.
Ramprasad
,
Nat. Commun.
14
,
4099
(
2023
).
183.
C.
Xu
,
Y.
Wang
, and
A.
Barati Farimani
,
npj Comput. Mater.
9
,
64
(
2023
).
184.
H.
Qiu
,
L.
Liu
,
X.
Qiu
,
X.
Dai
,
X.
Ji
, and
Z.-Y.
Sun
,
Chem. Sci.
15
,
534
(
2024
).
185.
J. L.
Durant
,
B. A.
Leland
,
D. R.
Henry
, and
J. G.
Nourse
,
J. Chem. Inform. Comput. Sci.
42
,
1273
(
2002
).
186.
E. L.
Willighagen
,
J. W.
Mayfield
,
J.
Alvarsson
,
A.
Berg
,
L.
Carlsson
,
N.
Jeliazkova
,
S.
Kuhn
,
T.
Pluskal
,
M.
Rojas-Chertó
,
O.
Spjuth
,
G.
Torrance
,
C. T.
Evelo
,
R.
Guha
, and
C.
Steinbeck
,
J. Cheminform.
9
,
33
(
2017
).
187.
N. M.
O’Boyle
,
M.
Banck
,
C. A.
James
,
C.
Morley
,
T.
Vandermeersch
, and
G. R.
Hutchison
,
J. Cheminform.
3
,
33
(
2011
).
188.
X.
Zhang
,
G.
Wei
,
Y.
Sheng
,
W.
Bai
,
J.
Yang
,
W.
Zhang
, and
C.
Ye
,
ACS Appl. Mater. Interfaces
15
,
21537
(
2023
).
189.
A.
Capecchi
,
D.
Probst
, and
J.-L.
Reymond
,
J. Cheminform.
12
,
43
(
2020
).
190.
H. L.
Morgan
,
J. Chem. Docum.
5
,
107
(
1965
).
191.
D.
Rogers
and
M.
Hahn
,
J. Chem. Inform. Model.
50
,
742
(
2010
).
192.
S.
Jaeger
,
S.
Fulle
, and
S.
Turk
,
J. Chem. Inform. Model.
58
,
27
(
2018
).
193.
R.
Ma
,
Z.
Liu
,
Q.
Zhang
,
Z.
Liu
, and
T.
Luo
,
J. Chem. Inform. Model.
59
,
3110
(
2019
).
194.
L.
Tao
,
J.
Byrnes
,
V.
Varshney
, and
Y.
Li
,
iScience
25
,
104585
(
2022
).
195.
M.
Karelson
,
V. S.
Lobanov
, and
A. R.
Katritzky
,
Chem. Rev.
96
,
1027
(
1996
).
196.
A.
Mauri
,
V.
Consonni
,
M.
Pavan
, and
R.
Todeschini
,
Match
56
,
237
(
2006
).
197.
M.
Haghighatlari
,
G.
Vishwakarma
,
D.
Altarawy
,
R.
Subramanian
,
B. U.
Kota
,
A.
Sonpal
,
S.
Setlur
, and
J.
Hachmann
,
WIREs Comput. Mol. Sci.
10
,
e1458
(
2020
).
198.
H.
Moriwaki
,
Y.-S.
Tian
,
N.
Kawashita
, and
T.
Takagi
,
J. Cheminform.
10
,
4
(
2018
).
199.
J. S.
Gábor
,
L. R.
Maria
, and
K. B.
Nail
,
Ann. Statist.
35
,
2769
(
2007
).
200.
D.
Albanese
,
M.
Filosi
,
R.
Visintainer
,
S.
Riccadonna
,
G.
Jurman
, and
C.
Furlanello
,
Bioinformatics
29
,
407
(
2013
).
201.
I.
Guyon
,
J.
Weston
,
S.
Barnhill
, and
V.
Vapnik
,
Mach. Learn.
46
,
389
(
2002
).
202.
E.
Bisong
, in
Building Machine Learning and Deep Learning Models on Google Cloud Platform A Comprehensive Guide for Beginners
, edited by
E.
Bisong
(
Apress
,
Berkeley
,
CA
,
2019
), p.
287
.
203.
S. M.
Lundberg
and
S.-I.
Lee
,
Adv. Neural Inform. Process. Syst.
30
,
4765
4774
(
2017
).
204.
C.
Loftis
,
K.
Yuan
,
Y.
Zhao
,
M.
Hu
, and
J.
Hu
,
J. Phys. Chem. A
125
,
435
(
2021
).
205.
T.
Stephens
, see https://gplearn.readthedocs.io/en/stable/ for gplearn software.
206.
Y.
Wang
,
N.
Wagner
, and
J. M.
Rondinelli
,
MRS Commun.
9
,
793
(
2019
).
207.
T. A. R.
Purcell
,
M.
Scheffler
,
L. M.
Ghiringhelli
, and
C.
Carbogno
,
npj Comput. Mater.
9
,
112
(
2023
).
208.
M.
Cranmer
, arXiv:2305.01582 (2023).
209.
R.
Ouyang
,
E.
Ahmetcik
,
C.
Carbogno
,
M.
Scheffler
, and
L. M.
Ghiringhelli
,
J. Phys.: Mater.
2
,
024002
(
2019
).
210.
See https://gplearn.readthedocs.io/en/stable/examples.html for “Examples of Symbolic Regressor in gplearn.”
213.
F.
Pedregosa
,
G.
Varoquaux
,
A.
Gramfort
,
V.
Michel
,
B.
Thirion
,
O.
Grisel
,
M.
Blondel
,
P.
Prettenhofer
,
R.
Weiss
, and
V.
Dubourg
,
J. Mach. Learn. Res.
12
,
2825
(
2011
).
214.
F.
Nogueira
, see https://github.com/fmfn/BayesianOptimization for “Bayesian optimization tool (2014).”
215.
A.
Tiihonen
,
S. J.
Cox-Vazquez
,
Q.
Liang
,
M.
Ragab
,
Z.
Ren
,
N. T. P.
Hartono
,
Z.
Liu
,
S.
Sun
,
C.
Zhou
,
N. C.
Incandela
,
J.
Limwongyut
,
A. S.
Moreland
,
S.
Jayavelu
,
G. C.
Bazan
, and
T.
Buonassisi
,
J. Am. Chem. Soc.
143
,
18917
(
2021
).
216.
F. R.
Burden
,
Quantitat. Struct. Activity Relationships
16
,
309
(
1997
).
217.
P.
Mohamad Zaim Awang
and
K. K.
Krishna Prakash
,
Sparklinglight Trans. Artif. Intelligence Quantum Computing (STAIQC)
01
,
36
(
2021
).
218.
M.
Abadi
,
A.
Agarwal
,
P.
Barham
,
E.
Brevdo
,
Z.
Chen
,
C.
Citro
,
G. S.
Corrado
,
A.
Davis
,
J.
Dean
, and
M.
Devin
, arXiv:1603.04467 (2016).
219.
X.
Huang
,
C. Y.
Zhao
,
H.
Wang
, and
S.
Ju
,
Mater. Today Phys.
44
,
101438
(
2024
).
220.
B.
Weng
,
Z.
Song
,
R.
Zhu
,
Q.
Yan
,
Q.
Sun
,
C. G.
Grice
,
Y.
Yan
, and
W.-J.
Yin
,
Nat. Commun.
11
,
3513
(
2020
).
221.
K. Y. L.
Andre
,
V.-G.
Eleonore
,
L.
Yee-Fun
, and
H.
Kedar
,
J. Mater. Inform.
3
,
11
(
2023
).
222.
G.
Agarwal
,
H. A.
Doan
,
L. A.
Robertson
,
L.
Zhang
, and
R. S.
Assary
,
Chem. Mater.
33
,
8133
(
2021
).
223.
P.
Ertl
and
A.
Schuffenhauer
,
J. Cheminform.
1
,
8
(
2009
).
224.
M. T.
Bhoskar
,
M. O. K.
Kulkarni
,
M. N. K.
Kulkarni
,
M. S. L.
Patekar
,
G. M.
Kakandikar
, and
V. M.
Nandedkar
,
Mater. Today Proc.
2
,
2624
(
2015
).
225.
X.
Huang
,
S.
Ma
,
H.
Wang
,
S.
Lin
,
C. Y.
Zhao
,
H.
Wang
, and
S.
Ju
,
Int. J. Heat Mass Transfer
197
,
123332
(
2022
).
226.
X.
Zhang
,
L.
Feng
,
X.
Li
,
Y.
Xu
,
L.
Wang
, and
H.
Chen
,
Carbon Neutrality
2
,
16
(
2023
).
227.
J.
Blank
and
K.
Deb
,
IEEE Access
8
,
89497
(
2020
).
228.
K.
Deb
,
K.
Sindhya
, and
T.
Okabe
, in
Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation
(
Association for Computing Machinery
,
London
,
2007
), p.
1187
.
229.
T.
Ueno
,
T. D.
Rhone
,
Z.
Hou
,
T.
Mizoguchi
, and
K.
Tsuda
,
Mater. Discov.
4
,
18
(
2016
).
230.
B.
Shi
,
Y.
Zhou
,
D.
Fang
,
Y.
Tian
,
X.
Ding
,
J.
Sun
,
T.
Lookman
, and
D.
Xue
,
J. Mater. Inform.
2
,
8
(
2022
).
231.
S.
Ju
,
S.
Shimizu
, and
J.
Shiomi
,
J. Appl. Phys.
128
,
161102
(
2020
).
232.
A.
Agnihotri
and
N.
Batra
,
Distill
5
,
e26
(
2020
).
233.
M.
Balandat
,
B.
Karrer
,
D.
Jiang
,
S.
Daulton
,
B.
Letham
,
A. G.
Wilson
, and
E.
Bakshy
,
Adv. Neural Inform. Process. Syst.
33
,
21524
(
2020
).
234.
P.
Ray
,
B. K.
Chakrabarti
, and
A.
Chakrabarti
,
Phys. Rev. B
39
,
11828
(
1989
).
235.
T.
Kadowaki
and
H.
Nishimori
,
Phys. Rev. E
58
,
5355
(
1998
).
236.
Z.
Mao
,
Y.
Matsuda
,
R.
Tamura
, and
K.
Tsuda
,
Digital Discov.
2
,
1098
(
2023
).
237.
B. A.
Wilson
,
Z. A.
Kudyshev
,
A. V.
Kildishev
,
S.
Kais
,
V. M.
Shalaev
, and
A.
Boltasseva
,
Appl. Phys. Rev.
8
,
041418
(
2021
).
238.
K.
Kitai
,
J.
Guo
,
S.
Ju
,
S.
Tanaka
,
K.
Tsuda
,
J.
Shiomi
, and
R.
Tamura
,
Phys. Rev. Res.
2
,
013319
(
2020
).
239.
240.
K.
Deb
and
H.
Jain
,
IEEE Trans. Evolut. Comput.
18
,
577
(
2014
).
241.
H.
Jain
and
K.
Deb
,
IEEE Trans. Evolut. Comput.
18
,
602
(
2014
).
242.
M.
Seifrid
,
R.
Pollice
,
A.
Aguilar-Granda
,
Z.
Morgan Chan
,
K.
Hotta
,
C. T.
Ser
,
J.
Vestfrid
,
T. C.
Wu
, and
A.
Aspuru-Guzik
,
Acc. Chem. Res.
55
,
2454
(
2022
).
243.
A. A.
Volk
,
R. W.
Epps
,
D. T.
Yonemoto
,
B. S.
Masters
,
F. N.
Castellano
,
K. G.
Reyes
, and
M.
Abolhasani
,
Nat. Commun.
14
,
1403
(
2023
).
244.
D. A.
Boiko
,
R.
MacKnight
,
B.
Kline
, and
G.
Gomes
,
Nature
624
,
570
(
2023
).