A knowledge- and data-synergized intelligent computation architecture for materials is proposed within the data science paradigm. As a key step, two digital ensemble descriptors that encode the chemical composition and structural trend of crystals were constructed from the elemental features contained in the Periodic Table, without a priori assumptions; they afford causal emergence and regulation principles for the mechanical response of covalent and ionic solids. In addition to linear correlations among structural state and mechanical response parameters, causal analytic relations in exponential form between the structural and thermodynamic state/mechanical response parameters and a digital ensemble descriptor were unveiled through least squares regression, with coefficients that are classified in accordance with symmetry principles of the atom and lattice. The underlying physicochemical mechanisms of chemical pressure and chemical bonding are then found to be responsible for the bulk modulus and hardness of solids. Finally, a physical prediction model was established for crystalline solids, and it demonstrates the feasibility of the predictive design of novel superhard materials. It is believed that, with suitably constructed digital ensemble descriptors, this intelligent computation architecture and the resulting physical prediction models based on causal analytic relations can be generalized to crystalline solids with covalent and ionic bonds in other crystallographic structures.

Material chemistry concerns the composition–structure–property triangle of relationships. As shown in Fig. 1, three kinds of analytic relationships occur naturally: structure–property, composition–structure, and composition–property. Among them, the structure–property relationship has been investigated intensively and extensively with experimental and theoretical methods. In addition to laborious experimental syntheses and characterizations, lattice constants, the volume of the unit cell (V0), bond length (d), bulk modulus (B0), phonon spectra, bandgap, and other physical properties can be obtained via first-principles calculations that take the atomic number, mass, and size of the constituent elements as input,1,2 although this requires complex software and significant computation power. Besides ab initio total-energy approaches, empirical models are valued for their applicability to a broad class of materials and for the concise and useful trends they offer for material design.3–8 As exemplified by models No. 1–6 in Table I, several analytic relations between mechanical properties and structural parameters have been proposed for covalent and ionic crystals. Table I shows that these empirical models connect mechanical properties with lattice structural features and elemental features, but they exhibit different dependencies on the dimension of length and on the scale of bond length, and these discrepancies need to be clarified. Nonetheless, quantitative structure–property relations remain little known in material chemistry because some structural features have not been digitized and most material data under operating conditions are missing. Furthermore, it is widely accepted that the targeted performance of a material is regulated in practice predominantly by changing its composition and secondarily by controlling its microstructure, for example by refining crystallite size.
The composition–structure and composition–property relations established in nature are a cornerstone for regulating desired properties in engineering applications as well as for discovering new materials.9–11 Because the features of composition, including chemical formulas and chemical bonds, have not yet been digitized, both analytic composition–structure and composition–property relations remain to be unveiled, while experimental observations and theoretical calculations provide only a simple mapping relationship for materials.

FIG. 1.

The composition–structure–property triangle relations center on material chemistry. Among them, the structure–property relationship has been intensively investigated but is still not satisfactory because some structural features have not been digitized and most material data under operating conditions are missing. It is known from physical chemistry that entropy is a function of volume and chemical potential is a function of volume and entropy, so Gibbs free energy, as a function of volume, entropy, and chemical potential, is determined first of all by the volume of the material. Most importantly, analytic composition–structure and composition–physical/chemical/biological property relations remain unknown because the composition features have not yet been digitized.

TABLE I.

Summary of empirical models relating mechanical response to structural parameters for covalent and ionic crystals. B0, Hv, and Hk are the zero-pressure bulk modulus, Vickers hardness, and Knoop hardness, respectively. The ionicity indicators fe and fi are defined by ei (reference energy) and χi (atomic electronegativity), respectively.

| Model no. | Analytic relation | Dimension for length | Scale for bond length | References |
| --- | --- | --- | --- | --- |
| 1 | $B_0 = (1971 - 220\lambda)/d^{3.5}$, where $\lambda$ = 0, 1, and 2 for group IV, III-V, and II-VI tetrahedral crystals, respectively | −3.5 | −3.5 | 1 and 2 |
|   | $B_0 = 550/d^{3}$ for halide salts | −3 | −3 |   |
| 2 | $B_0 = \dfrac{2 M q_B^{2}}{9 d V_0} = \dfrac{2 E_g q_B^{2}}{9 V_0}$, where $M$, $q_B$, and $E_g$ are the Madelung constant, bond charge, and bandgap width, respectively, in the so-called bond charge model (BCM) | −4 | −1 | 3 and 4 |
| 3 | $B_0 = \dfrac{mn}{9}\dfrac{U_{\min}}{V_0}$, derived under the assumed cohesion energy $U(r) = -\dfrac{A}{r^{m}} + \dfrac{B}{r^{n}}$ (integers $n > m$; $A$ and $B$ are constants larger than zero) for covalent and ionic crystals | −3 | −3 | 5 |
| 4 | $H_v = 556\,N_a e^{-1.191 f_i}/d^{2.5}$, where $f_i$ and $N_a$ are the ionicity of the chemical bond and the covalent bond number per unit area, respectively | −5.5 | −2.5 | 6 |
| 5 | $H = 1550\,\sqrt{e_i e_j}\,e^{-4 f_e}/(V_0 d_{ij} n_{ij})$ with $e_i = Z_i/R_i$, where $Z_i$ is the valence electron number of atom $i$, $R_i$ is the covalent radius of each atom, and $n_{ij}$ is the number of bonds between atom $i$ and its nearest neighboring atoms $j$ at a distance $d_{ij}$ | −5 | −1 | 7 |
| 6 | $H_k = 423.8\,N_v \sqrt{\dfrac{\chi_i}{CN_i}\dfrac{\chi_j}{CN_j}}\,e^{-2.7 f_i} - 3.4$ with $\chi_i = 0.481 Z_i/R_i$, where $N_v$ is the number of covalent bonds per unit cell and $CN_i$ is the coordination number of the atom $i$ forming a covalent bond | −4 |   | 8 |
| 7 | $B_0 \propto d^{-3} \propto 1/V_0$ and $H_v \propto B_0$, where the proportional coefficient is determined by the geometric arrangement of atoms and the group of the constituent elements | −3 | −3 | This work |

Besides trial-and-error and screening methods, predictive material design is also being developed to speed up the development and deployment of new materials.11–15 Physical prediction models for designing materials require at least three key elements:14 (1) physical and chemical principles, (2) causal analytic relations on changes, and (3) desired targets for every application. The Schrödinger equation, together with the symmetry principle, successfully describes the dynamic behavior of atoms and atom close packing systems, while the thermodynamic equation of Gibbs free energy describes phase formability, structural phase transformation, and the direction and extent of chemical reactions; together they set up a substantial physical and chemical foundation for materials. The desired targets are determined by engineering applications or technological development trends. Discovering analytic relations between the targeted performance and chemical composition/microstructure (object system), processing parameters (manufacturing system), and operation conditions (engineering system) thus plays a key role in establishing physical prediction models. Here, digital feature descriptors are essential not only for the interpretability of physical models but also for the accuracy and efficiency of predictions.
Currently, AI technology represented by machine learning is being introduced into the material community to undertake tasks such as selecting features, discovering relations, and solving equations to accelerate the discovery of materials.15–18 In recent years, machine learning has also been used to find the hidden relations behind material data (AI for materials)16,17 or to find solutions to partial differential equations (AI for science).18 Unlike traditional mathematical models, machine learning provides a data-centric modeling method that constructs models by learning representations or rules from data and works with statistical information rather than structural rules about how systems work.13 Most of the impressive progress achieved by deep learning was built on big data for small tasks, whereas AI for materials faces the challenge of small data for multi-objective tasks.19 Machine learning is thus inefficient at tackling the dynamic and emergent behaviors of complex and large systems such as materials. Ultimately, causal relationships are the most essential in both nature and data science for understanding how a system works, why it works in this way, and what happens when it is disturbed externally.13 Therefore, given that material data are small data, creating knowledge- and data-synergized models and algorithms is a promising and unavoidable way to bridge the gap between data-driven models and mechanistic models, and developing causally intelligent computations could balance the efficiency of statistical modeling with the conciseness of physical models.13,14,20–23

Material research within the traditional paradigms of experimental observation, theoretical modeling, and computational simulation provides data, models, and theories, while the data-science paradigm provides new insights and methods for data analysis and data generation.24 To unveil the analytic relations hidden behind material data, digitizing chemical composition as feature descriptors plays a vital role in understanding the performance of known materials as well as in developing new ones. Reductionism usually treats the features of chemical elements and lattice structure as descriptors, which inevitably results in high-dimensional analytic relations, while emergentism holds that the features of atoms (ions) and electrons cannot directly describe the collective behaviors and emergent properties of the atom close packing system.25,26 Therefore, it is necessary to construct digital ensemble descriptors from the features of chemical elements for crystalline materials, which naturally reduces the dimension of the subsequent regression.14,20–23 In accordance with system theory and the quantum mechanical Schrödinger equation, as shown in Fig. 2, the inputs for designing crystalline materials are the feature data of the chemical elements contained in the Periodic Table, the element combination, and the concentration; choosing the elements and their concentrations is the content of inverse design. The outputs are the structural/thermodynamic state parameters and the physical/chemical/biological response parameters; their required combination varies with the engineering application, and obtaining them is the content of forward prediction.
In contrast to correlative relations among the state and response parameters, causal analytic relations of the state/response parameters with respect to digital ensemble descriptors are much more fundamental, and an intelligent prediction model built on them is physical and feasible with respect to interpretability, generalizability, and accuracy. By unveiling the causal emergence and regulation principles behind every target property, physical prediction models and intelligent generation algorithms can then be established by combining ensemble descriptors with causal analytic relations. Only then is AI for material science physically interpretable, with (1) forward prediction models that discover the governing rules and (2) inverse design models that capture the physical and chemical world of materials accurately.13,18

FIG. 2.

The nature of analytic relationships among digital ensemble descriptors featuring atom close packing system–state parameters–response parameters. In accordance with quantum mechanics and system theory, analytic relationships between state/response parameters and digital ensemble descriptors are causal, while those among state and response parameters are correlative.


It is known from condensed matter physics that macroscopic physical properties are emergent phenomena of the atom close packing system, so they cannot be described directly using the features of the atom such as atomic number, weight, radius, charge, and spin.26 In crystalline solids, chemical composition and the geometric arrangement of atoms in a unit cell are the features of the matter dimension. Herein, two digital ensemble descriptors are constructed for covalent and ionic solids from the features of the elements, including atomic weight, covalent radius, effective nuclear charge, and the number of atoms in a unit cell:14,20–23 $\mu \times \sqrt{r_1 r_2}/a_B$ (where $\mu$ is the reduced mass of the primitive cell,20,21 $r_i$ is the covalent radius of the constituent atoms,27 and $a_B$ is the Bohr radius) and $\sqrt{Z_1^* Z_2^*}/Z_H^*$ (where $Z_i^*$ and $Z_H^*$ are the effective nuclear charges of the constituent atoms and of hydrogen,28 respectively). Here, $a_B$ and $Z_H^*$ are included only for the convenience of nondimensionalization. The geometric average of the covalent radii and of the effective nuclear charges is adopted because the diamond-, zinc-blende-, and rocksalt-type crystallographic structures consist of two sublattices. As illustrated in Fig. 2, digitizing chemical composition and structural trends as ensemble descriptors makes it possible to establish causal quantitative analytic relations with the structural and thermodynamic state/physical response parameters.
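The two descriptors can be evaluated with a few lines of Python. The following is a minimal sketch, not the author's implementation: it assumes that the reduced mass of a two-atom primitive cell is the usual two-body reduced mass, and the function names and the carbon input values (a covalent radius of 0.76 Å and a Slater-rule effective nuclear charge of 3.25) are illustrative placeholders rather than the values of Refs. 27 and 28.

```python
import math

BOHR_RADIUS_ANGSTROM = 0.529177  # Bohr radius a_B, in angstrom


def reduced_mass(m1, m2):
    """Two-body reduced mass (assumed stand-in for the primitive-cell value)."""
    return m1 * m2 / (m1 + m2)


def size_descriptor(m1, m2, r1, r2, a_b=BOHR_RADIUS_ANGSTROM):
    """mu * sqrt(r1 * r2) / a_B, with radii and a_B in the same length unit."""
    return reduced_mass(m1, m2) * math.sqrt(r1 * r2) / a_b


def charge_descriptor(z1_eff, z2_eff, z_h_eff=1.0):
    """sqrt(Z1* * Z2*) / Z_H*, the geometric mean of effective nuclear charges."""
    return math.sqrt(z1_eff * z2_eff) / z_h_eff


# Illustrative inputs for diamond (both sublattices are carbon).
# 12.011 is the atomic weight of carbon; the radius and the effective
# nuclear charge are placeholders, not the values used in the paper.
x_size = size_descriptor(12.011, 12.011, 0.76, 0.76)
x_charge = charge_descriptor(3.25, 3.25)
print(round(x_size, 3), round(x_charge, 3))
```

Because both descriptors use a geometric mean, an elemental solid such as diamond reduces to the single-atom radius and charge, while a binary compound mixes the two sublattices symmetrically.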

So far, by adopting the data science paradigm and sorting out the connotation and denotation of material data, the relationships behind material data have been divided into two kinds: causality and correlation. With the digital ensemble descriptors constructed by the author, causal analytic relations between chemical composition and state/response parameters can be found via small-data mining, and the causal emergence and regulation principles for forward prediction and inverse design of crystalline materials can then be unveiled. In this section, some covalent and ionic crystals are taken as examples to illustrate this architecture of intelligent computation for materials, and a physical prediction model for superhard materials is finally illustrated on the basis of causal analytic relations.

Within the experimental observation paradigm, symmetry was treated as a passive result of the interaction of the atom close packing system. In contrast, within the theoretical modeling paradigm, symmetry plays an active role in dictating the interaction of atom close packing systems.29,30 As illustrated in Figs. 1 and 2, choosing the chemical elements and their concentrations is the active step in designing materials, while the stable geometric arrangement of atoms is the resulting state, with a minimum free energy of the atom close packing system dictated by lattice symmetry under a given thermodynamic boundary condition. Consequently, lattice structure, electronic structure, thermodynamic state, and physical properties inherently embody information on symmetry and are correlated with each other; they vary with the constituent elements and thermodynamic conditions and reflect various aspects of the state/response to external stimuli. Figure 3 illustrates such correlations among bulk moduli (B0), hardness (Hv), the volume of the unit cell (V0), and bond length (d) for diamond-, zinc-blende-, and rocksalt-structured covalent and ionic solids with ionicities in the range of 0–1.00.31 Through least squares regression, the linear relations $B_0 \propto d^{-3} \propto 1/V_0$ and $H_v \propto d^{-3} \propto 1/V_0$ were obtained, and the proportional coefficients are summarized in Table II for crystals of various symmetries.
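A proportional relation such as B0 = C/d³ can be fitted by least squares with the intercept fixed at zero, in which case the slope has a closed form. The sketch below assumes a zero-intercept fit, which the text does not state explicitly, and the bond lengths are made-up numbers used only to generate noise-free synthetic data from the group IV coefficient in Table II.

```python
def fit_proportional(x, y):
    """Least-squares slope C of y = C * x with the intercept fixed at zero."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


# Synthetic, noise-free check: the bond lengths are hypothetical, and the
# "data" are generated from B0 = 1824 / d**3 (group IV value in Table II).
bond_lengths = [1.5, 2.0, 2.5]
inv_d3 = [d ** -3 for d in bond_lengths]
b0 = [1824.0 * v for v in inv_d3]
slope = fit_proportional(inv_d3, b0)
print(slope)
```

The same function applies unchanged when the regressor is 1000/V0 instead of 1/d³.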

FIG. 3.

Data-mining correlative relations among bulk moduli (B0),1,31–33 hardness (Hv),6,7,34 the volume of the unit cell (V0),35 and bond length (d)1,6,7,31,33 for diamond-, zinc-blende-, and rocksalt-structured solids with various ionicities in a range of 0–1.00.

TABLE II.

Summary of proportional coefficients obtained by data-mining the correlations among mechanical properties and structural parameters shown in Fig. 3. The superscripts fit, calc, and theory denote the method used to obtain each proportional coefficient, while the subscripts denote the pair of parameters being related.

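| Structure | Group | $C_{Bd}^{\mathrm{fit}}$ ($B_0 \propto 1/d^{3}$) | $C_{BV}^{\mathrm{fit}}$ ($B_0 \propto 1000/V_0$) | $C_{VdB}^{\mathrm{calc}}$ (from $B_0$) | $C_{Hd}^{\mathrm{fit}}$ ($H_v \propto 1/d^{3}$) | $C_{HV}^{\mathrm{fit}}$ ($H_v \propto 1000/V_0$) | $C_{VdH}^{\mathrm{calc}}$ (from $H_v$) | $C_{Vd}^{\mathrm{fit}}$ ($V_0 \propto d^{3}$) | $C_{Vd}^{\mathrm{theory}}$ ($V_0 \propto d^{3}$) | $C_{HB}^{\mathrm{fit}}$ ($H_v \propto B_0$) | $C_{d^3}^{\mathrm{calc}}$ ($H_v \propto B_0$) | $C_{V_0}^{\mathrm{calc}}$ ($H_v \propto B_0$) | $C_{HB}^{\mathrm{fit}}$ ($H_k \propto B_0$) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Diamond | IV | 1824 | 21.3 | 11.7 | 382 | 4.43 | 11.6 | 13.2 | 12.3 | 0.208 | 0.209 | 0.208 | 0.228 |
| Zinc blende | III-V | 1655 | 20.4 | 12.3 | 282 | 3.48 | 12.3 |   |   | 0.172 | 0.170 | 0.170 | 0.133 |
|   | II-VI | 1009 | 10.1 | 10.0 | 31.7 | 0.398 | 12.5 |   |   | 0.019 | 0.031 | 0.039 | 0.028 |
| Rocksalt | I-VII | 635 | 5.24 | 8.3 | 9.65 | 0.080 | 8.3 | 8.4 | 8.0 | 0.015 | 0.015 | 0.015 |   |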

For the bulk modulus (B0), the proportional coefficient ($C_{Bd}^{\mathrm{fit}}$) as a function of $d^{-3}$ was obtained as 1824, 1655, 1009, and 635 for diamond-structured solids of group IV, zinc blende solids of groups III-V and II-VI, and rocksalt solids of group I-VII, respectively. As illustrated by Fig. 3, this classification is similar to Cohen's observation (model No. 1 in Table I) that the proportional coefficient is determined by the combination of crystallographic structure and constituent elements.1,2 However, in contrast to Cohen's empirical model, which uses different scales for tetrahedrally and octahedrally coordinated solids, a universal scaling law of $B_0 = (1912 - 421\lambda)/d^{3}$ describes the bulk modulus of covalent and ionic solids with ionicities in the range of 0–1.00 quantitatively, where the proportional coefficient has the simple dimension of energy and $\lambda$ = 0, 1, 2, and 3 for groups IV, III-V, II-VI, and I-VII, respectively. At the same time, the proportional coefficient ($C_{BV}^{\mathrm{fit}}$) as a function of $1000/V_0$ was obtained as 21.3, 20.4, 10.1, and 5.24 for groups IV, III-V, II-VI, and I-VII, respectively. This observation is very different from the inference of the bond charge model (model No. 2 in Table I), in which the bulk modulus depends on the bandgap and bond charge in addition to the volume of the unit cell. In a well-accepted phenomenological model (model No. 3 in Table I), following the assumed form of the cohesion energy, the bulk modulus is derived as $B_0 \propto 1/V_0$ at thermodynamic equilibrium, where the proportional coefficient is determined by the constants m, n, A, and B. Although B0 is often interpreted as quantitatively dependent on the volumetric energy density connected with compression,5 this analytic relation implies the trends observed in Fig. 3 when the constants m, n, A, and B are fixed for every class of crystals marked by lattice symmetry and constituent elements, as shown in Table II.
In each crystallographic lattice, atoms occupy the same space in the same way, so the Coulomb interaction between atoms and the corresponding cohesion energy are determined by the constituent elements. From both crystallographic analyses and data-mining, V0 is found to be linearly proportional to $d^{3}$ for zinc blende, rocksalt, cesium chloride, corundum (A2O3), and rutile (AO2)-structured crystals.35 Therefore, $B_0 \propto 1/V_0 \propto d^{-3}$ holds universally for covalent and ionic solids with various crystallographic structures.
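As a quick numerical illustration, the universal scaling law quoted above can be compared with the per-group fitted coefficients of Table II. This is only a sketch: the dictionary and function names are hypothetical, and the text does not state the units, so the coefficients are compared as plain numbers.

```python
# Fitted coefficients C_Bd^fit from Table II and the group index lambda.
FITTED_C_BD = {"IV": 1824, "III-V": 1655, "II-VI": 1009, "I-VII": 635}
GROUP_LAMBDA = {"IV": 0, "III-V": 1, "II-VI": 2, "I-VII": 3}


def scaling_coefficient(lam):
    """Numerator of the scaling law B0 = (1912 - 421 * lambda) / d**3."""
    return 1912 - 421 * lam


def bulk_modulus(d, lam):
    """Bulk modulus predicted by the scaling law for bond length d."""
    return scaling_coefficient(lam) / d ** 3


# Relative deviation of the scaling-law coefficient from each fitted value.
deviation = {
    group: (scaling_coefficient(GROUP_LAMBDA[group]) - fitted) / fitted
    for group, fitted in FITTED_C_BD.items()
}
for group, dev in deviation.items():
    print(group, scaling_coefficient(GROUP_LAMBDA[group]), round(dev, 3))
```

With these inputs, the single linear-in-λ expression stays within about 10% of each separately fitted coefficient, which is the sense in which one law covers all four groups.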

For hardness (Hv), the proportional coefficient ($C_{Hd}^{\mathrm{fit}}$) as a function of $d^{-3}$ was obtained as 382, 282, 31.7, and 9.65, while the proportional coefficient ($C_{HV}^{\mathrm{fit}}$) as a function of $1000/V_0$ was 4.43, 3.48, 0.398, and 0.080 for groups IV, III-V, II-VI, and I-VII, respectively. Compared with empirical models No. 4–6 listed in Table I, the linear relation $H_v \propto 1/d^{3} \propto 1/V_0$ depicts the structure–property relation of covalent and ionic crystals much more concisely. On the one hand, the empirical analytic relations shown in Table I contain parameters on both the atomic level (e.g., Zi, Ri) and the unit cell level (e.g., V0, dij, the coordination number of each atom, and the number of atoms in a unit cell).6–8 In quantum mechanics, the parameters of a unit cell quantitatively describe the state of a solid and are determined by the features of the constituent atoms. On the other hand, each parameter at the unit cell level is mathematically a function of those at the atomic level. For instance, crystallographic analyses have established analytic relations between the lattice constant and the volume of the unit cell (bond length) for each Bravais lattice, e.g., $V_0 = a^{3} = (2d)^{3}$ and $V_0 = a^{3} = (4d/\sqrt{3})^{3}$, obtained theoretically for rocksalt and diamond structures, respectively.31 Through data-mining of extant experimental data, as shown in Fig. 3, $V_0 = 8.4d^{3}$ and $V_0 = 13.2d^{3}$ were obtained for rocksalt and diamond structures, respectively. Thus, Fig. 3 and Table II show that there is a simple and concise analytic relation between each pair of structural parameters and mechanical properties, unlike the complex empirical relations shown in Table I.
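The theoretical volume coefficients follow directly from the cubic-cell geometry quoted above and can be checked against the fitted values. The sketch below uses hypothetical names and takes only the two relations a = 2d (rocksalt) and a = 4d/√3 (diamond) and the two fitted coefficients from the text.

```python
import math

# Lattice constant a expressed in units of the bond length d.
A_OVER_D = {"rocksalt": 2.0, "diamond": 4.0 / math.sqrt(3.0)}

# Coefficients of V0 = C * d**3 obtained by data-mining, as quoted in the text.
FITTED_C_VD = {"rocksalt": 8.4, "diamond": 13.2}


def volume_coefficient(structure):
    """Theoretical V0 / d**3 for a cubic unit cell of edge a."""
    return A_OVER_D[structure] ** 3


for structure, fitted in FITTED_C_VD.items():
    theory = volume_coefficient(structure)
    relative_gap = fitted / theory - 1.0
    print(structure, round(theory, 2), fitted, round(relative_gap, 3))
```

For both structures, the fitted coefficient lies several percent above the ideal geometric value.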

Although hardness is different from bulk modulus, the two are considered correlated for covalent and ionic crystals because both compressibility and hardness are mechanical responses of the atom close packing system to pressure. To date, the nature of both has been widely understood microscopically on the basis of valence electron behaviors (bond length, charge density, ionicity, strength, and others) and chemical bonding.2,6,7,36,37 As illustrated by Fig. 3, a linear correlation $H_v \propto B_0$ with a proportional coefficient ($C_{HB}^{\mathrm{fit}}$) of 0.208, 0.172, 0.019, and 0.015 was obtained for groups IV, III-V, II-VI, and I-VII, respectively. Similarly, a linear relation $H_k \propto B_0$ with a proportional coefficient of 0.228, 0.133, and 0.028 was obtained for diamond- and zinc-blende-structured solids of groups IV, III-V, and II-VI, respectively. Hardness is a complex property related to the extent to which solids resist both elastic and plastic deformations; its value is defined experimentally by pressing an indenter into the surface of the material and measuring the size of the impression.7 The Knoop scale uses a sharper diamond wedge than the Vickers scale. As shown by $C_{HB}^{\mathrm{fit}}$ in Table II, the ratio of Hk to Hv varies, contrary to the expectation that Hk is lower than Hv, and explaining this may require taking microstructure and impurity effects into account. In summary, a linear correlation occurs between hardness and bulk modulus, which is far simpler than the empirical models shown in Table I. In Griffith's crack theory, both crack strength and crack toughness are determined predominantly by the elastic modulus and surface energy of the material,14 which also supports the view that low compressibility, i.e., a large bulk modulus, is promising for hard materials.
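If both Hv and B0 scale as 1/d³, their ratio should equal the ratio of the two fitted coefficients, which offers a simple consistency check on Table II. The sketch below uses hypothetical variable names and only numbers taken from Table II; it reproduces the calc column derived from the 1/d³ fits.

```python
# Coefficients from Table II (fits against 1/d**3) and the direct Hv-B0 fit.
C_BD_FIT = {"IV": 1824, "III-V": 1655, "II-VI": 1009, "I-VII": 635}
C_HD_FIT = {"IV": 382, "III-V": 282, "II-VI": 31.7, "I-VII": 9.65}
C_HB_FIT = {"IV": 0.208, "III-V": 0.172, "II-VI": 0.019, "I-VII": 0.015}

# Hv / B0 implied by the two separate 1/d**3 fits.
implied_ratio = {group: C_HD_FIT[group] / C_BD_FIT[group] for group in C_BD_FIT}

for group in C_BD_FIT:
    print(group, round(implied_ratio[group], 3), C_HB_FIT[group])
```

The implied ratios agree closely with the direct fit for groups IV, III-V, and I-VII, whereas the II-VI row shows the largest gap (about 0.031 implied against 0.019 fitted).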

It should be noted that more correlative analytic relations between structure and property, between different state parameters, and between different properties are still waiting to be mined, as illustrated in Fig. 3. However, data-mining of analytic relations between composition and structure/property should be given priority because they provide causal emergence and regulation principles for material science and engineering.

In Fig. 4, the room-temperature volume of the unit cell (V0), bond length (d), bulk modulus (B0), hardness (Hv), and entropy (S0) are plotted with respect to $\mu \times \sqrt{r_1 r_2}/a_B$ or $\sqrt{Z_1^* Z_2^*}/Z_H^*$ for diamond-, zinc-blende-, and rocksalt-structured solids. Through data-mining with least squares regression, an exponential relation $y = A\exp(-x/B) + C$ (A, B, and C are constants) was found to describe well the trend of the structural parameters ($1/d^{3}$ and $1/V_0$) and mechanical response parameters (B0 and Hv) with respect to $\mu \times \sqrt{r_1 r_2}/a_B$. The constants (A, B, and C) fall into three classes: diamond/zinc-blende-type solids of group IV, III-V, and II-VI compounds; rocksalt-type solids of group I-VII except Li halides; and rocksalt-type Li halides. In comparison with Fig. 3, it is noted that, for diamond-like solids of groups IV, III-V, and II-VI, the causal relations between composition and the structural/mechanical parameters belong to one class, while the slope of the linear structure–mechanical property relation varies with the group of the constituent elements; the latter incorporates the microstructure effect of solids. For rocksalt-type halides, the two classes can reasonably be attributed to atom size, since the covalent radius ratio of Li/Na (0.69) is much smaller than that of Na/K (0.84), K/Rb (0.90), and Rb/Cs (0.88). For the thermodynamic parameter S0, rocksalt-type halides keep the same classification as for the structural and mechanical parameters, while diamond-type solids (group IV) and zinc-blende-type solids (groups III-V and II-VI) are separated into two classes; this separation is attributed to the presence of two kinds of constituent elements, which increases the number of possible microscopic arrangements (configurational entropy).
With respect to $\sqrt{Z_1^* Z_2^*}/Z_H^*$, neither $1/V_0$ nor Hv exhibits a monotonic trend; both show a maximum at the position of diamond, on both sides of which an exponential trend for diamond/zinc-blende-type solids was observed. The digital ensemble descriptors constructed here are thus chemically intuitive and physically meaningful for extracting the major chemical composition and structural trends of crystalline solids, which provides the mechanism behind the learned relations between state/response parameters and ensemble descriptors.
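An exponential relation of the form y = A exp(−x/B) + C can be fitted by least squares without external libraries, because the model is linear in A and C once B is fixed. The sketch below is one possible approach and not necessarily the routine used for Fig. 4: it scans B on a grid, solves A and C from the 2×2 normal equations, and then refines B locally. The function names, the search range for B, and the synthetic data are assumptions for illustration.

```python
import math


def solve_linear_part(x, y, b):
    """For fixed B, return (sse, A, C) minimizing sum (A*exp(-x/B) + C - y)^2."""
    e = [math.exp(-xi / b) for xi in x]
    n = len(x)
    s_e = sum(e)
    s_ee = sum(ei * ei for ei in e)
    s_y = sum(y)
    s_ey = sum(ei * yi for ei, yi in zip(e, y))
    det = n * s_ee - s_e * s_e  # determinant of the 2x2 normal equations
    a = (n * s_ey - s_e * s_y) / det
    c = (s_ee * s_y - s_e * s_ey) / det
    sse = sum((a * ei + c - yi) ** 2 for ei, yi in zip(e, y))
    return sse, a, c


def fit_exponential(x, y, b_min, b_max, n_grid=200, n_refine=20):
    """Fit y = A*exp(-x/B) + C by scanning B and solving A, C linearly."""
    ratio = (b_max / b_min) ** (1.0 / (n_grid - 1))
    grid = [b_min * ratio ** k for k in range(n_grid)]  # logarithmic grid in B
    best = min(grid, key=lambda b: solve_linear_part(x, y, b)[0])
    step = best * (ratio - 1.0)
    for _ in range(n_refine):
        # Re-scan a shrinking window around the current best B.
        candidates = [best + step * k / 10.0 for k in range(-10, 11)]
        candidates = [b for b in candidates if b > 0.0]
        best = min(candidates, key=lambda b: solve_linear_part(x, y, b)[0])
        step /= 5.0
    _, a, c = solve_linear_part(x, y, best)
    return a, best, c


# Synthetic, noise-free data generated with A = 2.0, B = 3.0, C = 0.5.
xs = [float(k) for k in range(1, 11)]
ys = [2.0 * math.exp(-xk / 3.0) + 0.5 for xk in xs]
a_fit, b_fit, c_fit = fit_exponential(xs, ys, b_min=0.2, b_max=50.0)
print(a_fit, b_fit, c_fit)
```

Fitting each class of solids separately with such a routine would yield one (A, B, C) triple per class, matching the three-class grouping of the constants described in the text.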

FIG. 4.

Data-mining causal relations of bond length (d),1,6,7,31,33 unit cell volume (V0),35 bulk moduli (B0),1,31–33 hardness (Hv),6,7,34 and entropy (S0)27 experimentally recorded at room temperature with respect to the $\mu \times \sqrt{r_1 r_2}/a_B$ or $\sqrt{Z_1^* Z_2^*}/Z_H^*$ ensemble descriptor. The solid/dashed lines in the form of $y = A\exp(-x/B) + C$ (A, B, and C are constants) were obtained through least squares regression and describe the trend of each class of crystalline solids classified by element group and lattice structure. Black star: Vickers hardness (Hv); red circle: bulk moduli (B0); blue half-filled down-triangle: inverse volume of the unit cell (1/V0); pink half-filled up-triangle: bond length (d); inset: schematic processing distribution of state/response parameters in practice. Orange star and dashed line: Hv of cubic BC2N (Ref. 38) and BC5 (Ref. 39). These plots represent principal component analysis (PCA) relevant to the chemical composition of crystalline solids, while the microstructure effect is reflected by the deviation of each point from the line; the relevant details are often not well documented in published papers and handbooks. The corresponding data are the same as in Fig. 3.


In chemistry, the Periodic Table brings order to the chemical and physical behavior of the elements; its periods and groups are classified according to electronic configurations dictated by quantum mechanics and the rotational symmetry of the Coulomb force.14,29,40 In the quantum mechanics of solids, it is commonly assumed that the atomic core electrons remain unchanged when a solid is formed, which reduces the problem to describing a model solid in which a sea of valence electrons interacts with a periodic array of positive cores.41 The problem of predicting the existence of crystalline solids is then reduced, in principle, to evaluating the total energy of a system of valence electrons interacting with different structural arrays of cores and with each other. Within the paradigm of computational simulation, the total energy of the atom close packing system is dictated by the point-group symmetry of the lattice, i.e., by the geometric arrangement of the atoms. Following these ideas, Figs. 3 and 4 show that the period and group of the constituent elements, combined with the geometric arrangement of the atoms, provide natural criteria for classifying crystalline solids: crystalline materials formed by constituent elements of the same groups with the same geometric arrangement exhibit similar behavior in their state/response parameters, differing only in magnitude.
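The classification criterion described above can be illustrated with a minimal Python sketch that groups solids by the pair (element groups of the constituents, lattice structure). The list of solids, the tuple layout, and the function name `classify` are illustrative assumptions, not the data set or procedure used in this study.

```python
from collections import defaultdict

# Illustrative entries only: (formula, element groups of the two constituents, lattice structure).
# This is NOT the data set of the study; it is a small hand-written example.
SOLIDS = [
    ("C", ("IV", "IV"), "diamond"),
    ("Si", ("IV", "IV"), "diamond"),
    ("Ge", ("IV", "IV"), "diamond"),
    ("c-BN", ("III", "V"), "zinc blende"),
    ("GaAs", ("III", "V"), "zinc blende"),
    ("ZnS", ("II", "VI"), "zinc blende"),
    ("NaCl", ("I", "VII"), "rocksalt"),
    ("MgO", ("II", "VI"), "rocksalt"),
]


def classify(solids):
    """Group formulas into classes keyed by (element groups, lattice structure)."""
    classes = defaultdict(list)
    for formula, groups, structure in solids:
        classes[(groups, structure)].append(formula)
    return dict(classes)


classes = classify(SOLIDS)
for (groups, structure), members in classes.items():
    print("-".join(groups), structure, members)
```

Each resulting class would then receive its own regression line, in the spirit of the per-class trends plotted in Figs. 3 and 4.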

Chemical bonding is an alternative language for quantum chemistry.42,43 Within the energetic scenario of bond formation, the lowering of electron kinetic energy upon the initial encounter of atoms has long been cited as the primary origin of covalent chemical bonds. Slater identified the role of potential energy and the accumulation of electron density in the internuclear region of bonding, and Feynman developed an image of bonding in terms of electrostatic forces. Within the state scenario of a chemical bond, quantum mechanical wavefunction interference drives the lowering of the kinetic energy of two approaching atoms. Within the real-space scenario, chemical bonds form between interacting atoms as electron-pair bonds once the atoms reach a final stable or metastable arrangement, and they are divided into covalent, ionic, metallic, and molecular bonds. Although chemical bonds have been widely debated, their description remains qualitative. In contrast to the electronegativity of valence electrons, the effective nuclear charge ($Z_i^*$) of an atom is a causal factor in determining bond strength and degree of ionicity; therefore, $Z_1^* Z_2^* / Z_H^*$ is thought to be a good overall quantitative descriptor of the bonding strength of solids arising from valence electrons, regardless of the kind of chemical bond and the difference in ionicity. At the same time, $\mu \times r_1 r_2 / a_B$ is closely related to the volume effect of the atom close packing system and takes the kinetic energy of relative motion into account, which makes it a causal factor of chemical pressure. From the above discussion, both ensemble descriptors constructed here carry information about chemical interaction at the atomic level.

In Fig. 5, the relation between $\mu \times r_1 r_2 / a_B$ and $Z_1^* Z_2^* / Z_H^*$ was further examined, and an exponential analytic relation of the same form, y = A exp(−x/B) + C, was obtained. After the scale ranges are adjusted for alignment, as illustrated in Fig. 5, the inverse volume of the unit cell (1/V0), hardness (Hv), and fracture toughness (KIC) on a logarithmic scale follow the same trend as the $Z_1^* Z_2^* / Z_H^*$ ensemble descriptor on a linear scale. This means that the physical prediction model studied here has been upgraded from data regression to data generation using causal analytic relations. Importantly, a state/response parameter experimentally recorded for a crystalline solid is the combined result of chemical composition and microstructure (grain size, pores, impurity phases, and so on), and the latter factors change from sample to sample because of the unavoidable distribution of processing parameters in practice. Figures 4 and 5 take only the predominant factor, chemical composition, into account, so the experimental values scatter around the line y = A exp(−x/B) + C. Keeping the microstructure effect on the state/response parameters in mind and adopting the idea of principal component analysis (PCA), the trend of mechanical properties such as hardness and fracture toughness for diamondlike solids can be quantitatively predicted using the correlated $\mu \times r_1 r_2 / a_B$ and $Z_1^* Z_2^* / Z_H^*$ descriptors, and their intrinsic values determined by chemical composition can be obtained through physics-guided small-data mining of extant experimental data. So far, it is seen that the exponential model describes the causal dependence of the structural/thermodynamic state and mechanical properties on the descriptors, as well as the correlation between chemical bonding and chemical pressure. Because both ensemble descriptors are dimensionless, the coefficient A stands for the pre-exponential factor of each dependent parameter, B for the characteristic quantity with respect to each argument for crystalline solids, and C for the baseline of each class of atom close packing system.
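As a hedged illustration of the regression step, the following Python sketch fits y = A exp(−x/B) + C by least squares using only the standard library. The text states only that a least-squares regression was used; the particular strategy here (a log-spaced grid search over B, with A and C solved in closed form for each trial B) is an assumption chosen for simplicity, and the data are synthetic rather than taken from Figs. 3 to 5.

```python
import math


def fit_exponential(xs, ys, b_min=0.05, b_max=50.0, steps=2000):
    """Fit y = A*exp(-x/B) + C by least squares.

    For a fixed B the model is linear in A and C, so they follow from the
    ordinary linear-regression formulas; B is chosen on a log-spaced grid
    as the value giving the smallest sum of squared errors.
    Returns (A, B, C, sse).
    """
    n = len(xs)
    best = None
    for k in range(steps + 1):
        b = b_min * (b_max / b_min) ** (k / steps)
        e = [math.exp(-x / b) for x in xs]          # regressor for this trial B
        se, see = sum(e), sum(v * v for v in e)
        sy, sey = sum(ys), sum(v * y for v, y in zip(e, ys))
        det = n * see - se * se
        if abs(det) < 1e-12:                        # regressor numerically constant
            continue
        a = (n * sey - se * sy) / det
        c = (sy - a * se) / n
        sse = sum((a * v + c - y) ** 2 for v, y in zip(e, ys))
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    return best


# Synthetic, noise-free data generated from assumed coefficients A=80, B=2, C=5.
xs = [0.5 * k for k in range(1, 13)]
ys = [80.0 * math.exp(-x / 2.0) + 5.0 for x in xs]

a_fit, b_fit, c_fit, sse = fit_exponential(xs, ys)
print(a_fit, b_fit, c_fit, sse)
```

With real data, the residual of each point from the fitted line would correspond to the sample-to-sample scatter that the text attributes to microstructure; a library optimizer could replace the grid search without changing the model.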

FIG. 5.

Prediction model of the state and response parameters using two digital ensemble descriptors of the atom close packing system for diamond/zinc blende structured solids. Black square: $Z_1^* Z_2^* / Z_H^*$ ensemble descriptor, red circle: Vickers hardness (Hv),6,7 blue star: inverse volume of the unit cell (1/V0),35 pink half-filled diamond: fracture toughness (KIC),44–46 inset: enlarged part, solid line: regression result. Orange crossed/solid circle: Hv of cubic BC2N (Ref. 38) and BC5 (Ref. 39). These plots represent principal component analysis relevant to the chemical composition of crystalline solids, while the microstructure effect is reflected by the deviation of each point from the line. The corresponding data are the same as in Figs. 3 and 4.


As a practical application, superhard materials are useful in a wide range of industrial settings, from scratch-resistant coatings to polishing and cutting tools. Previously, theoretical and experimental efforts focused on finding new low-compressibility materials with a hardness comparable to that of diamond through the correlation between valence electrons and hardness, because the strength and compressibility of a bond were thought to determine a solid’s ability to resist deformation.2,33 Theoretical arguments have also yielded a simple scenario of covalent bonding in diamond- and zinc-blende-type crystals, promising ultrahigh bulk modulus and hardness. From Figs. 3–5 and Table II, it is observed that bulk modulus and hardness depend strongly on the position (period and group) of the constituent elements in the Periodic Table, and that diamond is the hardest solid with the largest bulk modulus. From the viewpoint of ensemble descriptors, a lower $\mu \times r_1 r_2 / a_B$ coupled with a suitable $Z_1^* Z_2^* / Z_H^*$ implies higher hardness and lower compressibility for diamond/zinc blende structured solids, both of which are associated with smaller covalent radii of the elements. On one hand, the elements B, C, and N have small covalent radii of 0.88, 0.77, and 0.70 Å, respectively, and superhard diamond and cubic BN solids are obtained in practice.2,41 On the other hand, for those compounds in the center of the Periodic Table of elements, cubic BC2N was proposed as a candidate superhard material because of its shorter bond length.41 Experimentally, cubic BC2N was synthesized by a high-pressure method, with a bulk modulus of 282 GPa and a Vickers hardness of 76 GPa.38 By calculating its $\mu \times r_1 r_2 / a_B$ and $Z_1^* Z_2^* / Z_H^*$, as demonstrated in Figs. 4 and 5, it is found that the predicted Hv for cubic BC2N agrees well with the experimental value. The hardness of cubic BC2N is higher than that of cubic BN but lower than that of diamond, which coincides with the trend predicted by chemical pressure and chemical bonding in combination. Moreover, cubic BC5 was also proposed and synthesized using a high-pressure method.39 As demonstrated by Figs. 4 and 5, the trend for the hardness of diamond, cubic BC5, BC2N, and BN solids coincides with that of $Z_1^* Z_2^* / Z_H^*$ with respect to $\mu \times r_1 r_2 / a_B$. So far, it has been seen that the physical prediction model established here, based on causal relations with respect to the chemical pressure and chemical bonding ensemble descriptors, provides a feasible way to predict mechanical properties from chemical formulas and offers essential guidance in the search for new materials, such as superhard materials for cutting tools and bearings.
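The forward prediction and inverse design described above can be sketched in Python once a class of solids has fitted coefficients for y = A exp(−x/B) + C. The coefficient values, the candidate descriptor value, and the "measured" value below are placeholders invented for illustration; they are not fitted values or data from this study, and the function names are hypothetical.

```python
import math


def predict(x, a, b, c):
    """Forward prediction: property value expected for descriptor value x."""
    return a * math.exp(-x / b) + c


def invert(y, a, b, c):
    """Inverse design: descriptor value x that gives the target property y.

    Solving y = a*exp(-x/b) + c for x gives x = -b * ln((y - c) / a),
    which requires (y - c) / a > 0.
    """
    ratio = (y - c) / a
    if ratio <= 0:
        raise ValueError("target lies outside the range of the fitted curve")
    return -b * math.log(ratio)


# Placeholder coefficients for one class of solids (NOT taken from the paper).
A, B, C = 90.0, 1.5, 4.0

x_candidate = 0.6                      # hypothetical descriptor value of a candidate
y_intrinsic = predict(x_candidate, A, B, C)

y_measured = 62.0                      # hypothetical experimental value
deviation = y_measured - y_intrinsic   # scatter that the text attributes to microstructure

x_target = invert(40.0, A, B, C)       # descriptor value needed for a target property of 40
print(y_intrinsic, deviation, x_target)
```

In this reading, `predict` gives the composition-determined intrinsic value, while `invert` indicates which descriptor value, and hence which compositions, to look for when a target property is specified.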

In summary, an intelligent prediction architecture that combines small data with physicochemical theories of materials was proposed after adopting the data science paradigm to survey the connotation and denotation of material data. This intelligent prediction model contains three key components. First, material data are classified into classes and hierarchies in accordance with system theory, while digital ensemble descriptors, including chemical formulas and chemical bonds, are constructed using the atomic features contained in the Periodic Table of elements within the framework of quantum mechanics, the Schrödinger equation, and the symmetry principle. Second, causal analytic relations between state/response parameters and digital ensemble descriptors hidden behind extant experimental data on materials, for instance, an exponential function for diamond, zinc blende, and rocksalt-structured solids, are mined through physics-guided small-data intelligent algorithms. In this way, causal principles for the emergence and regulation of every target property are unveiled to understand the underlying physicochemical mechanism and to improve the performance of crystalline materials. Third, an end-to-end physical prediction model based on causal analytic relations is established to connect multi-objective comprehensive performance with the features of chemical elements through digital ensemble descriptors, which imply the chemical composition and the geometric arrangement of atoms in the unit cell of crystalline materials. This work opens a route beyond trial-and-error and screening toward both forward prediction of state/response parameters and inverse design of chemical composition for new materials, e.g., superhard materials.

See the supplementary material for more details of the data and calculations of ensemble descriptors used in this study, which includes an Excel file.

The authors acknowledge the National Natural Science Foundation of China (Grant Nos. 12274066 and 61771122), the Research Project of Zhejiang Lab (Grant No. 2021PE0AC02), and the Department of Science and Technology of Zhejiang Province (Grant No. 2023C01182) for supporting this research.

The authors have no conflicts to disclose.

Zhijie Hu: Data curation (equal); Formal analysis (equal); Writing – original draft (supporting); Writing – review & editing (supporting). Jian Yu: Conceptualization (lead); Funding acquisition (lead); Data curation (lead); Formal analysis (lead); Writing – original draft (lead); Writing – review & editing (lead).

The data that support the findings of this study are available within the article and as supplementary material in an Excel file.

1. M. L. Cohen, “Calculation of bulk moduli of diamond and zinc-blende solids,” Phys. Rev. B 32, 7988–7991 (1985).
2. A. Y. Liu and M. L. Cohen, “Prediction of new low compressibility solids,” Science 245, 841–842 (1989).
3. J. Contreras-García and C. Cardenas, “On understanding the chemical origin of band gaps,” J. Mol. Model. 23, 271 (2017).
4. J. Contreras-García, M. Marqués, J. M. Menéndez, and J. M. Recio, “From ELF to compressibility in solids,” Int. J. Mol. Sci. 16, 8151–8167 (2015).
5. F. L. Chen, W. J. Fan, S. M. Zhou, and X. P. Qiu, “Effects of cohesion energy on properties of lattice dynamics,” Coll. Phys. 41(9), 1–3 (2022) (in Chinese).
6. F. M. Gao, J. L. He, E. D. Wu, S. M. Liu, D. L. Yu, D. C. Li, S. Y. Zhang, and Y. J. Tian, “Hardness of covalent crystals,” Phys. Rev. Lett. 91, 015502 (2003).
7. A. Šimůnek and J. Vackář, “Hardness of covalent and ionic crystals: First-principle calculations,” Phys. Rev. Lett. 96, 085501 (2006).
8. K. Y. Li, X. T. Wang, F. F. Zhang, and D. F. Xue, “Electronegativity identification of novel superhard materials,” Phys. Rev. Lett. 100, 235504 (2008).
9. A. Molkeri, D. Khatamsaz, R. Couperthwaite, J. James, R. Arróyave, D. Allaire, and A. Srivastava, “On the importance of microstructure information in materials design: PSP vs PP,” Acta Mater. 223, 117471 (2022).
10. N. Kireeva and V. P. Solov’ev, “Machine learning analysis of microwave dielectric properties for seven structure types: The role of the processing and composition,” J. Phys. Chem. Solids 156, 110178 (2021).
11. D. Raabe, J. R. Mianroodi, and J. Neugebauer, “Accelerating the design of compositionally complex materials via physics-informed artificial intelligence,” Nat. Comput. Sci. 3, 198–209 (2023).
12. L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, “Big data of materials science: Critical role of the descriptor,” Phys. Rev. Lett. 114, 105503 (2015).
13. P. Berens, K. Cranmer, N. D. Lawrence, U. von Luxburg, and J. Montgomery, “AI for science: An emerging agenda,” arXiv:2303.04217 (2023).
14. J. Yu and J. H. Chu, Perovskite-Structured Ferroic Materials (Chinese Scientific Publisher, Beijing, 2022) (in Chinese).
15. A. Jain, G. Hautier, S. P. Ong, and K. Persson, “New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationships,” J. Mater. Res. 31, 977–994 (2016).
16. R. Ramprasad, R. Batra, G. Pilania, A. Mannodi-Kanakkithodi, and C. Kim, “Machine learning in materials informatics: Recent applications and prospects,” npj Comput. Mater. 3, 54 (2017).
17. S. Sun, R. H. Ouyang, B. C. Zhang, and T. Y. Zhang, “Data-driven discovery of formulas by symbolic regression,” MRS Bull. 44, 559–564 (2019).
18. E. O. Pyzer-Knapp, J. W. Pitera, P. W. J. Staar, S. Takeda, T. Laino, D. P. Sanders, J. Sexton, J. R. Smith, and A. Curioni, “Accelerating materials discovery using artificial intelligence, high performance computing and robotics,” npj Comput. Mater. 8, 84 (2022).
19. P. C. Xu, X. B. Ji, M. J. Li, and W. C. Lu, “Small data machine learning in materials science,” npj Comput. Mater. 9, 42 (2023).
20. J. Yu, F. F. An, and F. Cao, “Ferroic phase transition of tetragonal Pb0.6−xCaxBi0.4(Ti0.75Zn0.15Fe0.1)O3 ceramics: Factors determining Curie temperature,” Jpn. J. Appl. Phys. 53, 051501 (2014).
21. J. Yu and M. Itoh, “Physics-guided data-mining driven design of room-temperature multiferroic perovskite oxides,” Phys. Status Solidi RRL 13, 1900028 (2019).
22. Q. Wu, H. P. Ning, and J. Yu, “Narrow bandgap ferroelectric semiconductors within BiFeO3-based solid solution perovskites,” Chin. Sci. Bull. 66, 4045–4053 (2021) (in Chinese).
23. J. Yu, H. P. Ning, and Q. Wu, “Room temperature ferromagnetic spin ordering in multiferroic double perovskite oxides,” in 2021 IEEE-ISAF (IEEE Xplore, 2021).
24. A. Agrawal and A. Choudhary, “Perspective: Materials informatics and big data: Realization of the ‘fourth paradigm’ of science in materials science,” APL Mater. 4, 053208 (2016).
25. D. Feng and G. J. Jin, “Basic concepts in condensed matter physics,” Prog. Phys. 20, 1–21 (2000) (in Chinese).
26. P. W. Anderson, “More is different: Broken symmetry and the nature of the hierarchical structure of science,” Science 177, 393–396 (1972).
27. W. M. Haynes, D. R. Lide, and T. J. Bruno, CRC Handbook of Chemistry and Physics: A Ready-Reference Book of Chemical and Physical Data, 97th ed. (CRC Press, Taylor & Francis Group, Boca Raton, London, New York, 2017).
28. K. Y. Li and D. F. Xue, “Estimation of electronegativity values of elements in different valence states,” J. Phys. Chem. A 110, 11332–11337 (2006).
29. C. N. Yang, “Einstein’s impact on theoretical physics,” Phys. Today 33(6), 42–49 (1980).
30. R. E. Newnham, Properties of Materials: Anisotropy, Symmetry, Structure (Oxford University Press, New York, 2005).
31. C. Kittel, Introduction to Solid State Physics, 8th ed. (John Wiley & Sons, Inc., 2005).
32. M. L. Cohen, “Theory of bulk moduli of hard solids,” Mater. Sci. Eng.: A 105–106, 11–18 (1988).
33. P. K. Lam, M. L. Cohen, and G. Martinez, “Analytic relation between bulk moduli and lattice constants,” Phys. Rev. B 35, 9190–9194 (1987).
34. D. B. Sirdeshmukh, K. G. Subhadra, K. Kishan Rao, and T. Thirmal Rao, “Hardness of crystals with NaCl structure and the significance of the GILMAN-CHIN parameter,” Cryst. Res. Technol. 30, 861–866 (1995).
35. K. Y. Li, Z. S. Ding, and D. F. Xue, “Electronegativity-related bulk moduli of crystal materials,” Phys. Status Solidi B 248, 1227–1236 (2011).
36. X. J. Guo, L. Li, Z. Y. Liu, D. L. Yu, J. L. He, R. P. Liu, B. Xu, Y. J. Tian, and H. T. Wang, “Hardness of covalent compounds: Roles of metallic component and d valence electrons,” J. Appl. Phys. 104, 023503 (2008).
37. C. F. Schön, S. Van Bergerem, C. Mattes, A. Yadav, M. Grohe, L. Kobbelt, and M. Wuttig, “Classification of properties and their relation to chemical bonding: Essential steps toward the inverse design of functional materials,” Sci. Adv. 8, eade0828 (2022).
38. V. L. Solozhenko et al., “Synthesis of superhard cubic BC2N,” Appl. Phys. Lett. 78, 1385–1387 (2001).
39. V. L. Solozhenko, O. O. Kurakevych, D. Andrault, Y. Le Godec, and M. Mezouar, “Ultimate metastable solubility of boron in diamond: Synthesis of superhard diamondlike BC5,” Phys. Rev. Lett. 102, 015506 (2009).
40. P. Schwerdtfeger, O. R. Smits, and P. Pyykkö, “The periodic table and the physics that drives it,” Nat. Rev. Chem. 4, 359–380 (2020).
41. M. L. Cohen, “Predicting useful materials,” Science 261, 307–308 (1993).
42. D. S. Levine and M. Head-Gordon, “Clarifying the quantum mechanical origin of the covalent chemical bond,” Nat. Commun. 11, 4893 (2020).
43. Á. Martín Pendás and E. Francisco, “The role of references and the elusive nature of the chemical bond,” Nat. Commun. 13, 3327 (2022).
44. M. Bauccio, ASM Engineered Materials Reference Book, 2nd ed. (ASM International, Materials Park, OH, 1994).
45. I. Chasiotis, S. W. Cho, and K. Jonnalagadda, “Fracture toughness and subcritical crack growth in polycrystalline silicon,” J. Appl. Mech. 73, 714–722 (2006).
46. J. Pittari III, G. Subhash, J. Zheng, V. Halls, and P. Jannotti, “The rate-dependent fracture toughness of silicon carbide- and boron carbide-based ceramics,” J. Eur. Ceram. Soc. 35, 4411–4422 (2015).