We have analyzed structural motifs in the Deem database of hypothetical zeolites to investigate whether the structural diversity found in this database can be well-represented by classical descriptors, such as distances, angles, and ring sizes, or whether a more general representation of the atomic structure, furnished by the smooth overlap of atomic position (SOAP) method, is required to capture accurately structure–property relations. We assessed the quality of each descriptor by machine-learning the molar energy and volume for each hypothetical framework in the dataset. We have found that a SOAP representation with a cutoff length of 6 Å, which goes beyond near-neighbor tetrahedra, best describes the structural diversity in the Deem database by capturing relevant interatomic correlations. Kernel principal component analysis shows that SOAP maintains its superior performance even when reducing its dimensionality to those of the classical descriptors and that the first three kernel principal components capture the main variability in the dataset, allowing a 3D point cloud visualization of local environments in the Deem database. This “cloud atlas” of local environments was found to show good correlations with the contribution of a given motif to the density and stability of its parent framework. Local volume and energy maps constructed from the SOAP/machine learning analyses provide new images of zeolites that reveal smooth variations of local volumes and energies across a given framework and correlations between the contributions to volume and energy associated with each atom-centered environment.
For some of the ring descriptors, the dataset contains fewer than 2000 unique feature vectors, in which case only the unique representations were considered as representative environments.