Feature selection (FS) methods are often used to develop data-driven descriptors (i.e., features) for rapidly predicting the functional properties of a physical or chemical system from its composition and structure. FS algorithms identify descriptors from a candidate pool (i.e., feature space) built by feature engineering (FE) steps that construct complex features from the system's fundamental physical properties. Recursive FE, which repeatedly applies FE operations to the feature space, is necessary to build features complex enough to capture the physical behavior of a system. However, this approach creates a highly correlated feature space containing millions or billions of candidate features. Such feature spaces are computationally demanding to process with traditional FS approaches, which often struggle with strong collinearity. Herein, we address this shortcoming by developing a new method that interleaves the FE and FS steps to progressively build and select powerful descriptors with reduced computational demand. We call this method iterative Bayesian additive regression trees (iBART), as it iterates between FE with unary/binary operators and FS with Bayesian additive regression trees (BART). The capabilities of iBART are illustrated by extracting descriptors for predicting metal–support interactions in catalysis, which we compare with those identified in our previous work using other state-of-the-art FS methods (i.e., least absolute shrinkage and selection operator + ℓ0, sure independence screening and sparsifying operator, and Bayesian FS). iBART matches the performance of these methods yet uses a fraction of the computational resources, because it generates a maximum feature space of size O(10²), as opposed to the O(10⁶) generated by one-shot FE/FS methods.
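The interleaving idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the operator sets (`sq`, `sqrtabs`, `+`, `-`, `*`) are hypothetical, and the FS step is a simple absolute-correlation screen standing in for BART, which is not implemented here. The helper `space_size` counts how many candidates a single FE step produces, which shows why selecting between FE steps keeps the pool small.

```python
import itertools
import math
import random
from typing import Callable, Dict, List

Feature = List[float]  # one value per sample (system)

# Hypothetical operator sets, chosen for illustration only.
UNARY: Dict[str, Callable[[float], float]] = {
    "sq": lambda v: v * v,
    "sqrtabs": lambda v: math.sqrt(abs(v)),
}
BINARY: Dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def space_size(n: int, n_unary: int, n_binary: int) -> int:
    """Pool size after one FE step: inputs + unary images + binary ops on unordered pairs."""
    return n + n_unary * n + n_binary * n * (n - 1) // 2


def engineer(features: Dict[str, Feature]) -> Dict[str, Feature]:
    """One FE step: keep the inputs, apply every unary op to each input,
    and every binary op to each unordered pair of inputs."""
    out = dict(features)
    for name, col in features.items():
        for op, f in UNARY.items():
            out[f"{op}({name})"] = [f(v) for v in col]
    for (a, ca), (b, cb) in itertools.combinations(features.items(), 2):
        for op, f in BINARY.items():
            out[f"({a}{op}{b})"] = [f(x, z) for x, z in zip(ca, cb)]
    return out


def abs_corr(x: Feature, y: Feature) -> float:
    """Absolute Pearson correlation; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def select(features: Dict[str, Feature], y: Feature, k: int) -> Dict[str, Feature]:
    """Stand-in FS step: keep the k features most correlated with y.
    (iBART uses BART for this step; correlation screening is a simplification.)"""
    ranked = sorted(features.items(), key=lambda kv: abs_corr(kv[1], y), reverse=True)
    return dict(ranked[:k])


def iterative_fe_fs(primary: Dict[str, Feature], y: Feature,
                    iterations: int, k: int) -> Dict[str, Feature]:
    """Alternate FE and FS so each FE step starts from at most k features."""
    current = dict(primary)
    for _ in range(iterations):
        current = select(engineer(current), y, k)
    return current


if __name__ == "__main__":
    rng = random.Random(0)
    primary = {f"x{i}": [rng.uniform(0.5, 2.0) for _ in range(30)] for i in (1, 2, 3)}
    # Hypothetical target: the product of two primary features.
    y = [a * b for a, b in zip(primary["x1"], primary["x2"])]
    result = iterative_fe_fs(primary, y, iterations=2, k=3)
    print(list(result))  # highest-ranked descriptor first
```

With this hypothetical operator set and 10 primary features, one FE step yields 165 candidates and a second unscreened step yields 41,085, whereas screening down to k = 5 before the second step caps that step at 45. These counts apply only to the toy operators above, but they show the same qualitative gap the abstract reports between iBART and one-shot FE/FS.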
28 April 2022
Research Article
April 27 2022
A rapid feature selection method for catalyst design: Iterative Bayesian additive regression trees (iBART)
Special Collection: Chemical Design by Artificial Intelligence
Chun-Yen Liu (1); Shengbin Ye (2); Meng Li (2, a); Thomas P. Senftle (1, a)
(1) Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, USA
(2) Department of Statistics, Rice University, Houston, Texas 77005, USA
Note: This paper is part of the JCP Special Topic on Chemical Design by Artificial Intelligence.
J. Chem. Phys. 156, 164105 (2022)
Article history: Received March 02 2022; Accepted April 04 2022
Citation
Chun-Yen Liu, Shengbin Ye, Meng Li, Thomas P. Senftle; A rapid feature selection method for catalyst design: Iterative Bayesian additive regression trees (iBART). J. Chem. Phys. 28 April 2022; 156 (16): 164105. https://doi.org/10.1063/5.0090055