Autoencoder-based reduced-order modeling (ROM) has recently attracted significant attention, owing to its ability to capture underlying nonlinear features. However, two critical drawbacks severely undermine its scalability to various physical applications: entangled and therefore uninterpretable latent variables (LVs) and the blindfold determination of latent space dimension. In this regard, this study proposes the physics-aware ROM using only interpretable and information-intensive LVs extracted by *β*-variational autoencoder, which are referred to as physics-aware LVs throughout this paper. To extract these LVs, their independence and information intensity are quantitatively scrutinized in a two-dimensional transonic flow benchmark problem. Then, the physical meanings of the physics-aware LVs are thoroughly investigated and we confirmed that with appropriate hyperparameter *β*, they actually correspond to the generating factors of the training dataset, Mach number, and angle of attack. To the best of our knowledge, our work is the first to practically confirm that *β*-variational autoencoder can automatically extract the physical generating factors in the field of applied physics. Finally, physics-aware ROM, which utilizes only physics-aware LVs, is compared with conventional ROMs, and its validity and efficiency are successfully verified.

## I. INTRODUCTION

Computational fluid dynamics (CFD) has been widely applied in numerous engineering disciplines. However, high-fidelity CFD simulations are often computationally intensive due to the fine discretization of space and time domains. A practical approach to reduce its computational burden is using a regression model as a cost-efficient alternative to time-consuming flow analysis.^{1} Such regression models have been developed extensively, to name a few, cubic spline interpolation,^{2} the radial basis function,^{3} and Gaussian process regression (GPR);^{4,5} however, these models are only suitable for estimating scalar quantities such as lift coefficients of the airfoil. Consequently, when the quantities of interest are high-dimensional vectors, such as pressure or the velocity field, the computational complexity becomes prohibitive owing to “the curse of dimensionality.”

Data-driven reduced-order modeling (ROM) has recently attracted attention for its potential to deal with this problem. It treats a high-fidelity computer simulation as a “black-box function” and generates simplified models in a data-driven manner without any modification of the governing equation. The main purpose of this approach is to reduce the degrees of freedom of the data using the dimensionality reduction (DR) technique (also known as representation learning or manifold learning), which finds the low-dimensional latent representation of high-dimensional original data. Through this technique, the dimensionality of the data can be significantly reduced to a level that is suitable for training the regression models effectively.

One of the most widely used DR techniques is proper orthogonal decomposition (POD),^{6} which is also referred to as principal component analysis or the Karhunen–Loève theorem. Given the training data, POD extracts a set of orthogonal bases (also referred to as modes) that maximizes the variance of the projected data. Extracted modes are ranked by their energy content so that the dimensionality can be reduced by truncating non-dominant modes. Specifically, the original high-dimensional data can be represented in a low-dimensional subspace using linear combinations of the dominant modes. However, its linearity makes POD-based ROM suffer from performance degradation in nonlinear problems,^{7–10} requiring an excessive number of modes compared with nonlinear DR methods for the same reconstruction accuracy.

In this regard, deep neural networks (DNNs) have become an alternative to POD for their performance on highly nonlinear problems. The most commonly used DNN-based DR technique is the autoencoder (AE), which learns the low-dimensional latent space of the original data in an unsupervised manner.^{11} The nonlinear functions between its multiple hidden layers enable it to model nonlinearity, and in this context, Hinton *et al.* referred to AE as a nonlinear generalization of POD.^{12} Owing to this property, numerous ROM studies have recently adopted AE as DR model for highly nonlinear problems,^{13–22} and they confirmed that AE-based ROM shows higher reconstruction accuracy than the POD-based ROM in various nonlinear problems.^{13–15} Moreover, the flow field reconstruction using AE can be further improved by applying convolutional neural networks (CNNs), which were introduced to train grid-pattern data efficiently.^{16–22}

However, there are several issues to be addressed in AE-based ROM. First, AE does not guarantee disentangled latent representation since its training algorithm only aims to reduce the reconstruction error without considering the regularity in latent space. Its irregularity allows multiple features to be entangled in a single latent variable (LV),^{23} making it difficult to interpret the physical meanings of LVs. Second, since the model architecture of AE should be predetermined before training, its latent dimension needs to be blindly selected large enough to contain the entire information (otherwise, a trial-and-error process should be performed). In POD, it is possible to truncate trivial LVs since it provides the energy contents of LVs (or modes). However, in AE, its algorithm does not treat LVs hierarchically according to their information intensity so that theoretically, the entire information is evenly distributed to each LV. Therefore, all dimensions blindly selected should be used, as truncation of even only one LV can severely compromise its reconstruction accuracy. As a result, the resultant redundancy of the latent dimension will degrade both the physical interpretability of LVs (too many LVs are entangled) and the efficiency of AE-based ROM (too many regression models should be trained).

The introduction of a variational autoencoder (VAE) and its improved understanding has been providing a clue to these problems. While AE only minimizes reconstruction error, VAE also regularizes latent space by minimizing Kullback–Leibler (KL) -divergence term.^{24} The balance between these two terms in VAE can be adjusted by the hyperparameter *β*, and the corresponding model is referred to as *β*-VAE.^{23} It is known that by weighting the KL-divergence term in *β*-VAE, two notable effects can be achieved. First, it learns disentangled latent representations: a single LV encodes only one representation feature of the original dataset and therefore becomes interpretable. This ensures independence between each LV as the POD extracts orthogonal bases. Second, the regularization loss (KL-divergence) encourages learning the most efficient latent representation.^{23,25,26} Therefore, regardless of the predetermined latent dimension, only necessary LVs are automatically activated to contain the essential information.^{27} In summary, unlike AE model, *β*-VAE enables its latent space to be information-intensive without being entangled, indicating its potential to further improve the performance of AE-based ROM.

However, there are few studies related to *β*-VAE in the field of fluid dynamics. Eivazi *et al.*^{27} adopted *β*-VAE to extract interpretable nonlinear modes for time-dependent turbulence flows and proposed a method to rank LVs by measuring their energies. However, their ranking method requires separate post-processing of the reconstructed data after model training, and the energies of LVs are indirectly calculated in the output space rather than in the latent space. Accordingly, the approach is unrelated to KL-divergence, which is the main cause of information discrepancies between LVs. Though they finally concluded that their framework successfully extracted interpretable features of turbulent flows, it lacks objectivity since it was based on the visual inspection of the flow fields. Last but not least, their research ended up with nonlinear mode decomposition without showing how interpretable LVs can be efficiently utilized for the ROM process. Indeed, Wang *et al.*^{17} already have applied *β*-VAE to ROM for transonic flow problems and verified its superior performance over POD. Though they extended *β*-VAE to practical application, they only focused on the implementation of *β*-VAE for ROM so that the interpretability of extracted features (main purpose of *β*-VAE) and their effects on ROM performance were not addressed and referred to as their future work.

This study aims to utilize physically interpretable and information-intensive LVs obtained by *β*-VAE, which are referred to as “physics-aware LVs,” for the efficient ROM process. For this purpose, a two-dimensional (2D) transonic flow problem is adopted as benchmark case, and the independence and information intensity of LVs are investigated first to confirm whether they are physics-aware. Herein, it was practically confirmed for the first time in the field of applied physics that *β*-VAE can automatically extract the physical generating factors. Then, “physics-aware ROM,” which utilizes only physics-aware LVs, is proposed for its efficiency in that the number of required regression models can be reduced significantly. The presented physics-aware ROM is compared with conventional ROMs, and finally, we successfully verify its validity and efficiency.

The rest of this paper is organized as follows. In Sec. II, the DNN-based DR techniques are described in detail, and in Sec. III, the physics-aware ROM is newly proposed. In Sec. IV, the setup of the numerical experiment is presented. In Sec. V, the process of extracting physics-aware LVs and the discovery of their actual physical meanings are described, and the effectiveness of physics-aware ROM based on them is investigated. Finally, in Sec. VI, the conclusion of this study and future work are presented.

## II. DIMENSIONALITY REDUCTION TECHNIQUES

### A. Autoencoder (AE)

The AE model is a widely used deep learning-based DR technique. The main objective of the AE is to output exactly what is inputted. Its structure consists of two parts: an encoder and a decoder. The input of the AE, **x**, is entered into the encoder for the compression and exits as LVs, **z**. Then, **z** enters the decoder and exits as reconstructed data $x\u0303$. The encoder and decoder consist of a multi-layer perceptron (MLP), which makes it possible to model nonlinearity in the reduction and reconstruction processes. Because the objective of training the AE model is to reconstruct $x\u0303$ similar to the original data **x**, the loss function is defined by Eq. (1), where *N* denotes the number of data samples. Although its loss function is defined by the mean square error (MSE) in this study, any other error metrics, such as the binary cross entropy between **x** and $x\u0303$, can be used depending on the properties of the data. Note that the adopted MSE is the mean operation over sample-wise, and the summation over element-wise (pixel-wise) directions

However, this AE model has an obvious limitation: there is no training algorithm for guaranteeing a regularized latent space. Herein, the expression “regularized latent space” means that the latent space is trained to be smooth and continuous; thus, when inputs $x1$ and $x2$ are similar (or close), their corresponding mapped latent space, $z1$ and $z2$, should also be similar.^{28} In AE model, as the original input **x** becomes condensed layer-by-layer through the encoder, its representation becomes increasingly abstract.^{29} Therefore, the output of the encoder **z** is not guaranteed to be regularized in the training procedure of the AE. This explains why the decoder part of the AE model cannot be used as a generative model: the trained latent space is irregular, and therefore, its correlation with the reconstructed data is abstract.

### B. Variational autoencoder (VAE)

Many studies have focused on the limitations of this unregularized latent space trained by an AE and have alternatively applied VAE in their framework. The structure of the VAE is similar to that of the AE. One major difference is that VAE stochastically extracts the latent space **z** via random sampling, whereas the AE model obtains its latent space deterministically.

The mathematical formulas for the VAE model are presented below (for further details, please refer to these Refs. 24 and 30). We consider reducing the original dataset $X={xi}Ni=1$ (assumed to be independently and identically distributed) to the latent space **z** using the VAE model. In the inference of **z** from **X**, variational inference is adopted due to the intractability of the posterior $p\theta (z|x)$, where $\theta $ is a parameter of the VAE model (to be more specific, it is intractable owing to the likelihood of $p\theta (z)$). Therefore, instead of the intractable posterior $p\theta (z|x)$, the VAE is trained to obtain its substitute, $q\varphi (z|x)$ ($\varphi $ is a variational parameter). Then, the log-likelihood of $p\theta (x)$ can be expressed as follows:

where the second term on the right-hand side (RHS) is the KL-divergence of $q\varphi (z|x)$ from $p\theta (z|x)$, $KL(q\varphi (z|x)||p\theta (z|x))$, which is always non-negative according to its definition (KL-divergence is a measurement of the statistical distance between two probability distributions). Therefore, the first term on the RHS becomes the lower bound of the log-likelihood and the problem of maximizing the log-likelihood becomes the problem of maximizing the lower bound. This lower bound can be expressed as follows:

where the first and second terms on the RHS are the reconstruction error and KL-divergence of $q\varphi (z|x)$ from $p\theta (z)$, respectively. However, owing to the existence of $q\varphi (z|x)$ in the reconstruction error, the backpropagation process cannot be performed, and calculating the gradient of the reconstruction error with respect to $\varphi $ is problematic due to the posterior $q\varphi (z|x)$. Therefore, a “reparameterization trick” is adopted, which allows for backpropagation during the sampling process. The concept behind this trick is to sample **z** from the auxiliary noise variable $\u03f5\u223cN(0,12)$: this random sampling makes the latent space in the VAE stochastically determined. To be more specific, the $kth$ LV (*z _{k}*) is assumed to follow the distribution below:

where $\u2299$ denotes Hadamard product (element-wise product). Accordingly, the KL-divergence, the second term on the RHS in Eq. (3), can be rewritten as below when the posterior $q\varphi (z|x)$ and prior $p\theta (z)$ are assumed to follow the Gaussian distribution $N(\mu ,\sigma 2)$ and $N(0,I2)$, respectively,

where *μ _{k}* and

*σ*represent the mean and standard deviation used during the reparameterization of

_{k}*z*, and

_{k}*d*denotes the dimension of the latent space. Herein, as the posterior approximation $q\varphi (z|x)$ approaches the prior $p\theta (z)$, the KL-divergence decreases. Finally, the loss function of VAE model can be formulated as in Eq. (6), which consists of the reconstruction error (MSE term) and regularizer (KL-divergence term). Since the KL-divergence term serves to regularize the latent space to be trained, it is also called regularization loss. It induces a sparser latent space

^{23,24,31,32}just as the L1 regularization term makes the model sparse in the Lasso regression

^{33}

### C. *β*-variational autoencoder (*β*-VAE)

Higgins *et al.*^{23} focused on the trade-off relationship between the reconstruction error and KL-divergence in the loss function of the VAE. As the reconstruction accuracy increases, the degree of regularization (or information capacity) in the latent space decreases. To tune the balance between these two performances, Higgins *et al.*^{23} proposed the *β*-VAE, which can control the relative importance of the KL-divergence term using the adjustable hyperparameter *β*. It is a simple modification of the VAE: they have exactly the same structures but slightly different loss functions. The loss function of the *β*-VAE is as follows:

The only difference in the loss functions between the VAE and *β*-VAE is whether the KL-divergence term is weighted by hyperparameter *β*. By introducing this hyperparameter, Higgins *et al.*^{23} achieved quantitative and qualitative improvements in the disentanglement within the latent representations over the traditional VAE model. Finally, they concluded that owing to the disentangling performance of *β*-VAE, the interpretable representations of the independent generating factors of the given dataset can be discovered automatically. In this paper, the terminologies “disentanglement,” “interpretability,” “independence,” and “orthogonality” are used interchangeably to refer to the following property of LVs: single LV stands for a single representation feature without the intervention of other LVs. However, what makes *β*-VAE special is not only in its disentangling performance. The regularization loss (KL-divergence) is known to have sparsification effect to encourage the most efficient representation learning.^{23} Burda *et al.* and Sønderby *et al.* practically confirmed this effect in that some LVs become inactive during the training process of VAE.^{25,34} In particular, Burda *et al.* confirmed that inactive LVs have a negligible effect on the reconstruction. This finding indicates that VAE trains its LVs in an efficient manner by selectively activating LVs to contain the only necessary information. Since *β*-VAE model also has regularization term in the loss function and even weighted by the hyperparameter *β*, it can be easily inferred that the sparsification effect in the *β*-VAE model will become more dominant as *β* increases. Taking these two outstanding advantages of *β*-VAE, disentangled and information-intensive latent space, a novel ROM framework is proposed in Sec. III.

## III. PHYSICS-AWARE REDUCED-ORDER MODELING

The main goal of ROM is to predict the high-dimensional data with low computational cost. As described in Sec. II, the high-dimensional data can be effectively reconstructed by the low-dimensional LVs through DR techniques. In this context, ROM aims to predict high-dimensional data efficiently by predicting these LVs from input parameters using regression models. Overall structure of ROM is illustrated in Fig. 1. In the reconstruction process (solid arrows), the high-dimensional data are encoded into LVs and reconstructed back into high-dimensional data by the decoder. In the prediction process (dashed arrows), LVs are first predicted from input parameters through regression models (GPR^{4,5} is adopted for regression in this study), and then, predicted LVs are decoded into high-dimensional data. Since the high-dimensional data can be repeatedly predicted from input parameters at a very low computational cost, this prediction process is often referred to as the online phase.

Since LVs act as intermediaries in ROM for predicting high-dimensional data, they inevitably affect ROM performance. In this regard, AE model has two critical drawbacks to be applied to ROM. First, it trains the entangled and therefore uninterpretable latent space. Second, its latent dimension needs to be blindly selected large enough since the model architecture should be predetermined before the training. Therefore, the AE-based ROM becomes physically uninterpretable and inefficient due to the excessive number of entangled LVs and the regression models to be trained. This study newly proposes physics-aware ROM via *β*-VAE to deal with these issues, considering the following characteristics of *β*-VAE. As stated in Sec. II C, the latent space in *β*-VAE is trained to be disentangled and information-intensive. When LVs actually satisfy these two properties simultaneously, they can be regarded as the latent representations, which contain both interpretable and necessary information. In that this study uses physical dataset, LVs will correspondingly contain physical information, and accordingly, we refer to those LVs as “physics-aware LVs.” To ease the understanding of physics-aware LVs, the ideal schematic of their extraction with *β*-VAE is shown in Fig. 2: when the dataset is generated through Mach number (*Ma*) and angle of attack (*AoA*), the ideally extracted physics-aware LVs will be *Ma* and *AoA*. Finally, the ROM only utilizing these physics-aware LVs (which refers to “physics-aware ROM”) is proposed for its efficiency that the number of regression models required in the prediction process can be significantly reduced.

In order to extract physics-aware LVs, we first estimate the independence of LVs. In this regard, Eivazi *et al.*^{27} have already measured it using a Pearson correlation matrix and the same approach is applied herein. Then, the information intensity of LVs is quantified. Similar attempt has been made by Eivazi *et al.*,^{27} who proposed a strategy to rank LVs by energy percentage, which quantifies the contribution of each LV to the reconstruction quality. However, they ranked LVs not in the latent space, which is directly related to them, but in the output space so that there exist two problems. First, its low practicality due to cumbersome post-processing: flow fields should be reconstructed using a forward selection of LVs, and then, energy percentage is calculated from them. Second, since this method is irrelevant to KL-divergence, which is the main cause of the inactiveness of LVs,^{25,34} it cannot be regarded as a fundamental approach for ranking LVs based on their amount of information. Therefore, another ranking criterion should be proposed that estimates the information intensity of each LV without such burdensome post-processing. For this purpose, we propose to apply the KL-divergence [Eq. (5)], the immediate cause for the sparser latent space due to its regularization effect. Since the calculation of KL-divergence is performed directly in the latent space, the decoder part of *β*-VAE does not even need to be utilized (no reconstruction required), solving all the limitations of the previous ranking approach.

In physics-aware ROM framework, only physics-aware LVs are utilized so that the number of regression models required in the prediction process can be significantly reduced. For example, let’s suppose that AE-based ROM and *β*-VAE-based ROM are performed where both AE and *β*-VAE have the latent dimension of 16. In the case of AE-based ROM, 16 regression models should be trained because exclusion of just one LV can degrade ROM performance significantly in that all 16 LVs contain information in an entangled manner. However, in *β*-VAE-based ROM, since physics-unaware LVs are judged not to contain any meaningful information, it is sufficient to utilize only the regression models of physics-aware LVs. The overall procedure of physics-aware ROM is summarized in Algorithm 1.

(1) Prepare the training dataset generated by physical parameters. |

(2) Train β-VAE with dataset in (1). |

(3) Extract physics-aware LVs through estimating their independence by correlation coefficient and information intensity by KL-divergence. |

(4) Train regression models to predict physics-aware LVs from physical parameters in (1). |

(5) Predict the high-dimensional data from physical parameters using regression models trained in (4) and decoder part of β-VAE trained in (2). During this process, physics-unaware LVs are deactivated by being fixed to their estimated mean values. |

(1) Prepare the training dataset generated by physical parameters. |

(2) Train β-VAE with dataset in (1). |

(3) Extract physics-aware LVs through estimating their independence by correlation coefficient and information intensity by KL-divergence. |

(4) Train regression models to predict physics-aware LVs from physical parameters in (1). |

(5) Predict the high-dimensional data from physical parameters using regression models trained in (4) and decoder part of β-VAE trained in (2). During this process, physics-unaware LVs are deactivated by being fixed to their estimated mean values. |

## IV. NUMERICAL EXPERIMENTS

### A. Data preparation

A 2D transonic flow problem, the benchmark case that is widely used in previous ROM studies^{8–10,17} is adopted for the validation of the proposed physics-aware ROM. The transonic flow field is generated by the KFLOW finite-volume-based CFD solver.^{35,36} Specifically, the Reynolds-averaged Navier–Stokes equation is solved coupled with the Spalart–Allmaras turbulence model. RAE 2822 airfoil is selected as the baseline geometry, and a structured O-grid with a shape of 512 × 256 (wall-tangential direction × wall-normal direction) is generated; the corresponding grid is shown in Fig. 3. The Reynolds number is fixed at $6.5\xd7106$, and two physical parameters are chosen: *Ma* and *AoA*. The design space of each variable is set to [0.5, 0.8] and [$0\xb0,\u20093\xb0$], respectively. Latin hypercube design is used to generate 500 sample points in 2D parameter space. A flow analysis of these samples is then conducted, and the resultant velocity and pressure fields are normalized by their far-field conditions. As the variations of flow properties far from the airfoil are negligible, only the inner half of the entire grid with respect to the normal direction from the airfoil is used so that the resultant grid becomes 512 × 128. Consequently, a total dataset with a shape of $500\xd73\xd7512\xd7128$ (dataset size × flow field components × wall-tangential direction grids × wall-normal direction grids) is obtained, where the flow-field components are the x-velocity, y-velocity, and pressure. Finally, the dataset size of 500 is split into a ratio of 9:1 so that the size of the training dataset is 450, and that of the test dataset is 50.

### B. Training details

In this study, all the models mentioned in Sec. II, AE, VAE, and *β*-VAE, are trained to investigate their differences in terms of ROM for transonic flow. In particular, several *β*-VAE models are trained ($\beta \u2208[10,20,30,40,50,100,150,200,500,750,1000,2000,3000,4000]$) to investigate the effects of the *β* value (technically speaking, *β*-VAE can be regarded as the VAE when *β* has a value of 1). For all models, the dimension of the latent space should be determined blindly before the model training. The selected dimensions should be sufficient for encoding the training data generated from the 2D parameter domain (*Ma* and *AoA*). In this regard, we adopted the approach suggested by Wang *et al.*, which infers the latent dimension of *β*-VAE considering the accuracy of the POD with the same dimension^{17} (their assumption was that POD requires much more dimensions than *β*-VAE for the equivalent reconstruction accuracy, which was also proved in their paper). Finally, dimension of the latent space is determined to be 16 since it conserves 99.18% in terms of energy contents of POD, which is judged to be sufficient.

An appropriate encoder/decoder structure should be selected to effectively compress/reconstruct the data, from dimensions of 3 × 512 × 128 to 16 and vice versa. To determine suitable structures, hyperparameter tuning based on a grid search was conducted. Finally, it was confirmed that the MSE reconstruction error, which represents the overall accuracy of the trained model, does not strongly depend on whether batch normalization, max-pooling, and up-sampling are applied. However, their application significantly affects the smoothness of the reconstructed flow field (it can be inferred that this is due to the effects of batch normalization and max-pooling that prevent overfitting, and the interpolation effect of up-sampling). If an artificial discontinuity (rather than a discontinuity that reflects a physical phenomenon, such as a shock wave) is observed in the reconstructed flow field, the flow field cannot be considered realistic. Finally, the architectures of the selected models, which are considered to be sufficient in terms of MSE error and the smoothness of the reconstructed flow field, are shown in Tables I and II and Fig. 4. The only difference between the structure of the AE and VAE/*β*-VAE is the bottleneck, and the structures of the VAE and *β*-VAE are exactly the same.

Name . | Layer type . | Filter . | Kernel . | Stride . | Activation . | Batch Norm. . |
---|---|---|---|---|---|---|

Up | Max-polling | ⋯ | 2 × 2 | ⋯ | ⋯ | ⋯ |

Convolution | 64 | 3 × 3 | 1 | LeakyReLU | o | |

Down | Upsampling | ⋯ | 2 × 2 | ⋯ | ⋯ | ⋯ |

Convolution | 64 | 3 × 3 | 1 | LeakyReLU | o | |

Conv1_in | Convolution | 64 | 1 × 1 | 1 | LeakyReLU | o |

Conv1_out | Convolution | 3 | 1 × 1 | 1 | ⋯ | ⋯ |

FC | Fully Connected | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ |

Name . | Layer type . | Filter . | Kernel . | Stride . | Activation . | Batch Norm. . |
---|---|---|---|---|---|---|

Up | Max-polling | ⋯ | 2 × 2 | ⋯ | ⋯ | ⋯ |

Convolution | 64 | 3 × 3 | 1 | LeakyReLU | o | |

Down | Upsampling | ⋯ | 2 × 2 | ⋯ | ⋯ | ⋯ |

Convolution | 64 | 3 × 3 | 1 | LeakyReLU | o | |

Conv1_in | Convolution | 64 | 1 × 1 | 1 | LeakyReLU | o |

Conv1_out | Convolution | 3 | 1 × 1 | 1 | ⋯ | ⋯ |

FC | Fully Connected | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ |

Encoder . | Bottleneck . | Decoder . | |||
---|---|---|---|---|---|

Layer . | Output size . | Layer . | Output size . | Layer . | Output size . |

Input | 3 × 512 × 128 | Flatten | 256 | Up1 | 64 × 8 × 2 |

Conv1_in | 64 × 512 × 128 | FC1: μ | 16 | Up2 | 64 × 16 × 4 |

Down1 | 64 × 256 × 64 | FC2: σ | 16 | Up3 | 64 × 32 × 8 |

Down2 | 64 × 128 × 32 | Resampling | 16 | Up4 | 64 × 64 × 16 |

Down3 | 64 × 64 × 16 | FC3 | 256 | Up5 | 64 × 128 × 32 |

Down4 | 64 × 32 × 8 | Unflatten | 64 × 4 × 1 | Up6 | 64 × 256 × 64 |

Down5 | 64 × 16 × 4 | … | … | Up7 | 64 × 512 × 128 |

Down6 | 64 × 8 × 2 | … | … | Conv1_out | 3 × 512 × 128 |

Down7 | 64 × 4 × 1 | … | … | Output | 3 × 512 × 128 |

Encoder . | Bottleneck . | Decoder . | |||
---|---|---|---|---|---|

Layer . | Output size . | Layer . | Output size . | Layer . | Output size . |

Input | 3 × 512 × 128 | Flatten | 256 | Up1 | 64 × 8 × 2 |

Conv1_in | 64 × 512 × 128 | FC1: μ | 16 | Up2 | 64 × 16 × 4 |

Down1 | 64 × 256 × 64 | FC2: σ | 16 | Up3 | 64 × 32 × 8 |

Down2 | 64 × 128 × 32 | Resampling | 16 | Up4 | 64 × 64 × 16 |

Down3 | 64 × 64 × 16 | FC3 | 256 | Up5 | 64 × 128 × 32 |

Down4 | 64 × 32 × 8 | Unflatten | 64 × 4 × 1 | Up6 | 64 × 256 × 64 |

Down5 | 64 × 16 × 4 | … | … | Up7 | 64 × 512 × 128 |

Down6 | 64 × 8 × 2 | … | … | Conv1_out | 3 × 512 × 128 |

Down7 | 64 × 4 × 1 | … | … | Output | 3 × 512 × 128 |

The Adam optimizer is adopted to train selected models. The initial learning rate is set to $10\u22123$, and it decays at a rate of 0.1 every 1000 epochs. The maximum epoch should be selected carefully because a model that does not fully converge can make significantly different predictions from a converged model.^{37} The maximum number of epochs is set to 3000 since it is verified to guarantee sufficient convergence in terms of the loss function. A mini-batch size of 50 is selected so that nine iterations are performed per epoch (as the size of the training dataset is 450). Using a Tesla P100-PCIE-16GB GPU with Pytorch deep learning library,^{38} the average training time for each model is calculated to be approximately 3 h.

## V. RESULT AND DISCUSSION

In this section, the results of physics-aware ROM and physics-unaware ROM are presented in the following order. First, training results of AE/VAE/*β*-VAE are demonstrated to investigate the flow reconstruction accuracy. Next, the methods to measure the independence and information intensity of LVs are presented to determine whether the LVs obtained by AE/VAE/*β*-VAE are physics-aware LVs. At the same time, the validity of the KL-divergence ranking method to measure the information intensity of LVs is confirmed both quantitatively and qualitatively. Then, it is thoroughly investigated with various techniques whether the physics-aware LVs actually have interpretable physical features. In the end, it is verified that the physics-aware ROM based on these physics-aware LVs has equivalent prediction accuracy much more efficiently than physics-unaware ROM.

### A. Training results

First, the resultant loss functions of the trained models are shown in Fig. 5. More specifically, the loss function is decomposed into the MSE and KL-divergence [as in Eq. (7)] to confirm their trade-off relationship. The MSE and KL-divergence of the VAE/*β*-VAE models are indicated by blue and red symbols, respectively. The *β* and KL-divergence values cannot be defined in the AE model; therefore, only its MSE loss is indicated separately by a dashed line. Herein, a clear trade-off relationship between MSE and KL-divergence can be confirmed, as mentioned by Higgins *et al.*^{23} As *β* increases, KL-divergence decreases, while MSE increases. This is consistent with the concept of the *β*-VAE, which suppresses KL-divergence by increasing *β*. In particular, when $\beta >1000$, the MSE increases significantly so that an accurate reconstruction of the flow field is no longer possible.

Second, the reconstructed pressure flow fields of the three test cases (which are not used during the training process) are shown in Fig. 6 to compare the trained models more intuitively and visually. The selected test cases are as follows: test case 1 is in the absence of a shock wave (*Ma *=* *0.61 and $AoA=2.12\xb0$), test case 2 is in the presence of a weak shock wave (*Ma *=* *0.72 and $AoA=1.68\xb0$), and test case 3 is in the presence of a strong shock wave (*Ma *=* *0.78 and $AoA=1.57\xb0$). Five models are compared, including the AE, VAE, 30-VAE (*β*-VAE with *β* = 30), 100-VAE (*β* = 100), and 1000-VAE (*β* = 1000). In Fig. 6, it can be confirmed that the reconstructed pressure contours for all three cases and all models do not exhibit any significant difference, indicating all models have been trained to reconstruct the accurate flow fields.

### B. Independence of LVs

This section is to confirm whether LVs of *β*-VAE are actually trained to be disentangled from each other so that they are interpretable. In Fig. 7, the absolute values of the components in the Pearson correlation matrix are shown from AE to 1000-VAE. Since there is no algorithm in AE model to promote the independence of LVs, it has the largest values. However, in *β*-VAEs, their values decrease as *β* increases, which indicates that the LVs gradually become independent of each other. These results practically prove that the KL-divergence actually encourages the independence of each LV. To measure the degree of correlation within the entire set of LVs, Eivazi *et al.* computed the determinant of the correlation matrix (when the determinant is 0/1, it indicates that these variables are completely correlated/uncorrelated).^{27} However, given the fact that the KL-divergence term leads to a sparser latent space, which will be discussed in detail in Sec. V C, examining the determinant of the whole matrix size of 16 × 16 is contradictory. Therefore, the independence of various LV combinations is further analyzed. First, all possible combinations of two to seven LVs are obtained. Then, the determinants of these combinations are calculated and their statistics are summarized in Fig. 8. For any number of LVs used in a combination, the *β*-VAE models have significantly higher determinants than the AE and VAE models. In summary, the results presented in Figs. 7 and 8 consistently show that the *β*-VAE successfully learns uncorrelated LVs compared with AE and VAE models, owing to the KL-divergence, which forces the LVs to be independent.

### C. Information intensity of LVs

It was verified in Sec. V B that the LVs in *β*-VAE are trained to contain disentangled information. In this section, we tried to rank such disentangled LVs according to their information intensity using KL-divergence. Figure 9 shows the KL-divergence of each LV in AE/VAE/*β*-VAE. It can be confirmed that both the AE and VAE models do not have any inactive LVs. On the contrary, notable trends are observed in 30-VAE, 100-VAE, and 1000-VAE models: each model has only four (LV index 6, 8, 11, and 15), three (index 8, 11, and 15), and two (index 8 and 11) activated LVs. These results are consistent with those of Eivazi *et al.* in that the number of activated LVs decreases as beta increases in the *β*-VAE model.^{27} In fact, a similar approach was applied by Sønderby *et al.*, who judged whether the LV is activated via the KL-divergence.^{34} However, the relationship between the activeness of the LV and KL-divergence was not described clearly. Therefore, we also attempt to clarify it: further investigations are conducted to check whether the LV judged to be more active in terms of the KL-divergence actually has a greater effect on the reconstructed flow field. Sobol sensitivity analysis is utilized for this analysis.^{39,40} A total 18,432 latent vectors are sampled using the Saltelli sampler in the range of $[\mu \u0302k\u22122\sigma \u0302k,\mu \u0302k+2\sigma \u0302k]$, where $\mu \u0302k$ and $\sigma \u0302k$ each represents estimated mean and standard deviation of $kth$ LV with respect to the training dataset. The corresponding reconstructed flow fields are obtained through the decoder parts of the AE/VAE/*β*-VAE models. Since Sobol analysis should be conducted on the scalar output, a set of 3 × 512 × 128 pixels of the flow fields is converted to a scalar value. The sum of all pixel components is used in this study, but any scalar value representing the main characteristics of the flow field can be used. Figure 9 shows the calculated first-order Sobol indices. It can be confirmed that the higher the KL-divergence, the larger the value of the Sobol indices in the VAE/*β*-VAE models (because KL-divergence has nothing to do with the algorithm of the AE model, it does not exhibit a clear trend with the Sobol indices).

Additionally, since the standard deviation represents the dispersion of the variable, which can be regarded as its activeness, $\sigma \u0302k$ values are also investigated. When $\sigma \u0302k$ is low for a specific LV, it can be understood that the corresponding LV remains inactive (or less dispersed) in the latent space during training: a situation where the value of the LV does not change even if the input data changes. In Fig. 10, it can be confirmed again that both AE and VAE models do not have any inactive LVs in that all $\sigma \u0302k$ values are larger than 0.5. However, the most interesting point is that compared with Fig. 9, LVs with high KL-divergence also have high $\sigma \u0302k$ in 30-VAE, 100-VAE, and 1000-VAE models: the LV indices of the activated LVs in terms of KL-divergence exactly match those of $\sigma \u0302k$. To sum up, we suggest the use of KL-divergence as a criterion for ranking the activeness (information intensity) of LVs for the two reasons. First, it is a regularization loss, which is the direct cause of the information discrepancies between LVs. Second, it does not require any cumbersome post-processing. The justification of this decision-making process is successfully performed through intuitive but quantitative investigations by comparing its ranking with the Sobol indices (Fig. 9) and estimated standard deviations from the training dataset (Fig. 10).

So far, using proposed KL-divergence criterion, the inactiveness within the latent space has been investigated in a quantitative manner, but it does not provide straightforward information on what these activated/inactivated variables actually contain. Accordingly, each LV is visualized via a traversal of itself, which is the most widely adopted approach for this purpose:^{23,30} it shows the traversal of a single LV, while other LVs remain fixed so that one can visually understand the features of a specific LV without the intervention of other variables. In this study, a latent traversal plot is applied to identify the physical features of the flow field contained in each LV. Figure 11 shows the pressure flow fields for two extreme LVs: one is the most dominant LV, which is ranked first by KL-divergence, and the other is the most trivial LV, which is ranked last. In AE, the pressure field changes abruptly as the most dominant LV changes; when the most trivial LV changes, the field changes gradually, but not as significant as that of the most dominant LV. Since the sparsification effect does not occur in AE due to the absence of KL-divergence term in the loss function, all the variables are activated and therefore contain uninterpretable (or entangled) physical features. However, in VAE and 1000-VAE models, the most trivial LVs cause indistinguishable variations in the flow fields. This is because the KL-divergence forces inactiveness in the latent space, so only necessary LVs become activated to contain meaningful physical features: it can be regarded as a virtue of *β*-VAE model. When *β* increases, information is packed into LVs more compactly,^{27} as observed in the traversal of the most dominant LV in the 1000-VAE model. This LV can be interpreted as containing information on the occurrence of the shock wave, and the variation it arouses is the largest compared with other models.

### D. Physics-aware LVs

The two requirements for physics-aware LVs are investigated so far: whether LVs are disentangled (Sec. V B) or information-intensive (Sec. V C). In this section, physics-oriented investigations are performed to figure out the physical meanings each physics-aware LV contains.

First, the relationship between the two physical parameters used in this study [whose distributions are shown in Fig. 12(a)] and the top two ranked LVs by KL-divergence (first LV and second LV) is visually investigated as *β* varies. Accordingly, in Fig. 12(b), the distributions of the training dataset with respect to those two LVs are shown. For the AE, VAE, 100-VAE, and 1000-VAE models, the plots of the first column are colored by *Ma* value, and the second column by *AoA*. Moreover, to confirm that how physical parameter domain can be represented by top two LVs, we also draw the trajectories of the boundary data in Fig. 12(b) [they are extracted from the boundary of the physical parameter space, as in Fig. 12(a)]. In AE, neither *Ma* nor *AoA* have any noticeable trends. By contrast, in VAE, the plot in the first column exhibits a trend that is not clear, but still noticeable: as the first/second LV increases, *AoA*/*Ma* increases. However, the increase in *Ma* cannot be explained by the increase in the second LV alone (the same goes for first LV and *AoA*). These ambiguous correlations between the two LVs and two physical parameters become more clear in 100-VAE (for this case, as the first/second LV increases, *Ma*/*AoA* increases). Though 100-VAE shows much obvious relationship than AE, the physical parameters cannot be fully represented only by two dominant LVs in that its boundary trajectory does not cover the entire training dataset. Considering the fact that there are three physics-aware LVs in 100-VAE, this may be an expected result. However, it should be noted here that the top two variables sorted by KL-divergence almost succeeded in representing the physical parameters, whereas when the same plot is drawn with the first and third dominant LVs, a very irregular pattern is observed. In this regard, the validity of ranking LVs through KL-divergence is confirmed once again. Finally, the 1000-VAE is investigated. Since this model has two physics-aware LVs, in the ideal case one can expect them to correspond to the two physical parameters (this ideal situation was depicted in Fig. 2). This conjecture is actually happening in 1000-VAE: the correlations between two physical parameters and two LVs become clear, and the latent space of the training dataset is perfectly closed by its boundary trajectory. Here, it is the key finding of this study: though *β*-VAE has no information about the physical parameters used to generate the training data, it effectively extracts only two LVs out of total 16 LVs (especially when *β* = 1000), which correspond to actual physical parameters *Ma* and *AoA*. It can be concluded that these marvelous results are owing to the orthogonality effect (makes LVs disentangled) and regularization effect (makes redundant LVs inactive) of the KL-divergence term in *β*-VAE. To make this clear, it should be noted that AE, one of the most widely used nonlinear DR techniques, fails to construct a physics-aware latent space in that it learns all 16 entangled LVs and therefore physically uninterpretable.

As the visual analysis in the previous paragraph is qualitative, a quantitative analysis is performed herein to verify whether *LV _{Ma}* or

*LV*in 1000-VAE corresponds to

_{AoA}*Ma*or

*AoA*(for the sake of brevity,

*LV*refers to the first LV, and

_{Ma}*LV*refers to the second LV, which implies that

_{AoA}*LV*and

_{Ma}*LV*are the LVs responsible for

_{AoA}*Ma*and

*AoA*, respectively). Single-variable linear regression (LR) models are trained for this purpose: since the relationships between physical parameters and LVs appear linear in Fig. 12, the LR model is considered to be sufficient. For example, LR model for

*Ma*is trained with the input variable as

*LV*and the output variable as

_{Ma}*Ma*, which can be expressed as $Ma=f(LVMa)$. The training dataset is the same as in Sec. IV A, and physical parameters and LVs are standardized before training. If

*LV*and

_{Ma}*Ma*(or

*LV*and

_{AoA}*AoA*) have a linear relationship, the fitted LR models will perform well on the test dataset. The results are shown in Fig. 13. In each subplot, the equation of the trained LR model is also included. For both LR models $Ma=f(LVMa)$ and $AoA=f(LVAoA)$, their equations clearly show that $Ma\u2248LVMa$ and $AoA\u2248LVAoA$ in that the coefficients are approximately 1 and the intercepts are 0. Also, the coefficients of determination (

*R*

^{2}) calculated based on the test dataset are 0.950 and 0.969, respectively. It indicates that only one LV is sufficient to accurately represent each physical parameter. In addition, we train two additional LR models with both LVs as input features: one as $Ma=f(LVMa,LVAoA)$ and the other as $AoA=f(LVMa,LVAoA)$. If

*Ma*needs both LVs for its expression, the

*R*

^{2}value of the LR model $Ma=f(LVMa,LVAoA)$ will be significantly higher than that of $Ma=f(LVMa)$. The trained LR models are described as follows:

Herein, there are two notable points. First, the coefficient of *LV _{AoA}* is negligible (approximately 0) compared to that of

*LV*(approximately 1) when modeling

_{Ma}*Ma*, which indicates that

*LV*is redundant variable for representing

_{AoA}*Ma*(same principal applies when modeling

*AoA*). The second interesting point is that the values of

*R*

^{2}do not change compared to those of the single-variable LR models. The

*R*

^{2}of LR model $Ma=f(LVMa,LVAoA)$ only increases 0.001 than $Ma=f(LVMa)$, and $AoA=f(LVMa,LVAoA)$ model has the same

*R*

^{2}as $AoA=f(LVAoA)$. These two points quantitatively suggest that the extracted physics-aware LVs actually correspond to

*Ma*and

*AoA*in a disentangled (or independent) manner.

Since the physical meanings of *LV _{Ma}* and

*LV*are verified both qualitatively and quantitatively, the latent traversal plots of airfoil surface pressure distributions are shown in Fig. 14 to intuitively check their impact on the flow field. In the traversal of

_{AoA}*LV*[Fig. 14(a)], when the LV is in the range of $\mu \u0302k\u22123\sigma \u0302k$ to $\mu \u0302k$, there is no shock wave, but as it increases to $\mu \u0302k+1.5\sigma \u0302k$ and $\mu \u0302k+3\sigma \u0302k$, the occurrence of a shock wave is shown. In the traversal of

_{Ma}*LV*[Fig. 14(b)], a variation in the location and magnitude of the leading-edge suction peak is observed. This visual analysis of the actual effects of

_{AoA}*LV*and

_{Ma}*LV*on the pressure distributions also leads to the consistent conclusion that they correspond to

_{AoA}*Ma*and

*AoA*, respectively.

Although the ability of *β*-VAE to extract physics-aware LVs is first observed with simple physical parameters in this study, it has tremendous potential to be utilized to extract generating factors from any dataset in any disciplinary. The scalability of this framework to the real-world engineering dataset, which is sparse and noisy can be verified in Appendix B.

### E. Physics-aware ROM

To summarize the results so far, physics-aware LVs are extracted by estimating the independence and information intensity of each LVs. These physics-aware LVs are strongly correlated with physical parameters, which are the generating factor of training dataset, significantly increasing the interpretability. Since LVs act as intermediaries in the prediction process of ROM, their impact on the ROM is bound to be enormous. In this regard, this section investigates the validity and efficiency of physics-aware ROM, which utilizes physics-aware LVs for the prediction of high-dimensional data.

Figure 15 shows the MSE of the regression models in ROM, $MSEreg$; it is calculated between true LV and predicted LV by regression model (please refer to Fig. 1 for more details about regression models in ROM). In that 16 regression models are required due to 16 LVs, each point represents the MSE of each model, and the symbol o/x indicates whether each LV is physics-aware or physics-unaware. The results show that the $MSEreg$ values of physics-unaware LVs are considerably higher than those of physics-aware LVs. It is because the correlations between physical parameters and physics-unaware LVs are too trivial to be trained by regression models. Figure 16 supports this; the distribution of physics-unaware LV [Fig. 16(b)] is much noisier than that of the physics-aware LV [Fig. 16(a)]. Another interesting point in Fig. 15 is that $MSEreg$ values of the physics-aware LVs decrease as *β* increases. This implies that the tight coupling between physics-aware LVs and physical parameters is advantageous in terms of regression performance in the ROM process.

Then, the MSE between the ground truth flow fields and those predicted by ROM with the exclusion of $kth$ LV is calculated, which is denoted as $MSEpred,\u2212k$ (please note that prediction by ROM means obtaining the flow field from unknown input parameters, as already described in Fig. 1). Specifically, $kth$ LV is assumed to be constant as $\mu \u0302k$, whereas the other LVs are predicted from regression models so that the importance of $kth$ LV can be investigated. Figure 17 demonstrates their results, where the x axis indicates KL-divergence ranking of LVs. An important observation is that the effect of the LV on $MSEpred,\u2212k$ decreases as the LV ranks down. Likewise, the physics-unaware LVs, those ranked after fourth in 30-VAE, after third in 100-VAE, and after second in 1000-VAE, have negligible effects on $MSEpred,\u2212k$: from the fact that the LV ranked higher by KL-divergence has a greater effect on ROM accuracy, it can be concluded that the proposed ranking approach is also valid in terms of ROM performance. This implies that training regression models of physics-unaware LVs are meaningless with respect to the prediction accuracy of ROM. In other words, it is sufficient enough to train regression models of only physics-aware LVs. In this regard, to examine the necessity of each LV in ROM prediction, Fig. 18 visualizes the $MSEpred$ values of physics-aware ROM via *β*-VAE, where all physics-unaware LVs are excluded (e.g., only two LVs are used for ROM via 1000-VAE). For the comparison, those of physics-unaware ROM via AE are also shown: herein, $MSEpred$ utilizing all 16 LVs or 15 LVs with one LV excluded is presented. Although a significantly small number of LVs are used in physics-aware ROM, its accuracy is comparable to that of AE with 16LVs, physics-unaware ROM. Furthermore, even if only one LV is excluded in AE, their ROM accuracy becomes equivalent to or even lower than that of 1000-VAE only with two LVs. This result highlights the inefficiency of ROM through AE, in which it requires all 16 entangled LVs and therefore requires training 16 regression models. Using physics-aware ROM via 30-VAE, the equivalent accuracy can be achieved with only 4 regression models. To visually confirm these results, Fig. 19 shows the pressure contour of physics-aware ROM via *β*-VAE and physics-unaware ROM via AE. Herein, the indiscernible difference between them can be confirmed, indicating their equivalent prediction accuracy.

## VI. CONCLUSION

This study proposed the physics-aware ROM based on physics-aware LVs, which are interpretable and information-intensive LVs extracted by *β*-VAE. The proposed framework is validated with the 2D transonic benchmark problem in the following order. First, the extraction process of physics-aware LVs is scrutinized by quantitatively estimating their independence and information intensity. Then, the actual physical meanings of these LVs are thoroughly investigated. Finally, the effectiveness of the proposed physics-aware ROM compared with conventional ROMs is verified. The key contributions of our study can be summarized as follows:

The impacts of hyperparameter

*β*on the independence of LVs were scrutinized, and its effect on the independence of LVs was practically confirmed in that LVs become disentangled from each other as*β*increases.KL-divergence ranking method was proposed to measure the information intensity of each LV. This approach has two following advantages over the previous ranking method: KL-divergence is the direct cause of the discrepancies in information intensity of LVs, and it does not require cumbersome post-processing of reconstructed data. The proposed criterion was confirmed to have a consistent trend with estimated standard deviations and Sobol indices, indicating their validity. Through this ranking method, the effect of

*β*on the latent space regularization was practically confirmed in that LVs become information-intensive as*β*increases.The physical meanings contained in physics-aware LVs were thoroughly investigated. The correlation between the physical generating factors of the training dataset and the information physics-aware LVs contain was scrutinized as

*β*varies. Finally, it was confirmed quantitatively and qualitatively that the extracted physics-aware LVs in 1000-VAE actually correspond to the generating factors, which were*Ma*and*AoA*in this study. To the best our knowledge, this is the first study to practically confirm that*β*-VAE can automatically extract the physical generating factors in the field of applied physics.The effects of physics-awareness of LVs on the accuracy of regression and prediction processes in ROM are analyzed, and it was confirmed that only physics-aware LVs had a significant effect on their accuracy. Therefore, physics-aware ROM, which utilizes only physics-aware LVs, is proposed for its efficiency in that the number of required regression models can be reduced significantly. Finally, compared with conventional ROMs, its validity and efficiency were successfully verified.

The presented data-driven physics-aware ROM has great potential in two engineering applications. First, extraction process of physics-aware LVs can be applied to discovering generating factors from the given dataset. For example, this application can be extended to identify fault-causing factors in sensor data from manufacturing processes. Second, physics-aware ROM can be an efficient alternative to conventional black-box ROMs in that it utilizes necessary regression models of only physically interpretable and information-intensive LVs, rather than redundant regression models of all uninterpretable LVs. Though the application of this framework was demonstrated via the 2D transonic benchmark problem, it can be easily applied and extended to numerous engineering disciplines in that no special assumptions have been made on this specific problem. For future work, a more comprehensive investigation on physics-aware LVs will be conducted, such as their scalability to the temporal dataset or their ability to discern redundant physical parameters.

## ACKNOWLEDGMENTS

Yu-Eop Kang and Sunwoong Yang equally contributed to this work. This work was supported by a National Research Foundation of Korea grant funded by the Ministry of Science and ICT (No. NRF-2017R1A5A1015311).

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Yu-Eop Kang:** Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing—original draft (equal); Writing—review and editing (equal). **Sunwoong Yang:** Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing—original draft (equal); Writing—review and editing (equal). **Kwanjung Yee:** Funding acquisition (lead); Project administration (lead); Resources (lead); Supervision (lead).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

### APPENDIX A: POD-BASED ROM RESULTS

While the excellence of AE-based ROM over POD-based ROM has already been conducted by numerous studies,^{13,15,27} it is worth reaffirming from the perspective of this study. Therefore, additional results by POD-based ROM are presented in this section. Figure 20 shows the same results as in Fig. 14 except it is based on POD. When compared with 1000-VAE in Fig. 14, the discontinuity due to the shock wave is not sharply captured by the top two dominant LVs of POD. Moreover, these two LVs encode two physical characteristics, the presence of the shock wave and the magnitude of the suction peak, in an entangled manner, whereas 1000-VAE successfully separated their information in a disentangled manner. To quantitatively verify the entanglement of POD LVs, the same analysis using LR models with two LVs [as Eq. (8)] is repeated, and the results are as follows:

As can be confirmed, both LVs contribute significantly to the prediction of each physical parameter (herein, test dataset *R*^{2} values for Ma and AoA are calculated as 0.949 and 0.994), meaning that they cannot solely represent the *Ma* and *AoA* as 1000-VAE model. All these results indicate that despite the POD algorithm ensuring orthogonality, the physical interpretability of its latent space cannot be guaranteed.

To inspect the prediction accuracy of POD-based ROM, Fig. 21 shows the pressure contour predicted from POD-based ROM and its absolute error contour. Herein, POD (16LVs) and POD (2LVs) each represent ROM with 16 and 2 LVs (or modes) of POD. When compared with Fig. 19, the error contour of POD with 16LVs is comparable to that of AE/VAE/*β*-VAE. However, when the number of LVs is reduced to 2, which is the number of generating factors of training dataset, the error contour is much more worse. Indeed, $MSEpred$ of ROM based on POD with 2LVs ($4.94\xd7104$) is 26 times higher than that of 1000-VAE with 2LVs ($1.87\xd7103$). More interestingly, even that of POD with 16LVs ($2.07\xd7103$) is 1.1 times higher than 1000-VAE with 2LVs. Despite the further increment of the number of LVs used in POD-based ROM to 64, it is confirmed that the $MSEpred$ of POD ($1.93\xd7103$) is still higher than that of 1000-VAE. These results confirm that POD-based ROM requires an excessive number of LVs compared with *β*-VAE-based ROM for ensuring the same prediction accuracy.

This section shows the obvious superiority of nonlinear-based DR methods over linear-based DR method (POD) in terms of physical interpretability of latent space and consequent prediction accuracy of ROM, both of which are key factors of this study.

### APPENDIX B: SCALABILITY OF THE FRAMEWORK FOR EXTRACTING PHYSICS-AWARE LVS

In the main text, the framework for extracting physics-aware LVs using *β*-VAE is validated with regularized and well-organized CFD dataset. However, most of the data in real-world engineering is sparse and noisy. Accordingly, in this section, the practical scalability of the proposed framework is verified by utilizing only a small portion of the training dataset and adding artificial noises. For this purpose, the training dataset in the main text (which is the tensor of $3\xd7512\xd7128$) is preprocessed to be a vector with only 35 elements: 32 surface pressure values, lift coefficient (*C _{l}*), drag coefficient (

*C*), and pitching moment coefficient (

_{d}*C*). Artificial Gaussian noises are added considering the scale of each element, and the final dataset are shown in Fig. 22. Then, AE and

_{m}*β*-VAE models ($\beta \u2208[0.01,0.1,1]$) are trained and their top two ranked LVs are visualized in Fig. 23 (which corresponds to Fig. 12 in Sec. V D). Again, it can be seen that LVs of AE are highly entangled. On the other hand, LVs of

*β*-VAE become more and more disentangled as

*β*increases so that eventually in 1-VAE, each LV solely represents

*Ma*and

*AoA*, respectively. It can be concluded that the extraction of physics-aware LVs via

*β*-VAE, which is first observed and reported in this study, has the potential to be applied to real-world engineering problems where training datasets are sparse and noisy.