This article deals with largeeddy simulations of threedimensional incompressible laryngeal flow followed by acoustic simulations of human phonation of five cardinal English vowels, /ɑ, æ, i, o, u/. The flow and aeroacoustic simulations were performed in OpenFOAM and inhouse code openCFS, respectively. Given the large variety of scales in the flow and acoustics, the simulation is separated into two steps: (1) computing the flow in the larynx using the finite volume method on a fine moving grid with 2.2 million elements, followed by (2) computing the sound sources separately and wave propagation to the radiation zone around the mouth using the finite element method on a coarse static grid with 33 000 elements. The numerical results showed that the anisotropic minimum dissipation model, which is not well known since it is not available in common CFD software, predicted stronger sound pressure levels at higher harmonics, and especially at first two formants, than the walladapting local eddyviscosity model. The model on turbulent flow in the larynx was employed and a positive impact on the quality of simulated vowels was found.
I. INTRODUCTION
Aeroacoustic simulations undoubtedly have strong potential to be applied in clinical diagnostics, treatment control, and to support medical education. Regarding this application, computer analysis can provide highly resolved threedimensional (3D) data of the flow and acoustic field for further studies. Such highly resolved 3D data are infeasible by in vivo, excised larynx or synthetic vocal fold measurements (Döllinger , 2011; Kniesburges , 2011).
Although numerical tools are available, realistic 3D turbulent flow simulations are computationally expensive as the timeconsuming accurate modeling of supraglottal turbulence is essential for flowinduced sound generation (Mattheus and Brücker, 2011). Conventional turbulence modeling approaches, such as unsteady Reynoldsaveraged NavierStokes (uRANS) equations are inadequate for aeroacoustics because they do not provide the instantaneous state of flow quantities. Therefore, in comparison with a direct numerical simulation (DNS), largeeddy simulation (LES) is a beneficial balance between the resolution of turbulent structures and the accurate prediction of turbulent sound generation mechanism. One of the first studies employing LES to research laryngeal aeroacoustics was the work of Suh and Frankel (2007), who combined a compressible LES and acoustic analogy published by Ffowcs Williams and Hawkings (1969; FWH) in a static model of the human glottis for human voice signal predictions. Mihaescu (2010) employed the LES capability to study the laryngeal airflow during phonation and inspiration. Schwarze (2011) explored an implicit LES, where the intrinsic dissipation of the numerical method was assumed to act as a subgridscale model. During recent years, simulations have advanced considerably (Bodaghi , 2021; Tokuda and Shimamura, 2017; Yoshinaga , 2017) toward a state where fullscale aeroacoustic simulations on realistic computed tomography (CT) or magnetic resonance imaging (MRI)based geometries are possible. It can be anticipated that these simulations could be used for subjectspecific presurgical predictions of vocal fold oscillations (Avhad , 2022). These phonation simulations can be useful and help to improve the voice quality for subjects suffering from various vocal fold dysfunctions (Falk , 2021; Sadeghi , 2019a,b) or evaluate potential effects on voice production affected by an implant insertion in medialization laryngoplasty (Zhang , 2020).
In clinical applications, the reliability of the LES is a key factor. Hence, further studies on the accuracy of subgridscale models are important. In our previous study (Lasota , 2021), the implementation of a relatively new subgridscale anisotropic minimum dissipation (AMD) model, published first by Rozema (2015), was described. The reasons for introducing the given model are based on the major features of the minimum dissipation models, which are constructed to confine time derivative of subgridscale energy and satisfy the need to dissipate the energy of subgrid scales. In addition, the AMD model is also derived for anisotropic meshes, which can circumvent an overuse of corrections needed to retain the stability of the numerical schemes when severe mesh element deformation occurs due to vocal fold motion. Rozema (2015) confirmed high consistency between results obtained by AMD and the DNS of turbulent channel flow and between AMD and the DNS of temporal mixing layers, where both were performed on anisotropic grids. These flow regimes are comparable to the flow structures arising inside the larynx.
In this work, AMD is investigated for biological laryngeal flows based on the abovementioned advantages. The flow field prediction capabilities are assessed and compared to stateoftheart LES using a conventional oneequation (OE) and walladapting local eddyviscosity (WALE) subgridscale turbulence models. Furthermore, the AMD model's impact on the aeroacoustic sound generation and the produced voice signal is evaluated for the first time.
The article is organized into two major sections describing the computational fluid dynamic (CFD) and computational aeroacoustic (CAA) parts. CFD in Sec. II describes models, geometry, boundary conditions, mesh, discretization, numerical solution, and results. CAA in Sec. III keeps the same structure.
II. CFD MODEL
A. Mathematical model
LES is a mathematical concept for modeling turbulent flows, which deals with flow structures carrying most kinetic energy, k, i.e., spatially organized large scales. These consist of two main categories: coherent structures and coherent vortices of recognizable shape (Lesieur , 2005). In the numerical implementation, the characteristic length, Δ, defining a cutoff between resolved large scales and modeled subgrid scales, is usually given by the mesh grid spacing (Versteeg and Malalasekera, 2007).
1. OE subgridscale model
Neither the Smagorinsky nor the OE model can reproduce the laminar to turbulent transition, and both tend to overpredict the production rate and, thus, the turbulent viscosity in free shear layers and near the walls. This is caused by the fact that the term $ S \xaf i j S \xaf i j$ is large in the regions of pure shear because it is only related to the rateofstrain, $ S \xaf i j$, and not to the rateofrotation, $ \Omega \xaf i j$ (Lesieur , 2005).
2. WALE subgridscale model
3. AMD subgridscale model
The AMD model was implemented into the OpenFOAM code as the LAAMD library.^{1}
Based on decaying grid turbulence tests, the value of the constant, C_{A}, in Eq. (17) is recommended with respect to the order of discretization of NavierStokes equations, which were tested on decaying grid turbulence cases. Rozema (2015) has concluded that AMD gave the best results with C_{A} = 0.3 for a central secondorder scheme and $ C A = 0.212$ for a fourthorder scheme. A recent study by Zahiri and Roohi (2019) states an optimal value of the constant $ C A = 1 / 3 = 0.577$ based on various test cases. Lasota (2022) performed his own tests with model constants, $ C A = 0.3$ and $ C A = 0.57735$, on cases with turbulent flow in a planar channel and periodic hill. Based on better agreement with velocity profiles obtained by DNS or experiments, $ C A = 0.3$ was chosen as the model constant for turbulence modeling in the larynx.
Rozema (2015) has also shown that after a Taylor expansion of τ_{ij} in Eq. (5), consistency between AMD and the exact subgridscale stress tensor, τ_{ij}, can be proved. After Taylor expansion of the eddy dissipation of the exact subgridscale stress tensor, $ \tau i j S i j$, consistency with R5 in Eq. (17) can be also proved. If the exact eddy dissipation gives zero dissipation, then the term R5 gives zero dissipation as well. This means that the AMD model can be switched off for flows where the exact eddy dissipation is vanishing. Thus, the AMD model also switches off when no subgrid energy is created (Rozema , 2015; Vreugdenhil and Taylor, 2018).
B. Geometry and boundary conditions
The geometry of vocal folds is based on the M5 parametric shape by Scherer (2001); see Fig. 1. The false vocal folds were specified according to data published by Agarwal (2003). The geometrical model is 3D, having a square cross section at inlet 12 × 12 mm. Details can be found in Šidlof (2015).
The boundary conditions for the CFD model are summarized in Table I. The flow is driven by constant pressure difference, $ P k = p \xaf / \rho = 300 \u2009 m 2 / s s$, between the inlet $ \Gamma in$ and outlet $ \Gamma out$. The velocity on $ \Gamma in$ and $ \Gamma out$ is computed from the flux. Turbulence initialization at inlet is not used because turbulent intensity upstream of the glottis is low and the flow at inlet can be considered laminar (Lasota and Šidlof, 2019).
.  $ u \xaf \u2009 \u2009 ( m / s )$ .  $ p \xaf \u2009 \u2009 ( Pa )$ . 

$ \Gamma in$  $ u \xaf = \u2009 0$ if $ u \xaf \u22c5 n < 0 ,$  350 
$ \u2207 u \xaf \u22c5 n = 0$ if $ u \xaf \u22c5 n > 0$  
$ \Gamma out$  $ u \xaf = \u2009 0$ if $ u \xaf \u22c5 n < 0 ,$  0 
$ \u2207 u \xaf \u22c5 n = 0$ if $ u \xaf \u22c5 n > 0$  
$ \Gamma bVF , \u2009 \Gamma uVF$  $ u \xaf 1 = 0 , \u2009 u \xaf 2 = \u2202 t h ( x , t ) ,$  $ \u2207 p \xaf \u22c5 n = 0$ 
$ u \xaf 3 = 0$  
$ \Gamma wall$  $ u \xaf = \u2009 0$  $ \u2207 p \xaf \u22c5 n = 0$ 
.  $ u \xaf \u2009 \u2009 ( m / s )$ .  $ p \xaf \u2009 \u2009 ( Pa )$ . 

$ \Gamma in$  $ u \xaf = \u2009 0$ if $ u \xaf \u22c5 n < 0 ,$  350 
$ \u2207 u \xaf \u22c5 n = 0$ if $ u \xaf \u22c5 n > 0$  
$ \Gamma out$  $ u \xaf = \u2009 0$ if $ u \xaf \u22c5 n < 0 ,$  0 
$ \u2207 u \xaf \u22c5 n = 0$ if $ u \xaf \u22c5 n > 0$  
$ \Gamma bVF , \u2009 \Gamma uVF$  $ u \xaf 1 = 0 , \u2009 u \xaf 2 = \u2202 t h ( x , t ) ,$  $ \u2207 p \xaf \u22c5 n = 0$ 
$ u \xaf 3 = 0$  
$ \Gamma wall$  $ u \xaf = \u2009 0$  $ \u2207 p \xaf \u22c5 n = 0$ 
On the moving boundaries, $ \Gamma bVF$ and $ \Gamma uVF$, the flow velocity is equal to the velocity of the moving vocal fold surface, given by the function $ h ( x , t )$. The function $ h ( x , t )$ based on the sinusoidal displacement, $ w 1 , 2 = A 1 , 2 \u2009 sin \u2009 ( 2 \pi f o t + \xi 1 , 2 )$, ensures the vibrating motion of vocal folds in the mediallateral (y) direction with two degrees of freedom. In the current simulation, the vocal folds oscillate symmetrically with a frequency f_{o} = 100 Hz, amplitudes at the superior and inferior vocal fold margin are $ A 1 , 2 = 0.3 \u2009 \u2009 mm$, and the phase difference is $ \xi 1 \u2212 \xi 2 = \pi / 2$ between the inferior and superior vocal fold margin. During vocal fold oscillation cycle, the medial surface convergence angle, $ \psi / 2$ (see Fig. 1), alternates between $ \xb1 10 \xb0$. In this study, the oscillation of the vocal folds allows closing/opening the glottal gap, g, in the range 0.42–1.46 mm.
C. Mesh and discretization
In wallbounded flows, the presence of solid walls fundamentally influences the flow dynamics, turbulence generation, and transport in the nearwall regions due to significant viscous stresses. The accuracy of the numerical simulation is, thus, closely related to the grid resolution near the fixed walls. According to the classification by Pope (2000), LESs of wallbounded flows can be classified as largeeddy simulations with nearwall resolution (LESNWR) with a grid sufficiently fine to resolve 80% of the turbulent energy in the boundary layer, and largeeddy simulation with nearwall modeling (LESNWM), which employs a modeling approach similar to Reynoldsaveraged NavierStokes (RANS) in the nearwall region. For these simulations, an important parameter is the wall unit, $ y + = u \tau y / \nu $, where $ u \tau $ is the friction velocity and y is the dimensional distance in normal direction from the wall. Using the same normalization, x^{+} and z^{+} denote the dimensionless streamwise and spanwise distances, respectively. Wall units are also commonly used to indicate LES adequacy. According to Georgiadis (2010) and Jiang and Lai (2016), in LESNWR, the theoretical limits for the grid spacing adjacent to the wall are $ 50 \u2264 x + \u2264 150 , \u2009 \u2009 y + < 1$ and $ 15 \u2264 z + \u2264 40$, with at least 3–5 gridpoints for $ 0 < y + < 10$. The computational mesh in the current CFD simulation is block structured to capture well the boundary layer and consists of 2.2 M hexahedral elements (Lasota , 2021). An opensource 3D finite volume mesh generator, blockMesh (part of OpenFOAM), was used to build the mesh. The mesh deforms in time due to vocal fold oscillation. On the boundary, $ \Gamma bVF$, at the critical time when the vocal folds are maximally adducted were evaluated values $ y avg + = 1.77 , \u2009 \u2009 z + = 14$ and $ x + = 8$.
The NavierStokes equations were discretized using the collocated cellcentered finite volume method. Fletcher (2012) demonstrated that evenordered derivatives in the truncation error are associated with numerical dissipation, and oddordered spatial derivatives are associated with the numerical dispersion in the solution. Ideally, LES simulations should use schemes with low numerical dissipation. The nondissipative central differencing scheme, which was applied in this study, allows an accurate representation of the changing flow field (Launchbury, 2016). The discretization of the diffusion term is split into an orthogonal and crossdiffusion term using a procedure described in Jasak (1996). Unlike the discretization of the temporal, convective, and orthogonal parts of the diffusive term, the nonorthogonal correctors are treated explicitly.
D. Numerical solution
CFD simulations were run in parallel on the computational cluster Charon [Metacentrum NGI (National Grid Infrastructure)—Technical University of Liberec, using 20 Intel Xeon Silver 4114 cores; Santa Clara, CA] and SGI (silicon graphics based on Intel processors) Altix UV (ultraviolet) 100 supercomputer Fox (Computing center of the Czech Technical University in Prague, using 20 Intel Xeon Nehalem cores).
To have sufficient resolution in the spectrum of the aeroacoustic signal, a sufficiently long simulation time of 20 periods of vocal fold vibration is needed ( $ t = 0.2 \u2009 \u2009 s$). For these settings, one CFD simulation required 27–37 days (about 15 000 corehours of computational time).
E. CFD results
In this part, CFD simulations using different turbulence modeling approaches are presented; see Table II. The CFD model validation performed on the periodic hill and backwardfacing step is reported in Lasota (2022).
Case .  Type .  SGS model .  Cluster .  Walltime . 

LAM  Laminar  —  Charon  27 days 13 h 
OE  LES  OE  Charon  34 days 5 h 
WALE  LES  WALE  Charon  37 days 13 h 
AMD  LES  AMD  Fox  34 days 18 h 
Case .  Type .  SGS model .  Cluster .  Walltime . 

LAM  Laminar  —  Charon  27 days 13 h 
OE  LES  OE  Charon  34 days 5 h 
WALE  LES  WALE  Charon  37 days 13 h 
AMD  LES  AMD  Fox  34 days 18 h 
1. Laryngeal flow rate
Figure 2 shows the glottal opening and flow rates during the last four simulated cycles of vocal fold oscillation. The time, t_{N}, corresponds to the instant when the inferior margins of the vocal folds approach most and reduce the glottal opening to $ 5.58 \u2009 \u2009 mm 2$ ( $ g = 0.465 \u2009 \u2009 mm$). Time instant, t_{C}, is the maximum approach of the superior vocal fold margins, where the glottal opening drops to $ 4.98 \u2009 \u2009 mm 2$ ( $ g = 0.415 \u2009 \u2009 mm$). The third time instant, t_{O}, corresponds to the maximum glottal opening of $ 17.51 \u2009 \u2009 mm 2$ ( $ g = 1.459 \u2009 \u2009 mm$).
The subgridscale models affected the flow rates Q [l/s] (see Fig. 2): the predicted peak flow rate in the laminar case is higher than that in the OE, WALE, and AMD subgridscale models by 16.76%, 5.26%, and 9.3%, respectively. This is caused by the different values of the turbulent viscosity, ν_{t}, which adds to the molecular viscosity and limits the flow rate through the glottal constriction. The laminar model does not capture the influence of smallscale turbulence, which corresponds to $ \nu t = 0$. The WALE and OE compute with nonzero turbulent viscosity, with the latter significantly higher due to the already mentioned deficiency of the OE, which overestimates the turbulent viscosity near the vocal fold surfaces. The minimum and maximum flow rates are $ 0.122 \u2009 \u2009 l / s$ by using OE and 0.358–0.434 l/s by using the WALE and LAM models, respectively. The peak flow rate predicted by the AMD model occurs slightly sooner than in other simulations.
2. Vorticity field
Vorticity ( $ \omega = \u2207 \xd7 u \xaf$) is commonly used for characterizing turbulent structures in cases with no entrainment rotation. The vorticity fields reveal the shear layers, where vortices are shed as a consequence of KelvinHelmholtz instability. The vortices may undergo successive instabilities, leading to a direct kineticenergy cascade toward the small scales.
Figure 3 shows vorticity fields presented in the midcoronal plane (xy). The supraglottal jet deflects stochastically toward either of the ventricular folds. This behavior is not a consequence of the subgridscale model, it is caused by the bistability of the flow in this symmetric geometry (Erath and Plesniak, 2010; Lodermeyer , 2015). Detailed analysis of the vorticity within the glottal region shows that the average value of vorticity in the glottal region is similar for all of the subgridscale models.
Figure 4 shows a complementary view on the magnitude of the vorticity vector, $  \omega $, in the midsagittal plane (xz). The simulation with the AMD model predicts low vorticity near the glottis. The absence of vorticity may imitate the situation in the realistic larynx where the jet is frequently stopped and renewed and, thus, the turbulent eddies are forced to be dissipated.
3. Turbulent viscosity field
The effect of the unresolved turbulent subgrid scales on the resolved scales is carried by the turbulent viscosity, ν_{t}, represented by Eq. (8) for $ \nu t O$, Eq. (9) for $ \nu t W$ and Eq. (17) for $ \nu t A$.
Figure 5 shows that the turbulent viscosity predicted by the simulation with the OE model is very high in regions of pure shear, especially within the glottis. This is probably the reason why the simulation with the OE model usually predicted the lowest intraglottal velocity. In contrast to this, WALE and AMD subgridscale models predicted considerably lower turbulent viscosity in the shear layers at $ t C$. The fields computed by the AMD model seem to be similar to fields computed by WALE with spots of gently higher turbulent viscosity at $ t C$. The other situation occurs at $ t O$ when the turbulent viscosity predicted by the AMD model is around 2× higher than that predicted by OE and 5× higher than that predicted by WALE.
Figure 6 shows turbulent viscosity fields in the midsagittal plane. The simulation with OE predicted 2× higher turbulent viscosity located at the vicinity of the inferior margin of the vocal folds than that at others. The narrow barrier of turbulent viscosity at t_{N} in the case with AMD reduced the flow rate just by 0.8% (1 ml/s) compared to the case with WALE.
III. CAA MODEL
A. Mathematical model
B. Geometry, mesh, and boundary conditions
Figure 7 shows the geometry used for aeroacoustic simulations of vowel /ɑ/, where they are (from left to right) the perfectly matched layer (PML) with a thickness of 0.3 cm at the inlet, larynx with vocal folds, and false vocal folds (2 cm), vocal tract (vowels /ɑ, o/, 17.46 cm; /u/, 18.25 cm; /i, æ/, 16.67 cm) and the radiation zone (RZ) protected by 2–3 PMLs with a thickness of 0.5 cm. PML is a technique published first by Berenger (1994); the original method was modified by Kaltenbacher (2013) by adding damping layers to guarantee that no wave reflections occur at boundaries.
The geometry of the vocal tract was modeled from frustums (0.397 cm long) that were concatenated one after another. The shape of the frustums was defined according to the vocal tract area function measured by MRI (Story , 1996). About 33 000 hexahedral firstorder finite elements were used for each vowel.
The vocal tract was conformally attached to the larynx. The connection is formed by two layers of hexahedral cells with minor influence on wave propagation. The right patch of the vocal tract was attached to the RZ.
The element length, $ \Delta l a$, of the coarse acoustic mesh and time step, $ \Delta t a$, for the aeroacoustic simulation are given by estimations, $ \Delta l a \u2264 c 0 / ( 20 f max )$ and $ \Delta t a \u2264 1 / ( 20 f max )$, assuming that 20 linear finite elements per 1 acoustic wavelength are sufficient (Hüppe, 2012). In this case, the spatial discretization is limited by $ 3.43 \u2009 \u2009 mm$ and time step is limited by $ 1 \xd7 10 \u2212 5 \u2009 \u2009 s$ to resolve properly acoustic frequencies up to $ f max = 5 \u2009 \u2009 kHz$.
The partial differential equation (19) for the acoustic potential, ψ^{a}, which is solved numerically in the acoustic domain, is equipped with zero initial conditions and boundary condition $ \u2207 \psi a \u22c5 n = 0 ,$ where n is the outward unit normal. This boundary condition can be interpreted as a perfect reflection of a sound wave from a barrier; the condition is also called “sound hard” (Schoder, 2018). In following CAA simulations, the sound hard condition is applied at all of the solid boundaries except the inflow and outflow, where PML is used.
C. Numerical solution
The numerical solution of the aeroacoustic problem proceeds in the following steps (see Fig. 8):

The unsteady flow field in the larynx is computed in OpenFOAM on a fine CFD mesh over 20 periods of vocal fold oscillation;

the sound sources in the larynx are computed by openCFS and conservatively interpolated onto the coarse CAA mesh, whereas the interpolation back to the fine original CFD mesh is used to visualize the results in high resolution (Figs. 9–12), and

in the last step, the aeroacoustic sources are conservatively interpolated onto the coarse CAA mesh and the wave propagation is simulated by openCFS.
The computational time needed for one CAA simulation is much lower than that needed for one CFD simulation, about 5 h on a single central processing unit (CPU) core compared to 27–37 days on 20 cores. The conservative interpolation of sound sources (Schoder , 2019; Schoder , 2020; Schoder , 2021b) from the CFD mesh to the CAA mesh was performed by the cfsdat tool, which is part of the openCFS library.
D. CAA results
1. Sound sources (time domain)
The distribution of the aeroacoustic sources in the computational domain covering the larynx varies throughout the vocal fold oscillation period. The jet is surrounded by spots of strong positive and negative acoustic sources related to turbulent eddies created from shear layers of the jet; see also Schoder (2021a). Figure 9 shows the aeroacoustic sound source distribution corresponding to the closedconvergent position of vocal folds, which are visualized on the CFD mesh with the closeddivergent position of vocal folds. The aeroacoustic sources computed on the moving geometry with oscillating vocal folds are mapped to a fixed geometry. This is performed because the current version of the acoustic solver cannot handle moving meshes.
Figure 10 shows sound source distribution when vocal folds are fully open, and the results are again mapped to the closeddivergent domain. Strong sound sources within the glottis and 1 cm from the glottis are observed. The 3D view shows uniform distribution of sound sources in the glottis. This can be caused by the presence of high turbulent viscosity.
2. Sound sources (frequency domain)
The conversion from the time to frequency domain was made by the field fast Fourier transform (field FFT), which brings a better insight into the spatial distribution of the sound sources at distinct frequencies. Figure 11 shows that the results obtained from the LES without subgridscale model (LAM) show slightly more intensive sound sources within the glottis at $ f o = 100 \u2009 \u2009 Hz$ than that at other cases. This correlates with the flow rate amplitude, which is also higher in the LAM case (see Fig. 2). On the other side, the intensity of sound sources at $ f = 1000 \u2009 \u2009 Hz$ is opposite, and AMD is 2.5–4× higher compared to the others. At these higher frequencies, dominant sound sources do not occur within the glottis but in the places where the fast glottal jet interacts with the ventricular folds and the slowly moving recirculating air in the supraglottal volume.
Figure 12 shows sound sources at a random nonharmonic frequency, $ f = 1235 \u2009 \u2009 Hz$, where the sound sources are distributed further downstream of the glottis. The integrity of sound sources can be observed along with several local sound spots at the superior edge of the vocal folds.
3. Wave propagation (time domain)
In the following, the acoustic pressure fields, $ p a ( x , t )$, will be analyzed as a solution of Eq. (19). The acoustic pressure has been tracked at distances of 1 and 16 cm from the lips to create audio recordings; see Fig. 13. Two PML layers around the RZ damped the acoustic waves on walls properly as no reflections are observed.
Table III compares individual cases in terms of $ p rms a = ( 1 / T ) \u222b 0 T ( p a ) 2 d t$ and $ SPL = 20 \u2009 \u2009 log 10 ( p rms a / p ref a )$, where $ p ref a = 20 \u2009 \u2009 \mu Pa$ is the hearing threshold. The total sound pressure level (SPL) value of the radiated sound can be used as a measure of the acoustic energy transferred from the larynx through the vocal tract. Minor differences in SPL are observed between the back close vowel, /u/, and front open vowel, /ɑ/. The simulations of front open/mid vowel, /æ/, transferred most energy of all of the simulated vowels. The simulations based on the AMD model predicted the highest or similar SPLs of all of the vowels, except the frontclose vowel, /i/. This may lead to the conclusion that WALE model can transfer more energy in simulations with the close vowels.
Case .  .  $ p rms a$ .  SPL .  .  $ p rms a$ .  SPL .  .  $ p rms a$ .  SPL . 

LAM  ɑ  0.0031  43.69  æ  0.0052  48.37  i  0.0026  42.12 
OE  ɑ  0.0016  37.89  æ  0.0021  40.45  i  0.0011  35.16 
WALE  ɑ  0.0028  42.80  æ  0.0049  47.84  i  0.0022  40.68 
AMD  ɑ  0.0032  44.13  æ  0.0051  48.12  i  0.0016  38.01 
LAM  o  0.0019  39.48  u  0.0019  39.69  
OE  o  0.0016  38.31  u  0.0013  36.58  
WALE  o  0.0018  38.89  u  0.0017  38.43  
AMD  o  0.0026  42.23  u  0.0018  39.22 
Case .  .  $ p rms a$ .  SPL .  .  $ p rms a$ .  SPL .  .  $ p rms a$ .  SPL . 

LAM  ɑ  0.0031  43.69  æ  0.0052  48.37  i  0.0026  42.12 
OE  ɑ  0.0016  37.89  æ  0.0021  40.45  i  0.0011  35.16 
WALE  ɑ  0.0028  42.80  æ  0.0049  47.84  i  0.0022  40.68 
AMD  ɑ  0.0032  44.13  æ  0.0051  48.12  i  0.0016  38.01 
LAM  o  0.0019  39.48  u  0.0019  39.69  
OE  o  0.0016  38.31  u  0.0013  36.58  
WALE  o  0.0018  38.89  u  0.0017  38.43  
AMD  o  0.0026  42.23  u  0.0018  39.22 
4. Wave propagation (frequency domain)
This section deals with frequency spectra of simulated signals. This kind of analysis can highlight fundamental, harmonic, and nonharmonic frequencies, which are accompanied by broadband noise. The FFT analyses were performed on the signal spanning over 20 periods of the vocal fold oscillation (1 period = 10 ms = 1000 samples) and, thus, the frequency resolution is $ \Delta f = 5 \u2009 \u2009 Hz$.
Influence of the end of the vocal tract (lips) was tested first, comparing two signals at 1 and 16 cm; see Fig. 14. At 16 cm, the SPLs on all of the frequency components are uniformly lower by 19 dB. Our model of phonation prescribing 350 Pa at inlet of the larynx can be considered to be quiet phonation while, in the other case, it is recommended to use a probe for tracking a signal at least 3 cm from the lips. The envelope overlaying black peaks corresponds to the linear predictive coding (LPC) curve (Pavlidi , 2013), which is widely used to provide a smooth spectral envelope of audio and voice recordings for the purpose of finding positions of formants.
In the following, the frequency spectra at the distance of 16 cm will be analyzed vowel by vowel, and the influence of the subgridscale models will be investigated. Note that the LES without subgridscale model (LAM) should be considered as the least accurate case, which disregards the effect of smallscale turbulence. However, this approach is often still used in the modeling of laryngeal flow, and the comparison against more accurate LESs is interesting.
a. Vowel /ɑ/.
The spectrum is presented in Fig. 15, where SPLs at fundamental frequency, $ f o = 100 \u2009 \u2009 Hz$, and higher harmonics, $ f 1 = 200 \u2009 \u2009 Hz , \u2009 \u2009 f 2 = 300 \u2009 \u2009 Hz$, and so forth are well visible. Peaks at f_{o} are lower than those at f_{1} and f_{2}. If the signal is processed immediately behind the vocal folds, the SPL at f_{o} is definitely the highest, whereas the signal tracked in RZ includes the effect of the vocal tract for vowel /ɑ/. The peaks (formants) are key parameters, which define a vowel. The close distance between the first two formants, $ F 1$– $ F 2$, is typical for vowels /ɑ, o, u/. It should be noted that the first two formants are important for vowel distinctness. The AMD model resulted in the higher peak at the first formant around 900 Hz by 8 dB and, thus, contributes to the better vowel clarity compared to other models.
b. Vowel /æ/.
Figure 16 shows the frequency spectrum for this front and open vowel, /æ/, and corresponds to the highest total SPL from all of the simulated vowels, up to 48 dB.
c. Vowel /i/.
The frequency spectrum for this front and close vowel is plotted in Fig. 17. The spectrum shows that the simulations with OE predicted the lowest total SPL (around 35 dB), which is related to the wellknown overpredicting of turbulent viscosity by the OE model and can be observed in all of the spectra.
d. Vowel /o/.
Figure 18 shows the frequency spectrum for this back and midclose vowel for which it is typical that $ F 2$– $ F 3$ are far apart. Simulations based on WALE predicted the most visible third formant of all of the vowels. To distinguish vowels, human ears are most sensitive to the two lowest formants. For that reason, the AMD model can be more valuable for voice research.
e. Vowel /u/.
The frequency spectrum for this back and close vowel is in Fig. 19. The spectrum shows that the second formant of the LAM case at 1100 Hz is higher by 8 dB compared to the second formants of the WALE and AMD. However, the audio recording from the simulation with LAM is subjectively not much clearer compared to the others.
The formant locations, $ F 1$– $ F 2$, computed from local maxima of the LPC curves in Figs. 15–19, are compared with measurements of natural speech by Ireland (2015), Peterson and Barney (1952), and Story (1996) in Fig. 20. All of the simulated vowels lie inside of the measured ranges, which confirms the usability of the acoustic grids based on the circular cross sections.
Readers can assess the clarity of simulated vowels by downloading the audio recordings.^{2} Please note that for comfortable, listening recordings at the distance of 1 cm from the lips were used.
IV. CONCLUSION
In this article, models of phonation based on LESs with subgridscale models contributing to the solution of turbulent flow were presented. It was shown that the OE model overpredicts the turbulent viscosity in the boundary layer adjacent to the vocal folds and shear layers of the glottal jet (Fig. 5). The difference in glottal flow rate among the simulations is clearly induced by the subgridscale model, which adds the turbulent viscosity to the molecular viscosity of air and hinders the airflow in the glottis (Fig. 2). The WALE model produced zero eddy viscosity in cases of pure shear flow (Fig. 6) and, hence, the flow simulation with WALE predicted by 5% higher the maximum transglottal flow rate than did AMD. Despite this fact, the phonation simulation based on the AMD model transferred more or equal energy in terms of total SPL than WALE for all of the vowels except the frontclose vowel, /i/ (Table III). The WALE model, which is known to handle turbulent viscosity at the nearwall and highshear regions more precisely than the OE model (Figs. 5 and 6), resulted in higher total SPLs than OE in all of the cases (Table III). The OE model gives acceptable results, in general, but peaks of frequency formants are hardly visible and weaker compared to the WALE or AMD models (Figs. 16–19). The WALE model resulted in higher third formants in highfrequency bandwidth, most of all, for the subgridscale models (Figs. 16–19). However, the third formant is not crucial for vowel characterization. On the other side, the AMD model resulted in slightly higher SPLs at harmonic and formant frequencies up to the second formant for all of the vowels (Figs. 16–19). First formants are higher in cases /ɑ, æ/ at least by 8 dB (Figs. 15 and 16). These positive findings can be attributed to beneficial features of the AMD model: consistency with the exact subgridscale stress tensor, τ_{ij}, no requirements on the approximation of the LES filter width, and usability on anisotropic meshes.
Based on recordings of simulated vowels, it can be concluded that the model of phonation employing the AMD subgridscale model is much closer to natural speech than largeeddy simulations with conventional subgridscale models.
ACKNOWLEDGMENTS
The research was supported by the Student Grant Scheme at the Technical University of Liberec through Project No. SGS20223016. The Graz group acknowledges support from the ÖAW research grant “Understanding voice disorders,” received from Dr. Anton OelzeltNewin'sche Stiftung.
See https://gitlab.com/mlasota/myFoam (Last viewed January 21, 2023).
See supplementary material at https://www.scitation.org/doi/suppl/10.1121/10.0017202 to download the (.wav) recordings of simulated vowels sorted by different subgridscale models.