We present two major improvements over the Compressible High-ORder Unstructured Spectral difference (CHORUS) code published in Wang *et al.*, “A compressible high-order unstructured spectral difference code for stratified convection in rotating spherical shells,” J. Comput. Phys. **290**, 90–111 (2015). The new code is named CHORUS++ in this paper. Subsequently, we perform a series of efficient simulations for rotationally constrained convection (RCC) in spherical shells. The first improvement is the integration of the high-order spectral difference method with a boundary-conforming transfinite mapping on cubed-sphere grids, which ensures exact geometric representations of spherical surfaces on arbitrarily sparse grids. The second improvement is the adoption of higher-order (sixth-order) elements in CHORUS++, versus the third-order elements of the original CHORUS code. CHORUS++ thus enables high-fidelity RCC simulations using sixth-order elements on very coarse grids. To test the accuracy and efficiency of elements of different orders, CHORUS++ is applied to a laminar solar benchmark, which is characterized by columnar, banana-shaped convective cells. When the total number of solution degrees of freedom is fixed, the computational cost per time step remains essentially unchanged across element orders. Nevertheless, using higher-order elements in CHORUS++ resolves the components of the radial energy flux much better than using third-order elements. To obtain converged predictions, using sixth-order elements is 8.7 times faster than using third-order elements. This significant speedup allows global-scale fully compressible RCC simulations to reach equilibration of the energy fluxes on a small cluster of just 40 cores; in contrast, the CHORUS simulations of Wang *et al.* were performed on supercomputers using approximately 10 000 cores. Using sixth-order elements in CHORUS++, we further carry out global-scale solar convection simulations with decreased rotational velocities.
Interconnected networks of downflow lanes emerge and surround broader and weaker regions of upflow fields. A strong inward kinetic energy flux compensated by an enhanced outward enthalpy flux appears. These observations are all consistent with those published in the literature. Furthermore, CHORUS++ can be extended to magnetohydrodynamic simulations with potential applications to the hydromagnetic dynamo processes in the interiors of stars and planets.

## I. INTRODUCTION

Turbulent thermal convection is ubiquitous in the interiors of rotating stars and planets, such as the Sun and Jupiter. With advances in computing power, numerical simulations of rotationally constrained convection (RCC) are used to study many astrophysical hydrodynamic phenomena, such as differential rotation, solar granulation, and meridional circulation. However, global-scale simulations of the convection zone with realistic conditions are challenging. Given that the fluid Reynolds number (Re) and Rayleigh number (Ra) in the Sun are on the order of 10^{12} and 10^{20},^{1,2} it is hard to resolve the broad range of length scales in turbulent solar convection, but substantial progress has been made in simulations with reduced flow parameters. Early laminar convection simulations^{3–5} reproduced many salient features of solar convection, e.g., the large differential rotation, which takes the form of equatorial acceleration. More recent studies,^{6–9} based on turbulent RCC simulations at higher Reynolds numbers, reproduced the structures of convective cells on the Sun and revealed how differential rotation and meridional circulation are established. Further research found that convective patterns in laminar and turbulent states are significantly different.^{6} The convective pattern in the laminar case is dominated by banana cells, which is consistent with observations in the early simulations. By contrast, the turbulent case exhibits broad and weak upflow regions surrounded by narrow and strong downflow lanes. This pattern resembles solar granulation but on a much larger scale.
A subsequent study based on turbulent RCC simulations with $ Re = 400$ revealed that coherent downflow structures associated with giant cells play a significant role in maintaining differential rotation.^{9} Purely hydrodynamic simulations of global-scale RCC were also extended to magnetohydrodynamic (MHD) simulations, and the interplay between convection, rotation, and magnetic fields was investigated.^{8}

Although many efficient codes for RCC simulations have been developed, they may still have some limitations. The first limitation is that most of them are based on pseudo-compressible approximations. For example, the anelastic spherical harmonic (ASH) code,^{10} one of the most successful codes for RCC simulations in spherical shells, uses the anelastic approximation.^{11} This approximation is valid when characteristic velocities remain substantially subsonic and the stratification does not become appreciably superadiabatic. It does not hold near the photosphere of the Sun, where the Mach number approaches unity. The second limitation is that many of them are based on a specific gridding system and cannot be implemented on arbitrary unstructured grids. For example, the ASH code is based on spherical harmonic basis functions, which apply only to spherical geometry. The PENCIL code,^{12,13} a popular code for compressible astrophysical fluid dynamics, is designed for structured grids. Although it was successfully applied to a spherical star immersed inside a Cartesian cube,^{14} its early applications^{15,16} concentrated on the spherical wedge geometry, which omits the polar regions. The third limitation is that parallel slowdown may appear as a result of a communication bottleneck. The reason is that most codes use anelastic or incompressible approximations and, therefore, need computationally expensive global communication to enforce the divergence-free constraint on the velocity field.

To avoid the aforementioned three limitations, a Compressible High-ORder Unstructured Spectral difference (CHORUS) code was developed,^{17} which uses fully compressible models, unstructured grids, and compact computational stencils favored by massively parallel computing. To provide high fidelity, the high-order accurate spectral difference (SD) method^{18–22} is used. This is a compact high-order method that uses element-wise polynomials for reconstruction. Similar methods, such as the spectral element method (SEM),^{23} have been applied to direct numerical simulations of rotating convection in a cube.^{24} Unlike the SEM and the discontinuous Galerkin (DG) method, which solve the integral form of the Navier–Stokes (NS) equations, the SD method solves the differential form of the NS equations. It is especially attractive since explicit integral computations are not needed, and all interpolation and differentiation are performed in efficient one-dimensional (1D) procedures.

Compared with the original CHORUS code, which uses isoparametric mappings to perform coordinate transformations, the new code, CHORUS++, employs an exact transfinite mapping and implements the SD method on a cubed-sphere grid. With the transfinite mapping, geometric errors in capturing the curvature of spherical surfaces are exactly zero on arbitrarily sparse grids. A dense distribution of computational elements near curved boundaries is no longer needed. We can use element-wise high-order polynomials to reconstruct solutions and fluxes on sparse grids to achieve high accuracy and save computational costs. The superiority of higher-order reconstruction over lower-order counterparts, in terms of both accuracy and computing cost, is shown for RCC simulations in Sec. VI. Moreover, the topology of the cubed-sphere grid is optimal since the grid resolution on each horizontal surface is the same in polar and equatorial regions. By contrast, grids used in ASH and PENCIL distribute grid points uniformly in the latitudinal, longitudinal, and radial dimensions. Thus, grid points are densely concentrated in the polar regions and sparsely scattered in the equatorial regions. This distribution is not optimal since the polar regions, compared to the equatorial regions, are less susceptible to the Coriolis force and may exhibit fewer intricate small-scale flow structures.
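The polar clustering of a uniform latitude-longitude grid can be quantified by the solid angle of a grid cell, which shrinks with the cosine of latitude. A minimal sketch (the 2° cell size is an arbitrary choice for illustration):

```python
import math

def latlon_cell_solid_angle(lat_deg, dlat_deg=2.0, dlon_deg=2.0):
    """Solid angle (steradians) of a lat-lon cell centered at latitude lat_deg."""
    half = math.radians(dlat_deg) / 2.0
    lat = math.radians(lat_deg)
    # Integrate cos(latitude) across the cell in latitude; longitude is uniform.
    return math.radians(dlon_deg) * (math.sin(lat + half) - math.sin(lat - half))

# A cell at the equator covers roughly 29x the area of a same-sized cell at
# 88 deg latitude, so a uniform lat-lon grid heavily over-resolves the poles.
area_ratio = latlon_cell_solid_angle(0.0) / latlon_cell_solid_angle(88.0)
```

The cubed-sphere grid avoids this imbalance by construction, since its six faces tile the sphere with nearly uniform cells.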

CHORUS++ suffers from a severe Courant–Friedrichs–Lewy (CFL) constraint when the compressible flows are in regimes of very low Mach number, since it is based on fully compressible models and explicit time stepping. To circumvent this issue, the stellar luminosity is artificially increased to achieve higher Mach numbers,^{17,25,26} and the rotational velocity is increased correspondingly to keep a solar-like Rossby number.
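The cost of explicit time stepping at low Mach number follows from the acoustic CFL limit, $\Delta t \le \mathrm{CFL}\,\Delta x / (|u| + c)$: the sound speed $c$, not the flow speed, sets the step size. A rough estimate (the Mach number 0.05 is the maximum value quoted later for the boosted benchmark):

```python
def acoustic_penalty(mach):
    """Ratio of the advective step dx/|u| to the acoustic step dx/(|u| + c),
    i.e., how many times smaller the explicit step must be because sound
    waves must also be resolved: (|u| + c)/|u| = 1 + 1/M."""
    return 1.0 + 1.0 / mach

penalty = acoustic_penalty(0.05)  # 21.0: steps ~21x smaller than advection alone needs
```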

This paper is organized as follows. In Sec. II, RCC equations and simulation setups are introduced. Procedures for cubed-sphere boundary-conforming grid generation are given in Sec. III. The implementation of the SD method based on the transfinite mapping is described in Sec. IV. The parallelization of CHORUS++ is discussed in Sec. V. Numerical results on the laminar solar benchmark and its variants with different rotational velocities are presented in Secs. VI and VII.

## II. SIMULATION SETUP

To model global-scale RCC, governing equations for the hydrodynamics of an ideal gas are solved in a rotating spherical shell. The radius is equal to $r = r_b$ at the inner spherical surface and $r = r_t$ at the outer spherical surface. These boundaries are impenetrable. A constant heat flux is continuously fed into the spherical shell from the bottom boundary to induce thermal instability and ensuing convection.

### A. Governing equations

The spherical shell rotates about the *z* axis with rotational velocity $\Omega_0$. The governing equations for the RCC in a rotating reference frame are expressed in the form of a hyperbolic conservation law,

$$\frac{\partial Q}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} + \frac{\partial H}{\partial z} = M, \qquad Q = (\rho,\; \rho u,\; \rho v,\; \rho w,\; E)^{T},$$

where $\rho$ is the density, $U = (u, v, w)$ is the velocity vector, and $E$ is the total energy. Since an ideal gas is assumed, we have $p = \rho R T$, where $R$ is the gas constant. $\bar{F} = (F, G, H)$ is the total flux vector; $F$, $G$, and $H$ are the three components of the flux in the $x$, $y$, and $z$ directions; and $M$ is the source term. The total flux $\bar{F} = \bar{F}_{\mathrm{inv}} - \bar{F}_{\mathrm{vis}}$ consists of the inviscid flux minus the viscous flux. In the $x$ direction, for example, the inviscid and viscous flux vectors are

$$F_{\mathrm{inv}} = \begin{pmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ \rho u w \\ u(E + p) \end{pmatrix}, \qquad F_{\mathrm{vis}} = \begin{pmatrix} 0 \\ \tau_{xx} \\ \tau_{xy} \\ \tau_{xz} \\ u \tau_{xx} + v \tau_{xy} + w \tau_{xz} - q_x \end{pmatrix},$$

with analogous expressions for $G$ and $H$. Here, $\rho$, $p$, and $E$ are the density, hydrodynamic pressure, and total energy, respectively. The total energy $E$ is defined as

$$E = \frac{p}{\gamma - 1} + \frac{1}{2} \rho \| U \|^2,$$

where $\gamma$ is the specific heat ratio of the ideal gas and $\| \cdot \|$ is the Euclidean vector norm. $\bar{\tau}$ is the shear stress tensor,

$$\bar{\tau} = \mu \left[ \nabla U + (\nabla U)^{T} \right] + \lambda (\nabla \cdot U) I,$$

where $\mu = \rho \nu$, $\nu$ is the kinematic viscosity, and $\lambda = -\tfrac{2}{3}\mu$ based on Stokes' hypothesis. The heat flux vector is $q = -\kappa \rho T \nabla S - \kappa_r \rho C_p \nabla T$, where $T$ is the temperature, $\kappa$ is the entropy diffusion coefficient, $\kappa_r$ is the radiative diffusivity, and $S$ is the specific entropy, which is defined (up to an additive constant) as

$$S = \frac{C_p}{\gamma} \ln p - C_p \ln \rho,$$

where $C_p$ is the specific heat at constant pressure. Here, the entropy diffusion term parameterizes the energy flux due to unresolved, subgrid-scale convective motions, which tend to mix entropy.^{6,17,27,28} The source term $M$ is the combination of the Coriolis force term and the gravitational force term,

$$M = \begin{pmatrix} 0 \\ -2\rho\, \Omega_0 \times U + \rho g \\ \rho U \cdot g \end{pmatrix}, \qquad g = -\frac{G M_{\odot}}{r^2}\, \hat{r},$$

where $G$ is the gravitational constant, $M_{\odot}$ is the total mass of gases in the solar convection zone, and $\hat{r}$ is the radial unit vector. $\rho U \cdot g$ is the work done by buoyancy.

### B. Boundary conditions

Both spherical boundaries are impenetrable, i.e., $U \cdot n = 0$, where **n** is the outward-pointing unit normal vector. A constant heat flux is imposed at the bottom boundary, and the temperature at the top boundary is fixed.

### C. Initial conditions

This section describes how the flow field is initialized to trigger thermal instability and initiate convection.

The polytropic solution^{27} gives the initial profiles for pressure *p*, density *ρ*, and temperature *T*. The density $\rho_b$ is an input to the code; it refers to the gas density near the base of the solar convection zone and is set as $\rho_b = 0.21$ g cm^{−3}. The subscripts *b* and *t* denote the bottom and top boundaries, respectively, and $N_\rho = \ln(\rho_b/\rho_t)$ is the number of density scale heights. However, the profiles in Eq. (9) do not satisfy thermal equilibrium. To achieve an approximate thermal equilibrium, an almost-flux-balance approach^{17} is utilized: if the simulation starts from a thermally relaxed state, the total flux through horizontal surfaces at any radial position should be constant.^{8}

From the polytropic profiles of *p*, *ρ*, and *T* in Eq. (9), a target entropy gradient is computed, where $S_b$ is the specific entropy at the bottom boundary. After this step, the polytropic solutions are no longer needed in the following computations. Taking the spatial derivative of the specific entropy in Eq. (4) along the radial dimension and using Eq. (8), we obtain an equation for the density in which $\rho_b$, $S_b$, and the entropy gradient Γ are already computed. Numerical integration of Eq. (15) gives the initial density profile, which differs slightly from the polytropic solution for the density in Eq. (9). Based on the initial density and entropy profiles, the initial profiles of the other thermodynamic variables, such as pressure *p* and temperature *T*, can be computed.

The simulation starts from a thermally relaxed state. Thus, convection will not occur immediately. However, since a constant heat flux flows into the convection zone from the bottom boundary and the temperature at the top boundary is fixed, the temperature gradient will increase and gradually become supercritical. Then, thermal instability can be triggered and convection starts. In fact, how thermal convection in a rotating spherical shell transitions into a symmetry-breaking state depends on a wide range of parameters. However, this is not the focus of the current paper. To investigate the dependence of the thermal instability on non-dimensional parameters, Sánchez and Juan^{29} gave an efficient estimation based on dynamical-system tools, which can serve as a parameter-setting guide before direct numerical simulations.

## III. GRID GENERATION

The physical domain for the simulations is a spherical shell. It is partitioned into non-overlapping unstructured hexahedral elements. Traditional grid generation uses isoparametric elements, which are constructed from polynomial-based shape functions defined on nodal points. In contrast, the shapes of the hexahedral elements in the current study are described by analytical functions. The current study ensures that the elements seamlessly fill the spherical shell without any singularity.

The spherical shell is discretized along the radial dimension *r* ($r_b \le r \le r_t$) with equidistant spacings; the number of spacings is $N_r$. The cubed-sphere gridding technique^{30} is used to generate the surface meshes. It is based on a decomposition of the sphere into six identical regions, which are obtained by projecting the sides of a circumscribed cube onto a spherical surface, as shown in Fig. 1. Using this projection, uniformly tiling a cube face with $N_a \times N_a$ square elements is equivalent to constructing angularly equidistant grids on one of the six identical regions of the sphere. For example, for region 1, angular variables *α* and *β* are chosen as the horizontal coordinates. Here, $N_a = 20$ and $N_r = 8$. The total number of hexahedral elements in the grid is $6 N_a^2 N_r$.
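The construction above can be sketched in a few lines. The equiangular (gnomonic) projection used here is an assumed convention, but any projection of this family places every surface vertex exactly on the sphere:

```python
import math

def face_point(alpha, beta, r=1.0):
    """Map equiangular coordinates (alpha, beta) in [-pi/4, pi/4]^2 on the +z
    cube face to a point on the sphere of radius r (gnomonic projection)."""
    x, y, z = math.tan(alpha), math.tan(beta), 1.0
    s = r / math.sqrt(x * x + y * y + z * z)
    return (x * s, y * s, z * s)

Na, Nr = 20, 8               # angular tiling and radial spacings, as in the text
n_elements = 6 * Na**2 * Nr  # total hexahedral elements: 6 * Na^2 * Nr

# Angularly equidistant grid lines: every surface vertex lies on the sphere
# to machine precision, for any Na (no refinement is needed for geometry).
max_dev = 0.0
for i in range(Na + 1):
    for j in range(Na + 1):
        a = -math.pi / 4 + i * (math.pi / 2) / Na
        b = -math.pi / 4 + j * (math.pi / 2) / Na
        x, y, z = face_point(a, b)
        max_dev = max(max_dev, abs(math.sqrt(x*x + y*y + z*z) - 1.0))
```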

## IV. NUMERICAL METHODS

### A. Transfinite mapping

The transfinite mapping transforms an arbitrary curved hexahedral element in the physical space (*x*, *y*, *z*) into a standard element.^{31} Each curved edge lying on a circular arc is parameterized by the local coordinate *ξ* and the arc angle ϑ. For 2D mappings, all elements have six faces, which can be divided into two types, as illustrated in Fig. 2. There are two faces of type 1. They are located on spherical surfaces and are generated by the cubed-sphere surface meshing. Therefore, the 2D mapping $P(\xi, \eta, 0)$, which maps a standard square face onto a face enclosed by the semicircular curves $\alpha = \alpha_i$ and $\alpha_{i+1}$ and $\beta = \beta_j$ and $\beta_{j+1}$ with radius *r*, can be constructed accordingly.

When mapping a point $(\xi, \eta)$ in the standard square face to a point (*x*, *y*, *z*) in 3D space on one-sixth of a spherical surface with radius *r* (one face among the six faces in Fig. 1), geometric errors are measured by $\Delta r = \sqrt{x^2 + y^2 + z^2} - r$. Figure 5 shows that the geometric errors introduced by the isoparametric mappings are exactly zero at the nodal points where the shape functions are defined, such as the corner points and mid-edge points. However, the errors are extremely large at the center of the face. The maximum non-dimensional geometric error $|\Delta r|/r$ is $42\%$, $16\%$, and less than $10^{-10}\%$ for the linear isoparametric mapping, the quadratic isoparametric mapping, and the transfinite mapping, respectively. Such large geometric errors introduced by the isoparametric mappings cannot be removed without massive meshing. By contrast, the geometric error introduced by the transfinite mapping is always near machine precision and is independent of the mesh resolution.
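The 42% figure for the linear mapping can be reproduced with a short calculation: bilinearly interpolating the four corners of one cube face places the face center at radius $1/\sqrt{3} \approx 0.577$, while an evaluation that uses the analytic sphere description stays on the sphere to round-off. A sketch (the equiangular face parameterization is an assumed convention):

```python
import math

r = 1.0
# Corners of the +z cubed-sphere face: projections of the cube corners.
corners = [(sx / math.sqrt(3), sy / math.sqrt(3), 1.0 / math.sqrt(3))
           for sx in (-1.0, 1.0) for sy in (-1.0, 1.0)]

# Linear isoparametric (bilinear) mapping at the face center (xi = eta = 1/2)
# reduces to the average of the four corner points.
cx, cy, cz = (sum(p[i] for p in corners) / 4.0 for i in range(3))
err_linear = abs(math.sqrt(cx**2 + cy**2 + cz**2) - r) / r   # ~0.42, i.e., 42%

# Transfinite-style evaluation: the analytic description keeps the mapped
# point on the sphere, so the error is at round-off level.
x, y, z = math.tan(0.0), math.tan(0.0), 1.0                  # face center
s = r / math.sqrt(x*x + y*y + z*z)
err_exact = abs(math.sqrt((x*s)**2 + (y*s)**2 + (z*s)**2) - r) / r
```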

The Jacobian matrix *J* for the mapping $P(\xi, \eta, \zeta)$ is

$$J = \frac{\partial (x, y, z)}{\partial (\xi, \eta, \zeta)},$$

and the governing equations are transformed into the standard element accordingly. Here, $q_x$, $q_y$, and $q_z$ are the three components of the heat flux vector **q** in the *x*, *y*, and *z* directions, and $\tau_{ij}$ are the components of the shear stress tensor in Eq. (3).

### B. Spectral difference method

To reconstruct solution polynomials of degree $(N-1)$ in each of the *ξ*, *η*, and *ζ* dimensions, *N* solution points (SPs) are required in each dimension. In each dimension, the SPs are chosen as the Chebyshev–Gauss points,

$$X_s = \frac{1}{2}\left[ 1 - \cos\!\left( \frac{2s - 1}{2N}\pi \right) \right], \quad s = 1, 2, \ldots, N.$$

The flux polynomials are of degree *N*. Hence, $(N+1)$ flux points (FPs) are needed in each dimension. They are denoted as $X_{s+1/2}, s = 0, 1, \ldots, N$ and chosen as $X_{1/2} = 0$ and $X_{N+1/2} = 1$ plus the $(N-1)$ roots of the Legendre polynomial of degree $(N-1)$. In each dimension, Lagrange interpolation is used to construct the solution and flux polynomials with nodal Lagrange basis functions at the SPs and FPs. The Rusanov solver^{32} (also known as the local Lax–Friedrichs solver) is employed to compute common inviscid fluxes at element interfaces to ensure flux conservation; it is applied in the computational domain along the interface normal direction. At the same time, to ensure that solutions are continuous across element interfaces, the common solution at element interfaces is computed as the average of the values interpolated from the left and right elements (also known as the BR1 scheme^{33}). Sun *et al.*^{34} proposed an efficient way to compute solution gradients in 3D unstructured grids.
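The 1D ingredients described above can be sketched in a toy example. It hard-codes the interior flux points for $N = 3$ (the roots $\pm 1/\sqrt{3}$ of the degree-2 Legendre polynomial), interpolates a solution from the SPs to the FPs, and differentiates the flux polynomial back at the SPs, recovering the derivative of a quadratic to round-off:

```python
import math

N = 3  # number of solution points per dimension (third-order elements)

# Solution points (SPs): Chebyshev-Gauss points mapped to [0, 1]
sps = [0.5 * (1.0 - math.cos((2*s - 1) * math.pi / (2*N))) for s in range(1, N + 1)]

# Flux points (FPs): the endpoints plus the N-1 interior Legendre roots,
# mapped from [-1, 1] to [0, 1]; for N = 3 the roots are +/- 1/sqrt(3).
fps = [0.0, 0.5 * (1 - 1/math.sqrt(3)), 0.5 * (1 + 1/math.sqrt(3)), 1.0]

def lag(nodes, k, x):
    """k-th nodal Lagrange basis polynomial through `nodes`, evaluated at x."""
    v = 1.0
    for j in range(len(nodes)):
        if j != k:
            v *= (x - nodes[j]) / (nodes[k] - nodes[j])
    return v

def dlag(nodes, k, x):
    """Derivative of the k-th Lagrange basis polynomial at x."""
    total = 0.0
    for i in range(len(nodes)):
        if i == k:
            continue
        term = 1.0 / (nodes[k] - nodes[i])
        for j in range(len(nodes)):
            if j != k and j != i:
                term *= (x - nodes[j]) / (nodes[k] - nodes[j])
        total += term
    return total

# Represent u(x) = x^2 at the SPs, interpolate it to the FPs ...
u_sp = [x * x for x in sps]
u_fp = [sum(u_sp[k] * lag(sps, k, xf) for k in range(N)) for xf in fps]
# ... and differentiate the degree-N flux polynomial back at the SPs.
du_sp = [sum(u_fp[m] * dlag(fps, m, xs) for m in range(N + 1)) for xs in sps]
err = max(abs(d - 2.0 * x) for d, x in zip(du_sp, sps))  # du/dx = 2x exactly
```

All operations are one-dimensional, which is what makes the SD method cheap per degree of freedom.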

### C. Time stepping scheme

An explicit Runge–Kutta scheme^{35} is used to advance the solution in time.

## V. PARALLEL COMPUTING

The CHORUS++ code is written in Fortran 90, and the message passing interface (MPI) is used for interprocessor communication. It uses the METIS package^{36} for domain decomposition. Before the parallel computations start, the root processor calls the mpmetis program, a serial program, to partition the mesh into parts. Then, each processor reads in its own part of the mesh. Due to the compactness of the SD method, most computations are performed locally; communication occurs only at element interfaces whose two neighboring elements reside on different processors. Compared with the load for local computations, the communication load is negligible since the number of FPs on interprocessor element interfaces is much smaller than the number of other FPs.

Figure 6 shows the strong scaling for CHORUS++ using third-, fourth-, and sixth-order elements. This test is carried out on a cluster whose nodes are equipped with two Intel Xeon Gold 6148 CPUs with 20 cores each at 2.40 GHz, for a total of 40 cores per node. Linear speedup is well achieved up to 40 processors for all orders. As the number of processors increases, CHORUS++ gradually deviates from linear speedup due to the increased share of interprocessor communication in the overall computational cost. When the number of processors becomes very large, CHORUS++ with sixth-order elements shows better speedup than CHORUS++ with lower-order elements. This is consistent with the fact that the computational cost in the interior of an element is proportional to *N*^{3}, while the communication cost on the element interfaces is proportional to *N*^{2}.
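This volume-to-surface argument can be illustrated with a rough per-element cost model; the exact counts in CHORUS++ differ, so the numbers below are only indicative:

```python
def comm_fraction(n, shared_faces=1):
    """Fraction of work tied to interface data for an order-n element, assuming
    interior work ~ n^3 solution points and communication ~ n^2 flux points per
    shared face (a simplified count, not the code's exact bookkeeping)."""
    interior = n ** 3
    interface = shared_faces * n ** 2
    return interface / (interior + interface)

fractions = {n: comm_fraction(n) for n in (3, 4, 6)}
# Higher order => smaller communication share => better strong scaling.
```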

## VI. LAMINAR SOLAR BENCHMARK

To test the accuracy of the CHORUS++ code, it is applied to the laminar solar benchmark proposed by Wang *et al.*^{17} This benchmark has been investigated thoroughly using both the original CHORUS code and the ASH code, and it can serve as a test case for the verification of CHORUS++. Parameters for the laminar solar benchmark are listed in Table I. We use the solar values for the depth of the solar convection zone, as reflected by the values of $r_t$ and $r_b$. The radiative diffusivity is parameterized as $\kappa_r = \lambda_r (c_0 + c_1 \omega + c_2 \omega^2)$, where $c_0 = 1.560\,097\,5 \times 10^8$, $c_1 = -4.563\,171\,8 \times 10^7$, $c_2 = 3.337\,036\,8 \times 10^6$, and $\omega = r \times 10^{-10}$. The parameter $\lambda_r$ is set by requiring that $F_r = L_\odot / (4 \pi r^2)$ at $r = r_b$, where $F_r$ is the radiative flux in Eq. (39). The current study artificially increases the stellar luminosity $L_\odot$ 1000 times to achieve higher Mach numbers^{17,25,26} so that the severity of the CFL constraint can be alleviated to some extent. However, with the boosted luminosity, the convective flow is still in a low-Mach regime, since the maximum Mach number in the whole domain during the simulations is less than 0.05. The flow velocities in the near-photospheric layers do not become supersonic, since the number of density scale heights is chosen as 3, a small value meaning that the change in the Mach number along the radial dimension is not as drastic as in the real Sun. As a consequence of the boosted luminosity, the root mean square (rms) fluid velocity is enlarged ten times according to the mixing-length theory of convection, which gives $U'_{\mathrm{rms}} \propto L_\odot^{1/3}$, where $U'_{\mathrm{rms}} = \sqrt{\overline{\langle U'^2 \rangle}}$, $\langle \cdot \rangle$ represents the time-averaging operation, the overbar represents the spatial-averaging operation, $U'^2 = u'^2 + v'^2 + w'^2$, and $u' = u - \langle u \rangle$. To preserve a solar-like Rossby number $Ro = U'_{\mathrm{rms}} / (2 \Omega_0 d)$, the rotational velocity is enlarged ten times accordingly. Given that the rotational velocity of the real Sun is $\Omega_0^{\mathrm{Sun}} = 2.6 \times 10^{-6}$ s^{−1}, the laminar solar benchmark considers a fast-rotating Sun and uses $\Omega_0 = 8.1 \times 10^{-5}$ s$^{-1} \approx 31.2\, \Omega_0^{\mathrm{Sun}}$. Note that even if the value of $L_\odot^{1/3} / \Omega_0$ in the simulation matches that of the real Sun, the convective structure and the predicted convective velocities cannot match those on the Sun exactly. Much evidence suggests that the large-scale convective velocities obtained by solar convection simulations might be over-estimated; this is called the convective conundrum.^{37} There are many ongoing works^{38–40} exploring factors that may affect the amplitude of the convective velocities and approaching the solar-like convection regime. Table II shows the non-dimensional parameters for the Sun and the laminar solar benchmark; the disparity in Reynolds number is evident. Data for the Sun are from Jones *et al.*^{41}

Top boundary $r_t = 6.61 \times 10^{10}$ cm, bottom boundary $r_b = 4.87 \times 10^{10}$ cm, depth of the convection zone $d = r_t - r_b = 1.74 \times 10^{10}$ cm, stellar luminosity $L_\odot = 3.846 \times 10^{36}$ ergs s^{−1}, gravitational constant $G = 6.67 \times 10^{-8}$ cm^{3} g^{−1} s^{−2}, stellar mass $M_\odot = 1.988\,91 \times 10^{33}$ g, $\rho_b = 0.21$ g cm^{−3}, $N_\rho = 3$, $\gamma = 5/3$, gas constant $R = 1.4 \times 10^{8}$ ergs g^{−1} K^{−1}, $C_p = 3.5 \times 10^{8}$ ergs g^{−1} K^{−1}, kinematic viscosity $\nu = 6.0 \times 10^{13}$ cm^{2} s^{−1}, entropy diffusion coefficient $\kappa = 6.0 \times 10^{13}$ cm^{2} s^{−1}, $\Omega_0 = 8.1 \times 10^{-5}$ s^{−1}


| Parameters | Sun | Benchmark |
|---|---|---|
| Rayleigh number $Ra = G M_\odot d \Delta S / (\nu \kappa C_p)$ | $10^{20}$ | $1.429 \times 10^{6}$ |
| Reynolds number $Re = U'_{\mathrm{rms}} d / \nu$ | $10^{12}$ | 11.72 |
| Ekman number $Ek = \nu / (\Omega_0 d^2)$ | $10^{-14}$ | $2.447 \times 10^{-3}$ |
| Taylor number $Ta = 4 \Omega_0^2 d^4 / \nu^2$ | $10^{19}$–$10^{27}$ | $6.682 \times 10^{5}$ |
| Prandtl number $Pr = \nu / \kappa$ | $10^{-6}$–$10^{-4}$ | 1 |
| Rossby number $Ro = U'_{\mathrm{rms}} / (2 \Omega_0 d)$ | 0.1–1 | $1.433 \times 10^{-2}$ |
| Number of density scale heights $N_\rho = \ln(\rho_b / \rho_t)$ | 16 | 3 |


Six different grids, G1, G2, …, G6, are used for the computations of the laminar solar benchmark, as shown in Table III. According to the number of solution degrees of freedom (DoFs), they can be divided into two groups: G1, G3, and G5 belong to the first group, and G2, G4, and G6, which are refined 1.5 times along all dimensions, belong to the second group. For the SD method, the DoFs measure the computational cost per time step. Comparisons within a group allow us to assess the benefit of high-order accuracy objectively.
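The grouping can be verified by counting DoFs as (number of elements) × N³ for order-N elements, assuming N solution points per dimension as in the tables (note that the G6 element count works out to 23 328, the same as G3):

```python
# (elements, SD order N) for the six grids of Table III
grids = {
    "G1": (6 * 24**2 * 16, 3), "G2": (6 * 36**2 * 24, 3),
    "G3": (6 * 18**2 * 12, 4), "G4": (6 * 27**2 * 18, 4),
    "G5": (6 * 12**2 * 8,  6), "G6": (6 * 18**2 * 12, 6),
}
dofs = {name: n_elem * order**3 for name, (n_elem, order) in grids.items()}
# First group shares 1 492 992 DoFs; second group shares 5 038 848 DoFs.
```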

| | G1 | G2 | G3 | G4 | G5 | G6 |
|---|---|---|---|---|---|---|
| No. of elements | $6 \times (24^2 \times 16)$ = 55 296 | $6 \times (36^2 \times 24)$ = 186 624 | $6 \times (18^2 \times 12)$ = 23 328 | $6 \times (27^2 \times 18)$ = 78 732 | $6 \times (12^2 \times 8)$ = 6912 | $6 \times (18^2 \times 12)$ = 23 328 |
| SD order | Third-order | Third-order | Fourth-order | Fourth-order | Sixth-order | Sixth-order |
| DoFs | 1 492 992 | 5 038 848 | 1 492 992 | 5 038 848 | 1 492 992 | 5 038 848 |
| Max $\Delta t$ (s) | 17.9 | 11.5 | 13.8 | 8.6 | 8.5 | 5.0 |
| CPU hours per time step | $1.03 \times 10^{-3}$ | $3.48 \times 10^{-3}$ | $1.05 \times 10^{-3}$ | $3.54 \times 10^{-3}$ | $1.01 \times 10^{-3}$ | $3.41 \times 10^{-3}$ |
| CPU hours to run 10 days | 44.6 | 234.7 | 59.0 | 319.3 | 92.2 | 529.0 |
| Valid or not | × | × | × | × | ✓ | ✓ |


| | CHORUS | ASH | Third-order CHORUS++ |
|---|---|---|---|
| DoFs | 19 660 800 | $100 \times 256 \times 512 = 2\,907\,000$ | $6 \times (75^2 \times 50) \times 3^3 = 45\,562\,500$ |
| $\Delta t$ (s) | 4 | 20 | 5.3 |
| CPU hours per time step | $8.03 \times 10^{-2}$ | $5.32 \times 10^{-3}$ | $3.16 \times 10^{-2}$ |
| CPU hours to run 10 days | 17 352.0 | 230.0 | 4624.9 |


All simulations run up to the time when equilibration of the components of the energy flux is almost reached. Neglecting structural stellar evolution, thermal relaxation occurs on the thermal, or Kelvin–Helmholtz, timescale $T_{KH} = E_t / L_\odot$, where $E_t$ is the total thermal energy. For rotating stars, $T_{KH}$ is generally very large and can exceed $10^5$ years. However, a more relevant timescale for the equilibration of the convection is the thermal diffusion timescale $T_d = d^2 / \kappa$, which equals $58.44\, T_\odot$ for the solar benchmark, where $T_\odot = 2\pi / \Omega_0$ is one stellar rotational period. To ensure final equilibration, all simulations run up to $t = T_f = 4.8 \times 10^6$ s $\approx 61.88\, T_\odot$. This simulation time is much longer than the affordable simulation time for the original CHORUS code,^{17} which was $15\, T_\odot$. The time interval $T_f < t < T_f + 30 T_\odot$ is chosen as the sampling window for computing time-averaged values. $U'_{\mathrm{rms}} = 4.0398 \times 10^4$ cm s^{−1} for the laminar solar benchmark, and the non-dimensional parameters in Table II are computed based on this value.
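The non-dimensional parameters of Table II can be recomputed from the dimensional inputs of Table I together with this $U'_{\mathrm{rms}}$; a quick consistency check:

```python
nu     = 6.0e13    # kinematic viscosity, cm^2 s^-1
kappa  = 6.0e13    # entropy diffusion coefficient, cm^2 s^-1
d      = 1.74e10   # depth of the convection zone, cm
omega0 = 8.1e-5    # rotational velocity, s^-1
u_rms  = 4.0398e4  # rms convective velocity, cm s^-1

Ek = nu / (omega0 * d**2)          # Ekman number    ~ 2.447e-3
Ta = 4 * omega0**2 * d**4 / nu**2  # Taylor number   ~ 6.682e5
Pr = nu / kappa                    # Prandtl number  = 1
Ro = u_rms / (2 * omega0 * d)      # Rossby number   ~ 1.433e-2
Re = u_rms * d / nu                # Reynolds number ~ 11.72
```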

The radial energy flux consists of four components: the enthalpy flux $F_e$, the radiative flux $F_r$, the entropy flux $F_u$, and the kinetic energy flux $F_k$. Appendix B in Ref. 26 derives the equation of flux balance in detail. When the system reaches a statistically steady state, the sum of these four fluxes should equal the full luminosity imposed at the bottom boundary,

$$F_e + F_r + F_u + F_k = F_\odot.$$

Figure 9 shows these components at $t = T_f$ for the solutions computed from CHORUS++ on the six grids. For comparison, results based on CHORUS and ASH are also attached.^{17} Due to prohibitive computational costs, even with the Yellowstone high-performance computing clusters at the National Center for Atmospheric Research (NCAR), CHORUS fails to run the benchmark until equilibration of the energy flux, as $\max((F_e + F_r + F_u + F_k)/F_\odot - 1) \approx 30\%$ in Fig. 9(g). Using sixth-order CHORUS++, the predicted components of the energy flux are smooth in Figs. 9(e) and 9(f), and the high consistency between them verifies the convergence of the simulations. The energy flux balance is almost achieved for G5 and G6 at $t = T_f$, since $\max((F_e + F_r + F_u + F_k)/F_\odot - 1) \approx 5\%$. The results show that the radiative flux $F_r$ dominates the energy flux in the lower convection zone, while the entropy flux $F_u$ dominates in the upper convection zone. Both $F_e$ and $F_k$ decrease to 0 at the boundaries due to the impenetrable boundary conditions. The enthalpy flux $F_e$ peaks at about $r = 0.95 r_t$ with a peak value close to $0.3 L_\odot$. By contrast, all third- and fourth-order CHORUS++ runs fail at the same DoFs, since spurious oscillations of $F_u$ appear. These oscillations are not physical and stem from a lack of spatial discretization accuracy; increasing the mesh resolution can alleviate this issue. Refining G3 to G4, the amplitude of the unphysical oscillations in $F_u$ decreases significantly. Spurious oscillations of $F_u$ computed from G1 and G2 are too violent to show on the same scale as the other components. Refining G1 to G2 also results in a physical prediction of the enthalpy flux $F_e$. With improved mesh resolution, the boundary conditions are better satisfied. Note that for G3, the boundary condition of heat flux at the bottom boundary is not even satisfied well; after refining the mesh from G3 to G4, $F_r + F_u = F_\odot$ is satisfied extremely well at both the bottom and top boundaries. The sixth-order CHORUS++ exhibits none of these problems at the same DoFs. This demonstrates that, given the same DoFs, sixth-order accuracy provides enhanced simulation fidelity; to achieve the same level of accuracy, higher-order CHORUS++ needs far fewer DoFs.

_{e}Table III compares the computing efficiency of CHORUS++, CHORUS, and ASH. Data for CHORUS and ASH is collected from Ref. 17. Sixth-order CHORUS++ is far more efficient than CHORUS. However, since CPU hours for CHORUS++ and CHORUS are measured on different computing architectures, this conclusion needs to be further verified. Based on the current third-order CHORUS++, we refine the mesh resolution until the energy flux is fully resolved. Numerical experiments show that *N _{r}* = 50 is needed to get converged components of the energy flux. The sixth-order CHORUS++ runs $ 4624.9 / 529.0 \u2248 8.7$ times faster than the third-order CHORUS++. This amazing speedup enables us to carry out the RCC simulation on G6 until equilibration of components of the energy flux using just one Intel Xeon Gold 6148 Processor about 1.5 weeks. Here, one Intel Xeon Gold 6148 Processor has 40 total threads and 2.4 GHz base frequency. If we neglect the difference of computing facilities, the computing efficiency of the sixth-order CHORUS++ code is even close to that of the ASH code. However, CHORUS++ is implemented on unstructured grids, based on equations of compressible flows, and orders of spatial accuracy can be chosen arbitrary.
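The equilibration diagnostic used above, $\max\left((F_e + F_r + F_u + F_k)/F_\odot - 1\right)$, is straightforward to evaluate from radial flux profiles. A minimal sketch, using hypothetical normalized profiles (stand-ins for illustration, not data from the paper):

```python
import numpy as np

# Minimal sketch of the equilibration diagnostic: the maximum relative
# departure of the summed radial fluxes from the imposed luminosity.
# The profiles below are hypothetical stand-ins, normalized by F_sun.
r = np.linspace(0.718, 0.96, 64)                 # radius (illustrative units)
f_r = np.linspace(1.0, 0.1, 64)                  # radiative flux, dominant at the bottom
f_e = 0.3 * np.sin(np.pi * (r - r[0]) / (r[-1] - r[0]))  # enthalpy flux, mid-shell peak
f_u = 1.0 - f_r - f_e                            # entropy flux carries the remainder
f_k = np.zeros_like(r)                           # kinetic energy flux (zero here)
f_sun = 1.0                                      # imposed luminosity (normalized)

residual = np.max(np.abs((f_e + f_r + f_u + f_k) / f_sun - 1.0))
print(residual)  # ~0 (machine precision): these constructed profiles balance exactly
```

A converged simulation corresponds to this residual dropping to a few percent, as quoted above for G5 and G6.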

^{42} is formulated as

where $h$ is the element size and $p_o$ is the degree of the polynomials used for local reconstruction. For the SD method, $p_o = N - 1$, where $N$ is the order of the SD method. Given that the computational domain size and the number of computational elements in the radial direction are $l$ and $N_e$, respectively, and that the total number of DoFs in the radial direction is fixed at $N_{\rm DoFs}$, it follows that $h = l/N_e$ and $N = N_{\rm DoFs}/N_e$. We then reach an expression for the time step in terms of $N_{\rm DoFs}$, $N_e$, and $\kappa_r$. Overall, increasing the polynomial order significantly improves the spatial accuracy at the cost of acceptably shortened time steps.
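The trade-off in the last sentence can be made concrete with a back-of-the-envelope sketch. Assuming the commonly used explicit-CFL estimate $\Delta t \le {\rm CFL}\,h/(2N-1)$ (an assumption for illustration; the paper's full expression also involves $\kappa_r$), fixing the radial DoFs and raising the order from three to six shortens the time step only mildly:

```python
# Sketch: advective time-step limit of an order-N SD scheme at a fixed
# number of radial DoFs. The CFL form dt <= CFL * h / (2N - 1) and the
# values of CFL and the wave speed are illustrative assumptions.
def max_time_step(order_n, n_dofs, domain_l, cfl=0.5, speed=1.0):
    n_e = n_dofs / order_n      # number of elements: N_e = N_DoFs / N
    h = domain_l / n_e          # element size: h = l / N_e
    return cfl * h / (speed * (2 * order_n - 1))

dt3 = max_time_step(3, n_dofs=48, domain_l=1.0)
dt6 = max_time_step(6, n_dofs=48, domain_l=1.0)
print(dt6 / dt3)  # ~0.91: sixth order shortens the step by only about 9%
```

At fixed DoFs the ratio is $(N_6/N_3)\,(2N_3-1)/(2N_6-1)$, independent of the CFL constant and wave speed, so the accuracy gained from high order costs only a modest reduction in $\Delta t$.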

## VII. VARYING THE ROTATIONAL VELOCITY

Vasil *et al.*^{43} predicted that the dynamical Rossby number for real solar convection is less than unity below the near-surface shear layer and, therefore, that the convection is rotationally constrained. Investigation of the rotational effect is thus of great importance to studies of solar convection. Rotation tends to suppress convection.^{44} For weak rotation, small-scale flow structures are reduced as the rotational velocity increases.^{45}

By varying the rotational velocity $\Omega_0$ in the laminar solar benchmark, we investigate the effect of rotational constraint on convection. Five simulations with $\Omega_0 = 2.6 \times 10^{-5}$, $3.7 \times 10^{-5}$, $4.8 \times 10^{-5}$, $5.9 \times 10^{-5}$, and $7.0 \times 10^{-5}$ s^{−1} are computed on G6 using sixth-order elements. Along with the original solar benchmark with $\Omega_0 = 8.1 \times 10^{-5}$ s^{−1}, the six cases are listed in Table IV with their non-dimensional parameters.

| $\Omega_0$ (s^{−1}) | $2.6 \times 10^{-5}$ | $3.7 \times 10^{-5}$ | $4.8 \times 10^{-5}$ | $5.9 \times 10^{-5}$ | $7.0 \times 10^{-5}$ | $8.1 \times 10^{-5}$ |
|---|---|---|---|---|---|---|
| $U'_{\rm rms}$ (cm s^{−1}) | $8.573 \times 10^{4}$ | $7.548 \times 10^{4}$ | $6.697 \times 10^{4}$ | $5.745 \times 10^{4}$ | $4.922 \times 10^{4}$ | $4.040 \times 10^{4}$ |
| Reynolds number Re | 24.86 | 21.89 | 19.42 | 16.66 | 14.27 | 11.72 |
| Rossby number Ro | $3.041 \times 10^{-2}$ | $2.678 \times 10^{-2}$ | $2.376 \times 10^{-2}$ | $2.038 \times 10^{-2}$ | $1.746 \times 10^{-2}$ | $1.433 \times 10^{-2}$ |

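The entries of Table IV suggest that Re and Ro were both evaluated with fixed reference scales, since each scales linearly with $U'_{\rm rms}$ across the runs. A quick consistency check on the transcribed values (the linear-scaling assumption is ours, inferred from the table, not stated in the text):

```python
# Values transcribed from Table IV; the last column (the original
# benchmark) is taken as the baseline. If the length scale, viscosity,
# and reference rotation rate in the definitions of Re and Ro are held
# fixed, both numbers should scale linearly with U'_rms.
u_rms = [8.573e4, 7.548e4, 6.697e4, 5.745e4, 4.922e4, 4.040e4]  # cm/s
re    = [24.86, 21.89, 19.42, 16.66, 14.27, 11.72]
ro    = [3.041e-2, 2.678e-2, 2.376e-2, 2.038e-2, 1.746e-2, 1.433e-2]

for u, re_i, ro_i in zip(u_rms, re, ro):
    scale = u / u_rms[-1]                         # U'_rms relative to baseline
    assert abs(re_i - re[-1] * scale) / re_i < 5e-3   # within table rounding
    assert abs(ro_i - ro[-1] * scale) / ro_i < 5e-3
print("Re and Ro in Table IV scale linearly with U'_rms")
```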

^{−1}, downflows become so strong that an inward kinetic energy flux, indicated by negative $F_k$, is witnessed, and the inward $F_k$ grows in the case of $\Omega_0 = 2.6 \times 10^{-5}$ s^{−1}. A strong inward kinetic energy flux must be compensated by an enhanced outward enthalpy flux,^{9} which is consistent with our results. To investigate differential rotation, we define the longitudinally averaged angular velocity of the fluid around the *z*-axis and compare it at $t = T_f$ for the varied rotational velocities of the spherical shell. Although all current simulations predict a fast-rotating equator and slow-rotating poles, the predicted contrast of angular velocity is far from that of the Sun. For the Sun, $\Omega_0^{\rm Sun} = 2.6 \times 10^{-6}$ s^{−1}, corresponding to $T_\odot^{\rm Sun} \approx 28\,T_\odot^{\rm Earth}$. The rotational period of the Sun is approximately $25\,T_\odot^{\rm Earth}$ near the equator and $35\,T_\odot^{\rm Earth}$ near the poles, corresponding to $\Delta\Omega/\Omega_0 \approx 0.12$ near the equator and $-0.2$ near the poles. Decreasing $\Omega_0$ generally enhances the contrast of angular velocity, with one exception: the amplitude of the contrast does not change significantly between the cases $\Omega_0 = 2.6 \times 10^{-5}$ and $3.7 \times 10^{-5}$ s^{−1}. To further enhance the differential rotation, decreasing the viscosity of the fluid and increasing the Rossby number are possible choices.
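The contrast figures quoted above follow directly from the rotation periods: with $\Omega \propto 1/T$, the relative contrast is $T_{\rm ref}/T - 1$ for a mean period of about $28\,T_\odot^{\rm Earth}$. A one-line check:

```python
# Differential-rotation contrast from rotation periods alone:
# Omega is proportional to 1/T, so dOmega/Omega_0 = T_ref/T - 1.
t_ref = 28.0                    # mean solar rotation period, in Earth days
t_equator, t_pole = 25.0, 35.0  # equatorial and polar periods

contrast = lambda t: t_ref / t - 1.0
print(round(contrast(t_equator), 2), round(contrast(t_pole), 2))  # 0.12 -0.2
```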

## VIII. CONCLUSIONS AND DISCUSSIONS

CHORUS is the first stellar convection code that employs unstructured grids. In the current research, we further improve it into CHORUS++ for rotationally constrained convection (RCC) simulations in spherical shells. Like CHORUS, CHORUS++ is based on fully compressible models, is flexible with respect to grids, and is suitable for massively parallel computing. However, compared with CHORUS, which uses an isoparametric mapping, CHORUS++ embeds a boundary-conforming transfinite mapping into the spectral difference (SD) method on cubed-sphere grids, thus achieving exact representations of spherical surfaces on arbitrarily sparse grids. This allows us to carry out high-fidelity simulations of a laminar solar benchmark using sixth-order elements on very coarse grids. Banana-shaped convective cells are well resolved without spurious oscillations. Given a relatively small number of solution degrees of freedom, using sixth-order elements produces smooth and converged predictions for the components of the radial energy flux, while using third- and fourth-order elements fails. To obtain converged predictions, using sixth-order elements in CHORUS++ is 8.7 times faster than using third-order elements. Moreover, using high-order elements is shown to yield enhanced speedup when the number of processors becomes very large. It takes only 1.5 weeks to run the global-scale laminar solar convection simulation to equilibration of the energy flux using just one Intel Xeon Gold 6148 processor with 40 threads, which is not possible for the original CHORUS code even on supercomputers. This efficiency is even comparable to that of the anelastic spherical harmonic (ASH) code. To test the applicability of sixth-order elements in CHORUS++ to cases with more complex convective patterns, CHORUS++ is applied to rotating convection simulations with decreased rotational velocities. The asymmetry between upflows and downflows is enhanced: downflow lanes become narrow and strong, separating broad and weak upflow regions.
Calculations of the energy flux are aligned with these observations of strong downflow lanes: a strong inward kinetic energy flux appears and, as compensation, the outward enthalpy flux is enhanced. These observations are consistent with decreasing rotational constraint and with other results from the literature.

In reality, the interiors of many stars, including the Sun, and of gas giant planets like Jupiter and Saturn consist of ionized gases (plasmas), which behave as conducting fluids. To model the motions of conducting fluids and the ensuing evolution of magnetic fields inside them, CHORUS++ needs to be extended to magnetohydrodynamic (MHD) simulations. A potential challenge toward this goal is an efficient way to handle the divergence-free constraint of the magnetic field on unstructured grids. An increase in the divergence error would lower the accuracy, affect the stability, and even induce unphysical behaviors of the MHD system.^{46} Chen and Liang^{47} showed that the divergence cleaning approach proposed by Derigs *et al.*^{48} can be coupled with the SD method to control the divergence error on unstructured grids. With the assistance of an artificial dissipation method, the resulting method can simulate both subsonic and supersonic MHD flows efficiently. This paves the way for simulations of high-Mach-number stellar dynamos, for example, dynamos in red giants and in the photosphere of the Sun.
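The divergence-free constraint itself can be made concrete with a toy example. The sketch below uses a spectral Helmholtz projection on a periodic 2D grid to remove a deliberately introduced divergence error; this is a simpler cleaning strategy than the hyperbolic approach of Derigs *et al.* coupled with the SD method in Ref. 47, and is shown only to illustrate what "cleaning" the divergence means:

```python
import numpy as np

# Toy illustration of removing div(B) on a periodic grid via a spectral
# Helmholtz projection: solve laplacian(phi) = div(B), then subtract
# grad(phi). (Not the GLM-type cleaning of Ref. 48 -- just the constraint.)
n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
k = np.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers
KX, KY = np.meshgrid(k, k, indexing="ij")
k2 = KX**2 + KY**2
k2[0, 0] = 1.0                              # guard the mean mode (its divergence is zero)

# a divergence-free field plus a gradient-type error that carries divergence
Bx = np.sin(X) * np.cos(Y) + 0.2 * np.sin(X)
By = -np.cos(X) * np.sin(Y)

bx_h, by_h = np.fft.fft2(Bx), np.fft.fft2(By)
div_h = 1j * (KX * bx_h + KY * by_h)        # Fourier-space divergence
phi_h = -div_h / k2                         # solve laplacian(phi) = div(B)
bx_h -= 1j * KX * phi_h                     # B <- B - grad(phi)
by_h -= 1j * KY * phi_h

err = np.max(np.abs(np.fft.ifft2(1j * (KX * bx_h + KY * by_h))))
print(err < 1e-10)  # True: the projected field is divergence-free
```

On unstructured grids no such global spectral projection is available, which is why hyperbolic cleaning coupled with the SD method, as in Ref. 47, is attractive there.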

## ACKNOWLEDGMENTS

The research work of the first two authors has been supported by a National Science Foundation (NSF) CAREER Award (No. 1952554) and an Air Force Office of Scientific Research (AFOSR) grant (Award No. FA9550-20-1-0374) monitored by Dr. Fariba Fahroo. M. W. acknowledges the support from NSFC (Grant No. 12225204), Department of Science and Technology of Guangdong Province (Grant No. 2020B1212030001), and the Shenzhen Science and Technology Program (Grant No. KQTD20180411143441009). Numerical simulations have been supported by the ACRES cluster at Clarkson University and the Center for Computational Science and Engineering at Southern University of Science and Technology. The first author would like to acknowledge helpful discussions with Dr. Junfeng Wang at Simerics Inc.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Kuangxu Chen:** Conceptualization (equal); Formal analysis (lead); Investigation (lead); Validation (lead); Writing – original draft (lead). **Chunlei Liang:** Conceptualization (equal); Formal analysis (supporting); Funding acquisition (equal); Project administration (equal); Resources (equal); Supervision (equal); Writing – review & editing (equal). **Minping Wan:** Formal analysis (supporting); Funding acquisition (equal); Project administration (equal); Resources (equal); Supervision (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this work are available from the corresponding author upon reasonable request.

## REFERENCES

*Basic Structured Grid Generation: With an Introduction to Unstructured Grid Generation*

*Colloquium*: Unusual dynamics of convection in the Sun

*Nodal Discontinuous Galerkin Methods: Algorithms, Analysis, and Applications*