Benchmark problems for transcranial ultrasound simulation: Intercomparison of compressional wave models

Computational models of acoustic wave propagation are frequently used in transcranial ultrasound therapy, for example, to calculate the intracranial pressure field or to calculate phase delays to correct for skull distortions. To allow intercomparison between the different modeling tools and techniques used by the community, an international working group was convened to formulate a set of numerical benchmarks. Here, these benchmarks are presented, along with intercomparison results. Nine different benchmarks of increasing geometric complexity are defined. These include a single-layer planar bone immersed in water, a multi-layer bone, and a whole skull. Two transducer configurations are considered (a focused bowl and a plane piston operating at 500 kHz), giving a total of 18 permutations of the benchmarks. Eleven different modeling tools are used to compute the benchmark results. The models span a wide range of numerical techniques, including the finite-difference time-domain method, angular spectrum method, pseudospectral method, boundary-element method, and spectral-element method. Good agreement is found between the models, particularly for the position, size, and magnitude of the acoustic focus within the skull. When comparing results for each model with every other model in a cross-comparison, the median values for each benchmark for the difference in focal pressure and position are less than 10% and 1 mm, respectively. The benchmark definitions, model results, and intercomparison codes are freely available to facilitate further comparisons.

Computational models of acoustic wave propagation are frequently used in transcranial ultrasound therapy, for example, to calculate the intracranial pressure field or to calculate phase delays to correct for skull distortions.To allow intercomparison between the different modeling tools and techniques used by the community, an international working group was convened to formulate a set of numerical benchmarks.Here, these benchmarks are presented, along with intercomparison results.Nine different benchmarks of increasing geometric complexity are defined.These include a single-layer planar bone immersed in water, a multi-layer bone, and a whole skull.Two transducer configurations are considered (a focused bowl and a plane piston), giving a total of 18 permutations of the benchmarks.Eleven different modeling tools are used to compute the benchmark results.The models span a wide range of numerical techniques, including the finite-difference time-domain method, angular-spectrum method, pseudospectral method, boundary-element method, and spectral-element method.Good agreement is found between the models, particularly for the position, size, and magnitude of the acoustic focus within the skull.When comparing results for each model with every other model in a cross comparison, the median values for each benchmark for the difference in focal pressure and position are less than 10% and 1 mm, respectively.The benchmark definitions, model results, and intercomparison codes are freely available to facilitate further comparisons.(10 February 2022)  Pages: 1-18 Ultrasound is increasingly used for therapeutic applications in the brain, including for tissue ablation, 1 opening the blood-brain barrier, 2 and for the modulation of brain activity. 3One challenge is the non-invasive delivery of ultrasound through the skull bone, which can significantly distort and attenuate the transmitted waves. 4o account for this, computer simulations are now frequently used to make predictions of the intracranial pressure field, 5 and to correct for phase aberrations due to the skull. 6This is particularly important for transcranial ultrasound stimulation (TUS), as the low ultrasound intensities make it highly challenging to measure the delivered energy in vivo, e.g., using MR-guided thermometry. 7t a high level, there are four main steps in the setup of an acoustic model for transcranial ultrasound: (1)  defining the medium parameters, including the skull and soft tissue geometry and the acoustic properties (using a medical image, for example); (2) defining the transducer characteristics, including the geometry, driving parameters, and relative position; (3) defining the numerical parameters for the model, including the grid resolution and boundary conditions; and (4) processing and interpreting the simulated results.One challenge for the community is that there is a large variation in these steps in the published literature, and there is currently little consensus on the best approach or the uncertainties associated with numerical modeling more generally.
As part of the International Transcranial Ultrasonic Stimulation Safety and Standards (ITRUSST) consortium, a working group focused on simulation and planning was convened.The primary goal was to perform a modeling intercomparison to systematically evaluate the steps involved in transcranial simulation, with a view to establishing best practice.A number of researchers active in the development of tools for transcranial ultrasound simulation were invited to take part.The first phase, which is reported here, was a model intercomparison using a series of numerical benchmarks relevant to transcranial ultrasound where the medium parameters and transducer characteristics were well defined.The primary research question was: do different modeling techniques and computer codes give the same answer when the inputs to the model are well specified?This was taken as the first step to ensure that any differences in more complicated scenarios (e.g., where the skull properties are mapped from a medical image, or the transducer properties are mapped from a hydrophone measurement) could a) Authors listed alphabetically b) Corresponding author; b.treeby@ucl.ac.uk be evaluated as systematically and independently as possible.
The working group met regularly throughout 2021.The list of benchmarks (discussed in Sec.II) was iteratively refined, including the source definitions, the medium geometry, the material properties, and the output domain size.File submission formats, mechanisms for data sharing, and comparison metrics (along with codes to compute them 8 ) were also defined.Benchmark submissions were non-blinded with multiple resubmissions allowed.The goal was not a competition to establish which model was the "best" by some definition.Rather, it was to establish consensus on different approaches to transcranial ultrasound modeling, and how to implement these correctly using a range of modeling tools available to the community.In this spirit, work-inprogress results and comparison metrics were discussed at regular intervals.These discussions, along with the sharing of code, approaches, and processing steps, etc, ultimately allowed the benchmarks to be computed with a wide set of simulation tools with excellent agreement (see Sec. IV).
The primary goal of this phase of the intercomparison exercise was to establish a series of benchmarks relevant to transcranial ultrasound, along with consensus on the correct numerical solutions for these benchmarks.Consequently, simulations were typically performed with very high sampling to maximize accuracy.Because of this emphasis, and the different computational resources available to each group, a comparison of the computational performance of the individual models was considered out-of-scope from the outset.However, it is still important to note that some models used in the intercomparison, in particular those based on the angular spectrum method, have an efficiency/accuracy trade-off inherent in their formulation.This should be considered when interpreting the intercomparison metrics presented in Sec.IV and the Supplementary Information.
The final output from the intercomparison exercise is a set of 9 well-defined numerical benchmarks relevant to transcranial ultrasound (with a total of 18 permutations of these benchmarks), along with publicly available simulation results for these benchmarks computed using 11 different modeling codes. 8,9.BENCHMARKS

A. Overview
The benchmarks were defined considering typical TUS scenarios, although they are also relevant to other applications of transcranial ultrasound.Simulations were single frequency (time-harmonic) and performed assum- ing linear wave propagation (the maximum pressure amplitude was less than 1.1 MPa for all benchmarks).Only compressional waves were considered, which is appropriate when the ultrasound waves are close to normal incidence to the skull bone. 10All simulations were conducted in 3D.

B. Transducer characteristics
Two transducer definitions were used (see Fig. 1).The first was a focused bowl transducer with a 64 mm radius of curvature and a 64 mm aperture diameter.This is representative of the single-element transducers frequently used for TUS. 11The second was a plane piston transducer with a diameter of 20 mm.Piston transducers are often used in multi-element arrays.While the typical diameter of an element in a multi-element array is smaller than 20 mm, this diameter was used to provide identifiable beam characteristics within the simulation domain.For some numerical techniques, piston transducers are easier to model, particularly when aligned with the computational grid which avoids staircasing artifacts. 12oth transducers were driven at 500 kHz with a constant surface velocity of 0.04 m/s. 13Assuming an acoustic impedance of 1.5 MRayls, this is equivalent to modeling the sources as a distribution of free-field monopole radiators with a source pressure of 60 kPa.

C. Material properties
The material properties used for the benchmarks are given in Table I.5][16][17][18][19][20][21] For the simulations including absorption, the loss is defined to be non-dispersive, i.e., either frequency independent, or for power law models, dependent on frequency squared.

D. Simulation outputs
The simulation results were stored as two variables named p_amp and p_phase.These represent the amplitude and phase of the complex pressure field at 500 kHz over the specified comparison domain.For time domain models, these parameters can be extracted precisely by setting the time step to an integer number of points per period (PPP), recording the steady state pressure field for an integer number of periods, and then extracting the amplitude and phase at the driving frequency using a Fourier transform.Note, the phase is optional and was not used for the comparisons presented in Sec.IV, but was included for completeness.The results were saved either as MATLAB .matfiles using the '-v7.3'flag where possible (this format can be easily opened as a HDF5 file outside MATLAB), or as HDF5 .h5files with the variables saved as datasets in the root group.
Regardless of the sampling or mesh used for the simulations, the outputs stored in the comparison files were re-sampled onto a uniform Cartesian grid with 0.5 mm grid sampling.This corresponds to six points per wavelength (PPW) in water.For benchmarks 1 to 6, the comparison domain size was a 120 × 70 mm (axial × lateral) slice through the central z-coordinate, corresponding to a grid size of 241 × 141 grid points.For benchmark 7, the comparison domain was 120 × 70 × 70 mm (241 × 141 × 141 grid points).For benchmark 8, the comparison domain was 225 × 170 × 190 mm (451 × 341 × 381 grid points).For benchmark 9, the comparison domain was 212 × 224 × 184 mm (425 × 449 × 369 grid points).
For all benchmarks, the transducer was oriented such that the beam axis pointed in the x-dimension, with the transducer positioned in the center of the y/z-dimensions.Using 1-based indexing, the center of the source (rear of the bowl or center of the piston) relative to the output grid was positioned at [1, 71] for benchmarks 1 to 6, [1, 71, 71] for benchmark 7, [1, 171, 191]  for benchmark 8, and [1, 225, 185] for benchmark 9. Note, the comparisons for benchmarks 1 to 6 were made in 2D due to the axisymmetry of the geometry.All simulations were conducted in 3D.

E. Naming convention
The benchmarks were given unique identifiers in the following format: PH<NUM>-BM<NUM>-SC<NUM>. PH (phase) identifies the intercomparison phase (in this case 1).
BM (benchmark) identifies the benchmark number within the phase.SC (source) identifies the source condition, where 1 is the bowl source, and 2 is the plane piston source.A summary of the different benchmarks is given in Table II.File names for the intercomparison results follow the same convention with the model name appended (see Table IV): PH<NUM>-BM<NUM>-SC<NUM>_<MODELNAME>.The simulation outputs for each model for each benchmark are publicly available. 9

F. Benchmarks
A total of 9 benchmarks relevant to transcranial ultrasound were devised.These are summarized in Table II.The benchmarks gradually increase in complexity, adding both additional tissue layers, and increasing the  I.
geometric complexity of the skull.Benchmarks 1 to 7 are illustrated in Fig. 1 while benchmarks 8 and 9 are illustrated in Fig. 2. Benchmark 1 considers the bowl and piston transducers in water (free-field) using the properties given in Table I.Benchmark 2 adds uniform artificial absorption of 1 dB/cm at 500 kHz.During the intercomparison exercise, these benchmarks served as a helpful reference to ensure the transducer properties, absorption units, and comparison domain were correctly specified.3][24] Calculations using FOCUS were performed using 5000 integration points to give a high level of precision.Several models used the fields computed using FOCUS across a transverse y-z plane as the source definition (see Sec. III).
Benchmark 3 introduces a single flat 6.5 mm layer of cortical bone immersed in water, positioned 30 mm from the transducer as shown in Fig. 1.Benchmark 4 extends this to include a 4 mm skin layer, a three-layered skull (consisting of 1.5 mm cortical bone for the outer table, 4 mm trabecular bone, and 1 mm cortical bone for the inner table, giving the same overall skull thickness and position as benchmark 3), with water on the exterior and brain on the interior as shown in Fig. 1.The thickness values are based on average values for parietal bone 25 and scalp. 26enchmark 5 increases the geometric complexity of benchmark 3 by using a curved 6.5 mm layer of cortical bone immersed in water, with inner and outer radii of 68.5 mm and 75 mm, respectively.Note, the bone layer is spherically (not cylindrically) curved, meaning the curvature in the out-of-plane dimension is the same as that shown in Fig. 1.Benchmark 6 is a curved extension of benchmark 4, where the thickness values correspond to differences in the curvature radii.
Benchmarks 7 to 9 increase the geometric complexity further by using a homogeneous skull mesh generated from the MNI152_T1_1mm magnetic resonance imag- ing template brain. 27,28The template image was run through an adapted version of SimNIBS headreco. 29Additional smoothing of the tissue surfaces while simultaneously preventing intersections between neighboring surfaces was performed using SimNIBS functions.Benchmarks 7 and 8 use a transducer position targeted at the foveal representation of the primary visual cortex, while benchmark 9 uses a transducer position targeted at the hand area of the primary motor cortex.
The skull mesh was stored as two .stlfiles representing the inner and outer surfaces of the skull bone.Position transforms were stored as 3D affine transformations which position the transducer relative to the coordinates in the .stlfiles.Grid-based discretizations containing a binary skull mask were also generated using the iso2mesh MATLAB toolbox. 30,31These were generated on a regular Cartesian mesh at a range of resolutions after applying the appropriate inverse position transforms (to move the skull mesh relative to the transducer), and were truncated to the appropriate comparison domain (see Sec. II D).The skull files and position transforms are available alongside the simulations results. 9

G. Intercomparison metrics
A number of metrics were chosen to compare the simulated fields.Mathematical definitions for some metrics are given in Table III.Metrics based on the entire field were taken from the exit plane of the source, excluding the first grid point in the x-direction for the piston transducer, and the first 19 grid points in the x-direction for the bowl transducer.The relative L 2 and L ∞ errors provide a useful (and strict) measure of the overall differences between simulations.However, for more complex geometries, these become dominated by differences in the rapidly-varying near-field region between the source and the skull.For this reason, differences in the focal characteristics were also computed.This included the  Here p1 and p2 are the amplitude of the pressure field over the 2D or 3D comparison domains for the reference field and comparison field, respectively (these are assumed to be positive).Sums and maximum values are assumed to be over all values in the comparison domain starting from the exit plane of the transducer.Focal values are taken from inside the brain (or post-skull) region only.pos max is used to denote the position of the maximum value in the comparison domain.

Metric Definition
Relative L 2 Focal (peak) pressure Focal position pos max(p1) − pos max(p2) 2 comparison domain (i.e., in the x and y dimensions for benchmarks 1 to 6, and in the x, y, and z dimensions for benchmarks 7 to 9).For benchmarks 3 to 9 for the piston transducer, the acoustic field gradually decays within the brain, thus there is no natural focus in the axial direction.In this case, the axial focal position, and lateral profiles and FWHM values were taken at x = 60 mm (corresponding to a grid index of 121).For benchmarks 7 to 9, differences in the -6 dB focal volume were also computed.The focal volume was calculated by thresholding the pressure field inside the brain to 50% of the maximum value, and then counting the voxels in the largest connected component.Code to compute the intercomparison metrics is available on GitHub. 8

III. MODELS A. Overview
A total of 11 modeling tools were used for the intercomparison, in addition to the free-field reference values calculated using FOCUS discussed in Sec.II F. These are summarized in Table IV.A short description of each model is given in the following sections, with additional details given in the Supplementary Material.

B. BABELVISCOFDTD
BabelViscoFDTD solves the viscoelastic wave equation expressed in stress tensors and displacement vectors, where the bone material is modeled as a viscoelastic isotropic medium. 19The term "Babel" refers to the multiple computing backends (CUDA, OpenCL, Apple Metal and X86-64) that are supported for calculations.Nodes of stress and displacement are placed in a staggered grid arrangement. 32Calculations are solved using a 4th-order in space and 2nd-order in time finite-difference time-domain (FDTD) scheme in Cartesian coordinates. 33,34tress tensors and displacement vectors are solved a half time step separated from each other.Attenuation losses are modeled using a quality factor for narrowband conditions. 34,35Liquid-bone interfaces and heterogeneity of tissue material are modeled using averaging operators. 36Optional reduction of staircasing artifacts can be enabled using a superposition operator. 37A perfectly matched layer (PML) condition for viscoelastic propagation is used to absorb waves at the boundaries. 38ll benchmarks were computed using a resolution of 12 grid PPW.Sources were modeled as stress nodes using the same staircase-free formulation and dispersion correction as in the k-Wave model (see Sec. III G).The time PPP for benchmarks 1 and 2 was 25, and 48 for benchmarks 3 to 9. Benchmarks 1 to 7 used a total grid size of 305 × 305 × 521 grid points including the PML.For benchmarks 8 and 9 the grid size was, respectively, 785 × 705 × 941 and 761 × 921 × 889.Simulation outputs were resampled to the comparison grid using a spline interpolation of order 3.

C. FULLWAVE
Fullwave2 3D solves the wave equation with quadratic nonlinearity and multiple relaxations using a staggered-grid FDTD approach with fourth-order accuracy in time and variable accuracy in space. 39,40This model uses a staggered-grid Cartesian mesh with a convolutional PML at the boundaries, utilizing high-order adaptive stencils that minimize dispersion and dissipation errors.The source and output can take the shape of any arbitrary geometry that can be defined on a Cartesian grid, with sources modeled either as free-field particle displacement, velocity, or a monopole pressure source.
For the benchmark comparison, the bowl and piston geometries were modeled as monopole pressure sources on a Cartesian grid, emitting a continuous sinusoidal wave.All benchmarks were computed with 12 PPW and 60 PPP, giving a Courant-Fredrichs-Lewy condition (CFL) 41 of 0.2.This created a simulation grid 2× the size of the comparison grid.To account for this, the simulations were run with a spatial step size of 2 voxels in each direction, downsampling the output grid to the comparison grid size.The output over one steady-state cycle was then scaled based on the CFL and driving signal to account for the use of additive sources.

D. GMFDTD
The GMFDTD model simulates acoustic wavepropagation based on coupling of the second-order acoustic and viscoelastic wave-equations using a combined grid method and FDTD method.The model operates using a regular Cartesian mesh.For fluid simulations (as described in this work), GMFDTD solves the acoustic waveequation using a FDTD approach with fourth-order spatial and second-order time stencils.First-order absorbing boundary conditions are used on exterior boundaries of  42 .All simulations were computed using 12 PPW and 75 PPP.Grid sizes for the simulations were 576 × 376 × 376 grid points for benchmarks 1 to 7, 996 × 776 × 856 for benchmark 8, and 944 × 992 × 832 for benchmark 9.The grid sizes include a 48 grid point absorbing layer surrounding the domain, which had attenuation linearly increasing from zero to 50 Np/m corresponding to about 94% amplitude attenuation for a normally incident reflected wave.Simulations were computed for 3,900 time steps for benchmarks 1 to 7, 15,075 time steps for benchmark 8, and 15,075 time steps for benchmark 9.The simulations produced a complex valued steady state pressure field which was resampled to the comparison grid using spline interpolation before computing the pressure amplitude and the phase angle.

E. HAS
The hybrid angular spectrum (HAS) method is a generalization of the angular spectrum method, enabling propagation of pressure fields in heterogeneous media. 43,44An initial pressure distribution is first defined on a plane perpendicular to the direction of propagation.To produce the full 3D steady-state pressure field, pressures on subsequent planes are calculated in the spatial-frequency domain by solving the Helmholtz equation using the angular spectrum method.Errors due to local variations in attenuation and acoustic velocity are corrected for using a spatial step between each spatial-frequency step.Reflected pressures are saved, backpropagated, and summed with the incident pressure field, and this process is repeated until convergence to produce the final steady-state pressure field.
Initial pressure fields were computed using the fast near-field method as implemented in the FOCUS toolbox (see Sec. II F).Benchmarks 1 to 6 were computed using a grid size of 1001 × 1001 × 1001 with 6 PPW in the transverse directions and 24 PPW in the axial direction.Benchmarks 7 and 8 were computed using a grid size of 1401 × 1401 × 501 with an isotropic resolution of 12 PPW.Benchmark 9 was computed using a grid size of 1201 × 1201 × 501 with an isotropic resolution of 12 PPW.Calculated pressure fields were resampled to the comparison grid using bilateral interpolation.

F. JWAVE
JWAVE simulates the solution of time-harmonic wave propagation problems by solving the heterogeneous Helmholtz equation in the complex domain, using a regular Fourier spectral collocation method and linear iterative solvers such as restarted GMRES. 45Absorbing boundary conditions are enforced using a PML, 46 while the definition of the sources is done by projecting them on the discrete collocation grid by approximately convolving them with the band-limited interpolant. 47The source field is modeled as a mass source.JWAVE is a Python software written using JaxDF, 48 which in turn is based on JAX. 49The code is just-in-time compiled for the hardware at hand (e.g.GPUs or TPUs) and allows for automatic differentiation to be applied with respect to any continuous parameter.
Benchmarks 1 and 2 were computed using 6 PPW, while benchmarks 3 to 7 were computed using 12 PPW.The PML size was fixed to 30 voxels.To reduce the com-Benchmark problems for transcranial ultrasound putation time of the FFTs, the domain dimensions were padded to the nearest integers with prime factors smaller than 7.When required, the results were resampled to the intercomparison grid using Fourier interpolation.Benchmarks 8 and 9 were too large for the available computational resources, so results for these benchmarks were not computed.

G. KWAVE
k-Wave solves three coupled equations equivalent to a generalized Westervelt equation, where spatial gradients are calculated using a Fourier collocation spectral method, and time integration is performed using a dispersion-corrected finite-difference scheme. 50,51Calculations are performed on a regular Cartesian mesh with a space and time staggered grid.A split-field PML is used to absorb the waves at the domain boundaries.Sources are modeled as free-field monopoles (injection of mass) using a staircase-free formulation to represent the bowl and piston geometries, 47 and a dispersion-corrected timestepping scheme. 52enchmarks 1 to 6 were computed using the axisymmetric version of k-Wave to provide a high-resolution reference simulation. 53Benchmarks 1, 2, 3, and 5 used 60 PPW and 2400 PPP, while benchmarks 4 and 6 used 60 PPW and 6000 PPP.In both cases, the total grid size was 2700 × 864 grid points including the PML and the simulation time was 120 µs, giving 144,000 and 360,000 time steps, respectively.Benchmarks 7 to 9 were computed using the 3D version of k-Wave optimized for highperformance computing clusters. 54Benchmark 7 used 30 PPW and 1200 PPP, with a grid size of 1296 × 768 × 768 grid points and 72,000 time steps (120 µs simulation time).Benchmarks 8 and 9 used 18 PPW and 360 PPP with 72,000 time steps (400 µs simulation time).The grid sizes were 1458 × 1080 × 1200 and 1350 × 1440 × 1152, respectively.The simulation times were sufficient to reach steady state and were chosen via a convergence test.All simulations used a grid spacing that was an integer division of the comparison resolution (0.5 mm), thus simulations outputs were resampled to the comparison grid using decimation.

H. MSOUND
mSOUND solves the Helmholtz equation with the absorption term for linear acoustics cases. 55For layered media, the conventional angular spectrum approach coupled with the analytical plane wave transmission and reflection coefficients is used.For arbitrarily heterogeneous media, a split-step Fourier method with interpolation is used.Calculations are performed on a regular Cartesian mesh in space.A non-reflecting layer can be used to reduce the spatial aliasing error.Sources are modeled by assigning the complex pressure distribution on the initial plane.In these simulations, the initial plane pressure fields were obtained by FOCUS, as mSOUND currently only considers the pressure-release boundary condition (p=0) for the region outside the source.All benchmarks were computed using the function Forward3D_fund.Benchmark 1 and 2 were computed using 6 PPW in all directions.Benchmarks 7 to 9 were computed using 12 PPW in all directions.Benchmarks 3 and 4 were computed using 6 PPW in the lateral directions, and 48 PPW in the axial (propagation) direction.Benchmarks 5 and 6 were computed using 6 PPW in the lateral directions, and 24 PPW in the axial direction.For benchmarks 3 to 9, simulation outputs were down-sampled to the comparison grid.

I. OPTIMUS
OptimUS is a full wave solver based on the boundary element method (BEM). 56The BEM employs the Green's function of the Helmholtz equation to reformulate the volumetric wave problem into a boundary integral equation at the interfaces of piecewise homogeneous domains embedded in free space. 57Benchmarks 3, 5 and 7 were modeled with the PMCHWT formulation, 58 benchmarks 4 and 6 were solved with a multi-trace formulation, 59 and a nested version of the PMCHWT formulation solves benchmarks 8 and 9.The numerical discretization leads to a dense system of linear equations, whose computational footprint is reduced through hierarchical matrix compression. 60The convergence of the iterative GMRES linear solver was improved with OSRC preconditioning. 61All models were implemented in Python, using version 3 of the open-source BEMPP library. 62The triangular surface meshes were created with Gmsh 63 for benchmarks 3 to 6 and using Meshmixer 64 for benchmarks 7 to 9.
The size of the mesh elements was specified as 4.3 PPW (0.7 mm) in benchmark 3, 6 PPW (0.5 mm) in benchmarks 4 and 5, 4 PPW (0.75 mm) for benchmark 6, and 10 PPW (0.3 mm) for benchmark 7. A compromise in terms of memory requirements and accuracy of results had to be sought on benchmarks 8 and 9, and a value of 4 PPW (0.75 mm) was used on the skull mesh in the vicinity of the transducer with a value of 2.4 PPW (1.25 mm) elsewhere.The bowl and piston transducers were implemented using a Rayleigh integral formulation, consisting of a summation of evenly spaced monopole radiators positioned on their surface.The transducer surfaces were discretized using 23 and 6 monopole sources per wavelength in water for benchmarks 1 to 7 and benchmarks 8 and 9, respectively.In cases where the position of monopole sources coincided with a field evaluation point, NaN was assigned to the acoustic pressure.The acoustic field was evaluated from the surface potentials by interpolation for points on, or very close to, the material interface and with Green's functions for points in the material volume.

J. SALVUS
Salvus solves the second-order linear wave equation in the time-domain and can handle acoustic and elastic media. 65It utilizes a matrix-free implementation of the continuous-Galerkin spectral-element method 66 and an explicit second-order Newmark time-stepping scheme.The computational domain is discretized using unstructured conforming hexahedral meshes, 67 which enable the exact representation of interfaces and discontinuities in the tissue parameters.Absorbing boundaries are imposed using the first-order Sommerfeld radiation condition in addition to sponge layers. 68The transducers are modeled as a collection of monopole point sources distributed evenly over the surface of the transducer.
Spectral elements of order 4 were utilized for all simulations; this corresponds to 125 nodes per element.Due to the interfaces being represented precisely using hexahedral meshes generated within Coreform Cubit 2021.5, 69 utilizing 2 to 3 elements per wavelength for all benchmarks proved to be sufficient.The maximum pressure distributions were computed by propagating the wavefield in the time domain and then applying the onthe-fly temporal Fourier transform. 70All simulation results were output on the same hexahedral discretizations used as inputs and were subsequently resampled onto the comparison grid using fourth-order Lagrange polynomials in the spectral-element basis.

K. SIM4LIFE
Sim4Life solves acoustic pressure wave equations (linear, or Westervelt-Lighthill, which considers dispersion and frequency mixing), using a multi-GPUaccelerated FDTD method on adaptive, rectilinear meshes (to adapt grid-steps to the local wavelength and refine relevant geometric features) with cell-centered pressure degrees-of-freedom.Flux conserving virtual auxiliary points are used to improve accuracy at interfaces and boundaries, and PMLs -according to the stretched coordinate formulation 71 -are used to avoid reflections at domain boundaries (for more details on the numerical methods, see 72 ).Results can be recorded as phasors (at the base frequency and, if relevant, higher harmonics) or transient 3 + 1D fields, and the solver has been verified and validated, 73 also for transcranial focused ultrasound modeling. 74,75The original hard sources (imposed pressure; sinusoidal with rise-time, or user-defined transient profiles) were extended for the purpose of this work by soft sources (cosine function to avoid slowly decaying low frequency components).
The present benchmarks were simulated using isotropic voxel meshes (24 voxels per wavelength, 0.125 mm resolution) over the prescribed simulation domain padded with 96 layers of inhomogeneous PML, with a time step chosen to satisfy the CFL stability criterion (0.026 us or 76.9 PPP with bone, 0.048 us or 41.7 PPP without).50 periods were simulated for benchmarks 1-7 (561 × 561 × 961 voxels), while 200 periods were simulated for benchhmark 8 (1521 × 1361 × 1801) and benchmark 9 (1473 × 1793 × 1697).To facilitate comparison, voxeling was offset by half a cell compared to the defined transducer surface, such that transducer grid points cor-respond to voxel cell centers and material interfaces to voxel faces.

L. STRIDE
Stride solves the second-order, isotropic, linear acoustic wave equation using a FDTD approximation over a rectangular Cartesian grid. 76Spatial derivatives are calculated using a 10th-order finite-difference approximation, while time integration is performed using a 4th-order time-stepping scheme optimised for increased stability. 77Acoustic waves at the boundaries are absorbed using either a sponge absorbing boundary 78 or a complex frequency-shifted PML. 79Sources are introduced as free-field monopoles, which can be defined at locations both on and off the grid. 80enchmarks 1 to 7 were computed using 24 PPW and 120 PPP, resulting in a grid size of 1061 × 661 × 661 including absorbing boundaries.Benchmarks 8 and 9 were computed using 18 PPW and 90 PPP, with grid sizes of 1451 × 1121 × 1241 and 1373 × 1445 × 1205, respectively.A complex frequency-shifted PML was used as the absorbing boundary for all benchmarks.Computed results were resampled onto the comparison mesh using linear interpolation.

A. Field characteristics
Representative simulation results for all benchmarks are given in Figs. 3 and 4.These illustrate the pressure amplitudes over the comparison domains given in Table II.The beam shapes for benchmark 1 and 2 are characteristic of focused bowl and unfocused piston transducers.The introduction of a flat skull bone with a single layer (benchmark 3) or multiple layers (benchmark 4) causes a drop in the focal pressure.Hot-spots (localized regions of increased pressure) are introduced on the skull surface, and the reflected waves generate a complex interference pattern between the transducer and the skull.For the focused bowl transducer (PH1-BM3-SC1 and PH1-BM4-SC1), the reflected waves also generate a secondary focus near the rear surface of the transducer.When a curved skull is used (benchmarks 5 and 6), the hot-spots and secondary focus are reduced.For all benchmarks with the piston transducer, a distinct last-axial maximum is no longer present after the introduction of the skull.Instead, the spatial peak pressure is typically either inside or immediately adjacent to the skull bone, and the acoustic beam gradually diverges after the skull surface.The introduction of a more complex skull geometry in benchmarks 7 to 9 generates additional features in the pressure fields.For benchmarks 7 and 8, the internal occipital protuberance of the skull bone causes a noticeable deflection of the acoustic beam.The use of the whole skull for benchmarks 8 and 9 also introduces small amplitude reflections from the opposite skull surface (e.g., see PH1-BM8-SC2 in Fig. 4).
Benchmark problems for transcranial ultrasound

B. Difference metrics
Aggregated difference metrics are given in Figs.5-7.These were calculated by comparing each model with every other model in a cross-comparison, and then computing the metrics described in Sec.II G.The box plots (generated using boxchart in MATLAB) illustrate the minimum, maximum, median, and first and third quar-tiles, along with any outliers.The same metrics were also computed for each model and benchmark using KWAVE as a reference.This reference was used due to the very high spatial and temporal sampling possible for the KWAVE simulations, particularly for benchmarks 1 to 6 which allowed an axisymmetric formulation to be used.Field plots, axial and lateral profiles, difference plots, and summary tables against FOCUS (for benchmarks 1 and 2) and  II.
KWAVE (for benchmarks 1 to 9) for each model are given in the Supplementary Material.These outputs are grouped both by benchmark and by model for ease of reference.Note, the simulation results and the comparison codes are freely available, 8,9 thus it is straightforward to generate other comparisons as required, or add new modeling results to the intercomparison.Figure 5 gives a summary of the L ∞ and L 2 intercomparison metrics computed across the comparison domains outlined in Table II.Results are presented for each benchmark (summarizing the cross-comparison results across all codes) and for each code (summarizing the cross-comparison results across all benchmarks).For benchmark 1 and 2 (water and water with artificial absorption), the level of agreement is very high.For the bowl transducer, seven models have L ∞ values of less than 1% when compared to FOCUS, and all values are less than 10% (see Supplementary Material).For the piston transducer, the simulations are slightly less accurate.Four models have L ∞ values of less than 1% when compared to FOCUS, and the maximum L ∞ value against FOCUS is 15%.Examining the difference plots (see Supplementary Material), the largest differences are in the complex near-field pattern close to the transducer surface where the pressure varies rapidly.
For benchmarks 3 to 9, the L ∞ and L 2 metrics both increase noticeably, with median values for the crosscomparison between 10% and 100% (Fig. 5(a)).There still close agreement between some models, for example, three models have median L ∞ values less than 10% Benchmark problems for transcranial ultrasound    across all benchmarks when compared to KWAVE (see Fig. 5(b)).However, in general, the differences are larger than those found for benchmarks 1 and 2. Examining the difference plots (see Supplementary Material), the largest variations are in the region between the transducer and skull bone.These arise due to a combination of errors in modeling the near-field of the transducer, even in freefield (described above), along with errors in modeling the reflection from the bone and soft-tissue surfaces (e.g., due to errors in the positions of the interfaces, and amplitude and phase errors in the reflected waves).Overall, the L ∞ and L 2 intercomparison metrics demonstrate that, on a pixel-by-pixel basis, there are often large variations between the model outputs.This is true, despite there being no uncertainty in the material parameters and transducer characteristics.This highlights the inherent uncertainties when using computational models for transcranial ultrasound simulation, which must be considered when interpreting model results.
Figures 6 and 7 give a summary of the intercomparison metrics for the focal position, size, and pressure.Despite the variations in the full-field error norms discussed above, there is very close agreement in the focal metrics.When compared by benchmark, the median values for the difference in focal pressure are all less than 10% (see Fig. 6(a)).Similarly, when compared by code, 10 out of 11 models have median differences less than 10%.Differences of this level are on par with experimental repeatability and reproducibility measurements conducted using similar ultrasound transducers and a range of hydrophones. 81Compared to KWAVE, seven models have maximum differences in the focal pressure across all benchmarks of less than 10%, and five models have median differences across all benchmarks on the order of 1% or less (see Fig. 6(b)).Considering the focal position, all values including outliers are within 2.5 mm (see Fig. 6(a)), with median values for all benchmarks of 1 mm or less.Compared to KWAVE, the median values for all models are less than 0.5 mm, with seven models having a median value of 0 mm (see Fig. 6(b)).
Figure 7 gives a summary of the intercomparison metrics for focal size.Note, as mentioned in Sec.II G, the axial focal size for the piston transducer (SC2) for benchmarks 3 to 9 is not calculated as there is no focus after propagation through the skull.For reference, in water (BM1), the axial and lateral focal size for the focused bowl transducer is 26.2 and 4.1 mm, respectively, and the lateral focal size for the piston transducer at x = 60 mm is 13.2 mm.For all benchmarks, the median differences in the axial focal size for the focused bowl transducer are less than 0.6 mm (Fig. 7(a)), although there are a small number of outliers with differences up to 2.3 mm.The median differences in the lateral focal size for the focused bowl transducer for all benchmarks is 0.2 mm or less.Variations in the lateral focal size for the piston transducer are generally larger, noting the lateral focal size is also larger for this transducer.Similar results are evident for the comparison against KWAVE (Fig. 7(b)).
Overall, there is very close agreement for all benchmarks in the characteristics of the focal pressure field after propagation through the skull bone.Larger differences are evident in the full-field metrics, dominated by differences in the field between the transducer and the skull.The most relevant metrics to compare, along with acceptable limits on the differences between models, depends strongly on the intended application of the computational results.For example, calculating phase delays, calculating the approximate position and size of the acoustic focus in the brain, and calculating the pressure in the skin and skull to subsequently estimate skull heating may each have different constraints and accuracy requirements.An analysis of these factors is beyond the scope of the current work.However, it is hoped that the benchmarks and computational results presented here may help to facilitate such investigations in the future.

V. SUMMARY
A series of numerical benchmarks relevant to transcranial ultrasound simulation are presented, along with intercomparison results for 11 modeling tools used in the community.The intercomparison results show close agreement between the models, particularly for the position, size, and magnitude of the acoustic focus after propagating through the skull.When comparing each model with every other model in a cross comparison, the median values for the difference in focal pressure and focal position are less than 10% and 1 mm for all benchmarks.These results build confidence in the use of computational modeling to support transcranial ultrasound therapies.The benchmark definitions and associated data files, simulation results, and codes to compute the intercomparison metrics are all freely available. 8,9This allows the results to be replicated or further analysis to be conducted.Additional model results can also be easily added to the intercomparison, for example, to validate newly developed solvers.More generally, the intercomparison exercise provides a framework for creating benchmarks and performing model cross-comparisons.Further phases of the intercomparison exercise are currently under discussion, including benchmarks for elastic wave models, and model comparisons when using material parameters derived from CT images.

VI. DESCRIPTION OF SUPPLEMENTARY MATERIAL
SuppPub1.xlsx[URL to be inserted by AIP] gives an alternate table form of the model summaries given in Sec.III.SuppPub2.zip[URL to be inserted by AIP] provides summaries of the comparison results (including metrics, field plots, axial profiles, and difference plots) for each model compared against FOCUS (for benchmarks 1 and 2) and KWAVE (for benchmarks 1 to 9).The .zip file contains separate pdf files for each model and for each benchmark, as a well as a summary of the crosscomparison metrics.The raw data files and MATLAB codes to process the results also freely available. 8,9nchmark problems for transcranial ultrasound

FIG. 2 .
FIG. 2. Simulation layouts for benchmarks 8 (top row) and 9 (bottom row) showing the central x-y and x-z slices.The position of the bowl transducer is shown for reference.Benchmark 7 (shown in Fig. 1) uses a subset of the skull mask and the same relative transducer position as benchmark 8, with a reduced comparison domain size as shown with the dashed line.The material properties used are given in TableI.

FIG. 3 .
FIG. 3. Pressure amplitudes computed using KWAVE for benchmarks 1 to 6 showing x-y slices through the central z-plane for a comparison domain of 120 mm (axial) by 70 mm (lateral).

FIG. 4 .
FIG.4.Pressure amplitudes computed using KWAVE for benchmarks 7 to 9 showing x-y (left) and x-z (right) slices through the location of the peak pressure.The approximate location of the skull is shown with the white overlay.The size of the comparison domain for each benchmark is given in TableII.

FIG. 5 .
FIG. 5. Summary of relative L ∞ and L 2 difference metrics computed across the entire field taken from the exit plane of the transducer.(a) Cross comparison (all codes compared with all codes).(b) Comparison with KWAVE.

FIG. 6 .
FIG. 6. Summary of focal (peak) pressure and focal position metrics.(a) Cross comparison (all codes compared with all codes).(b) Comparison with KWAVE.

FIG. 7 .
FIG.7.Summary of axial and lateral focal size metrics.Note, axial focal size was not computed for benchmarks 3 to 9 when using the plane piston source (SC2), as the field in this case did not have an axial maximum in the post-skull region.(a) Cross comparison (all codes compared with all codes).(b) Comparison with KWAVE.

TABLE II .
Summary of benchmarks in Phase 1 of the intercomparison.SC1 corresponds to the focused bowl transducer and SC2 the plane piston transducer.Outputs are resampled to a regular Cartesian mesh with a grid spacing of 0.5 mm.Simulation layouts are shown in Figs.1 and 2. gp = grid points.

TABLE III .
Difference metrics used for the intercomparison.

TABLE IV .
Summary of models used to calculate the benchmark results.Additional details given in the Supplementary Material.Authors correspond to the authors of the current manuscript directly contributing to the intercomparison exercise, not necessarily the authors of the model.
the simulation domain.A finite thickness absorbing layer was placed on the exterior boundaries to further reduce acoustic reflections.A heterogeneous Neumann boundary condition is used to model the sound sources making the source-medium interface work as an acoustically hard reflector for incoming sound waves.For a more thorough description of the model, see Ref.