A new method for solving the wave equation is presented, called the learned Born series (LBS), which is derived from a convergent Born series but whose components are found through training. The LBS is shown to be significantly more accurate than the convergent Born series for the same number of iterations in the presence of high-contrast scatterers, while maintaining comparable computational complexity. The LBS is able to generate a reasonable prediction of the global pressure field with a small number of iterations, and the errors decrease as the number of learned iterations increases.

## 1. Introduction

Computational solvers for the Helmholtz equation with heterogeneous material properties are used in many fields that require single-frequency wave simulations, including optics, seismology, acoustics, and electromagnetics. One approach is to compute the terms of the Born series (Born, 1926) through the recursive application of the known Green's operator for a homogeneous medium. However, in its conventional form, the Born series converges only for low-contrast scattering potentials, a major limitation in many practical applications. As an example, the application that motivated this work is the propagation of ultrasound waves through the skull, which has a sound speed about double that of the surrounding soft tissue. A convergent Born series (CBS) was proposed by Kleinman et al. (1988, 1990) based on a generalised overrelaxation method. More recently, Osnabrugge et al. (2016) introduced another CBS through the use of a preconditioner based on the scattering potential, which has since found many applications (Jakobsen et al., 2020; Kaushik et al., 2020; Krüger et al., 2017; Lee et al., 2022). However, for strongly scattering media many iterations are needed, as the convergence rate is limited by the range of spatial wavenumbers in the problem. To overcome the slow convergence rate, we present a learned Born series (LBS), in which a generalised preconditioner and the modified Green's operator are found, for a given number of unrolled iterations (Monga et al., 2021), through training. We show that the convergence rate in the presence of high-contrast scatterers is considerably improved.

## 2. Learned Born series

Consider the heterogeneous Helmholtz equation $(\nabla^2 + k^2)u = -s$, where $k = \omega/c_0$ is the wavenumber, $\omega$ is the angular frequency, and $c_0$ is the spatially varying sound speed. To solve this equation, following Osnabrugge et al. (2016), we start by defining the scattering potential $v = k^2 - \kappa^2 - i\epsilon$, where $\kappa^2, \epsilon \in \mathbb{R}^+$ are positive scalars, and rewrite the differential equation as $(\nabla^2 + \kappa^2 + i\epsilon)u = -(s + vu)$.
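As a concrete illustration, the scattering potential and the choices of $\kappa^2$ and $\epsilon$ that bound it can be computed from a sound-speed map in a few lines. This is a minimal numpy sketch; the function name is illustrative, while the choices of $\kappa^2$ and $\epsilon$ follow Osnabrugge et al. (2016):

```python
import numpy as np

def scattering_potential(c, omega):
    """Scattering potential v = k^2 - kappa^2 - i*eps for a sound-speed map c.

    kappa^2 is centred between the extreme values of k^2, and eps is set to
    the maximum of |k^2 - kappa^2|, the choices used by Osnabrugge et al.
    (2016) to guarantee convergence of the preconditioned series.
    """
    k2 = (omega / c) ** 2
    kappa2 = 0.5 * (k2.max() + k2.min())
    eps = np.abs(k2 - kappa2).max()
    return k2 - kappa2 - 1j * eps, kappa2, eps

# binary medium with a 2:1 sound-speed contrast, as in the skull example
c = np.ones((128, 128))
c[40:80, 40:80] = 2.0
v, kappa2, eps = scattering_potential(c, omega=1.0)
assert np.all(np.abs(v.real) <= eps)  # real part bounded by eps
```

Centring $\kappa^2$ between the extreme wavenumbers minimises the required $\epsilon$, which directly controls the convergence rate of the series.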

Inverting the constant-coefficient operator on the left-hand side with the homogeneous Green's operator $G$ and recursively substituting $u$ leads to the classical form of the Born series, $u_{n+1} = G(vu_n + s)$, which only converges for weakly scattering media. Osnabrugge et al. (2016) restored convergence for arbitrary contrast by preconditioning the update with $\gamma = (i/\epsilon)v$, giving the CBS iteration $u_{n+1} = \gamma G(vu_n + s) + (1 - \gamma)u_n$ for a suitable choice of $\epsilon$. Despite the convergence guarantees of this preconditioned series, the convergence rate is inversely proportional to the maximum wavenumber difference between any two points in the domain. This means that for media with high sound-speed contrast, a large number of iterations is necessary.

We propose to replace each component of the CBS iteration with a learned counterpart, giving the learned update $u_{n+1} = AG(Bu_n + s) + Cu_n$, where $A$ is the learned version of the CBS preconditioner $\gamma = (i/\epsilon)v$, $B$ is the learned version of the scattering potential $v$, and $C$ is now a general additive term (that is not enforced to be $1 - A$). Each of $A$, $B$, and $C$ is the output of a neural network that depends on the input sound speed and spatial location maps, i.e., $A = A_\theta(c, x)$, and similarly for $B$ and $C$. We use $\theta$ to denote the learnable parameters for each operator at each iteration, without implying that they are the same set. The learned fields $(A, B, C)$ are returned by three fully connected neural networks (FCNNs) acting channel-wise on the $(c, x)$ map, also known as $1 \times 1$ convolutions, with two hidden layers. The size of the hidden layers is equal to the number of channels. Similar to convolutional networks, we generalise the learned update to $c$ channels by associating the action of each $(A, B, C)$ field with a matrix product; for example, if $u \in \mathbb{R}^c = (u_1, u_2, \ldots, u_c)$, then $(Au)_i = \sum_j A_{ij}u_j$. Note that the values of $(A, B, C)$ change at each pixel location.
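The per-pixel matrix product $(Au)_i = \sum_j A_{ij}u_j$ vectorises naturally as a batched contraction. A minimal numpy sketch (array shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
ny, nx, c = 4, 5, 8
A = rng.normal(size=(ny, nx, c, c))   # one c-by-c mixing matrix per pixel
u = rng.normal(size=(ny, nx, c))      # c-channel field

# (A u)_i = sum_j A_ij u_j, applied independently at every pixel location
Au = np.einsum('yxij,yxj->yxi', A, u)

# reference: explicit loop over pixels
ref = np.empty_like(u)
for y in range(ny):
    for x in range(nx):
        ref[y, x] = A[y, x] @ u[y, x]
assert np.allclose(Au, ref)
```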

Rather than fixing $\epsilon$, we parameterise the Green's operator as $\hat{G} = F^{-1}\,\mathrm{diag}\!\left[(\alpha|p|^2 - \kappa^2)^{-1}\right]F$, where $\alpha$ and $\kappa^2$ are learnable parameters and $p$ is the spatial frequency; this is equivalent to the original operator up to a scaling factor (that is absorbed by the $A$ term). Similarly, the input to the Green's operator is a $c$-dimensional field, therefore the Green's function is a mapping $\mathbb{R}^c \to \mathbb{R}^c$, and thus $\alpha$ and $\kappa^2$ are $c \times c$ matrices, where $c$ is the number of channels. Last, we apply a projected non-linearity of the form $\hat{\sigma}(u) = D_1\sigma(D_2 u)$, where $D_1$ and $D_2$ are two dense layers acting channel-wise and $\sigma$ is a non-linear function, in our case a GELU (Hendrycks and Gimpel, 2016). The final learned iteration is therefore $u_{n+1} = \hat{\sigma}_n(\hat{M}_n u_n)$, where $\hat{M}_n$ denotes the learned update $u \mapsto AG(Bu + s) + Cu$ and the subscript $n$ highlights that $\hat{\sigma}$ and $\hat{M}$ have different parameters at each iteration. We set $u_0 = 0$. We note the similarity between this learned iteration and the Fourier neural operator (FNO) framework (Li et al., 2020), where the network is set up as the sum of a Fourier filter and a local convolution operator, $u_{n+1} = \sigma(Ku_n + Cu_n)$, where $K$ is a learnable Fourier filter and $C$ is a learnable local convolution filter. Similar to the FNO, we pad the input by a fixed amount to accommodate the artifacts generated by the periodicity assumed by the discrete Fourier transform, and afterwards unpad the output. A block diagram of the network is given in Fig. 1.
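To make the structure of one learned iteration concrete, a schematic numpy sketch is given below. This is not the trained architecture: all parameter names and shapes are illustrative assumptions, the spectral filter is taken as a scalar per frequency for simplicity (in the method described above it is a $c \times c$ mapping), and the GELU is applied elementwise to the complex field:

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the GELU (Hendrycks and Gimpel, 2016)."""
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def learned_iteration(u, s, A, B, C, g_hat, D1, D2):
    """One LBS-style update followed by the projected non-linearity.

    u, s:    (ny, nx, c) complex fields (state and source)
    A, B, C: (ny, nx, c, c) learned per-pixel mixing fields
    g_hat:   (ny, nx) learned spectral filter of the Green's operator
    D1, D2:  (c, c) dense channel-wise layers of sigma_hat
    """
    mix = lambda M, f: np.einsum('yxij,yxj->yxi', M, f)
    # Green's operator applied in the Fourier domain to B*u + s
    Gf = np.fft.ifft2(g_hat[..., None] * np.fft.fft2(mix(B, u) + s, axes=(0, 1)),
                      axes=(0, 1))
    z = mix(A, Gf) + mix(C, u)        # learned Born update
    return gelu(z @ D2.T) @ D1.T      # sigma_hat(z) = D1 sigma(D2 z)
```

Unrolling $n$ such layers, each with its own parameters and starting from $u_0 = 0$, gives the network of Fig. 1.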

## 3. Training and results

A dataset was constructed from 2000 MNIST (Deng, 2012) samples upsampled to 128 × 128 pixels, which were rescaled and used as sound speed distributions. MNIST is a dataset of binary images of the digits 0 to 9 that is widely used in machine learning research. Here, we use it to create a diverse range of highly scattering spatial distributions. The two-dimensional (2D) simulations were performed in normalised units with a background sound speed of 1 m.s^{–1}, a maximum sound speed of 2 m.s^{–1}, and *ω* = 1 rad.s^{–1} (giving $2\pi$ grid points per wavelength). The dataset was split between training, validation, and test subsets in 80%, 10%, and 10% proportions. For each sound speed map, a reference solution was generated using the Helmholtz solver in j-Wave (Stanziola et al., 2023) using GMRES with a fixed source position. The loss function was the mean squared error against the reference solution. All networks were trained with a batch size of 16 and a learning rate of $3 \times 10^{-3}$; the network corresponding to the epoch with the lowest validation loss was saved, although we did not observe any over-fitting during training. The network was trained for 6, 12, and 24 unrolled iterations, with *c* = 8. Here, by unrolled iterations, we mean the value of *n* in Fig. 1, which is analogous to the number of leading terms retained in the truncated scattering series. We also trained an extremely light version with *c* = 2 (matching the number of components of a complex field, as in the original CBS) with linear activation functions.
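The mapping from a digit image to a sound-speed map can be sketched as follows. Nearest-neighbour upsampling is used here for simplicity; the interpolation used for the actual dataset, and the function name, are illustrative assumptions:

```python
import numpy as np

def to_sound_speed(img, c_min=1.0, c_max=2.0, size=128):
    """Upsample a binary 28x28 digit image and rescale it so that the
    background is c_min and the digit is c_max (normalised units)."""
    reps = -(-size // img.shape[0])                  # ceiling division
    big = np.kron(img, np.ones((reps, reps)))[:size, :size]
    return c_min + (c_max - c_min) * big

digit = (np.random.default_rng(0).random((28, 28)) > 0.8).astype(float)
c_map = to_sound_speed(digit)
```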

The simulated pressure field for a representative example from the test set using a network trained with 12 iterations is shown in Fig. 2(a). Even with a very small number of iterations, the LBS is able to generate a reasonable prediction of the global pressure field. The errors against j-Wave for the whole test set are shown in Fig. 3(a). As expected, the errors decrease with the number of learned iterations. Compared to the CBS run for the same number of iterations, the error is significantly reduced. This is because the pseudo propagation speed in the CBS is limited due to the strong contrast, as shown in Fig. 3(b).

The outliers correspond to the speed of sound maps that exhibit resonant modes, e.g., “0” digits for which the field exhibited a mode-like structure similar to whispering gallery modes (Stanziola et al., 2021). The error for the CBS was also proportionally higher for these examples. To match the same error on the test set of the LBS using 6, 12, and 24 iterations, on average the CBS required 13.1, 8.6, and 5.0 times more iterations, respectively [see Fig. 3(b)]. Even when the number of channels is reduced to 2 (equivalent to the complexity of the CBS), the convergence of the LBS is still much faster than the CBS [see yellow boxplot in Fig. 3(a)].

## 4. Summary and discussion

We present a learned Born series, inspired by the convergent Born series of Osnabrugge et al. (2016), that significantly improves convergence rates for single-frequency wave simulations with strong sound speed contrast. While for channel counts *c* > 2 the LBS requires more fast Fourier transforms (FFTs) at each layer than the CBS, these FFTs can be computed in parallel, which keeps the method efficient in practice. While we have only shown results for a single source position, results for more complex sources can be obtained by superposition, or by retraining with a given source geometry (see Stanziola et al., 2021). The code for reproducing our results is available online (Stanziola, 2023).

In the future, it would be interesting to explore several potential extensions of this work. One possibility is to train on sub-sampled problems that are otherwise unsolvable with classical methods in order to further reduce the computational complexity of the LBS. This approach could have applications in areas such as transcranial treatment planning and full waveform inversion. Another interesting direction would be to apply the method to other partial differential equations (PDEs) for which a scattering potential can be defined, such as the Schrödinger equation.

One potential way to improve the LBS would be to substitute the fixed padding with a learnable “perfectly matched layer”, instead of feeding in the spatial grid $x$. This could lead to an architecture that is resolution- and grid-size-independent, making it a neural operator as described in Kovachki et al. (2021). It would also be interesting to train the method as an iterative solver, as described in Stanziola et al. (2021). Finally, it would be interesting to explore whether the LBS can be extended to heterogeneous density; while the method presented in this paper does not support it, neither does the CBS. Supporting heterogeneous density would open up new applications in areas where the density of the medium being modelled is nonuniform, such as ultrasound applications in the presence of bone, or geophysics.

## Acknowledgments

This work was supported by the Engineering and Physical Sciences Research Council (EPSRC), UK, Grant Nos. EP/V026259/1, EP/S026371/1, EP/T022280/1, EP/W029324/1, and EP/T014369/1.