Ambisonics is an established framework for capturing and reproducing spatial sound fields based on the spherical harmonics representation [Gerzon (1973). J. Audio Eng. Soc. **21**(1), 2–10]. A generalization, *spheroidal ambisonics*, based on spheroidal wave functions is proposed for use with spheroidal microphone arrays. An analytical conversion from spheroidal ambisonics to spherical ambisonics is derived to ensure compatibility with the existing ambisonics ecosystem. Numerical experiments verify spheroidal ambisonics encoding and transcoding for spatial sound field recording. The sound field reconstructed from the transcoded coefficients has a zone of accurate reconstruction that is prolonged towards the long axis of a prolate spheroidal microphone array.

## 1. Introduction

Immersive multimedia technologies such as augmented reality (AR) and virtual reality (VR) are receiving much attention. Audio is indispensable in these applications. Presenting plausible AR/VR content and creating immersive experiences requires capturing, processing, and rendering spatial sound fields with high precision. The spatial audio framework of ambisonics (Gerzon, 1973) and its extension, higher-order ambisonics (HOA) (Daniel *et al.*, 2003), have gained popularity along with AR/VR. Two reasons are that this representation can be streamed on standard platforms (Facebook, YouTube) and that it is compatible with first-person-view AR/VR. An ambisonics capture and processing chain consists of a microphone array and signal processing algorithms. These algorithms encode the raw microphone array signals into a spatial description in the spherical harmonics domain, referred to as the ambisonics signal. The ambisonics signal is then decoded into signals that are fed to a loudspeaker array to render the spatial sound field. Such loudspeaker arrays can also be virtualized by means of binaural technologies (Kaneko *et al.*, 2016; Noisternig *et al.*, 2003; Zotkin *et al.*, 2004) and played over headphones. This makes ambisonics highly compatible with AR/VR applications, which usually play back audio over headphones. Because ambisonics is formulated in the spherical harmonics domain, the most natural recording devices for it are spherical microphone arrays (Daniel *et al.*, 2003; Gerzon, 1973; Kaneko *et al.*, 2018; Meyer and Elko, 2002). In this work, we generalize the framework of ambisonics to spheroidal coordinates and define *spheroidal ambisonics*, which uses spheroidal wave functions to represent spatial sound fields. 
The use of spheroidal microphone arrays for sound field recording was claimed in a patent (Elko *et al.*, 2003), but its description was limited to a spherical embodiment. We describe a formulation of prolate spheroidal ambisonics that allows prolate spheroidal microphone arrays to be used analytically. This contrasts with a recently proposed approach that accommodates arbitrarily shaped microphone arrays but relies on numerical simulation to encode the captured field (Zotkin *et al.*, 2017). In addition to the basic formulation, we derive an analytical conversion formula from spheroidal ambisonics to spherical ambisonics. This allows the existing ecosystem around spherical ambisonics to be used after recording spatial audio with a spheroidal microphone array. An overview of the proposed spheroidal ambisonics encoding and transcoding schemes is shown in Fig. 1. Numerical experiments are performed to validate and demonstrate spheroidal ambisonics encoding and transcoding for spatial sound field recording.

## 2. Background: Spherical ambisonics

The conventional framework of ambisonics using spherical basis functions is reviewed here. Ambisonics encoding and decoding can be performed in two ways. The first solves a linear system in the least-squares sense (Daniel *et al.*, 2003); the second uses spherical harmonic transforms computed via numerical integration (Poletti, 2005). The former allows more flexibility in the microphone array configuration and is therefore adopted here. Throughout the paper, only microphone arrays mounted on the surfaces of rigid scatterers are considered. This avoids the instability of encoding filters for hollow microphone arrays, which arises from singularities at the roots of the spherical Bessel functions (Daniel *et al.*, 2003). All formulations are presented in the frequency domain; time-domain representations are obtained by the inverse Fourier transform. The spherical harmonics used are defined as

with *θ* and $\phi$ the polar and azimuthal angles, respectively, and $P_n^m(x)$ and $P_n(x)$ the associated Legendre functions and the Legendre polynomials, respectively,

The above definition of spherical harmonics provides an orthonormal basis,
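$$\int_0^{2\pi}\!\!\int_0^{\pi} Y_n^m(\theta,\phi)\,\left[Y_{n'}^{m'}(\theta,\phi)\right]^{*}\sin\theta\,d\theta\,d\phi = \delta_{nn'}\,\delta_{mm'}, \tag{3}$$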
with $\delta_{ij}$ the Kronecker delta.
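The orthonormality relation can be checked numerically. The sketch below assumes complex spherical harmonics with the Condon-Shortley phase, inherited from `scipy.special.lpmv`, and the symmetry $Y_n^{-m}=(-1)^m\,[Y_n^{m}]^*$. The helper `sph_harm_nm` is a hypothetical stand-in, since the exact normalization convention of this paper is not reproduced here. A Gauss-Legendre grid in $\cos\theta$ with $N+1$ nodes, combined with $2N+2$ equispaced azimuths, integrates all products of harmonics of degree at most $N$ exactly. The Gram matrix should therefore equal the identity up to rounding.

```python
import numpy as np
from math import factorial
from scipy.special import lpmv


def sph_harm_nm(n, m, theta, phi):
    """Complex orthonormal spherical harmonic Y_n^m(theta, phi).

    Assumed convention: theta polar, phi azimuth, Condon-Shortley phase
    (from scipy.special.lpmv), and Y_n^{-m} = (-1)^m conj(Y_n^m).
    """
    am = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi) * factorial(n - am) / factorial(n + am))
    y = norm * lpmv(am, n, np.cos(theta)) * np.exp(1j * am * phi)
    return (-1) ** am * np.conj(y) if m < 0 else y


def gram_matrix(N):
    """Quadrature of conj(Y_n^m) Y_n'^m' over the sphere for all n, n' <= N.

    Rows and columns use the 0-based index n^2 + n + m.
    """
    x, w = np.polynomial.legendre.leggauss(N + 1)   # nodes in cos(theta)
    n_phi = 2 * N + 2                               # equispaced azimuths
    theta, phi = np.meshgrid(np.arccos(x), 2 * np.pi * np.arange(n_phi) / n_phi,
                             indexing="ij")
    weights = np.outer(w, np.full(n_phi, 2 * np.pi / n_phi)).ravel()
    Y = np.stack([sph_harm_nm(n, m, theta, phi).ravel()
                  for n in range(N + 1) for m in range(-n, n + 1)], axis=1)
    return Y.conj().T @ (weights[:, None] * Y)


G = gram_matrix(6)
print(np.max(np.abs(G - np.eye(G.shape[0]))))  # deviation from identity, at rounding level
```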

### 2.1 Encoding in spherical ambisonics

The ambisonics signal $A_{nm}$ is the set of weights of the spherical basis functions that represent an arbitrary three-dimensional field incident on the microphone array. Obtaining it from the signals captured by the microphone array is referred to as ambisonics *encoding*. Consider a spherical microphone array mounted on a rigid sphere of radius *R* centered at *O*, the origin of the spherical coordinate system $(r,\theta,\phi)$. An arbitrary field incident on this array can be expanded in terms of the regular spherical basis functions $j_n(kr)Y_n^m(\theta,\phi)$ of the three-dimensional Helmholtz equation
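$$p_{\mathrm{in}}(r,\theta,\phi) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} A_{nm}(k)\, j_n(kr)\, Y_n^m(\theta,\phi), \tag{4}$$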
with $j_n(x)$ the spherical Bessel function of degree *n* and *k* the wavenumber. The total field $p_{\mathrm{tot}}$, i.e., the sum of the incident field and the scattered field, is given by
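$$p_{\mathrm{tot}}(r,\theta,\phi) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} A_{nm}(k)\left[j_n(kr) - \frac{j_n'(kR)}{h_n'(kR)}\, h_n(kr)\right] Y_n^m(\theta,\phi), \tag{5}$$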
with $h_n(x)$ the spherical Hankel function of the first kind of degree *n*. On the surface of the rigid sphere, i.e., at *r* = *R*, the Wronskian relation $j_n(x)h_n'(x)-j_n'(x)h_n(x)=i/x^2$ reduces this total field to
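$$p_{\mathrm{tot}}(R,\theta,\phi) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} A_{nm}(k)\,\frac{i}{(kR)^2\, h_n'(kR)}\, Y_n^m(\theta,\phi). \tag{6}$$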
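On the rigid sphere, the closed-form radial factor follows from the Wronskian identity $j_n(x)h_n'(x)-j_n'(x)h_n(x)=i/x^2$ with $h_n=j_n+iy_n$. The sketch below checks this numerically by comparing the closed form with the unsimplified incident-plus-scattered term $j_n(x)-j_n'(x)h_n(x)/h_n'(x)$. It uses SciPy's `spherical_jn` and `spherical_yn`, whose third argument selects the derivative. The helper names are illustrative.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn


def hankel1_sph(n, x, derivative=False):
    """Spherical Hankel function of the first kind h_n = j_n + i y_n (or its derivative)."""
    return spherical_jn(n, x, derivative) + 1j * spherical_yn(n, x, derivative)


def rigid_sphere_factor_direct(n, x):
    """Incident plus scattered radial term on r = R: j_n(x) - j_n'(x) h_n(x) / h_n'(x)."""
    return spherical_jn(n, x) - spherical_jn(n, x, True) * hankel1_sph(n, x) / hankel1_sph(n, x, True)


def rigid_sphere_factor_wronskian(n, x):
    """Closed form i / (x^2 h_n'(x)), obtained with the Wronskian identity."""
    return 1j / (x**2 * hankel1_sph(n, x, True))


x = np.array([1.0, 1.98, 5.0])  # sample values of kR
for n in range(13):
    ratio = rigid_sphere_factor_direct(n, x) / rigid_sphere_factor_wronskian(n, x)
    print(n, np.max(np.abs(ratio - 1)))  # relative mismatch per degree n
```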
The total field captured by the *q*-th microphone located at $(R,\theta_q,\phi_q)$ is therefore given by
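$$p_{\mathrm{tot}}^{(q)} = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} \frac{i}{(kR)^2\, h_n'(kR)}\, Y_n^m(\theta_q,\phi_q)\, A_{nm}(k). \tag{7}$$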
Truncating the infinite series at *n* = *N*, the equation can be written in vector form as
$$\mathbf{p}_{\mathrm{tot}} = \boldsymbol{\Lambda}^{\bullet}\mathbf{A}^{\bullet}, \tag{8}$$
where $\mathbf{p}_{\mathrm{tot}}$ is a vector holding $p_{\mathrm{tot}}^{(q)}$ in its *q*-th entry (in this paper, indices are 0-based), $\mathbf{A}^{\bullet}$ is a vector holding $A_{nm}(k)$ in its $(n^2+n+m)$-th entry, and $\boldsymbol{\Lambda}^{\bullet}$ is the “inverse” encoding matrix for rigid-sphere microphone arrays. $\boldsymbol{\Lambda}^{\bullet}$ holds $[i/((kR)^2h_n'(kR))]\,Y_n^m(\theta_q,\phi_q)$ in its $(q,\,n^2+n+m)$ entry. The superscript dot marks variables associated with the spherical case, to distinguish them from the spheroidal case introduced later. The goal of ambisonics encoding is to obtain $A_{nm}(k)$ from the observation $\mathbf{p}_{\mathrm{tot}}$. Typically, this is solved via regularized least-squares minimization of
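$$J(\mathbf{A}^{\bullet}) = \left\|\mathbf{p}_{\mathrm{tot}} - \boldsymbol{\Lambda}^{\bullet}\mathbf{A}^{\bullet}\right\|_2^2 + \sigma\left\|\mathbf{A}^{\bullet}\right\|_2^2, \tag{9}$$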
with *σ* a regularization parameter. The solution is given by
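$$\hat{\mathbf{A}}^{\bullet} = \left(\boldsymbol{\Lambda}^{\bullet H}\boldsymbol{\Lambda}^{\bullet} + \sigma\mathbf{I}\right)^{-1}\boldsymbol{\Lambda}^{\bullet H}\,\mathbf{p}_{\mathrm{tot}} = \mathbf{E}\,\mathbf{p}_{\mathrm{tot}}, \tag{10}$$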
where $\mathbf{E}\equiv(\boldsymbol{\Lambda}^{\bullet H}\boldsymbol{\Lambda}^{\bullet}+\sigma\mathbf{I})^{-1}\boldsymbol{\Lambda}^{\bullet H}$ is the regularized encoding matrix. The regularization parameter *σ* prevents over-fitting, which manifests as an output signal of excessive energy. It can be determined by optimizing a cost metric of the user's choice.
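The following is a minimal sketch of rigid-sphere encoding with the regularized encoding matrix, using a noiseless synthetic observation $\mathbf{p}_{\mathrm{tot}}=\boldsymbol{\Lambda}^{\bullet}\mathbf{A}^{\bullet}$. It assumes complex orthonormal spherical harmonics with the Condon-Shortley phase, a hypothetical stand-in for the paper's definition. The array geometry (16 Gauss-Legendre nodes in $\cos\theta$ by 32 azimuths), $N=4$, and $kR=2$ are illustrative choices, not the paper's pipeline. `np.linalg.solve` is used instead of forming the inverse explicitly; it yields the same $\mathbf{E}$ with better numerical behavior.

```python
import numpy as np
from math import factorial
from scipy.special import lpmv, spherical_jn, spherical_yn


def sph_harm_nm(n, m, theta, phi):
    """Complex orthonormal Y_n^m (assumed convention: Condon-Shortley phase via lpmv)."""
    am = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi) * factorial(n - am) / factorial(n + am))
    y = norm * lpmv(am, n, np.cos(theta)) * np.exp(1j * am * phi)
    return (-1) ** am * np.conj(y) if m < 0 else y


def inverse_encoding_matrix(N, kR, theta_q, phi_q):
    """Lambda^bullet: entry (q, n^2+n+m) = i / ((kR)^2 h_n'(kR)) * Y_n^m(theta_q, phi_q)."""
    cols = []
    for n in range(N + 1):
        dh = spherical_jn(n, kR, True) + 1j * spherical_yn(n, kR, True)  # h_n'(kR)
        radial = 1j / (kR**2 * dh)
        for m in range(-n, n + 1):  # column order n^2 + n + m
            cols.append(radial * sph_harm_nm(n, m, theta_q, phi_q))
    return np.stack(cols, axis=1)


def encoding_matrix(Lam, sigma=0.0):
    """E = (Lam^H Lam + sigma I)^{-1} Lam^H, computed with a linear solve."""
    LamH = Lam.conj().T
    return np.linalg.solve(LamH @ Lam + sigma * np.eye(Lam.shape[1]), LamH)


# Illustrative geometry: 16 Gauss-Legendre nodes in cos(theta) x 32 azimuths = 512 mics.
x, _ = np.polynomial.legendre.leggauss(16)
theta_q, phi_q = np.meshgrid(np.arccos(x), 2 * np.pi * np.arange(32) / 32, indexing="ij")
theta_q, phi_q = theta_q.ravel(), phi_q.ravel()

N, kR = 4, 2.0
Lam = inverse_encoding_matrix(N, kR, theta_q, phi_q)
rng = np.random.default_rng(0)
A_true = rng.standard_normal((N + 1) ** 2) + 1j * rng.standard_normal((N + 1) ** 2)
p_tot = Lam @ A_true                          # noiseless synthetic observation
A_hat = encoding_matrix(Lam, sigma=0.0) @ p_tot
print(np.max(np.abs(A_hat - A_true)))         # recovery error
```

With $\sigma>0$ the estimate is biased towards smaller coefficients. Exact recovery is only expected here because the data are noiseless and $\sigma=0$.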

A plane wave $p_{\mathrm{in}}^{\mathrm{pw}}=e^{i\mathbf{k}\cdot\mathbf{r}}$ with wave vector $\mathbf{k}=(k,\theta_i,\phi_i)$ in spherical coordinates can be written as follows, with ambisonics coefficients $A_{nm}$:
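$$e^{i\mathbf{k}\cdot\mathbf{r}} = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} A_{nm}\, j_n(kr)\, Y_n^m(\theta,\phi), \qquad A_{nm} = 4\pi\, i^n \left[Y_n^m(\theta_i,\phi_i)\right]^{*}.$$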
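The plane-wave coefficients $A_{nm}=4\pi i^n\,[Y_n^m(\theta_i,\phi_i)]^*$ hold for any orthonormal spherical-harmonic basis, by the addition theorem. They can be checked by summing the truncated interior expansion and comparing the result with $e^{i\mathbf{k}\cdot\mathbf{r}}$. The sketch assumes complex harmonics with the Condon-Shortley phase. The wavenumber, directions, and radii are arbitrary illustrative values.

```python
import numpy as np
from math import factorial
from scipy.special import lpmv, spherical_jn


def sph_harm_nm(n, m, theta, phi):
    """Complex orthonormal Y_n^m (assumed convention: Condon-Shortley phase via lpmv)."""
    am = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi) * factorial(n - am) / factorial(n + am))
    y = norm * lpmv(am, n, np.cos(theta)) * np.exp(1j * am * phi)
    return (-1) ** am * np.conj(y) if m < 0 else y


def plane_wave_coeffs(N, theta_i, phi_i):
    """A_nm = 4 pi i^n conj(Y_n^m(theta_i, phi_i)), stored at 0-based index n^2 + n + m."""
    return np.array([4 * np.pi * 1j**n * np.conj(sph_harm_nm(n, m, theta_i, phi_i))
                     for n in range(N + 1) for m in range(-n, n + 1)])


def reconstruct(A, N, k, r, theta, phi):
    """Truncated interior expansion: sum over n <= N of A_nm j_n(kr) Y_n^m(theta, phi)."""
    p = np.zeros(np.broadcast(r, theta, phi).shape, dtype=complex)
    for n in range(N + 1):
        jn = spherical_jn(n, k * r)
        for m in range(-n, n + 1):
            p += A[n * n + n + m] * jn * sph_harm_nm(n, m, theta, phi)
    return p


def plane_wave(k, theta_i, phi_i, r, theta, phi):
    """Exact e^{i k.r} for wave-vector direction (theta_i, phi_i) at point (r, theta, phi)."""
    cos_gamma = (np.sin(theta_i) * np.sin(theta) * np.cos(phi - phi_i)
                 + np.cos(theta_i) * np.cos(theta))
    return np.exp(1j * k * r * cos_gamma)


k, theta_i, phi_i = 10.0, np.pi / 2, np.pi / 4        # illustrative incident wave
r, theta, phi = np.array([0.05, 0.1, 0.2]), np.pi / 3, 0.3
N = 16
A = plane_wave_coeffs(N, theta_i, phi_i)
err = np.abs(reconstruct(A, N, k, r, theta, phi) - plane_wave(k, theta_i, phi_i, r, theta, phi))
print(err)  # truncation error at kr = 0.5, 1, 2
```

Lowering `N` while keeping `kr` fixed shows how the truncation error grows once $N$ drops below roughly $kr$.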
## 3. Formulation of spheroidal ambisonics

The three-dimensional Helmholtz equation is separable in spheroidal coordinates, which allows us to formulate spheroidal ambisonics, as detailed below.

### 3.1 Spheroidal coordinates

There are two types of spheroidal coordinates, prolate and oblate. Only prolate spheroidal ambisonics is detailed here; the oblate case can be derived in a similar fashion. The definition of prolate spheroidal coordinates itself varies in the literature (Flammer, 2014). Here, the definition also used by Adelman *et al.* (2014) is employed. The prolate spheroidal coordinate system has three coordinates, *ξ*, *η*, and $\phi$. It is characterized by a parameter *a*, where 2*a* is the distance between the two foci of the prolate spheroid. The domains of *ξ* and *η* are $\xi\ge 1$ and $|\eta|\le 1$, respectively. The conversion to Cartesian coordinates (*x*, *y*, *z*) is given by
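$$x = a\sqrt{(\xi^2-1)(1-\eta^2)}\,\cos\phi,\qquad y = a\sqrt{(\xi^2-1)(1-\eta^2)}\,\sin\phi,\qquad z = a\,\xi\,\eta.$$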
The long radius $r_{\mathrm{long}}$ and short radius $r_{\mathrm{short}}$ of the prolate spheroid $\xi=\xi_1$ are related to *a* and $\xi_1$ by
$$r_{\mathrm{long}} = a\,\xi_1,\qquad r_{\mathrm{short}} = a\sqrt{\xi_1^2-1}.$$
The prolate spheroidal coordinates *ξ* and *η* are illustrated in Fig. 2 (left) for *a* = 1.
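The radius relations can be inverted to obtain $a$ and $\xi_1$ from a desired array shape. The sketch below assumes the conversion $x=a\sqrt{(\xi^2-1)(1-\eta^2)}\cos\phi$, $y=a\sqrt{(\xi^2-1)(1-\eta^2)}\sin\phi$, $z=a\xi\eta$, with foci at $z=\pm a$, to match the definition adopted here. Using the radii of the spheroidal array from Sec. 5, it checks that points on $\xi=\xi_1$ lie on the ellipsoid with semi-axes $r_{\mathrm{long}}$ and $r_{\mathrm{short}}$. The function names are illustrative.

```python
import numpy as np


def prolate_to_cartesian(xi, eta, phi, a):
    """Prolate spheroidal (xi >= 1, |eta| <= 1, phi) -> Cartesian; long axis along z."""
    rho = a * np.sqrt((xi**2 - 1) * (1 - eta**2))  # distance from the z axis
    return rho * np.cos(phi), rho * np.sin(phi), a * xi * eta


def radii_from_params(a, xi1):
    """(r_long, r_short) of the spheroid xi = xi1."""
    return a * xi1, a * np.sqrt(xi1**2 - 1)


def params_from_radii(r_long, r_short):
    """Inverse map: a = sqrt(r_long^2 - r_short^2) (half focal distance), xi1 = r_long / a."""
    a = np.sqrt(r_long**2 - r_short**2)
    return a, r_long / a


a, xi1 = params_from_radii(1.0, 0.05)  # spheroidal array radii used in Sec. 5
eta = np.linspace(-1.0, 1.0, 7)
x, y, z = prolate_to_cartesian(xi1, eta, 0.4, a)
print(a, xi1)
```

Because $\xi_1$ is very close to 1 for such a slender spheroid, computing $a$ directly from the radii avoids the cancellation that would occur when recovering $r_{\mathrm{short}}$ from a rounded $\xi_1$.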

### 3.2 Scattering of an arbitrary incident wave by a sound-hard prolate spheroid

An arbitrary incident wave can be expanded using the radial spheroidal wave functions $R_{mn}^{(1)}$ and the angular spheroidal wave functions $S_{mn}$ (Flammer, 2014),
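$$p_{\mathrm{in}}(\xi,\eta,\phi) = \sum_{n=0}^{\infty}\sum_{m=0}^{n} R_{mn}^{(1)}(c,\xi)\, S_{mn}(c,\eta)\left[A_{mn}\cos(m\phi) + B_{mn}\sin(m\phi)\right],$$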
where *c* = *ka* with *k* the wavenumber. The *spheroidal ambisonics coefficients* are defined as the collection of the coefficients $\{A_{mn},B_{mn}\}$. A canonical example of an incident wave is a plane wave $p_{\mathrm{in}}^{\mathrm{pw}}=e^{i\mathbf{k}\cdot\mathbf{r}}$ with $\mathbf{k}=k(\sin\theta_0\cos\phi_0,\,\sin\theta_0\sin\phi_0,\,\cos\theta_0)$ the wave vector in Cartesian coordinates. The incident plane wave can be expanded as

which yields

The truncation error of the expression in Eq. (16), given by $|e^{i\mathbf{k}\cdot\mathbf{r}}-p_{\mathrm{in}}^{\mathrm{pw}(N)}|$, where $p_{\mathrm{in}}^{\mathrm{pw}(N)}$ is the series truncated at *n* = *N*, is shown for an example configuration in Fig. 2 (right).

The total field resulting from the scattering of an arbitrary incident field characterized by $\{A_{mn},B_{mn}\}$ is then given by
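$$p_{\mathrm{tot}}(\xi,\eta,\phi) = \sum_{n=0}^{\infty}\sum_{m=0}^{n}\left[R_{mn}^{(1)}(c,\xi) - \frac{R_{mn}^{(1)\prime}(c,\xi_1)}{R_{mn}^{(3)\prime}(c,\xi_1)}\,R_{mn}^{(3)}(c,\xi)\right] S_{mn}(c,\eta)\left[A_{mn}\cos(m\phi)+B_{mn}\sin(m\phi)\right]. \tag{18}$$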
On the surface of the spheroid, i.e., at $\xi=\xi_1$, the total field can be simplified with the Wronskian relation $W^{(1,3)}=R^{(1)}(c,\xi)R^{(3)\prime}(c,\xi)-R^{(1)\prime}(c,\xi)R^{(3)}(c,\xi)=i/[c(\xi^2-1)]=iW^{(1,2)}$. It can then be written as
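$$p_{\mathrm{tot}}(\xi_1,\eta,\phi) = \sum_{n=0}^{\infty}\sum_{m=0}^{n}\frac{i}{c(\xi_1^2-1)\,R_{mn}^{(3)\prime}(c,\xi_1)}\, S_{mn}(c,\eta)\left[A_{mn}\cos(m\phi)+B_{mn}\sin(m\phi)\right]. \tag{19}$$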
### 3.3 Spheroidal ambisonics encoding

The goal of spheroidal ambisonics encoding is to estimate the spheroidal ambisonics coefficients from observations made by a limited number of microphones mounted on the surface of a spheroidal baffle. As mentioned earlier, the baffle is assumed here to be a sound-hard prolate spheroid.

By truncating the expansion at *n* = *N* (with *N* > 0), Eq. (19) can be rewritten in vector form as
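$$\mathbf{p}_{\mathrm{tot}} = \mathbf{S}^{(A)}\mathbf{R}^{(A)}\mathbf{A} + \mathbf{S}^{(B)}\mathbf{R}^{(B)}\mathbf{B} = \boldsymbol{\Lambda}^{(P)}\begin{bmatrix}\mathbf{A}\\ \mathbf{B}\end{bmatrix},\qquad \boldsymbol{\Lambda}^{(P)} = \left[\mathbf{S}^{(A)}\mathbf{R}^{(A)}\;\;\mathbf{S}^{(B)}\mathbf{R}^{(B)}\right]. \tag{20}$$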
Here, **A** and **B** are vectors holding $A_{mn}$ and $B_{mn}$ in their $\tilde{l}^{(A)}=((n^2+n)/2+m)$-th and $\tilde{l}^{(B)}=((n^2-n)/2+m-1)$-th entries, respectively. The lengths of these vectors are $L^{(A)}=(N+1)(N+2)/2$ and $L^{(B)}=N(N+1)/2$, respectively. $\mathbf{R}^{(A)}$ and $\mathbf{R}^{(B)}$ are diagonal matrices holding $i/[c(\xi_1^2-1)R_{mn}^{(3)\prime}(c,\xi_1)]$ in their $\tilde{l}^{(A)}$-th and $\tilde{l}^{(B)}$-th diagonal entries, respectively. $\mathbf{S}^{(A)}$ and $\mathbf{S}^{(B)}$ are matrices with entries

$$S^{(A)}_{q,\tilde{l}^{(A)}} = S_{mn}(c,\eta_q)\cos(m\phi_q),\qquad S^{(B)}_{q,\tilde{l}^{(B)}} = S_{mn}(c,\eta_q)\sin(m\phi_q), \tag{21}$$

respectively, where *q* is the sensor index. $\mathbf{p}_{\mathrm{tot}}$ is a vector holding $p_{\mathrm{tot}}^{(q)}$, the observed sound pressure at the *q*-th microphone, in its *q*-th entry. $\mathbf{p}_{\mathrm{tot}}$, $\mathbf{S}^{(A)}$, and $\mathbf{S}^{(B)}$ have shapes $[Q]$, $[Q\times L^{(A)}]$, and $[Q\times L^{(B)}]$, respectively, where *Q* is the number of microphones. For a truncation order *N*, the total number of coefficients in **A** and **B** is $L=L^{(A)}+L^{(B)}=(N+1)^2$. This equals the total number of spherical ambisonics coefficients $\{A_{nm}\}$ with maximum order *N*. $\boldsymbol{\Lambda}^{(P)}$ is referred to as the “inverse” encoding matrix for sound-hard prolate spheroidal ambisonics.
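The flattened index maps for **A** and **B** can be checked mechanically. For every truncation order, $\tilde{l}^{(A)}$ and $\tilde{l}^{(B)}$ should enumerate $0,\dots,L^{(A)}-1$ and $0,\dots,L^{(B)}-1$ without gaps or collisions, and the two lengths should sum to $(N+1)^2$. The helper names below are hypothetical. Integer division is exact because $n^2\pm n$ is always even.

```python
def idx_A(n, m):
    """0-based position of A_mn in vector A, for 0 <= m <= n."""
    return (n * n + n) // 2 + m


def idx_B(n, m):
    """0-based position of B_mn in vector B, for 1 <= m <= n."""
    return (n * n - n) // 2 + m - 1


def vector_lengths(N):
    """(L^(A), L^(B)) for truncation order N."""
    return (N + 1) * (N + 2) // 2, N * (N + 1) // 2


def check_layout(N):
    """True if both index maps are bijections onto 0..L-1 and L^(A) + L^(B) = (N + 1)^2."""
    L_A, L_B = vector_lengths(N)
    pos_A = sorted(idx_A(n, m) for n in range(N + 1) for m in range(0, n + 1))
    pos_B = sorted(idx_B(n, m) for n in range(N + 1) for m in range(1, n + 1))
    return pos_A == list(range(L_A)) and pos_B == list(range(L_B)) and L_A + L_B == (N + 1) ** 2


print(all(check_layout(N) for N in range(0, 21)))  # True
print(vector_lengths(12))                            # (91, 78)
```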

The unknowns $A_{mn}$ and $B_{mn}$ can be estimated from observations of the sound field by multiple sensors mounted on the spheroidal baffle, by solving Eq. (20) in the least-squares sense. This process is referred to as *spheroidal ambisonics encoding*. The regularized least-squares solution is given by
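$$\begin{bmatrix}\hat{\mathbf{A}}\\ \hat{\mathbf{B}}\end{bmatrix} = \left(\boldsymbol{\Lambda}^{(P)H}\boldsymbol{\Lambda}^{(P)}+\sigma\mathbf{I}\right)^{-1}\boldsymbol{\Lambda}^{(P)H}\,\mathbf{p}_{\mathrm{tot}} = \mathbf{E}^{(P)}\mathbf{p}_{\mathrm{tot}}, \tag{22}$$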
with *σ* a regularization constant and $\mathbf{E}^{(P)}\equiv(\boldsymbol{\Lambda}^{(P)H}\boldsymbol{\Lambda}^{(P)}+\sigma\mathbf{I})^{-1}\boldsymbol{\Lambda}^{(P)H}$ the encoding matrix for sound-hard prolate spheroidal ambisonics.
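Assembling $\boldsymbol{\Lambda}^{(P)}$ and splitting the stacked least-squares estimate back into $\hat{\mathbf{A}}$ and $\hat{\mathbf{B}}$ can be sketched as follows. Evaluating $S_{mn}$ and $R_{mn}^{(3)\prime}$ requires a spheroidal wave function library; the experiments use *Spheroidal*. Here, random arrays stand in for those values, so the block only exercises the linear algebra. The column layout $[\mathbf{S}^{(A)}\mathbf{R}^{(A)}\;\mathbf{S}^{(B)}\mathbf{R}^{(B)}]$ acting on the stacked vector $[\mathbf{A};\mathbf{B}]$ is an assumed convention. The function names are hypothetical.

```python
import numpy as np


def assemble_inverse_encoding_matrix(S_A, S_B, rad_A, rad_B):
    """Lambda^(P) = [S^(A) R^(A), S^(B) R^(B)].

    S_A: (Q, L_A) values S_mn(c, eta_q) cos(m phi_q); S_B: (Q, L_B) with sin(m phi_q).
    rad_A, rad_B: diagonals of R^(A), R^(B), i.e. i / (c (xi1^2 - 1) R_mn^(3)'(c, xi1)).
    Right-multiplying by a diagonal matrix scales columns, hence the broadcasting.
    """
    return np.hstack([S_A * rad_A[None, :], S_B * rad_B[None, :]])


def spheroidal_encode(p_tot, Lam, L_A, sigma=0.0):
    """Regularized least squares; returns (A_hat, B_hat) split from the stacked estimate."""
    LamH = Lam.conj().T
    coeffs = np.linalg.solve(LamH @ Lam + sigma * np.eye(Lam.shape[1]), LamH @ p_tot)
    return coeffs[:L_A], coeffs[L_A:]


def random_complex(rng, size):
    """Complex Gaussian stand-in values."""
    return rng.standard_normal(size) + 1j * rng.standard_normal(size)


# Synthetic check: random stand-ins replace the spheroidal wave function values.
N, Q = 4, 64
L_A, L_B = (N + 1) * (N + 2) // 2, N * (N + 1) // 2
rng = np.random.default_rng(1)
S_A, S_B = rng.standard_normal((Q, L_A)), rng.standard_normal((Q, L_B))
rad_A, rad_B = random_complex(rng, L_A), random_complex(rng, L_B)
A_true, B_true = random_complex(rng, L_A), random_complex(rng, L_B)

Lam = assemble_inverse_encoding_matrix(S_A, S_B, rad_A, rad_B)
p_tot = Lam @ np.concatenate([A_true, B_true])     # noiseless synthetic observation
A_hat, B_hat = spheroidal_encode(p_tot, Lam, L_A)  # sigma = 0, as in the experiments
```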

## 4. Transcoding from spheroidal to spherical ambisonics

### 4.1 The transcoding formula

The sound field encoded as a spheroidal ambisonics signal can be converted into a conventional spherical ambisonics representation. This process is referred to as *transcoding*. The following relation (Flammer, 2014) between the spheroidal wave functions, the spherical Bessel functions, and the associated Legendre functions

can be used to derive the transcoding formula, where % is the modulo operator and $d_r^{mn}(c)$ are the expansion coefficients,

It can be shown that the analytical transcoding formula from the spheroidal ambisonics coefficients $\{A_{mn},B_{mn}\}$ to the spherical ambisonics coefficients $A_{n'm'}$ is given by

where $\alpha_{m'}=(-1)^{m'}$ for negative $m'$ and $\alpha_{m'}=1+\delta_{m',0}$ otherwise. The transcoded signal $A_{n'm'}$ can then be stored, transmitted, or processed with any existing signal processing pipeline for spherical ambisonics signals (e.g., rotation, filtering, decoding). Any technique or knowledge established for spherical ambisonics therefore applies.

### 4.2 Mixed-order transcoding

Equation (25) shows that the truncation number of the transcoded spherical ambisonics signal, hereafter referred to as $N'$, need not equal the truncation number *N* of the spheroidal ambisonics signal. In fact, a truncated approximation of $A_{n'm'}$ with $|m'|\le N$ can be computed for any $n'$, independently of *N*. The only requirement is that the truncation number *R* of the table $d_r^{mn}$ with respect to *r* satisfies $n'-|m'|\le R$. This observation leads to the notion of *mixed-order transcoding*. Mixed-order transcoding computes the spherical ambisonics coefficients $A_{n'm'}$ up to a truncation number $N'>N$, keeps only the coefficients that satisfy $|m'|\le N$, and sets $A_{n'm'}=0$ otherwise. The resulting transcoded ambisonics signal can be seen as a special form of the mixed-order ambisonics scheme introduced for conventional ambisonics (Travis, 2009). Conventional mixed-order ambisonics discards coefficients based on the direction dependence of perceptual sensitivity (Best, 2004). The transcoded signal could be compressed similarly, by discarding a subset of the $A_{n'm'}$ coefficients, but this is beyond the scope of the present work.
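The bookkeeping of mixed-order transcoding can be made concrete for the settings used in the experiments ($N=12$, $N'=16$). The sketch below counts the retained coefficients. It also finds the smallest table truncation $R$ that satisfies $n'-|m'|\le R$ for all of them. It covers only the selection rule, not Eq. (25) itself, and the function names are hypothetical.

```python
import numpy as np


def mixed_order_mask(N, N_prime):
    """Mask over A_{n'm'} (0-based index n'^2 + n' + m', n' <= N_prime).

    True where |m'| <= N (coefficient kept); False where it is set to 0.
    """
    mask = np.zeros((N_prime + 1) ** 2, dtype=bool)
    for n in range(N_prime + 1):
        for m in range(-n, n + 1):
            mask[n * n + n + m] = abs(m) <= N
    return mask


def min_table_truncation(N, N_prime):
    """Smallest R with n' - |m'| <= R for every kept coefficient."""
    return max(n - abs(m) for n in range(N_prime + 1)
               for m in range(-n, n + 1) if abs(m) <= N)


mask = mixed_order_mask(12, 16)
print(mask.sum(), mask.size, min_table_truncation(12, 16))  # 269 289 16
```

Of the 289 coefficients up to $N'=16$, the 169 coefficients with $n'\le 12$ are all kept. Each of the degrees $n'=13,\dots,16$ adds $2N+1=25$ more, which gives 269.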

## 5. Experimental evaluation

Prolate spheroidal ambisonics encoding and its transcoding into spherical ambisonics were validated by numerical experiments. A plane wave with three different incident angles was encoded and transcoded using a sound-hard spherical microphone array as well as a sound-hard prolate spheroidal microphone array. The spherical array had a radius of 0.198 m. The prolate spheroidal microphone array had $r_{\mathrm{short}}=0.05$ m and $r_{\mathrm{long}}=1$ m. The arrays were designed to have the same surface area, and each had 512 microphone capsules. The capsules were located on a 16-point grid of Gauss-Legendre quadrature nodes in *θ* and *η*, respectively, and on a 32-point equispaced grid in $\phi$. The long axis of the prolate spheroidal array was set parallel to the *x* axis. Figure 1 shows the experimental procedure and the two microphone arrays used in the experiments. Spherical and spheroidal ambisonics encoding were performed using Eqs. (10) and (22), respectively. The coefficient tables of the spheroidal wave functions were computed using the software library *Spheroidal* (Adelman *et al.*, 2014). This library relies on arbitrary-precision arithmetic via GNU MPFR (Fousse *et al.*, 2007) for accurate computation of the spheroidal wave functions. The truncation number was set to $N'=N=12$ for both the baseline spherical ambisonics and spheroidal ambisonics. The regularization parameter *σ* was set to zero for both spherical and spheroidal encoding, i.e., no regularization was applied. Transcoding from spheroidal ambisonics to spherical ambisonics was performed using Eq. (25), truncated at $n\le N'=12$, while $N'=16$ was used for mixed-order transcoding. The incident field was reconstructed from the estimated spherical ambisonics coefficients and compared with the ground-truth incident field. The reconstruction was performed using Eq. (4) truncated at $n\le N'$. 
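Some of the experimental parameters can be cross-checked with a few lines of arithmetic. The sketch below computes the surface area of the prolate spheroidal array from $r_{\mathrm{long}}$ and $r_{\mathrm{short}}$. From that area it derives the radius of a sphere of equal area, which comes out close to the stated 0.198 m, and it counts the capsules on the 16 × 32 grid. It also evaluates $c=ka$ at 541.8 Hz under an assumed speed of sound of 340 m/s, which the text does not state; with that assumption, $c$ comes out very close to 10. Treating the Gauss-Legendre nodes as $\cos\theta$ on the sphere and as $\eta$ on the spheroid is also an assumption.

```python
import numpy as np


def prolate_surface_area(r_long, r_short):
    """Surface area of a prolate spheroid with semi-axes r_long > r_short."""
    e = np.sqrt(1.0 - (r_short / r_long) ** 2)  # eccentricity
    return 2 * np.pi * r_short**2 * (1 + r_long / (r_short * e) * np.arcsin(e))


r_long, r_short = 1.0, 0.05
area = prolate_surface_area(r_long, r_short)
R_equal_area = np.sqrt(area / (4 * np.pi))   # radius of a sphere with the same area

# 16 Gauss-Legendre nodes (assumed: cos(theta) on the sphere, eta on the spheroid)
# times 32 equispaced azimuths.
nodes, _ = np.polynomial.legendre.leggauss(16)
phi = 2 * np.pi * np.arange(32) / 32
n_mics = nodes.size * phi.size

# c = k a at 541.8 Hz, assuming a speed of sound of 340 m/s (not stated in the text).
a = np.sqrt(r_long**2 - r_short**2)          # half focal distance
c = 2 * np.pi * 541.8 / 340.0 * a
print(round(R_equal_area, 4), n_mics, round(c, 3))
```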
The signal-to-distortion ratio (SDR) of the reconstructed fields was computed at evaluation points in the *x*-*y* plane. The region with an SDR higher than 30 dB was regarded as the sweet-spot of accurate reconstruction.
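The text does not spell out the SDR formula. A common pointwise definition, assumed here, is $\mathrm{SDR}(\mathbf{x})=10\log_{10}\left(|p(\mathbf{x})|^2/|p(\mathbf{x})-\hat{p}(\mathbf{x})|^2\right)$ in dB, with $p$ the ground-truth field and $\hat{p}$ the reconstruction. The sketch below evaluates this quantity and thresholds it at 30 dB to obtain the sweet-spot mask. The toy values are illustrative only.

```python
import numpy as np


def sdr_db(p_true, p_est, floor=1e-300):
    """Pointwise SDR in dB: 10 log10(|p|^2 / |p - p_hat|^2) (assumed definition)."""
    err = np.abs(p_true - p_est) ** 2
    return 10 * np.log10(np.abs(p_true) ** 2 / np.maximum(err, floor))  # floor avoids /0


def sweet_spot(p_true, p_est, threshold_db=30.0):
    """Boolean mask of evaluation points whose SDR exceeds the threshold."""
    return sdr_db(p_true, p_est) > threshold_db


p_true = np.ones(3, dtype=complex)
p_est = p_true + np.array([1e-2, 1e-1, 1.0])   # relative errors of 1 %, 10 %, 100 %
print(sdr_db(p_true, p_est))                   # about 40, 20 and 0 dB
print(sweet_spot(p_true, p_est))               # [ True False False]
```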

Figures 3 and 4 show the results for incident waves with normalized wave vectors, expressed in Cartesian coordinates, of (1, 0, 0), (0, 1, 0), and $(\sqrt{2}/2,\sqrt{2}/2,0)$. The frequency of the incident wave was set to 541.8 Hz. Compared with the baseline spherical ambisonics case, the sweet-spot of accurate reconstruction in spheroidal ambisonics is narrower along the short axis of the spheroid but longer along its long axis. With mixed-order transcoding, this prolongation is even more pronounced. This asymmetric sweet-spot shape could be useful in applications in which a non-spherical sweet-spot is desired. One example is sound field reproduction for multi-person home-theater systems, where the sweet-spot should cover several listeners sitting next to each other.

Note that the presented reconstruction is theoretical. If the reconstruction is performed by playing back the decoded spherical ambisonics signal over a limited number of loudspeakers, additional accuracy limitations apply. This is a problem of the decoding stage of spherical ambisonics and lies outside the scope of the present work, which focuses on recording, encoding, and transcoding. Any established technique for spherical ambisonics decoding and playback can be applied here.

## 6. Conclusion

The framework of spheroidal ambisonics, a natural extension of ambisonics to spheroidal coordinates, was proposed. Spheroidal ambisonics enables analytical encoding of the spatial sound field into spheroidal ambisonics coefficients using spheroidal microphone arrays. An analytical transcoding formula from spheroidal ambisonics to conventional spherical ambisonics was derived to ensure compatibility with the existing software ecosystem around spherical ambisonics. The concept of mixed-order transcoding, which allows transcoding to spherical ambisonics with higher truncation numbers, was introduced. The numerical experiments demonstrated that the sweet-spot of reconstruction in spheroidal ambisonics has an asymmetric shape that is prolonged towards the long axis of the prolate spheroidal microphone array. Such non-spherical sweet-spots in ambisonics reconstruction could be useful in some applications. The case of oblate spheroidal microphone arrays can be derived in a similar fashion and will be published elsewhere. A recently proposed microphone array for three-dimensional ambisonics recording uses a sound-hard circular disk as the scattering body (Berge, 2019); it can be seen as a special case of an oblate spheroidal ambisonics microphone array. Another future research topic is the optimization of the microphone capsule configuration on the spheroid. In a practical setup, spatial aliasing (Rafaely *et al.*, 2007) must be taken into account, so the microphone array configuration must be designed carefully. Optimization of the array configuration has been studied extensively for spherical arrays (Li and Duraiswami, 2007; Li *et al.*, 2004), but its counterpart for spheroidal microphone arrays requires further research.

## Acknowledgments

Shoken Kaneko thanks the Japan Student Services Organization and the Watanabe Foundation for support via scholarship programs.