In this article, a structured acoustic holography technique for the self-positioning of a single microphone from monaurally recorded signals is proposed. A series of three-dimensional ultrasonic holograms, designed for positioning in a workspace, is sequentially projected. As a result, the microphone receives a position-dependent sequence of amplitude signals encoded with information on the observation position. Subsequently, the microphone position is determined by obtaining the peak position of the cross-correlation function between the received signal and a reference signal. Experiments were conducted using a custom-made phased array of 40-kHz ultrasound transducers to evaluate the positioning accuracy. It is demonstrated that when applied to a 100×100×50 mm^{3} workspace, the measurement error was less than 1 mm at all observation points in the numerical experiment, and this accuracy was maintained for more than 96% of the points in the real-environment experiments. The proposed method is advantageous in that it does not use the phase information of the recorded signals, thus requiring no multiple synchronized recordings as in microphone-array-based methods. In addition, this scheme does not directly use the absolute value of the received amplitude as a positioning clue, which means that no amplitude-to-voltage calibration is required.

## I. INTRODUCTION

With the widespread use of mobile devices, such as smartphones, the technical demand for self-positioning systems has been increasing. Similar to the Global Positioning System (GPS) for outdoor use, indoor positioning has been gaining interest with the advent of autonomous robotic systems in daily applications, typified by pilotless multirotors. Many methods have been proposed for self-positioning, depending on the practical use case.^{1}

Among them, the use of waves is a major strategy for noncontact and nonconstrained three-dimensional positioning. A wavefield has the advantage that its spatiotemporal structure is maintained during propagation over a considerable distance. In addition, waves travel rapidly in the medium, resulting in prompt positioning. Owing to these physical properties, many wave-based techniques, including GPS, have been proposed and used in practice. They vary in the type of waves (acoustic,^{2–7} radio frequency,^{8,9} and lightwave^{10–13}) and the range of frequency used, depending on the application case. These factors are also determined by considering technical requirements, such as the preferred wave reflection properties, the demanded spatiotemporal resolution, the tolerable latency, and the positioning algorithm. In the wave-based methods, essential clues for positioning are extracted from the amplitudes and phase delays in signals received by the sensors, which vary according to the relative positions of the signal sources and sensors. Therefore, either the signal sources or the sensors must be multiple in number and distributed across a certain spatial range.

In this article, we consider positioning techniques for miniature mobile devices, which are unlikely to be equipped with sensor arrays spanning an extended spatial range. Hereinafter, we only discuss the situations in which a standalone mobile device with a single sensor receives wave emissions from multiple sources. When monopole sources are used, they generate omnidirectional spherical wavefields whose individual amplitude distributions are inversely proportional to the distance between the source and the observation point. Therefore, self-positioning from a sufficient number of separate amplitude measurements with known source positions is theoretically possible. In practice, such an amplitude-based strategy is inaccurate when the sources are located very far from the sensor because the power of the received signal scarcely varies with distance in that situation. By contrast, the phase-based approaches in which the receiver uses the time of arrival or time difference of arrival (TDOA)^{3–7} information in the form of a phase of the received signals are significantly more accurate because the spatial phase structure of the wavefield is maintained through propagation over a long distance. However, phase-based techniques are vulnerable to the reverberation and reflection of sound. In addition, all of the methods stated above require an individual signal identification process when only a single receiver is available. Positioning methods that use a single microphone are rare, and the existing ones still rely on the phase information of the recorded signals.^{14,15}

Here, we propose a new positioning method in which a temporal series of structured ultrasound holograms projected onto a workspace provides the microphones with positional information. We used an ultrasound phased array^{16} to create nonuniform structured sound pressure fields. Such phased-array-based localized sound pressure fields have gained increasing interest from researchers for various applications such as acoustic tweezers,^{17} midair odor control,^{18} noncontact tactile displays,^{19} and particle levitation.^{20} The microphones receive a position-dependent sequence of amplitude signals in which the information about an observation location is encoded. Position decoding from the received signal can be easily performed using the peak detection of the cross correlation with known reference signals. Such signal processing is readily implemented and requires a relatively low calculation cost. The proposed method requires no synchronization or phase information in the received signal to be processed. Despite this simplicity, our method offers a subwavelength accuracy of the self-positioning, which was less than 1.0 mm in most cases in our real-environment measurement with 40-kHz ultrasound.

The basic concept of structured midair ultrasound holography has been derived from “structured light” methods, which are categorized into the “space encoding” techniques^{11–13} that are mostly used in three-dimensional shape measurement. Our proposed ultrasound-holography-based technique has several inherent features in this context. First, ultrasound has a much longer wavelength than light, which makes the spatial structure of the hologram less distorted by partial occlusion of the source emission owing to the diffraction effect. Such a longer wavelength also results in a greater depth of focus, which provides a larger workspace for the positioning. In addition, most of the acoustic power emitted from the phased array is confined within the holographic field, which is hardly affected by the superimposed reflected waves that have significantly less power in the hologram region. We previously proposed a self-positioning method based on the two-dimensional scanning of acoustic Bessel beams,^{21} which requires considerable measurement time. Subsequently, we proposed another self-positioning scheme using a holographic ultrasound field with a unidirectional slope-like amplitude profile,^{22} which offers instantaneous one-dimensional positioning. However, in this scheme, the analog values of the received absolute signal amplitudes directly correspond to the information of the microphone positions. Therefore, this method requires prior calibration to associate the observed acoustic signal level with the coordinates in the workspace. The method is also vulnerable to environmental factors that slightly change the received signal power, such as partial wave occlusion. Our newly proposed method overcomes these problems because it does not depend on the absolute values of single-shot measurements. Instead, we sequentially project a set of ultrasound holograms as if they virtually shift in a certain direction moment by moment.
As described above, we use a cross-correlation-based peak detection of the received signal sequence, which is independent of the overall signal intensity. To the best of our knowledge, the positioning methods based on the spatiotemporal coding of acoustic fields using ultrasound phased arrays have been only studied by our research group so far.

We describe the following technical aspects in the remainder of the article. First, we describe our positioning strategy using a monaural acoustic signal from a microphone positioned in a sequence of holograms, whose spatial amplitude distributions are designed based on binary maximum length sequence profiles, a very common class of pseudorandom sequences. Hereinafter, we refer to the maximum length sequence as the M-sequence. In particular, we describe how to handle the “halfway values” in the received signal that arise owing to the long wavelength of the hologram. Next, we describe a method for physically generating the structured holograms, followed by descriptions of the numerical and real-environment positioning experiments with quantitative evaluations. We performed one-directional positioning experiments, which can theoretically be extended to three-dimensional positioning by incorporating three or more separate measurements with independent holograms consecutively projected by one of multiple phased arrays surrounding the workspace.

## II. METHOD

### A. Problem setting

Our aim is to estimate the coordinate along a certain axis in a three-dimensional space using an ultrasonic phased array (Fig. 1), which is a two-dimensional planar lattice of ultrasonic transducers whose output phase and amplitude can be individually adjusted. We consider the problem of one-dimensional positioning along a coordinate axis parallel to the phased-array aperture. In our scheme, a two-dimensional horizontal position can be estimated in the same manner for each phased array in practical use, which means that three-dimensional positioning can be achieved using two or more arrays whose positions and postures are known in a common world coordinate system. The coordinate system is defined as shown in Fig. 2, where the phased array is located with its center at the origin. The *x* and *y* axes of the coordinate system are parallel to the lattice of the phased-array transducers. Hereinafter, we refer to the cuboid region in which the positioning is performed as the “workspace.” The center of the workspace is located at the point $(x,y,z)=(0,0,z_0)$ (mm), and its dimensions are defined by two parameters $L_{xy}$ (mm) and $L_z$ (mm). In Secs. II B–II F, we describe a method for obtaining the *x*-coordinate from the waveforms received by a single microphone placed at an arbitrary three-dimensional position inside the workspace using structured acoustic holography with a phased array.

### B. Shifted sequence of ultrasound fields and cross-correlation-based positioning strategy

The phased array generates a spatial sound pressure distribution pattern $p(x,y,z)$, where *x*, *y*, and *z* denote the three-dimensional coordinates in the workspace. Let us assume that this pattern only varies in the *x* direction and is uniform in the *y* and *z* directions in the workspace (Fig. 2). Thus, the sound pressure field $p(x,y,z)$ can be expressed with a one-dimensional function as $p(x\u2212Lxy/2,y,z)=l(x)$, which we define as the “reference signal.” For the *x*-coordinate, an offset of $\u2212Lxy/2$ is added such that the left edge of the workspace corresponds to the beginning of the reference signal. Here, $p(\u2212Lxy/2,y,z)$ corresponds to $l(0)$, which means that $l(0)$ corresponds to the sound pressure value at the left end of the workspace. Note that *l*(*x*) is a known signal, which can be determined before positioning. The methods for generating such a sound pressure field are described in Secs. II E–II G.

Next, consider a case in which this sound pressure field $p(x,y,z)$ continuously travels in the negative direction of the *x* axis with a velocity of *a* (mm/s) while maintaining its distribution. Here, the spatial sound pressure field at time *t* is described as $p(x-L_{xy}/2,y,z,t)=l(x+at)$. Let us assume that the microphone is fixed at $(x_r,y_r,z_r)$ during the entire positioning process. Subsequently, the receiver receives a signal of the sound amplitude corresponding to the pressure distribution pattern that travels over it. We denote the received signal at $x=x_r$ as $r(t)$.

Assuming that the receiver is a single microphone that can conduct point measurements without directivity, the relationship $r(t)=p(x_r,y_r,z_r,t)=l(x_r+L_{xy}/2+at)$ holds. Here, if the mapping from $x_r$ to $l(x_r+L_{xy}/2+at)$ is unique, the value of $x_r$ can be obtained in a form corresponding to the received signal. Let $x'=at$ be the distance over which the sound pressure pattern travels until time *t*. Thus, $r(x'/a)=l(x'+L_{xy}/2+x_r)$ holds. Because *a* is a known parameter, $r(x'/a)$ is the observed signal, and $l(x')$ is a known signal, we can calculate the cross-correlation function $R(X)$ between these two signals using

$$R(X) = \int r(x'/a)\, l(x'+X)\, dx'. \tag{1}$$

By substituting $r(x\u2032/a)=l(x\u2032+Lxy/2+xr)$, we can verify that *R*(*X*) attains its maximum value at $X=Lxy/2+xr$, indicating that we can theoretically obtain the *x*-coordinate of the receiver by detecting the peak of the cross-correlation function. This is an outline of the proposed holography-based positioning strategy.

Practically, the pattern shifting is not continuous because the phased array is immobilized. We switch the driving patterns of the array transducers that correspond to the acoustic pressure patterns, one at a time, to enable them to virtually shift along the *x* axis in a gradual, discrete fashion. We set the pattern-switching period to $\Delta t$, resulting in a shift in the sound pressure field of $s \equiv a\Delta t$ (mm) per switching. Here, the reference and received signals are also represented as discrete value sequences $l[i]$ and $r[i]$, respectively, where *i* denotes the index of the pattern switching. Then, the discretized reference signal can be expressed as $l[i]=l(at_i)=l(is)$ at the time $t_i \equiv i\Delta t$. Similar to the continuous case discussed above, the signal sequence received by the microphone is expressed as $r[i]=p(x_r,y_r,z_r,t_i)=l(x_r+L_{xy}/2+at_i)=l(x_r+L_{xy}/2+is)$. Subsequently, the discrete version of the normalized cross correlation ($R[n]$) between $r[i]$ and $l[i]$ is defined by

$$R[n] = \frac{\sum_{i=0}^{m-1} r[i]\, l[n+i]}{\Vert r\Vert\, \Vert l(n)\Vert}, \qquad n=0,\dots,N_l-m, \tag{2}$$

where $\Vert\cdot\Vert$ denotes the vector norm, expressed as $\Vert r\Vert := \sqrt{\sum_{i=0}^{m-1} r[i]^2}$ and $\Vert l(n)\Vert := \sqrt{\sum_{i=0}^{m-1} l[n+i]^2}$. Here, *m* and $N_l$ denote the sequence lengths of *r* and *l*, respectively. By substituting $r[i]=l(x_r+L_{xy}/2+is)$ and $l[n+i]=l(ns+is)$, we can observe that the correlation function $R[n]$ attains its maximum value when $n=(x_r+L_{xy}/2)/s$. Because $x_r+L_{xy}/2$ is generally not an integer multiple of *s*, the maximum cross correlation is achieved at the *n* closest to $(x_r+L_{xy}/2)/s$. Note that we assume that the microphone is always situated in the workspace, so the $x$-coordinate lies in the range $-L_{xy}/2 \le x_r \le L_{xy}/2$. Therefore, the index of the reference signal $l$ is always nonnegative.
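The discrete procedure above can be sketched numerically. The following is a minimal illustration (not the article's code): the reference profile, shift step `s`, sample count `m`, and true offset index `true_n` are all illustrative values chosen for the example.

```python
import numpy as np

# Illustrative setup: a known reference profile l[n], a shift step s (mm),
# and a received subsequence r[i] measured at offset x_r + L_xy/2 = true_n * s.
rng = np.random.default_rng(0)
s = 4.0                                  # pattern shift per projection (mm)
l = rng.random(64)                       # discretized reference signal l[n]
m = 16                                   # number of received samples
true_n = 23                              # ground-truth offset index
r = l[true_n:true_n + m]                 # ideal noiseless received sequence r[i]

# Normalized cross correlation R[n] of Eq.-(2) form and its peak
N_l = len(l)
R = np.array([np.dot(r, l[n:n + m]) /
              (np.linalg.norm(r) * np.linalg.norm(l[n:n + m]))
              for n in range(N_l - m + 1)])
n_hat = int(np.argmax(R))
print(n_hat * s)                         # estimated x_r + L_xy/2 (mm) → 92.0
```

Because the noiseless received sequence is an exact subsequence of the reference, the normalized correlation equals 1 only at the true lag, so the peak recovers `true_n` exactly.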

### C. Binary M-sequence amplitude patterns to enhance the cross-correlation peak

In principle, cross-correlation-based positioning can be implemented with any reference signal sequence $l[n]$ that is acyclic or whose period (sequence length) is sufficiently longer than the observed signals. In practice, a reference signal with a sharp peak of its autocorrelation function is preferable for the robustness of positioning against measurement errors in real environments. In addition, to shorten the measurement time, the use of as short a subsequence as possible of the reference signal sequence is preferable.

To meet these needs, we use a binary M-sequence^{23} to create the reference signal. It is a periodic sequence of length $2^n-1$, uniquely determined by a generator polynomial from an initial *n*-bit-long binary sequence. Any subsequence of length greater than or equal to *n* bits appears at exactly one place in the sequence within a period. This means that correlation-based positioning can be achieved by providing a holography shift greater than a distance equivalent to *n* bits in the reference signal. The M-sequence is known for its sharp autocorrelation; hence, the cross correlation between the M-sequence and one of its subsequences becomes stronger as the length of the subsequence increases. Therefore, even when the received signals contain noise, accurate positioning can still be achieved if the signal corresponds to a subsequence of sufficient length.
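As a concrete illustration of these autocorrelation properties, the following sketch generates a period-7 M-sequence with a linear feedback shift register and verifies its characteristic two-valued circular autocorrelation. The taps below correspond to the primitive polynomial $x^3+x+1$ and are chosen for illustration only; the article's own generator polynomial and initial bits (Sec. III A) can be substituted.

```python
import numpy as np

def lfsr_msequence(taps, state):
    """Generate one period (2**n - 1 bits) of a binary M-sequence from an
    n-bit initial state using a linear feedback shift register (LFSR)."""
    n = len(state)
    state = list(state)
    out = []
    for _ in range(2 ** n - 1):
        out.append(state[0])
        fb = 0
        for t in taps:            # feedback bit: XOR of the tapped stages
            fb ^= state[t]
        state = state[1:] + [fb]
    return np.array(out)

# Degree-3 register with primitive polynomial x^3 + x + 1 (illustrative choice)
seq = lfsr_msequence(taps=(0, 1), state=(1, 1, 1))
print(seq.tolist())               # one period of length 2**3 - 1 = 7

# Two-valued circular autocorrelation: a sharp peak and flat sidelobes
b = 2 * seq - 1                   # map bits {0,1} -> {-1,+1}
ac = [int(np.dot(b, np.roll(b, k))) for k in range(7)]
print(ac)                         # [7, -1, -1, -1, -1, -1, -1]
```

The flat, minimal sidelobes (−1 at every nonzero lag) are exactly the property that makes the cross-correlation peak in the positioning step easy to detect.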

### D. Measurement in real environments and resampling

The received signal comprises a sequence of acoustic amplitudes $r[i],\u2009\u2009i=0,\u2026,m\u22121$, resulting from the discrete changes in the generated *m* hologram patterns. Therefore, it is expected to coincide with a subsequence of a discretized version of the reference signal ($l[n]$), which is derived from the resampling of a continuous reference signal [*l*(*x*)] at resampling intervals corresponding to the shift width. As described below, the reference signal ($l[n]$) can be provided with an arbitrary spatial resolution [*d* (mm)], which can be lower than the shift width [*s* (mm)]. Therefore, the positioning resolution can be improved by resampling $r[n]$, which originally has a spatial sampling period of *s*, to the sampling width *d* of the reference signal $l[n]$ and calculating the cross-correlation peak.

Let *λ* be the ultrasound wavelength emitted from the phased array. In most cases, the workspace is so far from the array that the evanescent waves do not exist in the workspace. For these cases, the highest spatial frequency of the acoustic field is limited to the order of 1/*λ* because of the diffraction limit. Ideally, according to the Nyquist-Shannon sampling theorem, if the condition $1/s>2/\lambda $ holds for the pattern shift width (*s*), a continuous spatial acoustic pressure pattern can be reconstructed from its discrete sampled measurements using the well-known sinc interpolation. As experimentally shown in Sec. II B, the pattern shift width can be less than the wavelength, keeping the pattern undistorted. This subwavelength pattern shift and the following sinc interpolation are our key techniques for achieving the subwavelength positioning resolution, combined with the subsequently operated subwavelength cross-correlation calculation.

When the above condition on the pattern shift width relative to the wavelength holds, the convolution-based reconstruction of the continuous function $r'(x)$ from its sampled sequence $r[i]$ is expressed as

$$r'(x) = \left( \sum_{i=0}^{m-1} r[i]\, \delta(x-is) \right) * \operatorname{sinc}\!\left(\frac{x}{s}\right), \tag{3}$$

where “$*$” denotes the convolution operator, $\delta(x)$ is the Dirac delta function, and $\operatorname{sinc}(x) := \sin(\pi x)/(\pi x)$. Therefore, the positioning accuracy can ideally be increased to any desired extent by calculating the cross correlation of the resampled versions of the received and reference signals with the desired sampling period (*d*), based on this interpolation procedure. Practically, some factors hinder this errorless positioning, such as fluctuations in the generation of the horizontally shifted holograms and the finite number of measured signal samples. Another important insight here is that no shift width finer than half a wavelength is required to increase the positioning accuracy.

Because $r[i]=l(xr+Lxy/2+is)$, the reconstructed signal $r\u2032(x)$ is ideally identical to $l(xr+Lxy/2+x)$. Because *l*(*x*) is a known continuous function, the resampled discretized version can be obtained as $r\u2032[i]=r\u2032(id)=l(xr+Lxy/2+id)$ and $l\u2032[i]=l(id)$. Next, the cross-correlation function [$R\u2032(n)$] can be calculated about these two functions $l\u2032[i]$ and $r\u2032[i]$ in the same manner as in Eq. (2). Finally, the *x*-coordinate of the microphone is estimated as $xr=nrd\u2212Lxy/2$, where $R\u2032(n)$ assumes the maximum value when *n* = *n _{r}*.

In the above discussion, we have assumed that a binary sequence with only two values corresponding to “high” and “low” can be measured. However, ultrasonic waves with a wavelength of 8.5 mm were used in our experimental setup, which generated rather blunted spatial patterns with a non-negligible percentage of samples whose values deviated from either high or low. The above interpolation for positioning improvement remains valid even in such cases. Figure 3 shows the actual process of positioning.
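A minimal numerical sketch of this sinc-interpolation resampling follows; the profile, shift width, and fine grid spacing are illustrative values, not the article's parameters. One easily checkable property is that the reconstruction passes exactly through the original samples, since $\operatorname{sinc}(k)=0$ for every nonzero integer $k$.

```python
import numpy as np

def sinc_resample(r, s, d, n_out):
    """Whittaker-Shannon reconstruction of a signal sampled at interval s,
    resampled on a finer grid of spacing d (n_out output points).
    Illustrative helper; names and parameter values are not from the article."""
    i = np.arange(len(r))
    x_out = np.arange(n_out) * d
    # r'(x) = sum_i r[i] * sinc((x - i*s) / s)
    return np.array([np.sum(r * np.sinc((x - i * s) / s)) for x in x_out])

s, d = 4.0, 0.5                             # coarse shift width, fine grid (mm)
x_coarse = np.arange(0.0, 200.0, s)         # 50 coarse sample positions
r = np.cos(2 * np.pi * x_coarse / 40.0)     # band-limited: period 40 mm >> 2s
r_fine = sinc_resample(r, s, d, n_out=320)  # fine grid covering 0-159.5 mm

# The interpolant reproduces the original samples at x = 0, 4, 8, ... mm
print(np.allclose(r_fine[::8], r[:40]))     # True
```

In the positioning pipeline, the same resampling is applied to both the received and reference signals before the cross-correlation peak search, which is what makes a sub-shift-width (and hence subwavelength) estimate possible.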

### E. Generation of structured acoustic holography using the ultrasonic phased array

We describe how to generate and shift the desired sound pressure amplitude field sequence $p(x,y,z,t_i),\ i=1,\dots,m$ in a workspace using the controlled ultrasonic emission from the phased array. Sound waves propagate through a medium according to the wave equation, by which the sound fields in two mutually parallel planes are linked to each other. Hence, one field can be recovered from the other under appropriate boundary conditions. This is the principle of holography.

In this article, we define “structured acoustic holography” as the information on the sound field in a phased-array plane that generates a predesigned pressure field in a workspace for positioning. Hence, designing structured acoustic holography is equivalent to solving for the driving pattern of the phased-array transducers required to generate the required sound pressure field. In this study, we used an airborne ultrasonic phased array of 40-kHz ultrasonic transducers with a diameter of 10 mm, arranged in a two-dimensional grid.

Several methods have been proposed for designing holography, including those using the Fourier transform^{24–26} and continuous or discrete optimization.^{27–30} In our previous investigation, we demonstrated that a continuous optimization-based method, called acoustic power optimization (APO), can be used to construct the desired amplitude patterns with good fidelity to the edge of a workspace. Therefore, we used this method in this study. The details of the APO method are formulated in our previous article;^{22} therefore, we only provide a brief description of it here.

The desired sound pressure amplitude field would ideally be generated by controlling the sound pressure amplitude values at infinitely fine and evenly spaced points in a workspace using a phased array. In practice, if the number of these points is significantly larger than the number of transducers in the phased array, the inverse problem described below cannot be solved properly; therefore, the interval between the points should be set to a value *c* (mm) close to the wavelength [Fig. 4(a)]. We set *M* discrete points $q_j,\ j=1,\dots,M$ in the workspace and call them control points. The control points are arranged in a rectangular solid region that covers the entire workspace and is extended in the direction of the *x* axis by at least *c*.

We intend to generate a sound field corresponding to a partially extracted subsequence of the reference signal. Each control point ($q_j$) is assigned a target amplitude value ($\hat{p}_j$) based on the subsequence of the M-sequence. As stated earlier, we create pressure fields that change only in the *x* direction. Therefore, we associate *w* consecutive columns parallel to the *x* direction with one bit of the M-sequence [Fig. 4(b)]. We determine $\hat{p}_j$ according to the sub-M-sequence bit of the column on which the control point $q_j$ is located: $\hat{p}_j=0$ for the bit “0” in the subsequence and $\hat{p}_j=P_0$ for the bit “1”.

Next, we define the inverse problem to be solved from the equation relating the pressure at the control points to the output of the phased array. Let us assume that *N* transducers are embedded in the phased array. Let $z_i$ for $i=1,\dots,N$ be the output complex sound pressure of the *i*th ultrasonic transducer. We define a transducer output vector $z:=[z_1 \cdots z_N]^T$, where “$\cdot^T$” denotes the vector or matrix transpose of “$\cdot$”. We consider that all of the transducer positions are fixed. Then, we obtain a set of position vectors from the *i*th transducer to the *j*th control point, denoted as $r_{ij}$. Assume that all of the transducers can be considered omnidirectional point sources and that their emissions can be modeled as free-space sound propagation. Subsequently, the sound pressure $p_j$ at the control point $q_j$ is expressed as the superposition of the waves transmitted from all of the transducers

$$p_j = \sum_{i=1}^{N} z_i\, \frac{e^{jk\Vert r_{ij}\Vert}}{\Vert r_{ij}\Vert}, \tag{4}$$

where *k* denotes the wavenumber of the emitted ultrasound.

In our framework, a monaural recorded signal is the only clue for positioning, meaning that the phase information in the recorded signal cannot be straightforwardly exploited, in contrast to the amplitude information. Therefore, a reasonable design criterion for the structured acoustic holography ($z$) is to minimize the error between the absolute value of the resultant sound pressure ($p_j$) and the desired sound pressure distribution ($\hat{p}_j$). We define the objective function $J(z)$ to be minimized in the form of the squared error of those values summed over every control point as

$$J(z) = \sum_{j=1}^{M} \left( |p_j| - \hat{p}_j \right)^2 + \alpha \Vert z \Vert^2, \tag{5}$$

where $|\cdot|$ denotes the absolute value of a complex number. The last term is a regularization term, which suppresses the ultrasound emission to the outside of the workspace.^{22} In our study, we set the regularization parameter *α* to 20 000. This value was experimentally determined so that the generated acoustic field contained satisfactory power while maintaining its high–low pattern contrast. The appropriate value of *α* may vary according to the number of control points, which is determined by the workspace size and wavelength. As in our previous study,^{22} the minimization of the objective function was performed using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method.^{31} Finally, the value of $z$ that minimizes $J(z)$ is obtained as the phased-array output pattern. Note that it is not verified whether this iterative procedure guarantees the global minimization of $J(z)$.
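A toy sketch of this amplitude-matching optimization follows (not the article's implementation; the problem sizes, distances, and regularization weight are illustrative). The complex transducer vector is packed into a real vector so that scipy's BFGS routine can handle it, and a generic free-space propagation matrix plays the role of the point-source model above.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N, M = 8, 12                    # transducers and control points (toy sizes)
k = 2 * np.pi / 8.5             # wavenumber for lambda = 8.5 mm
d_ij = rng.uniform(100.0, 300.0, size=(M, N))   # transducer-to-point distances (mm)
G = np.exp(1j * k * d_ij) / d_ij                # free-space propagation matrix
p_hat = rng.choice([0.0, 1.0], size=M)          # binary M-sequence-style targets
alpha = 1e-4                                    # illustrative regularization weight

def J(u):
    z = u[:N] + 1j * u[N:]      # unpack real optimization vector -> complex z
    p = G @ z                   # complex pressures at the control points
    return np.sum((np.abs(p) - p_hat) ** 2) + alpha * np.sum(np.abs(z) ** 2)

u0 = rng.standard_normal(2 * N)
res = minimize(J, u0, method="BFGS")            # local minimum only, as noted
print(res.fun < J(u0))          # the objective decreased from the start point
```

As the text notes, only a local minimum is guaranteed; in practice one may restart from several initial vectors and keep the best result.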

### F. Achieving the shifts in a sound pressure amplitude pattern

In Secs. II B–II E, we assumed that the reference signals to be moved horizontally were much longer than the width of the workspace. In practice, some technical constraints, such as the aperture of the phased array and the finite output energy, may deteriorate the fidelity of the generated sound field relative to the reference signal. In addition, the optimization would not be conducted properly if the number of control points were much larger than the number of transducers, because this increases the degree of overdetermination of the problem. These factors become more critical when attempting to generate a pressure field that is too large compared with the aperture. Therefore, the desired sound pressure distribution must be achieved inside an appropriately sized workspace while avoiding unnecessary radiation outside the workspace.

We allocate the control points in a three-dimensional lattice with a common interval between the points along every axis set to *c*, which is comparable with the wavelength. After $J(z)$ is optimized with such a discrete sound pressure distribution, the sound pressure fields with blurred vertical stripes uniform in the depth direction are generated.

Next, we describe how to realize the shift of the generated sound pressure field using the displacement *s*, which is smaller than the control point interval *c*. We repetitively calculated the series of holograms while adding shifts of *s* in the *x* direction to the generated sound fields for each calculation. This repetitive shift is equivalently implemented by adding shifts by *s* to all of the control points and renewals of the pressure assignment defined from the subsequences of the M-sequence according to their relative position in the entire M-sequence.

When using an M-sequence of length $2^n-1$, the minimum total shift distance required for positioning is *wcn*, as described above. Let $n' \ge n$ be the actual number of symbols corresponding to the signal length received by the microphone. Then, a total of $m=\lfloor wcn'/s \rfloor + 1$ pattern projections by the phased array are required, where $\lfloor \cdot \rfloor$ denotes the integer part of “$\cdot$”, i.e., the floor function.

We give the sequence of resultant pressure fields such that their *x*-coordinate distributions coincide with the common reference signal shifted in the *x* direction by a certain distance. Figure 5 shows the relationship among the workspace range ($L_{xy}$), the control point interval (*c*), and the number of columns corresponding to one bit. After calculating the outputs of the phased array for the required number of times (holograms), we normalize all of the holograms by the largest transducer output among all of the holograms in one positioning to increase the positioning accuracy. Let $z_v,\ v=0,\dots,m-1$ be the transducer output vector calculated for the $(v+1)$th projected pattern in the hologram sequence at $t=t_v$, and let $|\max(z_v)|$ be the absolute value of the element of $z_v$ with the maximum absolute value. We multiply all of the vectors $z_v$ by $\max(|\max(z_0)|,\dots,|\max(z_{m-1})|)/|\max(z_v)|$ to complete the scale normalization process.

### G. Creating a reference signal for the positioning

A resampled version of the reference signal is required for the cross-correlation calculation, which we refer to as a “reference signal for positioning.” When the complex output amplitudes of the transducers $z_v$ are determined, the sound pressure at an arbitrary location $r$ for the *v*th projected pattern, $p(r,t_v)$, can be calculated using the following equation:

$$p(r, t_v) = \sum_{i=1}^{N} z_{vi}\, \frac{e^{jk\Vert r - r_i\Vert}}{\Vert r - r_i\Vert}, \tag{6}$$

where $z_{vi}$ denotes the *i*th element of $z_v$, and $r_i$ denotes the *i*th transducer position. Equation (6) is a generalized version of Eq. (4). Thus, a reference signal can be provided as a resampled version of the calculated amplitude field. The reference signal is defined as the pressure distribution calculated at $(y,z)=(0,0)$. The relation between the reference signal *l* and the amplitude field *p* is described in Sec. II B.

The reference signal must be generated with respect to one period of the M-sequence such that any of its partial extracts corresponds to the signal received at any point in the workspace in a single positioning. Additionally, a sound field calculated from each hologram corresponds only to a certain subsequence of the M-sequence. Therefore, we calculated multiple sound fields from the set of holograms and combined them to generate the reference signal based on Eq. (6). First, we calculated the sound field generated in the workspace with respect to the first pattern. Next, we calculated another sound field corresponding to a pattern with a certain shift added to the first one. We then connected the portion at the right end of this field, corresponding to the amount of shift, to the right end of the first field. The reference signal is generated by repeating this process until it is prolonged to encompass the entire M-sequence. Figure 6 illustrates the reference signal generation process at a certain shift length from the first projected pattern.
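The stitching procedure can be sketched as follows, with hypothetical helper names and a toy stand-in for the simulated fields: each per-pattern field is modeled as a window into one long underlying profile, so appending the rightmost shift-sized portion of every successive field must recover that profile.

```python
import numpy as np

def build_reference(field_for_pattern, n_patterns, shift):
    """Stitch a long reference signal: start from the full first field and
    append, for each subsequent pattern, its rightmost `shift` samples.
    `field_for_pattern(v)` returns the sampled field of the v-th hologram.
    Hypothetical names; a sketch of the stitching logic only."""
    ref = list(field_for_pattern(0))
    for v in range(1, n_patterns):
        ref.extend(field_for_pattern(v)[-shift:])
    return np.array(ref)

# Toy stand-in: the v-th "simulated field" is a 10-sample window of a long
# profile, shifted by 2 samples per pattern, mimicking the virtual shift.
profile = np.arange(40.0)
shift, seg = 2, 10
field = lambda v: profile[v * shift : v * shift + seg]
ref = build_reference(field, n_patterns=5, shift=shift)
print(np.array_equal(ref, profile[:seg + 4 * shift]))   # True
```

In the actual method, `field_for_pattern` would be replaced by the pressure profile computed from Eq. (6) along $(y,z)=(0,0)$ for each hologram.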

## III. EXPERIMENTAL METHODS

### A. Experimental setup

The custom-made phased array used in the experiment consisted of 10-mm-diameter ultrasound transducers (T4010A1, Nippon Ceramic, Co., Ltd., Japan) arranged in a two-dimensional matrix of 18 along the *x* axis and 14 along the *y* axis, with three transducers removed to attach the fixture, as shown in Fig. 1. The array aperture was $181.3\ \mathrm{mm} \times 140.6\ \mathrm{mm}$. The frequency of the generated sound waves was 40 kHz, resulting in a wavelength of $\lambda = 8.5$ mm. The phase and output amplitude of each transducer were controlled in 256 discrete steps via communication with a personal computer (PC).

We used a standard microphone system (1/8-in. microphone, type 4138-A-015; preamplifier, type 2670; conditioning amplifier, type 2690-A; all products from Brüel and Kjær, Denmark) for recording. We operated three-dimensional linear actuators (IAI type ICSB3, Japan) to scan the acoustic field while the attached microphone sequentially measured the sound pressure at each observation point. The manufacturer of the microphone indicates that a 60° incident angle results in a 1–2 dB signal attenuation compared with normal incidence for 40-kHz ultrasound. In our experimental setup, no direct emission from any of the transducers arrived at an incident angle greater than 60°. Therefore, we consider that the microphone could be regarded as omnidirectional in our experimental setup.

In the holography calculation, we set the interval between the control points *c* to 8 mm and the number of columns corresponding to one bit of the M-sequence *w* to two. Therefore, the pattern shift width corresponding to one bit was 16 mm, and the entire shift distance was $wcn' = 16n'$ (mm), where $n'$ denotes the number of bits used in the cross-correlation calculation. Additionally, we set the pattern shift per projection *s* to 4 mm, which yields the required number of measurements $m = wcn'/s + 1 = 4n' + 1$.
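These relations can be checked with a few lines of Python; the function name and default arguments are ours:

```python
def measurements_needed(n_bits, c=8, w=2, s=4):
    """Required number of projections m = w*c*n'/s + 1.

    c: control-point interval (mm), w: columns per M-sequence bit,
    s: pattern shift per projection (mm), n_bits: number of bits n'
    used in the cross-correlation calculation.
    """
    total_shift = w * c * n_bits      # 16 * n' mm of total pattern shift
    assert total_shift % s == 0       # the shift must tile evenly
    return total_shift // s + 1       # 4 * n' + 1 for the values above
```

For $n' = 4$ this gives $m = 17$, the figure used in the timing discussion of Sec. V.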

The workspace dimensions were set to $(L_{xy}, L_z) = (50, 50)$ and $(L_{xy}, L_z) = (100, 50)$ (mm), where the control points were arranged in a three-dimensional lattice with 8 mm intervals along each of the *x*, *y*, and *z* axes. The reference M-sequence was required to have a period longer than the number of bits appearing in the workspace. In the experiment, the minimum number of initial bits *n* satisfying this condition was three, resulting in a period of $2^n - 1 = 7$. This corresponded to a pattern width of 16 × 7 = 112 mm, which could encompass both workspaces. We used the M-sequence $[1, 1, 0, 1, 0, 0, 0]$, obtained from the initial three bits $[1, 1, 0]$ and the generator polynomial $x_n = x_{n-3} + x_{n-1}$.
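An M-sequence of this kind can be reproduced with a short linear-feedback shift register (LFSR) sketch. We check only the structural properties here (period 7, every nonzero 3-bit state visited exactly once per period), since the exact printed bit values depend on the tap and sign conventions:

```python
def m_sequence(seed, taps, length):
    """Generate `length` bits of an LFSR sequence over GF(2).

    seed: initial bits [x1, x2, x3, ...]
    taps: offsets back from the current index, e.g. (1, 3) for the
          recurrence x_n = x_{n-1} + x_{n-3} (mod 2)
    """
    bits = list(seed)
    while len(bits) < length:
        bits.append(sum(bits[-t] for t in taps) % 2)
    return bits

# Three periods of the order-3 sequence seeded with [1, 1, 0].
seq = m_sequence([1, 1, 0], (1, 3), 21)
```

The maximal period $2^3 - 1 = 7$ follows from the fact that the shift register cycles through all seven nonzero 3-bit states before repeating.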

### B. Numerical simulation

Before the real-world measurements, we performed numerical simulations to examine the following three aspects in two different workspaces: (1) whether it was possible to generate a sound pressure field with M-sequence-based stripe patterns along the *x* axis that is uniform with respect to the *y* and *z* axes, (2) whether a set of calculated holograms could sequentially generate acoustic pressure amplitude fields virtually shifted in the workspace by a specified width while maintaining their shapes and amplitudes, and (3) how accurately self-positioning based on our method can be accomplished under the experimental conditions (a)–(e) shown in Table I.

Table I.

| Case | $(L_{xy}, L_z)$ (mm) | Shift width $s$ (mm) | Resampling width $d$ (mm) | Number of received bits $n'$ |
|---|---|---|---|---|
| (a) | (50, 50) | 4 | 1 | 4 |
| (b) | (100, 50) | 4 | 1 | 4 |
| (c) | (50, 50) | 2 | 1 | 4 |
| (d) | (50, 50) | 4 | 1 | 3 |
| (e) | (50, 50) | 4 | 0.5 | 4 |

The numerical simulation was performed in MATLAB using the point-source model expressed in Eq. (4) with respect to the transducer output vector (equivalent to the hologram) **z**, obtained via optimization with particular control points and the sets of pressure distributions on them. A group of three-dimensional grid points placed at 5 mm intervals over the same region as the workspace was set as the observation points, whose positions we estimated in the experiments.
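As a rough illustration of such a forward model (in Python rather than MATLAB, and assuming that Eq. (4), not reproduced here, is the standard free-field monopole superposition):

```python
import numpy as np

def field_amplitude(obs, src, z, k):
    """|p| at observation points for a monopole superposition:
    p(r) = sum_j z_j * exp(i k |r - r_j|) / |r - r_j|.

    obs: (M, 3) observation points, src: (N, 3) transducer positions,
    z: (N,) complex transducer drives, k: wavenumber (rad/m)
    """
    # Pairwise distances between observation points and sources: (M, N).
    d = np.linalg.norm(obs[:, None, :] - src[None, :, :], axis=2)
    # Superpose the spherical waves and take the amplitude.
    p = (z[None, :] * np.exp(1j * k * d) / d).sum(axis=1)
    return np.abs(p)
```

A quick sanity check is that a single source produces an amplitude decaying as $1/r$, as the model requires.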

### C. Real-environment measurement

Next, we conducted a real-environment measurement. The ultrasound phased array, linear actuators, and microphone were placed as shown in Fig. 7. The experimental bench was situated near a corner of the room; thus, multipath reflections of the ultrasound were expected to some extent.

Similar to the numerical simulation, we examined the three aspects (1)–(3) with two different workspaces. The positioning accuracy was evaluated under the two conditions shown in Table II. The sound pressure amplitude of the ultrasonic waves was measured in terms of its root mean square (RMS) value over a 500 $\mu$s observation window at each observation point. The positioning was conducted offline: the linear actuators moved the microphone over all of the observation points during the measurements for each pattern projection. The observation points were placed at equal spacing in the three axial directions to encompass the workspace, including its edges. Note that the observation points did not necessarily coincide with the control points because the spacing between the observation points was 10 mm, whereas that between the control points was 8 mm. This choice reflects the fact that, in real application scenarios, the microphone position is not guaranteed to coincide with a control point. We experimentally show that the proposed method can complete the positioning in these situations.

Table II.

| Case | $(L_{xy}, L_z)$ (mm) | Shift width $s$ (mm) | Resampling width $d$ (mm) | Number of received bits $n'$ |
|---|---|---|---|---|
| (a) | (50, 50) | 4 | 1 | 4 |
| (b) | (100, 50) | 4 | 1 | 4 |
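The RMS evaluation of each observation window can be sketched as follows; the sampling rate is an illustrative assumption of ours, chosen so that the 500 $\mu$s window covers exactly 20 carrier periods:

```python
import numpy as np

def rms_over_window(signal):
    """RMS amplitude of one observation window."""
    return np.sqrt(np.mean(np.square(signal)))

# Example: a 40 kHz tone of amplitude 2 sampled at 1 MHz over a
# 500 us window (500 samples, i.e. exactly 20 carrier periods).
fs, f0, T = 1_000_000, 40_000, 500e-6
t = np.arange(int(fs * T)) / fs
tone = 2.0 * np.sin(2 * np.pi * f0 * t)
```

Over a whole number of carrier periods, the RMS of a sinusoid of amplitude $A$ equals $A/\sqrt{2}$, so the window length here is deliberately an integer multiple of the 25 $\mu$s period.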

## IV. EXPERIMENTAL RESULTS

### A. Sound field generation

Figure 8 shows the two-dimensional sound pressure amplitude fields observed at *z* = 175, 200, and 225 mm for a pattern projection with the two workspaces. The graphs show the results calculated in the numerical simulation and those obtained from the real-environment measurements. The numerical simulations indicated that the generated sound field exhibited an M-sequence amplitude pattern along the *x* axis, whereas its amplitude distribution was fairly uniform with respect to the *y* and *z* directions. The real-environment measurements demonstrated that the fields were generated in a manner similar to the simulation. However, the patterns decayed at the edges of the workspace in the *y* direction. In addition, in the planes at *z* = 175 mm, the contrast between the strong and weak parts of the pattern was slightly reduced in both the numerical simulation and the real-environment measurements. The fidelity of the generated pattern in the real environment relative to the numerical simulation depended strongly on the workspace size. With the larger workspace ($L_{xy}$ = 100 mm), the uniformity of the sound amplitude deteriorated further, and the amplitude difference between the "0" and "1" bits of the M-sequence became more ambiguous in the vicinity of the phased array. This tendency was particularly prominent in the real-environment measurements. The uniformity of the generated patterns with respect to the *y* and *z* directions may thus not be strictly satisfied. However, our method does not require this uniformity because the positioning is completed via the cross-correlation calculation, which is unaffected by a scaling of the received signals that may differ along the *y* and *z* directions.

Figure 9 shows cross-sectional plots of the amplitude distribution with respect to the *x* axis, where $(y, z) = (0, 175\ \mathrm{mm})$ and $(y, z) = (0, 225\ \mathrm{mm})$, together with the correspondingly scaled reference signal for $L_{xy}$ = 100 mm. We observed that the distribution at *z* = 175 mm exhibited a greater deviation from the reference signal than that at *z* = 225 mm.

### B. Shifts of the sound field

We calculated a sequence of sound fields for the workspace $(L_{xy}, L_z) = (50\ \mathrm{mm}, 50\ \mathrm{mm})$, composed of five consecutive patterns corresponding to a total movement of the reference signal of 20 mm. Figure 10 shows the sound pressure amplitude fields obtained from the first to fifth pattern projections in the numerical simulation as one-dimensional waveforms along the *x* axis with $(y, z) = (0, 0)$. We confirmed that a common waveform corresponding to the reference signal was shifted by 4 mm in every consecutive projection while maintaining its approximate shape.

### C. Positioning

#### 1. Numerical simulation

Figure 11 shows the absolute positioning error averaged with respect to the *x*-coordinates of the observation points for each of the cases (a) to (e). In all of the cases, a much finer positioning resolution than the shift width of 4 mm was achieved. A closer examination of the results indicated that the workspace expansion was the most significant contributor to the error, as can be observed by comparing (a) with (b). Second, a comparison between (a) and (e) indicates that resampling with a width finer than 1 mm did not necessarily contribute to the performance improvement. Moreover, the shift width and number of bits did not have significant effects on the results. Figure 12 shows the relationship between the estimated and correct positions in cases (a) and (b). Almost no deviation from the correct position was observed at any of the observation points. The same result was obtained for cases (c)–(e).
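The peak-picking step underlying these positioning results can be sketched as a generic sliding cross-correlation. The mean removal below reflects the article's point that the absolute amplitude scaling of the received signal does not affect the peak location; the function and variable names are ours:

```python
import numpy as np

def estimate_offset(received, reference):
    """Estimate the offset (in samples) of `received` within `reference`
    by locating the peak of the sliding cross-correlation.

    Both windows are mean-removed so that a constant bias or a uniform
    amplitude scaling does not move the peak.
    """
    r = received - received.mean()
    n = len(r)
    scores = [np.dot(r, reference[i:i + n] - reference[i:i + n].mean())
              for i in range(len(reference) - n + 1)]
    return int(np.argmax(scores))
```

For example, a scaled extract of the reference signal is located at its true offset regardless of the scale factor applied to it.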

#### 2. Real-environment measurement

The positioning accuracies in cases (a) and (b) are shown in Fig. 13. For (a), the error was within 1 mm for all 216 observation points, and the mean absolute error was 0.27 mm. For (b), the mean absolute error over all 726 points was 3.21 mm. The majority of the points (699 of 726) were within a 5 mm error, whereas the remaining 27 points contained errors greater than 40.0 mm; we labeled these as "outliers." Excluding the outliers (3.71% of all observation points), the mean absolute error was recalculated as 0.66 mm. Figure 14 shows the relationship between the estimated and correct positions; for condition (b), the results are shown with the outliers both included and excluded. We then examined the correlation function for a correctly positioned case (Fig. 15, top) and for an outlier (Fig. 15, bottom). For the outlier, an erroneous correlation maximum appeared at a position whose *x*-coordinate was farther from the correct position. The outliers are listed in Table III, which shows that most of them were located at *z* = 175 mm.

## V. DISCUSSIONS AND CONCLUSIONS

As shown in Fig. 9, the sound amplitude distribution was more distorted at *z* = 175 mm than at *z* = 225 mm. This tendency was consistently observed when the workspace was shifted along the *z* axis in our preliminary experiments. The distortion caused the observed signal $r'$ to become less similar to the reference signal $l'$. Because most of the outliers were located on the plane at *z* = 175 mm, the distorted sound fields presumably deteriorated the positioning accuracy. A possible strategy to overcome this problem is to install more transducers on the phased array and place more control points in the workspace, reducing the errors in the objective function and generating sound fields with improved fidelity to the desired ones.

When using an M-sequence with a period of $2^n - 1$, positioning is possible if the reference signal is moved by at least *n* bits. However, we can move the reference signal by an extra amount beyond this minimum requirement. The received value sequence then contains redundancy, and the cross-correlation function is more likely to have a sharper peak, which can theoretically increase the positioning accuracy. This effect is demonstrated by comparing the simulation results under conditions (a) and (d).

Next, we evaluated the accuracy of the positioning results. The mean absolute errors in the real-environment measurements were smaller than the diameter of the microphone head (1/8 in. = 3.175 mm) for both workspaces, excluding the outliers. Hence, further improvement of the positioning accuracy with the same recording apparatus appears difficult.

The positioning was completed offline in our study. However, given that the RMS values calculated from 17 measurements of 500 $\mu$s each were used, it is theoretically possible to position one observation point in 8.5 ms in a real-time format. In practice, the time required to switch the sound pressure pattern generated by the phased array has a significant effect on the temporal performance. The phased array used in the experiments required a minimum of 1 ms to switch the driving pattern of the transducers. Consequently, we estimate that at least 25 ms is required for one positioning, which corresponds to a temporal resolution limit of approximately 40 positionings per second in one dimension. To extend the technique to three-dimensional positioning, the same procedure must be performed three times in a row, resulting in a temporal resolution of 13 positionings per second. This figure depends on the communication performance of the hardware as well as on the *Q* value of the transducers, whose response time cannot be shorter than the period of the ultrasound. The proposed technique is expected to be less vulnerable to partial ultrasound occlusions than conventional sound-based positioning methods: if emissions from some of the sources are blocked, the remaining sources can still construct the holographic wavefield, albeit in a partially distorted form. In addition, when the shape and position of the obstacles are known, adaptive holographic reconstruction that circumvents the obstacles would be very practical and may be included in our next challenge.
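One plausible accounting of this timing budget is the following; the article states only a 1 ms minimum switch time, so charging exactly one switch per projection is our assumption:

```python
# Timing budget sketch for one one-dimensional positioning.
# Assumptions (illustrative): m = 17 projections, a 500 us RMS window
# per projection, and a 1 ms pattern-switch overhead per projection.
m = 17
window_s = 500e-6
switch_s = 1e-3

acquisition_s = m * window_s             # 8.5 ms of pure measurement
cycle_s = acquisition_s + m * switch_s   # roughly 25 ms per 1-D positioning

rate_1d = 1 / cycle_s                    # on the order of 40 per second
rate_3d = 1 / (3 * cycle_s)             # about 13 per second for three axes
```

Under these assumptions, the switching overhead, not the acquisition itself, dominates the cycle time, which is why the article singles it out.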

In this article, we proposed a method for achieving the self-positioning of a monaural microphone based on the structured acoustic holography technique and the calculation of the cross-correlation function between the sequentially received position-dependent sound amplitude values and a common reference signal. We confirmed that it is possible to generate a sound field with a blurred binary pattern corresponding to a maximum length sequence, which can be shifted in parallel along a certain axial direction by a width smaller than the ultrasound wavelength while maintaining its approximate shape. In both the numerical simulations and the real-environment measurements, we achieved accurate positioning with errors of less than 1 mm in most cases. We also verified the possibility of increasing the positioning accuracy by adding redundancy to the received signal sequence.

To extend our technique to three-dimensional positioning, a calibrated set of multiple phased arrays is required. A practical system demonstrating the feasibility of three-dimensional positioning will be part of our next challenge. Widening the workspace through a more sophisticated holography design is also important for actual applications. Last but not least, pursuing new applications of our technique beyond merely positioning mobile devices is indispensable. For example, our technique could be applied to camera-free real-time guidance of autonomous robots, or to a new type of motion capture system that is robust against small occlusions of the sound sources, owing to acoustic diffraction arising from a wavelength much longer than that of light.

## ACKNOWLEDGMENTS

This research was supported by the Japan Society for the Promotion of Science Kakenhi (Grant No. 18H01458).
