Active echolocation by sighted humans, as habitually used by blind individuals, was investigated using predefined synthetic and self-emitted sounds. Using virtual acoustics, distance estimation and directional localization of a wall in different rooms were assessed. A virtual source was attached to either the head or hand with realistic or increased source directivity. A control condition was tested with a virtual sound source located at the wall. Individual participants achieved untrained echolocation performance comparable to that in the control condition. On average, echolocation performance was considerably lower than in the control condition; however, it benefited from increased directivity.

Orientation, navigation, and avoidance of hazards in everyday life are dominated by the human visual system in the frontal field of view (Matthis et al., 2018). Outside of the visual field, audition supports head orientation and gaze direction (Braga et al., 2016). In visually impaired or blind humans, auditory cues become important. Sound reflections and reverberation occur in (man-made) enclosed spaces, as well as in natural surroundings such as woods or caves (Małecki et al., 2020).

The direct-to-reverberant sound energy ratio (DRR; for a review see Kolarik et al., 2016) constitutes an important cue for auditory distance perception of sound sources in the environment. Sound reflections of self-generated sounds can be used for detection, localization, and discrimination of silent objects by blind individuals (Kolarik et al., 2014). It was shown that echolocation can also be used by sighted humans to discriminate between positions or shapes of single reflectors (Teng and Whitney, 2011; Fujitsuka et al., 2021; Tirado et al., 2021). Echolocation-based localization of objects is reduced with increasing distance for sighted and blind individuals (Rowan et al., 2013). Echolocation-based distance estimation of objects is improved by room reflections and reverberation compared to an anechoic environment for untrained sighted humans, with similar performance for mouth-clicks or finger-snaps (Tonelli et al., 2016). Small changes in object distances (3 and 7 cm) can be detected by experienced echolocators at reference distances of 50 and 150 cm (Thaler et al., 2019). Echolocation-based estimates of larger distances (up to 32 m) to a single concrete wall by untrained sighted participants showed large variance across individuals and distances (Pelegrín-García et al., 2018). Flanagin et al. (2017) demonstrated that sighted humans can also detect small changes in room size by analyzing echoes of their vocalizations. Moreover, the visual impression does not affect the percept of reverberation itself (Schutte et al., 2019).
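
As an illustration of this cue, the following minimal sketch (assuming a measured or simulated room impulse response h at sample rate fs) estimates the DRR by splitting the impulse response a few milliseconds after the direct-sound peak; the 2.5-ms split window is a common convention and not a value taken from the studies cited above.

```python
# Minimal sketch: estimate the direct-to-reverberant ratio (DRR) in dB from a
# room impulse response h sampled at fs. The 2.5-ms split after the direct
# sound is an assumed convention for illustration.
import numpy as np

def direct_to_reverberant_ratio(h, fs, split_ms=2.5):
    """Return the DRR in dB for impulse response h sampled at fs (Hz)."""
    h = np.asarray(h, dtype=float)
    n_direct = int(np.argmax(np.abs(h)))            # index of the direct sound
    n_split = n_direct + int(split_ms * 1e-3 * fs)  # end of the "direct" window
    e_direct = np.sum(h[:n_split] ** 2)
    e_reverb = np.sum(h[n_split:] ** 2)
    return 10.0 * np.log10(e_direct / e_reverb)

# Toy impulse response: a direct pulse followed by an exponentially decaying tail.
fs = 44100
n = int(0.44 * fs)
h = 0.05 * np.random.randn(n) * np.exp(-6.9 * np.arange(n) / n)  # decaying tail
h[0] = 1.0                                                        # direct sound
print(f"DRR ~ {direct_to_reverberant_ratio(h, fs):.1f} dB")
```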

For echolocation, the use of synthetic sounds can achieve performance comparable to self-generated mouth-clicks (Thaler and Castillo-Serrano, 2016). However, for self-generated sounds, loudness and click rate are typically adjusted dynamically, e.g., to compensate for distance attenuation or effects of emitter directivity (Thaler et al., 2018). A broadband, chest-positioned emitter was found to be better suited for object size and position judgments than a narrowband, mouth-positioned emitter (de Vos and Hornikx, 2018). Depending on the task, finger-snaps (Tonelli et al., 2016) or artificial footstep sounds (Andrade et al., 2021) can also be useful in echolocation.

Beyond localization and spatial awareness, it has been shown that active echolocation can be used to avoid obstacles and navigate real (e.g., Kolarik et al., 2017; Tonelli et al., 2018) and virtual (Dodsworth et al., 2020; Andrade et al., 2021) spaces. Navigation and orientation in echo-acoustic space can be aided by head movements and body motions (Wallmeier and Wiegrebe, 2014). However, in contrast to echolocating mammals, humans lack specialized biosonar vocalizations and highly directed sound radiation, which would strongly limit the directions from which echoes can arrive.

In everyday life, auditory localization is far more commonly based on cues elicited by active sound sources in the environment than on echolocation. Accordingly, acoustics-based human navigation and orientation, with a focus on visual-to-auditory sensory substitution and potential applications in augmented reality systems, have been assessed by Massiceti et al. (2018) and Steffens et al. (2022). Blindfolded sighted participants were able to navigate in virtual mazes using only acoustic information from virtual sound sources (dynamically) placed at the room's boundaries. In contrast to echolocation, spatial directional and distance cues occurred for these sound sources just as they would for natural external sound sources in the environment.

A remaining open question is how human auditory distance and directional localization performance for active sound sources compares to that based on echolocation of silent objects using self-generated sounds. Baseline echolocation abilities of untrained participants provide important data for understanding the relevance of sound reflections for acoustical awareness of the environment, as well as for future development of training systems and supportive devices (see, e.g., Pulkki et al., 2021).

Here, we systematically assessed auditory distance and directional localization based on echolocation in comparison to a reference condition with active sound sources in the environment, using low-latency, interactive virtual acoustics and headphone rendering. Three virtual rooms of I, U, and Z shapes were used to represent echoic indoor spaces. Two groups of eight untrained, normal-hearing, and sighted human participants performed two tasks in the three rooms: (i) estimating the distance to the frontal wall and (ii) orienting themselves toward the farthest wall. In the first group, predefined (synthetic) sounds were used, virtually emitted from either the participant's head or hand and triggered by a button press on a hand-held controller. In the second group, arbitrary self-emitted vocalizations were used, live recorded with a head-mounted near-field microphone placed close to the mouth. Simulated early reflections and late reverberation were presented over headphones. The use of virtual acoustics enabled assessment of the effect of (a) source directivity on echolocation, (b) head-mounted and hand-held sound emitters, (c) synthetic stimuli and self-vocalizations, and (d) reverberation time (RT60), and enabled (e) a direct comparison to acoustic reference conditions with active sound sources in the environment.

The hypotheses were that (i) echolocation benefits from increased source directivity and that (ii) echolocation leads to worse performance than localization of active sound sources in the environment, regarding both distance estimation to the frontal wall and orientation toward the far wall.

Twelve sighted and normal-hearing participants (self-reported; hearing thresholds had been evaluated in earlier studies), between 23 and 44 years of age (eight females, four males, mean age 28.7 years), took part in the study and were recruited by advertising at the university. Of the 12 participants, two received hourly compensation. The other participants were employed by the University of Oldenburg (UOL) or Ludwig Maximilian University of Munich (LMU). All of the participants had prior experience in psychoacoustic measurements and participated voluntarily, providing informed consent. The study was approved by the Ethics Committee of the UOL (Drs.EK/2020/060) and the Ethics Committee of the Faculty of Medicine, LMU, Project No. 18–327.

Three virtual corridor-like rooms were used following basic I-, U-, and Z-shaped patterns (see insets in the upper three panels of Fig. 1). All of the rooms were built from five 3 m × 3 m × 3 m cubes and had the same floor area of 45 m2 and volume of 135 m3. The default reverberation time in all of the rooms was set to about 0.44 s. Additionally, an RT60 of about 4 s was realized in the I-room.

Fig. 1. (Top row) Average orientation errors and interindividual standard deviations per subgroup (circle, group 1; star, group 2; diamond, both experiments) and room (I, U, and Z) from left to right. Individual data are indicated by the participant's number. The emitter types and source directivity are indicated on the horizontal axis. The chance error level is marked as the horizontal dashed line per room. The floor plans of the three rooms are indicated in the left upper corner of the respective panel. The fixed location for the orientation task is indicated by the gray dot. The possible locations for the distance estimation task are indicated by the gray line (dotted line in I-room for short range). The target wall for distance and direction estimation is highlighted in red. (Bottom row) Average distance errors and interindividual standard deviations per subgroup. The inset in the lower right panel shows the source directivity patterns at 0.25, 1, 4, and 16 kHz (blue, red, yellow, and purple, respectively) for a forward direction located at 0° with the directivity parameter d = 1 (weak) or d = 2 (strong).


A low-latency, real-time C++ implementation, liveRAZR, of the room acoustics simulator RAZR (Wendt et al., 2014) was developed and used to generate the virtual acoustic environments. Image sources (up to the fifth order) are calculated based on the room geometry, including diffraction at room corners (Kirsch and Ewert, 2021; Ewert, 2022), whereas the late reverberation is simulated with a feedback delay network. An assessment of various common room acoustical parameters and subjective ratings showed a good correspondence between auralizations with RAZR and real rooms (Wendt et al., 2014; Brinkmann et al., 2019; Blau et al., 2021).
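
As a rough illustration of the image-source principle underlying the simulation, the sketch below computes first-order image sources for a simple rectangular (shoebox) room; the actual liveRAZR implementation additionally handles the non-convex I-, U-, and Z-geometries, image sources up to the fifth order, diffraction filters, and a feedback delay network for the late reverberation, none of which is reproduced here.

```python
# Minimal sketch of the image-source idea for a rectangular (shoebox) room;
# room dimensions, reflection coefficient, and positions are illustrative.
import numpy as np

C = 343.0  # speed of sound in m/s

def first_order_image_sources(src, room_dims):
    """Mirror the source position across each of the six walls of a shoebox."""
    images = []
    for axis in range(3):
        for wall in (0.0, room_dims[axis]):
            img = np.array(src, dtype=float)
            img[axis] = 2.0 * wall - img[axis]
            images.append(img)
    return images

def echo_delays_and_gains(src, rcv, room_dims, reflection_coeff=0.8):
    """Delay (s) and 1/r-attenuated gain of each first-order reflection."""
    out = []
    for img in first_order_image_sources(src, room_dims):
        r = np.linalg.norm(img - np.asarray(rcv, dtype=float))
        out.append((r / C, reflection_coeff / r))
    return out

# Example: source and receiver in a 3 m x 15 m x 3 m corridor (I-room footprint).
for delay, gain in echo_delays_and_gains([1.5, 2.0, 1.5], [1.5, 5.0, 1.5],
                                          [3.0, 15.0, 3.0]):
    print(f"delay {delay * 1000:6.1f} ms, gain {gain:.3f}")
```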

The experiments were conducted at UOL and LMU in a sound-attenuating listening booth with participants seated on a rotating office chair, allowing for free whole-body rotations, while translational movement was restricted to leaning forward (while still being seated) and moving the arms. Sounds were presented with semi-open AKG K240 studio headphones (AKG Acoustics GmbH, Vienna, Austria), driven by an RME ADI-2 Pro (predefined conditions) or an RME Fireface UCX (self-vocalized conditions) at a sampling rate of 44.1 kHz (RME Audio AG, Haimhausen, Germany). For binaural rendering, a minimum-phase processed head-related transfer function (HRTF) of M.S. was used. For the self-vocalized conditions, a head-mounted Sennheiser lavalier microphone (Sennheiser MKE Essential with MZA 900 P phantom adapter; Sennheiser electronic GmbH & Co. KG, Wedemark, Germany) was positioned about 5 cm to the right of the mouth and used as a live input to auralize room reflections and reverberation in response to the participant's own vocalizations in real time over the headphones.

A head-mounted display (HMD; Valve Index, Valve Corporation, Bellevue, WA) was used for visual presentation and motion tracking of the head in combination with the right-hand game controller, which were both calibrated with the accompanying SteamVR software. The visual information for the participants was limited to an abstract representation of the controller (position and orientation), a small panel attached to the controller to display task specific information, and a line on the floor (extending to the horizon) indicating the direction of the target wall in the distance estimation task.

LiveRAZR provided the room acoustics simulation and sound rendering, controlled via Open Sound Control (OSC) messages. The controller application (Godot version 3.2.2) handled the visual rendering, motion tracking, and interaction with the virtual geometry.
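
A hypothetical sketch of such OSC-based control is shown below using the python-osc package; the host, port, and message addresses are illustrative assumptions and not the actual liveRAZR interface.

```python
# Hypothetical example of controlling a renderer via OSC messages; the address
# patterns ("/razr/...") and the port are made up for illustration and do not
# reflect the actual liveRAZR message interface.
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)   # renderer assumed on localhost

# Update the virtual source position (x, y, z in meters) and trigger playback.
client.send_message("/razr/source/position", [1.5, 2.0, 1.5])
client.send_message("/razr/source/trigger", 1)
```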

In the predefined sound conditions, a synthetic “click” pulse (digital delta pulse, white spectrum) was used as default. A 20-ms pink noise burst (2 ms onset/offset, spectral peak energy slightly below 1 kHz, and a 3 dB/octave slope) was additionally tested in one predefined condition (see Sec. 2.4). In the self-vocalized conditions, participants were instructed to use any mouth-generated sound that they liked and were allowed to dynamically change the sounds during trials. Participants intuitively used combinations of tongue clicks and short mouth-generated hissing noises.
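
A minimal sketch of the two predefined stimuli is given below; the exact spectral shaping of the burst (pink-like slope with peak energy slightly below 1 kHz) is only approximated here by 1/sqrt(f) weighting, since the precise filter is not specified.

```python
# Sketch of the two predefined stimuli at fs = 44.1 kHz: a digital delta pulse
# and a 20-ms noise burst with 2-ms raised-cosine onset/offset ramps. The
# frequency-domain 1/sqrt(f) weighting only approximates the described
# pink-like (-3 dB/octave) slope.
import numpy as np

fs = 44100

# Digital delta pulse (white spectrum).
pulse = np.zeros(256)
pulse[0] = 1.0

# 20-ms pink-like noise burst with 2-ms ramps.
n = int(0.020 * fs)
spec = np.fft.rfft(np.random.randn(n))
f = np.fft.rfftfreq(n, 1.0 / fs)
spec[1:] /= np.sqrt(f[1:])              # approx. -3 dB/octave power slope
spec[0] = 0.0
burst = np.fft.irfft(spec, n)
ramp = int(0.002 * fs)
win = np.ones(n)
win[:ramp] = 0.5 * (1 - np.cos(np.pi * np.arange(ramp) / ramp))
win[-ramp:] = win[:ramp][::-1]
burst = burst * win
burst /= np.max(np.abs(burst))          # normalize to full scale
```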

Calibration was performed with a GRAS ear simulator (IEC 60318‐1) and a GRAS 46DP-1 (reference) microphone (GRAS Sound & Vibration, Holte, Denmark), and the simulation was adjusted to correctly render a defined level of a virtual omnidirectional source in 1 m distance to an omnidirectional receiver in an anechoic condition. The level difference at the head-mounted near-field microphone (for self-vocalization) and reference microphone in 1 m distance was compensated for to correctly reproduce the level of simulated reflections and late reverberation (presented via headphones) in relation to the self-emitted direct sound (which was received via natural bone and air conduction).
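
The level compensation between the two microphone positions can be sketched as follows; the signals and function names are illustrative, and the actual calibration used a GRAS ear simulator and defined source levels.

```python
# Illustrative sketch of the level compensation between the head-mounted
# near-field microphone and the reference microphone at 1 m. The resulting
# gain would be applied to the live microphone input before it drives the
# reflection/reverberation rendering.
import numpy as np

def rms_db(x):
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(x))))

def nearfield_compensation_gain_db(nearfield_rec, reference_rec):
    """Gain (dB) mapping the near-field mic level onto the 1-m reference level."""
    return rms_db(reference_rec) - rms_db(nearfield_rec)

# Usage: record the same vocalization at both microphones, then scale the live
# near-field signal by 10 ** (gain_db / 20) before auralization.
```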

The sound pressure levels (SPLs) depended on the source and receiver positions and never exceeded 78 dB(A) SPL (short-term) for the predefined sounds with a source placed at the wall 1.5 m to the left of the receiver. For the self-generated sounds, levels depended on the participant's vocalizations.

2.4.1 Experiment 1: Predefined sounds

A subgroup of eight participants (group 1) conducted the first experiment using predefined sounds. Participants were placed, without any visual representation of the room, in the virtual I-, U-, or Z-room and generated sounds by pressing a button (i.e., at a self-chosen rate) on the hand-held controller, using one of three randomly chosen emitters. (i) In the reference (control) condition without echolocation (referred to as collider), an invisible virtual ray was cast in the pointing direction of the controller. Sounds were emitted by an omnidirectional source at the collision point of the ray with a wall boundary, offering spatial directional and distance cues as occurring for natural external sound sources. For echolocation, (ii) a hand-held and (iii) a head-mounted source emitted virtual sounds from the respective positions. Two different source directivities were simulated using the pointing direction of the controller and the viewing direction, respectively: the simplified (weak) high-frequency directivity of the human head (see, e.g., Thaler et al., 2017; de Vos and Hornikx, 2017 for click emissions) and a more directional (strong) characteristic were approximated with an extended spherical head model [Ewert et al., 2021, Eq. (5), directivity parameters of 1 and 2, respectively; see inset in the lower right panel of Fig. 1]. Participants were allowed to generate sounds as often as they liked while performing the experimental tasks. They were instructed about the room layouts and sizes, as well as their general position in relation to the walls.
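
For illustration only, the sketch below mimics the qualitative behavior of such a directivity: the radiation pattern narrows with frequency and with the directivity parameter d (d = 1, weak; d = 2, strong). The exponent mapping is an assumption and does not reproduce Eq. (5) of Ewert et al. (2021).

```python
# Qualitative illustration of a frequency- and parameter-dependent source
# directivity; the cardioid-family exponent mapping is an assumption for
# illustration and not the parametric filter model used in the study.
import numpy as np

def directivity_gain(theta_deg, freq_hz, d=1.0):
    """Qualitative gain (0..1) toward angle theta re the forward direction."""
    theta = np.deg2rad(theta_deg)
    # Pattern order grows with frequency and with the directivity parameter d.
    order = d * np.clip(freq_hz / 1000.0, 0.0, 8.0)
    return (0.5 * (1.0 + np.cos(theta))) ** order

for f in (250, 1000, 4000, 16000):
    print(f, [round(directivity_gain(a, f, d=2.0), 2) for a in (0, 45, 90, 180)])
```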

Each emitter was used in the orientation and distance estimation tasks in the I-, U-, and Z-rooms with the default RT60 of 0.44 s. The maximum image source model (ISM) order in the room acoustics simulation was set to three (ISM3) as a default in all of the rooms. A fifth-order condition (ISM5) was additionally simulated for each emitter in the I-room to probe the effect of an increased number of (early) specular room reflections on the participants' performance. The pulse was used as the default stimulus; the burst stimulus was used only in half of the collider trials.

In the orientation task, participants had to find the direction of the farthest (target) wall by globally orienting themselves in the rooms. Responses were retrieved by pressing both buttons on top of the controller at the same time with the head oriented to directly face the target wall. The participants were randomly oriented in the virtual room at the beginning of the task such that the location of the target wall was uniformly distributed in the horizontal plane between ±60° of the initial view direction. Participants were instructed about the random orientation. The starting position was always fixed (see circles in floorplan insets in the upper panel of Fig. 1). The orientation error around the z axis in degrees was calculated from the head-tracked forward vector (viewing direction) and target wall normal vector.
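
A minimal sketch of this error computation, assuming the forward and wall-normal vectors are available in the horizontal plane, could look as follows:

```python
# Sketch of the orientation-error computation about the vertical z axis from a
# head-tracked forward vector and the target wall normal; vectors are
# illustrative placeholders.
import numpy as np

def orientation_error_deg(forward, wall_normal):
    """Unsigned angle (deg) about z between viewing direction and wall normal."""
    f = np.asarray(forward, dtype=float)[:2]
    n = np.asarray(wall_normal, dtype=float)[:2]
    f /= np.linalg.norm(f)
    n /= np.linalg.norm(n)
    # Signed angle via atan2 of cross and dot product, then take the magnitude.
    angle = np.degrees(np.arctan2(f[0] * n[1] - f[1] * n[0], np.dot(f, n)))
    return abs(angle)

# Example: participant faces 30 degrees off the target wall normal.
print(orientation_error_deg([np.sin(np.radians(30)), np.cos(np.radians(30)), 0],
                            [0, 1, 0]))  # -> 30.0
```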

In the distance estimation task, participants had to estimate how far away the virtual (target) wall was (direction indicated by a cyan colored strip on the floor). The estimated distance in meters was displayed on the virtual panel on top of the controller with a starting value of 5 m. Participants adjusted the fixed initial value in 0.25 m steps by moving the thumbstick on the controller and finally pressing both top buttons simultaneously. The starting position was randomly selected along a line with a uniformly distributed distance from 1.5 to 13.5 m from the back wall in the I-room and 1.5 to 7.5 m in the U- and Z-rooms (see lines in Fig. 1). The participants were informed about the random positioning, and the left sidewall was suggested as a reference distance of 1.5 m.

Six sessions with 48 trials per session (288 trials total per participant; 30–60 min) were performed.

2.4.2 Experiment 2: Self-vocalization

A subgroup of eight participants (group 2; including four participants from group 1) was asked to generate sounds of their choice using their mouth. The virtual source for the live-recorded vocalizations was fixed at mouth level with a front-facing directivity.

In addition to the default RT60 of 0.44 s, a long RT60 of about 4 s was simulated in the I-room. As in experiment 1, two source directivities (weak/strong) and two ISM orders (ISM3, all rooms; ISM5, I-room with default RT60) were simulated.

For the orientation task, the procedure was identical to experiment 1 using the I-, U-, and Z-rooms. Distance estimation was only performed in the I-room with a procedure identical to experiment 1. A condition with a shorter distance range was additionally introduced (I-short) with a random range between 0.5 and 2 m from the frontal wall. The range was chosen to be focused on the peripersonal space, better representing navigation in narrow corridors and obstacle avoidance. The short-range condition was visually indicated with a purple instead of cyan colored strip on the floor, and an initial distance value of 0.0 m was indicated.

Six sessions with 22 trials per session (132 trials total per participant; 10–20 min) were performed. The first session per participant was considered as a familiarization phase and excluded from the results.

Mean distance and orientation errors and interindividual standard deviations per group were calculated from the individual results per emitter type and room.

To obtain a reference for chance (guessing) performance for each task, a mean error per room was calculated using Monte Carlo simulations (N = 1 000 000) with random responses drawn from a probability distribution fitted to the pooled responses across all of the emitter types from both experiments. For the orientation task, a uniform target distribution between ±60° (random start rotation) and a limited normal distribution (I, μ = 1.48, σ = 35.16; U, μ = 6.49, σ = 36.61; Z, μ = 2.85, σ = 39.31; limits, ±150°) for the responses were used. For the (long) distance task, a uniform target distribution between 1.5 and 13.5 m for I, 1.5 and 7.5 m for U and Z, and a discretized gamma distribution (I, a = 3.60, b = 1.56; U, a = 3.79, b = 1.27; Z, a = 3.98, b = 1.16; steps, 0.25 m) for the responses were used. For the short-distance task, a uniform target distribution between 0.5 and 2 m and a discretized uniform distribution limited between 0.25 and 2 m for the responses were used. The ranges of the modeled distances corresponded with the full ranges during the actual experiment.
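
The following sketch reproduces the chance-level estimate for the I-room using the parameters given above (assuming that the gamma parameter b acts as a scale parameter and that the chance error is the mean absolute difference between random targets and random responses):

```python
# Sketch of the Monte Carlo chance-level estimate for the I-room; distribution
# parameters are taken from the text, the interpretation of b as a scale
# parameter is an assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N = 1_000_000

# Orientation task, I-room: uniform targets in +/-60 deg, responses from a
# normal distribution (mu = 1.48, sigma = 35.16) truncated to +/-150 deg.
mu, sigma, lim = 1.48, 35.16, 150.0
targets = rng.uniform(-60.0, 60.0, N)
responses = stats.truncnorm.rvs((-lim - mu) / sigma, (lim - mu) / sigma,
                                loc=mu, scale=sigma, size=N)
print("orientation chance error:", np.mean(np.abs(responses - targets)), "deg")

# Long-distance task, I-room: uniform targets in 1.5-13.5 m, responses from a
# gamma distribution (a = 3.60, b = 1.56) discretized to 0.25-m steps.
a, b = 3.60, 1.56
targets = rng.uniform(1.5, 13.5, N)
responses = np.round(rng.gamma(a, b, N) / 0.25) * 0.25
print("distance chance error:", np.mean(np.abs(responses - targets)), "m")
```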

Statistical analysis was performed using a one-way repeated measures (1-WRM) or two-way repeated measures (2-WRM) analysis of variance (ANOVA) with an alpha level of 0.05 and post hoc tests with Bonferroni correction. Given that a 2-WRM ANOVA [emitter (5 levels) × ISM (2 levels)] showed no significant effect of the maximum ISM order, ISM3 and ISM5 conditions were pooled in the orientation and distance estimation results; for comparison, they are separately shown in the I-short condition. Likewise, a 2-WRM ANOVA [emitter (4) × RT60 (2)] showed no significant effect of RT60 in the I-room in the self-vocalization experiment and the data were pooled. For the collider, results with the burst and pulse stimulus were pooled given that a 1-WRM ANOVA [collider-stimulus (2)] showed no significant differences.
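
A repeated measures ANOVA of this kind (here, the room × emitter design used for the group analyses) can be sketched with statsmodels as follows; the data frame below is a randomly generated placeholder with the same factor structure, not the study data:

```python
# Sketch of a two-way repeated-measures ANOVA (room x emitter) with
# statsmodels' AnovaRM; the errors are random placeholders for illustration.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(1)
participants = range(1, 9)
rooms = ["I", "U", "Z"]
emitters = ["collider", "head_weak", "head_strong", "hand_weak", "hand_strong"]

rows = [{"participant": p, "room": r, "emitter": e,
         "error": rng.normal(20, 5)}          # placeholder orientation errors
        for p in participants for r in rooms for e in emitters]
df = pd.DataFrame(rows)

res = AnovaRM(df, depvar="error", subject="participant",
              within=["room", "emitter"]).fit()
print(res)
```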

To assess individual performance, separate linear regression analyses between the presented (true) and individually estimated orientations and distances were conducted for each participant per task, room, and emitter condition. For the orientation task, the true target orientation re the (random) start rotation (equal to the inverse of the start rotation) and the estimated orientation re the start orientation were used. Significant results of the individual linear regression models were indicated with R-squared and corresponding p values, F statistic, and degrees of freedom (df) per room.1
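
The per-participant regression can be sketched as follows; the data arrays are illustrative placeholders, and the F statistic of a simple linear regression is derived here from the correlation coefficient:

```python
# Sketch of a per-participant regression between presented (true) and estimated
# values; scipy's linregress provides r (R^2 = r**2) and the two-sided p value,
# and F(1, n - 2) = r**2 * (n - 2) / (1 - r**2) for a simple regression.
import numpy as np
from scipy import stats

true_vals = np.array([2.0, 4.5, 6.0, 8.25, 10.0, 12.5])   # presented distances (m)
estimates = np.array([2.5, 4.0, 6.5, 7.75, 11.0, 12.0])   # one participant's responses

res = stats.linregress(true_vals, estimates)
n = len(true_vals)
r_squared = res.rvalue ** 2
f_stat = r_squared * (n - 2) / (1 - r_squared)
print(f"R^2 = {r_squared:.2f}, F(1,{n - 2}) = {f_stat:.1f}, p = {res.pvalue:.3g}")
```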

The mean orientation errors and standard deviations per subgroup (group 1, group 2, group of the four participants of both experiments, indicated by circles, stars, and diamonds) are shown in the upper row of Fig. 1. Individual results are indicated by the participant's number (red colors indicate the four participants of both experiments). The different emitters are indicated with the simulated source directivity on the abscissa. The results are shown per room in three panels from left to right. The chance error level is marked as the horizontal dashed line per room.

In general, the mean orientation errors were slightly lower in the I-room when compared to the U- or Z-rooms. The mean orientation errors of the collider (reference) were found to be the lowest in all of the rooms, whereas the head-mounted emitter with a weak directivity showed the highest errors. The highest errors with the hand-held and self-vocalized emitters were slightly below those for the head-mounted emitter. In the I-room, the collider and the emitters with strong directivity clearly reached mean errors below the chance level. For the U- and Z-rooms, the picture was less clear-cut, and mainly the collider and the head-mounted emitter with strong directivity yielded errors well below the chance level.

For group 1 (predefined sounds), a 2-WRM ANOVA [room (3) × emitter (5)] showed significant main effects of room [F(2,14) = 9.05, p < 0.01], emitter [F(4,28) = 26.26, p < 0.001], and no significant interaction. For group 2 (self-vocalization), a 2-WRM ANOVA [room (3) × emitter (2)] showed significant main effects of room [F(2,14) = 4.80, p < 0.05], emitter [F(1,7) = 18.89, p < 0.01], and no significant interaction. For the group participating in both experiments, room [F(2,6) = 22.35, p < 0.01] and emitter [F(6,18) = 13.67, p < 0.001] showed significant main effects with no significant interaction [2-WRM ANOVA; room (3) × emitter (7)].

Post hoc tests showed an effect of source directivity given that the mean orientation error tended to be lower for strong source directivity. This was most pronounced for the head-mounted (p < 0.01) and self-vocalized (p < 0.01) emitters.

Regarding the individual performance, for the collider, a significant average correlation of the estimated orientation and presented (true) target orientation (R2 ≥ 0.85) was found in all of the rooms [individual p values < 0.05; F(dfm,dfr) ≥ 54; dfr,I = 22, dfr,U = 10, dfr,Z = 10], except for participants 10 (all rooms) and 11 (U-room). Correlations of the head-mounted and hand-held emitters with strong directivity were highest in the I-room [R2 ≥ 0.75; individual p values < 0.05; F(dfm,dfr) ≥ 30; dfr,I = 10, dfr,U = 4, dfr,Z = 4], except for participants 9 for head and 10–12 for hand. In the other conditions, correlations were highly dependent on the individual performance in each room. For the self-vocalizations with strong directivity, results of three participants (1, 5, and 7) showed consistent correlations over all of the rooms [R2 ≥ 0.78, individual p values < 0.05; F(dfm,dfr) ≥ 10; dfr,I = 13, dfr,U = 3, dfr,Z = 3, except for U-room, participant 1].

The lower panel of Fig. 1 shows the mean distance errors and standard deviations per subgroup. Here, mean distance errors were generally higher in the I-room, where the target wall distance was larger than in the U- or Z-rooms. In the U- and Z-rooms, the mean distance errors of the collider reference were similar to those of the other emitters. In the I-room, the head-mounted emitter with a weak directivity clearly showed the highest mean distance error. Source directivity appeared to have an effect only in the I-room: the mean errors were below the chance level for the collider and the strong directivity in the I-room. Otherwise, they were close to chance, particularly for the U- and Z-rooms.

For group 1 (predefined sounds), a 2-WRM ANOVA [room (3) × emitter (5)] showed significant main effects of room [F(2,14) = 34.82, p < 0.001], emitter [F(4,28) = 10.36, p < 0.001], and an interaction [F(8,56) = 9.91, p < 0.001]. A 1-WRM ANOVA [emitter (2)] showed a main effect of the emitter [F(1,7) = 12.67, p < 0.01] for group 2 (self-vocalization). For the group participating in both experiments, main effects of room [F(2,6) = 27.82, p < 0.001], emitter [F(4,12) = 4.05, p < 0.05], and an interaction [F(8,24) = 5.41, p < 0.001] were found [2-WRM ANOVA; room (3) × emitter (7)]. Post hoc tests showed an effect of source directivity only in the I-room for the head-mounted (p < 0.01) and self-vocalized (p < 0.01) emitters.

Average correlations of the perceived distance and presented distance for the collider were found to be moderate (R2 ∼ 0.5) in all of the rooms (individual p values < 0.05; dfr,I = 22, dfr,U = 10, dfr,Z = 10, except for participant 11). For the strong directivity in room I, some participants reached significant correlations. Otherwise, no further consistent correlations were observed.

Figure 2 shows the mean errors for the self-vocalized short-distance estimation (I-room only). All short-distance mean errors were within a similar range for the two tested groups. A small benefit with strong source directivity was observed when compared to the modeled chance level. A 1-WRM ANOVA [emitter + RT60 + ISM (6)] showed no significant effect of emitter type in either group. However, average correlation results of the three participants (1, 5, and 7), who also showed good performance in the self-vocalized orientation task, were moderately better (R2 ≥ 0.6, pooled over strong directivity; dfr,strong = 13) than those of the other participants.

Fig. 2. Average distance errors and interindividual standard deviations per subgroup for the self-vocalized short-distance estimation (I-room only; otherwise, same layout as in Fig. 1). Results for ISM3 and ISM5 are shown on the left and right sides, respectively.


Taken together, in line with the relation of the group means and chance level, the correlation analysis indicated that the distance estimation results were mostly close to guessing performance. Exceptions were the collider, the strong directivity in the I-room for the (long) distance task, and some individual good performing participants in the short-distance task.

The sighted participants were able to grasp the current echolocation tasks without any explicit training during the experimental sessions. However, the averaged results should be interpreted with caution, given the limited sample size and pronounced individual differences. Distance estimation was at chance level for many conditions, except for the three best-performing participants. Pronounced individual differences were generally also found in other echolocation studies (e.g., de Vos and Hornikx, 2018; Andrade et al., 2021).

The overall lowest errors were observed in the reference collider condition with active sound sources in the environment. In this case, localization in azimuth is supported by interaural level differences (ILDs) and interaural time differences (ITDs) [e.g., Middlebrooks and Green (1991)], whereas distance can be estimated by level cues and the DRR in a room (Kolarik et al., 2016). Without these natural everyday cues, orientation in the echolocation task was found to be more accurate than the distance estimations at the participants' individual levels. The current virtual setup allowed us to directly probe the effect of source directivity: Performance with strong directivity was closer to that obtained in the collider (reference) condition, particularly for the predefined sound sources. Orientation with self-vocalization was generally more difficult, although three participants were clearly able to perform the orientation task with the (unrealistic) strong source directivity. Similar to specialized echolocating mammals, such highly directed sound radiation likely reduced the directions from which echoes arrive, making localization and interpretation of ILD and ITD cues easier for, e.g., identifying the left and right sidewalls and orienting in parallel (toward the target wall). Similarly, echolocation experts might indirectly optimize directivity to some extent by adapting their emissions (Thaler et al., 2017; Thaler et al., 2018). The current virtual setup could be extended to account for vocalization-dependent directivity changes or to probe effects of source directivity and related behavioral strategies for orientation and distance estimation.

Only for the strong directivity in the I-room could a few participants reliably perform the (long) distance estimation task using echolocation. Surprisingly, distance estimations in the I-room did not significantly differ with a longer RT60 and a corresponding change in DRR, which is in contrast to the findings by Tonelli et al. (2016), where a longer RT60 (1.4 s compared to 0.4 s) improved performance in depth echolocation.

It can be suspected that the presented ranges were generally too large and difficult to estimate without more training and without a visual reference. Furthermore, participants might not have reliably used the left sidewall of the corridor as an acoustic reference. Nevertheless, the short-distance estimations of the three best participants from the orientation task were also found to be quite accurate. Potentially, the optimally adjusted (dynamic) vocalizations of echolocation experts (see Thaler et al., 2018) could have led to better performance. Without recording and detailed analysis of the self-vocalizations and movement patterns, it remains unclear how the performance of the current best individuals relates to that of echolocation experts. Individuals might have adapted their vocalizations to the conditions with weak and strong directivity; however, this appears improbable given the randomization of the directivity conditions and the relatively short overall duration of the experiment, which made training and adaptation effects unlikely. For untrained echolocators, a variety of (artificial) sounds were found to be adequately useful (see Thaler and Castillo-Serrano, 2016; Tonelli et al., 2016; Andrade et al., 2021), similar to the current study.

In conclusion, orientation with the different emitter types was possible, and distances could be estimated with errors below the chance level with the reference collider and strongly directed emitters, however, with different performance across individuals. Based on the current results with strong source directivity and without training, a stepwise reduction of sound source directivity in the virtual acoustic environment might be a promising approach for future virtual and augmented reality (VR/AR) echolocation training applications.

This work was supported by Deutsche Forschungsgemeinschaft (DFG) Project No. 406277157 “Acoustical Awareness, Orientation, and Navigation in Rooms.” We would like to thank Birger Kollmeier for continuous support and the Valve Corporation for providing Valve Index HMDs.

1 See supplementary material at https://www.scitation.org/doi/suppl/10.1121/10.0016403 for the scatterplots of the underlying individual data for the orientation estimates, distance estimates (long and short), and tables of all individual regression results.

1. Andrade, R., Waycott, J., Baker, S., and Vetere, F. (2021). "Echolocation as a means for people with visual impairment (PVI) to acquire spatial knowledge of virtual space," ACM Trans. Access. Comput. 14, 1–25.
2. Blau, M., Budnik, A., Fallahi, M., Steffens, H., Ewert, S. D., and van de Par, S. (2021). "Toward realistic binaural auralizations—Perceptual comparison between measurement and simulation-based auralizations and the real room for a classroom scenario," Acta Acust. 5, 8.
3. Braga, R. M., Fu, R. Z., Seemungal, B. M., Wise, R. J. S., and Leech, R. (2016). "Eye movements during auditory attention predict individual differences in dorsal attention network activity," Front. Hum. Neurosci. 10, 164.
4. Brinkmann, F., Aspöck, L., Ackermann, D., Lepa, S., Vorländer, M., and Weinzierl, S. (2019). "A round robin on room acoustical simulation and auralization," J. Acoust. Soc. Am. 145, 2746–2760.
5. de Vos, R., and Hornikx, M. (2017). "Acoustic properties of tongue clicks used for human echolocation," Acta Acust. Acust. 103, 1106–1115.
6. de Vos, R., and Hornikx, M. (2018). "Human ability to judge relative size and lateral position of a sound reflecting board using click signals: Influence of source position and click properties," Acta Acust. Acust. 104, 131–144.
7. Dodsworth, C., Norman, L. J., and Thaler, L. (2020). "Navigation and perception of spatial layout in virtual echo-acoustic space," Cognition 197, 104185.
8. Ewert, S. D. (2022). "A filter representation of diffraction at infinite and finite wedges," JASA Express Lett. 2, 092401.
9. Ewert, S. D., Buttler, O., and Hu, H. (2021). "Computationally efficient parametric filter approximations for sound-source directivity and head-related impulse responses," in 2021 Immersive and 3D Audio: From Architecture to Automotive (I3DA), 8–10 September, Bologna, Italy, pp. 1–6.
10. Flanagin, V. L., Schörnich, S., Schranner, M., Hummel, N., Wallmeier, L., Wahlberg, M., Stephan, T., and Wiegrebe, L. (2017). "Human exploration of enclosed spaces through echolocation," J. Neurosci. 37, 1614–1627.
11. Fujitsuka, Y., Sumiya, M., Ashihara, K., Yoshino, K., Nagatani, Y., Kobayasi, K. I., and Hiryu, S. (2021). "Two-dimensional shape discrimination by sighted people using simulated virtual echoes," JASA Express Lett. 1, 011202.
12. Kirsch, C., and Ewert, S. D. (2021). "Low-order filter approximation of diffraction for virtual acoustics," in 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 17–20 October, New Paltz, NY, pp. 341–345.
13. Kolarik, A. J., Cirstea, S., Pardhan, S., and Moore, B. C. J. (2014). "A summary of research investigating echolocation abilities of blind and sighted humans," Hear. Res. 310, 60–68.
14. Kolarik, A. J., Moore, B. C. J., Zahorik, P., Cirstea, S., and Pardhan, S. (2016). "Auditory distance perception in humans: A review of cues, development, neuronal bases, and effects of sensory loss," Atten. Percept. Psychophys. 78, 373–395.
15. Kolarik, A. J., Scarfe, A. C., Moore, B. C. J., and Pardhan, S. (2017). "Blindness enhances auditory obstacle circumvention: Assessing echolocation, sensory substitution, and visual-based navigation," PLoS One 12, e0175750.
16. Małecki, P., Czopek, D., Piechowicz, J., and Wiciak, J. (2020). "Acoustic analysis of the glacier caves in Svalbard," Appl. Acoust. 165, 107300.
17. Massiceti, D., Hicks, S. L., and van Rheede, J. J. (2018). "Stereosonic vision: Exploring visual-to-auditory sensory substitution mappings in an immersive virtual reality navigation paradigm," PLoS One 13, e0199389.
18. Matthis, J. S., Yates, J. L., and Hayhoe, M. M. (2018). "Gaze and the control of foot placement when walking in natural terrain," Curr. Biol. 28, 1224–1233.
19. Middlebrooks, J. C., and Green, D. M. (1991). "Sound localization by human listeners," Annu. Rev. Psychol. 42, 135–159.
20. Pelegrín-García, D., De Sena, E., van Waterschoot, T., Rychtáriková, M., and Glorieux, C. (2018). "Localization of a virtual wall by means of active echolocation by untrained sighted persons," Appl. Acoust. 139, 82–92.
21. Pulkki, V., McCormack, L., and Gonzalez, R. (2021). "Superhuman spatial hearing technology for ultrasonic frequencies," Sci. Rep. 11, 11608.
22. Rowan, D., Papadopoulos, T., Edwards, D., Holmes, H., Hollingdale, A., Evans, L., and Allen, R. (2013). "Identification of the lateral position of a virtual object based on echoes by humans," Hear. Res. 300, 56–65.
23. Schutte, M., Ewert, S. D., and Wiegrebe, L. (2019). "The percept of reverberation is not affected by visual room impression in virtual environments," J. Acoust. Soc. Am. 145, EL229–EL235.
24. Steffens, H., Schutte, M., and Ewert, S. D. (2022). "Acoustically driven orientation and navigation in enclosed spaces," J. Acoust. Soc. Am. 152, 1767–1782.
25. Teng, S., and Whitney, D. (2011). "The acuity of echolocation: Spatial resolution in the sighted compared to expert performance," J. Vis. Impair. Blind. 105, 20–32.
26. Thaler, L., and Castillo-Serrano, J. (2016). "People's ability to detect objects using click-based echolocation: A direct comparison between mouth-clicks and clicks made by a loudspeaker," PLoS One 11, e0154868.
27. Thaler, L., De Vos, H. P. J. C., Kish, D., Antoniou, M., Baker, C. J., and Hornikx, M. C. J. (2019). "Human click-based echolocation of distance: Superfine acuity and dynamic clicking behaviour," J. Assoc. Res. Otolaryngol. 20, 499–510.
28. Thaler, L., De Vos, R., Kish, D., Antoniou, M., Baker, C., and Hornikx, M. (2018). "Human echolocators adjust loudness and number of clicks for detection of reflectors at various azimuth angles," Proc. R. Soc. B 285, 20172735.
29. Thaler, L., Reich, G. M., Zhang, X., Wang, D., Smith, G. E., Tao, Z., Abdullah, R. S. A. B. R., Cherniakov, M., Baker, C. J., Kish, D., and Antoniou, M. (2017). "Mouth-clicks used by blind expert human echolocators—Signal description and model based signal synthesis," PLoS Comput. Biol. 13, e1005670.
30. Tirado, C., Gerdfeldter, B., and Nilsson, M. E. (2021). "Individual differences in the ability to access spatial information in lag-clicks," J. Acoust. Soc. Am. 149, 2963–2975.
31. Tonelli, A., Brayda, L., and Gori, M. (2016). "Depth echolocation learnt by novice sighted people," PLoS One 11, e0156654.
32. Tonelli, A., Campus, C., and Brayda, L. (2018). "How body motion influences echolocation while walking," Sci. Rep. 8, 15704.
33. Wallmeier, L., and Wiegrebe, L. (2014). "Self-motion facilitates echo-acoustic orientation in humans," R. Soc. Open Sci. 1, 140185.
34. Wendt, T., van de Par, S., and Ewert, S. D. (2014). "A computationally-efficient and perceptually-plausible algorithm for binaural room impulse response simulation," J. Audio Eng. Soc. 62, 748–766.
