This study proposes a method for analyzing sampling jitter in audio equipment based on the time-domain analysis, considering the temporal fluctuations of the zero-crossing points in the recorded sinusoidal waves to characterize the jitter. This method enabled the separate evaluation of jitter in an audio player from those in audio recorders when the same playback signal is simultaneously fed into two audio recorders. The experiments were conducted using commercially available portable devices with a maximum sampling rate of 192 000 samples per second. The results demonstrated that jitter values on the order of a few tens of picoseconds can be identified in an audio player. Moreover, the proposed method enabled the separation of jitter from phase-independent noise utilizing the left and right channels of the audio equipment. As such, this method is applicable for performance evaluation of audio equipment, signal generators, and clock sources.

## I. INTRODUCTION

Sampling jitter in audio equipment is the deviation of the sampling instants from their ideal timing, $t[i] = (i-1) f_S^{-1}$, where $f_S$ is the sampling frequency of the digital-to-analog converter (DAC) or analog-to-digital converter (ADC). Sampling jitter $j_S(t)$ shifts the sampling instants from $t[i] = (i-1) f_S^{-1}$ to $t[i] = (i-1) f_S^{-1} + j_S(t[i])$. Hence, sampling jitter affects the performance of audio equipment. Sampling jitter has conventionally been analyzed in the frequency domain.^{1–4} In this method, one plays back a sinusoidal wave whose frequency is $f_P/4$, where $f_P$ is the sampling frequency of the audio player, records it, and then examines the frequency response using a window function with small sidelobe levels, such as a Blackman-Harris window. In addition to frequency-domain analysis (FDA), time-domain analysis (TDA) of sampling jitter has been conducted.^{5} The Hilbert transform has been employed to obtain the real jitter waveform $j_S(t)$. The advantage of TDA is that one can separately extract jitter and amplitude modulation (AM) from a recorded waveform, which is not possible with FDA.

In this study, we propose an efficient and powerful method to characterize sampling jitter in audio equipment. The proposed method comprises two key elements. The first is an improved TDA termed zero-crossing analysis (ZCA). To apply this method, the zero-crossing points (ZCPs) of a recorded waveform are analyzed, following which the ZCPs of an ideal sinusoidal wave are calculated. Time differences in ZCPs between the recorded waveform and an ideal sinusoidal wave contain the jitter information. We term these time differences “zero-crossing fluctuations (ZCFs).” The ZCA enables us to extract jitter information from a recorded waveform even when the input signal contains both jitter and AM. The second key element is the simultaneous recording of the same playback signal with two audio recorders to generate two independent waveforms. We term this “double recorder setup (DRS).” Because ZCA preserves absolute time information, we can exactly compare and calculate the positive and negative correlations of ZCFs between the two generated waveforms. Based on the addition rule of probability, the sampling jitter of the player and that of the recorders can be individually evaluated.

Note that the proposed method requires neither an optional output clock signal synchronized with the recorders' internal clock nor an external clock generator that is more precise than the internal clock. Thus, the proposed method is possible using low-cost recorders and is feasible at an end-user level. The proposed method can also be applied to high-frequency phase noise and jitter measurements. Replacing the two recorders with two digital oscilloscopes with higher sampling rates, we can utilize the proposed method for evaluating the performance of various signal and clock generators that output high-frequency sinusoidal waves.

The method is somewhat similar to the reciprocal calibration of microphones, which has been used since the 1940s;^{6,7} however, it does not require the bidirectional use of devices. The DRS is similar to the setup of the cross-spectrum method (CSM) for phase noise measurement.^{8} In CSM, repeating FDA with double instruments reduces the influence of the instruments to $ 1 / m CS$, where $ m CS$ is the number of measurements. The proposed method is the so-called TDA version of CSM. The influence of the instruments is canceled by using DRS.

In this study, we focus on the performance of audio equipment; however, the proposed method also contributes to studies of human audibility.^{9–11} Previous jitter studies^{9} demonstrated that the threshold of perceptual detection of random jitter in music signals is large, but the original jitter, i.e., the jitter already present before extra jitter was added, was not controlled in those studies. Researchers can quickly select instruments with minimal jitter using the method proposed herein. This provides an opportunity to examine how the detection threshold is reduced after the test subjects are well trained using audio players with lower levels of jitter and music signals to which slight artificial jitter has been added. Moreover, the proposed method helps to diagnose whether the player is operating appropriately in any audibility study.

The remainder of this study is organized as follows. Section II describes the principles on which the proposed method is based. Section III presents the experimental procedure. Section IV presents our results. In Sec. V, multiple perspectives are discussed, and Sec. VI provides a summary and outlook. Numerical calculations, including several instructive results, are presented in the supplementary material. A preprint of this study has been posted on a preprint server.^{12}

## II. PRINCIPLES

### A. Classification of noise in a playback signal and a recorded waveform

$t$, $\omega$, $\theta_0$, and $A_0$ are the time, angular frequency, initial phase, and amplitude of the wave, respectively. When one plays a digital audio file from which a pure sinusoidal wave is expected to be reproduced, the resulting playback signal is not a pure sinusoidal wave. It contains (i) jitter, (ii) AM, and (iii) phase-independent (PI) noise. Here, we explain these three types of noise individually. (i) Jitter is the deviation of the playback timing at each point. When jitter is present, $t$ is replaced by $t + j(t)$, where $j(t)$ is the jitter of the player. The pure sinusoidal wave $F_{\mathrm{pure}}(t)$ is thus changed to

$j(t)$ when $A_0$ and $\omega$ are larger. A conceptual view of jitter is shown in Fig. 1(a). (ii) AM is the amplitude variation of a wave with respect to time. If AM is present, $A_0$ is replaced by $A_0 + AM(t)$, where $AM(t)$ is a continuous function that represents AM at time $t$. The pure sinusoidal wave $F_{\mathrm{pure}}(t)$ is changed to

$c(t)$ become

$k$ is the index of ZCPs. At $t = s'_k$, the playback signal $c(t)$ is not equal to zero and is given by

The aforementioned noise classification is also appropriate in the recording process. Although a data array is a set of discrete variables rather than a continuous function, we refer to it as a “waveform” in the following. When one records a pure sinusoidal playback signal, the recorded waveform is not equal to the sampled data points of a pure sinusoidal wave because it contains (i) jitter, (ii) AM, and (iii) PI noise. In real digital audio recorders, jitter originates primarily from the internal clock module, whereas AM originates from ADC units. The driver amplifier for ADC contributes to PI noise.

As demonstrated later in this study, PI noise becomes comparable to jitter in the case of recent audio equipment with small jitter values, typically less than $100\,\mathrm{ps}$. Jitter and PI noise can be separated using the left and right channels of the player and recorder.
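The comparison between PI noise (a voltage) and jitter (a time) relies on the slope of the sinusoid at a zero crossing, which is $\omega A_0$. The following minimal sketch illustrates this conversion; the amplitude and noise voltage are hypothetical values chosen only for illustration, and the function names are not taken from the authors' code.

```python
import math

def crossing_shift(noise_voltage, amplitude, omega):
    """First-order time shift of a rising zero crossing caused by an added voltage."""
    return -noise_voltage / (amplitude * omega)

def exact_crossing_shift(noise_voltage, amplitude, omega):
    """Exact shift: solve amplitude * sin(omega * t) + noise_voltage = 0 near t = 0."""
    return -math.asin(noise_voltage / amplitude) / omega

OMEGA = 2 * math.pi * 12_000.0   # carrier f_C = 12 kHz, as in Sec. IV A
AMPLITUDE = 1.0                  # hypothetical amplitude A_0 in volts
NOISE = 3e-6                     # hypothetical PI-noise voltage in volts

approx = crossing_shift(NOISE, AMPLITUDE, OMEGA)
exact = exact_crossing_shift(NOISE, AMPLITUDE, OMEGA)
print(f"first-order shift: {approx * 1e12:.2f} ps, exact shift: {exact * 1e12:.2f} ps")
```

For noise that is small compared with the amplitude, the first-order estimate and the exact solution are practically indistinguishable, which is why a voltage noise can be quoted as an equivalent time by dividing it by $\omega A_0$.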

### B. Modeling of digital audio player

Herein, we describe how the jitter, AM, and PI noise introduced in Sec. II A can arise in a real playback process. For this purpose, we introduce a model of a digital audio player. A schematic of a single-channel digital audio player is shown in Fig. 2(a). The parameters are set to match the experimental conditions described in Sec. III, but they can be chosen freely for other experimental conditions. The DAC in the player, assumed to be ideal and noise-free, performs the conversion at $N_P$ bits with a sampling rate $f_P$. In this study, $N_P$ and $f_P$ are set to 24 bit and 48 kHz, respectively.

$i$ is a natural number and $v[i]$ is an $N_P$-bit signed integer. The length of $v[i]$ is $N_{\mathrm{total}}$. The waveform $v[i]$ used in this study is shown in Fig. 2(b). It can be separated into five parts, labeled (i), (ii), (iii), (iv), and (v) as depicted in the figure. Dots have been reduced to improve visibility in the figure. The horizontal axis represents index $i$. (i) Silent part, where the playback waveform is $v[i] = 0$ for $1 \leqslant i \leqslant i_{\mathrm{main}} - N_F - 1$. As shown below, $i_{\mathrm{main}}$ is the first index of the main part, and $N_F$ is the length of the fade part. We set $i_{\mathrm{main}} = 480\,000$ and $N_F = 240\,000$. Therefore, the temporal duration of the silent part becomes $(i_{\mathrm{main}} - N_F) f_P^{-1} \approx 5\,\mathrm{s}$. (ii) Fade-in part: the playback waveform has a form of

The DAC output voltage, represented as $v(t)$, is a square wave. For $t_P + (i-1) f_P^{-1} \leqslant t < t_P + i f_P^{-1}$, $v(t)$ is expressed as $v(t) = A_P v[i] / v_{\max}$. The time when playback is started is $t_P$. In this section, we set $t_P = -(i_{\mathrm{main}} + N_{\mathrm{main}}/6) f_P^{-1} \approx -15\,\mathrm{s}$.

A low-pass filter (LPF) is connected after the DAC. The cut-off frequency of the LPF is assumed to be $f_P/2$. Because the square wave $v(t)$ is smoothed by the LPF, the output voltage after the LPF becomes a pure sinusoidal wave expressed as Eq. (1). The frequency of the pure sinusoidal wave becomes $f_C := \omega/2\pi = f_P/4$ since the main part of the playback waveform is expressed as Eq. (14). The relationship between $v(t)$ and $F_{\mathrm{pure}}(t)$ is depicted in Fig. 2(c). The dotted and solid lines represent $v(t)$ and $F_{\mathrm{pure}}(t)$, respectively. For comparison with $v(t)$, the playback waveform $v[i]$ is plotted as black squares at the position $t = t_P + (i-1) f_P^{-1}$ and $v(t) = A_P v[i] / v_{\max}$. As mentioned above, the components of $3f_C, 5f_C, \ldots$ in $v(t)$ are perfectly attenuated by the LPF, and a pure sinusoidal wave of frequency $f_C$ remains.

The playback signal $c(t)$ is obtained by adding $n_{\mathrm{jitter}}(t)$, $n_{\mathrm{AM}}(t)$, and $n_{\mathrm{PI}}(t)$ to $F_{\mathrm{pure}}(t)$, as shown in Eqs. (9) and (10). In a real audio player, jitter is primarily caused by fluctuation of $f_P$. In this model, however, the DAC and LPF are ideal, and the jitter noise is added after the LPF. The output is provided by a buffer amplifier with a direct current (DC)-blocking capacitor.

### C. Modeling of digital audio recorder

$c(t)$ are fed into a buffer amplifier with input impedance $Z_R$. Subsequently, jitter $a_{\mathrm{jitter}}(t)$, AM $a_{\mathrm{AM}}(t)$, and PI noise $a_{\mathrm{PI}}(t)$ are added, and the high-frequency component is attenuated by an LPF. The cut-off frequency of the LPF is assumed to be $f_R/2$. The voltage signal before an ADC is represented as $x(t)$. Their relationship is similar to Eqs. (9) and (10). This relationship can thus be expressed as follows:

$i$th value is expressed as $x[i] = \mathrm{floor}[x_{\max}\{x(t[i])/A_R\}]$, where $A_R$ is a constant with voltage dimensions and $x_{\max} := 2^{N_R - 1} - 1$ is the maximum value of an $N_R$-bit signed integer. In this study, $N_R$ and $f_R$ are set to 24 bit and 192 kHz, respectively. The analog-to-digital conversion timing is denoted as $t[i]$. The value $t[i]$ is expressed as

The relationship between $x(t)$ and $x[i]$ is depicted in Fig. 3(b). The solid line represents $x(t)$. The recorded waveform $x[i]$ is plotted as white squares at $t = t[i]$ and $x(t) = x[i]$. The range of the horizontal axis is equal to that of Fig. 2(c). The fluctuation of the solid line in this figure represents artificial random noise. Because the ratio between the sampling rate of the ADC and the frequency of $F_{\mathrm{pure}}(t)$ is $f_R/f_C = 16$, 16 sampling points are present in each cycle.^{13}
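The quantization rule $x[i] = \mathrm{floor}[x_{\max}\{x(t[i])/A_R\}]$ and the ratio $f_R/f_C = 16$ can be illustrated with a short sketch. It assumes an ideal, jitter-free recorder with $t[i] = (i-1)/f_R$; the amplitude of half of full scale and $A_R = 1$ are hypothetical choices, and the function names are illustrative only.

```python
import math

N_R = 24                       # recorder resolution in bits (Sec. II C)
F_R = 192_000                  # recorder sampling rate f_R in Hz
F_C = 12_000                   # carrier frequency f_C = f_P / 4 in Hz
X_MAX = 2 ** (N_R - 1) - 1     # largest N_R-bit signed integer

def quantize(voltage, a_r=1.0):
    """Return floor(x_max * voltage / A_R) for a voltage with |voltage| <= A_R."""
    return math.floor(X_MAX * voltage / a_r)

def record_ideal(n_samples, amplitude=0.5, a_r=1.0):
    """Sample a noise-free sine at t[i] = (i - 1) / f_R and quantize it."""
    samples = []
    for i in range(1, n_samples + 1):
        t = (i - 1) / F_R
        samples.append(quantize(amplitude * math.sin(2 * math.pi * F_C * t), a_r))
    return samples

samples_per_cycle = F_R // F_C          # number of samples in one carrier cycle
x = record_ideal(2 * samples_per_cycle)  # two cycles of the recorded waveform
print(samples_per_cycle, X_MAX, x[:5])
```

Because `floor` rounds toward negative infinity, the positive and negative peaks of the quantized waveform are not exactly symmetric; this is a property of the rounding rule, not of the signal.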

Figure 3(c) shows the model diagram of a dual-channel digital audio recorder. The recorder comprises two single-channel digital audio recorders; the jitter inputs of the single-channel recorders are assumed to be equipotential. To obtain the experimental results described in Secs. IV A, IV B, and IV C, we used a dual-channel recorder as a single-channel recorder by connecting its two analog inputs together and averaging the two waveforms $x^{(L)}[i]$ and $x^{(R)}[i]$. We term this setup a “pseudo single-channel recorder.” As an exception, we analyzed the waveforms $x^{(L)}[i]$ and $x^{(R)}[i]$ separately to estimate the jitter from the digital audio recorder. See Sec. V D for details.

### D. ZCA

$x(t)$, crosses the $t$-axis while $0 \leqslant t \leqslant T$. For this purpose, we reconstruct a continuous function $x'(t)$ from sampling data $x[i]$, which satisfies $x'(t) \approx x(t)$ for $0 \leqslant t \leqslant T$. The reconstruction process comprises three steps. (i) To avoid the boundary effect of the sampling data, $x[i]$ is multiplied by a window function $w(t[i])$ to give $x[i] w(t[i])$, where $w(t)$ is of the Blackman type and is expressed as follows:

$N$, and the domain of $w(t)$ becomes $-0.25\,\mathrm{s} \leqslant t \leqslant 1.25\,\mathrm{s}$. Thus, the data length of $x[i] w(t[i])$ becomes $6N$. (ii) After the multiplication by the window function, the data points are interpolated using the fast Fourier transform (FFT) method with an oversampling factor of $N_{\mathrm{over}} = 64$. As a result, the number of data points increases to $6 N_{\mathrm{over}} N$. The value of $N_{\mathrm{over}}$ is adjusted depending on the required accuracy. We confirmed that $N_{\mathrm{over}} = 64$ is sufficiently large by performing numerical simulation.^{14} Thanks to the FFTW library used in MATLAB,^{15} the computation time is almost negligible. We also applied bandwidth limitations to the data to eliminate the DC component. (iii) The interpolated points are connected by straight lines. After these three steps, a continuous function $x'(t)$ is obtained from the discrete data $\{x[i_{\mathrm{main}} - N], \ldots, x[i_{\mathrm{main}} + 5N]\}$.

Figure 4(a) shows the process to obtain $x'(t)$. The range of the horizontal axis is equal to that of Fig. 3(b). The white squares are the recorded waveform $x[i]$, and the black circles are the points interpolated by the FFT method. Solid lines represent $x'(t)$. Figure 4(b) presents a magnified view of Fig. 4(a) around the first and second ZCPs. For ease of viewing, the FFT interpolation was performed using an oversampling factor $N_{\mathrm{over}} = 4$ in Figs. 4(a) and 4(b). One can see that $N_{\mathrm{over}} - 1$ points are interpolated between two recorded data points. As shown in Fig. 4(b), the zero-crossing times are labeled as $t = s_1, s_2, \ldots, s_M$, where $M$ denotes the number of ZCPs when $0 \leqslant t \leqslant T$. The obtained sequence $s_1, s_2, \ldots, s_M$ is not equally spaced because of jitter.
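Steps (ii) and (iii) of the reconstruction can be sketched as follows. This is a simplified illustration, not the authors' MATLAB code: it omits the window of step (i) by using an exactly periodic test record, replaces the FFT by a direct $O(N^2)$ Fourier sum so that only the Python standard library is needed, and uses an oversampling factor of 8 for speed. All names and the test signal are assumptions.

```python
import cmath
import math

def fourier_interpolate(x, n_over):
    """Evaluate the DFT-defined trigonometric interpolant of x on a finer grid.

    x holds one record of a periodic, band-limited real signal. The result
    has len(x) * n_over points; a direct sum stands in for a zero-padded FFT.
    """
    n = len(x)
    spectrum = [
        sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    fine = []
    for p in range(n * n_over):
        total = 0.0
        for k, c in enumerate(spectrum):
            if 2 * k == n:
                # Nyquist bin: shared equally between +n/2 and -n/2.
                total += c.real * math.cos(math.pi * p / n_over)
            else:
                freq = k if 2 * k < n else k - n  # signed frequency index
                total += (c * cmath.exp(2j * math.pi * freq * p / (n * n_over))).real
        fine.append(total / n)
    return fine

def zero_crossings(y, dt):
    """Times at which the piecewise-linear curve through y crosses zero."""
    times = []
    for p in range(len(y) - 1):
        a, b = y[p], y[p + 1]
        if a == 0.0:
            times.append(p * dt)
        elif a * b < 0.0:
            times.append((p + a / (a - b)) * dt)  # linear interpolation
    return times

N = 64          # samples in the record: 16 per cycle, 4 cycles
N_OVER = 8      # oversampling factor (the text uses 64)
PHASE = 0.3     # arbitrary phase offset so that no sample is exactly zero
x = [math.sin(2 * math.pi * 4 * m / N + PHASE) for m in range(N)]
fine = fourier_interpolate(x, N_OVER)
crossings = zero_crossings(fine, 1.0 / (N * N_OVER))  # record length set to 1
expected = [(k * math.pi - PHASE) / (8 * math.pi) for k in range(1, 9)]
print(max(abs(c - e) for c, e in zip(crossings, expected)))
```

The residual error of the linear step shrinks rapidly as the oversampling factor grows, which is consistent with the remark that $N_{\mathrm{over}}$ is adjusted to the required accuracy.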

$s_k$ using the least squares method. A conceptual diagram is provided in Fig. 5. The fitting function is written as follows:

$s_k$ from the straight line is less than $100\,\mathrm{ps}$; this is enlarged in Fig. 5 to ease visualization. From the fitting function of Eq. (19), the $k$th equidistant point $s'_k := s'(k)$ is obtained. The frequency $f'_C$ is the averaged frequency during $0 \leqslant t \leqslant T$. Thus, this analysis is sensitive to short-term drift with a frequency of $f \geqslant 1/T$ but is not sensitive to long-term drift.

$s_k$ and $s'_k$, which is expressed as follows:

$t$-axis twice per cycle, one can obtain ZCF values at a repetition rate of $f_Z = 2 f_C$. In other words, the bandwidth of $j(t)$ reconstructed from $\Delta s_k$ becomes $f \leqslant f_Z/2 = f_C$. This is expected because jitter resembles frequency modulation, in which it is impossible to transmit a frequency higher than that of the carrier wave. If one observes only the rising or falling ZCPs, the bandwidth of $j(t)$ becomes restricted to $f \leqslant f_C/2$, which is insufficient to perfectly reconstruct $j(t)$.
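The straight-line fit and the resulting ZCFs can be sketched as follows. The sketch assumes that the fitting function is a straight line in the index $k$ and that the ZCF is the recorded crossing time minus the fitted equidistant point; the sign convention, the synthetic Gaussian jitter of 50 ps, and all names are illustrative assumptions rather than the authors' definitions.

```python
import math
import random

def fit_line(values):
    """Least-squares line values[k] ~ slope * k + intercept for k = 1..M."""
    m = len(values)
    k_mean = (m + 1) / 2
    v_mean = sum(values) / m
    sxx = sum((k - k_mean) ** 2 for k in range(1, m + 1))
    sxy = sum((k - k_mean) * (v - v_mean) for k, v in enumerate(values, start=1))
    slope = sxy / sxx
    return slope, v_mean - slope * k_mean

def zero_crossing_fluctuations(s):
    """Return the residuals s_k - s'_k and the fitted spacing between crossings."""
    slope, intercept = fit_line(s)
    zcf = [s_k - (slope * k + intercept) for k, s_k in enumerate(s, start=1)]
    return zcf, slope

F_C = 12_000.0    # carrier frequency in Hz
SIGMA = 50e-12    # hypothetical RMS timing fluctuation in seconds
rng = random.Random(0)

# Synthetic zero-crossing times over T = 1 s: two crossings per cycle plus noise.
s = [k / (2 * F_C) + rng.gauss(0.0, SIGMA) for k in range(1, 24_001)]

zcf, slope = zero_crossing_fluctuations(s)
rms = math.sqrt(sum(d * d for d in zcf) / len(zcf))
f_c_est = 1 / (2 * slope)   # averaged carrier frequency recovered from the fit
print(f"RMS of ZCF: {rms * 1e12:.1f} ps, fitted frequency: {f_c_est:.6f} Hz")
```

Because the line is fitted to the same record, any constant offset or constant frequency error is absorbed by the fit, which matches the statement that the analysis is insensitive to long-term drift.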

### E. Single recorder setup (SRS) and DRS

$k$. Consequently, $r'_k$ in Eq. (25) can be replaced by $s'_k$, and we obtain the following:

The left-hand sides of these equations can be obtained using experimental data. Using Eqs. (27)–(29), we can evaluate noise in the player ($\sigma_{n2}$) separately from that in the recorders ($\sigma_{a2}$ and $\sigma_{b2}$). Equation (30) can be used to verify the calculations. The experimental results are presented in Sec. IV B.

### F. Separating jitter from PI noise

Finally, we consider the case shown in Fig. 6(c). This setup enables us to separate jitter from PI noise for the player. The setup differs from that of Fig. 6(b) in that the L and R signals of the player are bundled together. The RMS of ZCFs for the player, represented as $\sigma_{n3}$, can be obtained as in the DRS.

## III. EXPERIMENTAL PROCEDURE

### A. Audio players and recorders

In our experiment, we used three identical portable audio devices (DR-100MKIII; TASCAM, Japan). These devices offer several advantages: they are unaffected by the quality of the alternating current power supply, which ensures the reproducibility and independence of the measurement, are inexpensive, and are easy to obtain.

One of the three devices (No. 1) was used as a player, and the others (Nos. 2 and 3) were used as recorders. Figures 7(a) and 7(b) correspond to the SRS and DRS, respectively, as described in Sec. II E. Figure 7(c) shows the setup required to separate jitter from PI noise as described in Sec. II F.

The device settings for the recorders are summarized in Table I. The recording levels for the three setups are adjusted to be equal by inserting a matching resistor as shown in Figs. 7(a) and 7(c). This is necessary to prevent level changes in recordings that affect PI noise.

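TABLE I. Device settings for the recorders.

| Setting | Value |
| --- | --- |
| FILE FORMAT | WAV24 |
| SAMPLING RATE | 192 kHz |
| FILE TYPE | STEREO |
| XRI | OFF |
| DUAL REC | OFF |
| SOURCE | EXT LINE |
| A/D FILTER | FIR1 |
| DUAL ADC | ON |
| LOW CUT | OFF |
| RECORDING LEVEL | +3 dB |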


### B. How to synchronize different waveforms

Details of the playback file are described in Sec. II B. The length of the main part is $N_{\mathrm{main}} f_P^{-1} \approx 30\,\mathrm{s}$. The lengths of the fade-in and fade-out parts are both $N_F f_P^{-1} \approx 5\,\mathrm{s}$. Therefore, there are $(N_{\mathrm{main}} + 2N_F)/4 = 480\,000$ cycles of sinusoidal waves in the playback signal, and the same is true for the recorded waveform. In our improved TDA, the analysis program counts the number of cycles in the two sinusoidal waves from the two recorders and assigns a common zero-crossing index. This characteristic is of importance in Sec. IV B.
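The bookkeeping behind these numbers can be checked with a few lines of arithmetic. In this sketch, $N_{\mathrm{main}}$ is not quoted directly from the text but inferred from the stated cycle count, and the variable names are illustrative.

```python
F_P = 48_000            # player sampling rate f_P in Hz
I_MAIN = 480_000        # first index of the main part
N_F = 240_000           # length of each fade part in samples
CYCLES = 480_000        # number of sinusoidal cycles stated in the text
SAMPLES_PER_CYCLE = 4   # because f_C = f_P / 4

# Inferred from (N_main + 2 * N_F) / 4 = 480 000 cycles.
n_main = SAMPLES_PER_CYCLE * CYCLES - 2 * N_F

f_c = F_P / SAMPLES_PER_CYCLE                 # carrier frequency in Hz
main_seconds = n_main / F_P                   # duration of the main part
fade_seconds = N_F / F_P                      # duration of each fade part
silent_seconds = (I_MAIN - N_F) / F_P         # duration of the silent part
t_p = -(I_MAIN + n_main / 6) / F_P            # playback start time from Sec. II B
zero_crossings_total = 2 * CYCLES             # two crossings per cycle

print(n_main, f_c, main_seconds, fade_seconds, silent_seconds, t_p)
```

Since every cycle contributes exactly two zero crossings, counting cycles from the start of the fade-in gives both recorders the same index for the same physical crossing, even though their clocks are not synchronized.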

## IV. RESULTS OF MEASUREMENTS

### A. SRS

Using the SRS [Fig. 7(a)], we played back and recorded the sinusoidal wave of $f_C = 12\,\mathrm{kHz}$. To eliminate low-frequency noise that does not originate from the clock in the player, the recorded waveform was processed in a limited bandwidth range of $f_C - B_W \leqslant f \leqslant f_C + B_W$. Consequently, we analyzed jitter in the bandwidth of $1/T \leqslant f \leqslant B_W$. In this analysis, $B_W$ was set to 6 kHz. Then, the ZCF was obtained using MATLAB code. Figure 8(a) shows the obtained ZCF, $\Delta s_1, \Delta s_2, \ldots$, and $\Delta s_M$. Figure 8(b) shows the distribution of the obtained ZCF, which resembles a Gaussian curve. As expressed in Eq. (22), this ZCF includes the effects of jitter and PI noise from both the player and recorder. The RMS of ZCF is $\{(\sigma_{n1})^2 + (\sigma_{a1})^2\}^{1/2} = 55.3\,\mathrm{ps}$.

### B. DRS

$E_1$, $E_2$, $E_3$, and $E_4$ are the standard deviations calculated from $\{\Delta s_1, \ldots, \Delta s_M\}$, $\{\Delta r_1, \ldots, \Delta r_M\}$, $\{\Delta s_1 - \Delta r_1, \ldots, \Delta s_M - \Delta r_M\}$, and $\{\Delta s_1 + \Delta r_1, \ldots, \Delta s_M + \Delta r_M\}$, respectively. The values of $E_1$, $E_2$, $E_3$, and $E_4$ are obtained as $E_1 = 56.0\,\mathrm{ps}$, $E_2 = 56.1\,\mathrm{ps}$, $E_3 = 50.6\,\mathrm{ps}$, and $E_4 = 100.0\,\mathrm{ps}$, respectively. Using Eqs. (27), (28), and (29), we obtain the following RMS values of ZCFs for the player:

These values satisfy Eq. (30).
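The separation step can be sketched under one explicit assumption: the ZCFs of the two recorders are modeled as $\Delta s_k = n_k + a_k$ and $\Delta r_k = n_k + b_k$, where the player term $n_k$ and the recorder terms $a_k$ and $b_k$ are mutually independent, so that their variances add. The relations coded below follow from that model and are our reading of the addition rule; they are not a transcription of Eqs. (27)–(30), and the function names are hypothetical.

```python
import math

def separate_player_and_recorders(e1, e2, e3):
    """Split measured standard deviations into player and recorder parts.

    Model: var(ds) = n + a, var(dr) = n + b, var(ds - dr) = a + b,
    where n, a, b are the variances of the independent contributions.
    """
    var_n = (e1 ** 2 + e2 ** 2 - e3 ** 2) / 2
    var_a = (e1 ** 2 - e2 ** 2 + e3 ** 2) / 2
    var_b = (-e1 ** 2 + e2 ** 2 + e3 ** 2) / 2
    return math.sqrt(var_n), math.sqrt(var_a), math.sqrt(var_b)

def predicted_sum_std(sigma_n, sigma_a, sigma_b):
    """Standard deviation of ds + dr under the same model: var = 4n + a + b."""
    return math.sqrt(4 * sigma_n ** 2 + sigma_a ** 2 + sigma_b ** 2)

# Measured values from Sec. IV B, in picoseconds.
E1, E2, E3, E4 = 56.0, 56.1, 50.6, 100.0

sigma_n, sigma_a, sigma_b = separate_player_and_recorders(E1, E2, E3)
e4_check = predicted_sum_std(sigma_n, sigma_a, sigma_b)
print(f"player: {sigma_n:.1f} ps, recorders: {sigma_a:.1f} ps and {sigma_b:.1f} ps")
print(f"predicted E4: {e4_check:.1f} ps, measured E4: {E4:.1f} ps")
```

The fourth quantity is not used in the separation, so comparing its predicted and measured values provides an independent consistency check of the independence assumption.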

### C. Jitter and PI noise of player

## V. DISCUSSIONS

### A. Detection limit of jitter

^{1–5} This is partly due to the recent improvement in the performance of ADCs. We measured waveforms at 192 kHz and 24 bit, whereas 16-bit DACs of 44.1 or 48 kHz were used in previous studies.^{5} The detection limit of the proposed method is set by quantization noise; it is represented as $j_{\mathrm{LSB}}$ and can be obtained by solving the following equation:

### B. Phase dependence analysis

We now consider the phase dependence of the total playback noise, i.e., $ n total ( t )$. One might expect that phase dependence analysis enables the separation of jitter, AM, and PI noise; unfortunately, this approach is not promising, as shown below.

$A$ and $B$. For this purpose, we express time $t$ with phase $\theta$ and the number of cycles $m$ as follows:

$m$, represented as $m_{\max}$, is set to

$\theta$ is restricted to

$j(t)$, $AM(t)$, and $n_{\mathrm{PI}}(t)$. As a result, the playback noise is expressed as

$\theta$. Therefore, the phase dependence of $V\{n_{\mathrm{total}}(\theta, m)\}$ can always be expressed by two parameters $A$ and $B$, provided the assumptions adopted above are valid.

Consequently, the following behavior can be confirmed: (i) when jitter and AM are not negligible, the offset $B$ is not equal to PI noise; (ii) when jitter and AM are comparable, amplitude $A$ vanishes; (iii) when PI noise is negligible, one can obtain jitter and AM by calculating $B \pm A$; (iv) when PI noise is not negligible, one cannot obtain jitter, AM, and PI noise from $A$ and $B$. Behavior (iv) indicates that further considerations are necessary to separate $V\{j(\theta, m)\}$ from $B$. The procedure designed for this purpose is explained in Secs. II F and IV C.

### C. Comparison with CSM

As noted in the introduction, the DRS is similar to the setup of CSM.^{8} CSM can be regarded as a combination of FDA and DRS, whereas the proposed method is a combination of ZCA and DRS. For CSM, noise from two instruments is reduced by averaging, and the cross-spectrum attains the power spectrum of the device under test.

Commercial products based on CSM are designed to evaluate clock generators with frequencies above 1 MHz.^{16,17} This is primarily because frequency conversion in the audio frequency range is technically challenging. Hence, audio signals have not been assessed with CSM so far. The proposed method, a combination of ZCA and DRS, can assess audio signals and appears to be feasible as a substitute for CSM.

### D. Jitter and PI noise of recorder

$E_5$, $E_6$, $E_7$, and $E_8$ are the standard deviations calculated from $\{\Delta s_1^{(L)}, \ldots, \Delta s_M^{(L)}\}$, $\{\Delta s_1^{(R)}, \ldots, \Delta s_M^{(R)}\}$, $\{\Delta s_1^{(L)} - \Delta s_1^{(R)}, \ldots, \Delta s_M^{(L)} - \Delta s_M^{(R)}\}$, and $\{\Delta s_1^{(L)} + \Delta s_1^{(R)}, \ldots, \Delta s_M^{(L)} + \Delta s_M^{(R)}\}$, respectively. Using the experimental data generated herein, $E_5$, $E_6$, $E_7$, and $E_8$ are obtained as 63.7, 63.1, 61.9, and $110.6\,\mathrm{ps}$, respectively. Consequently, we obtain

## VI. SUMMARY AND OUTLOOK

Herein, we proposed an efficient and powerful method for highly accurate jitter measurements. This method is based on two key elements: ZCA and DRS. The ZCA enables us to determine the zero-crossing times of the voltage signals in the recorders ($s_k$ and $r_k$) and those of the pure sinusoidal waves ($s'_k$ and $r'_k$) by analyzing the recorded waveforms ($x[i]$ and $y[i]$). Their respective differences, the ZCFs ($\Delta s_k$ and $\Delta r_k$), contain information about both player noise ($\sigma_{n2}$) and recorder noise ($\sigma_{a2}$ and $\sigma_{b2}$). If one measures ZCFs with a DRS, it is possible to eliminate recorder noise from the ZCFs by calculating positive and negative correlations between them ($V\{\Delta s_k + \Delta r_k\}$ and $V\{\Delta s_k - \Delta r_k\}$). As a result, one can independently determine player noise. The player noise ($\sigma_{n2}$) results from the jitter ($\mathrm{dev}\{j(s'_k)\}$) and PI noise ($\mathrm{dev}\{n_{\mathrm{PI}}(s'_k)\}$). To separate them, some further considerations are required. An example of such a procedure is to measure player noise when the L and R outputs are bundled together ($\sigma_{n3}$).

We demonstrated the proposed method using commercial audio equipment. The RMS values of jitter and PI noise were determined as $\mathrm{dev}\{j(s'_k)\} \approx 20\,\mathrm{ps}$ and $\mathrm{dev}\{n_{\mathrm{PI}}(s'_k)\}/(\omega A_0) \approx 40\,\mathrm{ps}$, respectively. These results show that the proposed method can evaluate values of jitter that are smaller than PI noise. Given its high accuracy, the proposed method will be a powerful means of developing ultrahigh-performance devices in the future. Using such devices, more definite and quantitative studies of real-life sounds, such as music, become possible. This will form the basis of future investigations.