This study proposes a method for analyzing sampling jitter in audio equipment based on time-domain analysis, in which the temporal fluctuations of the zero-crossing points of recorded sinusoidal waves are used to characterize the jitter. When the same playback signal is fed simultaneously into two audio recorders, the method allows the jitter of the audio player to be evaluated separately from that of the recorders. Experiments were conducted using commercially available portable devices with a maximum sampling rate of 192 000 samples per second. The results demonstrate that jitter values on the order of a few tens of picoseconds can be identified in an audio player. Moreover, the proposed method separates jitter from phase-independent (PI) noise by utilizing the left and right channels of the audio equipment. The method is therefore applicable to the performance evaluation of audio equipment, signal generators, and clock sources.
I. INTRODUCTION
Sampling jitter in audio equipment is an error in the sampling instants from the ideal timing, i.e., where is the sampling frequency of the digital-to-analog converter (DAC) and analog-to-digital converter (ADC). Sampling jitter causes the sampling instants to be changed to . Hence, sampling jitter affects the performance of audio equipment. Sampling jitter has conventionally been analyzed in the frequency domain.1–4 In this method, one plays back a sinusoidal wave whose frequency is , where is the sampling frequency of the audio player, records it, and then examines the frequency response using a window function with small sidelobe levels, such as a Blackman-Harris window. In addition to frequency-domain analysis (FDA), time-domain analysis (TDA) of sampling jitter has been conducted.5 The Hilbert transform has been employed to obtain the real jitter waveform . The advantage of TDA is that one can separately extract jitter and amplitude modulation (AM) from a recorded waveform, which is not possible with FDA.
In this study, we propose an efficient and powerful method to characterize sampling jitter in audio equipment. The proposed method comprises two key elements. The first is an improved TDA termed zero-crossing analysis (ZCA). To apply this method, the zero-crossing points (ZCPs) of a recorded waveform are analyzed, following which the ZCPs of an ideal sinusoidal wave are calculated. Time differences in ZCPs between the recorded waveform and an ideal sinusoidal wave contain the jitter information. We term these time differences “zero-crossing fluctuations (ZCFs).” The ZCA enables us to extract jitter information from a recorded waveform even when the input signal contains both jitter and AM. The second key element is the simultaneous recording of the same playback signal with two audio recorders to generate two independent waveforms. We term this “double recorder setup (DRS).” Because ZCA preserves absolute time information, we can exactly compare and calculate the positive and negative correlations of ZCFs between the two generated waveforms. Based on the addition rule of probability, the sampling jitter of the player and that of the recorders can be individually evaluated.
Note that the proposed method requires neither an optional output clock signal synchronized with the recorders' internal clock nor an external clock generator that is more precise than the internal clock. Thus, the proposed method can be implemented with low-cost recorders and is feasible at the end-user level. The proposed method can also be applied to high-frequency phase noise and jitter measurements: by replacing the two recorders with two digital oscilloscopes with higher sampling rates, it can be used to evaluate the performance of various signal and clock generators that output high-frequency sinusoidal waves.
The method is somewhat similar to the reciprocal calibration of microphones, which has been used since the 1940s;6,7 however, it does not require the bidirectional use of devices. The DRS is similar to the setup of the cross-spectrum method (CSM) for phase noise measurement.8 In CSM, repeating FDA with two instruments reduces the influence of the instruments to , where is the number of measurements. The proposed method can be regarded as a TDA version of CSM, in which the influence of the instruments is canceled by using the DRS.
In this study, we focus on the performance of audio equipment; however, the method also contributes to research on human audibility.9–11 Previous jitter studies9 demonstrated that the threshold of perceptual detection of random jitter in music signals is large, but the original jitter, i.e., the jitter already present before extra jitter is added, was not controlled in those studies. Using the method proposed herein, researchers can quickly select instruments with minimal jitter. This provides an opportunity to examine how far the detection threshold is reduced once listeners are well trained using audio players with lower levels of jitter and music signals to which slight artificial jitter has been added. Moreover, the proposed method helps to diagnose whether the player is operating properly in any audibility study.
The remainder of this paper is organized as follows. Section II describes the principles on which the proposed method is based. Section III presents the experimental procedure. Section IV presents our results. In Sec. V, multiple perspectives are discussed, and Sec. VI provides a summary and outlook. Numerical calculations, including several instructive results, are presented in the supplementary material. A preprint of this study has been posted on a preprint server.12
II. PRINCIPLES
A. Classification of noise in a playback signal and a recorded waveform
Schematic of sinusoidal signals modulated by (a) jitter, (b) AM, and (c) PI noise. The horizontal axis represents the phase of a pure sinusoidal wave. Time is the first zero-crossing time of the pure sinusoidal wave, and m is the number of cycles. The filled bands represent the range of fluctuation during the repetition.
The aforementioned noise classification is also appropriate in the recording process. Although a data array is a set of discrete variables rather than a continuous function, we refer to it as a “waveform” in the following. When one records a pure sinusoidal playback signal, the recorded waveform is not equal to the sampled data points of a pure sinusoidal wave because it contains (i) jitter, (ii) AM, and (iii) phase-independent (PI) noise. In real digital audio recorders, jitter originates primarily from the internal clock module, whereas AM originates from the ADC units. The driver amplifier for the ADC contributes to PI noise.
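The three noise classes can be illustrated with a small numerical sketch. The function below is not taken from the paper: its name, its parameters, and the choice of white Gaussian sequences for all three noise terms are assumptions made only for illustration.

```python
import numpy as np

def synthesize_waveform(n_samples, fs, f0, jitter_rms=0.0, am_rms=0.0,
                        pi_rms=0.0, seed=0):
    """Sampled sinusoid with random jitter, AM, and phase-independent noise.

    Assumption of this sketch: all three noise terms are white Gaussian
    sequences. Real equipment need not behave this way.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / fs                       # ideal sampling instants (s)
    dt = jitter_rms * rng.standard_normal(n_samples)    # timing error (s)
    am = am_rms * rng.standard_normal(n_samples)        # relative amplitude error
    pi = pi_rms * rng.standard_normal(n_samples)        # additive noise
    x = (1.0 + am) * np.sin(2.0 * np.pi * f0 * (t + dt)) + pi
    return t, x
```

In this model, the error caused by jitter is largest near the zero crossings, where the slope of the sinusoid is steepest, the error caused by AM is largest at the peaks, and the PI term has the same magnitude at every phase.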
As demonstrated later in this study, PI noise becomes comparable to jitter in recent audio equipment with small jitter values, typically less than . Jitter and PI noise can be separated using the left and right channels of the player and recorder.
B. Modeling of digital audio player
Here, we describe how the jitter, AM, and PI noise introduced in Sec. II A can arise in a real playback process. To this end, we introduce a model of a digital audio player. A schematic of a single-channel digital audio player is shown in Fig. 2(a). The parameters are set to match the experimental conditions described in Sec. III, but they can be chosen freely for other experimental conditions. The DAC in the player, assumed to be ideal and noise-free, performs the conversion at bits with a sampling rate . In this study, and are set to and , respectively.
(a) Model diagram of single-channel digital audio player. (b) Playback waveform; dots have been reduced to improve visibility. (c) Relationships among DAC output v(t) (dotted line), signal after LPF (solid line), and playback waveform vi (black square). (d) Model diagram demonstrating a dual-channel digital audio player.
The DAC output voltage, represented as v(t), is a square wave. For , v(t) is expressed as . The time when playback is started is . In this section, we set .
A low-pass filter (LPF) is connected after the DAC. The cut-off frequency of the LPF is assumed to be . Because the square wave v(t) is smoothed by the LPF, the output voltage after the LPF becomes a pure sinusoidal wave expressed as Eq. (1). The frequency of the pure sinusoidal wave becomes since the main part of the playback waveform is expressed as Eq. (14). The relationship between v(t) and is depicted in Fig. 2(c). The dotted and solid lines represent v(t) and , respectively. For comparison with v(t), the playback waveform is plotted as black squares at the position and . As mentioned above, the components of in v(t) are perfectly attenuated by the LPF, and a pure sinusoidal wave of frequency remains.
The playback signal c(t) is obtained by adding to , as shown in Eqs. (9) and (10). In a real audio player, jitter is primarily caused by fluctuation of . However, in this model, the DAC and LPF are ideal, and jitter noise is added after the LPF. The output is delivered through a buffer amplifier with a direct-current (DC) blocking capacitor.
C. Modeling of digital audio recorder
(a) Model diagram of a single-channel digital audio recorder. (b) Relationship between signal before LPF x(t) (solid line) and recorded waveform xi (white square). (c) Model diagram of a dual-channel digital audio recorder. The method to utilize a dual-channel recorder as a single-channel recorder is depicted.
The relationship between x(t) and is depicted in Fig. 3(b). The solid line represents x(t). The recorded waveform is plotted as white squares at and . The range of the horizontal axis is equal to that of Fig. 2(c). The fluctuation of the solid line in this figure shows artificial random noise. Because the ratio between the sampling rate of the ADC and the frequency is , 16 sampling points are present for each wavelength.13
Figure 3(c) shows the model diagram of a dual-channel digital audio recorder. The recorder comprises two single-channel digital audio recorders; the jitter inputs of the single-channel recorders are assumed to be equipotential. To obtain the experimental results described in Secs. IV A, IV B, and IV C, we used a dual-channel recorder as a single-channel recorder by connecting its two analog inputs together and averaging the two waveforms and . We term this setup a “pseudo single-channel recorder.” As an exception, we analyzed waveforms and separately to estimate the jitter from the digital audio recorder; see Sec. V D for details.
D. ZCA
Figure 4(a) shows the process to obtain . The range of the horizontal axis is equal to that of Fig. 3(b). The white squares are the recorded waveform , and the black circles are the interpolated points by the FFT method. Solid lines represent . Figure 4(b) presents a magnified view of Fig. 4(a) around the first and second ZCPs. For the ease of viewing, the FFT interpolation was performed using an oversampling factor in Figs. 4(a), 4(b). One can see that the points are interpolated between two recorded data points. As shown in Fig. 4(b), the zero-crossing times are labeled as , where M denotes the number of ZCPs when . The obtained sequence is not equally spaced because of jitter.
(a) The recorded waveform xi (white square), points obtained by FFT interpolation (black circle), and the continuous function (solid line). For ease of viewing, the FFT interpolation was performed using an oversampling factor of rather than . (b) Magnified graph for .
The zero-crossing time in ms versus zero-crossing index k. sk denotes the zero-crossing time of the recorded data, and denotes that of the corresponding pure sinusoidal wave. Consecutive times are equally spaced, whereas sk are not. The difference between sk and is extremely magnified for the ease of viewing.
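The ZCA steps described in this section (FFT interpolation, location of the ZCPs, and comparison with equally spaced ideal ZCPs) can be sketched as follows. This is a minimal illustration and not the authors' code. The function names, the use of rising crossings only, the linear interpolation within one oversampled step, and the least-squares line used as the ideal zero-crossing grid are all assumptions of the sketch. The record is also assumed to contain an integer number of cycles so that the FFT interpolation has no edge effects.

```python
import numpy as np

def fft_interpolate(x, factor):
    """Band-limited interpolation by zero-padding the spectrum."""
    n = len(x)
    spec = np.fft.rfft(x)
    if n % 2 == 0:
        spec[-1] *= 0.5              # split the Nyquist bin before padding
    return np.fft.irfft(spec, n=n * factor) * factor

def zero_crossing_times(x, fs, factor=8):
    """Times (s) of the rising zero crossings of a periodic record x."""
    y = fft_interpolate(np.asarray(x, dtype=float), factor)
    i = np.flatnonzero((y[:-1] < 0.0) & (y[1:] >= 0.0))
    frac = -y[i] / (y[i + 1] - y[i])  # linear interpolation inside one step
    return (i + frac) / (fs * factor)

def zero_crossing_fluctuations(s):
    """Deviation of the times s_k from a best-fit equally spaced grid."""
    k = np.arange(len(s))
    slope, intercept = np.polyfit(k, s, 1)  # assumed model of the ideal ZCPs
    return s - (slope * k + intercept)
```

For a 12 kHz tone sampled at 192 000 samples per second with a small sinusoidal timing modulation, the RMS of the returned fluctuations should be close to the modulation amplitude divided by the square root of two. Fitting a straight line removes any constant delay and any small frequency offset between the record and the ideal sinusoid, which is one possible way to define the ideal ZCPs.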
E. Single recorder setup (SRS) and DRS
The left-hand side of these equations can be obtained using experimental data. Using Eqs. (27)–(29), we can evaluate noise in the player ( ) separately from that in the recorders ( and ). Equation (30) can be used to verify the calculations. The experimental results are presented in Sec. IV B.
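The idea of separating player noise from recorder noise with the DRS can be sketched numerically. The sketch below does not reproduce Eqs. (27)–(30). It assumes a simple additive model in which each recorder's ZCF is the sum of a common player term and an independent recorder term, all zero-mean and mutually uncorrelated. The function name and return convention are hypothetical.

```python
import numpy as np

def separate_player_and_recorders(zcf_a, zcf_b):
    """Split ZCF variances into player and recorder parts.

    Assumed model (not taken from the paper's equations):
        zcf_a = player + recorder_a,   zcf_b = player + recorder_b,
    with all three terms zero-mean and mutually uncorrelated.
    Returns the RMS values (player, recorder_a, recorder_b).
    """
    a = np.asarray(zcf_a, dtype=float)
    b = np.asarray(zcf_b, dtype=float)
    # Half-sum: player variance plus a quarter of the summed recorder variances.
    var_sum = np.mean(((a + b) / 2.0) ** 2)
    # Half-difference: the player term cancels, leaving only the recorder part.
    var_diff = np.mean(((a - b) / 2.0) ** 2)
    var_player = var_sum - var_diff           # algebraically equal to mean(a * b)
    var_rec_a = np.mean(a ** 2) - var_player
    var_rec_b = np.mean(b ** 2) - var_player
    # Clip small negative estimates that can arise from finite statistics.
    return tuple(float(np.sqrt(max(v, 0.0)))
                 for v in (var_player, var_rec_a, var_rec_b))
```

Under this model, the sum and difference of the two ZCF sequences play the roles of the positive and negative correlations. Because the two sequences share a common zero-crossing index, they can be combined element by element.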
F. Separating jitter from PI noise
Finally, we consider the case shown in Fig. 6(c). This setup enables us to separate jitter from PI noise for the player. The setup differs from that of Fig. 6(b), where the L and R signals of the player are bundled together. The RMS of ZCFs for the player, represented as , can be obtained as in the DRS.
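One way to see why bundling the L and R outputs helps is the following sketch. It rests on assumptions that are not confirmed by the text above: the bundled output is taken to be the average of two channels that share the same jitter (a common clock) and carry independent PI noise of equal RMS. The function name is hypothetical, and both inputs are the RMS of the player ZCFs expressed in the same units.

```python
import math

def split_jitter_and_pi(rms_single, rms_bundled):
    """Estimate jitter and PI-noise RMS from two player-noise measurements.

    Assumed model (an illustration, not the paper's equations):
        rms_single**2  = jitter**2 + pi**2        (one channel)
        rms_bundled**2 = jitter**2 + pi**2 / 2    (L and R averaged)
    """
    var_pi = 2.0 * (rms_single ** 2 - rms_bundled ** 2)
    var_jitter = 2.0 * rms_bundled ** 2 - rms_single ** 2
    # Clip small negative estimates caused by measurement scatter.
    return math.sqrt(max(var_jitter, 0.0)), math.sqrt(max(var_pi, 0.0))
```

As a check of the arithmetic, a jitter of 3 and a PI noise of 4 (arbitrary units) give a single-channel RMS of 5 and a bundled RMS of the square root of 17, and the function recovers 3 and 4 from these two values.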
III. EXPERIMENTAL PROCEDURE
A. Audio players and recorders
In our experiment, we used three identical portable audio devices (DR-100MKIII; TASCAM, Japan). These devices offer several advantages: they are unaffected by the quality of the alternating-current power supply, which ensures the reproducibility and independence of the measurements; they are inexpensive; and they are easy to obtain.
One of the three devices (No. 1) was used as a player, and the others (Nos. 2 and 3) were used as recorders. Figures 7(a) and 7(b) correspond to the SRS and DRS, respectively, as described in Sec. II E. Figure 7(c) shows the setup required to separate jitter from PI noise, as described in Sec. II F.
(a) SRS; the output, the left channel of device No. 1, was fed simultaneously to the left and right channels of device No. 2. A matching resistor of was inserted. (b) DRS; the output, the left channel of device No. 1, was fed simultaneously to the left and right channels of device No. 2 and No. 3. (c) The output is the sum of the left and right channels of device No. 1 and is fed simultaneously to the left and right channels of device No. 2 and No. 3.
The device settings for the recorders are summarized in Table I. The recording levels for the three setups were adjusted to be equal by inserting a matching resistor, as shown in Figs. 7(a) and 7(c). This prevents changes in the recording level, which would affect the PI noise.
Device settings for the recorders.
FILE FORMAT | WAV24 |
SAMPLING RATE | 192kHz |
FILE TYPE | STEREO |
XRI | OFF |
DUAL REC | OFF |
SOURCE | EXT LINE |
A/D FILTER | FIR1 |
DUAL ADC | ON |
LOW CUT | OFF |
RECORDING LEVEL | +3dB |
B. How to synchronize different waveforms
Details of the playback file are described in Sec. II B. The length of the main part is . The lengths of the fade-in and fade-out parts are both . Therefore, there are of sinusoidal waves in the playback signal, and the same is true for the recorded waveform. In our improved TDA, the analysis program counts the number of cycles in the two sinusoidal waves from two recorders and assigns a common zero-crossing index. This characteristic is of importance in Sec. IV B.
IV. RESULTS OF MEASUREMENTS
A. SRS
Using the SRS [Fig. 7(a)], we played back and recorded the sinusoidal wave of . To eliminate low-frequency noise that does not originate from the clock in the player, the recorded waveform was processed in a limited bandwidth range of . Consequently, we analyzed jitter in the bandwidth of . In this analysis, was set to 6 kHz. Then, the ZCF was obtained using MATLAB code. Figure 8(a) shows the obtained ZCF, , and . Figure 8(b) shows the distribution of the obtained ZCF, which resembles a Gaussian curve. As expressed in Eq. (22), this ZCF includes the effects of jitter and PI noise from both the player and recorder. The RMS of ZCF is .
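Band limiting of the recorded waveform before the zero-crossing analysis can be sketched as below. The filter type is an assumption: the text does not state how the bandwidth was limited, and a brick-wall FFT filter is used here only because it is simple. The function name and the band edges passed to it are hypothetical.

```python
import numpy as np

def band_limit(x, fs, f_lo, f_hi):
    """Zero all FFT bins outside [f_lo, f_hi] (brick-wall band-pass)."""
    x = np.asarray(x, dtype=float)
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)   # bin frequencies in Hz
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))
```

A band centered on the test tone keeps the jitter sidebands that lie within the chosen bandwidth and discards components far from the tone, such as power-line hum or slow drift.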
(a) ZCF obtained for 1 s. There are ZCPs. (b) Histogram of the obtained ZCF.
B. DRS
Histograms of and . The RMS values of the former and latter were 50.6 and , respectively.
These values satisfy Eq. (30).
C. Jitter and PI noise of player
V. DISCUSSIONS
A. Detection limit of jitter
B. Phase dependence analysis
We now consider the phase dependence of the total playback noise, i.e., . One might expect that phase dependence analysis enables the separation of jitter, AM, and PI noise; unfortunately, this approach is not promising, as shown below.
Consequently, the following behavior can be confirmed: (i) when jitter and AM are not negligible, the offset B is not equal to PI noise; (ii) when jitter and AM are comparable, amplitude A vanishes; (iii) when PI noise is negligible, one can obtain jitter and AM by calculating ; (iv) when PI noise is not negligible, one cannot obtain jitter, AM, and PI noise from A and B. Behavior (iv) indicates that further considerations are necessary to separate from B. The procedure designed for this purpose is explained in Secs. II F and IV C.
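Behaviors (i)–(iv) can be understood from a simple variance model. The expressions below are a sketch under stated assumptions and are not the paper's own equations: the symbols $\sigma_J$, $\sigma_{AM}$, and $\sigma_{PI}$ are introduced here for the amplitude-equivalent RMS of jitter, AM, and PI noise, the three contributions are taken as uncorrelated, the phase $\phi$ is measured from a zero crossing, and A and B are assumed to be the amplitude and offset of a $\cos 2\phi$ fit to the noise variance.

```latex
\begin{align}
\sigma^2(\phi) &= \sigma_J^2 \cos^2\phi + \sigma_{AM}^2 \sin^2\phi + \sigma_{PI}^2
                = B + A\cos 2\phi, \\
B &= \sigma_{PI}^2 + \tfrac{1}{2}\left(\sigma_J^2 + \sigma_{AM}^2\right), \qquad
A = \tfrac{1}{2}\left(\sigma_J^2 - \sigma_{AM}^2\right), \\
B + A &= \sigma_{PI}^2 + \sigma_J^2, \qquad
B - A = \sigma_{PI}^2 + \sigma_{AM}^2 .
\end{align}
```

Under these assumptions, two measured quantities (A and B) cannot determine three unknowns, A vanishes when the jitter and AM terms are equal, and the combinations $B \pm A$ give the jitter and AM terms directly only when $\sigma_{PI}$ is negligible.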
C. Comparison with CSM
As noted in the introduction, the DRS is similar to the setup of CSM.8 CSM can be regarded as a combination of FDA and DRS, whereas the proposed method is a combination of ZCA and DRS. For CSM, noise from two instruments is reduced by averaging, and the cross-spectrum attains the power spectrum of the device under test.
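The averaging step of CSM can be illustrated with a short sketch. It is a generic illustration of cross-spectrum averaging and is not based on any particular instrument. The function name, the non-overlapping rectangular segments, and the normalization by the segment length are assumptions made for simplicity.

```python
import numpy as np

def averaged_cross_spectrum(x, y, seg_len):
    """Average X * conj(Y) over consecutive non-overlapping segments.

    x and y are simultaneous records of the same source taken by two
    instruments. Noise that is independent between the instruments
    averages toward zero, whereas the common part survives.
    """
    n_seg = min(len(x), len(y)) // seg_len
    xs = np.reshape(x[:n_seg * seg_len], (n_seg, seg_len))
    ys = np.reshape(y[:n_seg * seg_len], (n_seg, seg_len))
    cross = np.fft.rfft(xs, axis=1) * np.conj(np.fft.rfft(ys, axis=1))
    return cross.mean(axis=0) / seg_len
```

With this normalization, white noise of unit variance that is common to both records gives an averaged cross-spectrum close to one in every bin, regardless of how much independent noise each instrument adds. Passing the same record twice returns the ordinary power spectrum, which includes the instrument noise.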
Commercial products based on CSM are designed to evaluate clock generators at frequencies above 1 MHz.16,17 This is primarily because frequency conversion in the audio frequency range is technically challenging. Hence, CSM has not so far been applied to audio signals. The proposed method, a combination of ZCA and DRS, can handle audio signals and appears to be feasible as a substitute for CSM.
D. Jitter and PI noise of recorder
VI. SUMMARY AND OUTLOOK
Herein, we proposed an efficient and powerful method for highly accurate jitter measurements. This method is based on two key elements: ZCA and DRS. The ZCA enables us to determine the zero-crossing times of the voltage signals in the recorders (sk and rk) and those of the pure sinusoidal waves ( and ) by analyzing the recorded waveforms ( and ). Their respective differences, “ZCFs ( and ),” contain information about both player noise ( ) and recorder noises ( and ). If one measures ZCFs with a DRS, it is possible to eliminate recorder noise from ZCFs by calculating positive and negative correlations between ZCFs ( and ). As a result, one can independently determine player noise. The player noise ( ) results from the jitter ( ) and PI noise ( ). To separate them, some considerations are required. An example of such a procedure is to measure player noise when L and R outputs are bundled together ( ).
We demonstrated the proposed method using commercial audio equipment. The RMS values of jitter and PI noise were determined as and , respectively. These results show that the proposed method can evaluate jitter values that are smaller than the PI noise. The high accuracy of the proposed method suggests that it will be a powerful tool for developing ultrahigh-performance devices in the future. With such devices, more definitive and quantitative studies of real-life sounds, such as music, will become possible. This will form the basis of future investigations.