We present an open-source package that helps the user to perform a basic statistical analysis of a given turbulence dataset. We believe that this package is useful to the entire turbulence community. It can be used to estimate statistical quantities of turbulence such as the spectral density, turbulence intensity, integral length scale, Taylor microscale, Kolmogorov scale, and dissipation rate, as well as velocity increments, structure functions, and their scaling exponents. Well-known methods from the literature have been selected and implemented so that they can be compared with one another. This package also performs an advanced analysis that includes the scale-dependent statistical description of the turbulent cascade using the Fokker–Planck equation, consequently leading to the assessment of integral fluctuation theorems. As a precondition for this stochastic process approach, Markovian properties of the turbulent cascade in scale are tested; once established, they are utilized to estimate the drift and diffusion coefficients of the Fokker–Planck equation. The total entropy production for each independent cascade trajectory can then be determined from the Fokker–Planck equation. Estimating the total entropy production gives access to a rigorous law of non-equilibrium stochastic thermodynamics, namely, the integral fluctuation theorem, which must be valid if the Markov properties hold and the Fokker–Planck equation is correct. This approach to the turbulent cascade process has the potential to open a new way of linking the statistical description of turbulence, non-equilibrium stochastic thermodynamics, and local turbulent flow structures. Finally, we emphasize that the presented package can be used for the analysis of other data with turbulence-like complexity as well.
I. INTRODUCTION
The phenomenon of turbulent flows is ubiquitous and has been studied by various methods and concepts over a long period of time. One of the central models for describing turbulent flows is the phenomenologically inspired energy cascade model proposed by Richardson and Lynch.1 In this model, turbulence is understood as the evolution of turbulent structures on different spatial or temporal scales. The central assumption of this cascade model is that kinetic energy is transferred through all scales of the inertial range by the repeated random breakup of eddies into increasingly smaller eddies in a cascade-like process. Dissipation of kinetic energy into heat does not take place in this inertial range of the cascade but on smaller scales in the so-called dissipation range. Based on this assumption, in the famous dimensional analysis initiated by the work of Kolmogorov2–4 and Obukhov5,6 in 1941 (known as the K41 theory), the cascade model by Richardson is extended to a theory for fully developed, homogeneous, and isotropic turbulence (HIT). This kind of turbulence, which is assumed to be stationary, represents an idealized case that should have universal features. Nonetheless, the intermittency phenomenon, which manifests itself by deviations from such a self-similar cascade process at small scales, is considered one of the key signatures of turbulence and is still not entirely understood.7 Kolmogorov himself8 and Obukhov9 refined the K41 theory by allowing fluctuations of the transferred energy, leading to a lognormal distribution for the local energy dissipation rate and, consequently, to intermittent, i.e., non-Gaussian velocity fluctuations. This theory is known as the K62 theory. An overview of various intermittency correction models along with fractal and multifractal models can be found in Refs. 7 and 10.
To characterize various types of turbulent flows, such as atmospheric flows, jet flows, boundary layer flows, grid turbulence, wake flows, and von Kármán flows, in a statistical way, it is beneficial to have a common data post-processing tool for turbulence. Hence, this open-source package was created to perform a basic statistical analysis of turbulence data (here, we focus on one-dimensional velocity time series). We are not aware of a comparably comprehensive compilation of standard analysis tools commonly used in turbulence research in a single user-friendly package. Along with the basic statistical analysis of turbulence data, the application of Fokker–Planck equations (FPEs) and integral fluctuation theorems (IFTs), which provide a comprehensive statistical description of the complexity of turbulent velocity time series, is implemented in this package. Developing a post-processing algorithm for this advanced analysis from scratch is complex and time-consuming. With this package, the practical application of this new method becomes possible for everyone, easily and quickly, for the first time.
Altogether, this open-source package enhances the practicability and availability of both the standard analyses already established in turbulence research and the extended analysis via Fokker–Planck equations and integral fluctuation theorems. We believe that this package should be shared with the turbulence community, as it is of great interest, especially to young scientists and those getting acquainted with this topic. It possesses a high reuse potential, and considerable effort has been put into estimating turbulence quantities by different alternative methods. This property allows for a more consistent and accurate assessment of these turbulence quantities. Furthermore, the open-source package is organized in such a way that other researchers can add further methods to our collection as well.
The open-source package can also be used by researchers outside the field of turbulence. Using this package may contribute to new insights into the complex characteristics of scale-dependent processes (in space or time), which occur in many (multi-)fractal systems, ranging from physics to meteorology, biological systems, finance, economy, surface science, and medicine. The package can be applied to these types of data (one-dimensional time series), but it is important to keep in mind that a variety of turbulence-specific assumptions, such as the theory of HIT or the Taylor hypothesis of frozen turbulence, are made as the analysis proceeds. Hence, the implications of these turbulence-specific aspects for other systems have to be evaluated critically. A central feature of the open-source package is the validation of a Markov property of velocity fluctuations in scale (see details in Sec. III A). It has turned out that the method of analysis based on Markov processes can also be successfully applied to the characterization of rough surfaces,11–13 rogue waves,14–16 urban systems,17 and financial time series,18–20 to name but a few. Such spatially disordered structures can be analyzed as scale-dependent stochastic processes (see also Refs. 21 and 22) and are related to a non-equilibrium thermodynamical description.23 It has to be verified whether the investigation by non-equilibrium quantities such as scale-dependent entropies24 also contributes to the characterization of extreme events in the data the user intends to analyze.
The central part of this paper is a detailed discussion of the procedure of how to use this package. This paper only covers the discussion of turbulent flows satisfying the assumption of homogeneous, isotropic turbulence and the Taylor hypothesis of frozen turbulence. In principle, the tools of this package can be applied to turbulence data in other configurations and to turbulent flows with non-ideal HIT conditions, as shown for non-fully developed turbulence, which may lead to strong deviations from the linearity of the drift coefficient (the deterministic force), with different slopes for positive and negative increments, as found in Ref. 25. Furthermore, turbulence with strong shear is studied in Refs. 26 and 27 for the turbulent wake. In this investigation, a pronounced structural change in the drift coefficient is observed as one passes from not fully developed turbulence (near field) into fully developed turbulence (far field). A careful interpretation of the results is advised here as well.
A general overview of all functions to carry out various analyses is given as part of this discussion in Secs. II–V. All subroutines that can be accessed in this package and the setting of the relevant parameters are called in a logical order using the main function. Overall, the analysis consists of four parts: Part I: Standard turbulence analysis, Part II: Markov analysis, Part III: Entropy analysis, and Part IV: Consistency check. A question dialog box is displayed where the user can select the analyses to be performed (see Fig. 1).
To use this package, it is beneficial if the user has adequate knowledge of turbulent flows. Nevertheless, for a fundamental understanding of turbulent flows, we recommend standard textbooks.10,28 For a quick and simple overview of the theoretical background associated with the turbulent cascade process, the Fokker–Planck equation (FPE), the integral fluctuation theorem (IFT), and their interpretation, we recommend the review article by Peinke, Tabar, and Wächter.22
Note that all subroutines are also accessible from the command line and can be included by the user in other applications. Additionally, all abbreviations used in this document are listed at the end in the list of nomenclature/abbreviations. It should also be kept in mind that many of the equations discussed in this paper refer to the theory of fully developed, isotropic, and homogeneous turbulence. Thus, it is advised to use the package cautiously, as these assumptions may not be fulfilled by the experimental data, which can lead to incorrect/misleading results. In general, velocity is a vectorial quantity (with respect to its coordinate system) that depends on the physical location in space. In this paper, the streamwise component of the velocity is considered, and features are discussed for this single velocity component. Other components may be characterized similarly. First promising results for multi-component datasets are shown in Refs. 29–31.
A. Quality control
This package has been used continuously within our lab since 2018. It has also been successfully used by a large number of students (in the context of practical exercises which were part of the fluid dynamics lecture at the University of Oldenburg) to ensure stability across different machines and operating systems. Furthermore, this package has proved its value in a number of research publications.22,27,32–37 The package itself, or parts of it, as well as results obtained by using the package for analyzing turbulent data, have also been presented at several international conferences.
B. Implementation and architecture
This software is implemented in MATLAB® (2022a), a high-level, matrix-based programming language designed specifically for engineers and scientists to analyze data, develop algorithms, and create models. Furthermore, a Python version of this package, covering "Part I: Standard turbulence analysis," is currently under development and will be made available on GitHub soon. The package is available as free software, under the GNU General Public License (GPL) version 3 [source code and standalone applications (64-bit) for Windows, macOS, and Linux], and a typical dataset can be downloaded from the repository on GitHub or the Matlab File Exchange Server to replicate all the results presented in this article. Support is available at github.com/andre-fuchs-uni-oldenburg/OPEN_FPE_IFT, where questions can be posted. The following toolboxes should be included in your MATLAB license:
Curve Fitting Toolbox
Optimization Toolbox
Parallel Computing Toolbox
Signal Processing Toolbox
Statistics and Machine Learning Toolbox
As these MATLAB toolboxes are essential for the application of the package, compatibility with Octave cannot be ensured. Nonetheless, in order to enhance accessibility, standalone applications (64-bit) for Windows, macOS, and Linux are also provided to run the MATLAB code on target machines that do not have a MATLAB license.
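Whether the required toolboxes are present can be checked from the MATLAB command line; the following is a minimal sketch (not part of the package) using MATLAB's built-in ver function.

```matlab
% Minimal sketch (not part of the package): check that the required
% toolboxes are installed before starting the analysis.
required = {'Curve Fitting Toolbox', 'Optimization Toolbox', ...
            'Parallel Computing Toolbox', 'Signal Processing Toolbox', ...
            'Statistics and Machine Learning Toolbox'};
v = ver;                         % struct array of all installed products
installed = {v.Name};            % product names as a cell array
for k = 1:numel(required)
    if ~any(strcmp(installed, required{k}))
        warning('Missing toolbox: %s', required{k});
    end
end
```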
C. Typical dataset and system requirements
To demonstrate the application of this program, a typical dataset obtained in a turbulent air jet experiment by Renner et al.38 is used within this document. The local velocity was sampled by a time-resolved single hot-wire measurement (Dantec 55P01) with a spatial resolution of 1.25 mm and a maximal time resolution of 30 kHz. This dataset is composed of ten independently measured datasets, acquired with a sampling frequency of 8 kHz. This exemplary dataset is also included in the package itself. We also provide the generated plots/results, allowing the user to verify whether the software is operating correctly. Table I lists all parameters that the user must enter during the analysis to reconstruct the results shown in this paper.
Fs | L | λ | Number of bins | ΔEM
---|---|---|---|---
8000 Hz | 0.067 m | 0.0066 m | 93 | 22 samples
The system requirements (memory, disk space, processor, total run time) demanded by the script depend very much on the size of the dataset to be examined, the resolution in relation to the number of bins, and the available number of central processing units (CPUs). The required memory will be allocated to each MATLAB computational engine during parallel computations using multicore CPUs and computer clusters. A typical processing time for this dataset using four physical cores and 16 GB RAM is about 60 min.
II. PART I: STANDARD TURBULENCE ANALYSIS
In this section, various subroutines to perform basic statistical analysis of the turbulence data are presented. This mainly includes the verification of stationarity of datasets, filtering of the signal, estimation of turbulence length scales, and dissipation rate along with the assessment of structure functions from velocity increments.
A. Loading data and variables
In the very first step, the user is asked to load the dataset and to choose the folder path to save the results. Furthermore, the user has to input the sampling frequency of the data and the kinematic viscosity of the fluid.
uiimport is the first command in the program; it asks the user to interactively select the data file that will be used for the analysis (Fig. 2). At this point, it is possible to specify the percentage of the total data that will be used to perform the analysis (for example, the first 20% of the data). This feature is helpful for a first, fast check of the data and the program. Note that this parameter has a significant effect on the overall performance of the script. In our experience, several thousand large-eddy turnover times, i.e., acquired cascade trajectories (see Sec. IV A), are typically required to perform the analysis successfully.
Save_path opens a dialog box to navigate through local directories to select a folder for saving figures and files.
Save_name generates a pop-up dialog box to enter the name for the analyzed data and figures to be saved.
Fs generates a pop-up dialog box to enter the sampling frequency of the data in Hz.
Kin_vis generates a pop-up dialog box to enter the value of the kinematic viscosity ν of the fluid (in m²/s) for which the experimental data have been acquired.
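Dialogs of this kind can be reproduced with MATLAB's standard inputdlg function; the following minimal sketch (with assumed default values, not the package's own code) illustrates the parameter entry.

```matlab
% Minimal sketch of the parameter dialogs (assumed defaults; the
% package's own dialogs may differ in wording and layout).
answer  = inputdlg({'Sampling frequency Fs (Hz):', ...
                    'Kinematic viscosity nu (m^2/s):'}, ...
                   'Analysis parameters', 1, {'8000', '15.32e-6'});
Fs      = str2double(answer{1});   % 8000 Hz for the example dataset
kin_vis = str2double(answer{2});   % assumed value for air at room temperature
```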
B. Test of stationarity and filtering of the time series
Many statistical quantities used to characterize a dataset require the dataset to be stationary. Thus, before performing the basic turbulence analysis, the stationarity of the data is examined using the functions and parameters described below.
plot_stationarity. With this function, the stationarity of the data is verified. For this purpose, the following analysis checks whether the statistical properties change over time (a stationary process requires that they do not). Therefore, the data are subdivided into 20 sections, and the mean, standard deviation, skewness, and kurtosis are plotted, respectively, in Fig. 3(a). In the title of Fig. 3(a), the number of NaNs (not a number) is printed. The origin of these NaNs may be, for example, the measurement apparatus. Regardless of their origin, no preprocessing that could lead to a NaN is performed in the package up to this point. If the data to be analyzed contain NaNs, a pop-up dialog box is displayed, which allows the user to choose among five different methods to fill these entries using the fillmissing function of MATLAB: "nearest non-missing value," "linear interpolation of neighboring, non-missing values," "piecewise cubic spline interpolation," or "moving average" and "moving median" using a moving window whose length can also be set by the user. In a future version of the package, an additional stochastic interpolation of gaps in the dataset will also be provided.39
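The essence of this check fits in a few lines; the following is a minimal sketch of the section-wise statistics (our variable names, not the package's), assuming the velocity record is stored in the vector data.

```matlab
% Minimal sketch of the stationarity check: split the record into 20
% sections and compare section-wise statistics (assumed variable names).
nSections = 20;
len       = floor(numel(data)/nSections);
sections  = reshape(data(1:nSections*len), len, nSections);

m  = mean(sections);       % section-wise mean
s  = std(sections);        % section-wise standard deviation
sk = skewness(sections);   % Statistics and Machine Learning Toolbox
ku = kurtosis(sections);

subplot(2,2,1); plot(m,  'o-'); title('mean');
subplot(2,2,2); plot(s,  'o-'); title('standard deviation');
subplot(2,2,3); plot(sk, 'o-'); title('skewness');
subplot(2,2,4); plot(ku, 'o-'); title('kurtosis');
```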
plot_pdf. This function plots the probability density function (PDF) of the data with the number of bins specified in the function increment_bin in Fig. 4. It also plots the Gaussian distribution with the same standard deviation and mean value as the data. In the title, the range of the data (the difference between the maximum and minimum values of the sample data), the skewness, and the flatness of the data are printed.
low_freq generates a pop-up dialog box to select whether the data should be filtered (low-pass filter) or not. If a filter is to be applied, then in the next step the frequency in Hz at which the data will be filtered by a low-pass filter has to be specified (for example 1800 Hz). If the pop-up dialog box is denied, it is set to the value low_freq=Fs/2.
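A zero-phase low-pass filter of the kind described above could be realized as follows; this is a minimal sketch with an assumed filter design (an 8th-order Butterworth filter), not necessarily the package's implementation.

```matlab
% Minimal sketch of the optional low-pass filtering step (assumed
% Butterworth design; the package's implementation may differ).
low_freq = 1800;                              % cutoff frequency in Hz
[b, a]   = butter(8, low_freq/(Fs/2), 'low'); % normalized cutoff
data     = filtfilt(b, a, data);              % zero-phase filtering
```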
C. Estimation of fundamental turbulence length scales
Length scales are important for any analysis of turbulent data. For scales larger than the "large length scale," called the integral length scale, it is expected that only uncorrelated structures are present in the flow. The "small length scale" determines the end of the cascade, where dissipation smooths out all fluctuations. In this context, the Taylor length scale can be considered as the scale where dissipative effects become noticeable for the first time, whereas the dissipation or Kolmogorov length is the scale where the flow structures become fully dissipative. Thus, the interesting features of turbulence are expected in the interval between the large and small length scales, which is called the inertial range. As these scales are important for the data analysis and interpretation, here, we present different well-known estimation methods for each of these scales. It should be noted that this list of estimation methods is far from complete, and other package users are strongly encouraged to add further standard methods to it.
length_scales. In this function, the integral length scale L, the Taylor length scale λ, the Kolmogorov length scale η, the mean energy dissipation rate ⟨ε⟩, the normalized energy dissipation rate Cε, and the local Taylor–Reynolds number Reλ are estimated using different methods of calculation. These parameters are used for the further processing of the data (solving the Fokker–Planck equation and extracting cascade trajectories). If the fundamental length scales are already known from a previous analysis, this calculation can be skipped, and the values of the integral, Taylor, and Kolmogorov length scales in m and of the energy dissipation rate in m²/s³ can be entered in the pop-up dialog boxes. This also enables the package user to study the effect of different scale choices on further results of the analysis. The entered length scales will be rounded toward the nearest integer in sample units. Here, the Taylor hypothesis is used, so the link between mean velocity, sampling frequency, and a time interval is used to convert this interval into a length scale (see details in Sec. II C). The proposed value in the pop-up dialog box is the median of the length scales obtained by all methods.
It should be noted that the following collection of methods for the estimation of fundamental length scales and the dissipation rate is far from being complete. When creating the package, only methods that are well-known to us have been taken into account and the list can be readily extended in future versions of the package. The different methods for estimating these parameters are numbered sequentially. While each method may share the same assumptions and hypotheses, the different methods can lead to inconsistent results. In our experience, this typically happens with experimental data, which may be characterized by some observational noise, as well as when the assumptions made are not perfectly met for these datasets, to which the methods have different sensitivities. Furthermore, for example, the number of methods for determining the Taylor length and dissipation rate differs. At the end of the estimation of the respective length scale, the results are presented in a single figure to compare them with each other. The comparison of the results of different methods allows the user to evaluate the consistency of the methods and adjust the proposed value in the pop-up dialog boxes (median length scale for all methods) accordingly. In an ideal case, all methods for one scale should give the same result, but in reality, different aspects of experimental data (e.g., measurement noise or insufficient number of samples) may change results significantly. In such cases, it is up to the user to find out the most reliable value.
The integral length scale L is estimated by using

- the cumulative integral of the autocorrelation coefficient [Eq. (13)], $L = \int_0^\infty R_{u'u'}(r)\,\mathrm{d}r$ with $R_{u'u'}(r) = \langle u'(x+r)\,u'(x)\rangle / \langle u'^2\rangle$, where r is the scale in meters and $u'$ is the fluctuating component of the streamwise velocity. The cumulative integral reaches its asymptotic value at a specific scale r, which characterizes the integral length scale L.
In the case of Lagrangian tracer particles, where trajectories from direct numerical simulation (DNS) are oftentimes not long enough, or in the case of non-ideal experimental data, Eq. (13) may not fully converge. As a result, the aforementioned method can lead to large errors in the estimation of the integral length. In the case of a non-monotonic decrease of $R_{u'u'}(r)$, the autocorrelation function is
- integrated only up to its first zero crossing $r_0$, i.e., $L = \int_0^{r_0} R_{u'u'}(r)\,\mathrm{d}r$.45
- integrated up to its first $1/e$ crossing $r_e$, i.e., $L = \int_0^{r_e} R_{u'u'}(r)\,\mathrm{d}r$.46
In particular, for experimental data, there may be additional measurement noise present in the data that may prevent a zero crossing of the autocorrelation function. In this case, $R_{u'u'}(r)$ is
- fitted by a one-term exponential function,42,46 $R_{u'u'}(r) \approx a\,\mathrm{e}^{br}$ (with $b < 0$). The fit region corresponds to the range of scales $0 \le r \le r_e$, with $r_e$ being the scale of the first $1/e$ crossing of the autocorrelation function. This fit range is indicated by the two black, vertical dashed lines in Fig. 8. The integral is then calculated by using the coefficients a and b of the exponential fit, $L = \int_0^\infty a\,\mathrm{e}^{br}\,\mathrm{d}r = -a/b$.
Furthermore, the integral length scale L is estimated via
- the second order structure function $S^2(r) = \langle (u(x+r) - u(x))^2\rangle$, which is linked to the autocorrelation coefficient according to $R_{u'u'}(r) = 1 - S^2(r)/(2\langle u'^2\rangle)$, so that $R_{u'u'}(r)$ can be calculated directly from the second order structure function. At length scales sufficiently large compared to the energy injection scale of the experiment, the second order structure function tends to the asymptotic value $2\langle u'^2\rangle$.47
- the zero crossings proposed by Mora and Obligado.48 To verify that zero crossings are well resolved, the signal has to be filtered with a low-pass filter with a characteristic length ηc. This method consists of estimating the Voronoi tessellation of the 1D zero-crossing positions dataset. It is compared with a random Poisson process, which has no correlations between the scales. The method proposes that the integral length scale is equal to the characteristic length ηc for which $\sigma_{voro}/\sigma_{RPP} = 1$.
Here, σvoro is the standard deviation of the Voronoi cell sizes normalized by their mean value, and σRPP is the equivalent value for a random Poisson process, which is equal to $1/\sqrt{2} \approx 0.71$. In Fig. 9, the normalized standard deviation of the Voronoi cells as a function of the characteristic length ηc is plotted. Finally, the integral length scale is defined as the value of ηc that corresponds to $\sigma_{voro} = \sigma_{RPP}$. If we observe that $\sigma_{voro} > \sigma_{RPP}$ for all values of ηc, the method cannot provide the value of the integral length scale, and longer signals are needed (nevertheless, the extrapolation of the value remains possible). Note that the characteristic length ηc should not be confused with the Kolmogorov length scale η. To provide a direct link to Ref. 48, we use here the exact same notation as in the original reference.
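To make the first two of the above methods concrete, the following minimal sketch estimates L from the autocorrelation coefficient, assuming Taylor's hypothesis to convert sample lags into spatial scales (our variable names, not the package's).

```matlab
% Minimal sketch of the autocorrelation-based estimate of L (methods 1-2),
% assuming Taylor's hypothesis r = <u>*tau to convert lags into scales.
u      = data - mean(data);              % fluctuations u'
maxLag = 1e5;                            % maximum lag in samples
R      = xcorr(u, maxLag, 'coeff');      % autocorrelation coefficient
R      = R(maxLag+1:end);                % keep non-negative lags only
dr     = mean(data)/Fs;                  % scale increment per sample (m)

i0 = find(R <= 0, 1);                    % first zero crossing
if isempty(i0)
    L = trapz(R)*dr;                     % no zero crossing: full range
else
    L = trapz(R(1:i0))*dr;               % integrate up to zero crossing
end
```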
The Taylor length scale λ is estimated by using
- a parabolic fit, $R_{u'u'}(r) \approx 1 - r^2/\lambda^2$, to the autocorrelation function to estimate λ at the origin r = 0 (see Fig. 10). The range of negative curvature close to r = 0 is, therefore, used for the estimation (beyond this range, the autocorrelation function has an inflection point). Since this method needs a well-resolved autocorrelation function, it strongly depends on the resolution of the sensor. It is also affected by low-pass filtering.
Assuming isotropy at small scales and Taylor's frozen turbulence hypothesis, the Taylor length scale λ is estimated using the relation $\lambda^2 = \langle u'^2\rangle / \langle (\partial u'/\partial x)^2\rangle$. The numerical differentiation in the denominator is approximated by

- the simple difference quotient, $\partial u'/\partial x \approx [u'(x + \Delta x) - u'(x)]/\Delta x$ (a minimal code sketch of this estimate is given after this list of methods). Due to the finite spatial resolution of the measuring sensor used in this typical experimental dataset and due to measurement noise, this method can yield an incorrect result. In order to correctly compute the derivatives, the spatial resolution must be of the order of the Kolmogorov microscale η (see Ref. 49), which is estimated via Eq. (32).
- the procedure proposed by Aronson and Löfdahl.50 Here, the derivative of the velocity is approximated via the second order structure function S² in the inertial range. In Fig. 11, the asymptotic behavior of $S^2(r)/r^2$, whose limit for $r \to 0$ approximates $\langle(\partial u'/\partial x)^2\rangle$, is plotted as a function of the scale r. For the extrapolation toward r = 0, we use a linear fit. The fit region is restricted to spatial lags that are larger than four times the scale that corresponds to the low-pass filter frequency low_freq. The upper limit of the fit region must be set by the user.
- the dissipation spectrum, $\lambda^2 = \langle u'^2\rangle \big/ \int_0^{k_c} k^2 E(k)\,\mathrm{d}k$ [Eq. (22)], which follows from $\langle(\partial u'/\partial x)^2\rangle = \int k^2 E(k)\,\mathrm{d}k$. This procedure has been proposed by Hinze.42 The upper limit of the integration $k_c$ is set to the wave number corresponding to the low-pass filter frequency low_freq.
- the dissipation spectrum as above, but here the upper limit of the integration in Eq. (22) is set to infinity (or the largest available wave number k) so that the full dissipation spectrum is used.
Furthermore, the Taylor length scale λ is estimated via
- the zero crossings of the fluctuating velocity, $\lambda = C\,l/\pi$ [Eq. (23)], where l is the average distance between zero crossings and C is a constant of the order of unity that quantifies the level of non-Gaussianity of the derivative $\partial u'/\partial x$. This method is discussed in Refs. 51–53. In Fig. 12, the density of zero crossings $n_s$ times the average distance between zero crossings is plotted as a function of the characteristic length ηc.
For values of ηc within the inertial range, a power law with exponent 2/3 is expected,51 and eventually, for smaller filter sizes (or large wave numbers), a plateau is reached. The presence of this plateau, related to the dissipative range, implies that the density of zero crossings is well resolved, and therefore, l can be deduced using the trivial relation $l = 1/n_s$. If the plateau is not reached, small scales are not resolved and the method cannot estimate the Taylor length scale. On the other hand, if the value of $n_s$ increases again after the plateau, the cutoff frequency is too high and the analysis is affected by small-scale noise. Initially, the constant C = 1 gives a good approximation of λ using Eq. (23).
- the zero crossings of the fluctuating velocity as above, but here C is computed from the mean and the standard deviation of the filtered velocity derivative (see Ref. 54), where the derivative is filtered with the largest frequency within the plateau of $n_s$. The latter method results in a better estimation of λ if the dissipative range is resolved.
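As announced above, here is a minimal sketch of the derivative-based estimate (method 2), assuming Taylor's hypothesis and sufficient resolution of the dissipative scales (our variable names, continuing the sketch from Sec. II C).

```matlab
% Minimal sketch of lambda^2 = <u'^2>/<(du'/dx)^2> via the simple
% difference quotient (requires resolution close to the Kolmogorov scale).
dx     = mean(data)/Fs;                   % spatial sample spacing (m)
dudx   = diff(u)/dx;                      % simple difference quotient
lambda = sqrt(mean(u.^2)/mean(dudx.^2));  % Taylor length scale in m
```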
The mean energy dissipation rate ⟨ε⟩ is estimated by its one-dimensional, isotropic surrogate, $\langle\varepsilon\rangle = 15\nu\,\langle(\partial u'/\partial x)^2\rangle$, using
- 1. & 2. method: either the second or the third order structure function. The estimation of the dissipation rate using these methods relies on the transfer of energy within the inertial range. They are particularly useful when the higher-frequency content present in the flow is not fully resolved by the measurement device. This is generally the case when, e.g., the length of the hot wire is larger than the Kolmogorov length scale η [see Eq. (32)] of the flow under consideration. This function generates a pop-up dialog box to enter the value of the constant $C_2$ (Refs. 55–57) used in the relation $S^2(r) = C_2\,(\langle\varepsilon\rangle r)^{2/3}$ [Eq. (25)] between the second order structure function and the energy transfer rate (transfer from one scale to another) based on the assumption of homogeneous isotropic turbulence (HIT).28,40 The mean energy dissipation rate is calculated by determining the mean among the 5 points closest to the peak (i.e., a small plateau) of the scale-dependent estimate. We expect this plateau to become more pronounced with increasing Reynolds number. In Fig. 13, the scale dependence of $\langle\varepsilon\rangle_{S^2}(r) = \frac{1}{r}\big(S^2(r)/C_2\big)^{3/2}$ and $\langle\varepsilon\rangle_{S^3}(r) = -\frac{5}{4}\,S^3(r)/r$, using either the second or the third order structure function [the latter based on $S^3(r) = -\frac{4}{5}\langle\varepsilon\rangle r$, Eq. (26)], is plotted.
Basically, the energy transfer mechanism in the turbulent cascade takes place in three stages: First, energy is injected into the turbulent flow at large scales. Second, the injected energy is transferred across the scales of the inertial range, down to the scale at which viscous effects become significant. Third, the transferred energy is dissipated into heat by the dominant effects of fluid viscosity across the dissipation range. From the principle of conservation of energy, it follows that, in steady state, the rate of energy injection at large scales, the energy transfer rate across the inertial scales, and the energy dissipation rate at small scales in the dissipation range must be equal. We note that Eq. (26) is an exact result in the inertial range, and it requires stationarity, homogeneity, and isotropy of the turbulent flow. The lack of isotropy in turbulent flows leads to a modification of the prefactor 4/5. For further details, the corresponding literature should be studied; see, for example, Ref. 58. In the present context, however, we assume the underlying flow to be isotropic. Equations (25) and (26) lead to the same estimate of the dissipation rate if the right value of the constant C2 is chosen (a code sketch of these estimates is given after this list of methods).
- by using the chosen Taylor length scale and the relation $\langle\varepsilon\rangle = 15\nu\,\langle u'^2\rangle/\lambda^2$ [Eq. (27)], which is valid for isotropic turbulence.59
- as a consequence of the K41 phenomenology, the second order structure function implies an energy spectral density of the form $E(k) = C_k\,\langle\varepsilon\rangle^{2/3}\,k^{-5/3}$ [Eq. (28)]. Ck is the so-called Kolmogorov constant that remains undetermined in Kolmogorov's theory (see Refs. 9 and 60 for typical values). Following Kolmogorov's prediction, a fit in the inertial range according to Eq. (28) is used to estimate the mean energy dissipation rate (see Fig. 14).
- a fit in the inertial range according to $E(k) = C_k\,\langle\varepsilon\rangle^{2/3}\,(k - k_0)^{-5/3}$, with k0 being a shift of the argument of the fit function, i.e., a shift along the wave number axis, which is now used to estimate the mean energy dissipation rate (see Fig. 14).
- through the dissipation spectrum, $\langle\varepsilon\rangle = 15\nu \int_0^\infty k^2 E(k)\,\mathrm{d}k$ [Eq. (30)], which is valid for isotropic turbulence. As proposed by Mora et al.,53 the dissipation spectrum is modeled for large wave numbers k using a power law (see Fig. 15), $E(k) = a\,k^b$ [Eq. (31)], with a and b being the coefficients of the power-law fit. The integration of Eq. (30) is divided into two parts. The first part is performed from k = 0 to the beginning of the range of wave numbers used for the fit. Then, the modeled dissipation spectrum is used for the second part of the integration, from the beginning of that range to the largest available wave number k.
The normalized energy dissipation rate $C_\varepsilon = \langle\varepsilon\rangle L/\sigma^3$ allows for reasonable comparisons of experiments with different boundary conditions or energy injection mechanisms ($C_\varepsilon$ is independent of the integral length scale L).
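As announced above, a minimal sketch of the structure-function-based estimates [Eqs. (25) and (26)] and of the Taylor-scale relation [Eq. (27)] follows; the value of C2 is an assumed input, and the variables u, dx, and lambda are reused from the sketches above.

```matlab
% Minimal sketch of dissipation-rate estimates (assumed C2; our names).
C2 = 2.0;                                     % assumed structure-function constant
r_samples = round(logspace(0, 4, 30));        % scale separations in samples
r  = r_samples*dx;                            % scales in meters
S2 = arrayfun(@(s) mean((u(1+s:end) - u(1:end-s)).^2), r_samples);
S3 = arrayfun(@(s) mean((u(1+s:end) - u(1:end-s)).^3), r_samples);

eps_S2 = (S2/C2).^(3/2) ./ r;                 % from S2(r) = C2*(eps*r)^(2/3)
eps_S3 = -(5/4)*S3 ./ r;                      % from S3(r) = -(4/5)*eps*r
eps_lambda = 15*kin_vis*mean(u.^2)/lambda^2;  % Eq. (27), isotropic relation
```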
At the end of this function, vertical dashed lines at the integral length scale L, the Taylor length scale λ, and the Kolmogorov length scale η will be added to the energy spectral density and the compensated energy spectral density in Fig. 16.
D. Forward cascade and normalization of the data
In the following, a test of the forward cascade using the third order structure function is performed. Furthermore, for the sake of comparability of various turbulent datasets, a common data normalization strategy is proposed, as described below. Such a normalization procedure is possible, as we have now introduced all the necessary scales.
Struc_flip_test. This function tests whether the data have to be flipped or not; flipping here refers to reversing the order of the elements. The decision to flip the data depends on a simple relation between the third order structure function and the dissipation, based on the assumption of homogeneous isotropic turbulence (HIT). In the present analysis, we assume the forward turbulent cascade. In the forward turbulent cascade, the third order structure function becomes negative as a consequence of irreversibility (i.e., no inverse cascade) within the turbulent cascade; vortex stretching is the physical phenomenon that causes this negativity. Based on this, we propose a rule of thumb that demands that the quantity $S^3(r)$ be negative in the inertial range, taking into account that the increment time series is obtained according to Eq. (10). To verify this, $S^3(r)$ is plotted as a function of the scale r, from which it is possible to decide whether it is essential to flip the data or not.
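A minimal sketch of this rule of thumb, reusing S3, r, lambda, and L from the sketches in Sec. II C, could read as follows (again our variable names, not the package's).

```matlab
% Minimal sketch of the flip test: S3(r) should be negative in the
% inertial range (lambda < r < L) for the forward cascade.
inertial = (r > lambda) & (r < L);
if median(S3(inertial)) > 0
    data = flip(data);            % reverse the order of the elements
    u    = data - mean(data);
end
```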
E. Statistics of velocity increments ur
In this section, the properties of the turbulent structures are examined by means of the probability density function of the velocity increments and the scaling exponents of the higher-order structure functions.
plot_increment_pdf. This function plots the probability density function of the velocity increments at the scale r = L and at further selected scales, as depicted in Fig. 17.
The scaling of a selected set of known intermittency models is also included in Fig. 19 for comparison. For this, the user is asked to specify the intermittency coefficient μ (see Refs. 10 and 67 for experimentally suggested values) and the coefficient D = 2.8 (Refs. 10 and 68–70) of the β-model proposed by Novikov and Stewart69 and introduced by Frisch et al.10,70
III. PART II: MARKOV ANALYSIS
In the phenomenological model by Friedrich and Peinke,71 the turbulent energy cascade is interpreted as a Markov process of the velocity increments ur evolving in scale r. The chaotic nature of a turbulent flow implies that the evolution of turbulent structures in the cascade exhibits a certain degree of randomness. For this reason, the turbulent cascade is taken as a stochastic process described by a Fokker–Planck equation and its Kramers–Moyal coefficients. The distinctive feature of this approach is that the stochastic process is considered not in time but as a process evolving in scale r.71–75 A realization of this stochastic process is a sequence of velocity increments ur at one location x or, respectively, time t [see the definition in Eq. (10)].
This approach, hereafter referred to as the Markov analysis, achieves a comprehensive and effective characterization of the complexity of turbulence, including the phenomenon of intermittency, by stochastic equations. The correct estimation of the coefficients is crucial for an accurate description of the underlying scale-dependent process by a Fokker–Planck equation. The validity of these coefficients is subsequently tested via the reconstruction of structure functions and probability density functions of velocity increments (see Sec. V) and by validating the integral fluctuation theorem (see Sec. IV).
A. Examination of the Markov property/determination of the Markov–Einstein length
There are several methods to verify the Markov property using experimental data.81–83 In the following, two different methods are presented: the Wilcoxon test, a quantitative test and one of the most reliable procedures, and a qualitative/visual test to validate the Markov property. The advantage of these two methods is that they do not depend on the estimation of the Kramers–Moyal coefficients; instead, both use exclusively the underlying data for the verification.
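For reference, the property being tested can be stated as follows: for three scales $r_1 < r_2 < r_3$, a Markov process in scale requires

$$p(u_{r_1} \mid u_{r_2}, u_{r_3}) = p(u_{r_1} \mid u_{r_2}).$$

The qualitative/visual test compares the doubly conditioned PDF on the left-hand side, estimated directly from the data, with the singly conditioned PDF on the right-hand side, while the Wilcoxon test quantifies the difference between the two. The smallest scale separation for which this equality holds defines the Einstein–Markov length ΔEM.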
markov generates a pop-up dialog box to enter the number of samples that corresponds to the Einstein–Markov length (the proposed value in the pop-up dialog box is the estimated Einstein–Markov length ΔEM). It should be noted that if the resolution of the sensor used is not sufficient to resolve ΔEM, it is possible to enter the number of samples that corresponds to a larger scale than ΔEM at which the data are resolved in scale (for example, the number of samples corresponding to λ). A red vertical dashed line at the Einstein–Markov length will be added to the spectrum in the frequency domain.
min_events generates a pop-up dialog box to enter the minimum number of events/counts that must occur in a single bin for that bin to be valid for further processing. If the minimum number of events is set to 400, all information in those bins in which the number of events/counts is less than 400 will be excluded from further post-processing of the data. Requiring a minimum number of events/counts avoids the appearance of noise and hence yields better parameter fits. Based on experience, we have fixed the default minimum number of events to 400. Nonetheless, it is possible to increase/decrease this number on the basis of the number of sample points or the desired statistical significance.
Note that the results presented hereafter are related to a modified number of bins (changed from 93 to 201). The number of bins has been adjusted to get a more detailed view of the following figures (conditional PDFs and the Kramers–Moyal coefficients). These detailed illustrations will be described in the readme file in an exemplary manner. A smaller number of bins leads to slightly different results, but the general trend remains the same.
B. Estimation of conditional moments
Scale_steps. This pop-up dialog box calculates the possible number of steps between the integral length scale and Taylor length scale, which are separated by the Markov length. For these steps, the conditional moments for different r and later on the Kramers–Moyal coefficients (KMCs) will be estimated.
multi_point. In this pop-up dialog box, it must be selected whether to perform the multipoint analysis or not (see annual review article by Peinke, Tabar and Wächter22). To do the Fokker–Planck analysis using the multiscaling approach, the question in this pop-up dialog box must be denied. If multipoint analysis should be performed, an additional condition on the increment must be specified in the next pop-up dialog box.
conditional_moment. This function estimates the kth order conditional moments $M^{(k)}(u_r, r, \Delta r)$ for all scales and for each bin (specified in the functions scale_steps and increment_bin) and for all values of the longitudinal velocity increment ur. For a fixed scale r, the conditional moments are calculated for five different scale separations Δr (colored circles in Fig. 22) within the range $\Delta_{EM} \le \Delta r \le 2\Delta_{EM}$. Here, the condition $\Delta r \ge \Delta_{EM}$ is fulfilled.
plot_conditional_moment. This function plots a higher-resolved estimation of the first and second conditional moments for 12 different scale separations within the range $\Delta_{EM} \le \Delta r \le 2\Delta_{EM}$ (see Fig. 22). For this purpose, a scale r and the number of the bin (i.e., the value of the velocity increment ur on which the moments are conditioned) must be specified. The proposed values in the pop-up dialog box are r = L and a default bin number. A possible deviation from a linear law for small values of Δr is due to the Einstein–Markov length, as the Markov property ceases to hold for very small scale separations.
C. Estimation of Kramers–Moyal coefficients (KMCs)
In this section, the Kramers–Moyal coefficients are estimated by means of the conditional moments that were calculated in Sec. III B.
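For orientation, the estimation follows the standard definition of the Kramers–Moyal coefficients (up to normalization conventions, which differ in the literature; some references include an additional factor of r to work with a logarithmic scale variable):

$$D^{(k)}(u_r, r) = \lim_{\Delta r \to 0} \frac{1}{k!\,\Delta r}\, M^{(k)}(u_r, r, \Delta r), \qquad M^{(k)}(u_r, r, \Delta r) = \big\langle (u_{r-\Delta r} - u_r)^k \,\big|\, u_r \big\rangle.$$

In practice, the limit is approximated by extrapolating the conditional moments toward Δr → 0 over the range accessible above the Einstein–Markov length, which is the source of the uncertainty discussed in Sec. III D.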
D. Pointwise optimization of Kramers–Moyal coefficients
As described above, a histogram-based estimation of the Kramers–Moyal coefficients is performed. We call this estimation pointwise because the coefficients are determined for a fixed scale r and a fixed velocity increment bin. While this approach is one of the most straightforward methods to determine an initial estimate of the coefficients from data, it is also sometimes characterized by a high degree of uncertainty. This is especially true for bins containing a relatively small number of events. Furthermore, the limit approximation in Eq. (54) (see also Fig. 22) leads to uncertainties in the absolute values of the Kramers–Moyal coefficients, whereas their functional forms are commonly well estimated.
In this package, we address this problem by a two-step optimization. First, a pointwise optimization of the experimentally estimated Kramers–Moyal coefficients at each scale and value of the velocity increment is performed using the short time propagator76 to find the best Fokker–Planck equation to reproduce the conditional PDFs, as depicted in Figs. 23(c) and 23(d) (see Sec. III E). As shown in Ref. 64 for experimental data, there still remains an uncertainty in the coefficients after this first optimization. This uncertainty is used as a freedom to further optimize the KMCs in the second step. In this second step of the optimization (see details in Sec. IV B), the Kramers–Moyal coefficients are optimized with respect to the integral fluctuation theorem (see details in Sec. IV).
This procedure to obtain an optimal Fokker–Planck equation can be interpreted as follows: If the condition for a Markov process for the cascade is given, the whole process is uniquely defined by the transition probabilities of increments from one scale to the next smaller scale rn. This is the most central feature of the cascade process. In step 1, the best Fokker–Planck equation to model these transition probabilities is found. The remaining uncertainties in the coefficients of the Fokker–Planck equation are used in step 2 to get the best values for the integral fluctuation theorem (IFT). In this way, a Fokker–Planck equation that is compatible with the Markovian cascade process is obtained. As the fulfillment of the IFT is an independent feature and a direct consequence of a Fokker–Planck equation, the Fokker–Planck equation obtained here can be considered as the optimal solution. The methods presented here allow for a critical analysis of these points for any given dataset.
E. Pointwise optimization of Kramers–Moyal coefficients: Conditional PDF
In addition, this function generates a pop-up dialog box asking whether an example optimization should be displayed. If the pop-up dialog box is denied, this function straightaway performs the optimization for all scales and all velocity increments without plotting anything. If the pop-up dialog box is confirmed, the conditional PDFs will be plotted using different representations [see Figs. 23(c)–23(f)] to see the differences between the optimized, non-optimized, and experimental conditional PDFs. Note that if the variable scale_steps is equal to 9, it is possible to enter any scale number from 1 up to 9 (smallest and largest scale, respectively).
IV. PART III: ENTROPY ANALYSIS
In the spirit of non-equilibrium stochastic thermodynamics,23 it is possible to associate a total entropy variation with every individual cascade trajectory.23,24,32,86,87 As the effective dynamics of the cascade trajectories through scales are stochastic due to the stochasticity of the variable ur itself, the entropy defined in the following is a stochastic entropy variation associated with the evolution of the cascade process, given in terms of a Langevin equation corresponding to the above-mentioned Fokker–Planck equation. Therefore, entropy is defined here as a statistical or rather information-theoretic quantity that is not equivalent to the otherwise defined thermodynamic entropy of the fluid.
A. Validation of integral fluctuation theorem
In this section, agreement with the integral fluctuation theorem is addressed based on the estimation of the total entropy production for each of the turbulence cascade trajectories. The validity of the integral fluctuation theorem follows if the Fokker–Planck equation correctly describes a system.23
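For reference, the integral fluctuation theorem states that the total entropy production $\Delta S_{\mathrm{tot}}$, accumulated along a cascade trajectory and averaged over an ensemble of N independent trajectories, satisfies

$$\big\langle \mathrm{e}^{-\Delta S_{\mathrm{tot}}} \big\rangle_N = 1,$$

where $\Delta S_{\mathrm{tot}} = \Delta S_{\mathrm{med}} + \Delta S_{\mathrm{sys}}$ is the sum of the entropy exchanged with the surrounding medium and the change of the system entropy.23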
trajec. A pop-up dialog box is generated to select whether the start and end of the cascade trajectories should be adjusted. If the pop-up dialog box is denied, the cascade trajectories start at the integral length L and end at the Taylor length λ. If it is confirmed, at the beginning of the function checkFT, a pop-up dialog box is generated to specify whether the start and/or the end of the cascade trajectory should be adjusted in multiples of the integral length scale and Taylor length scale, respectively.
In addition, a pop-up dialog box (see Fig. 26) is generated to select whether the total entropy should be calculated for overlapping or independent cascade trajectories. Independent cascade trajectories are obtained by splitting the velocity time series into intervals of one integral length scale. For overlapping cascade trajectories, it is no longer guaranteed that the trajectories are independent. As a result, the detected structures are analyzed several times, and an artificially high accuracy is obtained for the derived quantities. This should be taken into account when interpreting the results.
dr_ind. A pop-up dialog box is generated to define the separation of scales referred to the sequence from large to small scales in the cascade trajectory. The proposed value in the pop-up dialog box is equal to the Einstein–Markov length ΔEM.
data_length. A pop-up dialog box is generated to select the percentage of the data length that should be used to perform the calculation of the total entropy variation (for example, the first 20% of the data).
B. Pointwise optimization of Kramers–Moyal coefficients: IFT
This function performs a pointwise optimization of the Kramers–Moyal coefficients with respect to the integral fluctuation theorem, which utilizes the experimental uncertainty in the determination of these coefficients. The separation of scales/step increment (in samples) associated with the sequence from large to small scales in the (independent) cascade trajectories is set to a minimum step increment of 1 sample. Note that we use here a separation that is less than or equal to the Einstein–Markov length ΔEM. This can be justified by the fact that the stochastic equation is a continuous model for the cascade process, which bridges the corresponding coarse graining of the turbulent data with the finite Einstein–Markov length. Thereby, the stochastic model processes are not changed to processes with correlated noise, which could be a topic for further research. Furthermore, the defined separation of scales referring to the sequence from large to small scales in the cascade trajectory can be modified via the function dr_ind.
iter. This pop-up dialog box is generated to enter the maximum number of iterations that will be performed for the optimization.
tol_D1, tol_D2. This pop-up dialog box is generated to specify the constraints/tolerances in percent on the coefficients $D^{(1)}$ and $D^{(2)}$, which will be used to perform the optimization.
Combining the functions checkFT and plot_entropy with dr_ind = 1, overlapping cascade trajectories, and optimized Kramers–Moyal coefficients, the results presented in Fig. 28 are obtained for the calculation of the total entropy variation.
As can be seen in Fig. 27, this optimization is a fine-tuning of the coefficients, but by comparing Figs. 28(a)–28(d), its impact on the IFT and on the PDFs of the system, medium, and total entropy is clearly evident. In this comparison, it must be taken into account that the separation of scales from large to small scales in the cascade trajectory is different and that overlapping cascade trajectories are investigated here.
V. PART IV: CONSISTENCY CHECK
In this section, the reconstruction of structure functions, i.e., moments of PDFs of velocity increments at different scales using the estimated KMCs is performed. The comparison of reconstructed and experimental structure functions allows addressing the validity of the estimated KMCs.
The overall scheme is as follows: if the Markov properties are fulfilled and the cascade process can be described by a Fokker–Planck equation (i.e., the Pawula theorem is fulfilled), then the knowledge of the Kramers–Moyal coefficients allows one to determine:
- the conditional PDF given by the short time propagator (consequence of the Markov property),
- the unconditioned PDF (consequence of the definition of conditioned probabilities),
- the kth order structure function (consequence of the definition),
- the entropy production of each cascade path (consequence of the definition),
- and the agreement with the IFT (consequence of the validity of the Fokker–Planck description).
recon_struc_pdf. In addition to verifying the consistency with the integral fluctuation theorem, the validity of the estimated drift and diffusion coefficients is subsequently tested via the reconstruction of structure functions and probability density functions of velocity increments at different scales. The comparison of reconstructed and experimental structure functions allows addressing the validity of the estimated KMCs.
Initially, a pop-up dialog box (see Fig. 29) is generated to select the Kramers–Moyal coefficients to be used for the reconstruction.
From Eq. (68), it can be seen that the unconditional PDF at the larger scale is needed to perform the reconstruction at the next smaller scale in the hierarchical ordering. In a second pop-up dialog box (see Fig. 30), the user must select from the following two scenarios:
In the first scenario, Eq. (68) is solved using the unconditional PDF at the larger scale estimated directly from the experimental data at each scale independently. If the coefficients optimized with respect to the conditional PDF or with respect to the IFT are chosen, this method often results in very good agreement, because the optimization of the drift and diffusion coefficients is performed at each scale independently. The non-optimized coefficients lead to poorer agreement between the experimental and reconstructed structure functions and PDFs.
In the second scenario, Eq. (68) is solved using the unconditional PDF at the larger scale from the experimental data only at scale L in the first step and an iterative process is used toward smaller scales. In other words, the entire turbulence cascade is reconstructed using only the initial solution at the largest scale from the experimental data. Typically, this method leads to very good agreement at large scales, whereas the error between experiment and reconstruction increases toward smaller scales using the iterative approach. As mentioned earlier, this is due to the fact that the drift and diffusion coefficients are estimated locally at each scale by optimization. Accordingly, in this iterative approach, minor deviations at the largest scale influence the reconstruction at the next smaller scales. This error may be reduced by performing a scale dependent optimization from large to small scales. Overall, the convergence of the statistics at large increment values, which are rarely encountered, should also be taken into account when making this comparison.
As just described, the structure functions and the probability density functions of velocity increments are obtained from pstp. In Fig. 31, the comparison of the experimental and reconstructed kth order structure functions for scales between the Taylor and the integral length scale is shown. In Fig. 32, the reconstructed probability density functions of velocity increments are plotted.
VI. CONCLUSION AND OUTLOOK
We have presented a user-friendly, open-source package, which helps to perform standard analyses of given turbulent data. In addition, the stochastic equations describing the scale-dependent cascade process in turbulent flows through Fokker–Planck equations and their application to the integral fluctuation theorem can be estimated. Altogether, we believe that this package is of interest especially to young scientists and newcomers to a stochastic approach to turbulence, as it greatly enhances the practicability and availability of both the standard and the advanced analyses introduced in this paper.
This package is designed to analyze turbulent data, but it can also be used by researchers outside of the field of turbulence for the investigation of the statistical properties of complex time series such as financial, surface height, or seismic time series to name but a few. Using this package may contribute to new insights into the complex characteristics of length/timescale-dependent processes ranging from physics to meteorology, biological systems, finance, economy, surface science, and medicine; see also Ref. 21.
The overall structure of the package is shown in Fig. 33. As can be seen from this figure, the fundamental length scales are key variables for all subsequent analyses. Therefore, a collection of the methods for the estimation of the fundamental length scales and the dissipation rate is presented. This collection does not claim to be complete, and it would be of benefit if other researchers add further methods to our collection. The direct comparison of different methods allows the user to evaluate the consistency of the methods. We include a comprehensive collection of basic analyses already established in turbulence research. The advanced analysis will result in the stochastic equations describing the scale-dependent cascade process in turbulent flows by Fokker–Planck equations. The consistency check, including the IFT, between the results of the Fokker–Planck equation and the data can be performed.
As discussed in the paper, the estimation of the stochastic equations [in particular via the limit approximation in Eq. (54) (see also Fig. 22)] leads to uncertainties of the Kramers–Moyal coefficients. In this package, we address this problem by a two-step optimization as an example. If the condition for a Markov process for the cascade is given, which is the central feature of this advanced analysis, the first step of the optimization finds the best Fokker–Planck equation to reproduce the transition probabilities. In the second step, the optimization of the Kramers–Moyal coefficients with respect to the integral fluctuation theorem, which must be fulfilled if the Fokker–Planck description is correct, is performed. Note that the optimization may be extended to other quantities like structure functions. It is always important to perform a final consistency check of the optimized Fokker–Planck equation to quantify the improvements, which is indicated by the double arrows in Fig. 33.
Potential usage of the Fokker–Planck analysis includes the derivation of equations for structure functions describing the moments of probability density functions of velocity increments at different scales. Furthermore, the estimated Fokker–Planck equation is the basis for the generation of synthetic or surrogate time series. Here, the estimated drift and diffusion coefficients can be used to set up a Langevin equation, which can then be deployed for time series enhancement or forecasting. Such synthetic time series can be generated by either a forward-in-time88 or a central-in-space scheme.89
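As an illustration of this last point, the following minimal sketch integrates a Langevin equation with the Euler–Maruyama scheme; the drift D1 and diffusion D2 are assumed toy coefficients, which would be replaced by the ones estimated by the package.

```matlab
% Minimal sketch: Euler-Maruyama integration of a Langevin equation with
% assumed (toy) drift and diffusion coefficients D1 and D2.
D1 = @(x) -0.5*x;                 % assumed linear drift
D2 = @(x) 0.05 + 0.02*x.^2;       % assumed quadratic diffusion
dt = 1e-3;  N = 1e5;
x  = zeros(N, 1);
for n = 1:N-1
    x(n+1) = x(n) + D1(x(n))*dt + sqrt(2*D2(x(n))*dt)*randn;
end
plot(x);                          % synthetic time series
```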
This paper only covers the discussion of turbulent flows satisfying the assumption of homogeneous, isotropic turbulence and the Taylor hypothesis of frozen turbulence. In principle, the tools of this package can be applied to turbulence data in other configurations and turbulent flows with non-ideal HIT conditions as shown for non-fully developed turbulence25 or turbulence with shear; cf. Refs. 26, 27, and 90. Future work will also be devoted to benchmark the here-proposed package on the basis of exactly parametrized synthetic turbulence data.91
ACKNOWLEDGMENTS
The software resulted from funded research. We acknowledge financial support by the Volkswagen Foundation (VolkswagenStiftung): 96528. This work was partly funded by the German Federal Ministry for Economic Affairs and Energy within the scope of the project EMUwind (No. 03EE2031A/C). We acknowledge the following people for helpful discussions and for testing pre-versions of the package: A. Abdulrazek, J. Ehrich, A. Engel, A. Girard, G. Gülker, P. G. Lind, D. Nickelsen, N. Reinke, M. Obligado, and T. Wester.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
André Fuchs: Writing – original draft (lead); Writing – review & editing (equal). Swapnil Kharche: Writing – review & editing (equal). Aakash Patil: Writing – review & editing (equal). Jan Friedrich: Writing – review & editing (equal). Matthias Wächter: Writing – review & editing (equal). Joachim Peinke: Writing – review & editing (equal).
DATA AVAILABILITY
The package is available as free software, under the GNU General Public License (GPL) version 3. The package [source code and standalone applications (64-bit) for Windows, macOS, and Linux] and the data that support the findings of this study are openly available in the repository on GitHub92 or the Matlab File Exchange Server.93 Support is available at the GitHub repository,92 where questions can be posted and generally receive quick responses from the authors.
Name: OPEN_FPE_IFT
Persistent identifier: GitHub
Persistent identifier: Matlab File Exchange Server
Publisher: André Fuchs
Version published: 5.0
Date published: 15/06/22
Operating system: Windows, macOS, and Linux
Programming language: MATLAB
NOMENCLATURE

Latin symbols

- Cε: normalized turbulent kinetic energy dissipation rate
- D^(k): kth order Kramers–Moyal coefficients
- E(f): energy spectral density in the frequency domain
- E(k): energy spectral density in the wave number domain
- Ekin: total kinetic energy
- Fs: sampling frequency
- k0: shift of the argument of the fit function, i.e., along the wave number axis
- K: kurtosis
- L: integral length scale
- M^(k): kth order conditional moment
- p(ur): probability density function of ur
- p(ur′|ur): conditioned PDF of velocity increments for a pair of two scales with r′ < r
- r: spatial length scale
- Re: Reynolds number
- Reλ: Taylor Reynolds number
- S: skewness
- Ti: turbulence intensity
- u: streamwise velocity
- ⟨u⟩: mean streamwise velocity
- u′: streamwise velocity fluctuations
- σ: standard deviation
- uτ: temporal velocity increment at time scale τ
- ur: spatial velocity increment at length scale r
- [u(·)]: cascade trajectory

Greek symbols

- ΔEM: Einstein–Markov length
- ⟨ε⟩: mean energy dissipation rate
- η: Kolmogorov length scale
- ηc: characteristic filter length of the zero-crossing methods
- λ: Taylor length scale
- μ: intermittency coefficient
- ν: kinematic viscosity
- τ: time scale