Ensemble flow reconstruction in the atmospheric boundary layer from spatially limited measurements through latent diffusion models

Due to costs and practical constraints, field campaigns in the atmospheric boundary layer typically only measure a fraction of the atmospheric volume of interest. Machine learning techniques have previously successfully reconstructed unobserved regions of flow in canonical fluid mechanics problems and two-dimensional geophysical flows, but these techniques have not yet been demonstrated in the three-dimensional atmospheric boundary layer. Here, we conduct a numerical analogue of a field campaign with spatially limited measurements using large-eddy simulation. We pose flow reconstruction as an inpainting problem, and reconstruct realistic samples of turbulent, three-dimensional flow with the use of a latent diffusion model. The diffusion model generates physically plausible turbulent structures on larger spatial scales, even when input observations cover less than 1% of the volume. Through a combination of qualitative visualization and quantitative assessment, we demonstrate that the diffusion model generates meaningfully diverse samples when conditioned on just one observation. These samples successfully serve as initial conditions for a large-eddy simulation code. We find that diffusion models show promise for other turbulent flow reconstruction problems as well.


I. INTRODUCTION
Atmospheric field campaigns aim to characterize the complicated state of the atmosphere by deploying several measurement systems, often supplementing observations with modeling. Field campaigns in the atmospheric boundary layer (ABL, approximately the lowest 1 km of the atmosphere) are critical for different research areas, such as wind energy 1,2 , air quality 3 , and wildland fires 4 . While field campaigns in the ABL typically strive to measure as much of the lower atmosphere as possible, they measure only a small fraction of the atmosphere due to practical constraints and the costs associated with observation systems. Models can be used to fill the unmeasured areas between the spatially limited measurements, so that atmospheric dynamics can be characterized across a wide range of scales. This combination of measurements and models enables what is referred to as flow reconstruction: turning spatially and temporally sparse information into a highly resolved field.
The flow reconstruction problem can be formulated in a variety of ways, depending on the available data and the choice of machine learning architecture. Perhaps the most common framing is that of a "super-resolution" problem (see Fukami et al. (2023) 14 for a recent review). In super-resolution, available low-resolution data is upsampled to higher resolution through the use of an algorithm. Turbulence super-resolution has been carried out through the use of convolutional neural network architectures 15-17 as well as generative adversarial network (GAN) architectures 18-23 . In an alternative to super-resolution, turbulent flow reconstruction has been posed as an "inpainting" problem 24 . In this scenario, part of an image is masked, and an algorithm plausibly reconstructs the missing data. Others have specifically examined the problem of reconstruction given sparse measurements, often through the use of convolutional neural networks 25 . Finally, instead of using machine learning architectures rooted in the field of computer vision, others have drawn inspiration from the intersection of partial differential equations and machine learning. Flow reconstruction approaches in this category employ architectures such as "physics-informed neural networks" (PINNs) 26-28 and Deep Operator Networks 29 . All in all, the aforementioned flow reconstruction techniques have been applied to a variety of turbulent flow environments, ranging from two-dimensional 19 and three-dimensional 30 canonical fluid mechanics problems (e.g., flow behind a cylinder) to two-dimensional geophysical problems (e.g., the state of the sea surface).
While flow reconstruction based on machine learning has been shown to be powerful, these techniques have not yet been applied to real-world, three-dimensional geophysical flows, such as an ABL. Real-world ABL flow reconstruction comes with important challenges that must be accounted for. For example, many observation systems (e.g., Doppler lidar) measure line-of-sight velocity, which often serves as an accurate proxy for just one velocity component (e.g., streamwise velocity u) instead of measuring all three velocity components (u, v, w).
So far, only PINNs have demonstrated the ability to reconstruct all velocity components from measurements of just a scalar field 26 . While PINNs show great potential, they often struggle with multi-scale problems 31 such as high-Reynolds-number, three-dimensional turbulence, and their success has not yet been demonstrated in an ABL. Another issue in reconstructing real-world flows is the sparseness of ABL measurements, as mentioned earlier. The atmosphere is highly chaotic, and flow reconstruction from sparse measurements is typically an ill-posed problem, as many non-unique states could correspond to the same observation. As such, geoscientists often characterize the atmospheric state probabilistically (e.g., in ensemble-based weather forecasting 32 ). However, most flow reconstruction studies have been effectively deterministic (e.g., using GANs, which suffer from the "mode collapse" problem 33 ). Only two studies thus far, Gundersen et al. 25 using variational autoencoders and Hassanaly et al. 22 using modified GANs, have reconstructed turbulent flow in a probabilistic manner, though it is not clear whether their two-dimensional architectures can computationally scale to a three-dimensional, highly turbulent ABL.
Recently, a new neural architecture known as a diffusion model (DM) 34,35 has achieved state-of-the-art status for generating high-resolution imagery 36 , showing promise for turbulent flow reconstruction in the ABL. A specific category of DM, known as a latent diffusion model (LDM) 37 , is computationally efficient, and LDMs have been used to generate high-resolution two-dimensional imagery 37 and three-dimensional medical imagery 38 . DMs are inherently stochastic, and they have been shown to generate diverse, photorealistic imagery when a random seed is changed. Finally, while DMs have shown strong performance on the super-resolution problem 37 , they have also excelled at inpainting 39,40 . We posit that inpainting is a natural way to pose the problem of flow reconstruction from spatially limited measurements.
In this paper, we ask: given measurements in one small region of the atmosphere, can we estimate the instantaneous, unmeasured state of the ABL nearby through inpainting with an LDM? Here, we study this problem in the context of a synthetic field campaign that is conducted through large-eddy simulation (LES). In this work, our flow reconstruction strategy is to recreate instantaneous, volumetric flow fields. This strategy shares similarities with data assimilation approaches that are commonly applied for mesoscale and synoptic-scale atmospheric reconstruction, namely 4DVAR 41 and the ensemble Kalman filter 42 , which produce plausible initial conditions for a dynamical solver given a time history of observations. Our investigation here addresses three key ABL reconstruction challenges in a synthetic environment: (1) probabilistic reconstruction, (2) dealing with spatially limited measurements, and (3) translating from scalar measurements to all three velocity components. For the time being, we omit two additional challenges of ABL reconstruction, namely dealing with (1) noisy measurements and (2) a time history of measurements. An optimal ABL reconstruction strategy would likely account for all of these challenges. We find that LDMs can successfully reconstruct ABL flow given limited measurements, and we believe that LDMs can be applied to other turbulent flow reconstruction problems as well. In this paper, we demonstrate the following contributions:

• Through the use of LDMs, we generate diverse, three-dimensional turbulent flow fields. We characterize the quality of LDM samples through qualitative visualization and quantitative assessment.
• We show that LDMs can reconstruct all three velocity components (u, v, w), even when given observations of almost exclusively u.
• While LDM studies in the field of computer vision have masked upwards of 75% of an image, we show that LDM reconstructions can be conditioned on minimal observations (<1% of the volume).
• We demonstrate that an LDM sample can successfully be used as an initial condition for an LES. To our knowledge, this is the first time that a machine learning sample has demonstrated this kind of compatibility with an LES code.
The rest of this manuscript is structured as follows. In Section II, we describe our synthetic field campaign, the configuration of the flow reconstruction problem, and the LES dataset that is used to train and test the LDM. In Section III, we provide details of the LDM architecture. In Section IV, we study the performance of the LDMs. Finally, in Section V, we conclude the paper and discuss potential future lines of inquiry.

II. DATA
In this work, we explore flow reconstruction in the context of the real-world Rotor Aerodynamics, Aeroelastics, and Wakes (RAAW) field campaign 2 , which seeks to thoroughly characterize the behavior of a single, utility-scale wind turbine with respect to the incoming atmospheric inflow, particularly on shorter timescales (1 second to 10 minutes). Here, we conduct a synthetic version of the field campaign, which is sometimes referred to as an "observing system simulation experiment." We simulate a thermally neutral atmospheric boundary layer using the LES code AMR-Wind 43 . AMR-Wind solves the incompressible Navier-Stokes equations using a finite-volume approach with second-order accuracy in space and time. Details of the spatial and temporal discretization can be found in Almgren et al. (1998) 44 . The simulation is run with a Smagorinsky subgrid-scale model 45 . The domain is forced by U = 10 m s−1 geostrophic winds in the x-direction, as well as Coriolis forcing at a latitude of 90°. The LES domain is sized (x, y, z) = (1920, 1920, 960) m with (nx, ny, nz) = (128, 128, 64) grid points, leading to 15-m resolution in all three directions. The simulation uses a fixed 0.5-s timestep. The LES is initially run for 6 hours to allow turbulence to spin up and become stationary. Afterward, we generate the training set by running the LES for 3.5 days of simulated time and saving the (u, v, w) fields everywhere in the domain every minute, totaling 5,040 samples.
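The grid-spacing and dataset-size arithmetic above can be checked with a short sketch. This is purely illustrative bookkeeping, not part of the AMR-Wind setup; all numbers come from the text.

```python
# Sketch of the simulation and dataset bookkeeping described above
# (domain size, grid counts, and output cadence are taken from the text).
domain = (1920.0, 1920.0, 960.0)   # (x, y, z) extent in meters
grid = (128, 128, 64)              # (nx, ny, nz) grid points

# Uniform resolution in all three directions: 1920/128 = 960/64 = 15 m
resolution = tuple(L / n for L, n in zip(domain, grid))

# Training set: 3.5 days of simulated time, one (u, v, w) snapshot per minute
n_samples = int(3.5 * 24 * 60)
```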
After the training set is generated, we generate a test dataset by simulating another 3.5 days after the end of the training dataset period. By using test data from after the training period, we ensure there is no cross-contamination between datasets, while also achieving good statistical agreement between the two datasets at our heights of interest, rotor-span heights of 56.5-183.5 m (Section IV).
Synthetic measurements are generated by masking regions of the three-dimensional LES output, and in the context of this paper, we refer to unmasked regions as "observations." We test the LDM on two sets of masks (Fig. 1). The "box mask" is commonly used in two-dimensional image inpainting problems 39,40 . Here, we mask a cube at the center of the domain spanning (64, 64, 32) grid points, in order to qualitatively demonstrate flow reconstruction in a scenario with many observations. For the box mask, we use observations of all three velocity components. The "field campaign (FC) mask" hides data at locations specific to the RAAW field campaign instrumentation layout, observing data at 1,525 of 1,048,576 LES grid cells. This mask could be modified for any other layout of field measurements.
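As a rough sketch (ours, not the authors' code), the box mask above can be constructed as a boolean volume on the (128, 128, 64) grid; the variable names and the stand-in velocity field are hypothetical.

```python
import numpy as np

# Build the "box mask" described above: a (64, 64, 32)-cell cube hidden at the
# center of the (128, 128, 64) LES grid. Observed cells are True; masked, False.
nx, ny, nz = 128, 128, 64
mask = np.ones((nx, ny, nz), dtype=bool)
mask[32:96, 32:96, 16:48] = False  # hide the central cube

# Stand-in u field; masked cells carry no information for the reconstruction
u = np.random.default_rng(0).normal(10.0, 1.0, size=(nx, ny, nz))
observations = np.where(mask, u, np.nan)

frac_observed = mask.mean()  # fraction of the volume that is observed
```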
All instruments are aligned to point into the incoming wind, as they would be in the field.
The turbine can be thought of as sitting at (x, y) = (1800, 960) m. The horizontal scanning lidar reaches 1,000 m upwind of the turbine, covering an azimuthal range of 18° at a height of 120 m. The vertical spinner lidar scan covers the rotor area and sits 120 m upwind of the turbine. The meteorological mast sits 360 m upwind of the turbine and measures in a vertical column between 0 m and 180 m.
In this study, we take steps towards developing an algorithm that could be applied to real-world measurements, but we still make a number of simplifying assumptions that will be relaxed in future work.
• We assume that measurements are noise-free.
• We assume that both lidars directly measure the u-component of velocity and only this component of velocity. In practice, lidars measure line-of-sight velocity, which is then projected onto other directions to obtain specific components of the velocity vector. As our lidars point directly upstream into the oncoming flow, we believe this assumption is warranted. The meteorological mast measures all three components of velocity.
• We assume that lidars instantaneously scan their 2D plane of interest. In practice, both lidars scan across their 2D planes in approximately 2-5 seconds.

III. METHODS

A. Background
In order to generate synthetic atmospheric states, we employ an LDM 37 (Figs. 2, 3, Table I), an architecture based on DMs. In recent years, DMs have emerged as a new category of deep generative model. They were originally developed through the perspective of nonequilibrium statistical physics 34 , but they can also be derived through the lens of score-based modeling 35 or Markovian hierarchical variational autoencoders 46 . From one perspective 47 , DMs are trained with the assistance of a prescribed degradation process, in which Gaussian noise is repeatedly added to a sample x over an interval t ∈ [0, T] until the final sample is indistinguishable from pure Gaussian noise. This process can be thought of as a forward stochastic differential equation,

dx = f(x, t) dt + g(t) dw,

where f is the drift coefficient, g is the diffusion coefficient, and w is the standard Wiener process. This process can be undone, such that the final Gaussian noise state can be reverted to the initial sample, using the reverse stochastic differential equation,

dx = [f(x, t) − g(t)^2 ∇_x log p_t(x)] dt + g(t) dw̄,

where w̄ is the standard Wiener process for the reverse equation and ∇_x log p_t(x) is the score function, which is challenging to estimate. DMs learn the score function, after which they can be used to draw samples from the probability density function p_0(x) (which we will simply refer to as p(x) from here on) that characterizes the training dataset, starting from Gaussian noise. Given supplemental information, such as observations or class labels, DMs can draw samples from a conditional probability density function p(x|y) through a number of approaches: for example, training on paired data to directly learn to sample from p(x|y), as is done by Song and Ermon 35 , or first learning to sample from the unconditional p(x) and then learning an additional conditioning mechanism 48 . We employ the first approach in this study.
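The forward noising process has a standard discrete counterpart (the DDPM form), which can be sketched as follows. This is an illustrative toy with a linear schedule, not the paper's implementation; the field here is a small stand-in array.

```python
import numpy as np

# Discrete forward (noising) process, x_t = sqrt(ab_t)*x_0 + sqrt(1-ab_t)*eps,
# which is the usual discretization of the forward SDE above.
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention factor

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 8, 8))      # stand-in 3D sample

def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x_early = q_sample(x0, 10, rng)      # still close to the data
x_late = q_sample(x0, T - 1, rng)    # nearly pure Gaussian noise
```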
While traditional DMs have been shown to generate high-quality images, they can be prohibitively expensive to train on large samples, and three-dimensional samples are often large. LDMs address this issue by compressing raw, pixel-space samples into a latent space with the use of an autoencoder (AE) (Fig. 2a). The AE is traditionally trained with a mixture of L1 or L2 losses, Kullback-Leibler divergence losses as in a variational AE 49 , perceptual losses based on the VGG network 50 , and patch-based adversarial losses 51 . After the AE is trained, a diffusion model is trained to generate samples in latent space (Fig. 2b). Once both the autoencoder and the diffusion model are trained, they can be combined to generate samples in pixel space (Fig. 2c).
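The three stages of Fig. 2 can be caricatured with stand-in encode/decode functions that mimic only the shapes involved; the learned networks are replaced here by hypothetical subsampling and upsampling, and the 4x-per-axis compression matches the (128, 128, 64) to (32, 32, 16) reduction described later.

```python
import numpy as np

def encode(x):
    # Stand-in for the learned encoder: 4x subsampling per spatial axis
    return x[::4, ::4, ::4, :]

def decode(z):
    # Stand-in for the learned decoder: 4x upsampling per spatial axis
    return np.repeat(np.repeat(np.repeat(z, 4, axis=0), 4, axis=1), 4, axis=2)

x = np.zeros((128, 128, 64, 3))  # (u, v, w) sample in pixel space
z = encode(x)                    # latent-space sample the DM is trained on
x_hat = decode(z)                # back to pixel space
compression = x.size // z.size   # 64x fewer values to diffuse over
```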

B. Modifications to the original LDM
We modify the original LDM architecture so that it can be applied to generate synthetic LES data. The exact implementation of the architecture can be found at https://github.com/rybchuk/latent-diffusion-3d-atmospheric-boundary-layer, and we provide a summary of the modifications to the LDM code here.
• The original LDM was developed for 2D images, so we modify the architecture to work on 3D data through the use of operations like 3D convolutions and 3D normalization.
• While the original LDM uses several attention blocks 52 throughout the AE, we omit their use there due to the computational demands of a 3D attention block.
• We replace group normalization 53 with instance normalization 54 everywhere except within attention blocks, as we found the latter performed better.
• In the process of selecting the finalized architecture for this paper, we experimented with different weights for each of the loss components in the AE. We also added the option of a physics-based loss that assesses mass conservation by calculating the divergence of the velocity field. In the end, we found the best performance by using an L1 term, a Kullback-Leibler divergence term, and a whole-sample adversarial term in the loss function. We omit the VGG-based perceptual loss as well as the mass conservation loss.
• In the original LDM architecture 37 , image inpainting is accomplished by first drawing a sample from p(x|y) using unmasked regions as conditioning information. In this scenario, there is no guarantee that the generated sample exactly agrees with the conditioning information y, so inpainting is achieved by overlaying y onto the sample in a postprocessing step. In effect, this treats conditioning information as a hard constraint, meaning the observation is exactly matched in the reconstruction, which makes sense when inpainting images. However, we found that this hard constraint would lead to artifacts at the boundaries of FC mask observations, likely because these observations are only one pixel wide in certain dimensions. As such, we do not postprocess LDM output to exactly match the observations, thereby treating observations as soft constraints, meaning observations are not exactly matched in reconstructions. Observations are also treated as soft constraints in other common atmospheric reconstruction techniques, namely data assimilation 55 . We note, though, that in data assimilation, the relative influence of observations on reconstructions can be controlled by prescribing a certain measurement noise magnitude, whereas our LDM technique cannot modulate the influence of observations.

The LDM requires an AE network for compression to the latent space and decompression (Fig. 3a). Here, the AE has three input, latent-space, and output channels that correspond to the velocity variables (u, v, w). Future work could expand this to learn other quantities like temperature, though we prioritize the three velocity components in this study, as these have the greatest relevance to wind turbine dynamics. The encoder and decoder of the AE have three internal levels, each with three residual blocks 56 , compressing data with spatial dimensions of (128, 128, 64) to (32, 32, 16). The first internal level uses 16 channels, followed by 32 channels and then 64 channels. We use a batch size of 2 due to high memory use; the original LDM uses layers with 128 base channels, whereas our network is smaller due to computational constraints (we train on two 16-GB VRAM V100s) as well as our large three-dimensional samples. Following common practice, we rescale the input data so that the values in each channel fall in the range of −1 to 1.
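The per-channel rescaling to [−1, 1] mentioned above can be sketched as min/max scaling. The authors do not specify their exact normalization constants, so the choice of min/max scaling here is an assumption.

```python
import numpy as np

# Per-channel rescaling onto [-1, 1] (assumed min/max scaling; the paper's
# exact normalization constants are not specified).
def rescale(field):
    """Map a single-channel 3D field linearly onto [-1, 1]."""
    lo, hi = field.min(), field.max()
    return 2.0 * (field - lo) / (hi - lo) - 1.0

rng = np.random.default_rng(0)
u = rng.normal(10.0, 2.0, size=(128, 128, 64))  # stand-in u channel
u_scaled = rescale(u)
```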

C. Network configurations
Our primary goal is to generate candidate velocity fields given FC-style observations.

IV. RESULTS
Below, we assess the ability of the LDM to generate plausible atmospheric states. We begin by visualizing unconditional and conditional LDM samples and qualitatively assessing them, as is common in both computer vision manuscripts 37,47 and turbulent flow manuscripts 15,17,58 . Next, we quantitatively assess the LDM samples by calculating mean profiles, statistics, spectra, and divergence. Finally, we quantitatively assess the ability of the LDM to generate diverse samples from a single observation.
A. Qualitative assessment of LDM samples

We next show that conditional LDMs can inpaint well when provided abundant data, as is the scenario with the box mask (Fig. 5). We visualize a u cross section of a ground truth sample, the associated box mask observation, one conditionally generated sample, and the mean and standard deviation from n = 100 conditionally generated samples.

Finally, we find that conditional LDMs can generate samples that look realistic even when provided minimal observations, as is the case for the FC mask (Fig. 6). We include additional visualizations, including visualizations of v and w, which are only conditioned on meteorological mast measurements, in Appendix B. For the particular observation shown in Fig. 6, there is a region of slow u-wind in the vicinity of the meteorological mast.
Correspondingly, each of the samples also has a patch of slow wind in the same area. This slow flow structure is especially clear in the mean prediction (Fig. 6d, j), and qualitatively, the mean prediction aligns well with the ground truth in the vicinity of the observations. This behavior illustrates that the LDM incorporates the conditional information of the measurements when generating samples. Away from the observations, the mean prediction becomes much smoother, showing that the conditioning has only local effects, as would be expected in a turbulent flow. The standard deviation of the predictions for the FC mask (Fig. 6e, k) also shows a smaller standard deviation in the vicinity of the observations, especially near the horizontal lidar measurement. Thus, the conditioning reduces the sample-to-sample spread near the measurements, as desired. The FC samples exhibit much higher variance than the box mask samples (Fig. 5e), as would be expected, because the FC samples are conditioned on a smaller region. As we discuss in greater detail in Section IV C, the sample standard deviation qualitatively matches a different estimate of the standard deviation that comes from another tool (Fig. 5f).
After assessing the visual quality of samples, we further demonstrate the quality of LDM samples by showing that they can successfully act as initial conditions for our LES code. We generate an FC mask sample using the observation in Fig. 6 and then use it as the initial condition in a 1-hour LES. We provide a video that visualizes a time history of a streamwise cross section and a top-down view of the simulation at https://github.com/rybchuk/latent-diffusion-3d-atmospheric-boundary-layer. The simulation runs successfully and does not display any obvious numerical artifacts, such as discontinuities in the flow field around the masked region or ringing in the form of waves or instabilities.

B. Quantitative assessment of LDM samples
We next quantitatively assess the performance of the LDMs against physical aspects of the LES: vertical profiles of velocity components and kinematic fluxes, probability density functions of velocity components, turbulence spectra, and mass conservation. The LDM accurately captures average velocity profiles (Fig. 7a-c), while slightly underperforming on kinematic flux profiles (Fig. 7d). We compare horizontally averaged profiles from the training dataset, the testing dataset, n = 100 unconditional samples, and n = 100 conditional samples from the FC mask. The mean and standard deviation of LDM velocity profiles agree well with training data at all heights, and with testing data beneath the capping inversion. Thus, the LDM performs well on first-order statistics. However, the LDMs do show some deviations when comparing kinematic fluxes, a second-order statistic that is more challenging to capture than the mean. In particular, the unconditional LDM underestimates the downward flux u′w′ by as much as 20% between heights of 15 m and 150 m. The conditional LDM performs better in this region, with only an 8% discrepancy, possibly because of the near-surface conditioning information.
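The profile diagnostics above can be sketched as follows, with synthetic Gaussian fields standing in for LES output; only the averaging pattern is meant to be illustrative.

```python
import numpy as np

# Horizontal (x-y) averages give mean profiles; the kinematic flux u'w' is the
# horizontal mean of the product of fluctuations about those profiles.
rng = np.random.default_rng(0)
u = rng.normal(10.0, 1.0, size=(128, 128, 64))  # u(x, y, z), stand-in
w = rng.normal(0.0, 0.5, size=(128, 128, 64))   # w(x, y, z), stand-in

u_mean = u.mean(axis=(0, 1))                    # <u>(z), one value per height
w_mean = w.mean(axis=(0, 1))
u_prime = u - u_mean                            # fluctuations, broadcast over z
w_prime = w - w_mean
uw_flux = (u_prime * w_prime).mean(axis=(0, 1)) # <u'w'>(z), second-order statistic
```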
Similarly, the probability density functions (pdfs) of the velocity components show broad agreement while missing fine-scale details (Fig. 8). We calculate velocity distributions using the same three-dimensional data that was used to calculate the average vertical profiles. The pdfs from LDM samples match the pdfs from training data well in all bins, showing only minor discrepancies.
While the power spectra of LDM samples match the ground truth at large spatial scales, the LDM spectra have too much energy at the smallest spatial scales (Fig. 9). We calculate one-dimensional spectra of u at a height of 90 m. The LDM and ground truth spectra have similar values for wavenumbers smaller than 0.01 m−1, and the LDM accurately captures the inertial subrange of turbulence, as visualized by the −5/3 slope. However, for wavenumbers larger than 0.01 m−1, the LDM has too much energy. This behavior has been observed in other machine learning reconstruction studies 15,17 and is consistent with the "spectral bias" problem 59 inherent to many neural networks: the autoencoder part of our network seeks to minimize the L1 error of the samples, thereby prioritizing the reconstruction of fields on the largest spatial scales. As such, the smallest scales have a relatively small impact on the loss function.
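A minimal sketch of the one-dimensional spectrum calculation, assuming an FFT along x averaged over y rows at a fixed height (the exact averaging is our assumption; the plane of data is synthetic).

```python
import numpy as np

# 1D spectrum of u at a fixed height: FFT along x, power averaged over y.
nx, ny, dx = 128, 128, 15.0                     # 15-m resolution, as in the LES
rng = np.random.default_rng(0)
u_plane = rng.normal(10.0, 1.0, size=(nx, ny))  # u(x, y) at z = 90 m, stand-in

u_fluct = u_plane - u_plane.mean(axis=0)        # remove the mean before the FFT
u_hat = np.fft.rfft(u_fluct, axis=0)            # transform along x
spectrum = (np.abs(u_hat) ** 2).mean(axis=1)    # average the power over y rows
wavenumbers = np.fft.rfftfreq(nx, d=dx)         # cycles per meter, up to 1/(2*dx)
```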
Finally, we assess the LDM's ability to satisfy mass conservation by examining distributions of the velocity divergence ∂u_i/∂x_i (Fig. 10). If continuity were exactly satisfied, the distribution of divergence would be shaped like a Dirac delta function at ∂u_i/∂x_i = 0. The testing and training data approximately satisfy this behavior, showing small deviations from 0 due to numerical artifacts. While the distributions for both the unconditional and conditional LDM data show a spike centered on 0, their spikes are not nearly as sharp. Thus, the LDMs appear to have some awareness of mass conservation, but they do not satisfy it as well as the ground truth data.
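The divergence diagnostic can be sketched with central differences via `np.gradient`; a uniform stand-in field is exactly divergence-free, so its distribution collapses onto zero.

```python
import numpy as np

# du/dx + dv/dy + dw/dz approximated with central differences on a 15-m grid.
dx = dy = dz = 15.0
u = np.full((32, 32, 16), 10.0)   # uniform stand-in fields (divergence-free)
v = np.zeros((32, 32, 16))
w = np.zeros((32, 32, 16))

div = (np.gradient(u, dx, axis=0)
       + np.gradient(v, dy, axis=1)
       + np.gradient(w, dz, axis=2))
max_abs_div = np.abs(div).max()   # 0 for this divergence-free field
```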

Rank histogram analysis
For ensemble-based flow reconstruction, it is important to have a well-calibrated ensemble 32 .
Ensemble members are drawn from some pdf, and in an ideal scenario, the ground truth would be seen as just another draw from that pdf, such that the ground truth is statistically indistinguishable from the ensemble members, a condition termed "ensemble consistency." Ideally, the simulated ensemble would show zero bias relative to the ground truth, and the ensemble should have an appropriate but not excessive amount of spread.
One common approach to assess ensemble consistency in the geosciences is the rank histogram 58,60 . When given a group of predictions, the rank of the ground truth is calculated by sorting the ensemble members from smallest to largest by some scalar quantity (e.g., u at a particular grid cell) and then identifying the position of the ground truth relative to the ensemble members. When repeated for several reconstructions, rank calculations can be collected into a rank histogram, a diagram that can diagnose ensemble bias as well as overconfidence or underconfidence. We assess ensemble quality by compiling a rank histogram using the first 6 hours (360 observations) of the test set and generating 10 ensemble members per observation (Fig. 11). We calculate the rank of each ground truth by examining u at the unobserved grid cell immediately downwind of the first scanning lidar observation, a cell that can be thought of as the turbine nacelle. An ideal rank histogram would look like a uniform distribution.
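The rank calculation above can be sketched as follows with synthetic data; for a well-calibrated ensemble, the resulting histogram is near-uniform.

```python
import numpy as np

# For each case, the rank of the truth is the number of ensemble members
# smaller than it (0..n_ens); ranks are then collected into a histogram.
rng = np.random.default_rng(0)
n_obs, n_ens = 360, 10
truth = rng.normal(10.0, 1.0, size=n_obs)              # ground-truth u per case
ensemble = rng.normal(10.0, 1.0, size=(n_obs, n_ens))  # consistent ensemble

ranks = (ensemble < truth[:, None]).sum(axis=1)        # rank of truth per case
hist = np.bincount(ranks, minlength=n_ens + 1)         # ideally near-uniform
```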
However, the rank histogram here looks like a combination of a U-shape and a downward linear slope. The U-shape shows that LDM samples are insufficiently diverse: the ground truth disproportionately falls at one extreme of the ensemble. The downward slope indicates that the ground truth tends to be disproportionately smaller than the rest of the ensemble, or in other words, the ensemble often overpredicts. We verify this by calculating the bias of the LDM samples relative to their respective ground truths, finding that u is on average 0.02 m s−1 higher. Thus, while the downward slope suggests a tendency to overpredict u at turbine hub height, the bias remains near zero because this tendency is balanced by the right arm of the U-shaped distribution. In the end, the LDM ensembles deviate from ideal behavior, which is in practice the case even for skillful ensemble forecasts 61,62 . These deviations may prove small in practice, and future work will use these ensembles in a data assimilation process and assess whether the suboptimal behavior is problematic.

External estimate of conditional standard deviation
While rank histograms assess ensemble consistency and are used primarily in the ensemble weather forecasting field, we can alternatively assess ensemble quality by examining "sample diversity." This concept is commonly examined in the deep generative modeling field, and when applied to computer vision problems, metrics such as the Inception Score

Influence of FC observations on reconstruction accuracy
Finally, we quantify the impact of FC observations on the reconstruction accuracy at three locations (Fig. 12). We calculate vertical profiles of root mean squared error (RMSE) using the dataset that was used in the rank histogram analysis: 360 ground truth samples from which observations are sampled, and 10 corresponding reconstructions for each ground truth.
We calculate RMSE along the centerline of observations (y = 960 m) at the location of the meteorological mast (x = 1440 m), the upstream edge of scanning lidar measurements (x = 795 m), and an upstream location where conditional standard deviation analysis revealed that observations no longer had an influence on reconstructions (x = 480 m).
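The RMSE-profile calculation can be sketched as follows, with synthetic stand-in data whose shapes follow the 360 cases and 10 reconstructions per case described above.

```python
import numpy as np

# At each height along a fixed (x, y) column, the squared error of u is
# averaged over all cases and ensemble members before taking the square root.
rng = np.random.default_rng(0)
n_cases, n_ens, nz = 360, 10, 64
truth = rng.normal(10.0, 1.0, size=(n_cases, 1, nz))             # u(z) per case
recon = truth + rng.normal(0.0, 0.5, size=(n_cases, n_ens, nz))  # reconstructions

rmse_profile = np.sqrt(((recon - truth) ** 2).mean(axis=(0, 1)))  # RMSE(z)
```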
The RMSE profiles show that observations improve reconstruction accuracy largely as would be expected. Far away from observations at x = 480 m, the RMSE values

V. CONCLUSION
In this paper, we investigate turbulent flow reconstruction in the context of a synthetic field campaign in the atmospheric boundary layer, in which only spatially limited observations are available. We demonstrate that latent diffusion models (LDMs) create diverse, three-dimensional turbulent fields that are convincingly similar to true LES fields. These reconstructions match many physical characteristics of the LES, such as vertical profiles and the largest spatial scales of the spectra. However, LDMs struggle at the smallest spatial scales, diverging from LES spectra in this region and failing to preserve continuity.
Our LDM work extends the machine learning turbulent flow reconstruction literature in several key ways. In contrast to many deterministic approaches, LDMs generate diverse samples, which is important for turbulent environments. Our LDM reconstructs samples from minimal observations (<1% of the domain is observed), though the algorithm also works well with abundant observations. Notably, LDM samples can be used as initial conditions for computationally stable LES simulations.
This study suggests several lines of possible future inquiry for turbulent flow reconstruction. In upcoming work, we will explore applying LDMs to noisy, real-world measurements in the RAAW field campaign. The LDM architecture could be modified to improve performance at the smallest spatial scales, perhaps through the use of physics-informed losses.

Once fully trained, the final training loss is examined to decide whether further increases in network complexity are warranted to appropriately approximate the optimal estimator (Fig. 22). We find that as the number of trainable parameters increases from approximately 9,000 to approximately 60,000, the MSE decreases until a point where adding additional trainable parameters only marginally improves the loss. As such, we stop increasing the complexity of the estimator network and produce the figure shown in Fig. 6f using the model with the highest number of parameters.
FIG. 1. (a) The masked region and observed region associated with the box mask. (b) The observations associated with the Field Campaign (FC) mask, with each instrument highlighted. The regions without observations are masked.
FIG. 2. A schematic depicting the major components of an LDM and their function. (a) First, the encoder and decoder of an autoencoder are trained. (b) Next, a diffusion model is trained with the help of the encoder from the autoencoder. The conditioning is optional and is not included for unconditional networks. (c) The trained encoder, diffusion model, and decoder are combined into an LDM and used to generate samples, possibly given optional conditioning. Additional details can be found in Fig. 3 and Table I.

FIG. 3. A schematic showing the internal components of the LDM in greater detail, including (a) in the autoencoder and (b) in the UNet of the diffusion model.
tions, we train both a conditional DM and an unconditional DM to provide more context on network behavior. Both DMs use the same AE for conversion between pixel space and latent space. The DMs use a UNet architecture 57 with three internal layers, using (192, 384, 768) channels respectively, and two residual blocks per layer. Attention blocks are used in the 384-channel layers, the 768-channel layers, and the center of the UNet. The DM noise schedule is linear and uses 1,000 diffusion steps. The only difference between the unconditional and conditional DM is the input layer: the unconditional network uses three channels, and the conditional network has an additional four channels, three of which correspond to the compressed observation in latent space and the last of which corresponds to a compressed binary mask of the observed pixels. The conditional DM is exposed to several different masks during training, including the box-style and FC-style observations, in order to increase robustness and potentially accuracy 40 . The same conditional DM is then used to produce both box-style and FC-style samples.
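The linear noise schedule described above can be sketched in a few lines. The code below is an illustration under standard DDPM conventions, not the paper's implementation; the beta endpoint values (1e-4, 0.02) are assumed defaults, not values reported in the text:

```python
import numpy as np

# 1,000 diffusion steps with a linear beta schedule, as described above.
# The endpoints are the common DDPM defaults (an assumption on our part).
N_STEPS = 1000
betas = np.linspace(1e-4, 0.02, N_STEPS)

# Cumulative signal-retention product used when noising a latent z_0:
# z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def noise_latent(z0, t, rng=np.random.default_rng(0)):
    """Forward-diffuse a latent z0 to step t (0-indexed)."""
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# Channel widths of the three internal UNet levels, per the text.
unet_channels = (192, 384, 768)
```

At t near N_STEPS − 1, alpha_bar approaches zero, so the latent is almost pure noise; the reverse (denoising) process learned by the UNet runs this schedule backwards.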

FIG. 4. (a)-(c) Streamwise cross sections of u at y = 960 m from ground truth, test data.(d)-(f) Same as above, except for samples from an unconditional LDM.

FIG. 5. Streamwise cross sections of u at y = 960 m for (a) a ground truth sample, (b) the box observation created from the ground truth sample, and (c) one sample from the conditional LDM, given the observation.(d) The mean from n = 100 conditional LDM samples.(e) The standard deviation from n = 100 conditional LDM samples.All subpanels display a white dashed line as a reference to the mask.
FIG. 6. (a)-(e) Same as Fig. 5, except for the FC mask.(f) An external estimate of the conditional standard deviation for the observation in panel (b), which is discussed in greater detail in Section IV C. (g)-(l) Corresponding top-down views at the height of the scanning lidar for the data in panels (a)-(f).

FIG. 7. (a)-(c) Vertical profiles of horizontally averaged velocity components from training data, testing data, unconditional LDM samples, and FC-mask conditional LDM samples. Averages are shown as solid lines, and ±1 standard deviations are shown as faded lines. (d) Kinematic fluxes of the same data.
FIG. 8. The probability distribution function for training data, testing data, unconditional LDM samples, and conditional LDM samples for (a) u, (b) v, and (c) w.

FIG. 9. The turbulence spectra of u at z = 90 m for training data, testing data, unconditional LDM samples, and conditional LDM samples.
FIG. 11. Rank histogram for u at the turbine nacelle, given 10-member LDM FC ensembles for 360 observations.
33 are commonly used. Here, we assess sample diversity by simply examining the standard deviation, which is a valuable measure for physics-based problems. In Sec. IV A, we assessed LDM sample diversity by taking a single observation, e.g., the one in Fig. 6a, and calculating the standard deviation of u across 100 conditionally generated samples (Fig. 6e). This conditional standard deviation was calculated using samples from the LDM for a given observation. Alternatively, by applying tools such as fully connected neural networks or stochastic estimation 22 to our training dataset, it is possible to estimate what the conditional standard deviation should be for a given observation, without the need to generate any samples. This function has no awareness of the LDM or the samples it generates, and as such, it serves as an external check on the sample-calculated conditional standard deviation. Following Hassanaly et al. 22 , we obtain an external estimate (Fig. 6f, l) of the conditional standard deviation for the observation in Fig. 6a, g by training a fully connected neural network. We provide more details on the external estimate methodology in Appendix C.
From the side (Fig. 6f), this external estimate agrees well with the LDM sample standard deviation (Fig. 6e), but it fills in weaknesses that arise from calculating the sample standard deviation with a finite number of samples. From the side, the two standard deviations agree well in magnitude everywhere: the lowest grid cells, the observation network, and above the capping inversion. Both estimates also show larger variance just upwind of the meteorological mast and above the scanning lidar, a behavior that arises due to the presence of a coherent structure. When viewed from the top, however (Fig. 6l), the external estimate disagrees with the sample standard deviation. This discrepancy arises because the LDM here treats observations as a soft constraint, whereas the external estimate treats observations more like a hard constraint. As such, the external estimate shows a near-zero standard deviation in pixels where observations are available. From both the side and the top, the external estimate clearly shows that the sensing network has an impact on the flow reconstructions close to the observations, up to a maximum reach of roughly 150 m in any direction. By comparing the two standard deviations, we build confidence that the LDM does not show major deficiencies in terms of sample diversity. This result complements the rank histogram analysis, suggesting that the LDM is indeed only slightly under-dispersive.
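The sample-based side of this comparison is a straightforward per-grid-point ensemble statistic. The sketch below is ours, assuming the n conditional reconstructions are stacked along the first axis of an array shaped (n, nz, ny, nx); the function name and the toy example are hypothetical:

```python
import numpy as np

def conditional_moments(samples):
    """Per-grid-point conditional mean and standard deviation from an
    ensemble of reconstructions, samples shaped (n, nz, ny, nx)."""
    mean = samples.mean(axis=0)
    std = samples.std(axis=0, ddof=1)  # unbiased sample standard deviation
    return mean, std

# Hypothetical example: 100 samples on a tiny grid, drawn so that the
# true per-point mean is 5.0 and the true standard deviation is 2.0.
rng = np.random.default_rng(0)
ens = rng.normal(loc=5.0, scale=2.0, size=(100, 4, 4, 4))
mean, std = conditional_moments(ens)
```

With only n = 100 samples, each grid point's standard deviation carries sampling noise on the order of a few percent, which is exactly the finite-sample weakness the external estimate fills in.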
FIG. 12. Vertical profiles of root mean squared error for (a) u, (b) v, and (c) w along the centerline of the domain at the meteorological mast (x = 1440 m), the upstream edge of scanning lidar measurements (x = 795 m), and a location upstream of all observations (x = 480 m).

FIG. 15. (a)-(f) Isometric views of vortices identified as contours of the Q-criterion 66 with a value of 0.0003 m s⁻¹.
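The Q-criterion used for the vortex identification above is Q = ½(‖Ω‖² − ‖S‖²), where S and Ω are the symmetric and antisymmetric parts of the velocity gradient tensor. The sketch below is our own illustration, not the paper's code; it assumes fields stored as (nz, ny, nx) arrays on a uniform grid:

```python
import numpy as np

def q_criterion(u, v, w, dx, dy, dz):
    """Q = 0.5 * (||Omega||^2 - ||S||^2) from the velocity gradient tensor.
    Fields are shaped (nz, ny, nx); grid spacing is assumed uniform."""
    # np.gradient returns derivatives in axis order (z, y, x).
    grads = [np.gradient(f, dz, dy, dx) for f in (u, v, w)]
    # Assemble G[i, j] = d u_i / d x_j with (x, y, z) column ordering.
    G = np.empty(u.shape + (3, 3))
    for i, g in enumerate(grads):
        G[..., i, 0] = g[2]  # d/dx
        G[..., i, 1] = g[1]  # d/dy
        G[..., i, 2] = g[0]  # d/dz
    S = 0.5 * (G + np.swapaxes(G, -1, -2))      # strain-rate tensor
    Omega = 0.5 * (G - np.swapaxes(G, -1, -2))  # rotation-rate tensor
    return 0.5 * (np.sum(Omega**2, axis=(-1, -2)) - np.sum(S**2, axis=(-1, -2)))
```

A quick sanity check: for solid-body rotation u = −ωy, v = ωx, the strain rate vanishes and Q = ω² everywhere, so a positive Q isosurface marks rotation-dominated regions.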
FIG. 18. (a) A ground truth field and (b) FC observation of u, distinct from the one shown in Fig. 6. The (c) mean prediction and (d) standard deviation of the prediction from 100 LDM samples. (e)-(h) Four distinct LDM samples conditioned on the observation.
FIG. 22. Final MSE training loss versus number of trainable parameters for the FC mask for the first moment estimate (left) and the second moment estimate (right).

TABLE I. Additional parameters used during training of the autoencoder and the diffusion model.
of 3D convolutions. This network is much smaller than the original LDM used in Rombach et al. 37 (https://github.com/CompVis/latent-diffusion), which, for example, had six internal layers 63. Finally, diffusion model researchers are investigating methods for quicker sampling 64 , which could potentially open the road to real-time flow reconstruction. In summary, the results indicate that diffusion models are a powerful class of machine learning algorithm that is worthy of further exploration in turbulence research.

The publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes. This work made use of a number of open-source Python libraries, including numpy, Xarray, and matplotlib 65 . Thank you to Hristo Chipilski for helpful discussions regarding data assimilation, and to Michael Kuhn for AMR-Wind support. Additionally, thank you to the editor and the reviewers for helping craft this manuscript.