Reservoir simulation faces challenges in computational efficiency and uncertainty management for largescale assets. This study presents an integrated framework combining the connection element method (CEM) and data space inversion with variable controls (DSIVC) for efficient history matching and optimized forecasting of reservoir performance. CEM reduces the computational cost of numerical simulation while retaining accuracy. DSIVC enables direct production forecasting after history matching without repeated model inversion. The CEM–DSIVC approach is applied to two reservoir cases. CEM efficiently constructs reservoir models honoring complex geology. DSIVC mathematically integrates production data to reduce uncertainty and parameter space. Without repeated forward simulation, optimized forecasts are obtained under different control strategies. Compared to conventional methods, CEM–DSIVC achieves reliable uncertainty quantification and optimized forecasting with significantly improved efficiency. This provides an effective solution to overcome limitations in simulating and managing uncertainty for largescale reservoirs. The proposed approach leverages the complementary strengths of CEM and DSIVC, synergistically improving reservoir modeling, management, and decisionmaking. This integrated datadriven framework demonstrates strong potential as an advanced tool for efficient field development planning and optimization.
I. INTRODUCTION
Reservoir closedloop optimization is a critical component in field development, with the reservoir system as the primary object.^{1} It combines numerical reservoir simulation to accurately reproduce the full lifecycle of field exploitation and aids development planning. Conventional numerical optimization generally uses appropriate optimization methods and optimization algorithms to find the optimal production system, but this process requires repeated numerical simulation calculations. Conventional numerical optimization entails low computational efficiency and prolonged duration, preventing efficient calculation.^{2,3} Therefore, it is imperative to explore reservoir closedloop optimization control methods from two perspectives history matching and production optimization through novel pathways.^{4,5}
Reservoir numerical simulation is an important tool to reproduce the full lifecycle of reservoir exploitation, forecast reservoir production, and adjust field management strategies.^{6} Commonly used numerical methods include finite difference,^{7} finite volume,^{8} and finite element methods.^{9,10} To ensure accuracy, these methods often require extensive grid refinement of the computational domain, resulting in poor convergence and difficulties in history matching.^{11,12} Grid coarsening can improve efficiency but leads to loss of flow paths and reduced accuracy.^{13} Boundary element^{14,15} and meshless methods^{16,17} have seen some reservoir applications. However, boundary elements are often limited to singlephase flow equations and relatively homogeneous reservoirs. Meshless methods, such as extended finite volume method (EFVM),^{18} can reduce the difficulty of discretizing reservoir domain and form richer internode connections to reduce the grid orientation effect.^{19} However, these methods still require a large number of nodes to achieve sufficiently high accuracy. Zhao et al.^{20} introduced the connection element method (CEM) combining generalized finite difference theory and datadriven flownet method. This derives flow parameters satisfying local mass conservation in the meshless framework. CEM calculate pressure on the coarsenode system, but transform the calculation of advectiondominated transport equations to onedimensional (1D) connection element between two neighboring nodes, which can be regarded as a multiscale meshless method and achieve a good balance of computational accuracy and efficiency. This forms richer connectivity and retains more flow structures.
Although the CEM can achieve accurate and efficient dynamic prediction, uncertainty in geological information still poses predictive challenges.^{20} History matching against observed data is needed to obtain more accurate geological parameters.^{21} Among optimization algorithms, gradientbased ones converge faster but require adjoint methods to compute gradients.^{22} However, due to the limited number of reservoir parameters, analytical gradients are difficult to obtain via adjoint methods and the process is extremely complex and incompatible with reservoir simulators.^{23} Compared to gradientbased methods, gradientfree algorithms construct approximate gradients for computation and are more portable and easier to couple with commercial simulators.^{24} However, they require extensive simulation runs and are prohibitively expensive for large models. Currently, more popular are ensemble methods like ensemble Kalman filter (EnKF)^{25,26} and ensemble smoother with multiple data assimilation (ESMDA).^{27,28} However, they do not avoid direct inversion of geological parameters, but are more efficient than gradientfree methods. However, the high computational cost remains due to the large number of prior models runs required. In recent years, many researchers have applied machine learning to build proxy models approximating timeconsuming simulations to improve efficiency.^{29} However, such models lack physical meanings and may deviate considerably from reality when production changes. Moreover, the intricate training process also impedes field application.^{30}
Constructing reservoir models consistent with historical production data through fluid flow numerical simulation is often computationally demanding. Existing history matching approaches still have limitations when applied to largescale complex reservoirs. Sun et al.^{31,32} proposed a new class of the data assimilation method called data space inversion (DSI). It constructs a data space spanned by simulation results from a large set of initial reservoir models with differing petrophysical properties. A proxy model is established by parameterizing the highdimensional data space. The proxy model is calibrated by history matching using mathematical optimization without repetitive numerical simulation runs. Unlike machine learning techniques, the proxy honors the physical relationships and weights between the truth case and prior models. DSI computes posterior production forecasts through stochastic maximumlikelihood principles without inverting reservoir models, thus avoiding prohibitive computational costs. For giant hydrocarbon fields, singleflow simulation runs are extremely timeconsuming, so iterative history matching via extensive simulation is infeasible. This makes DSI and other direct forecasting techniques attractive. DSI provides reliable uncertainty quantification and forecasting efficiency as demonstrated in channelized and Gaussian permeability fields under oil–water or gas–oil displacement.^{31} The previously discussed DSI and forecastcentric techniques were premised on the idealized assumption of a “perfect model,” ignoring any mismatches between simulations and actual production data during calibration. Jiang et al.^{33} used coupled coarsescale simulations and PCA (Principal Component Analysis)parameterized error representations to generate accurate posterior forecasts. However, the original DSI formulation lacks the capability for production optimization after history matching.^{34,35}
In the above DSI framework,^{31} specifying altered welloperating strategies following history matching necessitates recalibrating the entire prior ensemble of reservoir models via extensive flow simulations, incurring tremendous computational costs.^{36} To address this limitation, Jiang et al.^{37} developed data space inversion with variable controls (DSIVC) for prediction with userdefined well controls after history matching. Like prior DSI, DSIVC does not generate posterior models. Unlike existing DSI, DSIVC does not require resimulation when different matched control sets are specified. This is a key advantage of DSIVC over current data space techniques. However, simulating the prior models is hugely timeconsuming for DSIVC, especially for largescale models where running thousands of simulations is impractical. Therefore, the targeted coupling of the CEM and DSIVC proposed in this work is necessary and significant. The CEM is expected to drastically reduce simulation time and enables a fast largescale data space inversion and reservoir management. This combined approach leverages the advantages of both methods to achieve efficient and reliable uncertainty quantification and optimized field development planning.
In summary, CEM provides fast numerical simulation, while DSIVC integrates production data for reducing uncertainty without repeated inversion. By combining the strengths of both methods, CEM–DSIVC establishes an advanced workflow for efficient history matching and optimized forecasting of largescale reservoirs. This integrated approach is expected to achieve reliable uncertainty quantification and optimized production planning with low computational requirements.
II. METHODOLOGY
A. A brief introduction to CEM
Traditional gridbased reservoir simulation methods determine intergrid connections based on grid topology and calculate transmissibility using average grid properties. However, the grid topology defines but also constrains the connections. CEM uses a new approach to discretize the computational domain (i.e., meshless connection system).
The meshless connection system is a network structure as shown in Fig. 1. From a flow perspective, CEM discretizes the domain into interconnected geometrical entities (i.e., connection elements) represented by line segments. The endpoints of connection elements are defined as nodes, while connection elements are interconnected through the nodes. In Fig. 1, a set of connection elements sharing common nodes (red markers) is extracted from the connection system.
In CEM, a connection element has two parameters: transmissibility and connection volume. Taking twophase oil–water flow as an example, the detailed calculation of transmissibility and connection volume of a connection element is explained.
Assuming rock and fluid are incompressible and neglecting capillary force, the governing equation of twophase flow can be given as
 Mass conservation equations$ \u2207 \u22c5 ( \rho i k k r i \mu i \u2207 p i ) + q i = \u2202 ( \varphi \rho i S i ) \u2202 t.$
1. The calculation of connection transmissibility
2. The calculation of connection volume
Equation (13) has been proven good initial approximation for the connection volumes of connection elements, which may be updated in historymatching procedure.
Figure 2 shows an example of how the CEM model characterizes a rectangular oil reservoir. Figure 2(a) is a schematic diagram of a fivespot fourproduction rectangular oil reservoir. Figure 2(b) is a schematic diagram of node distribution, and Fig. 2(c) is a connection diagram of the CEM model. The CEM separates the calculation of the pressure equation and saturation equation into two steps. First, it calculates the pressure diffusion equation based on the nodes in the meshless system and the seepage characteristics parameters between the nodes, so as to obtain the average pressure of the control volumes of each node. Then, according to the average pressure of the control volumes of each node, the upstream flux of each connection element is estimated, so that the Buckley–Leverett saturation convection equation can be efficiently and accurately calculated in parallel on the connection element.^{20} More details on the solution process and parameter settings are provided in the Appendix.
B. Improved dataspace inversion with varying controls
1. The dataspace inversion with varying controls formulation
The DSIVC program is designed to enable production forecasts for userspecified well control settings in the posthistorymatch period. In DSIVC, well control is incorporated as a variable within the data space, and we consider the prescribed well settings as additional (specified) data to be matched during the data assimilation process.
2. Reparameterization methods
In this paper, simultaneous perturbation stochastic approximation (SPSA) algorithm is adopted for solving this problem. Because the gradient of SPSA deviates slightly from the real gradient, the posterior forecast derived by SPSA can also approximately quantify uncertainty. In addition, the application of SPSA can avoid over fitting phenomenon. This paper will introduce the reason about algorithm selection in Sec. II C.
C. Closedloop production optimization control method
For complex and uncertain petroleum reservoir dynamics, realtime optimized control based on current oil–water well production is crucial. This study performs history matching with the stochastic CEM–DSIVC approach to obtain a representative model, whose inversion results effectively characterize uncertainties. Combining multimodel optimization, well rates are optimized to pursue optimal performance while reducing sensitivityinduced variability, ensuring reliability.
Before determining J(u), it is essential to carry out DSIVC. We rely on the same collection of prior simulations, which span N_{cycle} phases of variable controls during the forecast duration. In this closedloop procedure, this does not undertake any further flow simulations. For every cycle, denoted as k, the well controls are set accordingly.
D. The CEM–DSIVC workflow
In summary of the descriptions in Secs. II A–II C, the proposed CEM–DSIVC workflow for efficient history matching and production optimization is summarized as the following steps:

Generate N_{r} initial reservoir connection models based on geological data and initial well controls u^{o}. Compute initial production dynamics using CEM to generate d_{full}.

Operate the field for historical period and first k − 1 stages in the forecasting period (only historical period for k = 1). Provide observed data d_{obs}.

Construct DSIVC model as described in Sec. II. Set initial guess of control variables u^{0} for the following N_{stage} − k+1 stages and SPSA parameters.

At iteration l, we resimulate initial models with CEM using each control set u_{spec} and $ d spec = [ u spec T , d obs T ] T$. Apply Eq. (17) to compute posterior forecast and objective function $ J ( u l )$.

if $ J ( u l ) > J ( u l \u2212 1 )$, set $ u l = u l \u2212 1$, and go to step 7. Else, reduce step size and return to step 4.
 End optimization when meeting convergence criteria:$  J ( u l ) \u2212 J ( u l \u2212 1 )  / J ( u l ) \u2a7d 1.5 \xd7 10 \u2212 3.$
Through solving the historymatching mathematical model, CEM–DSIVC obtains the transmissibility and connection volume for each connection element. This reduces the DSIVC matching parameter dimensionality. Seen from the workflow, theoretically, CEM in the workflow can reduce both computational cost of forward simulations and the number of degree of freedoms for history matching and optimization. DSIVC is an efficient data assimilation method that can integrate production data to reduce reservoir uncertainty. By combining DSIVC with the fast CEM reservoir simulation, CEM–DSIVC is an advanced novel closedloop optimization workflow that can perform reservoir management under uncertainty with high computational efficiency. The CEM–DSIVC method demonstrates high accuracy in history matching and uncertainty quantification, while requiring much lower computational cost compared to conventional methods. Therefore, CEM–DSIVC provides an effective and practical solution for closedloop reservoir optimization and uncertainty management.
III. NUMERICAL EXAMPLES
A. Case 1: Heterogeneous waterflooding reservoir model
A twodimensional heterogeneous reservoir example was established to verify the method. The permeability distribution of the model is shown in Fig. 3. Based on sequential Gaussian simulation, 200 initial CEM models were generated to reflect the uncertainty of the reservoir. Figure 4 shows the transmissibility and connected pore volume of one of the CEM models. The reservoir contains four water injection wells and nine production wells. All parameters of the reservoir models are the same except for transmissibility and connected pore volume. The total production time of the reservoir is 2500 days. All injection and production wells are controlled by constant bottomhole pressure. The first 1500 days of observed data are used for history matching, and the last 1000 days of data are used for production forecasting.
As mentioned above, DSIVC is a type of the data assimilation method. To further verify the effectiveness of the CEM–DSIVC method proposed in this paper, the Eclipse simulator and the CEM method are compared in the simulation section, and the mainstream multimodel data assimilation methods ESMDA and DSIVC are compared in the history matching and optimization methods. Figures 5 and 6 show the history matching and forecasting results of oil production rate and water cut by CEM–DSIVC, CEM–ESMDA, and ECLIPSE–DSIVC methods using the initial reservoir models and the truth case model. It refers to a combination of the CEM method and ESMDA, similar to the CEM–DSI method. ECLIPSE–DSIVC refers to the combination of eclipse and DSIVC methods, replacing the CEM method with ECLIPSE. The black dashed line separates the history stage and the forecast stage. The red dashed line represents the truth case model data, and the red dots represent observed data. The observation error is taken as 10% of the true model simulated values. The blue dashed line represents the average value calculated by the posterior model data. The inverted production forecast values can effectively match the production forecast of the truth case model.
In this paper, the oil price and water disposal cost are set as $ r o = 70$ USD/STB, $ r w = 3$ USD/STB, and $ r w i = 2$ USD/STB. The annual interest rate b is set to 0.1. For this case study, the operating conditions remain unchanged for the first 1500 days. The last 1000 days are divided into ten steps. The bottom hole pressure (BHP) ranges of the production wells are set to 200–400 Bar, and the bottom hole pressure (BHP) ranges of the injection wells are set to 400–600 Bar for the ten steps. Closedloop production optimization is performed using CEM–DSIVC, CEM–ESMDA, and ECLIPSE–DSIVC according to the above settings. Figure 7 shows the optimized BHP scheduling of the production and injection wells by CEM–DSIVC. The box plots of cumulative oil production and net present value before and after optimization are shown in Fig. 8, displaying the results of the prior and three optimized methods. It can be clearly seen from the figure that closedloop optimization significantly increases the cumulative oil production and net present value for all three methods. Compared to the previous models, all three methods significantly reduce uncertainty. The computational accuracy and time consumption comparisons of the three methods are shown in Table I. In this case study, one CEM forward simulation takes about 0.8 s, and one ECLIPSE forward simulation takes about 2.8 s. The computational time of CEM–ESMDA is about 850 s (mainly because 1000 CEM calculations are needed, including one prior model run and four update iterations). For the CEM–DSIVC method, 200 forward simulations of the prior ensemble take about 250 s, and the DSIVC takes about 57 s. The main computational time of ECLIPSE–DSIVC is the prior ensemble runs by ECLIPSE, which takes about 610 s. The results show that the CEM–DSIVC method can accurately predict reservoir production dynamics with relatively low computational cost.
B. Case 2: Real reservoir model
To test the methods, a practical waterflooding development case was selected. The reservoir model is constructed as shown in Fig. 9. There are high uncertainties in this reservoir system. First, the CEM model was obtained based on the geological data of the reservoir. Then, 200 initial datadriven models were generated by Gaussian sequential simulation, one of which CEM models is shown in Figs. 10(a) and 10(b). The total number of production wells in this reservoir is 17, and the total number of injection wells is 11, with a total production time of 4324 days. Here, the first 2010 days were set for history matching, and the following 2314 days were used for production forecast validation. The prior production data of field oil production total (FOPT) and field water cut (FWCT) were calculated as shown in Figs. 11(a) and 11(b). The black dashed line separates the history stage and the forecast stage. The red dashed line represents the truth case model data, and the red dots represent observed data. The observation error was taken as 10% of the true model simulated values. The blue dashed line represents the average value calculated by the posterior model data. It can be seen that there are some deviations between the prior production data and the actual observed production data, but the prior data can still cover the actual production data. The posterior production dynamics by the three methods for history matching can all better match the actual observed production data, and the production forecast can effectively capture the future variation laws of the reservoir production (Figs. 11 and 12).
The final production scheme of oil and water wells was used as the preoptimization plan. Optimization methods were then applied to optimize the well operating regimes over the next 600 days. Figure 13 shows box plots of the field cumulative oil production and NPV before and after optimization. The information reflected in the figure indicates that reservoir uncertainty was reduced through optimization. In addition, the optimized field cumulative oil production increased markedly compared to preoptimization. At the end of closedloop optimization, the average cumulative oil production rose from 2.51 × 10^{6} m^{3} before optimization to 2.84 × 10^{6} m^{3} after optimization. The average NPV increased from 184 million USD before optimization to 202 million USD after optimization. The computational accuracy and time consumption comparisons of case 2 are shown in Table II. In this case, one CEM forward simulation takes about 9.8 s, but one ECLIPSE forward simulation takes about 752 s. The computational time of CEM–ESMDA is about 4960 s (mainly because 500 CEM calculations are needed, including one prior model run and four update iterations). For the CEM–DSIVC method, 100 forward simulations of the prior ensemble take about 980 s, and the DSIVC takes about 61 s. The main computational time of ECLIPSE–DSIVC is the prior ensemble runs by ECLIPSE, which takes about 75 200 s. This computational cost would be prohibitively expensive for largerscale models or a greater number of prior model evaluations. This highlights the importance of the work presented in this paper. For large oilfields, the methods described here can substantially improve computational efficiency while still ensuring accuracy.
IV. RESULTS
Throughout the whole paper, several key conclusions can be obtained below:

This paper integrates the connection element method (CEM) and data space inversion with varying controls (DSIVC) for efficient history matching and production optimization in water reservoirs.

CEM significantly reduces the computational cost of forward reservoir simulation. DSIVC avoids direct inversion of geological parameters, improving calculation efficiency. The combined CEM–DSIVC approach drastically decreases the computational burden, enabling closedloop optimization for largescale reservoirs.

The proposed approach is demonstrated on two reservoir cases. For the first heterogeneous waterflooding model, compared to conventional gridbased simulation, CEM–DSIVC achieves comparable history matching accuracy with much less computational time. For the second case based on a giant offshore oilfield, CEM–DSIVC efficiently optimizes the field development plan while ensuring accuracy. The results validate that CEM–DSIVC can perform rapid uncertainty quantification and optimized forecasting for largescale assets.

Future work can focus on extending CEM–DSIVC to other reservoir settings like heavy oil, compositional simulation, etc., to further improve efficiency and accuracy. Integrating economic models into the framework could also enable technoeconomic optimization.
ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (Nos. 51922007, 52104017, and 52274030).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Wei Liu: Conceptualization (equal); Data curation (equal); Investigation (equal); Methodology (equal); Writing – original draft (equal). Yunfeng Xu: Data curation (equal); Visualization (equal); Writing – review & editing (equal). Xiang Rao: Funding acquisition (equal); Writing – review & editing (equal). Deng Liu: Visualization (equal); Writing – review & editing (equal). Hui Zhao: Conceptualization (equal); Funding acquisition (equal); Methodology (equal); Project administration (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
APPENDIX: CALCULATION OF GOVERNING EQUATIONS AND CALCULATION FORMULA OF SPSA ALGORITHM
1. Fully implicit scheme
CEMFI still suffers from the problem that the number of required nodes is too large and the computational cost for actual largescale reservoirs is high. This makes it difficult to perform the history matching and effectively predict the production dynamics. Therefore, in Subsec. 2 of the Appendix, a sequential coupled scheme is given to avoid this problem.
2. Sequential coupled scheme
The saturation distribution along the 1D connection element can be calculated from Eq. (A8) when the upstream flux and the connection volume are known.