Interdependent networks are susceptible to catastrophic consequences due to the interdependence between the interacting subnetworks, making an effective recovery measure particularly crucial. Empirical evidence indicates that repairing a failed network component requires resources typically supplied by all subnetworks, which imposes a multivariate dependence on the recovery measures. In this paper, we develop a multivariate recovery coupling model for interdependent networks based on percolation theory. Considering the coupling structure and the failure–recovery relationship, we propose three recovery strategies for different scenarios based on the local stability of nodes. We find that the supporting network plays a more important role in improving network resilience than the network where the repaired component is located. This is because the recovery strategy based on the local stability of the supporting nodes is more likely to obtain direct benefits. In addition, the results show that the average degree and the degree exponent of the networks have little effect on the superior performance of the proposed recovery strategies. We also find a percolation phase transition from first to second order, which is strongly related to the dependence coefficient. This indicates that the more the recovery capacity of a system depends on the system itself, the more likely it is to undergo an abrupt transition under the multivariate recovery coupling. This paper provides a general theoretical framework for addressing multivariate recovery coupling, which will enable us to design more resilient networks against cascading failures.
Interdependencies among networks can greatly expand the function and application areas of the network system, but they also greatly increase system vulnerability, which typically manifests as cascading failure. To avoid catastrophic events or to heal cascading failures as they occur, researchers have focused on designing recovery strategies. A recent study has shown that the recovery of one network depends on the functional state of the support network (which its authors refer to as “recovery coupling”). If the support networks are not functional, recovery will be partly confined. However, that study overlooks the functional status of the repaired components themselves and only takes into account the limitations on the recovery measures imposed by the functional state of the supporting network; it is also specific to post-disaster recovery and does not hold for dynamic recovery. In this paper, we develop a multivariate recovery coupling model for interdependent networks based on percolation theory. Considering the coupling structure and the failure–recovery relationship, we propose three recovery strategies for different scenarios based on the local stability of nodes. We find that the supporting network plays a more important role in improving network resilience than the network where the repaired component is located. In addition, the results show that the average degree and the degree exponent of the networks have little effect on the superior performance of the proposed recovery strategies. We also find a percolation phase transition from first to second order, which is strongly related to the dependence coefficient. This indicates that the more the recovery capacity of a system depends on the system itself, the more likely it is to undergo an abrupt transition under the multivariate recovery coupling.
Our study may provide a general theoretical framework for addressing multivariate recovery coupling, which will enable us to design more resilient networks against cascading failures.
I. INTRODUCTION
Interdependence is an inherent property of complex systems in the real world.1,2 A typical scenario comprises critical infrastructures in which different functional networks, such as the Internet, electrical power grids, transportation systems, and communication networks, interact with each other.3–5 The interdependencies among networks can greatly expand the function and application areas of the system, but they also greatly increase system vulnerability, which typically manifests as cascading failure.6–8 For example, Venezuela's power grid suffered two consecutive large-scale blackouts from March 7 to 27, 2019, which were caused by a local man-made attack.9 Consider also the 2003 blackout in North America, in which the combination of power component failures and computer network failures contributed to cascading failures that eventually led to a large-scale blackout.10 The growing number of such events suggests that a small localized disturbance can cause catastrophic damage in interdependent systems.11,12 To represent these interdependent systems, Buldyrev et al. first proposed an interdependent network model based on percolation theory.1 In interdependent networks, dependency links further enrich network features, and the study of these coupled networks has expanded rapidly.
To avoid catastrophic events or to heal failures as they occur, many researchers have focused on designing recovery strategies, which are typically divided into two categories, i.e., dynamic recovery13–16 and post-disaster recovery.17,18 Due to its timeliness and rapid response to risk, the dynamic recovery method, in which system component failure and recovery typically occur concurrently and compete with each other, has been studied extensively. Majdandzic et al.14 and Böttcher et al.19 proposed a recovery model for interdependent networks in which failure and spontaneous recovery occur concomitantly. Di Muro et al.13 developed a theoretical framework in which failed nodes on the boundary of the functional network are repaired before the network completely collapses. La Rocca et al.16 developed an emergency recovery strategy in which isolated finite clusters are reconnected to the functional giant component with a given probability at each time step of the cascade of failures. However, the existing recovery methods tend to repair the failed network components with a fixed ability, ignoring the fact that recovery measures are often limited by many factors such as resources, cost, and time. Recently, Danziger and Barabási20 developed a theoretical framework for interdependent networks to address the issue of dependence during recovery. They captured how the recovery of one network depends on the functional state of the support network (which they refer to as “recovery coupling”). If the support networks are not functional, recovery will be partly confined. This analysis framework offers a theoretical basis for how recovery coupling affects a system's functionality.
Yet this framework overlooks the functional status of the repaired components themselves and only takes into account the limitations on the recovery measures imposed by the functional state of the supporting network; it is also specific to post-disaster recovery and does not hold for dynamic recovery.
In many real-world interdependent network systems, the spontaneous recovery or deliberate repair of the network components requires resources typically supplied by both the network in which the repaired components are located and its supporting network.20–22 For example, in an interdependent system composed of the power grid (PG) and the communication network (CN),23 restoring failed components of CN requires that the repair crews have an electricity supply (which is supported by PG) and coordination through communications (which is supported by CN). Therefore, the absence of either or both of these supporting conditions will slow the recovery and may even halt it. Consider also the power–transportation network,24,25 in which a power outage and a blocked road may delay the repair of the failed power components in a given location. Similar cases are abundant in infrastructures. In this paper, we refer to the recovery processes mentioned above as “multivariate recovery coupling.” In addition, dynamic recovery serves as a more practical measure for dealing with cascading failures as they occur than a reactive post-disaster response.13,16,26 This quick reaction helps prevent disasters and hence helps minimize losses for systems in the real world, especially for interdependent infrastructures. However, current research on multivariate recovery coupling under dynamic recovery is limited for interdependent networks.
In this paper, we develop a percolation-based theoretical model to capture multivariate recovery coupling under dynamic recovery of interdependent networks. In our model, failed finite clusters are repaired with a time-dependent probability that is related to the instantaneous functional state of each node in the finite cluster and of all supporting nodes of the finite cluster. Considering the coupling structure and the failure–recovery relationship, we develop three recovery strategies for different scenarios based on the local stability of nodes. In addition, we propose a network resilience indicator that reflects both the level and the effectiveness of recovery. We present both theoretical and simulation results to illustrate the phase transition properties of random and scale-free networks. We find that the phase transition of the network changes from first to second order with an increasing dependence coefficient, which indicates that the more the recovery capacity of a system depends on the system itself, the more likely it is to undergo an abrupt transition under the multivariate recovery coupling. Furthermore, we find that the supporting network plays a more important role in improving network resilience than the network where the repaired component is located. Specifically, the more stable the supporting node of the node chosen for reconnection, the higher the recovery efficiency. Finally, the results show that the feedback strategy (FS) performs better than the other strategies, and similar results are observed in a real-world interdependent system composed of the Western U.S. power grid and the communication network.
II. METHODOLOGY
A. Model
In this paper, we define recovery processes in which repairing a failed network component requires resources supplied by all subnetworks as “multivariate recovery coupling.” Here, a percolation-based model is developed to capture the multivariate recovery coupling under the dynamic recovery of interdependent networks.
1. Failure–recovery propagation
The failure mechanism for interdependent networks has been well studied.1,27,28 Based on the study of Buldyrev et al.,1 we detail the failure–recovery propagation in our model below. For simplicity and without loss of generality, we consider an interdependent network consisting of two networks, A and B, with degree distributions P_A(k) and P_B(k), respectively. Both networks have N nodes, and each node in network A/B is randomly linked with one node from network B/A via a bidirectional interdependent link. We denote by t the time steps of the failure–recovery propagation. The initial failure is applied to a randomly chosen fraction 1 − p of nodes in network A at step t = 0, which triggers the cascading failure. Following the assumptions in Buldyrev et al.,1 the finite clusters in network A that are not linked to the giant component (GC) are removed, and their interdependent nodes in network B then fail as well. As the failure spreads from network A to network B, network B breaks into a giant component and several finite clusters. Before these finite clusters fail, La Rocca et al.16 reconnected the isolated finite clusters to the functional giant component with a fixed probability to delay the network collapse. Building on this, we introduce a new emergency recovery process in which isolated finite clusters are repaired following the multivariate recovery coupling as follows:
Step 1: Each isolated finite cluster in network B is repaired with a time-dependent probability that depends strongly on the multivariate recovery coupling. To make better use of recovery resources, we assume that only finite clusters with more than two nodes are repaired.
Step 2: For each repaired finite cluster, we select a node and reconnect it to a node belonging to the GC in network B. To maintain the initial network topology as much as possible, the nodes we choose for reconnection must have free links, i.e., links that existed at the beginning but were removed later due to cascading failures.
After the above recovery process, the unrepaired finite clusters in network B are removed, which, in turn, causes their supporting nodes in network A to fail. The current step then ends and the next step begins. The above procedure is repeated until a steady state is reached, in which no new nodes fail. For simplicity, only the multivariate recovery coupling for network B is presented here. Our model can be generalized to more scenarios, such as the initial failure and multivariate recovery coupling occurring concurrently in networks A and B.
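The procedure above can be sketched in Python with NetworkX as follows. This is an illustrative sketch under stated assumptions, not the implementation used to produce the results: node i of network A is assumed to be coupled to node i of network B (for randomly generated networks this is equivalent to random one-to-one coupling), reconnection nodes are chosen at random among nodes with free links, and `repair_prob` is a hypothetical user-supplied callable standing in for the time-dependent repair probability.

```python
import random
import networkx as nx


def giant_component(G):
    """Node set of the largest connected component (empty set for an empty graph)."""
    return max(nx.connected_components(G), key=len, default=set())


def cascade_with_recovery(A, B, p, repair_prob, seed=None):
    """Run the failure-recovery cascade and return (final GC fraction, number of steps).

    Assumptions for illustration: A and B share integer node labels and node i
    of A depends on node i of B; repair_prob(cluster, t) is a placeholder for
    the time-dependent repair probability.
    """
    rng = random.Random(seed)
    A, B = A.copy(), B.copy()
    B0 = B.copy()                      # initial topology of B, used to detect free links
    n = A.number_of_nodes()

    def fail(nodes):
        # A failed node takes its interdependent partner down with it.
        nodes = list(nodes)
        A.remove_nodes_from(nodes)
        B.remove_nodes_from(nodes)

    def free(nodes):
        # Nodes that have lost at least one of their original links.
        return sorted(v for v in nodes if B.degree(v) < B0.degree(v))

    # t = 0: a random fraction 1 - p of the nodes of A fails.
    fail(rng.sample(sorted(A.nodes), round((1 - p) * n)))

    t = 0
    while A.number_of_nodes() > 0:
        t += 1
        # Finite clusters of A fail, together with their partners in B.
        lost_a = set(A.nodes) - giant_component(A)
        fail(lost_a)

        clusters = list(nx.connected_components(B))
        gc_b = max(clusters, key=len)
        lost_b = set()
        for cluster in clusters:
            if cluster is gc_b:
                continue
            inner, outer = free(cluster), free(gc_b)
            # Step 1: only clusters with more than two nodes may be repaired.
            if (len(cluster) > 2 and inner and outer
                    and rng.random() < repair_prob(cluster, t)):
                # Step 2: reconnect one cluster node to one GC node.
                B.add_edge(rng.choice(inner), rng.choice(outer))
            else:
                lost_b |= cluster
        # Unrepaired clusters of B fail, together with their partners in A.
        fail(lost_b)
        if not lost_a and not lost_b:
            break                      # steady state: no new failures
    return B.number_of_nodes() / n, t
```

For example, `cascade_with_recovery(A, B, 0.8, lambda cluster, t: 0.5, seed=1)` runs one realization with a constant repair probability; a state-dependent probability can be substituted by changing the callable.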
2. Multivariate recovery coupling
B. Recovery strategy
The central question of the recovery strategy is how to find an optimal set of nodes for reconnection (referred to as R-nodes) in both the repaired finite clusters and the GC. Considering the coupling structure and the failure–recovery relationship, we develop three recovery strategies for different scenarios based on the local stability of nodes, in which nodes with high indicator values are preferred for reconnection.
C. Network resilience
III. RESULTS
Numerical simulations and theoretical calculations are conducted on scale-free (SF), random (ER), and regular random (RR) networks, which have been commonly used to depict real-world networks. For numerical simulations, the ER and RR networks are generated using NetworkX (https://networkx.org/), which is a Python package for the construction, manipulation, and analysis of complex networks. The SF network is generated by the configuration model,35 which can build a network with a pre-defined degree sequence. For theoretical derivation, the degree distribution of ER networks is the Poisson distribution P(k) = e^(−⟨k⟩)⟨k⟩^k/k!, where ⟨k⟩ represents the average degree. For SF networks, the degree distribution is a power law characterized by the degree exponent and the minimum degree. For RR networks, every node has the same number of connections. All networks used for numerical simulations contain 10,000 nodes, and simulation results are averaged over 300 realizations.
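As an illustration of how the three network types can be generated, the following minimal sketch uses standard NetworkX generators. The specific generator functions, the parameter names (`avg_k`, `k0`, `exponent`, `k_min`), the default values, and the degree cutoff at the square root of the network size are assumptions made for the example and are not taken from the setup described above.

```python
import random
import networkx as nx


def power_law_degrees(n, exponent, k_min, k_max, rng):
    """Draw n degrees with probability proportional to k**(-exponent)."""
    ks = list(range(k_min, k_max + 1))
    weights = [k ** -exponent for k in ks]
    degrees = rng.choices(ks, weights=weights, k=n)
    if sum(degrees) % 2:               # the configuration model needs an even degree sum
        degrees[0] += 1
    return degrees


def build_networks(n=1000, avg_k=4.0, k0=4, exponent=2.5, k_min=2, seed=0):
    """Return one ER, one RR, and one SF network with n nodes each."""
    rng = random.Random(seed)
    # ER: each pair is linked with probability avg_k / (n - 1).
    er = nx.gnp_random_graph(n, avg_k / (n - 1), seed=seed)
    # RR: every node has exactly k0 neighbors (n * k0 must be even).
    rr = nx.random_regular_graph(k0, n, seed=seed)
    # SF: configuration model on a power-law degree sequence; the cutoff
    # k_max = sqrt(n) is an assumption chosen for this sketch.
    degrees = power_law_degrees(n, exponent, k_min, int(n ** 0.5), rng)
    sf = nx.Graph(nx.configuration_model(degrees, seed=seed))  # merges parallel edges
    sf.remove_edges_from(list(nx.selfloop_edges(sf)))          # drops self-loops
    return er, rr, sf
```

Converting the multigraph returned by the configuration model to a simple graph slightly lowers some degrees, so the realized degree sequence only approximates the prescribed one.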
A. Phase transition properties
The analysis of the phase transition properties is performed under the random recovery strategy (RS), in which the nodes for reconnection are chosen randomly. In addition, to verify the theory we present, the recovery constraint is relaxed so that finite clusters of arbitrary size are potentially recoverable. All theoretical results are derived based on Eqs. (9)–(25) given in the supplementary material.
We first calculate the size of the giant component at the steady state for the ER–ER, SF–SF, and RR–RR interdependent networks under different dependence coefficients [Figs. 3(a)–3(c)]. We find that the network undergoes a percolation phase transition at the critical value p_c, at which the size of the giant component jumps from a finite value to zero. More importantly, the phase transition changes from first to second order with increasing dependence coefficient. This indicates that the more the recovery capacity of a system depends on the system itself, the more likely it is to undergo an abrupt transition under the multivariate recovery coupling. In Figs. 3(a)–3(c), the dotted lines and symbols represent theoretical and numerical simulation results, respectively. The results show excellent agreement between theory and simulation, except for a minor difference around p_c.
Next, we show the number of iteration steps (NOI) required for the cascading process to reach the steady state [Figs. 3(d)–3(f)]. The results show that NOI peaks at p_c as p increases, which means that the network requires many iterations to reach the steady state when p is close to p_c. This explains the minor difference between the theoretical and simulation results around p_c, since the recovery strategies in theory and in simulation differ in their practical implementation. Specifically, after each recovery measure, we theoretically obtain a new GC following the initial topology, whereas in the simulation, we only reconnect the repaired cluster and the GC by adding a new edge.
To further validate the effects of the dependence coefficient on the phase transition, the phase diagrams for ER–ER, SF–SF, and RR–RR networks are shown in Figs. 4(a)–4(c). The phase diagram is divided by the critical curves into three regions, i.e., Collapse, Recovery, and Function. These regions are defined based on the network state, which is related to the relative size of the GC at the steady state. Here, the orange curve represents p_c as a function of the dependence coefficient. The region to the left of the orange curve is labeled “Collapse” because a network in this area will be completely destroyed, i.e., the relative size of the GC is close to 0. The black dashed line represents the minimum p for , and the region to the right of it is labeled “Function.” Networks located in this area do not crash for any value of p, i.e., . When the network is in the “Function” state, it retains its essential function even when the recovery strategy is not applied. The middle region is marked as “Recovery,” in which the maximum is allowed to prevent the network from collapsing completely.
The critical curve (p_c as a function of the dependence coefficient) of the ER–ER network under different average degrees is shown in Fig. 4(d). We find that the curves move toward the left as the average degree increases, which corresponds to a smaller “Collapse” region in Fig. 4(a). This indicates that the larger the average degree, the more robust the ER–ER network is to cascading failures. Similar experiments are performed with the SF–SF network at different values of the degree exponent and the minimum degree. As shown in Fig. 4(e), for the SF–SF network, the larger the value of , the smaller the area of the “Collapse” region and the stronger the ability against cascading failures. For , the result is the opposite. For the RR–RR network [Fig. 4(f)], the larger the node degree, the smaller the area of the “Collapse” region. These findings enable us to design more resilient networks against cascading failures.
B. Resilience assessment
In this section, we compare the SS, CS, FS, and RS recovery strategies under the random failure mode and the targeted failure mode. In the random failure mode, we assume that the initial failure is applied to a randomly chosen fraction 1 − p of nodes in network A at step t = 0. In the targeted failure mode, nodes are attacked in order of their degree. Note that all of the following results are obtained from simulations, and all recovery strategies meet the recovery constraint proposed in Sec. II.
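The two failure modes can be illustrated with a short Python sketch. It assumes that a targeted attack removes the highest-degree nodes first, with ties broken at random; this ordering direction, the tie-breaking rule, and the function name `initial_failures` are assumptions made for the example.

```python
import random
import networkx as nx


def initial_failures(G, p, mode="random", seed=None):
    """Return the set of nodes of G that fail at t = 0 (a fraction 1 - p)."""
    rng = random.Random(seed)
    n_fail = round((1 - p) * G.number_of_nodes())
    nodes = sorted(G.nodes)
    if mode == "random":
        return set(rng.sample(nodes, n_fail))
    if mode == "targeted":
        rng.shuffle(nodes)                                   # random tie-breaking
        nodes.sort(key=lambda v: G.degree(v), reverse=True)  # stable sort, hubs first
        return set(nodes[:n_fail])
    raise ValueError(f"unknown failure mode: {mode!r}")
```

In a regular random network all degrees are equal, so the targeted branch reduces to a random choice.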
We first test the network resilience under random initial failure. A performance comparison of the SS, CS, FS, and RS strategies in the ER–ER network is shown in Fig. 5(a). We find that the resilience performance of FS is superior to that of the other three recovery strategies. This confirms that the supporting network plays a more important role in improving network resilience than the network where the repaired component is located. This is because the FS recovery strategy is more likely to obtain direct benefits. In addition, as shown in Fig. 5(a), when p is either very large or very small, the value of R under the different recovery strategies is almost identical. This is because the network reaches the steady state within only a few iteration steps when p is far from the phase transition point; thus, the difference in recovery effect among the strategies is not significant.
To verify the effects of the dependence coefficient on the performance of the recovery strategies, the network resilience under different dependence coefficients is calculated as shown in Fig. 5(b). Here, Rs is defined as an estimation of the area under the R curve in Fig. 5(a), which is used to estimate the overall resilience for the entire test set. We find that the FS recovery strategy outperforms the other strategies under the different dependence coefficients. In addition, Rs decreases gradually with increasing dependence coefficient, which indicates that the more the recovery capacity of a network depends on the state of the network itself, the more resilient the network is to cascading failures. Going a step further, we investigate the effect of changes in the network parameters on the performance of the proposed recovery strategies, shown in Fig. 5(c). Here, the size of the circles represents the value of Rs. To facilitate comparisons, all results are normalized by their maximum values. We find that the network parameters have little effect on the size of the different circles, which indicates that the superior performance of the FS is not influenced by the structural properties of networks. Similar results are observed in the ER–ER network under the targeted failure mode [Figs. 5(d)–5(f)].
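One simple way to estimate the area under the R curve is the trapezoidal rule, sketched below. The choice of the trapezoidal rule and the function names are assumptions made for illustration; the estimator behind Rs is only described here as an estimation of the area under the curve.

```python
def overall_resilience(p_values, r_values):
    """Trapezoidal estimate of the area under R(p) from sampled points."""
    points = sorted(zip(p_values, r_values))
    return sum((p1 - p0) * (r0 + r1) / 2
               for (p0, r0), (p1, r1) in zip(points, points[1:]))


def normalize_by_max(values):
    """Scale a list of non-negative values so that the largest equals 1."""
    top = max(values)
    return [v / top for v in values] if top > 0 else list(values)
```

For instance, `overall_resilience([0, 0.5, 1], [0, 0.5, 1])` evaluates the area under a straight line from (0, 0) to (1, 1), and `normalize_by_max` applies the normalization by the maximum value used when comparing strategies.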
Next, a comparison of the network resilience between the FS strategy and the other strategies on the SF–SF network is shown in Fig. 6. The results are similar to those of the ER–ER network; that is, the resilience performance of FS is superior to that of the other three recovery strategies. In addition, we find that, compared to the ER–ER network, the SF–SF network exhibits different behavior under the targeted failure mode than under the random failure mode. As shown in Figs. 6(b) and 6(e), under the targeted failure mode, the value of Rs is much lower than under the random failure mode for the different dependence coefficients. This indicates that, compared to the ER–ER network, the SF–SF network is more vulnerable to targeted attacks. However, even under targeted attacks, the FS strategy still provides the strongest recovery capability for the network.
In addition, we compare the performance of different recovery strategies on the RR–RR network. Because the degree value of each node in the RR network is the same, the effects of the random failure mode and the targeted failure mode are identical. Here, we only provide the simulation results of performance comparison for recovery strategies under random failure mode. As shown in Fig. 7, the same results are observed for RR–RR networks, indicating that the FS strategy still has better performance.
IV. TEST ON EMPIRICAL DATA
In particular, we test the validity of our proposed methods in a real-world interdependent system composed of the Western U.S. power grid (PG) and the communication network (CN) used for supervisory control and data acquisition of PG.28,36 Due to a lack of data, it is difficult to establish the structure of CN and its interdependencies with PG. Considering that most real-world network systems can be described as ER or SF networks, we approximate the structure of PG–CN by coupling PG to ER or SF networks. Here, we assume that the ER and SF networks have the same number of nodes and the same average degree as PG, and each node in PG is randomly linked with exactly one node from the ER/SF network. We perform a comparative analysis of the proposed recovery strategies in both the PG–ER and PG–SF networks (see Fig. 8). The results are similar to those of ER–ER and SF–SF networks. When p is close to p_c, the network resilience under the FS recovery strategy is superior to that under the other strategies. Again, we investigate the effect of changes in the dependence coefficient on the performance of the proposed recovery strategies, as shown in Figs. 8(c) and 8(d). We also find that FS performs well under the different dependence coefficients and that its superiority becomes more obvious as the dependence coefficient decreases, suggesting that the more the recovery capacity of a system depends on its own internal state, the more vulnerable the system is to cascading failures.
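The PG–ER coupling described above can be sketched as follows. The function name `couple_with_er` is hypothetical, and any undirected NetworkX graph can stand in for the power grid, since the empirical data are not part of this text; the sketch only shows matching the node count and average degree and drawing a random one-to-one coupling.

```python
import random
import networkx as nx


def couple_with_er(pg, seed=None):
    """Build an ER network matched to pg and a random one-to-one coupling.

    Returns (er, coupling), where coupling maps each node of pg to exactly
    one node of the ER network.
    """
    rng = random.Random(seed)
    n = pg.number_of_nodes()
    avg_k = 2 * pg.number_of_edges() / n          # average degree of pg
    # ER network with the same size and the same expected average degree.
    er = nx.gnp_random_graph(n, avg_k / (n - 1), seed=seed)
    partners = list(er.nodes)
    rng.shuffle(partners)                          # random bijection between node sets
    coupling = dict(zip(pg.nodes, partners))
    return er, coupling
```

The same pattern applies to the PG–SF case by replacing the ER generator with a configuration-model network whose degree sequence has the desired average.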
V. DISCUSSION
In this paper, we develop a multivariate recovery coupling model for interdependent networks based on percolation theory. Considering the coupling structure and the failure–recovery relationship, we develop three recovery strategies for different scenarios based on the local stability of nodes to improve the network resilience against cascading failures. In addition, a network resilience indicator that reflects both the level and the effectiveness of recovery is proposed. We find that the network undergoes a percolation phase transition at the critical value p_c, and the phase transition changes from first to second order with increasing dependence coefficient. This indicates that the more the recovery capacity of a system depends on the system itself, the more likely it is to undergo an abrupt transition under the multivariate recovery coupling. We also find that the supporting network plays a more important role in improving network resilience than the network where the repaired component is located. This is because the recovery strategy based on the local stability of the supporting nodes is more likely to obtain direct benefits. Moreover, the average degree and the degree exponent of the networks have little effect on the superior performance of the proposed recovery strategies. The results further show that the FS recovery strategy performs better than the other strategies, and similar results are observed in a real-world interdependent system composed of the Western U.S. power grid and the communication network. This paper provides a general theoretical framework for addressing multivariate recovery coupling, which will enable us to design more resilient networks against cascading failures. The proposed method is applied here to the simple case in which the multivariate recovery coupling acts only on network B. The more general scenario in which the multivariate recovery coupling occurs in both networks A and B will be studied in the future.
SUPPLEMENTARY MATERIAL
See the supplementary material for the theoretical derivation of the multivariate recovery coupling in interdependent networks and for the self-consistent equations, based on generating functions, that are used to capture and solve the failure–recovery process.
ACKNOWLEDGMENTS
We acknowledge the support from the National Natural Science Foundation of China (NNSFC) (Grant No. 72001213), the National Social Science Foundation of China (Grant No. 19BGL297), and the Basic Research Program of Natural Science in Shaanxi Province (Grant No. 2021JQ-369).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Jie Li: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Methodology (equal); Software (equal); Visualization (equal); Writing – original draft (equal). Ying Wang: Funding acquisition (equal); Project administration (equal); Supervision (equal). Jilong Zhong: Formal analysis (equal); Funding acquisition (equal); Supervision (equal); Writing – review & editing (equal). Yun Sun: Supervision (equal); Writing – review & editing (equal). Zhijun Guo: Conceptualization (equal); Software (equal); Supervision (equal); Writing – review & editing (equal). Chaoqi Fu: Funding acquisition (equal); Supervision (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.