A heterogeneous structure of social networks induces various intriguing phenomena. One of them is the friendship paradox, which states that on average, your friends have more friends than you do. Its generalization, called the generalized friendship paradox (GFP), states that on average, your friends have higher attributes than yours. Despite successful demonstrations of the GFP by empirical analyses and numerical simulations, analytical, rigorous understanding of the GFP has been largely unexplored. Recently, an analytical solution for the probability that the GFP holds for an individual in a network with correlated attributes was obtained using the copula method but by assuming a locally tree structure of the underlying network [Jo et al., Phys. Rev. E 104, 054301 (2021)]. Considering the abundant triangles in most social networks, we employ a vine copula method to incorporate the attribute correlation structure between neighbors of a focal individual in addition to the correlation between the focal individual and its neighbors. Our analytical approach helps us rigorously understand the GFP in more general networks, such as clustered networks and other related interesting phenomena in social networks.
The generalized friendship paradox (GFP) implies that on average, your friends have higher attributes than yours. Here, the attribute could be, e.g., a scientist’s achievement in the scientific collaboration network. We derived an analytical solution of the GFP in a network with triangles where nodal attributes are correlated with each other. For this, we employed a copula method enabling to explicitly write the joint probability of correlated attributes between neighboring nodes. This result helps more rigorously understand how the triangular structure in social networks affects individuals’ perception about their neighborhood.
I. INTRODUCTION
Complex systems have been characterized in terms of graphs or networks:1–5 Nodes of the network represent elements of the system, while pairwise interactions between elements are denoted by links between nodes. Thanks to the rapid development of information-communication technology, researchers have access to large-scale digital records for human social behaviors, enabling us to study the social networks in more detail. By the empirical analyses of a variety of social network datasets, it has been shown that social network topology is heterogeneous in many aspects,6 often characterized by heavy-tailed distributions of degrees,7,8 assortative mixing or homophily,9 and community structure,10 to name a few. Such a heterogeneous structure of social networks induces a number of intriguing phenomena in complex social networks. They include dynamical processes taking place on those social networks, such as spreading and diffusion.11–14
One of the interesting phenomena induced by the heterogeneous structure of social networks is the friendship paradox (FP), which states that on average, your friends have more friends than you do.15 The FP focuses on the number of neighbors of the node, i.e., the node degree. However, nodal attributes other than the degree can also be used for the comparison of an individual to its neighbors, leading to the generalized friendship paradox (GFP).16 Candidates for nodal attributes could be either topological, e.g., betweenness centrality17 and eigenvector centrality,17,18 or non-topological, e.g., scientific achievement,16 happiness,19 and sentiment.20 The GFP states that on average, your friends have higher attributes than yours.16,21,22 The GFP has been extensively investigated not only by analyzing various empirical datasets16,19–21,23–27 but also by means of numerical and analytical approaches.18,22,28–30
Both FP and GFP have been studied at the network level as well as at the individual level. A typical approach at the network level is to compare the expected degree or attribute of a randomly chosen node to that of an end node of a randomly chosen link.15,16 On the other hand, the individual-level approach has been studied in terms of the probability that the degree or attribute of a focal node in a network is smaller than the average degree or attribute of neighbors of the focal node. This probability can be interpreted as peer pressure.31 From now on, nodal attributes indicate only non-topological ones to distinguish from degrees for the FP. Focusing on the GFP, nodes with high degree and high attribute are generically expected to have lower peer pressure than other nodes. Interestingly, it was also found by empirical analysis of scientific collaboration networks16 that some nodes with high degree and high attribute have higher peer pressure in a network with positively correlated attributes, i.e., a homophilic network,32 than expected for random counterparts. It is because when attributes of neighboring nodes are positively correlated with each other, nodes with high degree and high attribute are likely to be surrounded by neighbors with even higher degree and higher attribute than themselves. This empirical finding was supported by numerical simulations of a minimal model with tunable correlations between attributes of neighboring nodes.22
Although the analytical solution for the GFP in a network with uncorrelated attributes was obtained several years ago,22 analytical, rigorous understanding of the GFP in a network with correlated attributes has been largely unexplored due to the lack of proper mathematical tools for modeling the correlation structure between attributes of neighboring nodes. Recently, Jo et al.29 modeled the correlation structure between attributes of neighboring nodes by means of a copula method. In essence, the copula method enables to write a joint probability distribution function with a tunable correlation between variables in a tractable form.33 The copula method has been used in various disciplines, such as finance,34 engineering,35 biology,36 astronomy,37,38 and time series analysis.39,40 However, due to the complexity of the formalism for the GFP, the analytical solution could be obtained only approximately, e.g., by assuming a locally tree structure for the underlying network,29 implying that attributes of neighbors of the focal node are correlated with that of the focal node but not with each other. However, in reality, many social networks are highly clustered, i.e., abundant in triangles.1,6,41 Thus, attributes of neighbors of the focal node are expected to be correlated with each other too. This strongly calls for a more comprehensive analytical framework for the GFP in networks with triangles or clustered networks.
In this paper, we propose such a framework by employing a vine copula method particularly for multiple correlated variables,38 by which in addition to the correlation between the focal node and its neighbors, one can also consider the correlation between attributes of neighbors of the focal node. We analytically and numerically find for the exponentially distributed attributes that the peer pressure of individuals with high (low) attributes is increased (reduced) by the connections between their neighbors. By our analytical approach, we can get deeper insight into how the triangular structure of social networks affects individuals’ perception about their neighborhood, therefore better understand related phenomena observed in complex networks, such as perception biases.42 In addition, the copula method is expected to be useful for developing analytical approaches to various topics in complex systems.
We remark that for both FP and GFP, the degree or attribute of a node is compared to a set of degrees or attributes of its neighbors or a single value summarizing the set. The most common summarization is to take an average of degrees or attributes in the set, which is sensitive on a few neighbors with a very high degree or attribute. The median of the set was also proposed as it is less sensitive on such neighbors than the average.15,24,43,44 More recently, a novel summarization method in terms of the fraction of neighbors with a higher degree or attribute than the focal node has been suggested to study the effect of summarization methods on the FP.31 In our work, we focus on the most common method, namely, the mean-based method.
II. COPULA-BASED ANALYSIS
A network with nodes is considered. For non-topological attributes of such nodes, we consider an attribute distribution . For the sake of convenience, is assumed. Yet, values of can be considered too. At the individual level, the generalized friendship paradox (GFP) requires to compare an attribute of a focal node to a set of its neighbors’ attributes or a single value representing the set. As mentioned, in this work, we focus on the mean-based summarization method among others.31
The mean-based GFP is said to hold for a node when the node’s attribute is smaller than the average of its neighbors’ attributes. Precisely, the mean-based GFP holds for the node when the condition below is satisfied:16
where is the set of the node ’s neighbors and denotes the degree of the node . Then, the probability that Eq. (1) is satisfied is called the mean-based peer pressure. One can write the mean-based peer pressure of the node having degree and attribute as
where we have used the notation
Here, denotes a Heaviside step function, and is the ensemble average over . denotes the conditional joint probability distribution function (PDF) of attributes of the focal node’s neighbors given the attribute of the focal node. See Fig. 1 for the schematic diagram. For modeling , we adopt a C-vine copula to be introduced below.
Schematic diagram for the generalized friendship paradox in clustered networks. An ego with an attribute has neighbors whose attributes are , respectively. It is assumed to have the same correlation structure of attributes of neighboring nodes in every link, which is controlled by a single value of the Pearson correlation coefficient in Eq. (15).
Schematic diagram for the generalized friendship paradox in clustered networks. An ego with an attribute has neighbors whose attributes are , respectively. It is assumed to have the same correlation structure of attributes of neighboring nodes in every link, which is controlled by a single value of the Pearson correlation coefficient in Eq. (15).
We first introduce the simplest version of copula, i.e., a bivariate copula, and then a vine copula for dealing with interdependence between multiple variables. The bivariate copula means a function that joins a bivariate cumulative distribution function (CDF), denoted by , to their one-dimensional marginal CDFs, denoted by and ,33,45 such that
The bivariate PDF of and , denoted by , is then derived by
where and denote PDFs. Denoting and by and , the copula density is defined as
which carries information on the correlation structure between and .
We consider the case that a multivariate PDF with variables, i.e., for , can be written in terms of a product of bivariate copulas. For this, we choose a C-vine copula among others,38
where () denotes a conditional CDF of () when are given, and the conditional bivariate copula density carries information on the pairwise correlation between and when are given.
Now, let us consider a focal node with an attribute of and neighbors, and the attributes of those neighbors are , respectively. For the analysis of the GFP for this node, we need to model a conditional joint PDF with variables for a given ; i.e., . Using a C-vine copula in Eq. (7), one writes
For the analytical tractability, all for are assumed to have the same form as
It is also assumed that and have the same functional form and that and are not conditioned by , enabling us to write
where
Denoting and by and , we also assume that the copula density for a given has the same functional form for all ,
Then, represents the correlation structure between the attribute of the focal node, , and the attribute of its neighbor, , for . On the other hand, for represents the correlation structure between attributes of the focal node’s neighbors and for .
For modeling the pairwise copula density in Eq. (13), we adopt the Farlie–Gumbel–Morgenstern (FGM) copula density,33,37,45
If and are, respectively, CDFs of variables and , controls the correlation strength between and . We relate to the Pearson correlation coefficient (PCC) between and . The PCC between and is written as
where
and and are the mean and standard deviation of , respectively. Using Eq. (14), one gets
The upper bound of the value of is proven to be for any functional form of , which implies .46
Using the FGM copula, the case with in Eq. (13) reads
For the case with in Eq. (13), we adopt the FGM copula density for the focal node’s neighbors and ,
where indicates that and are connected to each other. Note that we do not take into account any pairs of the focal node’s neighbors that are not connected to each other. It is based on the assumption that the correlation is induced only by links between nodes.
where
Since it is not trivial to derive an exact form of Eq. (20), by assuming , Eq. (20) is expanded up to the first order of ,
We denote the size of the set as , meaning the number of links between neighbors of the focal node. Then, . We drop the subscript from hereafter.
where
and and are the Laplace transforms of and , respectively. One can obtain the mean-based peer pressure by means of the inverse Laplace transform of Eq. (23). We remark that the analytical result in Eq. (23) was derived for the arbitrary functional form of as well as for the arbitrary correlation coefficient , the range of which is, however, limited by the shape of [Eq. (17)]. We also mention that the approximations and assumptions made for the analysis may lead to the underestimation of the effects of correlations.
III. CASE STUDY: EXPONENTIALLY DISTRIBUTED ATTRIBUTES
A. Analytical solution
For demonstrating a solvable case, we here consider an exponential distribution of the nodal attribute , i.e.,
with , leading to
Note that from Eq. (17), implying that
Since
we obtain from Eq. (23)
We then take the inverse Laplace transform of Eq. (29) to obtain an analytical solution of . Using Eq. (28), the inverse Laplace transform of the first term on the right hand side of Eq. (29) is given as
Here, denotes the regularized Gamma function, where is the Gamma function and is the upper incomplete Gamma function.
The first term in the parentheses coupled to on the right hand side of Eq. (29) is written as
To take the inverse Laplace transform of Eq. (31), we utilize the following result for the inverse Laplace transform:29
The second term in the parentheses coupled to on the right hand side of Eq. (29) is written as
The term coupled to on the right hand side of Eq. (29) is explicitly written as
equivalently,
The first term in Eq. (37) is the same as Eq. (31) except for the sign, while for the second term, we utilize Eq. (32) to derive
Then, the inverse Laplace transform of the second term in Eq. (37) is obtained using Eq. (38) by setting and ,
where
Here, the subscript “fn” (“nn”) means between the focal node and its neighbor (between neighbors).
By the above solution, we can rigorously understand the effect of the attribute correlation between neighbors of a focal node on the mean-based peer pressure of the focal node. We first remark that the first and second terms on the right hand side of Eq. (40) are the same as the previous analytical solution obtained for a tree-like network with correlated attributes.29 Therefore, we focus on the third term on the right hand side of Eq. (40), representing the effect of the number of links between neighbors of the focal node (i.e., ) on the mean-based peer pressure of the focal node. As shown in Fig. 2(a), the value of is positive for and negligible for , while it is negative for . Therefore, from Eq. (40), the mean-based peer pressure can be either an increasing function of if and or if and , a decreasing function of if and or if and , or overall constant if and/or . These expectations will be tested against the simulation results in Subsection III B.
Analytical results of in Eq. (43) (a) and in Eq. (42) (b) as a function of for several values of . In panel (a), the results of , leading to , were omitted, and the black horizontal line denotes for guiding eyes.
We remark that turns out to vanish in the limit of large [see Fig. 2(a)], which can be explained by the fact that if the focal node has considerably many neighbors, the average attribute of those neighbors eventually approaches the population mean of attributes , irrespective of the value of or . Similarly, vanishes for large , except when , as shown in Fig. 2(b). The divergent with an increasing is unphysical but expected to be canceled out by the higher-order terms of . Therefore, in the limit of large , the peer pressure of the focal node having a general is essentially determined by the result for the uncorrelated case, i.e., in Eq. (41). The behavior of for large was obtained in Eq. (20) of Ref. 29.
Let us now discuss implications of the results: In a network with positively correlated attributes (), individuals having attributes higher (lower) than the population mean tend to have more (less) peer pressure when their neighbors are more connected to each other. In other words, the peer pressure of individuals with high (low) attributes is increased (reduced) by the connections between their neighbors. On the other hand, in a network with negatively correlated attributes (), we observe the opposite tendency that individuals having attributes higher (lower) than the population mean tend to have less (more) peer pressure when their neighbors are more connected to each other and vice versa.
B. Simulation results
In order to test the analytical solution in Eq. (40), we consider a random network model with tunable clustering, which is an extension of the configuration model independently suggested by Newman47 and Miller.48 For each node , the number of single links and the number of triangles are given, which are independently and randomly drawn from probability distributions and , respectively. Then, nodes are randomly chosen and connected to each other by single links or triangles incident to them. For the details of the model algorithm, see Refs. 47 and 48. We use the NetworkX package in Python for implementation.
We adopt geometric distributions for and as
where are parameters. The degree of the node is then obtained as , leading to the average degree with and .
For the simulation, we generate networks with , , and , implying . Nodal attributes randomly drawn from the exponential distribution in Eq. (25) with are assigned to nodes and then shuffled to get the Pearson correlation coefficient between neighboring attributes as some target value of .29 Here, we use , , and . For each , we randomly generate different configurations of attributes for each network. For each configuration, the mean-based peer pressure for each node is calculated as follows:
Then, we take the average of values for nodes with the same and as well as with the same over all configurations on all networks.
As we focus on the effect of on , we plot the simulation results of as a function of for combinations of parameter values of , and , , as shown in Fig. 3. As expected, results for uncorrelated attributes () are in good agreement with the analytical solution for . Numerical results of for correlated attributes () are qualitatively consistent with the analytical expectations for in Eq. (40). The deviations between simulation and analytical results for correlated cases are possibly due to approximations taken for deriving the analytical solution in Sec. II, such as the expansion of the equation up to the first-order terms of in Eq. (22). Calculation of higher-order terms of is left for future work. In addition, considering the fact that the results for correspond to the case with a locally tree structure, the effects of triangles or on can be delineated in terms of the difference between for and that for , which is denoted by in Fig. 3. We find that numerical differences of for from that for are in good agreement with the corresponding analytical solution, i.e., in Eq. (40). Finally, we remark that large fluctuations in numerical results for large are due to the scarce samples for large .
Simulation results of the mean-based peer pressure as a function of for values of (triangle), (circle), and (inverse triangle), which we denote by for clear presentation, and their differences from the results for , denoted by . clustered networks were generated with and average degree , and different configurations of attributes for each network were randomly generated using with in Eq. (25) for each value of . Error bars denote standard errors. For comparison, we plot the analytical solutions of and in Eq. (40) using , , and [Eq. (27)] as dotted, solid, and dashed lines, respectively.
Simulation results of the mean-based peer pressure as a function of for values of (triangle), (circle), and (inverse triangle), which we denote by for clear presentation, and their differences from the results for , denoted by . clustered networks were generated with and average degree , and different configurations of attributes for each network were randomly generated using with in Eq. (25) for each value of . Error bars denote standard errors. For comparison, we plot the analytical solutions of and in Eq. (40) using , , and [Eq. (27)] as dotted, solid, and dashed lines, respectively.
IV. CONCLUSION
The generalized friendship paradox (GFP) indicates that on average, your neighbors have higher attributes than yours.16 Despite successful demonstrations of the GFP by empirical analyses and numerical simulations, analytical, rigorous understanding of the GFP has been largely unexplored due to the lack of proper mathematical tools for modeling the correlation structure between attributes of neighboring nodes. Recently, an analytical solution for the probability of holding the GFP for an individual node in a network with correlated attributes was obtained using the copula method.29 Here, a locally tree structure of the underlying network was assumed for analytical tractability. However, the abundant triangles in most real social networks require a more general framework to incorporate the attribute correlation structure between neighbors of a focal individual in addition to the correlation between the focal individual and its neighbors. By employing a vine copula method, we have obtained the analytical solution for the GFP in clustered networks to find that the peer pressure of individuals with high (low) attributes is increased (reduced) by the connections between their neighbors. The analytical results were qualitatively supported by numerical simulations using the network models with tunable clustering. The deviation between analytical and numerical results could be accounted for by studying higher-order terms of the parameter controlling the correlation between attributes.
Our analytical approach can help us get deeper insight into how the triangular structure of social networks affects the GFP behavior or, more generally, individuals’ perception about their neighborhood. Hence, it can lead to better understanding of various related phenomena in social networks, such as perception biases.42
In our work, we have studied the case with the exponential attribute distribution and the Farlie–Gumbel–Morgenstern copula for the attribute correlation, whereas the reality must be more complex than our assumptions. Indeed, the attribute distributions often show heavy tails.16,49 In addition, more realistic attribute correlation structures can be studied by employing other families of copulas.33 Finally, we expect the vine copula method to be useful for modeling complex correlation structures between elements of complex systems.
ACKNOWLEDGMENTS
H.-H.J. acknowledges financial support by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A2C1007358) and by the Catholic University of Korea, Research Fund, 2021. Y.-H.E. acknowledges financial support by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (Grant No. 2020R1G1A1101950) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (Grant No. 2018R1A6A1A06024977).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
H.-H.J.: Conceptualization (equal); Formal Analysis (lead); Funding Acquisition (equal); Methodology (lead); Software (lead); Validation (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (equal). E.L.: Conceptualization (equal); Formal Analysis (supporting); Methodology (supporting); Software (supporting); Validation (supporting); Visualization (supporting); Writing – original draft (supporting); Writing – review & editing (equal). Y.-H.E.: Conceptualization (equal); Formal Analysis (supporting); Funding Acquisition (equal); Methodology (supporting); Software (supporting); Validation (supporting); Visualization (supporting); Writing – original draft (supporting); Writing – review & editing (equal).
DATA AVAILABILITY
Data sharing is not applicable to this article as no new data were created or analyzed in this study.