Clustering coefficients for networks with higher order interactions

We introduce a clustering coefficient for nondirected and directed hypergraphs, which we call the quad clustering coefficient. We determine the average quad clustering coefficient and its distribution in real-world hypergraphs and compare its value with those of random hypergraphs drawn from the configuration model. We find that real-world hypergraphs exhibit a nonnegligible fraction of nodes with a maximal value of the quad clustering coefficient, while we do not find such nodes in random hypergraphs. Interestingly, these highly clustered nodes can have large degrees and can be incident to hyperedges of large cardinality. Moreover, highly clustered nodes are not observed in an analysis based on the pairwise clustering coefficient of the associated projected graph that has binary interactions, and hence higher order interactions are required to identify nodes with a large quad clustering coefficient.


I. INTRODUCTION
Networks consist of nodes, representing components of a system, and relations between those nodes. When the relations are binary, they can be represented as links in a graph [1][2][3]. However, in real-world systems relations often involve three or more vertices, and these are called higher order interactions [4]. For example, a protein-protein interaction network can be seen as a network of binary relations, where two proteins are connected when they bind to each other, or it can be seen as a network with higher order interactions where a protein complex of χ proteins corresponds to a higher order interaction of cardinality χ.
Although in a first approximation real-world networks appear to be random, random networks have a smaller number of cliques than what is observed in real-world networks [1][2][3]. Indeed, the average clustering coefficient of a sparse random graph, which measures the density of triangles [5] (the smallest possible clique), decreases as 1/N with the number N of nodes in the graph. On the other hand, the average clustering coefficient of real-world networks is larger and approximately independent of N [6]. Because of this observation, more realistic models for real-world networks have been developed that are based on a hierarchical network [7] or a small-world network structure [5,8].
Refs. [9][10][11][12] define a clustering coefficient that measures the degree of local transitivity, and corresponds with quantifying clustering of nodes in the projected graph associated with a higher order network. However, the clustering coefficients of Refs. [9][10][11][12][13] do not capture the density of the shortest cycles in hypergraphs.
In this Paper, we propose an alternative observable for clustering in hypergraphs that quantifies the density of the shortest possible simple cycle. The shortest simple cycle of a hypergraph is a quad. In a bipartite representation of a hypergraph, where nodes and hyperedges represent the two parties of the bipartite graph, a quad is a closed path of length four consisting of an alternating sequence of two nodes and two hyperedges. The quad clustering coefficient is related to clustering coefficients defined previously for bipartite graphs [14][15][16], but there are also some notable distinctions. For example, as we show here, the quad clustering coefficient is more effective in quantifying the density of quads in a hypergraph than coefficients defined previously in the literature. After a comparison with these previous works, we study clustering of quads in random graphs and real-world networks.
The Paper is structured as follows. In Sec. II, we define hypergraphs and introduce the notation used in this Paper. In Sec. III, we define the quad clustering coefficient and compare this coefficient with similar coefficients studied in the context of bipartite graphs. In Sec. IV, we derive exact expressions of the ensemble average of the quad clustering coefficient in a random hypergraph model. In Sec. V, we compare the results of Sec. IV with real-world hypergraphs and discuss notable distinctions between real-world networks and random graphs. In Sec. VI, we extend the quad clustering coefficient to directed hypergraphs, and make a corresponding study for real-world networks. Conclusions are given in Sec. VII, and the Paper ends with several Appendices containing technical details on the calculations in this Paper.

II. PRELIMINARIES ON HYPERGRAPHS
A nondirected hypergraph is a triplet H = (V, W, E) consisting of a set V of N = |V| nodes, a set W of M = |W| hyperedges, and a set E of links. We denote nodes by Roman indices, i, j ∈ V, and hyperedges by Greek indices, α, β ∈ W. The set of links E consists of pairs (i, α) with i ∈ V and α ∈ W. We say that the hypergraph is simple when each pair (i, α) occurs at most once in the set E.
A simple, nondirected hypergraph can be represented by an incidence matrix I of dimensions N × M that is defined by

$$ I_{i\alpha} \equiv \begin{cases} 1 & \text{if } (i,\alpha)\in E, \\ 0 & \text{otherwise.} \end{cases} $$

Consequently, a hypergraph can also be represented as a bipartite graph whose vertices are the nodes and the hyperedges of the hypergraph. Figure 1 shows an example of a hypergraph represented as a bipartite graph and an incidence matrix. For simplicity, we often make no distinction between the hypergraph H and its representation I.
We define the network observables that we use in this Paper. The degree of node i ∈ V is defined by

$$ k_i(I) \equiv \sum_{\alpha\in W} I_{i\alpha}, $$

and we use the vector notation ⃗k(I) ≡ (k_1(I), k_2(I), ..., k_N(I)) to denote the sequence of degrees of the hypergraph I. Analogously, we define the cardinality of a hyperedge α by

$$ \chi_\alpha(I) \equiv \sum_{i\in V} I_{i\alpha}, $$

and the sequence of cardinalities is ⃗χ(I) ≡ (χ_1(I), χ_2(I), ..., χ_M(I)). As a hypergraph is a graph with higher order interactions, we also consider the degrees

$$ k_i(I;\chi) \equiv \sum_{\alpha\in W} I_{i\alpha}\,\delta_{\chi,\chi_\alpha(I)} \tag{6} $$

that determine the number of hyperedges of cardinality χ that are incident to node i. In (6), δ_{n,m}, with n, m ∈ N, represents the Kronecker delta function. We denote the number of hyperedges incident to node i, excluding those with cardinality 1, by the so-called modified degree

$$ k^{*}_i(I) \equiv \sum_{\chi=2}^{\infty} k_i(I;\chi). \tag{7} $$

Lastly, we define the neighbourhood set consisting of the nodes that are incident to a hyperedge α that is connected to the node i.
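As a concrete illustration of these observables, the following minimal Python sketch computes degrees, cardinalities, the cardinality-resolved degrees k_i(I; χ), and the modified degrees from an incidence matrix stored as a list of rows. The function names and the toy matrix are illustrative assumptions and do not come from the original analysis.

```python
from collections import Counter

def degrees(I):
    """Degree k_i: number of hyperedges incident to node i (row sums)."""
    return [sum(row) for row in I]

def cardinalities(I):
    """Cardinality chi_alpha: number of nodes in hyperedge alpha (column sums)."""
    return [sum(col) for col in zip(*I)]

def degrees_by_cardinality(I):
    """k_i(chi): for each node, a Counter mapping chi to the number of
    incident hyperedges that have cardinality chi."""
    chi = cardinalities(I)
    return [Counter(chi[a] for a, x in enumerate(row) if x) for row in I]

def modified_degrees(I):
    """Modified degree k*_i: incident hyperedges with cardinality >= 2."""
    chi = cardinalities(I)
    return [sum(1 for a, x in enumerate(row) if x and chi[a] >= 2) for row in I]

# Hypothetical toy hypergraph: 4 nodes (rows) and 3 hyperedges (columns),
# namely {0, 1, 2}, {0, 1, 3} and the singleton {0}.
I = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
]

print(degrees(I))           # [3, 2, 1, 1]
print(cardinalities(I))     # [3, 3, 1]
print(modified_degrees(I))  # [2, 2, 1, 1]
```

Node 0 has degree 3 but modified degree 2, because the singleton hyperedge cannot take part in any quad.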
When χ_α(I) = 2 for all α ∈ W, then I represents a graph. In this case, we can also represent the graph in terms of the adjacency matrix A with off-diagonal entries

$$ A_{ij} \equiv \sum_{\alpha\in W} I_{i\alpha} I_{j\alpha}, \qquad i\neq j, $$

and zero-valued diagonal entries, A_ii = 0. We say that the graph is simple when A_ij ∈ {0, 1}.
Given a hypergraph, we can define the so-called projected graph by the adjacency matrix A^proj with entries

$$ A^{\rm proj}_{ij} \equiv \Theta\Big(\sum_{\alpha\in W} I_{i\alpha} I_{j\alpha}\Big), \qquad i\neq j, \tag{10} $$

where Θ(x) is the Heaviside function, i.e., Θ(x) = 1 when x > 0 and Θ(x) = 0 when x ≤ 0. Note that this map is many-to-one (it is not injective), as a projected graph can correspond with multiple hypergraphs.
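The loss of information in the projection can be made explicit with a small Python sketch. It assumes a zero diagonal for the projected adjacency matrix, and the function name and the two example hypergraphs are hypothetical.

```python
def projected_adjacency(I):
    """A_proj[i][j] = 1 if distinct nodes i and j share at least one
    hyperedge, and 0 otherwise (the diagonal is set to zero by assumption)."""
    n = len(I)
    return [
        [1 if i != j and any(a and b for a, b in zip(I[i], I[j])) else 0
         for j in range(n)]
        for i in range(n)
    ]

# One hyperedge {0, 1, 2} of cardinality three ...
I_triplet = [[1], [1], [1]]
# ... versus three hyperedges of cardinality two: {0, 1}, {1, 2}, {0, 2}.
I_pairs = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]

print(projected_adjacency(I_triplet))  # [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(projected_adjacency(I_triplet) == projected_adjacency(I_pairs))  # True
```

Both hypergraphs project onto the same triangle, so the projected graph alone cannot tell a three-body interaction from three pairwise ones.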

III. QUAD CLUSTERING COEFFICIENT: DEFINITION AND MOTIVATION
For simple graphs with pairwise interactions determined by the adjacency matrix A, the clustering coefficient of a node with degree k_i(A) ≥ 2 is given by [5]

$$ C^{\rm pi}_i(A) \equiv \frac{T_i(A)}{t_{\max}(k_i(A))}, \tag{11} $$

where

$$ T_i(A) \equiv \frac{1}{2}\sum_{j,l\in V} A_{ij}A_{jl}A_{li} $$

is the number of triangles incident to node i, and

$$ t_{\max}(k) \equiv \frac{k(k-1)}{2} $$

is the maximum possible number of triangles incident to a node with degree k_i(A). Hence, the clustering coefficient C^pi_i(A) ∈ [0, 1] is the density of triangles incident to node i.

We aim to define a clustering coefficient valid for hypergraphs. To this aim, we represent a hypergraph as a bipartite graph, see Fig. 1. In this bipartite representation, there exist no triangles, and instead the cycle of shortest length is a quad consisting of two nodes and two hyperedges, see the motif highlighted in magenta in Fig. 1. Specifically, the quad is a simple cycle of four links forming an alternating sequence of nodes and hyperedges.
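For reference, the pairwise clustering coefficient can be sketched in a few lines of Python. The sketch assumes a simple, nondirected graph given as a symmetric 0/1 adjacency matrix, and it returns 0 for nodes with fewer than two neighbours, which is a convention chosen here rather than something stated in the text.

```python
def pairwise_clustering(A, i):
    """Number of triangles through node i divided by k_i (k_i - 1) / 2."""
    nbrs = [j for j, a in enumerate(A[i]) if a and j != i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbours
    # A triangle through i is a pair of neighbours of i that are linked.
    triangles = sum(
        1
        for x in range(k)
        for y in range(x + 1, k)
        if A[nbrs[x]][nbrs[y]]
    )
    return triangles / (k * (k - 1) / 2)

# Hypothetical graph with links 0-1, 0-2, 0-3 and 1-2.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
print(pairwise_clustering(A, 0))  # one triangle out of three possible
print(pairwise_clustering(A, 1))  # 1.0
```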

A. Definition of the quad clustering coefficient
In this Section, we define the quad clustering coefficient C^q_i(I) of a node i in a hypergraph. Let i be a node that is connected to two or more hyperedges of cardinality two or higher, i.e., k*_i(I) ≥ 2. We define the quad clustering coefficient of i by

$$ C^{q}_i(I) \equiv \frac{Q_i(I)}{q_{\max}\left(\{k_i(I;\chi)\}_{\chi\in\mathbb{N}}\right)}, \tag{13} $$

where

$$ Q_i(I) \equiv \sum_{\alpha,\beta\in W;\,\alpha<\beta} I_{i\alpha} I_{i\beta}\, q_{i\alpha\beta}(I) $$

is the number of quads incident to node i, with

$$ q_{i\alpha\beta}(I) \equiv \sum_{j\in V\setminus\{i\}} I_{j\alpha} I_{j\beta}, $$

and where

$$ q_{\max}\left(\{k_i(I;\chi)\}_{\chi\in\mathbb{N}}\right) \equiv \sum_{\alpha,\beta\in W;\,\alpha<\beta} I_{i\alpha} I_{i\beta}\left(\min\{\chi_\alpha(I),\chi_\beta(I)\}-1\right) $$

is the maximal possible number of quads that a node with degrees {k_i(I; χ)}_{χ∈N} can have. In Appendix A we show that the maximal number of quads can also be expressed by

$$ q_{\max}\left(\{k_i(I;\chi)\}_{\chi\in\mathbb{N}}\right) = \sum_{\chi=2}^{\infty}(\chi-1)\left[\binom{k_i(I;\chi)}{2} + k_i(I;\chi)\sum_{\chi'>\chi} k_i(I;\chi')\right], \tag{17} $$

which makes it evident that q_max is fully determined by the set {k_i(I; χ)}_{χ∈N} of degrees associated with node i. If k*_i(I) < 2, then we set C^q_i(I) = 0, as the number of quads incident to a node with a modified degree less than two equals zero. Note that the formula for the maximal possible number of quads, q_max, assumes that both the degree of node i and the cardinalities of the hyperedges connected to i are given. Also, note that the quad clustering coefficient is a density, i.e., C^q_i(I) ∈ [0, 1]; see the example of Fig. 2.

The quad clustering coefficient C^q_i has two useful properties. First, for fixed degrees k_i(I; χ), the quad clustering coefficient is a linear function of Q_i. Second, the proportionality factor is such that C^q_i ∈ [0, 1], and C^q_i = 1 is attained when the number of quads around the node i is maximal. As will become evident, these properties do not hold for clustering coefficients of bipartite graphs considered previously in the literature.
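The definition above can be turned into a short Python sketch. It assumes a simple hypergraph stored as a 0/1 incidence matrix (list of rows), and it takes the maximal number of quads to be the sum over pairs of incident hyperedges of min(χ_α, χ_β) − 1, i.e., the largest number of other nodes that two hyperedges containing i can share. This normalisation and the function name are assumptions made for the illustration.

```python
from itertools import combinations

def quad_clustering(I, i):
    """Quad clustering coefficient of node i (sketch under the stated assumptions)."""
    chi = [sum(col) for col in zip(*I)]
    # Hyperedges of cardinality one cannot take part in a quad.
    incident = [a for a, x in enumerate(I[i]) if x and chi[a] >= 2]
    if len(incident) < 2:
        return 0.0
    quads = 0
    max_quads = 0
    for a, b in combinations(incident, 2):
        # Nodes other than i that belong to both hyperedges close a quad.
        quads += sum(1 for j, row in enumerate(I) if j != i and row[a] and row[b])
        max_quads += min(chi[a], chi[b]) - 1
    return quads / max_quads

# Hyperedges {0, 1, 2} and {0, 1, 3}: node 0 sees one quad out of two possible.
I_partial = [[1, 1], [1, 1], [1, 0], [0, 1]]
# Nested hyperedges {0, 1, 2} and {0, 1}: the single possible quad exists.
I_nested = [[1, 1], [1, 1], [1, 0]]
# A simple graph (triangle with links {0, 1}, {1, 2}, {0, 2}) has no quads.
I_graph = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]

print(quad_clustering(I_partial, 0))  # 0.5
print(quad_clustering(I_nested, 0))   # 1.0
print(quad_clustering(I_graph, 0))    # 0.0
```

The last example shows that a simple graph without multiple edges always yields a vanishing quad clustering coefficient.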
Note that quads quantify the multitude of ways in which neighbouring nodes interact with each other, and in the absence of multiple edges, higher order interactions are needed to have multiple interaction paths. Indeed, in the case of simple graphs (i.e., all hyperedges have cardinality 2 and for each pair of nodes there is at most one hyperedge connecting them) the quad clustering coefficient is zero, as the only way to create multiple interactions between two nodes is through multiple edges, which are absent when the graph is simple.
In the next two Subsections, we compare the quad clustering coefficient with two other clustering coefficients for bipartite graphs, namely, Lind's clustering coefficient [14] in Sec. III B and Zhang's clustering coefficient [15] in Sec. III C. As we will see, Lind's and Zhang's clustering coefficients are not functions of Q_i, except when k_i = 2, and in the latter case Lind's and Zhang's clustering coefficients are nonlinear functions in Q_i. Other clustering coefficients have been defined in Refs. [17][18][19][20][21], but since these are significantly different from the quad clustering coefficient we do not discuss them here. Specifically, the clustering coefficients in Refs. 17,18 apply to nodes in standard networks without higher order interactions, the clustering coefficient in Ref. 19 has a denominator that does not depend on the cardinalities of the hyperedges incident to the considered node, and the coefficients in Refs. 20,21 do not count the number of quads.

B. Lind's clustering coefficient
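In Ref. 14, Lind, González, and Herrmann define a clustering coefficient by

$$ C^{\rm Lind}_i(I) \equiv \frac{Q_i(I)}{q^{\rm Lind}_{i,\max}(I)}, \tag{18} $$

where

$$ q^{\rm Lind}_{i,\max}(I) \equiv \sum_{\alpha,\beta\in W;\,\alpha<\beta}\left[\left(\chi_\alpha(I)-\eta_{i\alpha\beta}(I)\right)\left(\chi_\beta(I)-\eta_{i\alpha\beta}(I)\right) + q_{i\alpha\beta}(I)\right] I_{i\alpha} I_{i\beta} \tag{19} $$

with

$$ \eta_{i\alpha\beta}(I) \equiv 1 + q_{i\alpha\beta}(I), $$

and where q_iαβ(I) is the number of nodes other than i that are incident to both α and β. For simplicity we call C^Lind_i Lind's clustering coefficient.

The difference between the formulas for C^q_i(I) and C^Lind_i(I), given by Eqs. (13) and (18), respectively, is in the definition of the maximal possible number of quads. For Lind's clustering coefficient, q^Lind_{i,max} is the sum of the existing quads Q_i and the number of ways (χ_α(I) − η_iαβ(I))(χ_β(I) − η_iαβ(I)) that the remaining edges can be combined to form quads. In general, the number q^Lind_{i,max} overcounts significantly the number of possible quads. For example, in Fig. 2 q^Lind_{i,max} = 3, even though q_max = 2.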
Another notable difference between the quad clustering coefficient and Lind's clustering coefficient is that the former is a linear function of Q_i, while the latter is, in general, not a function of Q_i. An exception is when k_i = 2, in which case Lind's clustering coefficient is a function of Q_i, but this function is nonlinear. This feature is illustrated in the upper panel of Fig. 3, which plots Lind's clustering coefficient as a function of the quad clustering coefficient for a node of degree 2 that is connected to a hyperedge with cardinality χ_α and a hyperedge with cardinality χ_β. The solid lines in Fig. 3 are obtained by taking the limit χ_α → ∞ with the ratio r = χ_β/χ_α > 1 fixed, yielding to leading order the function

$$ C^{\rm Lind}(q) = \frac{q}{q + \chi_\alpha\,(1-q)(r-q)}, \tag{21} $$

where q = Q_i/(χ_α − 1) ∈ [0, 1] (see Appendix B). We observe a strong nonlinearity in C^Lind(q) for large values of χ_α. Indeed, as shown in Fig. 3(a), for q below one and large enough values of χ_α, it holds that C^Lind(q) ≈ 0, and for q = 1 it holds that C^Lind(q) = 1, which can be recovered from Eq. (21) by taking the limit χ_α → ∞.
For nodes with a degree k_i > 2, Lind's clustering coefficient is not a function of Q_i, contrary to the quad clustering coefficient, as q^Lind_{i,max} depends on all q_iαβ, with α, β ∈ W. For the simplest case of k_i = 3, we illustrate this feature in the lower panel of Fig. 3. The circles and squares denote C^Lind_i for two different assignments for q_iαβ, q_iαγ, and q_iβγ, as detailed in Appendix C. As Fig. 3(b) shows, the two curves for C^Lind_i are different for different prescriptions on the q's, indicating that C^Lind_i is not a function of Q_i.
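The nonlinearity of Lind's coefficient at k_i = 2 can be checked numerically with the following Python sketch. It assumes the form of Lind's coefficient for bipartite graphs in which each pair of hyperedges (α, β) incident to i contributes q to the numerator and (χ_α − η)(χ_β − η) + q to the denominator, with η = 1 + q. The helper `two_hyperedge_motif` is a hypothetical construction used only for this illustration.

```python
from itertools import combinations

def shared_nodes(I, i, a, b):
    """q_{i,a,b}: nodes other than i that belong to both hyperedges a and b."""
    return sum(1 for j, row in enumerate(I) if j != i and row[a] and row[b])

def lind_clustering(I, i):
    """Lind-type clustering coefficient of node i (sketch, assumed form)."""
    chi = [sum(col) for col in zip(*I)]
    incident = [a for a, x in enumerate(I[i]) if x]
    num = den = 0
    for a, b in combinations(incident, 2):
        q = shared_nodes(I, i, a, b)
        eta = 1 + q
        num += q
        den += (chi[a] - eta) * (chi[b] - eta) + q
    return num / den if den else 0.0

def two_hyperedge_motif(chi_a, chi_b, Q):
    """Node 0 belongs to two hyperedges with cardinalities chi_a and chi_b
    that share Q nodes besides node 0 (requires Q <= min(chi_a, chi_b) - 1)."""
    rows = [[1, 1]]                     # node 0
    rows += [[1, 1]] * Q                # nodes shared by both hyperedges
    rows += [[1, 0]] * (chi_a - 1 - Q)  # nodes only in the first hyperedge
    rows += [[0, 1]] * (chi_b - 1 - Q)  # nodes only in the second hyperedge
    return rows

# Quad clustering coefficient Q / (chi_a - 1) next to Lind's coefficient.
for Q in (0, 3, 6, 8, 9):
    I = two_hyperedge_motif(10, 20, Q)
    print(Q / 9, round(lind_clustering(I, 0), 3))
```

For χ_α = 10 and χ_β = 20 the sketch gives 8/19 ≈ 0.42 when eight of the nine possible quads exist and jumps to 1 only when all nine exist, whereas the quad clustering coefficient grows linearly as Q_i/9.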

C. Zhang's clustering coefficient
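In Ref. 15, Zhang et al. introduce the clustering coefficient

$$ C^{\rm Zhang}_i(I) \equiv \frac{Q_i(I)}{q^{\rm Zhang}_{i,\max}(I)}, $$

where

$$ q^{\rm Zhang}_{i,\max}(I) \equiv \sum_{\alpha,\beta\in W;\,\alpha<\beta}\left[\left(\chi_\alpha(I)-\eta_{i\alpha\beta}(I)\right) + \left(\chi_\beta(I)-\eta_{i\alpha\beta}(I)\right) + q_{i\alpha\beta}(I)\right] I_{i\alpha} I_{i\beta} \tag{23} $$

is the maximal possible number of quads. We call C^Zhang_i Zhang's clustering coefficient. Note that Zhang's clustering coefficient can also be written as [16]

$$ C^{\rm Zhang}_i(I) = \frac{\sum_{\alpha<\beta} I_{i\alpha} I_{i\beta}\, q_{i\alpha\beta}(I)}{\sum_{\alpha<\beta} I_{i\alpha} I_{i\beta}\left(\chi_\alpha(I)+\chi_\beta(I)-2-q_{i\alpha\beta}(I)\right)}, $$

i.e., as the ratio between the number of nodes shared by pairs of hyperedges and the number of nodes in their union (excluding node i), which is known as the Jaccard similarity coefficient [22].

Comparing [...] C(I) = 2/3 for (c). Like Lind's clustering coefficient, for nodes with degree k_i = 2 Zhang's clustering coefficient is a nonlinear function of Q_i. Indeed, taking the limit χ_α → ∞ while keeping r = χ_β/χ_α > 1 fixed, we get

$$ C^{\rm Zhang}(q) = \frac{q}{1+r-q} \tag{25} $$

for q ∈ [0, 1] (see Appendix B). We illustrate this function in the upper panel of Fig. 3. Note that Zhang's clustering coefficient is not normalised, as C^Zhang(1) = 1/r, and more generally C^Zhang(q) ∈ [0, 1/r]. For nodes with degrees k_i > 2, C^Zhang_i is not a function of Q_i, as q^Zhang_{i,max} depends on q_iαβ for all α, β ∈ W.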
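The lack of normalisation of Zhang's coefficient can be illustrated with a Python sketch analogous to the one for Lind's coefficient. It assumes that each pair of hyperedges (α, β) incident to i contributes q to the numerator and (χ_α − η) + (χ_β − η) + q to the denominator, with η = 1 + q. The helper `two_hyperedge_motif` is again a hypothetical construction.

```python
from itertools import combinations

def zhang_clustering(I, i):
    """Zhang-type clustering coefficient of node i (sketch, assumed form)."""
    chi = [sum(col) for col in zip(*I)]
    incident = [a for a, x in enumerate(I[i]) if x]
    num = den = 0
    for a, b in combinations(incident, 2):
        # Nodes other than i shared by the hyperedges a and b.
        q = sum(1 for j, row in enumerate(I) if j != i and row[a] and row[b])
        eta = 1 + q
        num += q
        den += (chi[a] - eta) + (chi[b] - eta) + q
    return num / den if den else 0.0

def two_hyperedge_motif(chi_a, chi_b, Q):
    """Node 0 belongs to two hyperedges with cardinalities chi_a and chi_b
    that share Q nodes besides node 0 (requires Q <= min(chi_a, chi_b) - 1)."""
    rows = [[1, 1]]
    rows += [[1, 1]] * Q
    rows += [[1, 0]] * (chi_a - 1 - Q)
    rows += [[0, 1]] * (chi_b - 1 - Q)
    return rows

# Maximal overlap (Q = 9) for equal and for unequal cardinalities.
print(zhang_clustering(two_hyperedge_motif(10, 10, 9), 0))  # 1.0
print(zhang_clustering(two_hyperedge_motif(10, 20, 9), 0))  # 9/19, below one
```

With χ_α = 10 and χ_β = 20 the coefficient saturates at 9/19 = (χ_α − 1)/(χ_β − 1), which approaches 1/r for large cardinalities, even though the number of quads is maximal.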

IV. AVERAGE QUAD CLUSTERING COEFFICIENT FOR RANDOM HYPERGRAPHS
In this Section, we determine the average quad clustering coefficients for random hypergraphs. First, in Sec. IV A we derive the ensemble averaged clustering coefficient in random hypergraph models with regular cardinalities, i.e., χ_α(I) = χ for all α ∈ W. For these models we obtain compact expressions for the ensemble averaged quad clustering coefficient in terms of the model parameters. Subsequently, in Sec. IV B we deal with models that are biregular in the cardinalities, i.e., χ_α(I) ∈ {χ_1, χ_2}, and, as will become evident, the calculations in biregular models are significantly more difficult than those in models with regular cardinalities.

A. Regular cardinalities
We consider three random hypergraph models with regular cardinalities, i.e., for which χ_α(I) = χ for all α ∈ W. The three models are distinguished by the fluctuations in their degrees k_i(I). In the χ-regular ensemble, considered in Sec. IV A 1, the degrees are unconstrained; in the (c, χ)-regular ensemble, considered in Sec. IV A 2, the degrees are regular, i.e., k_i(I) = c for all i ∈ V; lastly, in the (⃗k, χ)-regular ensemble, considered in Sec. IV A 3, the degrees are prescribed by the sequence ⃗k, i.e., k_i(I) = k_i for all i ∈ V.

1. χ-regular ensemble
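In the χ-regular ensemble the probability of drawing a hypergraph with incidence matrix I ∈ {0, 1}^{N×M} is given by

$$ P_\chi(I) \equiv \frac{1}{\mathcal{N}_\chi}\prod_{\alpha\in W}\delta_{\chi_\alpha(I),\chi}, $$

with the normalisation constant N_χ as derived in Appendix D 1.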
The average quad clustering coefficient

$$ \langle C^{q}(I)\rangle \equiv \sum_{I} P_\chi(I)\, C^{q}(I), $$

where ∑_I is a sum over all possible incidence matrices I ∈ {0, 1}^{N×M}, is given by (see Appendix D 1 for a derivation)

FIG. 3. Lind's and Zhang's clustering coefficients (markers) are plotted as a function of the number of quads Q_i incident to a node i (the scaling factor on the x-axis is chosen to make the variable's range [0, 1]). Upper panel: node i has degree k_i = 2 and is connected to two hyperedges α and β, with cardinalities χ_α = 10 and χ_β as indicated in the legend. Lines denote the functions given by Eqs. (21) and (25) for C^Lind and C^Zhang, respectively. Lower panel: node i has degree k_i = 3 and interacts with hyperedges α, β and γ, of cardinalities χ_α = 15, χ_β = 20, and χ_γ = 25, respectively. Circles and squares represent values of C^Lind for different values of q_iαβ, q_iαγ, q_iβγ, obtained from two different prescriptions, i.e., uniform and biased, as explained in Appendix C. In the lower panel, lines are a guide to the eye.
Taking the limit of large N while keeping the mean node degree fixed, and thus finite, we obtain Eq. (31). Note that the average quad clustering coefficient decreases as 1/N with the order of the hypergraph, implying that the density of quads vanishes in the limit of infinitely large, sparse hypergraphs. For large values of χ, but still χ ≪ N, we get the simple formula

$$ \langle C^{q}(I)\rangle \approx \frac{\chi}{N}, \tag{32} $$

stating that the average density of quads equals the cardinality χ divided by the number N of nodes.
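The 1/N scaling can be checked with a small Monte Carlo sketch in Python. It assumes that the χ-regular ensemble is sampled by drawing each hyperedge as a uniformly random set of χ distinct nodes, and that the maximal number of quads is the sum over pairs of incident hyperedges of min(χ_α, χ_β) − 1. Under these assumptions a node with at least two hyperedges has an expected coefficient of (χ − 1)/(N − 1), since two hyperedges containing node i share on average (χ − 1)²/(N − 1) other nodes. The parameter values and function names are illustrative choices.

```python
import random
from itertools import combinations

def sample_chi_regular(N, M, chi, rng):
    """M hyperedges, each a uniformly random set of chi distinct nodes."""
    return [set(rng.sample(range(N), chi)) for _ in range(M)]

def mean_quad_clustering(N, hyperedges):
    """Mean over all nodes of the quad clustering coefficient (sketch)."""
    incident = [[] for _ in range(N)]
    for edge in hyperedges:
        for i in edge:
            incident[i].append(edge)
    total = 0.0
    for i in range(N):
        edges = [e for e in incident[i] if len(e) >= 2]
        if len(edges) < 2:
            continue  # such nodes contribute a coefficient equal to zero
        pairs = list(combinations(edges, 2))
        # Both hyperedges contain i, so subtract one from the overlap.
        quads = sum(len(a & b) - 1 for a, b in pairs)
        max_quads = sum(min(len(a), len(b)) - 1 for a, b in pairs)
        total += quads / max_quads
    return total / N

rng = random.Random(0)
N, M, chi = 200, 200, 5
runs = [mean_quad_clustering(N, sample_chi_regular(N, M, chi, rng))
        for _ in range(20)]
estimate = sum(runs) / len(runs)
print(estimate, (chi - 1) / (N - 1))
```

The estimate should lie slightly below (χ − 1)/(N − 1), because nodes with fewer than two hyperedges contribute zero to the mean.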

2. (c, χ)-regular ensemble
In the (c, χ)-regular ensemble the probability assigned to a hypergraph with incidence matrix I is defined by

$$ P_{c,\chi}(I) \equiv \frac{1}{\mathcal{N}_{c,\chi}}\prod_{i\in V}\delta_{k_i(I),c}\prod_{\alpha\in W}\delta_{\chi_\alpha(I),\chi}, $$

where N_{c,χ} is the normalisation constant. In Appendix D 2 we derive the average quad clustering coefficient for this model in the limit N ≫ 1 with fixed values of c and χ, and with M = (c/χ)N. Neglecting subleading order corrections, we find for the average quad clustering coefficient the expression Eq. (34). In the limit of large values of c and χ, we recover Eq. (32), indicating that in this limit the average clustering coefficient is independent of the degree distribution. However, at finite c and χ the average clustering coefficient depends on the degree fluctuations, as (34) differs from (31).

3. (⃗k, χ)-regular ensemble
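In the (⃗k, χ)-regular ensemble the probability assigned to incidence matrices I is given by

$$ P_{\vec{k},\chi}(I) \equiv \frac{1}{\mathcal{N}_{\vec{k},\chi}}\prod_{i\in V}\delta_{k_i(I),k_i}\prod_{\alpha\in W}\delta_{\chi_\alpha(I),\chi}, $$

where N_{⃗k,χ} is the normalisation constant. Neglecting subleading order terms, the average quad clustering coefficient is given by Eq. (36) (see Appendix D 2), where the overline denotes the mean value with respect to the degree distribution p_deg(k). Specialising Eq. (36) to regular degrees, k_i = c, and to the degree distribution of the χ-regular ensemble, we find, respectively, Eqs. (34) and (31). Hence, the formula (36) generalises Eqs. (34) and (31).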
Notice that the first term in Eq. (36) diverges when the degree distribution p_deg(k) has a diverging second moment, indicating that the average clustering coefficient of random hypergraphs with diverging second moments decreases slower than 1/N as a function of N. This result is compatible with what is known for random graphs, as the average number of cycles of finite length diverges with the second moment of the degree distribution (see Equation (9) in Ref. 23).

B. Biregular cardinalities
Having studied in detail the case with regular cardinalities, including the effect of degree fluctuations, we now analyze how fluctuations in the cardinality affect the average quad clustering coefficient. We focus on the simplest case of biregular ensembles, where M_1 hyperedges have cardinality χ_1 and the remaining M − M_1 have cardinality χ_2. In this case, the probability of incidence matrices I ∈ {0, 1}^{N×M} takes the form

$$ P_{\chi_1,\chi_2}(I) \equiv \frac{1}{\mathcal{N}_{\chi_1,\chi_2}}\prod_{\alpha=1}^{M_1}\delta_{\chi_\alpha(I),\chi_1}\prod_{\alpha=M_1+1}^{M}\delta_{\chi_\alpha(I),\chi_2}, $$

where as before N_{χ_1,χ_2} is the normalisation constant.
In Appendix E, we show that the average clustering coefficient, defined as the average of C^q(I) over this ensemble, is given by Eq. (41), where M_2 = M − M_1 and where we introduced the functions of Eqs. (42) and (43). We have not been able to simplify the expressions (41)-(43) further, not even in the sparse limit. Hence, although models with degree fluctuations are analytically tractable, as shown in Sec. IV A 3, it is significantly more difficult to deal with models with heterogeneous cardinalities.
We understand each term in Eq. (41) as follows: the first and last terms consider quads consisting of two hyperedges with the same cardinality, and the middle term considers the case where the two hyperedges have different cardinalities.

V. QUAD CLUSTERING COEFFICIENT IN REAL WORLD HYPERGRAPHS
Having established a theoretical understanding of quad clustering coefficients in random hypergraphs, we now focus our attention on the quad clustering coefficient in real-world hypergraphs. To this aim, we build hypergraphs out of six datasets, which are related to Github, Youtube, NDC-substances, food recipes, Walmart, and crime involvement. As detailed in Table I, the real-world hypergraphs have diverse topologies: their order ranges from N ≈ 10^3 to N ≈ 10^5, their mean degree ranges from k̄ ≈ 3 to k̄ ≈ 60, and their mean cardinality ranges from χ̄ ≈ 3 to χ̄ ≈ 10 (see Appendix F for more detailed information about these data sets).

A. Mean quad clustering coefficient
The mean quad clustering coefficient

$$ C^{q}(I) \equiv \frac{1}{N}\sum_{i\in V} C^{q}_i(I) $$

is a real number C^q(I) ∈ [0, 1] that quantifies the density of quads in the hypergraph represented by I. In Figure 4, we compare the mean clustering coefficients C^q(I_real) for the six canonical hypergraphs under study, represented by I_real, with those of the configuration model [24] with a prescribed degree sequence ⃗k(I_real) and cardinality sequence ⃗χ(I_real) (see Appendix G for a description of the algorithm used to generate hypergraphs from the configuration model). The results in Fig. 4 reveal that the quad clustering coefficients of real-world networks are significantly larger than the average clustering coefficient ⟨C^q(I)⟩ of the corresponding configuration models (⟨C^q(I)⟩ ≈ 0.10 C^q(I_real), see Table I). Hence, the density of quads in real-world networks is higher than what is expected in the configuration model, similarly to previous findings for clustering coefficients in networks with pairwise interactions, see, e.g., Ref. 2. Similar conclusions can be drawn from comparing Lind's and Zhang's clustering coefficients between real-world and random networks (see Table I). However, the corresponding values of Lind's and Zhang's clustering coefficients are one order of magnitude smaller than the quad clustering coefficient, consistent with the behaviour of the clustering coefficients as a function of the number of quads as shown in Fig. 3 and discussed in Sec. III.

FIG. 4. Mean quad clustering coefficients C^q(I_real) (unfilled circles) in real-world hypergraphs, and average mean clustering coefficients ⟨C^q(I)⟩ (filled squares) in random hypergraphs with prescribed degree and cardinality sequences ⃗k(I_real) and ⃗χ(I_real). Estimates of ⟨C^q(I)⟩ are based on 100 hypergraph realisations, and error bars show the error on the mean, whenever they are larger than the marker size. The dashed line represents the prediction Eq. (31) for χ-regular hypergraphs with χ = 5.9 and c = 20/χ, which are, respectively, the average cardinality and mean degree of all hyperedges and nodes in all real-world datasets.
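The algorithm of Appendix G is not reproduced in this section. As a rough illustration of how a configuration-model hypergraph with prescribed degree and cardinality sequences can be generated, the Python sketch below uses generic stub matching with rejection: node stubs are paired with randomly shuffled hyperedge stubs, and the whole matching is discarded when a (node, hyperedge) pair is repeated. This is an assumed stand-in, not necessarily the procedure used for Fig. 4, and the function name and the toy sequences are hypothetical.

```python
import random

def configuration_hypergraph(k, chi, rng, max_tries=1000):
    """Return a set of links (i, alpha) of a simple hypergraph in which
    node i has degree k[i] and hyperedge alpha has cardinality chi[alpha]."""
    if sum(k) != sum(chi):
        raise ValueError("degree and cardinality sequences must have equal sums")
    node_stubs = [i for i, ki in enumerate(k) for _ in range(ki)]
    edge_stubs = [a for a, ca in enumerate(chi) for _ in range(ca)]
    for _ in range(max_tries):
        rng.shuffle(edge_stubs)
        links = set(zip(node_stubs, edge_stubs))
        if len(links) == len(node_stubs):  # no repeated (node, hyperedge) pair
            return links
    raise RuntimeError("no simple hypergraph found; increase max_tries")

rng = random.Random(1)
k, chi = [2, 2, 1, 1], [3, 3]
links = configuration_hypergraph(k, chi, rng)
print(sorted(links))
```

Restarting from scratch after a repeated link keeps the sampling uniform over simple hypergraphs with the given sequences, at the price of many rejections when the sequences are heavy-tailed.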

B. Distribution of quad clustering coefficients
As real-world hypergraphs exhibit a larger number of quads than expected from random models, we investigate the fluctuations in the quad clustering coefficient. We quantify the fluctuations of the quad clustering coefficient by its distribution

$$ P(C^{q}; I) \equiv \frac{1}{N}\sum_{i\in V}\delta\left(C^{q}-C^{q}_i(I)\right). $$

Figure 5 shows the distribution P(C^q; I_real) for the six real-world hypergraphs under study. We highlight a few noteworthy features of these plots. Firstly, a significant proportion of nodes possess a near-zero quad clustering coefficient, viz., between 50% and 70% in the Hypergraphs (a)-(d) and over 90% in the Hypergraphs (e)-(f). Secondly, for the remaining nodes the distribution of C^q_i is broad. This latter feature stands in contrast with the average distribution ⟨P(C^q; I)⟩ in the corresponding configuration model with prescribed degree sequence ⃗k(I_real) and cardinality sequence ⃗χ(I_real), generated by a standard stub-joining algorithm [25], also plotted in Fig. 5. Thirdly, the hypergraphs in Fig. 5 exhibit a peak at C^q ≈ 1, which is most clearly visible in the NDC-substances hypergraph (a) and the Github hypergraph (d).
As discussed in Sec. III, quad clustering can also be quantified with the Lind and Zhang clustering coefficients. As shown in Fig. 6, the peak at C^q ≈ 1 also appears when quantifying quad clustering with the Lind clustering coefficient or the Zhang clustering coefficient. However, the distributions P(C^Lind; I_real) and P(C^Zhang; I_real) have a larger peak at the origin, while the number of nodes with an intermediate value (not zero or one) is smaller. This result is consistent with the nonlinearity observed in Fig. 3. Indeed, since the C^Lind and C^Zhang clustering coefficients are nonlinear, nodes accumulate at values C^Lind ≈ 0, 1 and C^Zhang ≈ 0, 1, and hence these clustering coefficients are less effective at discriminating nodes based on their density of quads.

FIG. 5 (caption fragment). [...] (10). Note that the distributions P(C^q; I_real) show a peak at C^q = 1, while the distributions P(C^pi; A^proj) [...] Table I.
Importantly, disregarding for now Hypergraph (f), to which we return later, the peak at C^q(I) ≈ 1 is not captured by the pairwise clustering coefficient evaluated on the corresponding projected graphs represented by A^proj. Indeed, as shown in the inset of Figure 5, the distribution P(C^pi; A^proj) of pairwise clustering coefficients, where A^proj is the adjacency matrix of the projected graph as defined in (10), does not exhibit a peak at large values. Hence quad clustering captures a characteristic that is distinct to hypergraphs and that is not captured by pairwise clustering coefficients.
As shown in Fig. 5, Hypergraph (f) exhibits clustering properties that are different from those of the other networks. Specifically, Hypergraph (f) exhibits a peak at 1 in the distribution of pairwise clustering coefficients of the projected graph, and does not have the peak at 1 observed in the distribution of quad clustering coefficients. To understand this peculiar property of Hypergraph (f), we examine the network motifs formed by the nodes i for which it holds that both C^q_i < 0.5 and C^pi_i > 0.8 (a total of 38,520 nodes out of the 88,860 satisfy this condition). We have found two types of structures among such nodes. In particular, 75% of the nodes have ∑_{χ=3}^{∞} k_i(I; χ) = 1, and hence their quad clustering coefficient [...]

In this Subsection, we make a study of the topological properties of nodes that have a large quad clustering coefficient. First, we address the correlations between C^q_i(I_real) and the modified degree k*_i(I_real), as defined in Eq. (7). We consider the modified degree k*_i instead of the degree k_i, as by default hyperedges with unit cardinality do not contribute to the quad clustering coefficient. In Fig. 8 we present scatter plots containing all the pairs (k*_i(I_real), C^q_i(I_real)) for the six canonical real-world hypergraphs that we consider in this Paper, one marker for each node in the hypergraph. The red dashed line is a fit to the scaling relation C^q ∼ (k*)^{−β} and it shows the decreasing trend of the quad clustering with the modified degrees. This demonstrates that highly clustered nodes have on average lower degrees than nodes with small quad clustering coefficients. Nevertheless, up to modified degrees k*_i ≈ 100 there exist nodes with C^q_i(I) ≈ 1, and hence real-world hypergraphs contain highly clustered nodes that have large degrees. This result is surprising, as the denominator of the quad clustering coefficient increases fast as a function of k_i, see Eqs. (13) and (17), and hence one may have expected that the highly clustered nodes with C^q_i(I) ≈ 1 consist exclusively of nodes with small modified degrees.
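A fit to the scaling relation C^q ∼ (k*)^{−β} can be sketched as an ordinary least-squares regression in log-log space. The text does not specify the fitting procedure, so the method below, the exclusion of nodes with C^q = 0 (whose logarithm is undefined), and the synthetic data are all assumptions made for illustration.

```python
import math

def fit_power_law(k_star, c_q):
    """Least-squares fit of log C = log a - beta * log k; returns (beta, a).
    Pairs with a non-positive entry are skipped, as their log is undefined."""
    pts = [(math.log(k), math.log(c))
           for k, c in zip(k_star, c_q) if k > 0 and c > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return -slope, math.exp(mean_y - slope * mean_x)

# Synthetic data that follow C = 0.5 * k^(-0.25) exactly.
k_star = [2, 4, 8, 16, 32, 64]
c_q = [0.5 * k ** -0.25 for k in k_star]
beta, prefactor = fit_power_law(k_star, c_q)
print(beta, prefactor)  # close to 0.25 and 0.5
```

On real data the many nodes with a vanishing coefficient would need separate treatment, for example by fitting averages of C^q within bins of k*.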
This result is confirmed by Fig. 9, which compares the distribution of the modified degrees k*_i of nodes sampled uniformly from the set V of hypergraph nodes with the distribution of the modified degrees of nodes that have a clustering coefficient equal to one. As expected, the modified degrees of highly clustered nodes with C^q_i = 1 are concentrated on small values. Surprisingly, however, in the real-world hypergraphs (a), (d) and (f), highly clustered nodes can have modified degrees as large as k*_i ≈ 100. As an illustration, for the NDC-substances network, Fig. 9(a), the maximum value of k*_i amongst nodes with C^q = 1 is k*_i = 192. This is unexpectedly large, as it implies that the 192 hyperedges connected to node i form a fully clustered configuration.
To further describe the topological properties of the neighbourhood sets of highly clustered nodes, we analyse the cardinalities of the hyperedges that are incident to a highly clustered node. We expect that strongly clustered nodes (C^q_i(I) ≈ 1) have neighbouring hyperedges with small cardinalities, as the denominator in the quad clustering coefficient increases fast as a function of the cardinalities of the neighbouring hyperedges. To quantify fluctuations in the cardinalities of hyperedges, we define the joint distribution of degrees and cardinalities of randomly selected links connecting nodes with hyperedges. Its marginal distribution W*(χ; I) quantifies the fluctuations of the cardinalities of hyperedges at the end point of a randomly selected link, excluding hyperedges with cardinality one. In Fig. 10, we compare the distribution W*(χ; I) with the related distribution W*(χ|C^q = 1; I) defined on nodes with a quad clustering coefficient equal to one. The latter distribution is defined analogously, with the randomly selected links restricted to those incident to nodes with C^q_i(I) = 1. Interestingly, Fig. 10 reveals that nodes with C^q_i(I) = 1 can be incident to hyperedges with a cardinality as large as χ ≈ 2000. This highlights that the neighbourhood sets of highly clustered nodes can have a large number of quads, as they contain hyperedges with large cardinality. Comparing Figs. 9 and 10, we observe that the support of the distribution W*(χ; I_real) is in most cases equal to the support of W*(χ|C^q = 1), while the support of the distribution P(k*; I_real) is significantly larger than the support of P(k*|C^q = 1). Hence, highly clustered neighbourhoods are more biased towards low degree nodes than towards hyperedges of low cardinality, which is consistent with the formula (17) for the maximal number of quads a node can have.

VI. QUAD CLUSTERING COEFFICIENT FOR DIRECTED HYPERGRAPHS
In this Section we define a quad clustering coefficient for directed hypergraphs and we analyse its properties in real-world directed hypergraphs.

A. Preliminaries on directed hypergraphs
A directed hypergraph is a quadruplet H_dir = (V, W, E_in, E_out) consisting of a set V of N = |V| nodes, a set W of M = |W| hyperedges, and the sets E_in ⊂ V × W and E_out ⊂ V × W of directed inlinks and outlinks, respectively. Both inlinks and outlinks consist of pairs (i, α) with i ∈ V and α ∈ W, albeit the former represents links directed from a hyperedge to a vertex, while the latter represents links directed from a vertex to a hyperedge. We represent simple, directed hypergraphs with a pair of incidence matrices I↔ ≡ (I→, I←) defined by

$$ I^{\rightarrow}_{i\alpha} \equiv \begin{cases} 1 & \text{if } (i,\alpha)\in E_{\rm out}, \\ 0 & \text{otherwise,} \end{cases} $$

and

$$ I^{\leftarrow}_{i\alpha} \equiv \begin{cases} 1 & \text{if } (i,\alpha)\in E_{\rm in}, \\ 0 & \text{otherwise.} \end{cases} $$

Figure 11 illustrates different ways of representing hypergraphs with an example.

FIG. 8. Scatter plots of the pairs (k*_i(I_real), C^q_i(I_real)) of all nodes i ∈ V(I_real) in the canonical, real-world hypergraphs. The lines are a fit to C^q ∼ (k*)^{−β} with the fitted values for β and their 95% confidence intervals equal to 0.17 ± 0.01 (a), 0.15 ± 0.01 (b), 0.06 ± 0.01 (c), 0.24 ± 0.01 (d), 1.2 ± 0.2 (e), and 0.72 ± 0.02 (f). Panels represent different real-world hypergraphs, as explained in the caption of Fig. 5.
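The out-degree and in-degree of node i ∈ V are defined by

$$ k^{\rm out}_i(I^{\rightarrow}) \equiv \sum_{\alpha\in W} I^{\rightarrow}_{i\alpha}, \qquad k^{\rm in}_i(I^{\leftarrow}) \equiv \sum_{\alpha\in W} I^{\leftarrow}_{i\alpha}, $$

and we also use the notations ⃗k^out(I→) and ⃗k^in(I←) for their sequences. Analogously, we define the out-cardinality and in-cardinality of hyperedge α ∈ W by

$$ \chi^{\rm out}_\alpha(I^{\leftarrow}) \equiv \sum_{i\in V} I^{\leftarrow}_{i\alpha}, \qquad \chi^{\rm in}_\alpha(I^{\rightarrow}) \equiv \sum_{i\in V} I^{\rightarrow}_{i\alpha}, $$

and we also use the corresponding sequences ⃗χ^in(I→) and ⃗χ^out(I←). In addition, we define the modified out- and in-cardinalities, χ^out_{α,i} and χ^in_{α,i}, excluding the stubs used to connect to a given node i. Lastly, we define the set ∂_i(I↔) of hyperedges incident to the node i as the union of the two hyperedge neighbourhood sets ∂^out_i(I→) and ∂^in_i(I←), where ∂^out_i(I→) ≡ {α ∈ W : I→_iα = 1} and ∂^in_i(I←) ≡ {α ∈ W : I←_iα = 1}.

To each directed hypergraph we can associate a projected, directed graph of order N, such that there exists a directed edge that points from i to j in the projected graph whenever there exists a hyperedge α ∈ W such that (i, α) ∈ E_out and (j, α) ∈ E_in. The adjacency matrix of the projected graph is given by

$$ A^{\rm proj}_{ij} \equiv \Theta\Big(\sum_{\alpha\in W} I^{\rightarrow}_{i\alpha} I^{\leftarrow}_{j\alpha}\Big) $$

for all i, j ∈ V, where Θ(x) = 0 if x ≤ 0 and Θ(x) = 1 for x > 0. If A^proj_ii = 0 for all i ∈ V, then we call the projected graph simple.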
Note that there exists a one-to-one correspondence between simple, directed hypergraphs H_dir and pairs I↔ of incidence matrices, while the mapping between H_dir and A^proj is not one-to-one, and hence the projected graph is a coarse-grained representation of the hypergraph.

B. Clustering coefficient for directed graphs with pairwise interactions
We review the definition of the pairwise clustering coefficient for directed graphs, as introduced in Ref. 26.

Let A be the adjacency matrix of a simple, directed graph, such that [A]_ij = 1 whenever there exists a directed link that points from i to j, and [A]_ij = 0 whenever such a link is absent. The directed clustering coefficient of node i is defined by [26]

$$ C^{{\rm pi}\leftrightarrow}_i(A) \equiv \frac{T^{\leftrightarrow}_i(A)}{t^{\leftrightarrow}_{\max}\left(k^{\rm tot}_i(A),k^{\leftrightarrow}_i(A)\right)}, \tag{64} $$

where

$$ T^{\leftrightarrow}_i(A) \equiv \frac{1}{2}\left[\left(A+A^{\intercal}\right)^{3}\right]_{ii} $$

counts the number of directed triangles centered on node i, and where

$$ t^{\leftrightarrow}_{\max}\left(k^{\rm tot}_i,k^{\leftrightarrow}_i\right) \equiv k^{\rm tot}_i\left(k^{\rm tot}_i-1\right)-2k^{\leftrightarrow}_i \tag{66} $$

is the maximum possible number of directed triangles incident to a node with a given total degree k^tot_i(A) ≡ ∑_{j=1; j≠i}^{N} (A_ij + A_ji), and a given degree of symmetric links k^↔_i(A) ≡ ∑_{j=1; j≠i}^{N} A_ji A_ij. The denominator in the definition of the pairwise clustering coefficient is independent of the directionality and the symmetry (i.e., whether it is unidirectional or bidirectional) of the links between node i and its neighbours. Additionally, for simple and nondirected graphs (A_ij = A_ji), the clustering coefficients in Eqs. (11) and (64) are equal.
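The directed pairwise clustering coefficient can be sketched in Python as follows. The sketch assumes the commonly used form in which the number of directed triangles is half of the i-th diagonal entry of (A + A^T)^3, normalised by k_tot(k_tot − 1) − 2k_↔, for a 0/1 adjacency matrix with zero diagonal. The function name and the two example graphs are hypothetical.

```python
def directed_clustering(A, i):
    """Directed pairwise clustering coefficient of node i (sketch)."""
    n = len(A)
    # Symmetrised matrix S = A + A^T, with entries in {0, 1, 2}.
    S = [[A[a][b] + A[b][a] for b in range(n)] for a in range(n)]
    # i-th diagonal entry of S^3: weighted closed walks of length three.
    cube_ii = sum(S[i][j] * S[j][l] * S[l][i]
                  for j in range(n) for l in range(n))
    k_tot = sum(A[i][j] + A[j][i] for j in range(n) if j != i)
    k_sym = sum(A[i][j] * A[j][i] for j in range(n) if j != i)
    max_triangles = k_tot * (k_tot - 1) - 2 * k_sym
    if max_triangles <= 0:
        return 0.0  # assumed convention when no triangle can be formed
    return 0.5 * cube_ii / max_triangles

# Directed 3-cycle 0 -> 1 -> 2 -> 0.
A_cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
# Fully bidirectional triangle, equivalent to a nondirected triangle.
A_full = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print(directed_clustering(A_cycle, 0))  # 0.5
print(directed_clustering(A_full, 0))   # 1.0
```

The bidirectional triangle reproduces the nondirected value 1, while the directed 3-cycle realises one of the two triangles that the links of node 0 allow.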
Following the example of pairwise clustering coefficients, we define in the next Subsection a quad clustering coefficient for directed hypergraphs, which is an extension of the corresponding clustering coefficient for nondirected hypergraphs.

C. Quad clustering coefficient for directed hypergraphs
We define a quad clustering coefficient for directed hypergraphs. Similarly to the pairwise clustering coefficient for directed graphs C pi↔ i , we require that the quad clustering coefficient counts the number of directed quads incident to the node i of a hypergraph, and we require that for nondirected hypergraphs the directed quad clustering coefficient equals the quad clustering coefficient defined in Eq. (13).
We define the quad clustering coefficient C q↔ i (I ↔ ) of a node i in the directed hypergraph represented by I ↔ , for which where is the number of directed quads centred on the node i, and we have used the notation The denominator q ↔ max ({X iα (I ↔ ), I ↔ iα } α∈∂ i ) denotes the maximum possible number of directed quads incident to node i, given the sets of modified in- and out-cardinalities of the hyperedges α ∈ ∂ i , and the corresponding values of I ↔ iα . We omit the explicit mathematical expression for q ↔ max here, as it is elaborate, but it can be found in Appendix H. If ∑ α∈∂ i (I ↔ ) (χ in α,i + χ out α,i ) < 2, then C q↔ i (I ↔ ) = 0.
To illustrate how quads are counted, consider the example in Panel (b) of Fig. 12, for which Q ↔ i (I ↔ ) = 4, as the motif contains the four quads in the left column of Panel (a) of Fig. 12. Alternatively, Q ↔ i (I ↔ ) can be expressed in terms of the number of closed paths of length 4. In that expression, the first term [(I ↔ (I ↔ ) ⊺ ) 2 ] ii counts the total number of paths of length 4 starting and ending in i. The second and third terms subtract off the contributions to the first term arising from paths returning to site i via backtracking paths of length one and two, respectively. The prefactor 1/2 corrects for double counting arising from counting the same path with the opposite orientation.
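The closed-path counting argument can be checked numerically in the simpler nondirected setting. The sketch below is not the directed formula of this Section: it assumes a single N × M binary incidence matrix `I`, counts closed paths of length 4 through [(I I ⊺ ) 2 ] ii , and subtracts the paths that return to i halfway (k i 2 of them) as well as those that traverse the same hyperedge twice. The result is compared with a direct enumeration of quads, taken here to be a pair of hyperedges incident to i together with a second node shared by both. The example hypergraph and the function names are hypothetical.

```python
import numpy as np

def quads_by_enumeration(I, i):
    """Count quads at node i: pairs of hyperedges containing i times shared other nodes."""
    I = np.asarray(I, dtype=int)
    edges = np.flatnonzero(I[i])                  # hyperedges incident to node i
    count = 0
    for pos, a in enumerate(edges):
        for b in edges[pos + 1:]:
            shared = (I[:, a] * I[:, b]).sum()    # nodes in both hyperedges, including i
            count += shared - 1                   # exclude node i itself
    return int(count)

def quads_by_closed_paths(I, i):
    """Same count obtained from closed paths of length 4 in the bipartite representation."""
    I = np.asarray(I, dtype=int)
    B = I @ I.T
    closed_paths = (B @ B)[i, i]                  # paths i -> a -> j -> b -> i
    k_i = I[i].sum()
    chi = I.sum(axis=0)                           # hyperedge cardinalities
    return_to_i = k_i ** 2                        # paths with j = i
    same_edge = (I[i] * (chi - 1)).sum()          # paths with a = b and j != i
    # Each quad is traversed in two orientations, hence the factor 1/2.
    return int((closed_paths - return_to_i - same_edge) // 2)

# Hypothetical hypergraph: hyperedges {0, 1, 2}, {0, 1, 3} and {2, 3}.
I_example = np.array([[1, 1, 0],
                      [1, 1, 0],
                      [1, 0, 1],
                      [0, 1, 1]])
for node in range(4):
    print(node, quads_by_enumeration(I_example, node), quads_by_closed_paths(I_example, node))
```

In this example nodes 0 and 1 each belong to one quad (formed with the first two hyperedges), while nodes 2 and 3 belong to none.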
Next we turn to the denominator of the right-hand side of Eq. (67). Similarly to the pairwise, directed clustering coefficient C pi↔ i (A), the denominator q ↔ max ({X iα (I ↔ ), I ↔ iα } α∈∂ i ) normalises the directed quad clustering coefficient C q↔ i (I ↔ ) such that its value is independent of both the directionality and symmetry (i.e., unidirectional or bidirectional) of the links that connect node i to its neighbouring hyperedges. This means that if two nodes i and j have the same motif of inlinks, as shown in Panel (c) of Fig. 12, then the quad clustering coefficients of the two nodes, C q↔ i and C q↔ j , must be the same, even if the motifs of outlinks are different.
Note that for nondirected hypergraphs the directed quad clustering coefficient, defined by Eq. (67), equals the quad clustering coefficient for nondirected hypergraphs, defined by Eq. (13) (see Appendix I).

D. Clustering in directed, real-world hypergraphs
In Sec. V we found that the density of quads in nondirected real-world hypergraphs is large compared to the density of quads in the configuration model. In this Section, we investigate whether an analogous phenomenon can be observed in directed hypergraphs. Specifically, we build directed hypergraphs from three data sets related to the DNC-email network, the English thesaurus, and the Human metabolic pathway (see Appendix F for more detailed information about these data sets).
In Table II we present the mean quad clustering coefficient for the three real-world hypergraphs under study, and compare their values with the corresponding directed configuration models, which have the prescribed degree sequences ⃗ k in (I ← real ) and ⃗ k out (I → real ), and the prescribed cardinality sequences ⃗ χ in (I → real ) and ⃗ χ out (I ← real ). We observe that the real-world networks have significantly larger directed quad clustering coefficients, up to 500 times larger than those of the corresponding random models. Hence, the density of directed quads in real-world directed hypergraphs is significantly higher than their density in the corresponding configuration models, consistent with earlier findings for nondirected hypergraphs.
Furthermore, we determine the distribution of directed quad clustering coefficients in real-world hypergraphs, defined by P(C q↔ ; I ↔ real ), and present the results in Fig. 13. Also in directed real-world hypergraphs, we observe a peak at C q↔ ≈ 1 in the quad clustering distribution. In the specific examples considered, the peak is most pronounced in the DNC-email hypergraph.
We have introduced a clustering coefficient, called the quad clustering coefficient, that captures the multiplicity of interactions between neighbouring nodes in (non)directed hypergraphs with higher order interactions. We have shown that for random hypergraphs the mean quad clustering coefficient has a value near zero, while for real-world networks it is one order of magnitude larger, taking values ranging from 0.01 to 0.34, which is a smaller range than the one observed for pairwise clustering coefficients in real-world networks 27 ; we note however that the distribution of quad clustering coefficients is supported on the whole [0, 1] range of values. Hence, the quad clustering coefficient describes a feature of real-world networks that is not captured by the current random hypergraph models.
We have determined the average quad clustering coefficient in several random hypergraph models. We have obtained exact expressions for models with fluctuating degrees and fixed cardinalities. Our analysis shows that it is significantly more difficult to deal with fluctuating cardinalities.
Analysing the distribution of quad clustering coefficients in real-world networks, we have found that there exists a significant fraction of nodes for which the coefficient takes its maximal value. Analysing the topological properties of the neighbourhood sets of these highly clustered nodes, we have found that they can exhibit large degrees, and that their neighbouring hyperedges can have large cardinalities.
The results of this paper show that the configuration model is not a good null model for real-world networks with higher order interactions. This in itself is not a surprising result, as the configuration model is also not a good model for networks without higher order interactions, see e.g., discussions in Ref. 6. However, what is surprising is that the distribution of quad clustering coefficients exhibits a peak at its maximal value. This result has, to the best of our knowledge, no counterpart in systems without higher order interactions.
This raises the question of what type of random hypergraph model can generate statistical properties similar to those observed in real-world networks with higher order interactions, see e.g., Refs. 5,7,8 for related questions in networks without higher order interactions. Another pertinent question concerns the implications of nodes with high quad clustering coefficients for dynamical processes, such as percolation. Since highly clustered nodes do not appear in random hypergraphs, they may play an important role in dynamical processes taking place on real-world networks.
As real-world nondirected hypergraphs, we used the databases NDC-substances 28 , Youtube 29,30 , Food recipe 31 , Github 29,32 , Crime involvement 29 and Wallmart 33 . As directed hypergraphs, we used the DNC-email 29 , English thesaurus 34 and Human metabolic pathways 35 databases. We implemented algorithms in Fortran to compute nondirected and directed quad clustering coefficients in a hypergraph; the code is available from https://github.com/Gyeong-GyunHa/qch.

Appendix A: Alternate expression for the denominator of the quad clustering coefficient
In this Section we show that q max , defined by Eq. (16), can also be expressed by Eq. (17).
We can express Eq. (16) To proceed, we introduce the definitions and q i,χ (⃗ χ(I); I) ≡ ∑ α;χ≤χ α (I) and where Θ(x) is the Heaviside function as defined below Eq. (10).
We illustrate this configuration in Panel (a) of Fig. 14 for the case of Q i (I) = 6.

2. Biased case
This is the opposite case, in which quads are fully assigned to one hyperedge before any are assigned to the other hyperedges. In this case, we get and In Panel (b) of Fig. 14 we illustrate the biased case when Q i (I) = 6.
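The two configurations of Fig. 14 can be reproduced with a small sketch. It assumes that the number of quads incident to node i is the sum, over pairs of hyperedges containing i, of the number of other nodes that both hyperedges share. The incidence matrices are built from the description in the caption of Fig. 14 (three nodes in three common hyperedges for the uniform case, seven nodes in two common hyperedges for the biased case), with node i placed at index 0; the function name is illustrative.

```python
import numpy as np

def quads_at_node(I, i):
    """Number of quads incident to node i of a nondirected hypergraph (assumed definition)."""
    I = np.asarray(I, dtype=int)
    overlap = I.T @ I                           # overlap[a, b]: nodes shared by hyperedges a, b
    incident = np.outer(I[i], I[i])             # 1 when both a and b contain node i
    shared_others = incident * (overlap - 1)    # shared nodes other than i
    # Keep each unordered pair of distinct hyperedges once.
    return int(np.triu(shared_others, k=1).sum())

# Uniform case: node i and two other nodes lie in all three hyperedges.
I_uniform = np.ones((3, 3), dtype=int)

# Biased case: node i and six other nodes lie in the same two hyperedges.
I_biased = np.ones((7, 2), dtype=int)

print(quads_at_node(I_uniform, 0), quads_at_node(I_biased, 0))
```

Both configurations give six quads at node i: three hyperedge pairs with two shared nodes each in the uniform case, and one hyperedge pair with six shared nodes in the biased case.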

Appendix D: Average quad clustering coefficient for random hypergraph models with regular cardinalities
In this Appendix, we derive the expressions (31), (34) and (36) for the average quad clustering coefficients of random hypergraph models with regular cardinalities. In Appendix D 1, we derive Eq. (31), and in Appendix D 2, we derive Eq. (36). Since Eq. (34) is a special limiting case of Eq. (36), we do not discuss it separately.

1. χ-regular ensemble
We derive the formula (31) for the average quad clustering coefficient of hypergraphs drawn from the ensemble P χ (I) as defined in Eq. (27).

a. Normalisation constant of P χ
The normalisation constant in Eq. (27) is given by as each hyperedge is connected to χ nodes that are randomly selected from the N available options.
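As a sanity check on this counting argument, the sketch below assumes that P χ is uniform over N × M binary incidence matrices in which every hyperedge has cardinality χ (Eq. (27) is taken on trust here). Under that assumption the number of admissible matrices is the binomial coefficient "N choose χ" raised to the power M, which the code verifies by brute force for a small case. It also draws one sample by choosing χ distinct nodes for each hyperedge. The function names are illustrative.

```python
import itertools
import math

import numpy as np

def count_chi_regular(N, M, chi):
    """Brute-force count of N x M binary matrices whose column sums all equal chi."""
    total = 0
    for bits in itertools.product((0, 1), repeat=N * M):
        I = np.array(bits).reshape(N, M)
        if np.all(I.sum(axis=0) == chi):
            total += 1
    return total

def sample_chi_regular(N, M, chi, rng):
    """Draw one incidence matrix by picking chi distinct nodes for every hyperedge."""
    I = np.zeros((N, M), dtype=int)
    for alpha in range(M):
        members = rng.choice(N, size=chi, replace=False)
        I[members, alpha] = 1
    return I

# Small case: N = 3 nodes, M = 2 hyperedges, cardinality chi = 2.
print(count_chi_regular(3, 2, 2), math.comb(3, 2) ** 2)

I_sample = sample_chi_regular(5, 4, 3, np.random.default_rng(0))
print(I_sample.sum(axis=0))   # every hyperedge has cardinality 3
```

The brute-force enumeration is only feasible for very small N and M; it is meant as a check of the combinatorial argument, not as a practical way of computing the normalisation.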

b. Average clustering coefficient
Substituting the definition of the quad clustering coefficient, Eq. (13), into the expression (28) for the ensemble average clustering coefficient yields where we have used the notation Performing the sum over all the entries I jα of the incidence matrix I yields Expanding the power expressions in (D4) and integrating over the Ξγ variables we get Further, expanding the power in (D5) and integrating over q reduces the expression into Lastly, dividing (D6) by the normalisation constant (D1) gives Eq. (31), which we set out to derive.

2. χ-regular with prescribed degree sequence
We derive the formula (36) for the average quad clustering coefficient of the χ-regular hypergraph ensemble with a prescribed degree sequence ⃗ k, as defined in Eq. (35), in the limit N → ∞ with fixed ratio and where The calculations are facilitated by rewriting the expression for P ⃗ k,χ in the following form where M ⃗ k,χ is the new normalisation constant that depends on the value of p * ∈ [0, 1]. When p * = 1/2, we recover the expression Eq. (35). Introducing a value p * ≠ 1/2 is a calculation trick that does not affect the average value of observables, such as ⟨C q i ⟩ ⃗ k,χ , but it does alter the normalisation constant.
In Appendix D 2 a, we determine the normalisation constant M ⃗ k,χ , and in Appendix D 2 b we calculate the average clustering coefficient.

a. Normalisation constant of P ⃗ k,χ
From the definition of M ⃗ k,χ as the normalisation constant of P ⃗ k,χ , as defined in Eq. (D9), it follows that where we have expressed the Kronecker delta functions as integrals in order to get an expression that factorises in the I variables. Summing over the I-variables we get We set p * = ρ * /N and take the limit N → ∞ for fixed M/N to obtain where O(1/N) represents a subleading order term that decays as ∼ 1/N for large values of N. The constant ρ * ∈ R + is an arbitrary constant that determines the normalisation constant but disappears in the final expression of the average clustering coefficient.
Identifying the term ∑ M α=1 e −i Ξα in the exponent, and introducing the Dirac distribution We determine the integrals by expressing the exponentials in terms of their Taylor series, Using the expressions (D15) and (D16) in Eq. (D14) gives Eq. (D17). Using M = µN and making the transformation ι → µN ι, we get the saddle point integral In the limit of N → ∞, the saddle point dominates, and we get the expression where ι * and ι * solve the saddle point equation and where with H the Hessian of the function Ψ evaluated at the saddle point. Using Eq. (D21) in (D20) we obtain the final expression As will become evident, the prefactor Φ ⃗ k,χ cancels out with an identical prefactor that appears in the numerator of the derivation for ⟨C q i (I)⟩ ⃗ k,χ , which we do in the next Section.

b. Average clustering coefficient
Using the definition of the clustering coefficient, given by Eq. (13), and the fact that in this model all cardinalities are fixed to χ, i.e., χ α = χ, we get We represent the Kronecker delta functions in Eq. (D24) as integrals, and then sum over the I-variables, yielding where ∏ ′ (g,ε) is a product over all pairs (g, ε) ∈ V × W , but excluding {(i, α), (i, β ), (j, α), (j, β )}. Setting p * = ρ * /N and taking the limit N → ∞, we get for all g ∉ {i, j} that and for g ∈ {i, j} we get this simplifies into
Integrating over kn and Ξξ and using the formula
In the limit of N → ∞, the saddle point dominates. However, since the exponent is identical to the one appearing in Eq. (D18) for M ⃗ k,χ , we get the simpler expression which is identical to Eq. (36) in the main text. A comparison between Eq. (D30) and the average quad clustering coefficient of large, numerically generated random hypergraphs shows excellent agreement (results not shown). If all terms of the degree sequence are equal (i.e., it is a (c, χ)-regular hypergraph), then Eq. (D30) becomes

Appendix E: Average quad clustering coefficient for biregular cardinalities
We derive Eq. (41) for the ensemble-averaged quad clustering coefficient of the model (39) with biregular cardinalities. By assumption, there are M 1 hyperedges with cardinality χ 1 and M 2 = M − M 1 hyperedges with cardinality χ 2 . The hyperedges and nodes are connected randomly, given their prescribed cardinalities. Therefore, the normalisation constant in Eq. (39) is given by Using the definition of the clustering coefficient, Eq. (13), in the definition of the average clustering coefficient, Eq. (40), yields Representing the Kronecker delta functions with integrals, we get Summing over the I variables, and subsequently integrating over the qn and Ξξ variables, we get the expression Lastly, integrating over the variables û and v yields
where W is an integer-valued function that is independent of the symmetry of the links (i, α) and (i, β ) (as determined by I ↔ iα and I ↔ iβ ). In what follows we specify W(X iα , X iβ ) for the four possible scenarios.

1. χ in α,i = χ out α,i and χ in β ,i = χ out β ,i
In this case
$$W\big(X_{i\alpha}(I^{\leftrightarrow}), X_{i\beta}(I^{\leftrightarrow})\big) \equiv 4\,\min\big(X_{i\alpha}(I^{\leftrightarrow}) \cup X_{i\beta}(I^{\leftrightarrow})\big). \tag{H2}$$
Figure 16 shows two examples, one for which C q i = 0 and another one for which C q i = 1.

2. χ in α,i = χ out α,i and χ in β ,i ≠ χ out β ,i
We define the minimum cardinality by χ min ≡ min{χ in α,i , χ in β ,i , χ out β ,i } and the maximum value by χ max ≡ max{χ in α,i , χ in β ,i , χ out β ,i }. In case the three values χ in α,i , χ in β ,i and χ out β ,i are distinct, we use the notation χ med for the median value. Using this notation, we can express 2χ min + 2χ med , if min(X iα ∪ X iβ ) ≠ χ in α,i and |X iα ∪ X iβ | = 3; 2χ min + 2χ max , otherwise. (H3)
Fig. 17 shows examples with C q i = 0 and C q i = 1 for each of the three above cases. For the case with χ in α,i ≠ χ out α,i and χ in β ,i = χ out β ,i an analogous expression applies with the two indices α and β swapped.
As in Appendix H 2, we use the notation χ min ≡ min{χ in α,i , χ out α,i , χ in β ,i , χ out β ,i } and χ max ≡ max{χ in α,i , χ out α,i , χ in β ,i , χ out β ,i }. In addition, if |X iα ∪ X iβ | = 3, then a third, median value exists, which we denote by χ med . Using this notation, we get 3χ min + χ med , if |X iα ∪ X iβ | = 3 and min(X iα ) = min(X iβ ); 2χ min + 2χ med , if |X iα ∪ X iβ | = 3 and either max(X iα ) = min(X iβ ) or max(X iβ ) = min(X iα ); 2χ min + χ max + χ med , if |X iα ∪ X iβ | = 3 and max(X iα ) = max(X iβ ). (H4)
Fig. 18 shows examples with C q i = 0 or C q i = 1 for each of the four cases mentioned in formula (H4).
In this case, the four cardinalities in the set {χ in α,i , χ out α,i , χ in β ,i , χ out β ,i } are all different. We order them from small to large and use the notation χ smallest < χ small < χ large < χ largest so that χ smallest ≡ min{χ in α,i , χ out α,i , χ in β ,i , χ out β ,i }, and so forth. The expression for W then takes the form
$$W\big(X_{i\alpha}, X_{i\beta}\big) \equiv \begin{cases} 2\chi_{\rm smallest} + 2\chi_{\rm small}, & \text{if } \max(X_{i\alpha}) < \min(X_{i\beta}) \text{ or } \min(X_{i\alpha}) > \max(X_{i\beta}), \\ 2\chi_{\rm smallest} + \chi_{\rm large} + \chi_{\rm small}, & \text{otherwise.} \end{cases} \tag{H5}$$
Fig. 19 shows examples with C q i = 0 and C q i = 1 for both cases in Eq. (H5).
Furthermore, as the hypergraph is nondirected, χ in α,i = χ out α,i and χ in β ,i = χ out β ,i , and hence the case of Appendix H 1 applies for W and its expression is given by Eq. (H2). Using this formula we obtain Eliminating the 4 from both the numerator and the denominator in the right-hand side of (I2), we recover the quad clustering coefficient C q i as defined in Eq. (13), which completes the derivation.

FIG. 1: Illustration of a hypergraph, its different representations, and the quad motif. The upper panel shows the three ways of representing a hypergraph, namely, as a bipartite graph, as an incidence matrix, and as a graph with higher order interactions. The illustrated hypergraph in the left top panel has one quad, highlighted in magenta, consisting of the hyperedges α and β and the nodes 4 and 5. The lower panel visualises the three different components of a hypergraph, namely, the set of nodes V , the set of links E , and the set of hyperedges W .

FIG. 2: The quad clustering coefficient of a node in a few simple examples. Node i is connected to hyperedges α and β with cardinalities 3 and 4, respectively. Depending on the number of quads, C q i equals 0 (a), 1/2 (b), and 1 (c), respectively.
), we see that Zhang et al. considered yet another way of counting the maximal, possible number of quads. In the example of Fig. 2, we get C Zhang i (I) = 0 for (a), C Zhang i (I) = 1/4 for (b) and C Zhang i

FIG. 5: Distribution of quad clustering coefficients in nondirected hypergraphs. Comparison between the distributions P(C q ; I real ) of quad clustering coefficients in real-world hypergraphs (light grey histograms) and the average distribution ⟨P(C q ; I)⟩ (dark grey histograms) of the corresponding configuration model with a prescribed degree sequence ⃗ k(I real ) and cardinality sequence ⃗ χ(I real ). The estimate of ⟨P(C q ; I)⟩ has been obtained from 100 graph realisations. The inset shows the distribution P(C pi ; A proj real ) of pairwise clustering coefficients in the projected network A proj real formed from pairwise interactions obtained with the formula (10). Note that the distributions P(C q ; I real ) show a peak at C q = 1, while the distributions P(C pi ; A proj real ) do not show a peak at C pi = 1 [except for Hypergraph (f)]. The real-world hypergraphs considered are: (a) NDC-substances, (b) Youtube, (c) Food recipe, (d) Github, (e) Crime involvement and (f) Wallmart; see Table

FIG. 6: Comparison of distributions of three clustering coefficients examined in the real-world hypergraphs. The light grey histograms represent the distributions of the quad clustering coefficient P(C q ; I real ). The grey bar graphs show the distributions of Lind's clustering coefficient P(C Lind ; I real ). The dark grey histograms denote the distributions of Zhang's clustering coefficient P(C Zhang ; I real ). Panels represent different real-world hypergraphs, as explained in the caption of Fig. 5. Note the discontinuous scale on the y-axis, with a linear scale for y > 0.5 and a logarithmic scale for y < 0.5.

FIG. 9: Distributions of degrees of highly clustered nodes in real-world hypergraphs, and comparison with the full hypergraph degree distribution. The plot shows the degree distributions P(k * ; I real ) (blue, circles) and P(k * |C q = 1; I real ) (red, squares) for the six canonical real-world hypergraphs considered in this paper. The numbers of nodes with C q = 1 are 490 (a), 560 (b), 18 (c), 1683 (d), 12 (e), and 288 (f). Panels represent the different hypergraphs, as explained in the caption of Fig. 5.

FIG. 10: Distributions of the cardinalities of hyperedges that are incident to a highly clustered node, and comparison with the corresponding distribution for generic nodes. Comparison between the distributions W * (χ; I real ) (blue circles) and W * (χ|C q = 1; I real ) (red squares) as defined in Eqs. (50) and (51), respectively, for the six canonical real-world hypergraphs considered in this Paper. Panels represent different real-world hypergraphs, as explained in the caption of Fig. 5.

FIG. 11: Representations of directed hypergraphs. The figure illustrates with an example the three hypergraph representations, viz., with incidence matrices, as a bipartite graph, or as a graph with higher-order interactions.
consider the example in Panel (b) of Fig. 12. In this case, Q ↔ i (I ↔ ) = 4, as the motif contains the four quads in the left column of Panel (a) of Fig. 12. Alternatively, we can express Q ↔ i (I ↔ ) in terms of the number of closed paths of length 4 (see Panel (a) of Fig. 12 for all possible types of closed paths of length 4) with the formula

FIG. 12: Counting the number of directed quads incident to a node i. (a) The 16 directed quads that contribute to Q q i (I). (b) Example graph with C q,↔ i = 1. (c) Two example graphs with C

ACKNOWLEDGMENTS
G.-G. Ha thanks D.-S. Lee, J.W. Lee, S.H. Lee, S.-W. Son, H.J. Park, M. Ha and N.W. Landry. This work was supported by the Engineering and Physical Sciences Research Council, part of the EPSRC DTP, Grant Ref No.: EP/V520019/1.

FIG. 13: Distribution of quad clustering coefficients in directed hypergraphs. The light grey histograms represent the distributions of the directed quad clustering coefficient measured in real-world hypergraphs. The grey bar graphs show the distributions of the directed quad clustering coefficient measured in the hypergraph configuration model that preserves the in-/out-degree and in-/out-cardinality sequences extracted from the real-world hypergraphs. The panels correspond to (a) DNC-email, (b) English thesaurus, and (c) Human metabolic pathways.

FIG. 14: Illustration of the configurations of quads in the uniform and biased case as defined in Appendices C 1 and C 2, respectively, for the case Q i = 6. The yellow shaded area bounded by a dash-dotted line denotes hyperedge α; the blue shaded area bounded by a dashed line represents hyperedge β ; and the orange shaded area with a dotted border represents hyperedge γ. Panel (a): Three nodes, viz., i and two other nodes, are incident to the three hyperedges α, β , and γ, yielding Q i = 6. Panel (b): Seven nodes, viz., i and six other nodes, are incident to the two hyperedges γ and β , yielding Q i = 6.

FIG. 16: Two motifs consisting of a node i linked with two hyperedges α and β and for which χ in α,i = χ out α,i = 1 and χ in β ,i = χ out β ,i = 2, corresponding with Appendix H 1. The left panel shows an example with C q i = 0 and the right panel shows an example with C q i = 1.

TABLE I: Characteristics of the real-world hypergraphs considered in this Paper: number of nodes N and hyperedges M, mean degree k and mean cardinality χ, mean quad clustering coefficient C q (I real ), mean Lind's clustering coefficient C Lind (I real ), mean Zhang's clustering coefficient C Zhang (I real ), the average mean quad clustering coefficient ⟨C q (I)⟩, the average mean Lind's clustering coefficient ⟨C Lind (I)⟩ and the average mean Zhang's clustering coefficient ⟨C Zhang (I)⟩ of the corresponding configuration model with fixed degree sequence ⃗ k(I real ) and cardinality sequence ⃗ χ(I real ). For more details see Appendix F.

TABLE II: Network characteristics of the real-world directed hypergraphs: number of nodes N and hyperedges M, mean directed quad clustering coefficient C q↔ (I ↔ real ) and the average, mean directed quad clustering coefficient ⟨C q↔ (I ↔ )⟩ of the corresponding configuration model.