We propose an approach to modeling large-scale multi-agent dynamical systems that allows interactions among more than just pairs of agents, using the theory of mean field games and the notion of hypergraphons, which arise as limits of large hypergraphs. To the best of our knowledge, ours is the first work on mean field games on hypergraphs. Together with an extension to a multi-layer setup, we obtain limiting descriptions for large systems of non-linear, weakly interacting dynamical agents. On the theoretical side, we prove the well-foundedness of the resulting hypergraphon mean field game, showing both existence and approximate Nash properties. On the applied side, we extend numerical and learning algorithms to compute the hypergraphon mean field equilibria. To verify our approach empirically, we consider a social rumor spreading model, where we give agents an intrinsic motivation to spread rumors to unaware agents, as well as an epidemic control problem.

Recent developments in the field of complex systems have shown that real-world multi-agent systems are often not restricted to pairwise interactions, bringing to light the need for tractable models that allow higher-order interactions. At the same time, the complexity of analyzing large-scale multi-agent systems on graphs remains an issue even without considering higher-order interactions. An increasingly popular and tractable approach to their analysis is the theory of mean field games. We combine mean field games with higher-order structure by means of hypergraphons, a limiting description of very large hypergraphs. To motivate our model, we build a theoretical foundation for the limiting system, showing that the limiting system has a solution and that it approximates finite, sufficiently large systems well. This allows us to analyze otherwise intractable, large hypergraph games with theoretical guarantees, which we verify using two examples of rumor spreading and epidemic control.

## I. INTRODUCTION

In recent years, there has been a surge of interest in large-scale multi-agent dynamical systems on higher-order networks due to their great generality and practical importance, e.g., in epidemiology,^{1} opinion dynamics,^{2,3} network synchronization,^{4,5} neuroscience,^{6} and more. We refer interested readers to the excellent review articles.^{7–9} In addition to providing a more realistic description of the underlying processes, such large-scale systems with higher-order interactions pose interesting control problems for the reinforcement learning and control communities.^{10–12} Here, a major challenge has been to find tractable solutions.^{13,14}

An increasingly popular and recent approach to the tractability issue has been to use the framework of learning in mean field games (MFGs)^{15–23} and their cooperative counterpart commonly known as mean field control (MFC).^{24–30} It is important to note that here, learning refers to the classical learning (i.e., iterative computation) of equilibria in game theory, as opposed to, e.g., reinforcement learning; see also, e.g., the discussion in Laurière *et al*.^{31} Extensive reviews on mean field games are also available.^{32–36} Popularized by Huang *et al*.^{37} and Lasry and Lions^{38} in the context of differential games, mean field games and related approximations have since found application in a plethora of fields, such as transportation and traffic control,^{39–41} large-scale batch processing and scheduling systems,^{42–44} peer-to-peer streaming systems,^{45} malware epidemics,^{46} crowd dynamics and evacuation of buildings,^{47–49} as well as many other applications in economics^{50} and engineering.^{51} Tractably finding competitive equilibria and decentralized, cooperative optimal control solutions has been the focus of many recent works.^{52–58} Since then, mean field systems have also been extended to dynamical systems on graphs, typically using the theory of large graph limits called graphons.^{59,60} Graphon mean field systems can be considered either as the limit of systems with weakly interacting node state processes,^{57,61} or alternatively as the result of a double limit procedure in which each node constitutes a large population, or "cluster," of agents that interact via inter- and intra-cluster coupling: first, infinitely many nodes are considered according to the graphon, and then infinitely many agents are considered per node (see, e.g., Caines and Huang^{62,63}).

In this work, we will consider the former. The goal of our work is the synthesis of dynamical systems on hypergraphs with competitive or selfish agents. Existing analysis of hypergraph mean field systems typically remains restricted to special dynamics such as epidemiological equations^{64–66} or opinion dynamics^{67} on sparse graphs. In contrast, our work deals with general, agent-controlled non-linear dynamics and equilibrium solutions. We build upon prior results for discrete-time, graph-based mean field systems^{57,61,68} and extend them to incorporate higher-order hypergraphs as well as multiple layers.

Our contribution can be summarized as follows: (i) To the best of our knowledge, ours is the first general mean field game-theoretical framework for non-linear dynamics on multi-layer hypergraphs. Multi-layer networks^{69} have proven extremely useful in many application areas, including infectious disease epidemiology, where different layers could be used to describe community, household, and hospital settings.^{70} (ii) We prove the existence and the approximation properties of the proposed mean field equilibria. (iii) We propose and empirically verify algorithms for solving such hypergraphon mean field systems and thereby obtain a tractable approach to solving and analyzing otherwise intractable Nash equilibria on multi-layer hypergraph games. The proposed framework is of great generality, extending the recently established graphon mean field games and thereby also standard mean field games (via fully connected graphs).

After introducing some graph-theoretical preliminaries in Sec. II, we will begin by formulating the motivating mathematical dynamical model and game on hypergraphs, as well as its more tractable mean field analog. Then, in Sec. III, we will show the existence of solutions for the mean field problem and quantify how well it approximates the finite hypergraph game, building a mathematical foundation for hypergraphon mean field games. Last, in Sec. IV, we will evaluate our model numerically for an illustrative rumor spreading game, verifying our theoretical approximation results and the obtained equilibrium behavior. All of the proofs can be found in Appendixes A and B.

*Notation. On a discrete space $A$, define the spaces of all (Borel) probability measures $\mathcal{P}(A)$ and all sub-probability measures $\mathcal{B}(A)$, equipped with the $L^1$ norm. Define the unit interval $I:=[0,1]$ and its $N$ equal-length subintervals $I_1^N,\ldots,I_N^N$ such that $\bigsqcup_{i=1}^N I_i^N=I$ for any integer $N$, where $\bigsqcup$ denotes disjoint union and each $I_i^N$ includes its rightmost point $i/N$. Denote the expectation and variance of random variables $X$ by $\mathbb{E}[X]$, $\mathbb{V}[X]$. Define the indicator function $\mathbf{1}_A(x)$ mapping to $1$ whenever $x\in A$ and $0$ otherwise. For any integer $k$, define $[k]:=\{1,\ldots,k\}$. Let $r(A,m)$ denote the set of all distinct non-empty subsets of any set $A$ with at most $m$ elements, and denote the set of all distinct non-empty, proper subsets by $r_<(A):=r(A,|A|-1)$ as well as the set of all distinct non-empty subsets by $r(A):=r(A,|A|)$. To keep the notation simple, in the following, we write $r_<[k]:=r_<([k])$, $r[k]:=r([k])$ and identify, e.g., $r_<[k]$ with $[|r_<[k]|]:=\{1,\ldots,|r_<[k]|\}$ whenever helpful. Denote the set of permutations of a set $A$ as $\mathrm{Sym}(A)$. Define the space $\mathrm{Sym}_<^{\mathrm{ind}}[k]$ of bounded, $r_<[k]$-dimensional, symmetric functions induced by permutations of the underlying set $[k]$, i.e., any bounded function $f:I^{r_<[k]}\to\mathbb{R}$ is in $\mathrm{Sym}_<^{\mathrm{ind}}[k]$ whenever $f$ is invariant to all permutations $\sigma\in\mathrm{Sym}([k])$, $f(x_1,\ldots,x_k,x_{11},x_{12},\ldots)=f(x_{\sigma(1)},\ldots,x_{\sigma(k)},x_{\sigma(1)\sigma(1)},x_{\sigma(1)\sigma(2)},\ldots)$. Analogously, we define spaces of such functions $\mathrm{Sym}_\le^{\mathrm{ind}}[k]$ and $\mathrm{Sym}^{\mathrm{ind}}[k]$ over $r[k]$ and $[k]$, respectively.*

## II. MATHEMATICAL MODEL

Before we formulate the stochastic dynamic hypergraph game and its limiting analog in Secs. II A and II B, we discuss some graph-theoretical preliminaries. An (undirected) hypergraph is defined as a pair $H=(V,E)$ of a set of vertices $V$ and a set of hyperedges $E\subseteq 2^V\setminus\{\emptyset\}$. In contrast to edges in graphs, hyperedges may connect an arbitrary number of vertices instead of only two. If there is no risk of confusion, we will simply call the hyperedges of a hypergraph edges. Denote by $V[H]$ and $E[H]$ the vertex set and edge set of a hypergraph $H$. The maximum cardinality over all edges of a hypergraph $H$ is called its rank. A $k$-uniform hypergraph is a hypergraph in which all edges have cardinality $k$. A multi-layer hypergraph $H=(V,E_1,\ldots,E_D)$ with $D$ layers is obtained by allowing for multiple edge sets $E_1,\ldots,E_D\subseteq 2^V\setminus\{\emptyset\}$, and we analogously write $E_d[H]$ for the $d$th edge set of a multi-layer hypergraph $H$. We define the $d$th sub-hypergraph $H_d$ of a multi-layer hypergraph $H$ as the hypergraph with vertex set $V[H]$ and edge set $E[H_d]=E_d[H]$.

Consider any (non-uniform) hypergraph $H$ with bounded rank $k_{\max}$. Observe the isomorphism between multi-layer uniform hypergraphs and such $H$, obtained by splitting the hyperedges of each cardinality $k\le k_{\max}$ into their own layer. Since this procedure can be repeated for each layer of a multi-layer hypergraph, any multi-layer hypergraph is equivalent to a correspondingly defined multi-layer uniform hypergraph. Hence, from here on, it suffices to define and consider $[k_1,\ldots,k_D]$-uniform hypergraphs $H$ as $D$-layer hypergraphs, where each layer $d=1,\ldots,D$ is given by a $k_d$-uniform hypergraph with $k_d\le k_{\max}$ (see also Fig. 1 for a visualization). For instance, in social networks, each layer could model, e.g., the $k$-cliques of acquaintances formed at work, friendships at school, or family relations.

To formulate the infinitely large mean field system, we define the limiting description of sufficiently dense multi-layer hypergraphs as the graphs intuitively grow infinite in size, called hypergraphons.^{71} Here, dense means a number of edges on the order of $O(N^2)$, where $N$ is the number of vertices, to which existing hypergraphon theory remains limited. However, we note that an extension to sparser models by fusing the theory of hypergraphons with $L^p$ graphons^{72–74} could be part of future work. The space of $k$-uniform hypergraphons $\mathcal{W}_k$ is now defined as the space of all bounded, symmetric, measurable functions $W\in\mathrm{Sym}_<^{\mathrm{ind}}[k]$, $W:I^{r_<[k]}\to I$. We equip $\mathrm{Sym}_<^{\mathrm{ind}}[k]$ with the cut (semi-)norm $\|\cdot\|_{\square_{k-1}}$ proposed by Zhao,^{75} defined by

which (see, e.g., Lemma 8.10 of Ref. 59) coincides with the standard graphon case for $k=2$,

To analytically connect $k$-uniform hypergraphs to hypergraphons, we define the step-hypergraphons of any $k$-uniform hypergraph $H$ as

For motivation, note that for any sequence of graphs, convergence of the homomorphism densities is equivalent to convergence of the step graphons in the cut norm to a limiting graphon, whose homomorphism densities describe the limits.^{59} Similarly, cut-norm convergence for the more general uniform hypergraphs at least implies the convergence of hypergraph homomorphism densities.^{75} Accordingly, we assume hypergraph convergence in each layer of a given sequence of $[k_1,\ldots,k_D]$-uniform hypergraphs $(H^N)_{N\in\mathbb{N}}$ via the convergence of their step-hypergraphons $W_d^N:=W[H_d^N]$ to a limiting hypergraphon $W_d\in\mathcal{W}_{k_d}$ in the cut norm, as visualized in Fig. 2, similar to standard graphon mean field systems.^{57,61}
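To make the step construction concrete, the following is a minimal sketch under our reading of the definition, which we assume generalizes the standard step-graphon construction: the value depends only on the $k$ singleton coordinates and equals $1$ exactly when the corresponding vertices form a hyperedge.

```python
import math

def step_hypergraphon(N, edges, k):
    """Step-hypergraphon of a k-uniform hypergraph on N nodes (illustrative sketch).

    edges: iterable of k-element vertex sets with vertices in {1, ..., N}.
    """
    edge_set = {frozenset(e) for e in edges}

    def W(*alpha):
        # By our convention, the first k coordinates of r_<[k] are the singleton
        # coordinates; the remaining (pair, triple, ...) coordinates are ignored.
        vertices = frozenset(max(1, min(math.ceil(a * N), N)) for a in alpha[:k])
        return 1.0 if vertices in edge_set else 0.0

    return W
```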

### A. Finite hypergraph game

In this subsection, we will formulate a dynamical model on hypergraphs where each node is understood as an agent that is influenced by the state distribution of all of its neighbors, according to some time-varying dynamics. Furthermore, each agent is expected to selfishly optimize its own objective, which gives rise to Nash equilibria as the solution of interest.

Consider a $[k_1,\ldots,k_D]$-uniform hypergraph and let $\mathcal{T}$ be the time index set, either $\mathcal{T}=\{0,1,\ldots,T-1\}$ or $\mathcal{T}=\mathbb{N}_0:=\{0,1,2,\ldots\}$. We define $N$ agents $i\in[N]$, each endowed with local states $X_t^i$ and actions $U_t^i$ from a finite state space $\mathcal{X}$ and a finite action space $\mathcal{U}$, respectively. Here, $\mathcal{X}$ and $\mathcal{U}$ are assumed finite for technical reasons, though we believe that the results could be extended to more general spaces in the future. States have an initial distribution $X_0^i\sim\mu_0\in\mathcal{P}(\mathcal{X})$. For all times $t\in\mathcal{T}$ and agents $i\in[N]$, their actions are random variables following the law

with policy (i.e., probability distribution over actions) $\pi^i\in\Pi:=\mathcal{P}(\mathcal{U})^{\mathcal{T}\times\mathcal{X}}$ that, for each node $i$, depends on the $i$th state at time $t$. Then, the states are random variables following the law

with transition kernels $P_t:\mathcal{X}\times\mathcal{U}\times\mathcal{B}(\mathcal{X})\to\mathcal{P}(\mathcal{X})$ that, for each node $i$, depend on the $i$th state and action at time $t$ and on $\nu_t^{N,i}$. Here, the $\times_{d=1}^D\mathcal{P}(\mathcal{X}^{k_d-1})$-valued multi-layer empirical neighborhood mean field $\nu_t^{N,i}$ is defined as

in its $d$th layer, consisting of the unnormalized state distributions of agent $i$'s neighbors on each layer. In other words, the state dynamics of an agent depend only on the states of nodes in its immediate neighborhood and can be influenced by the agent via its actions $U_t^i$.

For example, in an epidemics spread scenario, the states of each agent could model their infection status, while the actions of an agent could be to take protective measures. As a result, each agent will randomly become infected with probability depending on how many neighboring agents are infected and whether the agent is taking protective measures.
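To fix ideas, the following is a minimal simulation sketch of one decision epoch of the finite hypergraph game. The callables `policy` and `kernel` stand in for $\pi^i$ and $P_t$, and the construction of the neighborhood measure is our own reading of the unnormalized empirical neighborhood mean field in (7), not necessarily the exact definition used in the text.

```python
import random
from collections import Counter

def simulate_step(states, policy, kernel, hyperedges_by_layer, ks, t):
    """One transition of the finite hypergraph game (illustrative sketch).

    states: dict node -> state; policy(t, i, x) and kernel(t, x, u, nu) return
    dicts mapping actions / next states to probabilities; hyperedges_by_layer[d]
    is a set of frozensets of nodes; ks[d] is the hyperedge cardinality of layer d.
    The neighborhood measure nu is an unnormalized empirical measure of neighbor
    state tuples over hyperedges containing i, scaled by N^{-(k_d - 1)} (assumption).
    """
    N = len(states)
    actions, next_states = {}, {}
    for i, x in states.items():
        probs = policy(t, i, x)
        actions[i] = random.choices(list(probs), weights=list(probs.values()))[0]
    for i, x in states.items():
        nu = []
        for d, edges in enumerate(hyperedges_by_layer):
            counts = Counter()
            for e in edges:
                if i in e:
                    neighbors = tuple(sorted(e - {i}))
                    counts[tuple(states[j] for j in neighbors)] += 1
            nu.append({s: c / N ** (ks[d] - 1) for s, c in counts.items()})
        probs = kernel(t, x, actions[i], nu)
        next_states[i] = random.choices(list(probs), weights=list(probs.values()))[0]
    return actions, next_states
```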

The reward functions $R_t:\mathcal{X}\times\mathcal{U}\times\mathcal{B}(\mathcal{X})\to\mathbb{R}$, together with a discount factor $\gamma\in(0,1)$, or $\gamma\in(0,1]$ in the finite horizon case, define the objective function for the $i$th agent,

which can also describe, e.g., random rewards $R_t^i$ that are conditionally independent given $X_t^i,U_t^i,\nu_t^{N,i}$, by the law of total expectation and taking the conditional expectation, $R_t(X_t^i,U_t^i,\nu_t^{N,i})\equiv\mathbb{E}[R_t^i\mid X_t^i,U_t^i,\nu_t^{N,i}]$.
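For reference, with the quantities above, the discounted objective of agent $i$ takes the standard form (our transcription; the exact display appears in the text):

$$ J_i^N(\pi^1,\ldots,\pi^N) = \mathbb{E}\left[\sum_{t\in\mathcal{T}} \gamma^t\, R_t\big(X_t^i, U_t^i, \nu_t^{N,i}\big)\right]. $$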

Our goal is now to find Nash equilibria, i.e., stable policies where no agent can single-handedly deviate and improve their own objective. Note that finding Nash equilibria in games such as the above is difficult, since (a) even the existence of Nash equilibria under the above, decentralized information structure of policies is hard to show, and (b) computation of Nash equilibria fails due to both the curse of dimensionality under full observability and the general complexity of computing Nash equilibria^{76} (see also Saldi *et al*.^{68} and the discussion therein).

Thus, in the finite game, we are interested in finding the following weaker notion of approximate equilibria,^{17,57} where a negligible fraction of agents that remain insignificant to all other agents may remain suboptimal.

While it may seem excessive to weaken the equilibrium concept to approximate optimality for only a fraction of the agents, under Assumption 1 it is always possible for a finite number of agents to deviate arbitrarily from the limiting system description. Therefore, under our assumptions, it is only possible to obtain an approximate equilibrium solution for almost all agents via the mean field formulation. Although we could make stronger assumptions on the mode of convergence for hypergraphons, such a concept of convergence would be difficult to motivate from a graph-theoretical perspective. Therefore, we restrict ourselves to the cut-norm convergence^{75} and the above solution concept.

### B. Hypergraphon mean field game

Next, we will formally let $N\u2192\u221e$ and obtain a more tractable, reduced model consisting of any single representative agent and the distribution of agent states, the so-called mean field.

To analyze the case $N\to\infty$, however, we first introduce some preliminary definitions. We define the space of mean fields $\mathcal{M}\subseteq\mathcal{P}(\mathcal{X})^{\mathcal{T}\times I}$ such that $\mu\in\mathcal{M}$ whenever $\alpha\mapsto\mu_t^\alpha(x)$ is measurable for all $t\in\mathcal{T}$, $x\in\mathcal{X}$. Intuitively, a mean field is the distribution of states each of the infinitely many agents in $I$ is in. Analogously, the space of policies $\boldsymbol{\Pi}\subseteq\Pi^I$ is given by policies $\pi\in\Pi^I$ where $\alpha\mapsto\pi_t^\alpha(u\mid x)$ is measurable for any $t\in\mathcal{T}$, $x\in\mathcal{X}$, $u\in\mathcal{U}$. Intuitively, $\pi\in\boldsymbol{\Pi}$ defines the behavior of each agent $\alpha\in I$. For any $f:\mathcal{X}\times I\to\mathbb{R}$ and state marginal ensemble $\mu\in\mathcal{P}(\mathcal{X})^I$, define

In the limit $N\to\infty$, assuming that all agents follow a policy $\pi\in\boldsymbol{\Pi}$, we obtain infinitely many agents $\alpha\in I$, for each of whom we define the limiting hypergraphon mean field dynamics analogously to the finite hypergraph game.

The agent states have the initial distribution $X_0^\alpha\sim\mu_0\in\mathcal{P}(\mathcal{X})$. For all times $t\in\mathcal{T}$ and agents $\alpha\in I$, their actions will be random variables following the law

under the policy $\pi^\alpha\in\Pi$, while their states follow the law

with the limiting, now deterministic neighborhood mean field $\nu_t^\alpha\in\times_{d=1}^D\mathcal{P}(\mathcal{X}^{k_d-1})$. Informally, by a law of large numbers, we have replaced the distribution of finitely many neighbor states by the limiting mean field distribution $\nu_t^\alpha$. The $d$th component of this mean field is given by

where, for readability, $(\cdot)$ denotes separate coordinates of the input (the order does not matter due to symmetry). In other words, the layer-$d$ neighborhood mean field distributions are functions

that give the probability that the random neighbors of a shared hyperedge on layer $d$ are in states $(x_1,\ldots,x_{k_d-1})\in\mathcal{X}^{k_d-1}$.

Note that the same, shared $\alpha\in I$ is used for all $D$ layers, i.e., all layer neighborhood distributions of agents jointly converge to the limiting descriptions $\nu_t^\alpha$. This makes sense, since by Assumption 1, we assume that the agents are already ordered such that the corresponding step-hypergraphons converge to the limiting hypergraphon in cut norm on all layers jointly.

Finally, the objective will be given by

which leads to the mean field counterpart of Nash equilibria. Informally, a mean field (Nash) equilibrium is given by a “consistent” tuple of policy and mean field, such that the policy is optimal under the mean field and the mean field is generated by the policy. As a result, if all agents follow the policy, they will be optimal under the generated mean field, leading to a Nash equilibrium.

More formally, we define the maps $\Phi:\mathcal{M}\to 2^{\boldsymbol{\Pi}}$, mapping from a fixed mean field $\mu\in\mathcal{M}$ to all optimal policies $\pi\in\boldsymbol{\Pi}:\forall\alpha\in I:\pi^\alpha\in\arg\max_{\tilde\pi}J_\alpha^\mu(\tilde\pi)$, and similarly $\Psi:\boldsymbol{\Pi}\to\mathcal{M}$, mapping from a policy $\pi\in\boldsymbol{\Pi}$ to its induced mean field $\mu\in\mathcal{M}$ such that for all $\alpha\in I$, $t\in\mathcal{T}$, we have the initial distribution $\mu_0^\alpha=\mu_0$ and mean field evolution,

A Hypergraphon Mean Field Equilibrium (HMFE) is a pair $(\pi,\mu)\in\boldsymbol{\Pi}\times\mathcal{M}$ such that $\pi\in\Phi(\mu)$ and $\mu=\Psi(\pi)$.

Importantly, the mean field game will be motivated rigorously in the following, and its computational complexity is independent of the number of agents. Instead, the complexity of the problem will scale with the size of the agent state and action spaces $\mathcal{X}$, $\mathcal{U}$ and with the considered time horizon in the case of a finite horizon objective, since we will solve for equilibria by repeatedly (i) computing optimal policies for discrete Markov decision processes,^{77} $\pi^\alpha\in\arg\max_{\tilde\pi}J_\alpha^\mu(\tilde\pi)$, and (ii) solving the mean field evolution equation (15). In particular, mean field equilibria are guaranteed to exist, and the corresponding equilibrium policy will provide an equilibrium for large finite systems.
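For reference, a sketch of the forward evolution repeatedly solved in step (ii), written in terms of the quantities defined above (our transcription of the standard discrete-time mean field update; the exact display is Eq. (15) in the text):

$$ \mu_{t+1}^\alpha = \sum_{x\in\mathcal{X}}\sum_{u\in\mathcal{U}} \mu_t^\alpha(x)\,\pi_t^\alpha(u\mid x)\,P_t\big(\cdot\mid x,u,\nu_t^\alpha\big), \qquad \mu_0^\alpha=\mu_0. $$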

To obtain meaningful results, we need a standard continuity assumption (e.g., Ref. 61), since otherwise weak interaction is not guaranteed: without continuity, a change of behavior in only one of many agents could cause arbitrarily large changes in the dynamics or rewards.

Let $R_t$, $P_t$, $W$ each be Lipschitz continuous with Lipschitz constants $L_R,L_P,L_W>0$.

Note that our model is quite general: In particular, it is also possible to model dynamics and rewards that depend on the state-action distributions instead of only the state distributions, replacing $\delta_{\times_{j\neq i}X_t^{m_j}}$ by $\delta_{\times_{j\neq i}(X_t^{m_j},U_t^{m_j})}$ in (7). This can be done by reformulating any problem as follows. Assume a problem with state and action spaces $\mathcal{X}$, $\mathcal{U}$ and dependence of rewards and transitions on joint state-action distributions. We can rewrite the problem as a new problem with the new state space $\mathcal{X}\cup(\mathcal{X}\times\mathcal{U})$, where each two decision epochs $t$, $t+1$ correspond to a single original decision epoch: in the first step $t$, we transition deterministically from $X_t^{m_j}$ to $(X_t^{m_j},U_t^{m_j})$ for the taken action $U_t^{m_j}$, while in the second step $t+1$, we transition and compute rewards according to the original system, ignoring any second actions taken. Choosing the square root of the discount factor and normalizing rewards then gives a problem of our form that is equivalent to the original problem.
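As a brief sketch of this reformulation (under our reading of the remark; the kernels $\tilde P$ and rewards $\tilde R$ are introduced here purely for illustration), with discount $\tilde\gamma=\sqrt{\gamma}$ one may set

$$ \tilde P_{2t}\big((x,u)\mid x,u,\cdot\big)=1, \qquad \tilde P_{2t+1}\big(x'\mid (x,u),\cdot,\nu\big)=P_t(x'\mid x,u,\nu), $$
$$ \tilde R_{2t}\equiv 0, \qquad \tilde R_{2t+1}\big((x,u),\cdot,\nu\big)=\gamma^{-1/2}\,R_t(x,u,\nu), $$

so that $\sum_{t}\tilde\gamma^{\,2t+1}\tilde R_{2t+1}=\sum_t\gamma^t R_t$ recovers the original objective.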

## III. THEORETICAL RESULTS

In this section, we rigorously motivate the mean field formulation by providing existence and approximation results of an HMFE. Essentially, HMFEs are guaranteed to exist and will give approximate Nash equilibria in finite hypergraph games with many agents. The reader interested primarily in applications may skip this section.

We lift the empirical distributions and policies to the continuous domain $I$, i.e., for any $(\pi^1,\ldots,\pi^N)\in\Pi^N$, we define the step policy $\pi^N\in\boldsymbol{\Pi}$ and the step empirical measures $\mu^N\in\mathcal{M}$ by

Proofs for the results to follow can be found in Appendixes A and B; they are structurally similar to the proofs in Cui and Koeppl,^{57} though they require a number of additional considerations that we highlight there.

### A. Existence of equilibria

First, we show that there exists an HMFE. We do this by rewriting the problem in a more convenient form, as done in Cui and Koeppl.^{57} Consider an equivalent, more standard mean field game with states $(\alpha_t,\tilde X_t)$, i.e., we integrate the graphon indices $\alpha$ into the state. The new states follow the initial distribution $\tilde X_0\sim\mu_0$, $\alpha_0\sim\mathrm{Unif}(I)$. Then, the actions and the original state transitions follow as before, while the $\alpha_t$ part of the state remains fixed at all times, i.e.,

where we used the standard (non-graphical) mean field $\tilde\mu_t\in\mathcal{P}(\mathcal{X}\times I)$ (cf. Saldi *et al*.^{68}) and let

Using existing results for mean field games,^{68} we obtain the existence of a potentially non-unique HMFE.

Under Assumption 2, there exists an HMFE $(\pi,\mu)\in\boldsymbol{\Pi}\times\mathcal{M}$.

For uniqueness results, we refer to existing results such as the classical monotonicity condition.^{38,55} However, existing theory does not analyze the finite hypergraph structure and instead directly uses the limiting hypergraphons. In the following, we thus also show that the finite hypergraph games are indeed approximated well.

### B. Approximation properties

Next, we will show that the finite hypergraph game and its dynamics are well approximated by the hypergraphon mean field game, which implies that the HMFE solution of the hypergraphon mean field game will give us the desired $(\epsilon ,p)$-Nash equilibrium in large finite hypergraph games.

To begin, we define and obtain finite $N$-agent system equilibria from an HMFE via the policy sharing map $\mathrm{Id}^N(\pi):=(\pi^1,\ldots,\pi^N)\in\Pi^N$, i.e., $\mathrm{Id}^N$ is defined such that each agent acts according to its position $\alpha$ on the hypergraphon,

Now consider $(i,\hat\pi)$-deviated policy tuples where the $i$th agent deviates from an equilibrium policy tuple to its own policy $\hat\pi$, i.e., policy tuples $(\pi^1,\ldots,\pi^{i-1},\hat\pi,\pi^{i+1},\ldots,\pi^N)$. Note that this includes the deviation-free case as a special case. In order to obtain an $(\epsilon,p)$-Nash equilibrium, we must show that for almost all $i$ and policies $\hat\pi$, the $(i,\hat\pi)$-deviated policy tuple will be approximately described by the interaction with the limiting hypergraphon mean field. For this purpose, the first step is to show the convergence of agent state distributions to the mean field.

Define for any $n\in\mathbb{N}$ the evaluation of measurable functions $f:\mathcal{X}^n\times I^n\to\mathbb{R}$ under any $n$-dimensional product measures $\otimes^n\mu\in\mathcal{P}(\mathcal{X}^n)^{I^n}$ as

where $\otimes^n\mu$ denotes the $n$-fold product of the measure $\mu$, i.e., the $n$-dimensional distribution over agent states.

Then, our first main result is the convergence of the finite-dimensional agent state marginals to the limiting deterministic mean field, given sufficient regularity of the applied policy. For this purpose, we introduce and optimize over a class $\boldsymbol{\Pi}_{\mathrm{Lip}}$ of Lipschitz-continuous policies with up to at most $D_\pi$ discontinuities, i.e., $\pi\in\boldsymbol{\Pi}_{\mathrm{Lip}}$ whenever $\alpha\mapsto\pi_t^\alpha$ at any time $t$ has at most $D_\pi$ discontinuities. Note, however, that we could in principle approximate non-Lipschitz policies by classes of Lipschitz-continuous policies.

As a special case, by considering $n=1$ and $f=\mathbf{1}_{\{x\}}$ for any $x\in\mathcal{X}$, we find convergence in $L^1$ of the empirical distribution of agent states $\frac{1}{N}\sum_{i\in[N]}\delta_{X_t^i}$ to the limiting mean field $\int_I\mu_t^\alpha\,\mathrm{d}\alpha$.

Our second main result is the (uniform) convergence of the system for almost any agent $i\in[N]$ with deviating policy $\hat\pi\in\Pi$ to the system where the interaction with other agents is replaced by the interaction with the limiting deterministic mean field. Hence, we introduce new random variables for the single deviating agent, beginning with initial distribution $\hat X_0^{i,N}\sim\mu_0$. The action variables follow the deviating policy

with the state transition laws

i.e., we assume that all other agents act according to their corresponding equilibrium policy $\mathrm{Id}^N(\pi)$, such that the neighborhood state distributions of most agents can be replaced by the limiting term $\nu_t^{i/N}$ with little error in large hypergraphs.

As a corollary, we will have a good approximation of the finite hypergraph game objective through the hypergraphon mean field objective, and correspondingly the approximate Nash property of hypergraphon mean field equilibria, motivating the hypergraphon mean field game framework.

Consider an HMFE $(\pi,\mu)\in\boldsymbol{\Pi}_{\mathrm{Lip}}\times\mathcal{M}$. Under Assumptions 1 and 2, for any $\epsilon,p>0$, there exists $N'$ such that for all $N>N'$, the policy $(\pi^1,\ldots,\pi^N)=\mathrm{Id}^N(\pi)$ is an $(\epsilon,p)$-Nash equilibrium.

Therefore, we find that a solution of the mean field system is a good equilibrium solution of sufficiently large finite hypergraph games.

The assumption of a class $\boldsymbol{\Pi}_{\mathrm{Lip}}$ of Lipschitz-continuous policies with up to finitely many discontinuities may seem restrictive. However, similar to Theorem 5 of Ref. 57, we may discretize and partition $I$ in order to solve hypergraphon mean field games to an arbitrary degree of exactness, preserving the good approximation properties on large hypergraph games.

## IV. NUMERICAL EXPERIMENTS

In this section, we shall introduce an exemplary numerical problem of rumor spreading and show associated numerical solutions to demonstrate the hypergraphon mean field framework, verifying the theoretical results.

In order to learn an HMFE in our model, we adopt the well-founded discretization method proposed in Cui and Koeppl,^{57} analogous to the technique used in the proof of Theorem 1: we convert the graphon mean field game into a classical mean field game, which allows the application of any existing mean field game algorithm, such as fixed point iteration, to solve for an equilibrium. In other words, we split $I$ into subintervals $I_1^N,\ldots,I_N^N$, for each of which we pick a representative $\alpha\in I_i^N$. This $\alpha$, together with an agent's original state in $\mathcal{X}$, constitutes the new state. In Appendix B, we perform additional experiments for another numerical problem of epidemic control, where existing algorithms fail, pointing out potential future work.
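As an illustration of the resulting procedure, the following is a minimal sketch (not the exact implementation used for our experiments) of fixed point iteration on the discretized problem. The callables `P`, `R`, and `nu`, as well as the number of actions, are placeholders standing in for the model quantities of Sec. II; we use $\gamma=1$ and deterministic best responses, as in our experiments.

```python
import numpy as np

def solve_hmfg(alphas, mu0, P, R, nu, T, num_actions, iters=100):
    """Fixed point iteration on a discretized hypergraphon MFG (illustrative sketch).

    alphas: representative graphon indices (one per subinterval of I),
    mu0: initial state distribution of shape (X,),
    P(t, a, u, nu_a): transition matrix of shape (X, X),
    R(t, a, u, nu_a): reward vector of shape (X,),
    nu(t, a, mu_t): neighborhood mean field of index a given the full mean field mu_t.
    """
    A, X = len(alphas), len(mu0)
    mu = np.tile(mu0, (T, A, 1))                       # mean field guess, shape (T, A, X)
    for _ in range(iters):
        # (i) best response via backward induction for each representative index
        pi = np.zeros((T, A, X), dtype=int)
        V = np.zeros((A, X))
        for t in reversed(range(T)):
            for a_idx, a in enumerate(alphas):
                nu_a = nu(t, a, mu[t])
                Q = np.stack([R(t, a, u, nu_a) + P(t, a, u, nu_a) @ V[a_idx]
                              for u in range(num_actions)])   # shape (U, X)
                pi[t, a_idx] = Q.argmax(axis=0)
                V[a_idx] = Q.max(axis=0)
        # (ii) forward propagation of the mean field under the best response
        new_mu = np.empty_like(mu)
        new_mu[0] = mu0
        for t in range(T - 1):
            for a_idx, a in enumerate(alphas):
                nu_a = nu(t, a, new_mu[t])
                m = np.zeros(X)
                for x in range(X):
                    m += new_mu[t, a_idx, x] * P(t, a, pi[t, a_idx, x], nu_a)[x]
                new_mu[t + 1, a_idx] = m
        mu = new_mu
    return pi, mu
```

Backward induction computes an exact best response for each representative index, and the forward pass regenerates the mean field; when plain fixed point iteration fails to converge (cf. Appendix B), damping or smoothing of the policy update can be added.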

### A. Hypergraphons

In our experiments, we shall sample finite hypergraphs directly from given limiting hypergraphons, which should ensure that we obtain hypergraph sequences that fulfill Assumption 1, analogous to the standard graphon case at rate $O(1/\sqrt{\log N})$ (see Lemma 10.16 of Ref. 59). To sample a $k$-uniform hypergraph with $N$ nodes from a $k$-uniform hypergraphon $W$, we sample uniformly distributed values $\{\alpha_j:\alpha_j\sim\mathrm{Unif}([0,1])\}_{j\in r([N],k-1)}$, one for each non-empty subset of $[N]$ with fewer than $k$ elements. Then, we add any hyperedge $B\subseteq[N]$ with $|B|=k$ with probability $W(\alpha_{r_<(B)})$.
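A minimal sketch of this sampling procedure follows; the ordering of the subset coordinates passed to $W$ (singletons first, then pairs, each lexicographically) is our own convention for illustration.

```python
import itertools
import random

def sample_uniform_hypergraph(N, k, W):
    """Sample a k-uniform hypergraph on nodes {1, ..., N} from a k-uniform
    hypergraphon W, which takes one value in [0, 1] per distinct non-empty
    proper subset of a candidate hyperedge, following r_<(B)."""
    # One shared uniform per non-empty subset of [N] with fewer than k elements.
    alpha = {
        frozenset(s): random.random()
        for size in range(1, k)
        for s in itertools.combinations(range(1, N + 1), size)
    }
    edges = set()
    for B in itertools.combinations(range(1, N + 1), k):
        # Evaluate W on the uniforms of all proper non-empty subsets of B.
        subsets = [frozenset(s) for size in range(1, k)
                   for s in itertools.combinations(B, size)]
        if random.random() < W(*(alpha[s] for s in subsets)):
            edges.add(frozenset(B))
    return edges
```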

For the sake of illustration, unless otherwise noted, we will in the following consider two-layer hypergraphons, where the first layer is a two-uniform hypergraphon (a standard graphon), while the second layer shall be a three-uniform hypergraphon. For the first layer, we consider the uniform attachment graphon,

the ranked attachment graphon

and the flat (or $p$-ER) random graphon

In particular, the uniform attachment graphon is the limit of a random graph sequence where we iteratively add a new node $N$ and then connect all unconnected node pairs with probability $1/N$. Similarly, for the ranked attachment graphon, at each iteration $n$, we first add a new ($n$th) node; the nodes $1,\ldots,n-1$ exist from prior iterations. The new node $n$ is connected to each previous node $i=1,\ldots,n-1$ with probability $1-i/n$. Then, all other node pairs that are not yet connected with each other will connect with probability $2/n$ (see also Chap. 11 of Ref. 59 and Fig. 3).
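For concreteness, the following gives the usual closed forms of these three graphons (our assumption of the standard textbook definitions; the displayed equations above are authoritative), which can be plugged directly into the sampler above for the two-uniform layer:

```python
def uniform_attachment(x, y):
    # limit of the uniform attachment graph sequence
    return 1.0 - max(x, y)

def ranked_attachment(x, y):
    # limit of the ranked attachment graph sequence
    return 1.0 - x * y

def flat_er(p):
    # flat (p-ER) graphon, constant at p
    return lambda x, y: p
```

For example, `sample_uniform_hypergraph(N, 2, uniform_attachment)` draws the two-uniform layer of a finite system.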

For the second, three-uniform layer, we similarly consider the hypergraphon resulting from converting all triangles in a standard $p$-ER graph into hyperedges,^{75}

as well as the uniform attachment hypergraphon

and its inverted version

resulting from a similar construction as in the standard case.

### B. Rumor spreading dynamics

In this section, we will describe some simple social dynamics and epidemics problems to illustrate potential applications of hypergraphon mean field games. Here, each layer could model different types of interpersonal relationships. In our particular example of two-uniform and three-uniform layers, the latter can model small cliques of friends, while the former could model general acquaintanceship. We do note that social networks are typically sparser, possessing significantly fewer edges than the order of $O(N^2)$. However, our model is a first step toward rigorous limiting hypergraph models and could in the future be extended to other graph limit theories, such as $L^p$ graphons,^{72–74} by extending their theory toward hypergraphons. We further imagine that similar approaches could be used, e.g., in economics^{50} or engineering applications.^{51}

In the classical Maki–Thompson model,^{78,79} the spread of rumors is modeled via three node states: ignorant, spreader, and stifler. Ignorants are unaware of the rumor, while spreaders attempt to spread the rumor. When spreaders attempt to spread the rumor to nodes that are already aware of it too often, they stop spreading and become stiflers. In this work, instead of assuming the above behavior *a priori*, we will give agents an intrinsic motivation to spread or stifle rumors, giving rise to the Rumor problem. We shall consider ignorant ($I$) and aware ($A$) nodes. The behavior of aware nodes is then motivated by the gain and loss of social standing resulting from spreading rumors to ignorant and aware nodes, respectively. The possible actions $\mathcal{U}:=\{\bar S,S\}$ of nodes are to actively spread the rumor ($S$) or to refrain from doing so ($\bar S$). The probability of an ignorant node becoming aware of the rumor at any decision epoch is then simply given by a linear combination of all layer neighborhood densities of aware, spreading nodes.

Since the transition dynamics will depend on the spreading actions of neighbors, following Remark 2, we instead define the extended state space $\mathcal{X}=\{I,A\}\cup(\{I,A\}\times\mathcal{U})$. We then assume the dynamics are given at all times by

for all $x\in\{I,A\}$, $u\in\mathcal{U}$, and similarly the rewards

with $R\equiv 0$ otherwise. In other words, any aware and spreading agent obtains a reward in each layer that is proportional to the probability that a neighbor of a hyperedge, sampled uniformly at random out of all connected hyperedges, is ignorant. In our experiments, we use $\tau_1=0.3$, $\tau_2=0.5$, $r_d=0.5$, $c_d=0.8$, $\mu_0(A)=0.01$, and $\mathcal{T}=\{0,1,\ldots,49\}$.
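The following is an illustrative sketch of these dynamics and rewards (our reading of the model, with the Remark 2 state extension collapsed for readability; the displayed equations above are authoritative, and the exact functional form of the social-standing trade-off is an assumption here):

```python
def rumor_transition(x, u, q_spreading):
    """Probability of being aware at the next decision epoch (sketch).

    x: base state in {"I", "A"}; u: action in {"S", "not_S"} (unused here, since
    the agent's own action does not affect whether it hears the rumor);
    q_spreading: per layer, the probability that a random hyperedge neighbor is
    aware and spreading."""
    taus = [0.3, 0.5]  # tau_1, tau_2 as in the experiments
    if x == "A":
        return 1.0  # aware agents stay aware (assumption)
    return min(1.0, sum(tau * q for tau, q in zip(taus, q_spreading)))

def rumor_reward(x, u, q_ignorant):
    """Per-step reward: spreading agents gain r_d for ignorant neighbors and
    pay c_d for already-aware neighbors (our assumption of the trade-off)."""
    r_d, c_d = 0.5, 0.8
    if x != "A" or u != "S":
        return 0.0
    return sum(r_d * q - c_d * (1.0 - q) for q in q_ignorant)
```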

### C. Numerical results

In our experiments, we restrict ourselves to finite time horizons with $\gamma =1$, $50$ discretization points, and use backward induction with exact forward propagation to compute exact solutions. Note that a simple fixed point iteration by repeatedly computing an arbitrary optimal deterministic policy and its corresponding mean field converges to an equilibrium in the Rumor problem. In general, however, fixed point iteration (as well as more advanced state-of-the-art techniques) may fail to converge [see, e.g., the susceptible-infected-susceptible (SIS) problem in Appendix B].

In Fig. 4, we can observe that the behavior for the Rumor problem is as expected. At the equilibrium, agents will continue to spread rumors until the number of aware agents reaches a critical point at which the penalty for spreading to aware agents is larger than the reward for spreading to ignorant agents. Agents with higher connectivity are more likely to be aware of the rumor. Particularly in the uniform attachment hypergraphon case, the threshold is reached at different times, since the neighborhoods of different $\alpha$ reach awareness at different rates depending on their connectivity. Here, a number of nodes with very low degrees will continue spreading the rumor. In Appendix B, we show additional results for inverted $3$-uniform hypergraphons, which give similar results to the ones seen here. Furthermore, as can be seen in Fig. 5, the $L^1$ error between the empirical distribution and the limiting mean field system (as vectors over time),

goes to zero as the number of agents increases, showing that the finite hypergraph game is well approximated by the hypergraphon mean field game for sufficiently large systems, though the error remains somewhat large due to the high variance from our sparse initialization $\mu_0(A)=0.01$. Here, we estimated the error $\Delta\mu$ for each $N$ over $50$ realizations. Due to the $O(N^2)$ complexity of simulation and computational constraints, our experiments remain limited to the demonstrated number of agents. We repeat the experiment in Fig. 6 with a denser initialization $\mu_0(A)=0.1$ to reduce the aforementioned high contribution of variance from random initializations. Here, we observe that the resulting convergence is significantly faster.
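For reference, a minimal sketch of how this error can be estimated from a single simulation run, under our reading of the displayed error as an $L^1$ norm summed over time (averages over realizations are taken as in the figures):

```python
import numpy as np

def l1_error(empirical_hist, mean_field_hist):
    """empirical_hist, mean_field_hist: arrays of shape (T, |X|) holding the
    empirical state distribution and the aggregated mean field per time step."""
    return np.abs(np.asarray(empirical_hist) - np.asarray(mean_field_hist)).sum()
```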

Last, in Fig. 7, we demonstrate some interesting non-linear behavior for a two-layer setting where both layers consist of three-uniform hypergraphs. Here, for the first layer, we use the block hypergraphon

for $p=0.5$, while for the second layer, we again use the inverted uniform attachment hypergraphon. In other words, we have a structure of two blocks on the first layer, while the second layer is more globally connected. Furthermore, we initialize the rumor in the second block where $\alpha>0.5$, i.e., $\mu_0^\alpha(A)=\mathbf{1}_{(0.5,1]}(\alpha)$. As we can see in Fig. 7, in the beginning, the rumor spreads in the second block $\alpha>0.5$ where it originated. After a while, however, the rumor begins to spread faster in the first block $\alpha\le 0.5$, since nodes with low $\alpha$ are significantly more interconnected on the second layer.

Overall, we can see that multi-layer hypergraphon mean field games allow for more complex behavior and modeling of connections than a single-layer graphon approach.

## V. CONCLUSION

In this work, we introduced a model for dynamical systems on hypergraphs that can describe agents with weak interaction via the graph structure. The model allows for a rigorous and simple mean field description that has a complexity independent of the number of agents. We verify our approach both theoretically and empirically on a rumor spreading example. By introducing game-theoretical ideas, we, thus, obtain a framework for solving otherwise intractable large-scale games on hypergraphs in a tractable manner.

We hope our work forms the basis for several future works, e.g., extensions to directed or weighted hypergraphs in order to generalize to arbitrary network motifs,^{80} adaptive networks,^{81} cooperative control, or the consideration of edge states in addition to the vertex states we have considered in this work. Furthermore, it may be of interest to consider graph models with more adjustable clustering parameters. An extension of our rumor model and theory to continuous-time models could be fruitful. Finally, so far our work remains restricted to dense graphs and deterministic limiting graphons, while in practice this is not always the case (e.g., preferential attachment graphs^{82}). Here, $L^p$ graphons^{72–74} could provide a description for less dense cases, which are of great practical interest. We also hope that our work inspires future applications in inherently (hyper-)graphical scenarios.

## ACKNOWLEDGMENTS

This work has been funded by the LOEWE initiative (Hesse, Germany) within the emergenCITY center. The authors also acknowledge support by the German Research Foundation (DFG) via the Collaborative Research Center (CRC) 1053—MAKI. W.R.K. received no specific grant for this research from any funding agency in the public, commercial, or not-for-profit sectors. The authors acknowledge the Lichtenberg high performance computing cluster of the TU Darmstadt for providing computational facilities for the calculations of this research.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Kai Cui:** Investigation (equal); Writing – original draft (equal). **Wasiur R. KhudaBukhsh:** Supervision (equal); Writing – review & editing (equal). **Heinz Koeppl:** Funding acquisition (equal); Supervision (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

### APPENDIX A: PROOFS

#### 1. Proof of Theorem 1

Under our assumptions, we can verify Assumption 1 of Ref. 68 for the equivalent standard mean field game given by (19) as in Cui and Koeppl.^{57} By Theorem 3.3 of Ref. 68, there exists a mean field equilibrium $(\tilde\pi,\tilde\mu)$ for (19). The policy $\tilde\pi$ is $\alpha$-a.e. optimal under the mean field $\tilde\mu$ by Theorem 3.6 of Ref. 68. For all other $\alpha$, there trivially exists an optimal action, i.e., we can change $\tilde\pi$ such that it is optimal for all $\alpha$. Since the change is on a null set of $I$, $(\tilde\pi,\tilde\mu)$ remains a mean field equilibrium. Define the hypergraphon mean field policy $\pi$ by $\pi_t^\alpha(u\mid x)=\tilde\pi_t(u\mid x,\alpha)$; then $\pi$ is optimal under the hypergraphon mean field $\mu$, where $\mu=\Psi(\pi)$, since $\mu_t^\alpha=\tilde\mu_t(\cdot,\alpha)$ for almost every $\alpha$. Finally, both $\pi$ and $\mu$ are measurable. Therefore, we have proven existence of the HMFE $(\pi,\mu)$.

#### 2. Proof of Theorem 2

In this section, we provide the full proof of Theorem 2. In contrast to prior work, such as Cui and Koeppl,^{57} we (i) extend existing mean field convergence results to $n$-fold products of the state distributions; and (ii) replace the state distributions by their symmetrized version, in order to obtain convergence results under the generalized cut norm (1). Propagating these changes forward, the rest of the proof is (somewhat) readily generalized and given in the following.

To begin, we introduce some notation to improve readability. Define the $D$-dimensional neighborhood mean fields $\nu_W^{\alpha,\mu}$ with $d$th component

for all $\mu\in\mathcal{P}(\mathcal{X})^I$, $W:=(W_1,\ldots,W_D)\in\times_{d=1}^D\mathcal{W}_{k_d}$, as well as the transition operator $P_W^{t,\pi,\mu}:\mathcal{P}(\mathcal{X})^I\to\mathcal{P}(\mathcal{X})^I$ such that

for all $\mu'\in\mathcal{P}(\mathcal{X})^I$, $\pi\in\mathcal{P}(\mathcal{U})^{\mathcal{X}}$, such that, e.g.,

Since $g$ is a measurable function bounded by $M_f$, due to the prequel it suffices at any time $t\in\mathcal{T}$ to prove (23) for $n=1$, which will imply the statement for all $n\in\mathbb{N}$.

#### 3. Proof of Theorem 3

The proof of Theorem 3 mirrors the proof in Ref. 57 apart from propagating the multi-dimensional convergence results forward, and we give the entire proof for completeness and convenience. Again, we introduce some notation to improve readability. For any $\alpha\in I$, $d\in[D]$, define maps $\nu_d^\alpha:\mathcal{P}(\mathcal{X})^I\to\mathcal{P}(\mathcal{X})$ and $\nu_{N,d}^\alpha:\mathcal{P}(\mathcal{X})^I\to\mathcal{P}(\mathcal{X})$ as

with $D$-dimensional shorthands

such that by definition $\nu_t^\alpha=\nu^\alpha(\mu_t)$ and $\nu_t^{N,i}=\nu_N^{i/N}(\mu_t^N)$.

This completes the proof of (26) $\implies$ (27) at any time $t$, since by the prequel, the intersection of all correspondingly chosen, finitely many sets $J_i^N$ for sufficiently large $N$ has at least $N-\sum_i\lceil p_i N\rceil$ elements, which is always larger than $N-\lceil pN\rceil$ for any $p>0$ by choosing $p_i$ sufficiently small.

#### 4. Proof of Corollary 1

#### 5. Proof of Corollary 2

### APPENDIX B: ADDITIONAL EXPERIMENTS

In Fig. 8, we show additional results for the Rumor problem and inverted three-uniform hypergraphons. There, we find results that are almost the inverse of those in Fig. 4, indicating that the influence of connections from the second layer is more important under the given problem parameters. However, we note that, surprisingly, the highest awareness is reached for intermediate $\alpha$.

As an additional example, in the timely SIS problem, we assume that there exists an epidemic that spreads to neighboring nodes according to the classical SIS dynamics (see, e.g., Ref. 83). Analogously, we may consider extensions to arbitrary variations of the SIS model, such as susceptible-infected-recovered (SIR) or susceptible-exposed-infected-recovered (SEIR). Each healthy (or susceptible, $S$) agent can take costly precautions ($P$) to avoid becoming infected ($I$) or ignore precautions ($\bar P$) at no further cost. Since being infected is itself costly, an equilibrium solution must balance the expected cost of infections against the cost of taking precautions.

Formally, we define the state space $\mathcal{X}=\{S,I\}$ and action space $\mathcal{U}=\{\bar P,P\}$ such that

with infection rates $\tau_d>0$, $\sum_d\tau_d\le 1$, recovery rate $\delta\in(0,1)$, and rewards $R(x,u,\cdot)=-c_P\mathbf{1}_{\{P\}}(u)-c_I\mathbf{1}_{\{I\}}(x)$ with precaution and infection costs $c_P>0$, $c_I>0$. In our experiments, we will use $\tau_d=0.8$, $\delta=0.2$, $c_P=0.5$, $c_I=2$, $\mu_0(I)=0.5$, and $\mathcal{T}=\{0,1,\ldots,49\}$.
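An illustrative sketch of these dynamics and rewards follows (our reading of the model; in particular, precautions fully preventing infection is an assumption made here for illustration, and the displayed equations above are authoritative):

```python
def sis_transition(x, u, q_infected):
    """Probability of being infected at the next epoch (sketch).

    x in {"S", "I"}; u in {"P", "not_P"}; q_infected: per layer, the probability
    that a random hyperedge neighbor is infected."""
    taus, delta = [0.8], 0.2          # tau_d (one entry per layer), recovery rate
    if x == "I":
        return 1.0 - delta            # recover with probability delta
    if u == "P":
        return 0.0                    # precautions prevent infection (assumption)
    return min(1.0, sum(tau * q for tau, q in zip(taus, q_infected)))

def sis_reward(x, u):
    """Per-step reward; signs chosen so the maximized objective penalizes both
    taking precautions and being infected."""
    c_P, c_I = 0.5, 2.0
    return -c_P * (u == "P") - c_I * (x == "I")
```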

Existing state-of-the-art approaches such as online mirror descent (OMD)^{55} (and similarly fictitious play, see, e.g., Ref. 19), as depicted in Figs. 9 and 10 for ten discretization points, did not converge to an equilibrium within the considered $2000$ iterations, though we expect that the methods will converge when run for significantly more iterations (e.g., $400\,000$ iterations as in Ref. 58), which we could not verify here due to the computational complexity. We expect that existing standard results using monotonicity conditions^{38,55} can be extended to the hypergraphon case in order to guarantee convergence of the aforementioned learning algorithms. However, this remains outside the scope of our work. In particular, for the ranked attachment graphon and hypergraphon, the final behavior as seen in Fig. 9 retains an average final exploitability $\Delta J$ above $0.25$, which is defined as

and must be zero for an exact equilibrium.
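For reference, exploitability here is the usual mean field game metric: the average gap between the best attainable objective and the objective achieved by the current policy under its induced mean field. A generic form (our transcription of the standard definition; the exact display appears above) reads

$$ \Delta J = \int_I \Big( \max_{\tilde\pi} J_\alpha^{\mu}(\tilde\pi) - J_\alpha^{\mu}(\pi^\alpha) \Big)\,\mathrm{d}\alpha, \qquad \mu=\Psi(\pi). $$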

## REFERENCES

*Emerging Frontiers in Nonlinear Science*(Springer, 2020), pp. 131–159.

*2019 IEEE 58th Conference on Decision and Control (CDC)*(IEEE, 2019), pp. 6479–6486.

*Learning for Dynamics and Control*(PMLR, 2020), pp. 256–266.

*Proceedings of the AAAI Conference on Artificial Intelligence*(AAAI Press, 2020), pp. 7143–7150.

*International Conference on Artificial Intelligence and Statistics*(PMLR, 2021), pp. 1909–1917.

*2021 60th IEEE Conference on Decision and Control (CDC)*(IEEE, 2021), pp. 3048–3053.

*53rd IEEE Conference on Decision and Control*(IEEE, 2014), pp. 1669–1674.

*2021 IEEE 60th Conference on Decision and Control (CDC)*(IEEE, 2021), pp. 5239–5246.

*Paris-Princeton Lectures on Mathematical Finance 2010*(Springer, 2011), pp. 205–266.

*Encyclopedia of Systems and Control*(Springer, 2021), pp. 1197–1202.

*2016 IFIP Networking Conference (IFIP Networking) and Workshops*(IFIP Open Digital Library, 2016), pp. 386–394.

*2016 35th Chinese Control Conference (CCC)*(IEEE, 2016), pp. 9218–9223.

*2017 IEEE 56th Annual Conference on Decision and Control (CDC)*(IEEE, 2017), pp. 1052–1057.

*Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems*(International Foundation for Autonomous Agents and Multiagent Systems, 2019), pp. 251–259.

*2021 American Control Conference (ACC)*(IEEE, 2021), pp. 730–736.

*International Conference on Learning Representations*(ICLR, 2022).

*et al.*, arXiv:2203.11973 (2022).

*2018 IEEE Conference on Decision and Control (CDC)*(IEEE, 2018), pp. 4129–4134.

*2019 IEEE 58th Conference on Decision and Control (CDC)*(IEEE, 2019), pp. 286–292.