Interpersonal trust: Asymptotic analysis of a stochastic coordination game with multi-agent learning

We study the interpersonal trust of a population of agents, asking whether chance may decide if a population ends up in a high-trust or low-trust state. We model this by a discrete-time, random-matching stochastic coordination game. Agents are endowed with an exponential smoothing learning rule about the behaviour of their neighbours. We find that, with probability one, in the long run the whole population either always cooperates or always defects. By simulation we study the impact of the distributions of the payoffs in the game and of the exponential smoothing learning (memory of the agents). We find that as the agent memory increases or as the size of the population increases, the actual dynamics start to resemble the expectation of the process. We conclude that it is indeed possible that different populations may converge upon high or low trust between their citizens simply by chance, though the game parameters (context of the society) may be quite telling.

We are interested in a finite population playing a stochastic coordination game in random pairwise interactions, which we use to model an interaction requiring trust. Consider an agent's point of view: 'If others are abusing trust I prefer not to act trustingly, while if others are honorable, then I would prefer to trust them.' Agents ideally coordinate on either trusting/acting honorably or on distrusting/abusing trust. The agents in our model are endowed with an adaptive learning rule. Models of learning in evolutionary game theory often assume that the agents' opponent(s) are playing a fixed strategy and aim to learn this from experience. This makes little sense when all the agents are employing that learning rule (and thus updating their strategy). Our learning rule instead is an exponentially weighted moving average of the agents' observations (also called simple exponential smoothing), which does not assume that the opponents' strategy is fixed. Our model of a stochastic coordination game does away with fixed payoffs with noise, and instead allows for a very broad category of random variables as payoffs, with very mild conditions on their distributions. This is a new approach in the context of agents who learn based on experience, though common in the study of the replicator equation. Another distinguishing element is that the stochastic game is dynamic, as the payoffs are drawn at random anew at each time step. We prove convergence of belief and behavior in the long run to pure Nash equilibria (always trust/act honorably, or always distrust/abuse trust). We conclude with simulations to explore the relationship between the model parameters and both the relative probability of convergence to the trustful steady state and the rate of convergence. We see the surprising result that a shorter agent memory (i.e., more weight on the most recent observation) dampens the effect of the payoff distributions on the model outcomes.

I. INTRODUCTION
Trust is beneficial to a society's functioning, yet the amount of trust is not uniform across societies around the world [1]. Is there an inherent difference between the societies which exhibit high trust and those that exhibit low trust? In Denmark, it is common practice to leave sleeping babies in their prams outside on the pavement while their parents are shopping inside [2]. In contrast, one need only look around and observe the widespread electric fences and infrared alarm systems in a typical South African neighborhood to see that its citizens do not believe each other to be trustworthy.
arXiv:2402.03894v3 [physics.soc-ph] 17 May 2024
What are the antecedents of low and high trust? Is there some insurmountable difference between the citizens of Denmark and those of South Africa, or is it possible that these differences are matters of chance? Furthermore, we ask what effect the structural properties of the interaction have on the probability of low or high trust emerging.
We apply the tools of evolutionary game theory with agent learning to the question of interpersonal trust. Instead of studying an N-person trust game [3] as in [4][5][6][7][8], we make a simplification. In particular, we study a 2-person coordination game played by two random agents drawn repeatedly from a population of N agents. This allows us to study what happens if both agents involved in the interaction learn from the interaction. This approach also illuminates the dynamics that occur for non-coordination games in which agents have positive externalities (utility derived from taking the same action as other players). We consider the problem of placing trust and acting in a trustworthy manner in society as a coordination game. We condense the matter of placing trust and honoring placed trust into the action trust. Similarly, not placing trust and abusing others' trust are condensed into the action doubt. Agents are happy to act trustingly while others are acting in a trustworthy manner, but prefer to act distrustfully when their neighbors are abusing trust; hence the payoffs follow those of a coordination game. In the model, a population of agents interacts by a random matching of two agents per round. These agents take the action which they believe maximizes their one-round expected utility. This expectation is based on their belief of 'the probability that a random other in the population acts trustingly,' and on the randomly realized payoff in their stochastic coordination game. In this fictitious play-like [9] model of learning, agents update their belief based on the exponential moving average rule.
We prove convergence of behavior and beliefs to the always-trust or always-doubt steady states in the long run. Furthermore, we highlight the impact that the exponential smoothing learning parameter has on the relative probability of convergence to the always-trust versus the always-doubt steady state. Surprisingly, a greater learning rate (akin to a shorter memory of the agents) has a dampening effect on the impact of the game payoff parameter distributions. Conversely, based on our numerical simulations, we conjecture that a learning rate that approaches zero (an infinite memory) may create a phase transition between the always-trust and the always-doubt steady states depending on the distributions of the payoff variables.
We note the rich history of work on fully stochastic payoff matrices [31][32][33][34][35][36][37][38][39][40][41][42]. In particular, there is a line of work on the statistics of the replicator (or replicator-mutator) dynamics under a random draw of the game payoff matrix [34,35]. This is in contrast with [36][37][38][39], which study the replicator dynamics in a changing environment (i.e., the payoff matrix is drawn in each generation from some distribution). The replicator equation models a process of selection in which the reproductive fitness of a species (or strategy) is equal to its average payoff in the game. The object of interest in such studies is the number of (mixed) equilibria and their stability. Learning by imitation has also been studied in the context of a stochastic payoff matrix [40][41][42]. In such models agents choose a neighbor and compare their most recent payoffs. If their neighbor has a greater payoff, the agent adopts their neighbor's strategy with a given probability. In this branch of the literature one looks for elements in the randomness or the population structure which either promote or inhibit the emergence of trust.
The context of our model is experience-based learning, i.e., the agents learn about the population from the strategies they observe their opponents using, and use this information to optimize their actions. Examples of this kind of learning with payoff matrices which are either static or perturbed by small noise can be found in [43][44][45][46] and [19,27], respectively. The fact that agents converge to the always-trust or the always-doubt state complements (is in qualitative agreement with) the existing results of convergence to pure Nash equilibria [47] in the context of small perturbations and a decreasing learning rate in [17][18][19][48].
Our model illustrates that a small population may, by chance, end up in a low or a high trust state. As the size of the population grows, however, chance plays less of a role and the outcome is almost entirely decided by the distributions of the payoff parameters. The distributions of the payoff variables represent the context that a population is in. In situations where the payoff distributions promote doubting (low trust), it would behoove the individuals to have a shorter memory so that happy mistakes could lead to long-term trust.
Thus, because neither South Africa nor Denmark has a 'small' population, our model leads us to believe that the low and high trust observed, respectively, may be largely due to the structural properties of interactions. By this we mean the learning rate or the distributions of the payoffs in the coordination game. Our stylized mathematical model is highly abstract and aims to capture the essence of the underlying dynamics. Evidently, in the (inherently more complex) real dynamics, there are various other differences between South Africa and Denmark.
In §II we formulate the model formally, which we follow in §III with an asymptotic analysis. This culminates in the main theorem regarding convergence of belief and behavior of the agents. In §IV we provide an illustration of the model with only two agents, which highlights the perhaps surprising nature of our main results. Having identified the asymptotic behavior of the dynamics, we introduce and discuss the results of a simulation study in §V. The simulation serves to elucidate how the chance of convergence to high or low trust behavior depends on the parameters. We conclude in §VI with a discussion of the results and possible future work.

II. MODEL
We consider a population of N ∈ N (N ≥ 2) agents who engage in a game played repeatedly in discrete rounds indexed by t ∈ N. In this paper, N represents the set of natural numbers 1, 2, 3, .... At the start of each round t ∈ N, a pair of agents (I(t), J(t)) is chosen uniformly at random from the set of ordered pairs {(i, j) ∈ {1, ..., N}^2 : i ≠ j}. This may either be interpreted as if the population is fully mixed and there is no population structure, or equivalently as if the structure imposed on the population is the fully connected network. The chosen pair of agents play a 2 × 2 coordination game. In this game I(t) takes the role of agent k = 1 (the row agent) and J(t) takes the role of agent k = −1 (the column agent). We define
$$g(k,t) := \begin{cases} I(t), & k = 1,\\ J(t), & k = -1. \end{cases}$$
This allows us to retrieve an agent's index within the population given their role in the game. The action of agent k ∈ {1, −1} at time t ∈ N is denoted A_k(t) ∈ {T, D}. The action T denotes trusting and D denotes doubting. We define the payoff bimatrix of the game as Π(t) at time t ∈ N:
The first value of the matrix entry Π_{l,m}(t) is the reward obtained by the row agent when playing the action in row l against a column agent who plays the action in column m during round t ∈ N. Conversely, the second value of the matrix entry Π_{l,m}(t) is the reward obtained by the column agent when playing the action in column m against a row agent who plays the action in row l during round t ∈ N.
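The random matching step can be sketched with Python's standard `random` module, assuming agents are labelled 1, ..., N; the helper names `draw_pair` and `role_to_index` are hypothetical. Because `random.Random.sample` returns distinct elements in selection order, every ordered pair (i, j) with i ≠ j is equally likely, which is the uniform draw of (I(t), J(t)) described above, and `role_to_index` plays the part of g(k, t).

```python
import random
from collections import Counter


def draw_pair(n_agents, rng):
    """Draw (I(t), J(t)) uniformly from the ordered pairs (i, j), i != j, of agents 1..n_agents."""
    i, j = rng.sample(range(1, n_agents + 1), 2)
    return i, j


def role_to_index(k, pair):
    """The map g(k, t): role k = 1 is the row agent I(t), role k = -1 the column agent J(t)."""
    i, j = pair
    return i if k == 1 else j


# Usage: empirical frequencies of the N(N - 1) = 12 ordered pairs for N = 4.
rng = random.Random(0)
counts = Counter(draw_pair(4, rng) for _ in range(120_000))
```

With N = 4, each of the 12 ordered pairs should appear in roughly 1/12 of the draws.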
We define Π_k(t) as the matrix containing only the payoffs to agent k ∈ {−1, 1} at time t ∈ N.
We assume that the payoffs in (1) satisfy (2) and (3) for k ∈ {1, −1} and all t ∈ N.
Remark 1. The restrictions (2) and (3) are stronger than what we need for our results. Our analysis holds whenever and for all t ∈ N, k ∈ {1, −1}. We add the assumptions (2) and (3) to show the relevance of the analysis to a coordination game.
In our setting, the agents model the population behavior with a belief on the probability that a randomly chosen individual that they play against would trust. Let x_i(t) denote the belief of agent i ∈ {1, ..., N} at the beginning of round t ∈ N on the likelihood that they will encounter an agent playing trust. The vector holding the beliefs of the agents in the population at time t is
$$x(t) = \big(x_1(t), x_2(t), \ldots, x_N(t)\big) \in [0,1]^N.$$
At the start of each round, each agent k ∈ {1, −1} only observes their own payoffs Π_k(t). Furthermore, the agents are not aware that their opponent's payoffs follow the structure of a coordination game. The agents model their opponent's behavior entirely by their belief on the likelihood that their opponent takes the trust action. We define u^T_k(t) and u^D_k(t) as the expected utility for agent k ∈ {1, −1} playing T and D, respectively, during round t ∈ N based on agent k's belief:
Myopic decision making is a common assumption [49][50][51][52][53][54]. Furthermore, there is experimental work suggesting that humans indeed act at least semi-myopically [55,56]. As such, we assume the agents take actions myopically:
Assumption 1 (Myopic rationality). We assume agents to be myopically rational, taking the action which maximizes the 1-round expected utility, with ties favoring trust:
$$A_k(t) = \begin{cases} T, & u^T_k(t) \ge u^D_k(t),\\ D, & \text{otherwise}, \end{cases}$$
for t ∈ N and k ∈ {1, −1}.
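To make Assumption 1 concrete, here is a minimal sketch assuming the standard form of expected utility in a 2 × 2 game: with belief x that the opponent trusts, u^T = x·π(T,T) + (1 − x)·π(T,D), and similarly for D. The nested-dict layout `payoffs[own][opponent]`, the function names, and the example numbers are illustrative assumptions, not values from the model.

```python
def expected_utilities(belief, payoffs):
    """Return (u_T, u_D) for an agent who believes the opponent trusts with probability `belief`.

    payoffs[a][b] is this agent's realised payoff for playing a against b,
    with actions "T" and "D" (hypothetical layout).
    """
    u_t = belief * payoffs["T"]["T"] + (1 - belief) * payoffs["T"]["D"]
    u_d = belief * payoffs["D"]["T"] + (1 - belief) * payoffs["D"]["D"]
    return u_t, u_d


def myopic_action(belief, payoffs):
    """Assumption 1: maximise the one-round expected utility, ties favour trust."""
    u_t, u_d = expected_utilities(belief, payoffs)
    return "T" if u_t >= u_d else "D"


# One illustrative draw of a coordination-game payoff matrix for a single agent.
example_payoffs = {"T": {"T": 3.0, "D": 0.0}, "D": {"T": 1.0, "D": 2.0}}
```

For this draw the agent trusts when its belief is at least 0.5 (where u^T = u^D = 1.5 and the tie goes to trust) and doubts below it.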
By letting ties favor trust we obtain the weak inequality. We do this to facilitate a clean analysis. It should be noted, however, that the event in which equality holds has probability zero because the payoffs are continuous random variables. We will often be interested in the case when both agents take the same action. As such we define A(t) for t ∈ N without subscript as:
When modeling agents that learn in the context of game theory, there is a variety of learning mechanisms which may be implemented. For example, the agents could implement a basic form of reinforcement learning, which would be an adjustment to our Assumption 1, combining belief and action on belief into one process. Alternatively, agents could make repeated application of Bayes' rule given the interactions they observe, or use a stochastic approximation algorithm [57] to update their belief on the probability that a random opponent will trust. The agents in our model use the exponential moving average [58] of their experiences. This is a straightforward but powerful rule of thumb applied in signal processing. We introduce ᾱ := 1 − α for improved legibility of future equations.
Assumption 2 (Belief updating). The agents that are selected to play a game in round t update their belief based on the outcome of that game using exponential smoothing with learning rate α ∈ (0, 1):
$$x_{g(k,t)}(t+1) = \bar{\alpha}\, x_{g(k,t)}(t) + \alpha\, \mathbb{1}\{A_{-k}(t) = T\}$$
for k ∈ {1, −1} and t ∈ N. All other agents i ∈ {1, ..., N} \ {I(t), J(t)} retain their most recent belief:
$$x_i(t+1) = x_i(t).$$
We make this assumption mainly in order to facilitate a clean analysis. Furthermore, we believe this belief updating rule is simple enough that it is not overly unrealistic to assume that people might learn in a similar fashion.
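A minimal sketch of one application of Assumption 2, assuming beliefs are stored in a dict keyed by agent index and actions are the strings "T" and "D" (hypothetical conventions). Each selected agent moves a fraction α of the way towards 1 if its opponent trusted and towards 0 otherwise; all other beliefs are copied unchanged.

```python
def update_beliefs(beliefs, pair, actions, alpha):
    """Return the beliefs x(t + 1) after round t.

    beliefs: dict mapping agent index -> x_i(t)
    pair:    (I(t), J(t)), the row and column agents
    actions: (A_1(t), A_-1(t)), the actions of the row and column agents
    """
    i, j = pair
    a_row, a_col = actions
    abar = 1 - alpha
    new = dict(beliefs)  # agents not selected keep x_i(t + 1) = x_i(t)
    new[i] = abar * beliefs[i] + alpha * (1.0 if a_col == "T" else 0.0)
    new[j] = abar * beliefs[j] + alpha * (1.0 if a_row == "T" else 0.0)
    return new


# The row agent 1 trusts, the column agent 2 doubts; agent 3 sits out.
x_next = update_beliefs({1: 0.5, 2: 0.5, 3: 0.2}, pair=(1, 2), actions=("T", "D"), alpha=0.1)
```

Agent 1 observed doubt and moves to 0.45, agent 2 observed trust and moves to 0.55, and agent 3 keeps 0.2.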
We note the work of Sato and collaborators [43][44][45], which also features updates according to the exponential moving average. An important difference between that work and ours is that we consider the discrete and random dynamics, while they make use of a separation of time scales which forces the actual dynamics to resemble the expectation. This last element is, as we shall see, emphatically not present in our model.
The learning rate α is the weight of an agent's most recent observation and may thus be interpreted as an inverse measure of the agent's memory. A greater weight on the most recent observation means that earlier observations weigh less and thus are more quickly forgotten.
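Unrolling the update shows why α acts as an inverse memory: after n updates, the observation from j updates ago carries weight αᾱ^j and the initial belief carries ᾱ^n. The sketch below computes these weights and an illustrative "half-life" (the number of updates after which a weight has halved); the half-life and the function names are our own summary devices, not quantities used in the analysis.

```python
import math


def observation_weights(alpha, n):
    """Weights alpha * (1 - alpha)**j on the j-th most recent of n observations (j = 0 is the latest)."""
    return [alpha * (1 - alpha) ** j for j in range(n)]


def half_life(alpha):
    """Number of belief updates after which the weight of an observation has halved."""
    return math.log(0.5) / math.log(1 - alpha)


w = observation_weights(0.1, 50)
# The observation weights plus the weight (1 - alpha)**n left on the initial belief sum to one.
total = sum(w) + (1 - 0.1) ** 50
```

A smaller α gives a longer half-life, i.e., a longer memory: roughly 69 updates for α = 0.01 against a single update for α = 0.5.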
To an outsider who has not observed Π_k(t) for k ∈ {1, −1} during round t ∈ N, the action taken looks random. Specifically, if the outsider is privy to the distributions of U_k, Y_k, V_k and W_k, as well as the belief x_k(t), then the probability that agent k ∈ {1, −1} plays trust in round t ∈ N is defined as
$$p_k(t) := \mathbb{P}\big(u^T_k(t) \ge u^D_k(t) \mid x_k(t)\big). \qquad (6)$$
To clarify, for the agent k ∈ {−1, 1} who knows the values in Π_k(t) and their own belief x_k(t), the truth of the inequality u^T_k(t) ≥ u^D_k(t) is not random. For an outsider who is aware of the belief x_k(t) and the distributions but not the realizations of the payoffs, the agent's actions seem random and follow the probability defined in (6). The behavior defined in (6) may be restated for k ∈ {1, −1} and t ∈ N as
$$p_k(t) = \mathbb{P}\big(Z_k(t) \le x_k(t)\big),$$
for which we define Z_k(t) as the random variable:
for k ∈ {−1, 1}, t ∈ N. We acknowledge the freedom we have in defining a cdf for Z_k(t) by allowing correlation between its constituent random variables.
Using this freedom, we continue the discussion and analysis using Z_k(t) and define its cdf
$$F(x) := \mathbb{P}\big(Z_k(t) \le x\big),$$
which then also defines agent behavior p_k(t) = F(x_k(t)) for k ∈ {1, −1} and t ∈ N. We make the following assumption on F, which is essentially of a technical nature.
This assumption implies that if an agent believes that their opponent will trust (or doubt) with probability 1, then they too will trust (or doubt) with probability 1. Furthermore, since F(·) is a cdf and therefore non-decreasing, an agent with a higher belief is at least as likely to trust as an agent with a lower belief. Because U_k(t), Y_k(t), V_k(t) and W_k(t) are each iid across rounds t ∈ N as well as across players k ∈ {−1, 1}, the cdf F(·) does not change from round to round or from player to player.
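The outsider's view can be illustrated by Monte Carlo. The sketch assumes, for illustration only, that Z_k(t) is uniform on [0, 1], so that F(x) = x, F(0) = 0 and F(1) = 1. An agent with belief x trusts exactly when the drawn Z_k(t) ≤ x, so the fraction of trusting draws estimates p_k(t) = F(x_k(t)). The names `trust_frequency` and `uniform_z` are hypothetical.

```python
import random


def trust_frequency(belief, sample_z, n_draws, rng):
    """Fraction of independent rounds in which an agent with a fixed belief trusts, i.e. Z <= belief."""
    return sum(sample_z(rng) <= belief for _ in range(n_draws)) / n_draws


def uniform_z(rng):
    """Hypothetical payoff-derived variable Z ~ Uniform(0, 1), so that F(x) = x."""
    return rng.random()


rng = random.Random(1)
estimates = {x: trust_frequency(x, uniform_z, 100_000, rng) for x in (0.0, 0.3, 0.8, 1.0)}
```

Each estimate should be close to the belief itself, since F(x) = x for this choice of Z.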
The model we have described is illustrated by the flow chart in Figure 1, which shows the process that the two randomly chosen agents follow in one given round. We are interested in the evolution of the belief vector x(t) for t ∈ N. Specifically, we ask whether beliefs converge to an equilibrium state.
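The round described by the flow chart can be simulated directly. The sketch below assumes, purely for illustration, that Z_k(t) is drawn by a user-supplied function `sample_z` and that an agent trusts iff Z_k(t) ≤ x (so it trusts with probability F(x)). It stops at the first round in which all beliefs lie in [0, ϵ]^N or [1 − ϵ, 1]^N. Entering an ϵ-corner is only a proxy for absorption, which by Lemmas 1 and 2 occurs with positive probability but is not guaranteed at first entry.

```python
import random


def simulate(x0, alpha, sample_z, eps=0.01, max_rounds=1_000_000, seed=None):
    """Run the belief dynamics until every belief lies within eps of the same corner.

    x0:       initial beliefs, one per agent
    sample_z: function rng -> Z; an agent with belief x trusts iff Z <= x
    Returns (corner, rounds, beliefs) with corner in {"trust", "doubt", None}.
    """
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    for t in range(1, max_rounds + 1):
        i, j = rng.sample(range(n), 2)         # random matching (I(t), J(t))
        trust_i = sample_z(rng) <= x[i]        # myopic actions, decided before any update
        trust_j = sample_z(rng) <= x[j]
        x[i] = (1 - alpha) * x[i] + alpha * float(trust_j)   # exponential smoothing
        x[j] = (1 - alpha) * x[j] + alpha * float(trust_i)
        if all(b <= eps for b in x):
            return "doubt", t, x
        if all(b >= 1 - eps for b in x):
            return "trust", t, x
    return None, max_rounds, x


corner, rounds, beliefs = simulate([0.5] * 6, alpha=0.1, sample_z=lambda rng: rng.random(), seed=3)
```

Repeating the run over many seeds and counting the "trust" outcomes gives a rough estimate of how often the population ends in the always-trust corner for a given α, N and choice of Z.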

III. ASYMPTOTIC ANALYSIS
In this section we analyze the long-term dynamics of the process in the limit as t → ∞. We first show that it is possible for the N agents to be absorbed (never exit an ϵ-ball) at x = 0 and at x = 1, where agent behavior converges to always-doubt and always-trust, respectively. Subsequently, we show that the process may reach these corners from the interior of the state space. Finally, we combine these two sub-results in a Borel-Cantelli argument which proves that the process converges to one of these corners in the long run with probability one. Intuitively, this is explained by the fact that in a coordination game the agents 'try' to coordinate their behavior. Thus, under learning and myopic rationality, it is rational for a population to always trust or always doubt, as this guarantees 100 percent coordination.
We first state the main theorem; subsequently we state and prove the lemmas required in its proof. We end this section by providing the proof of Theorem 1.
In the statement of the main theorem, we use Lagrange notation to define F (n) (x) as the n-th derivative of F at x.
Theorem 1 (Absorption in a corner is guaranteed).
If α > ϵ, and which may be written as
The conditions on F in this theorem are mild. All that we require of F is that there exist finite integers n_1 and n_2 such that the n_1-th derivative of F at x = 0 exists and is finite, and that the n_2-th derivative of F at x = 1 exists and is finite.

A. Absorption is possible
The first lemma we need pertains to the possibility of a population converging in an ϵ-ball around zero:
Lemma 1 (Absorption at zero for N agents). Let α ∈ (0, 1), ϵ ∈ (0, α), and suppose x(t_0) ∈ [0, ϵ]^N for some t_0 ∈ N, then
In most of the probabilities that we write there is a condition on the state of the system at t_0. In order to fit equations into the available space, as well as for general legibility, we define the following notation:
Proof. In the new notation, we will show that
We prove this by induction. As the base case for our induction we prove that absorption is possible for N = 2. When x(t) ∈ [0, ϵ]^2, even one agent playing T in some round t_1 ∈ N, t_1 ≥ t_0, implies that x(t_1 + 1) ∉ [0, ϵ]^2, because xᾱ + α ≥ α > ϵ for all x ∈ [0, 1], α ∈ (0, 1) and ϵ ∈ (0, α). Thus the probability of remaining in [0, ϵ]^2 for all t = t_0, t_0 + 1, ..., t_0 + n is the probability of both agents playing doubt in all rounds from t_0 until (and including) round t_0 + n:
Note that we are interested in the limit of the above expression as n → ∞, because the agents are required to play D in all future rounds. The probability that agent 1 plays D in round t is given by 1 − F(x_1(t)).
For legibility we define F̄(·) := 1 − F(·). Because the agents' actions during any round are independent, the above probability can be restated as
In order to simplify the expression we define
Let z_0 := max{x_1(t_0), x_2(t_0)} and subsequently z_m := ᾱ^m z_0 for all m ∈ N. We bound (15) by
because z_m ≥ x_i(t_0)ᾱ^m for all m ∈ N, and because F(x) ≥ F(y) (equivalently F̄(x) ≤ F̄(y)) as long as x ≥ y, by virtue of F being a distribution function. We take the logarithm of both sides as well as the limit as n → ∞, and change the logarithm of a product on the right into a sum of logarithms:
We intend to bound the right hand side of (17). To do this, observe first that e^{−x} ≥ 1 − x. This may be checked by evaluating the tangent of e^{−x} at x = 0 and noting that e^{−x} is convex and so remains above this tangent line. We rearrange to obtain 1 − e^{−x} ≤ x for all x ∈ R.
Figure 1: A flow chart of the model. This shows the process which the pair of randomly selected agents goes through in one round. In each round two agents are drawn from the population at random to play the stochastic coordination game once against each other. Each of these agents observes an independent draw of the random payoff matrix and compares this with their belief. If their expected reward for doubting is greater than that for trusting, then they doubt; otherwise they trust. Then each of these agents adjusts its belief based on the action taken by its respective opponent and returns to the pool of agents that may be selected in the next round.
We substitute x = log(y) for y > 0 to get 1 − e^{−log(y)} ≤ log(y), which simplifies to
$$1 - \frac{1}{y} \le \log(y), \qquad y > 0.$$
We use the above inequality with y = F̄(z_m) to bound each term in the sum from below by
$$\log \bar F(z_m) \ge 1 - \frac{1}{\bar F(z_m)} = \frac{F(z_m)}{F(z_m) - 1}.$$
Substituting this into the sum in (17) gives
In order to invoke Abel's convergence test (see for example [59]) we note that the sum in (19) is the sum of the termwise products of the sequences {F(z_m)} and {1/(F(z_m) − 1)}. The second of these is bounded from above by −1 and is monotone increasing in m. This is because F(z_m) is monotone decreasing in m and bounded from below by 0. Therefore, if Σ_{m=0}^∞ F(z_m) converges, then the right hand side of (19) also converges.
To prove convergence of Σ_{m=0}^∞ F(z_m) we use the ratio test. We will show that
$$\lim_{m\to\infty} \frac{F(z_{m+1})}{F(z_m)} = \lim_{z \downarrow 0} \frac{F(\bar\alpha z)}{F(z)} < 1. \qquad (20)$$
The inequality is a result of a (possibly repeated) application of L'Hôpital's rule. First note that
$$\lim_{z \downarrow 0} F(\bar\alpha z) = \lim_{z \downarrow 0} F(z) = F(0) = 0,$$
and so the limit of the quotient in (20) yields the indeterminate form 0/0. We apply L'Hôpital's rule:
$$\lim_{z \downarrow 0} \frac{F(\bar\alpha z)}{F(z)} = \lim_{z \downarrow 0} \frac{\bar\alpha F^{(1)}(\bar\alpha z)}{F^{(1)}(z)}.$$
If F^{(1)}(0) > 0, this limit equals ᾱ < 1. Otherwise the quotient is again of the form 0/0, and we apply L'Hôpital's rule n times until F^{(n)}(0) > 0, which happens for a finite n under the conditions of Theorem 1; the limit is then ᾱ^n < 1.
Thus all conditions for Abel's convergence test are satisfied and we have shown convergence of the right hand side of (19). In particular, it converges to a negative number because all the terms are negative as a result of the denominator. By taking the exponential of both sides of (19) we get the probability of both agents playing doubt indefinitely on the left, and a positive number on the right. Since the probability of convergence is equal to the probability of both agents playing doubt indefinitely, we have shown that this probability is positive:
This proves our base case: q > 0. Our induction hypothesis is that N players can converge. We denote by q_N the probability of N players converging in an ϵ-ball at zero:
Now we intend to show that
We (arbitrarily) single out the first agent and collect all the rounds in which agent i = 1 is selected to play in M:
We note that, as before, if at least one agent plays trust in any round t, then x(t + 1) ∉ [0, ϵ]^{N+1}. Thus we have the relationship:
We may write the probability of all agents playing D persistently as a product of conditional probabilities as follows:
Because we are dealing with multiplication on the right hand side, the ordering of terms does not matter. We are thus free to collect all the terms involving agent 1 in one product (a) and the remaining terms in a different product (b):
For the rounds t ∈ M we introduce the notation t_{i,l} for the time of the l-th round in which agent 1 is chosen to play against agent i. Note that because the dynamics never end, agent 1 will (almost surely) be chosen to play with each other agent in an infinite number of rounds. Thus we can split (a) further and in terms of agent beliefs:
We can bound the beliefs of the agents involved because we know that, before round t_{i,l}, they have played against each other at least l − 1 times and always played doubt (by the conditioning). So we set x̄_l := ϵᾱ^{l−1} for l = 1, 2, ... and note that x_j(t_{i,l}) ≤ x̄_l for both j = 1 and j = i, as well as for all l = 1, 2, ....
We can bound the probability in the product by:
We take the logarithm on both sides:
From here we can repeat the steps followed in the proof of our base case from (17) onwards to conclude that the sum converges. This means that a > 0 as long as N < ∞.
We proceed to show that b > 0 (where b is defined in (25)). We note that b ≥ q_N, the probability of N agents absorbing at zero. To see that this is true, consider the last N agents and some order of games for them to play against each other. By the induction hypothesis this group of N players has positive probability of always playing doubt. But this group and these games are interspersed with matches between some agent in the group of N and the first agent. We have conditioned on both agents playing doubt in all games until the current game, which includes those involving the first agent. This implies that agents who were chosen for games against the first agent have a lower belief after such a game, and therefore play doubt with an even higher probability in their next match than in the original set of games, in which there was already a probability q_N > 0 of all agents always doubting.
We have thus shown that a > 0 and b > 0 and therefore conclude that also a · b > 0.
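The base case can be checked numerically for an illustrative cdf. The sketch below assumes F(x) = x² on [0, 1] (so F(0) = F'(0) = 0 and the first non-vanishing derivative at zero is F''(0) = 2, i.e. n = 2); this F is our choice for illustration, not one from the model. It computes the logarithm of one agent's factor ∏_m F̄(z_m), the lower bound Σ_m F(z_m)/(F(z_m) − 1) used with Abel's test, and the ratio F(ᾱz)/F(z), which equals ᾱ² for this F.

```python
import math


def F(x):
    """Illustrative cdf on [0, 1] with F(0) = 0 and first non-zero derivative at 0 of order n = 2."""
    return x * x


def log_prob_always_doubt(z0, alpha, terms):
    """Sum over m of log(1 - F(z_m)) with z_m = (1 - alpha)**m * z0, i.e. log of prod_m Fbar(z_m)."""
    return sum(math.log1p(-F(z0 * (1 - alpha) ** m)) for m in range(terms))


def abel_lower_bound(z0, alpha, terms):
    """Sum over m of F(z_m) / (F(z_m) - 1), a termwise lower bound on log(1 - F(z_m))."""
    total = 0.0
    for m in range(terms):
        f = F(z0 * (1 - alpha) ** m)
        total += f / (f - 1)
    return total


alpha, eps = 0.1, 0.05          # eps < alpha, as in Lemma 1; start from z_0 = eps
lp_short = log_prob_always_doubt(eps, alpha, 200)
lp_long = log_prob_always_doubt(eps, alpha, 2000)
bound = abel_lower_bound(eps, alpha, 2000)
ratio = F((1 - alpha) * 1e-3) / F(1e-3)   # F(abar z) / F(z) = abar**2 for this F
```

The partial sums stop changing after a few hundred terms, as the ratio test predicts. Since both agents in the base case face a factor bounded through the same z_m, the probability of both doubting forever is at least exp(2 · lp_long), roughly 0.97 for these parameters.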
We now state a similar lemma for absorption of N agents around one.
Lemma 2 (Absorption at one for N agents). Let α ∈ (0, 1), ϵ ∈ (0, α), and suppose x(t_0) ∈ [1 − ϵ, 1]^N for some t_0 ∈ N, then
The proof is similar to the proof of Lemma 1. The difference is that instead of playing doubt the agents are required to play trust indefinitely. This happens with probability F(x) rather than F̄(x). Furthermore, the agent belief at the start of the l-th round after t_0 is 1 − (1 − x(t_0))ᾱ^{l−1} rather than x(t_0)ᾱ^{l−1}.
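The closed forms for beliefs under repeated identical observations, x(t_0)ᾱ^{l−1} under persistent doubt and 1 − (1 − x(t_0))ᾱ^{l−1} under persistent trust, can be checked against direct iteration of the update rule of Assumption 2. The function name `belief_before_game` is hypothetical.

```python
def belief_before_game(x0, alpha, l, opponent_trusts):
    """Belief at the start of the l-th game (l = 1, 2, ...) if every opponent plays the same action."""
    target = 1.0 if opponent_trusts else 0.0
    x = x0
    for _ in range(l - 1):
        x = (1 - alpha) * x + alpha * target
    return x


alpha = 0.1
doubt_path = [belief_before_game(0.03, alpha, l, False) for l in range(1, 11)]
trust_path = [belief_before_game(0.97, alpha, l, True) for l in range(1, 11)]
```

Both paths move monotonically away from the interior, which is why a population that keeps coordinating stays inside its ϵ-corner.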
Intuitively, our first two lemmas imply that a population within an ϵ-ball around zero or one can remain there. This means a population believing that everyone is 100% (or almost 100%) trustworthy or untrustworthy can retain that belief forever.

B. Reaching the corners is possible
Our next result proves sufficient conditions for the population to reach the ϵ-ball around zero with positive probability.For legibility we define: which we call the interior.
Lemma 3 (Population of N agents reaches zero with positive probability). Let α ∈ (0, 1), ϵ ∈ (0, α), and x(t_0) ∈ I^N for some t_0 ∈ N, then
Proof. We construct a path from I^N which depends on agents always playing doubt against one another.
We split this into two cases. In the first, we show that there is a path of positive probability from x(t_0) ∈ (ϵ, 1 − ϵ)^N to (0, ϵ)^N. In the second, we show that there is a path of positive probability from x(t_0) ∈ I^N, with some number 0 < h < N of agents with belief x_i(t_0) ≥ 1 − ϵ, to (ϵ, 1 − ϵ)^N, and therefore also a path to (0, ϵ)^N.
For the next m rounds, agent 2k − 1 is matched to play against agent 2k, where k = 1 + (r mod N/2) [60] and r = t − t_0. The probability of each round of this matching is given by 1/(N(N − 1)). The probability of this pattern of matching for m rounds is then given by
$$\left(\frac{1}{N(N-1)}\right)^{m}.$$
Note that each pair of agents (2k − 1 and 2k) is independent of the other agents for these m rounds, in which they each play 2m/N games. Let t_{i,l} for l = 1, 2, ..., 2m/N index the time of the round in which agent i plays their l-th game (after t_0). Supposing all agents play D in all of these rounds, agent i's belief after their l-th game is x_i(t_{i,l} + 1) = ᾱ^l x_i(t_0), for all l = 1, 2, ..., 2m/N. Let κ_i ∈ N be the minimum number of games agent i has to play (both players always playing doubt D) for their belief to be within distance ϵ of 0. Specifically, κ_i is the least value that satisfies
$$x_i(t_0)\,\bar\alpha^{\kappa_i} < \epsilon. \qquad (28)$$
By dividing through by x_i(t_0) > 0 and taking the logarithm (recalling that log ᾱ < 0), we see that
$$\kappa_i > \frac{\log\big(\epsilon / x_i(t_0)\big)}{\log \bar\alpha}.$$
This is finite for all x_i(t_0) ≤ 1 − ϵ, ϵ > 0 and α < 1, and it is maximized at x_i(t_0) = 1 − ϵ. For x(t) ∈ (0, ϵ)^N we need x_i(t) < ϵ to be true for all agents i = 1, 2, ..., N. We define m to be the least value such that each agent has played enough games to be less than distance ϵ from zero:
$$\frac{2m}{N} \ge \max_i \kappa_i.$$
By rearrangement we have
$$m \ge \frac{N}{2} \max_i \kappa_i,$$
which is finite because each κ_i is finite. This gives us our first intermediate result:
The probability of the matching we have created is thus strictly greater than zero. We call this p_m > 0.
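As a worked example of (28) and the bound on m, the sketch below counts the games of mutual doubt κ_i that agent i needs, both by iterating x ↦ ᾱx and via the logarithmic bound, and then the number of rounds m = (N/2) · max_i κ_i of the cyclic pairing (1, 2), (3, 4), .... It assumes N is even, as the pairing in the text does; the function names and numbers are illustrative.

```python
import math


def kappa(x0, alpha, eps):
    """Least integer k >= 0 with x0 * (1 - alpha)**k < eps, found by iteration."""
    k, x = 0, x0
    while x >= eps:
        x *= 1 - alpha
        k += 1
    return k


def kappa_from_log(x0, alpha, eps):
    """Same quantity from kappa > log(eps / x0) / log(1 - alpha), valid for 0 < eps <= x0."""
    return math.floor(math.log(eps / x0) / math.log(1 - alpha)) + 1


def rounds_needed(beliefs, alpha, eps):
    """Rounds m of the cyclic pairing after which every agent has played max_i kappa_i games."""
    n = len(beliefs)
    if n % 2:
        raise ValueError("the cyclic pairing assumes an even number of agents")
    return (n // 2) * max(kappa(x, alpha, eps) for x in beliefs)


beliefs = [0.95, 0.5, 0.3, 0.2]        # the largest is 1 - eps, where kappa is maximal
m = rounds_needed(beliefs, alpha=0.1, eps=0.05)
p_matching = (1 / (4 * 3)) ** m        # probability (N(N - 1))**(-m) of this exact matching
```

Here the agent starting at 1 − ϵ = 0.95 needs 28 games, so m = 56 rounds; the matching probability is tiny but strictly positive, which is all the argument requires.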
We now turn to the other requirement of this path to (0, ϵ)^N, which is that all agents play D in each of their 2m/N < ∞ games. As noted before, each pair of agents interacts exclusively with one another, and so we focus on one such pair and the rounds in which they play. For pair k = 1, 2, ..., N/2, denote the probability that there exists a round t_k for which x_{2k−1}(t_k), x_{2k}(t_k) < ϵ:
Let x̄_{k,0} := max{x_{2k−1}(t_0), x_{2k}(t_0)} and subsequently x̄_{k,l} := ᾱ^l x̄_{k,0}. We use this to bound the beliefs of the k-th pair in their l-th interaction. Then we can bound the right hand side:
because x̄_{k,l} ≥ x_{2k−1}(t_0)ᾱ^l and x̄_{k,l} ≥ x_{2k}(t_0)ᾱ^l for all l ∈ N, and because F(x) ≥ F(y) as long as x ≥ y, by virtue of F being a distribution function. The terms in the product are all strictly greater than 0, and because it is a finite product, its value is also strictly greater than 0 for all k = 1, 2, ..., N/2. Finally, we have N/2 of these pairs, and so the probability of reaching the corner is the probability of the matching times the probability of the individual games proceeding as described:
This proves the statement in case 1, where x_i(t_0) ≤ 1 − ϵ for all i = 1, 2, ..., N.
Here we have chosen the first agent to be an agent with belief x_1(t_0) ≤ 1 − ϵ without loss of generality, as the naming convention plays no role. We now construct a finite path of positive probability from this state to the state assumed at the start of the proof (x_i(t_0) < 1 − ϵ for all i = 1, 2, ..., N).
For k = 1, 2, ..., N − 1, we specify the games between I and J as follows:
(I(t), J(t))^*, t = t_0 + 1, ..., t_0 + 2(N − 1).
The probability of such a sequence of matches is
$$\prod_{t=t_0+1}^{t_0+2(N-1)} \frac{1}{N(N-1)} = \left(\frac{1}{N(N-1)}\right)^{2(N-1)}.$$
The above is a finite product of positive numbers and so is greater than zero. The general form of this sequence is that some agent from the set {1, ..., k} plays twice against the first agent that is not in the set. In these two games, player I plays doubt in both rounds, while player J plays trust in the first round and doubt in the second round. In this way, after the two games, both agents have x < 1 − ϵ.
To see that this is the case, consider agent I, whose belief starts at x(t_0) ≤ 1 − ϵ. After playing against T their belief updates to x(t_0 + 1) ≤ 1 − ϵᾱ < 1. Subsequently they play against D, and so in the following round they hold belief x(t_0 + 2) ≤ 1 − α − ϵᾱ^2 < 1 − α < 1 − ϵ. Now consider agent J, whose belief starts at x(t_0) ≤ 1. After playing against D twice, their belief is bounded by
$$x(t_0 + 2) \le \bar\alpha^2 < \bar\alpha = 1 - \alpha < 1 - \epsilon.$$
For one such pair of games, define B as the event of the sequence of actions (starting in round 2k + 1):
We call p_{2g} the probability of both players acting according to the event B, and we note that it is bounded:
The right hand side of (35) is strictly greater than zero because, by Assumption 3, F is only zero at zero, and so we have a product of 4 numbers, all greater than zero. Thus p_{2g} > 0.
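The two-game maneuver of case 2 can be checked numerically: agent I observes J play T and then D, while J observes I play D twice. The sketch below iterates the update rule over a grid of starting beliefs (x_I ≤ 1 − ϵ, x_J ≤ 1) and several α with ϵ < α, and confirms that both final beliefs fall below 1 − ϵ and respect the bounds 1 − α − ϵᾱ² and ᾱ² derived above. The grid and parameter values are arbitrary choices.

```python
def two_game_maneuver(x_i, x_j, alpha):
    """Beliefs of agents I and J after two games in which J plays T then D and I plays D twice."""
    abar = 1 - alpha
    x_i = abar * x_i + alpha   # game 1: I observes J trusting
    x_j = abar * x_j           # game 1: J observes I doubting
    x_i = abar * x_i           # game 2: I observes J doubting
    x_j = abar * x_j           # game 2: J observes I doubting
    return x_i, x_j


violations = []
for alpha in (0.05, 0.1, 0.3, 0.7):
    abar = 1 - alpha
    for eps in (alpha / 10, alpha / 2, 0.99 * alpha):   # any eps in (0, alpha)
        for a in range(101):
            x_i = (1 - eps) * a / 100   # starting belief of I, in [0, 1 - eps]
            x_j = a / 100               # starting belief of J, in [0, 1]
            y_i, y_j = two_game_maneuver(x_i, x_j, alpha)
            within_bounds = (y_i <= 1 - alpha - eps * abar**2 + 1e-12
                             and y_j <= abar**2 + 1e-12)
            if not (within_bounds and max(y_i, y_j) < 1 - eps):
                violations.append((alpha, eps, x_i, x_j))
```

An empty `violations` list is consistent with the inequalities above; the small tolerance only absorbs floating-point rounding at the boundary case x_I = 1 − ϵ.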
The probability of getting into case 1 is then given by the probability of the matching (I, J)^* multiplied by the probability of all (N − 1) pairs of games going as planned, which are independent; this probability is:
At time t_0 + 2(N − 1) + 1, all the agents' beliefs satisfy x_i < 1 − ϵ. Proceeding as in case 1, we know that there is a path of positive probability to (0, ϵ)^N.
We now state a similar lemma for the probability of the population reaching the ϵ-ball around 1 from the interior.
Lemma 4 (Population of N agents reaches one with positive probability). Let $\alpha \in (0, 1)$, $x(t_0) \in I^N$ for some $t_0 \in \mathbb{N}$, and $\epsilon \in (0, \alpha)$. Then, with positive probability, the process reaches $(1 - \epsilon, 1]^N$ in finite time.
The proof is similar to that of Lemma 3; the differences are akin to those between Lemma 1 and Lemma 2. Additionally, instead of looking for the least $\kappa_i$ that satisfies (28), we look for the least $\kappa_i$ that satisfies $1 - (1 - x_i(t_0))\bar\alpha^{\kappa_i} > 1 - \epsilon$, which can be translated into (28).
We now state the main analytical result of the paper: under the dynamics we describe, a population of any finite size $N$ is guaranteed to converge, in belief as well as in behavior, to one of the corners $\mathbf{0}$ or $\mathbf{1}$ of the state space $[0, 1]^N$.
Our second pair of lemmas tells us that a population can end up believing that the rest of the population is 100% trustworthy (or untrustworthy). Thus a natural reinforcement of beliefs regarding trustworthiness can take place and result in complete trust among the population or a complete lack thereof.
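For a concrete feel of the counting step in Lemma 4, the following sketch computes the least $\kappa_i$ satisfying $1 - (1 - x_i(t_0))\bar\alpha^{\kappa_i} > 1 - \epsilon$, i.e. the number of consecutive observations of trust that lift agent $i$ into the ϵ-ball around one. The closed form uses the same division by $\log(\bar\alpha) = \log(1 - \alpha)$ that appears in (29); the function name `least_kappa` is hypothetical, and the two correction loops only guard against floating-point rounding.

```python
import math


def least_kappa(x, alpha, eps):
    """Least integer k >= 0 with 1 - (1 - x) * (1 - alpha)**k > 1 - eps.

    Solving (1 - x) * (1 - alpha)**k < eps gives
    k > log(eps / (1 - x)) / log(1 - alpha); since log(1 - alpha) < 0,
    dividing by it flips the inequality.
    """
    if 1 - x < eps:                  # already inside the eps-ball around 1
        return 0
    c = math.log(eps / (1 - x)) / math.log(1 - alpha)
    k = max(0, math.floor(c) + 1)
    # Correct for floating-point error when c is (nearly) an integer.
    while k > 0 and (1 - (1 - x) * (1 - alpha) ** (k - 1) > 1 - eps):
        k -= 1
    while not (1 - (1 - x) * (1 - alpha) ** k > 1 - eps):
        k += 1
    return k


print(least_kappa(0.0, 0.5, 0.1))   # 4, since 0.5**4 = 0.0625 < 0.1 <= 0.5**3
```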

C. Proof of the main theorem
We now present the proof of the main theorem of our paper, which states that a population of agents will converge with probability one to either the always-trust or the always-doubt corner.
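Proof. Our objective is to show that if $\alpha > \epsilon$, then $\mathbb{P}(\exists t_0 : x(t) \in A \ \forall t \ge t_0) = 1$, where $A = [0, \epsilon]^N \cup [1 - \epsilon, 1]^N$. By Lemmas 3 and 4 we know that the process may reach $A$ within $s$ (a finite number of) rounds with positive probability.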
By Lemmas 1 and 2 we know that the belief vector, upon reaching $A$, has positive probability of being absorbed there. As such, we define $p$ as the probability of reaching and being absorbed in $A$ within $s + 1$ rounds, and note that $p > 0$. Define $\tau_1$ as the time of the last entry into $A$. Thus after $s + 1$ rounds (supposing we can read the future regarding possible absorption), one of three things will have occurred:
1. The belief vector $x(t_0 + s + 1)$ is absorbed in $A$, and so $t_0 + s + 1 \ge \tau_1$.
2. The belief vector $x(t_0 + s + 1)$ lies in $[0, 1]^N \setminus A$.
3. The belief vector $x(t_0 + s + 1)$ lies in $A$ but is not absorbed there.
If (3) happens, then at some point the belief vector will again be in $[0, 1]^N \setminus A$ (with probability 1, as this is implied by not being absorbed). Once the process is again in $[0, 1]^N \setminus A$, either directly in case (2) or after some finite time in case (3), we reset the clock and note that after another $s + 1$ rounds the process is either absorbed in $A$ (with probability $p$) or not (with probability $1 - p$). We therefore know that the probability of the belief vector $x$ not being absorbed in $A$ after $n \in \mathbb{N}$ such meta-experiments is given by $(1 - p)^n$, and the probability of never being absorbed is
$$\lim_{n \to \infty} (1 - p)^n = 0.$$
Thus the probability of the complement is $\mathbb{P}(\exists t_0 : x(t) \in A \ \forall t \ge t_0) = 1$.
In this section we showed that it is certain that the process ends in one of the corners in the long run [61]. The natural next question is: with what probability does the system absorb in the always-trust corner? Define the probability of (eventual) absorption in the always-trust corner $[1 - \epsilon, 1]^N$, starting from $x(0) = x$, as
$$p_T(x) = \mathbb{P}\big(\exists t_0 : x(t) \in [1 - \epsilon, 1]^N \ \forall t \ge t_0 \,\big|\, x(0) = x\big).$$
Similarly, we define the probability $p_D(x)$ of absorption in the always-doubt corner $[0, \epsilon]^N$. Because absorption in one of the two corners is guaranteed by Theorem 1, we have the following relationship for all $x \in [0, 1]^N$:
$$p_T(x) + p_D(x) = 1.$$
This result lets us focus our investigation on the dichotomy of absorption at 1 or at 0, without missing dynamics in which the process reaches a steady state in the interior of the state space.
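The last step of the proof of Theorem 1 is a repeated-trials argument: each meta-experiment of $s + 1$ rounds ends in absorption with probability $p > 0$, so the probability of never being absorbed vanishes. As a purely illustrative sketch (the function name and the values of $p$ and $\delta$ are hypothetical, not quantities computed in this paper), the code below counts how many meta-experiments are needed before $(1 - p)^n$ falls below a tolerance $\delta$. The count grows like $\log\delta / \log(1 - p)$ as $p$ shrinks, but it is finite for every $p > 0$.

```python
def meta_experiments_needed(p, delta):
    """Smallest n with (1 - p)**n <= delta, for 0 < p <= 1 and 0 < delta < 1."""
    if not (0 < p <= 1 and 0 < delta < 1):
        raise ValueError("need 0 < p <= 1 and 0 < delta < 1")
    n, not_absorbed = 0, 1.0
    while not_absorbed > delta:
        not_absorbed *= 1 - p      # one more independent meta-experiment without absorption
        n += 1
    return n


for p in (0.5, 0.01, 1e-4):
    print(p, meta_experiments_needed(p, delta=1e-3))
```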
Remark 2. The main result of this section, Theorem 1, also holds for any finite connected population structure determining agent pairings. Absorption of the process $x$ at 0 or 1 is guaranteed in the long run.
The proofs of the results corresponding to Lemmas 1-4 would need to be adjusted to account for the restrictions on which agents may interact with one another. With some care in ordering the agent pairings, as long as the network of agents is connected, it should be possible to construct a sequence of pairings that leads to the ϵ-balls at zero and one. The same holds true for absorption.

IV. ILLUSTRATION FOR TWO AGENTS
In this section we illustrate the model by exploring the two-agent case. Suppose that N = 2; then we have two agents who are matched to play against one another in all rounds t = 1, 2, …. Note that the agents' assignment to I or J is merely a matter of notation. Thus we can assume that agent 1 is always assigned k = 1 and agent 2 is assigned k = −1, and we refer to them by their k assignment for the remainder of this section.
We have the following stochastic dynamical system:
$$x(t+1) = h(x(t)) + \xi_t,$$
where $\xi_t$ is the vector of noise which accounts for the randomness,
$$\xi_{k,t} = \alpha\big(\mathbb{1}[A_{-k}(t) = T] - F(x_{-k}(t))\big), \qquad k \in \{1, -1\},$$
and $h(\cdot)$ is the expected evolution of the system. We may calculate $h(x)$ explicitly in terms of the cdf $F$:
$$h_k(x) = \bar\alpha x_k + \alpha F(x_{-k}), \qquad k \in \{1, -1\}.$$
We note that any points with $x_1 = F(x_{-1})$ and $x_{-1} = F(x_1)$ map to themselves in expectation. Furthermore, because $F(x) \ge F(y)$ whenever $x \ge y$ ($F$ is a distribution function), the points that map to themselves in expectation must satisfy
$$x_1 = x_{-1} = F(x_1).$$
Indeed, if $x_1 > x_{-1}$, then $x_1 = F(x_{-1}) \le F(x_1) = x_{-1}$, a contradiction; the case $x_1 < x_{-1}$ is symmetric. For the function $F(x) = x + 2x(x - 1/2)(x - 1)$, we plot the vector field of the expected change $h(x) - x$ in Figure 2. Looking at the regions $(0, 0.4)^2$ and $(0.6, 1)^2$ in Figure 2, we might guess that when $F(x) = x + 2x(x - 1/2)(x - 1)$, convergence will be to the center of the domain, $x = (0.5, 0.5)$. However, by Theorem 1 we know that the process must converge to one of the two corners in the long run.
Heuristically, this can be explained by the progression of agent beliefs around zero and one versus around 0.5. There are paths along which the agents' beliefs keep getting closer to zero and one, respectively. Meanwhile, should the agents' beliefs get arbitrarily close to 0.5, their next belief will be roughly $0.5\alpha$ away from 0.5.
In short, knowing the result of Theorem 1 and comparing it with Figure 2, we realize that we should be careful not to assume that the asymptotic dynamics will be dictated by their expectation.
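The two-agent quantities above are easy to evaluate directly. The sketch below (function names are hypothetical) computes $h(x)$ for the cubic $F$ of Figure 2 and enumerates the four possible realizations of $x(t+1)$, assuming that agent $k$ trusts with probability $F(x_k)$ and that each agent's belief moves a fraction α toward the action it observes. It shows both sides of the argument: the expected change points toward $(0.5, 0.5)$, while every realized step from $(0.5, 0.5)$ lands $0.5\alpha$ away from it.

```python
def F(x):
    """Example cdf from the text: F(x) = x + 2x(x - 1/2)(x - 1)."""
    return x + 2 * x * (x - 0.5) * (x - 1)


def h(x, alpha, cdf=F):
    """Expected next beliefs h(x) for x = (x_1, x_{-1})."""
    x1, xm1 = x
    return ((1 - alpha) * x1 + alpha * cdf(xm1),
            (1 - alpha) * xm1 + alpha * cdf(x1))


def one_step_outcomes(x, alpha, cdf=F):
    """The four possible values of x(t+1) given x(t) = x, with their probabilities.

    Agent k trusts with probability cdf(x_k); each agent then updates
    on the other agent's action.
    """
    x1, xm1 = x
    outcomes = []
    for a1 in (True, False):          # action of agent 1, observed by agent -1
        for am1 in (True, False):     # action of agent -1, observed by agent 1
            prob = ((cdf(x1) if a1 else 1 - cdf(x1))
                    * (cdf(xm1) if am1 else 1 - cdf(xm1)))
            nxt = ((1 - alpha) * x1 + alpha * float(am1),
                   (1 - alpha) * xm1 + alpha * float(a1))
            outcomes.append((prob, nxt))
    return outcomes


alpha = 0.2
# On the diagonal, h(x) - x vanishes at 0, 0.5 and 1 and points toward 0.5 elsewhere.
for v in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(v, h((v, v), alpha)[0] - v)

# Yet from (0.5, 0.5) every realized step lands 0.5 * alpha away (up to rounding).
for prob, (y1, ym1) in one_step_outcomes((0.5, 0.5), alpha):
    print(prob, abs(y1 - 0.5), abs(ym1 - 0.5))
```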

V. SIMULATION
We analyze the effect of $F$ and α on $p_T$ and on the time until first entry into $A$ by means of simulation.
Because simulating the process with N = 2 agents requires substantially fewer time steps, we begin in this setting and later include numerical simulations to verify that the results are qualitatively similar when N > 2.
We parameterize $F$ by $r \in [0.5, 2]$ and take only functions of the form $F(x) = x^r$. In particular, note that $F(x) = \sqrt{x}$ and $F(x) = x^2$ are included in this parameterization. The two agents are initialized with beliefs chosen uniformly at random over the belief space (0, 1), i.e. $x(0) \sim U([0, 1]^2)$. To determine how many time steps ought to be simulated to allow the process to be absorbed in one of the two steady states, we run a preliminary simulation for α ∈ {0.01, 0.05, 0.25, 0.5} and r ∈ {0.5, 1, 2}. We run 1000 iterations of 10 000 time steps each, keeping track of the average Manhattan distance to the nearest steady state over time. The resulting 95% confidence bounds of the average Manhattan distance to A are depicted in Figures 3a-3d.
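A minimal sketch of such a preliminary simulation, assuming $F(x) = x^r$ so that an agent with belief $x$ trusts with probability $x^r$. For simplicity it measures the Manhattan distance to the nearest corner, $(0, 0)$ or $(1, 1)$, rather than to $A$ itself (for small ϵ the two differ by at most $2\epsilon$). The function names `simulate_two_agents` and `mean_distance`, the seeding, and the run counts are illustrative choices, not the settings behind Figure 3.

```python
import random


def simulate_two_agents(alpha, r, steps, rng):
    """Manhattan distance to the nearest steady state, (0, 0) or (1, 1), after each step.

    Assumes F(x) = x**r: an agent with belief x trusts with probability x**r.
    """
    x = [rng.random(), rng.random()]                  # x(0) ~ U([0, 1]^2)
    distances = []
    for _ in range(steps):
        trusts = [rng.random() < xi ** r for xi in x]  # both act on current beliefs
        x = [(1 - alpha) * x[0] + alpha * trusts[1],   # agent 1 observes agent 2
             (1 - alpha) * x[1] + alpha * trusts[0]]   # agent 2 observes agent 1
        distances.append(min(x[0] + x[1], (1 - x[0]) + (1 - x[1])))
    return distances


def mean_distance(alpha, r, runs, steps, seed=0):
    """Average distance over independent runs, for each time step."""
    rng = random.Random(seed)
    totals = [0.0] * steps
    for _ in range(runs):
        for t, d in enumerate(simulate_two_agents(alpha, r, steps, rng)):
            totals[t] += d
    return [s / runs for s in totals]


curve = mean_distance(alpha=0.25, r=2.0, runs=200, steps=500)
print(curve[0], curve[-1])
```

Confidence bounds as in Figure 3 could be added by also keeping per-step sums of squares; they are omitted to keep the sketch short.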
We show the probability of being absorbed in the always-trust corner of the state space in Figure 4a. We notice that increasing α has the effect of smoothing the transition between being absorbed in (0, 0) and in (1, 1). Conversely, decreasing α results in a sharper transition between being absorbed in (0, 0) with probability one when r > 1 and in (1, 1) with probability one when r < 1. The following conjecture elaborates on this.
Conjecture 1. As α → 0, a sharp phase transition emerges at r = 1: the process is absorbed in the always-trust steady state with probability 1 when r < 1 and with probability 0 when r > 1.
The machinery used in the proofs requires $\epsilon < \alpha$. This provides clear criteria for events that lead the population to 'jump out' of the ϵ-balls around zero and one. It also ensures that the population can converge to an ϵ-ball around zero or one in finite time (see, for example, the division by $\log(\bar\alpha) = \log(1 - \alpha)$ in (29)). New techniques would thus have to be applied in order to prove Conjecture 1. Intuitively, however, as the learning rate α goes to zero, the size of the steps agents make when they adjust their belief shrinks. This seems to imply that an infinite number of steps might be needed, but it is only their direction that matters, and so movement toward the expectation (toward zero when r > 1 and toward one when r < 1) will eventually come to pass.
Phase transitions are quite common in evolutionary game theory, with some recent results in [35,38,41].Duong and Han [35] also conjecture a phase transition (regarding the expected number of equilibria) whose proof is beyond the standard techniques for phase transitions in the replicator dynamics.Zheng et al. [38] and Zeng et al. [41] present phase transitions for imitation learning dynamics, which they observe in simulations.Though beyond the scope of this paper, Conjecture 1 could be investigated in future work by means of an extensive simulation study.
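A sketch of how $p_T$ could be estimated for $N = 2$ is shown below. It assumes $F(x) = x^r$ and, as a simplification, records which corner the process enters first, treating first entry into $A$ as a proxy for absorption (the same proxy used for the timing results in Figure 4b). The names `first_entry_two_agents` and `estimate_p_trust`, the value of ϵ, and the run counts are hypothetical. Probing Conjecture 1 would mean repeating this for shrinking `alpha`, at the cost of much longer runs.

```python
import random


def first_entry_two_agents(alpha, r, eps, rng, max_steps=100_000):
    """Return (corner, t): corner 1 on first entry into [1-eps, 1]^2, 0 for [0, eps]^2.

    corner is None if neither set is reached within max_steps.
    """
    x = [rng.random(), rng.random()]                  # x(0) ~ U([0, 1]^2)
    for t in range(1, max_steps + 1):
        trusts = [rng.random() < xi ** r for xi in x]  # F(x) = x**r
        x = [(1 - alpha) * x[0] + alpha * trusts[1],
             (1 - alpha) * x[1] + alpha * trusts[0]]
        if x[0] <= eps and x[1] <= eps:
            return 0, t
        if x[0] >= 1 - eps and x[1] >= 1 - eps:
            return 1, t
    return None, max_steps


def estimate_p_trust(alpha, r, eps=1e-3, runs=1000, seed=0):
    """Fraction of finished runs that first enter the always-trust corner."""
    rng = random.Random(seed)
    corners = [first_entry_two_agents(alpha, r, eps, rng)[0] for _ in range(runs)]
    finished = [c for c in corners if c is not None]
    return sum(finished) / len(finished)


for r in (0.5, 1.0, 2.0):
    print(r, estimate_p_trust(alpha=0.25, r=r, runs=400))
```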
In Figure 4b we show the average time until the simulation entered A for the first time. By choosing ϵ = 10^{-5}, we suggest that the time it takes to enter $A = [0, \epsilon]^2 \cup [1 - \epsilon, 1]^2$ is proportional to the time it takes to also be absorbed in A. Supposing this is the case, we see that a greater α has the effect of speeding up the dynamics. This is sensible, as the steps made by the process are bigger for a bigger α. We also see that the closer r is to 1, the longer the dynamics take. This can be explained by the fact that, with r close to 1, steps toward the center remain fairly likely while the process is still far from A. In short, a smaller |1 − r| implies that the process spends more time in the center of the state space, while a lower α means that the process moves to the corners at a slower pace.
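The timing comparison can be sketched in the same way. The code below estimates the mean number of rounds until two agents first enter $A$ for a few values of α and r. It assumes $F(x) = x^r$, uses ϵ = 10^{-3} rather than 10^{-5} to keep runs short, and its names (`first_entry_time`, `mean_first_entry_time`) are hypothetical.

```python
import random
import statistics


def first_entry_time(alpha, r, eps, rng, max_steps=1_000_000):
    """Rounds until two agents first enter A = [0, eps]^2 U [1 - eps, 1]^2."""
    x = [rng.random(), rng.random()]                  # x(0) ~ U([0, 1]^2)
    for t in range(1, max_steps + 1):
        trusts = [rng.random() < xi ** r for xi in x]  # F(x) = x**r
        x = [(1 - alpha) * x[0] + alpha * trusts[1],
             (1 - alpha) * x[1] + alpha * trusts[0]]
        if max(x) <= eps or min(x) >= 1 - eps:
            return t
    return max_steps


def mean_first_entry_time(alpha, r, eps=1e-3, runs=200, seed=0):
    """Mean first-entry time over independent runs."""
    rng = random.Random(seed)
    return statistics.mean(first_entry_time(alpha, r, eps, rng) for _ in range(runs))


for alpha in (0.1, 0.25, 0.5):
    print(alpha, [round(mean_first_entry_time(alpha, r, runs=100)) for r in (0.5, 1.0, 2.0)])
```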
In order to show how the behavior changes as the population size is increased, we present simulations for N = 5 and N = 10. The resulting probability of absorption at $\mathbf{1}^N$ is depicted in heatmap format in Figures 5a-5b. For N = 5 (Figure 5a), we already see that the effect of α observed for N = 2 is less pronounced, though still present. Furthermore, for N = 10 (Figure 5b), α plays almost no role in terms of the relative absorption probabilities. Thus, as N increases, we suspect that the steady state in which the process is absorbed depends more on the cdf F and less on the learning dynamics of the individual agents.
The time to absorption for the simulation runs with N = 5 and N = 10 is shown in Figures 5c and 5d, respectively. We see that, qualitatively, the interplay between α and r in the time to absorption is the same as for N = 2: larger α and larger |1 − r| speed up the dynamics.
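For larger populations, the two-agent simulation extends naturally to $N$ agents. The sketch below assumes that each round a single pair of distinct agents is drawn uniformly at random, again uses $F(x) = x^r$, and stops at the first entry into $A = [0, \epsilon]^N \cup [1 - \epsilon, 1]^N$; these choices and all names (`simulate_population`, `summarize`) are illustrative rather than the exact setup behind Figure 5. It reports both quantities shown in Figures 5a-5d: the estimated $p_T$ and the mean time to first entry.

```python
import random


def simulate_population(n, alpha, r, eps, rng, max_steps=1_000_000):
    """Run N-agent random matching until first entry into A.

    Returns (corner, t): corner 0 if all beliefs are <= eps, corner 1 if
    all are >= 1 - eps, or None if neither happens within max_steps.
    """
    x = [rng.random() for _ in range(n)]             # x(0) ~ U([0, 1]^n)
    for t in range(1, max_steps + 1):
        i, j = rng.sample(range(n), 2)               # assumed: uniform random pair
        trust_i = rng.random() < x[i] ** r           # F(x) = x**r
        trust_j = rng.random() < x[j] ** r
        x[i] = (1 - alpha) * x[i] + alpha * trust_j  # i observes j
        x[j] = (1 - alpha) * x[j] + alpha * trust_i  # j observes i
        if max(x) <= eps:
            return 0, t
        if min(x) >= 1 - eps:
            return 1, t
    return None, max_steps


def summarize(n, alpha, r, eps=1e-3, runs=200, seed=0):
    """Estimated p_T and mean time to first entry over finished runs."""
    rng = random.Random(seed)
    results = [simulate_population(n, alpha, r, eps, rng) for _ in range(runs)]
    finished = [(c, t) for c, t in results if c is not None]
    p_trust = sum(c for c, _ in finished) / len(finished)
    mean_time = sum(t for _, t in finished) / len(finished)
    return p_trust, mean_time


for n in (2, 5, 10):
    print(n, [summarize(n, alpha=0.5, r=r, runs=100) for r in (0.5, 2.0)])
```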

VI. CONCLUSION
We modeled the problem of trusting strangers in society as a game in which agents of a population are randomly matched, two per round, and play a coordination game with random payoffs. These agents are endowed with a learning procedure by which they update their belief about the probability that a random stranger would trust, using the exponential moving average of their past observations. We have shown that for any finite population of size N, with mild conditions on the cdf of the payoff parameters and a constant learning rate (or memory) α ∈ (0, 1), the process is absorbed with probability 1 at one of the two steady states: always-trust or always-doubt. This result is not immediately obvious because, looking at the expected change of the process, there are F for which it would seem that all agents believing x = (0.5, …, 0.5) is an attracting steady state. We conjecture that as α → 0 a sharp phase transition will emerge at r = 1, such that the process is absorbed in the always-trust steady state with probability 1 when r < 1 and with probability 0 when r > 1.
A similar phase transition might occur as the population size is increased. A broad takeaway of this model is that differences in trust between populations might simply boil down to chance: the process by which a population learns on which action to coordinate is random and is not necessarily fully determined by the nature of the game (F), the size of the population (N) or the rate of learning (α). However, in the case of large populations, our simulations suggest that the context of the interaction (the distributions of the payoff parameters) will play a role in determining whether the population ends up with long-term trust or long-term doubt. In the numerical simulations, the probability of converging to the always-trust steady state decreases (increases) as the population grows for r > 1 (r < 1). This implies that the effect of increasing the population size is not determined by the type of game (as it is for the one-shot N-player public goods and prisoner's dilemma games shown in [62]) but by the specific payoff structure. For instance, our model corresponds to an all-or-nothing version of the 2-player public goods game when F is a step function with a single step from zero to one. Players pay either one (when $A_k = T$) or zero (when $A_k = D$), while the reward $\gamma \in \mathbb{R}$ is only paid out if both players paid the cost. Whether a larger population favors one steady state or the other then depends not on the fact that it is a public goods game but specifically on where the step function steps from zero to one. Specifically, the location of this step corresponds to 1/γ.
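The all-or-nothing example can be made concrete. With belief $x$, trusting yields an expected payoff of $\gamma x - 1$ and doubting yields 0, so an agent trusts exactly when $x > 1/\gamma$; this is the step function $F$ with its step at $1/\gamma$. The sketch below encodes this best response; the names `trusts` and `step_cdf` are hypothetical, and breaking ties toward doubt is an assumption.

```python
def trusts(x, gamma):
    """Best response in the all-or-nothing 2-player public goods game.

    Trusting costs 1 and returns gamma only if the partner also trusts, so
    with belief x the expected payoff of T is gamma * x - 1; doubting pays 0.
    Ties are broken toward doubt (an assumption).
    """
    return gamma * x - 1 > 0


def step_cdf(gamma):
    """The induced F: probability of trusting given belief x, a step at 1/gamma."""
    return lambda x: 1.0 if trusts(x, gamma) else 0.0


F = step_cdf(gamma=4.0)                        # step located at 1/gamma = 0.25
print([F(x) for x in (0.1, 0.2, 0.3, 0.9)])    # [0.0, 0.0, 1.0, 1.0]
```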
As a possible answer to our research question, we see that societies may be low or high trust largely due to the context of the interactions within them, the context being the distributions of the payoffs in the 'coordination game' people play with one another. Adjusting the payoffs of social interactions might thus be the most effective way to foster more trust in a population. Chance plays a bigger role in small groups than in large ones. Future research could investigate whether a large population can be modeled as a loose grouping of smaller cliques, with the average trusting behavior of the population an aggregation of these cliques. In this case chance may indeed play a big role in determining the amount of trust at the clique level, which indirectly affects the trust in the population as a whole.
The model we consider assumes a population structure in which all agents are connected with one another. Although the result of Theorem 1 should also hold for any other network topology as long as it is connected, it might be interesting to look for the possibility of pseudo steady states in networks with a strong community structure. It could also be interesting to characterize a mapping between the structure of the graph defining the interactions and the rate of convergence to a steady state. Other future work could focus on extending the convergence result (or proving its opposite) in the case of Bayesian learning or another simple learning rule that the agents might employ.
Our model is such that if there exists even one agent who always trusts regardless of the actions taken by the rest of the population, the population will converge to the trust action with probability one (because convergence to always doubt is impossible).This is in contrast to Wang and Sun [16] in whose model the presence of 'zealous' always cooperators does not always promote cooperation.However, in our model the converse is also true: One constant doubter forces the always doubt equilibrium.
Studying the transient dynamics of our model with both constant trusters and constant doubters might lead to results with oscillations between trusting and doubting as in [63].
Another line of future work could do away with the assumption of selfish rationality and instead have agents use some form of moral preferences, as showcased in [31,64]; for example, agents who are interested in the welfare of both players in the game.
A facet of trust we have not explored here has to do with believing what has been said by others.
Recently, the evolution of honesty in the sender-receiver game has been studied under imitation of better-performing strategies by means of the Monte Carlo method [65][66][67]. In particular, agents in these studies copy the behavior of other agents with probability proportional to the difference between their payoffs. It may be fruitful to adapt the approach we have taken here to study the evolution of honesty in the sender-receiver game under experience-based learning instead of imitation. In this case it may be necessary for agents to have two beliefs: x (likelihood of being believed) and y (likelihood of others being honest). In particular, it may be interesting to investigate under which conditions the golden rule (lie and disbelieve, or be honest and believe) emerges.

Figure 2: The expected change h(x) − x for the function F(x) = x + 2x(x − 1/2)(x − 1). This figure highlights the, at first, counterintuitive nature of the result in Theorem 1. The expectation of the dynamics would indicate that the process absorbs at x = (0.5, 0.5), but by Theorem 1 we know that the process absorbs with probability one at one of the two corners, x = (0, 0) or x = (1, 1).

Figure 3: Average minimum Manhattan distance to A over time for various α and r.

Figure 4: Heatmaps of $p_T(x(0))$ and the time it takes to be absorbed at either corner. We see the transition between being absorbed in (1, 1) with probability 1 when r < 1 and with probability zero when r > 1. A greater α has the effect of smoothing this transition. We also see that increasing α has a hastening effect on the time to absorption.

Furthermore, we analyzed the effect of F, α and N by simulating the model with parameterized $F(x) = x^r$ for r ∈ [0.5, 2] and α ∈ [0.05, 0.5]. The results of this simulation show that:
• decreasing α exaggerates the effect of r;
• increasing N decreases the effect of α.
Figure 5 panels: (a) $p_T$ for N = 5; (b) $p_T$ for N = 10; (c) time for N = 5; (d) time for N = 10.

Figure 5: Heatmaps of $p_T(x(0))$ and the time it takes to be absorbed at either corner. Increasing the number of agents in the population dampens the effect of the learning rate α on the outcome, or equivalently sharpens the effect of the payoff distribution parameter r.