Human memory is an extremely complex system of vast capacity that is nevertheless often unreliable. Measuring memory for realistic material, such as narratives, is quantitatively challenging because people rarely remember narratives verbatim. Cognitive psychologists therefore developed experimental paradigms involving randomly assembled lists of items that make quantitative measures of performance in memory tasks, such as recall and recognition, possible. Here, we describe a set of mathematical models designed to predict the results of these experiments. The models are based on simple underlying assumptions, agree surprisingly well with experimental results, and exhibit interesting mathematical behavior that can be partially understood analytically.

Our memory is often unreliable. One usually remembers where she parked her car in the morning when leaving work in the evening, but sometimes this fails, and what is worse, one can never know when this will happen and why. These kinds of everyday observations seem to indicate that human memory cannot be predicted, let alone described by mathematical equations. However, quantitative measures of memory performance have been collected over many years of study, in particular with recognition and recall tasks utilizing random lists of words (see, e.g., Refs. 1–4), and a set of powerful and complex models was proposed to capture these measures.4–7 These models are characterized by a large number of parameters that have to be tuned to data and cannot be analyzed mathematically. We recently proposed a different type of model that is based on simple fundamental principles and, as a result, can be analyzed mathematically.8,9 Moreover, while these models necessarily provide a highly simplified description of the data, they predict the important performance measures in recall and recognition experiments quite well and, most importantly, display a certain degree of universality. In this paper, we review the above models and the precision with which they predict memory performance, and we focus on their mathematical aspects, in particular on the unsolved issues.

Here, we introduce the experiments that we performed on the Amazon Mechanical Turk platform with a relatively large number of participants of unknown age and education level. All experiments involved randomly assembled lists of common nouns (e.g., “table” and “car”) or common facts (e.g., “Earth is round” and “birds fly”). We assembled multiple word lists of seven lengths L = 8, 16, 32, 64, 128, 256, and 512, presented consecutively at two speeds (1 and 1.5 s per word), and fact lists of four lengths L = 8, 16, 32, and 64, presented at the speed of 3 s per fact, 18 conditions altogether. Each participant performed a recall experiment with a list of a particular presentation condition (words/facts with a particular length and presentation speed) and one recognition test with a different list of the same condition. The recognition test consisted of a pair of items, one of them from the presented list and one not, with the participant being required to point to the item from the list. More experimental details can be found in Refs. 9 and 10. For each presentation condition separately, we obtained the empirical distribution P(R) of the number of words recalled, R, and the fraction of correct recognitions, C, both across the group of participants performing experiments with this condition. In Fig. 1, we plot the average R and C as a function of list length L, separately for words at the two presentation speeds and for facts. As in numerous other publications (see, e.g., Refs. 3 and 11), we observe the decline of memory performance with L and with presentation speed, which is quite intuitive. We now introduce a novel performance measure that we call the “memory invariant” (X), defined as

X = \frac{\langle R\rangle}{\sqrt{L\,(2\langle C\rangle - 1)}}.   (1)

As shown in Fig. 2, the value of X is approximately the same for all conditions, converging to an asymptotic value for longer lists. The main goal of the recall model that we present next is to account for this behavior (in fact, the invariant was derived from model predictions, as will be seen shortly). The black horizontal line in Fig. 2 is the theoretical prediction for the invariant, given by

X = \sqrt{\frac{3\pi}{2}} \approx 2.17.   (2)
FIG. 1.

Recall and recognition performance for lists of increasing length (L). (a) Average number of items recalled vs list length for lists of words at two presentation speeds and lists of facts. (b) Same as (a) for the probability of correct recognition.

FIG. 2.

Invariant measure of memory. The value of the memory invariant defined in Eq. (1) vs list length for lists of words at two presentation speeds and lists of facts. Colors are the same as in Fig. 1. The black horizontal line is the theoretical prediction for the invariant, given by Eq. (2).


In another set of experiments performed with random lists of words, we measured how the probability to recognize a word from the list declines as a function of the lag between its presentation and the test, which is called the retention curve (RC).9 There is an extensive psychology literature in which mathematical forms of RCs were evaluated, with the power-law form emerging as one of the best candidates.12 The hope that the shape of the RC would shed light on the mechanisms of forgetting has not yet borne fruit (see, e.g., Ref. 13). The most accepted mechanism in the literature is “retrograde interference,” according to which memories are erased not passively but rather due to the acquisition of new memories.14 In our experiments, we presented each participant with a list of 500 common nouns at the speed of 1.5 s per word, interspersed with recognition trials of three types: one including a word presented two time steps before the test (two-back recognition); another including a word presented ten time steps before the test (ten-back recognition); and finally, a third type in which one of the first 25 words of the list was tested at various later time points during presentation. Recognition trials of all three types were presented throughout the session at random times. The purpose of the two-back trials was to select the participants who were focused on the task; the ten-back trials were performed to check for possible effects of fatigue and/or forward interference from previously presented words; finally, the third type of trial was the principal one, used to find the shape of the decay of recognition performance with the number of intervening items since presentation, i.e., the RC. Results of these experiments averaged over participants with perfect two-back recognition are shown in Fig. 3. For participants with perfect two-back performance, ten-back performance does not depend much on the position of the test, indicating that effects of proactive interference are negligible. On the other hand, recognition of early items steadily declines toward the chance level of 50% with increasing lag between presentation and test. We also show a theoretical prediction for the shape of the RC, resulting from the model that will be described in detail below.

FIG. 3.

Recognition experiments. (Lower panel) Experimental protocol. Vertical bars represent word presentations. Pairs of horizontal bars represent a delayed recognition task, in which participants were presented with one word shown to them previously and one lure word. Participants were requested to click on the word they felt they had seen before. In total, 500 words were presented, and all of the first 25 words were queried at different moments. Additionally, participants undertook recognition tests for the second (25) and tenth (25) back word from the time of inquiry. (Upper panel) Recognition performance for selected participants who reached perfect performance in the 2-back task. The blue solid line is the retention curve (recognition performance vs the lag between word presentation and the delayed recognition task). The dashed lines represent theoretical retention curves computed with the recognition model described in Sec. IV for three different values of the parameter n. Adapted from Ref. 9.


This model, introduced in a series of our previous publications, is based on three basic principles: (i) We assume that encoding and forgetting of items in memory is a binary process, i.e., at each moment, a given item is either present in memory or not, and all items that remain in memory after presentation of the list are candidates for recall; (ii) items of a given type are encoded in dedicated memory networks as sparse random groups of neurons, i.e., each neuron encodes a given memory with some small probability, independently for all neurons and items; and (iii) the recall trajectory is determined by the matrix of encoding overlaps that we call “similarity matrix” (SM) between items, computed as the number of common neurons for each pair of items. The trajectory is generated as follows: the first item is chosen randomly; at each step of the process, the next item is chosen as the one with the largest similarity to the current item, chosen out of all items except for the one that was visited at the previous step. Mathematically, each element of SM is defined as a scalar product of the binary index vectors for the corresponding items,

S_{kl} = \sum_{i=1}^{N} \xi_i^{k}\,\xi_i^{l},   (3)

where N is the number of neurons in the memory network and ξ_i^k = 1 (0) if the neuron with index i participates (does not participate) in the encoding of the item with index k, which happens with probability f (1 − f), where f is the sparseness of memory representations.
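To make the recall rule concrete, the following minimal Python sketch (our own illustration; the variable names and parameter values are ours, not from Refs. 8–10) draws sparse random encodings, builds the similarity matrix of Eq. (3), and follows the largest-similarity rule while never stepping directly back to the previously recalled item; as described below, the walk is stopped once the same transition is taken for the second time.

import numpy as np

def recall_trajectory(L=64, N=10_000, f=0.01, seed=0):
    # One recall trajectory of the associative search model:
    # sparse random encodings, similarity = pairwise overlap counts,
    # greedy largest-similarity walk that never returns directly
    # to the item recalled at the previous step.
    rng = np.random.default_rng(seed)
    xi = (rng.random((L, N)) < f).astype(float)   # xi[k, i] = 1 if neuron i encodes item k
    sm = xi @ xi.T                                # S_kl = number of shared neurons, Eq. (3)
    jitter = rng.random((L, L)) * 1e-9
    sm = sm + (jitter + jitter.T) / 2             # tiny symmetric jitter to break exact ties
    np.fill_diagonal(sm, -np.inf)                 # an item is never its own successor
    current, previous = int(rng.integers(L)), -1
    recalled, used_transitions = {current}, set()
    while True:
        order = np.argsort(sm[current])[::-1]     # candidate successors, most similar first
        nxt = int(order[0]) if order[0] != previous else int(order[1])
        if (current, nxt) in used_transitions:    # same transition twice -> the walk has cycled
            return len(recalled)
        used_transitions.add((current, nxt))
        recalled.add(nxt)
        previous, current = current, nxt

print(recall_trajectory())                        # number of distinct items recalled in one trial

The returned value is the trajectory length R discussed below; repeating the call over many random seeds gives an empirical distribution P(R).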

Principle (i) is a simplifying assumption that allows us to consider, at each moment, the number of items from the list that are encoded in memory, which we denote as M. This, in turn, defines the probability to give a correct answer in a recognition trial as C = (1/2)(1 + M/L), which is derived by assuming that if the word is in memory, the participant gives a correct answer; otherwise, the answer is a guess. Inverting this relation, and taking into account that different participants remember different numbers of items after list presentation, results in the following expression for the average number of items in memory, M:

\langle M\rangle = L\,\bigl(2\langle C\rangle - 1\bigr).   (4)
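As a quick numerical illustration of this guessing model (a sketch with made-up numbers rather than the actual data), one can simulate participants who answer correctly whenever the probed list word is among their M encoded items and guess otherwise, and then invert the measured C back into an estimate of M:

import numpy as np

rng = np.random.default_rng(1)
L, M_true, trials = 64, 20, 20_000                   # list length, encoded items, recognition trials (illustrative values)
in_memory = rng.random(trials) < M_true / L          # the probed list word is in memory with probability M/L
correct = in_memory | (rng.random(trials) < 0.5)     # otherwise the two-alternative choice is a fair guess
C = correct.mean()                                   # should be close to (1 + M/L) / 2
M_est = L * (2 * C - 1)                              # inversion used in Eq. (4)
print(f"C = {C:.3f}, estimated M = {M_est:.1f} (true M = {M_true})")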

Principles (ii) and (iii) result in a simple recall algorithm illustrated in Fig. 4. For each row of the similarity matrix, we mark the position of the maximal and second-maximal elements [black and red spots, respectively, in Fig. 4(a)] and construct a graph with M nodes, where each node emits two arrows of the corresponding colors pointing to the nodes given by the positions of the black and red spots in the corresponding row. Beginning from a random node, the recall trajectory follows black arrows, unless it goes back to the previous node producing a 2-node loop, in which case the red arrow is chosen instead; see Fig. 4(b) for an example trajectory. Following this trajectory, one can see the “collision” where a previously visited node is reached for the second time (node 10), after which the trajectory traverses the original trajectory in the opposite order for several steps, eventually breaking out to new nodes, until finally converging to a cycle after the same transition is taken for the second time (12 → 16). As illustrated in this example, the recall model is mathematically quite involved, and it is not currently clear how it can be solved to find the distribution of the trajectory lengths, corresponding to the number of words recalled, for an arbitrary M, over realizations of the SM. In Ref. 10, we found a good asymptotic solution to this problem in the limit of large M by connecting it to a much simpler model with a fully random SM and no restriction on avoiding 2-node loops (i.e., the trajectory follows only the black arrows on the corresponding graph). This model is then equivalent to a random map problem (also called the “birthday paradox”), for which the trajectory enters a cycle after the first collision. Since all transitions in this model are equally probable, the probability for a collision with any one of the previously visited nodes is given by

p_0 = \frac{1}{M}.   (5)

The probability for having a trajectory of length R in a graph of M nodes can be easily written down as

P(R) = \frac{R}{M}\prod_{k=1}^{R-1}\left(1 - \frac{k}{M}\right).   (6)
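The random-map picture is easy to check numerically; the sketch below (our own code) draws uniform successors until a node is revisited and compares the empirical distribution of trajectory lengths with the product formula of Eq. (6).

import numpy as np
from collections import Counter

def random_map_length(M, rng):
    # Length of a fully random trajectory (uniform successor at each step)
    # until it first revisits an already visited node.
    visited = {int(rng.integers(M))}
    while True:
        nxt = int(rng.integers(M))
        if nxt in visited:
            return len(visited)
        visited.add(nxt)

rng = np.random.default_rng(2)
M, trials = 64, 50_000
counts = Counter(random_map_length(M, rng) for _ in range(trials))

def p_exact(R, M):
    # First-collision distribution of Eq. (6).
    return (R / M) * np.prod(1 - np.arange(1, R) / M)

for R in (2, 8, 16, 24):
    print(R, counts[R] / trials, round(p_exact(R, M), 4))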

The first moment of this distribution, m1 = ⟨R⟩, can be expressed via the Ramanujan function θ as

m_1 = \frac{M!\,e^{M}}{2\,M^{M}} - \theta(M)   (7)

(see the  Appendix for the derivation), where the Ramanujan function quickly converges to 1/3 for large M, as can be seen from the following exact bound:15 

(8)

and all higher moments can also be computed, e.g.,

m_2 = \langle R^{2}\rangle = 2M - m_1.   (9)

In the limit M → ∞, the first moment quickly converges to its asymptotic behavior,

m_1 \simeq \sqrt{\frac{\pi M}{2}} - \frac{1}{3}.   (10)
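A short numerical check of this convergence (our own sketch): the exact mean is computed directly from the survival probabilities of the trajectory, and the −1/3 correction corresponds to the large-M limit of the Ramanujan function mentioned above.

import numpy as np

def mean_trajectory_length(M):
    # Exact <R> = sum_{r>=1} P(R >= r), with P(R >= r) = prod_{k=1}^{r-1} (1 - k/M).
    total, surv = 0.0, 1.0
    for r in range(1, M + 1):
        total += surv            # surv currently equals P(R >= r)
        surv *= 1 - r / M        # extend the collision-free survival by one more step
    return total

for M in (8, 64, 512, 4096):
    exact = mean_trajectory_length(M)
    leading = np.sqrt(np.pi * M / 2)
    print(f"M={M:5d}  exact={exact:8.3f}  sqrt(pi*M/2)={leading:8.3f}  minus 1/3: {leading - 1/3:8.3f}")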

In fact, one can derive the asymptotic expression for the distribution of R directly from Eq. (6) by replacing each bracket factor by the corresponding exponential, e.g., 1 − k/M ≈ exp(−k/M), resulting in

P(R) \approx \frac{R}{M}\exp\left(-\frac{R^{2}}{2M}\right),   (11)

from which all the moments can be computed. Our model, with the SM defined as the overlaps between random item encodings [Eq. (3)], is much more complex, and we currently do not have a precise formula for the probability distribution of R analogous to Eq. (6), for reasons that will become apparent shortly. Here, we only consider the limit of very sparse encoding, f → 0 [see Eq. (3)], in which case the correlations between different elements of the SM can be neglected, and one can approximate the matrix of encoding overlaps as a random symmetric SM of size M by M (see Ref. 16 for the analysis of the more general case of finite f). This model still differs from the one considered above in two important ways. First, because of the symmetry of the SM, the probability of a collision with any one of the previously visited nodes, which is given by p0 ≈ 1/M in the model with random SM, is now given by p0 ≈ 1/(2M), i.e., approximately two times smaller (we consider the asymptotic limit M → ∞ in this analysis). The reason for this is that if, say, the process is currently at node k and we want to estimate the chance that it returns to the previously visited node l, we need to take into account that when the process was at node l, it did not choose the transition to node k, i.e., the SM element S_kl is not the largest out of all M − 2 relevant elements of the lth row of the SM. With this constraint, the chance that it will be the largest in the kth row is ∼1/(2M), as can be easily estimated.16 This estimation does not take into account other constraints, namely, that the current node k was not chosen at all previous steps of the process, but this constraint can be shown to be negligible in the asymptotic limit of large M. The second difference of the model from the one solved above is the possibility of continuing the recall trajectory after a collision, as illustrated in Fig. 4(b). This happens if and only if the previous transition from the node on which the collision happens was following the red arrow [10 → 7 in Fig. 4(b)], i.e., the largest element of the corresponding row of the SM would bring the process back to the previous node, 14 in this case. One can estimate the probability of this event to be 1/3 asymptotically.16 Taking these two estimations together, the probability for a collision that results in the recall process entering a cycle is given by

p \approx \frac{1}{3M}.   (12)

This estimation ignores the cases in which the collision happens at the initial node of the process or at a node that was already traversed twice in opposite directions, because in both of these cases the process always enters a cycle; these cases, however, can be neglected in the asymptotic limit. Comparing Eqs. (5) and (12), we conclude that for large M, the statistics of recall trajectories in the model with symmetric SM asymptotically approach those of the model with fully random SM with the substitution M → 3M, i.e., the probability distribution of the number of recalled items can be obtained from Eq. (11) as

P(R) \approx \frac{R}{3M}\exp\left(-\frac{R^{2}}{6M}\right),   (13)

with the corresponding moments being

\langle R\rangle \approx \sqrt{\frac{3\pi M}{2}},   (14)
\langle R^{2}\rangle \approx 6M.   (15)

Going back to Eq. (4), we can express M in terms of C, substitute it into Eq. (14), and obtain the theoretical value of the memory invariant defined in Eq. (1), namely Eq. (2), which is shown in Fig. 2.
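In the sparse limit the recall process can be run directly on a random symmetric matrix; the following sketch (our own code) estimates the mean number of recalled items this way and compares it with the \sqrt{3\pi M/2} scaling of Eq. (14).

import numpy as np

def recall_on_random_symmetric_sm(M, rng):
    # Recall walk on a random symmetric similarity matrix (the f -> 0 limit).
    a = rng.standard_normal((M, M))
    sm = a + a.T                               # symmetric, continuous entries -> no ties
    np.fill_diagonal(sm, -np.inf)
    current, previous = int(rng.integers(M)), -1
    recalled, transitions = {current}, set()
    while True:
        order = np.argsort(sm[current])[::-1]
        nxt = int(order[0]) if order[0] != previous else int(order[1])
        if (current, nxt) in transitions:      # repeated transition -> the walk has cycled
            return len(recalled)
        transitions.add((current, nxt))
        recalled.add(nxt)
        previous, current = current, nxt

rng = np.random.default_rng(3)
for M in (32, 128, 512):
    mean_R = np.mean([recall_on_random_symmetric_sm(M, rng) for _ in range(1000)])
    print(M, round(mean_R, 2), round(np.sqrt(3 * np.pi * M / 2), 2))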

FIG. 4.

Associative search model of free recall. (a) SM (similarity matrix) for a list of 16 items (schematic). For each recalled item, the maximal element in the corresponding row is marked with a black spot, while the second maximal element is marked with a red spot. (b) A graph with 16 nodes illustrates the items in the list. The recall trajectory begins with the first node and proceeds to an item with the largest similarity to the current one (black arrow) or the second largest one (red arrow) if the item with the largest similarity is the one recalled just before the current one. When the process returns to the tenth item, a second subtrajectory is opened up (shown with thinner arrows) and converges to a cycle after reaching the 12th node for the second time. Adapted from Ref. 10.


Figure 5 shows the results of numerical simulations that illustrate the convergence of the distribution of recalled items to its asymptotic form with increasing M, and likewise for the first and second moments. Interestingly, the first moment converges to its asymptotic value of Eq. (14) much faster than the second moment, but we have not yet been able to estimate analytically the finite-M corrections to the distribution function of R and to the moments. Another interesting open feature of the model that could potentially be observed experimentally is the number of recall cycles for a given SM when choosing different initial items for recall. In the random SM model, the number of cycles was shown to grow very slowly with the size of the matrix,17,19 but we did not manage to generalize this result to the symmetric SM model.
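This open question can at least be explored numerically; the sketch below (our own code, not from the cited references) runs the recall walk from every possible starting item on a single random symmetric SM and counts how many distinct terminal cycles appear.

import numpy as np

def terminal_cycle(sm, start):
    # Run the recall walk from `start`; return the cycle it settles into,
    # represented as a frozenset of (previous, current) transition states.
    previous, current = -1, start
    seen, path = {}, []
    while True:
        order = np.argsort(sm[current])[::-1]
        nxt = int(order[0]) if order[0] != previous else int(order[1])
        state = (current, nxt)
        if state in seen:                      # a repeated state closes the cycle
            return frozenset(path[seen[state]:])
        seen[state] = len(path)
        path.append(state)
        previous, current = current, nxt

rng = np.random.default_rng(4)
M = 128
a = rng.standard_normal((M, M))
sm = a + a.T
np.fill_diagonal(sm, -np.inf)
cycles = {terminal_cycle(sm, s) for s in range(M)}
print(f"{len(cycles)} distinct cycles reached from {M} different starting items")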

FIG. 5.

Comparison of simulations and asymptotic calculations for the recall model. Top row: probability distribution of the number of recalled items, from numerical simulations (blue histograms) and from the asymptotic approximation of Eqs. (14) and (15) (solid red curves), for M = 8, 32, 128, 512. The two leftmost subplots in the bottom row show the recall moments from these equations and from simulations. Bottom right: ratio of the numerical values of the two recall moments to their asymptotic estimates.


In this section, we introduce a family of models of forgetting that are based on the idea of “retrograde interference,” according to which memories are erased due to the acquisition of new memories rather than passively by the passage of time.14 The simplest way to realize this process is to assume that each acquired memory item is characterized by a scalar “valence” measure and that at every time step a new item is presented to the system with a valence randomly sampled from some distribution. Each time a new memory of valence V is acquired, either the whole set of existing memories with valences smaller than V is erased (model I), or only the one memory of this set with the smallest valence is erased if the set is not empty (model II). Finally, the third model (model III) that we proposed generalizes model I to multidimensional valences, such that each time a new memory with an n-dimensional valence V is acquired, the set of existing memories with valences smaller than V along each dimension is erased. Figure 6 illustrates these three models. Model I is very easy to solve: the probability that an item stays in memory for at least t steps after its acquisition, which we call the retention curve (RC), is the same as the probability that its valence is higher than those of all of the t subsequently presented items,

RC(t) = \frac{1}{t+1},   (16)

i.e., it has the power-law shape compatible with a variety of psychology studies on forgetting. The items that remain in memory after many steps of acquisition occupy the tail of the valence distribution, and moreover, at each moment, the valence of the retained memories is monotonically increasing with their “age” (time since acquisition). The average number of items accumulated in memory, N, grows with time as

\langle N(T)\rangle = \sum_{t=1}^{T}\frac{1}{t} \simeq \ln T.   (17)
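Model I is simple enough to verify directly; the following sketch (our own code) simulates it and checks both the 1/(t + 1) retention curve and the logarithmic growth of the number of retained items.

import numpy as np

rng = np.random.default_rng(5)
T, trials = 200, 5_000
retained_counts = np.zeros(trials)
first_item_survives = np.zeros(T)          # counts how often the very first item survives t later items

for tr in range(trials):
    v = rng.random(T)                      # valences of T sequentially presented items
    memory = []                            # valences currently stored
    for t, x in enumerate(v):
        memory = [m for m in memory if m > x]     # model I: erase everything weaker than the new item
        memory.append(x)
        if t >= 1:
            first_item_survives[t] += v[0] == v[: t + 1].max()   # survives iff still the running maximum
    retained_counts[tr] = len(memory)

print("mean retained:", retained_counts.mean(), " vs ln T + Euler gamma =", np.log(T) + 0.577)
for t in (1, 4, 9, 49):
    print(f"RC({t}) ~ {first_item_survives[t] / trials:.3f}  vs 1/(t+1) = {1 / (t + 1):.3f}")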

The distribution of N(T) can also be computed with the observation that if one considers the valences of the presented items in backward order, from last to first, the items that are retained correspond to the running maxima of the valences, also called “high water marks.” The distribution of the number of retained items is, therefore, the same as the distribution of the number of high water marks over the permutations of a list of T numbers, which is given by the unsigned Stirling number of the first kind s(T, k). This result can be obtained via a bijection between the distribution of the number of high water marks and the distribution of the number of cycles in the corresponding permutation group.22 Another way to compute it is through a relation to dominance in random games (see Ref. 19 for an explicit derivation of the distribution), yielding

P\bigl(N(T) = k\bigr) = \frac{s(T,k)}{T!},   (18)

where Stirling numbers of the first kind are defined algebraically,

\sum_{k=0}^{T} s(T,k)\,x^{k} = x(x+1)(x+2)\cdots(x+T-1).   (19)
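The Stirling-number prediction is easy to verify numerically; a sketch (our own code) that builds the unsigned Stirling numbers of the first kind from the standard recurrence and compares s(T, k)/T! with a Monte Carlo count of retained items:

import numpy as np
from math import factorial

def unsigned_stirling_first_row(T):
    # Row T of the unsigned Stirling numbers of the first kind, c(T, k) for k = 0..T,
    # via the recurrence c(n, k) = c(n-1, k-1) + (n-1) * c(n-1, k).
    row = [1]                                   # c(0, 0) = 1
    for n in range(1, T + 1):
        new = [0] * (n + 1)
        for k in range(1, n + 1):
            new[k] = row[k - 1] + (n - 1) * (row[k] if k < n else 0)
        row = new
    return row

T, trials = 12, 50_000
rng = np.random.default_rng(6)
counts = np.zeros(T + 1)
for _ in range(trials):
    v = rng.random(T)
    # An item is retained in model I iff its valence is the maximum of the remaining suffix.
    n_retained = sum(v[t] >= v[t:].max() for t in range(T))
    counts[int(n_retained)] += 1

row = unsigned_stirling_first_row(T)
for k in range(1, 6):
    print(k, counts[k] / trials, row[k] / factorial(T))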
FIG. 6.

Interference models of forgetting. Model I: each item is represented as a thin vertical bar. The height of the bar corresponds to the valence of the item. The top row of bars above the black line represents items that are stored in memory just before the acquisition of a new item, shown on the right (sample). All the items that have smaller valences (bar heights) than the new item are discarded from memory (crossed by a red bar), and the new item is added. The bottom row represents the memory content after the new memory is acquired. Model II: the same as model I, but only the one item with the smallest valence that is smaller than that of the new item (if there is any) is discarded. Model III: the same as model I, but each memory item has two valences, represented by the width and height of a rectangle. In this case, all the items that have both valences smaller than the corresponding valences of the new item are discarded. Modified from Ref. 9.


Model I cannot be considered realistic, since the average number of items in memory grows only as the logarithm of the number of time steps, i.e., it remains small over a lifetime if one assumes the acquisition of a new memory every second. Models II and III are two possible ways to correct for this deficiency, which we now consider one by one. Simulations of model II show an interesting behavior when the number of presented items is large: after a brief transient, items with valences above a certain threshold remain in memory indefinitely, while items below this threshold eventually get erased (see Fig. 7, which shows the probability for an item to remain in memory as a function of its valence for different total numbers of presented items). Showing that this is indeed the case mathematically requires some advanced techniques in probability theory (Ref. 23). If one assumes this behavior, the value of the threshold can be calculated in the following way (we thank E. Friedgut for help with this derivation): without loss of generality, assume that item valences are uniformly distributed in the interval between 0 and 1. Denote the threshold as θ (0 < θ < 1). Each item below threshold (“IBT”) is eventually erased upon presentation of another item, which itself could be either an item above threshold (“IAT”) or below threshold. Denote by p the fraction of IBTs that are erased by the presentation of one of the IATs. Since each IAT erases exactly one IBT and all IATs remain in memory, we get

p = \frac{1-\theta}{\theta}.   (20)

On the other hand, for each IBT with valence x (x < θ), the probability that it is erased by one of the IATs is (1 − θ)/(1 − x); hence, p can also be obtained by averaging this probability over all x between 0 and θ,

p = \frac{1}{\theta}\int_{0}^{\theta}\frac{1-\theta}{1-x}\,dx = \frac{1-\theta}{\theta}\,\ln\frac{1}{1-\theta}.   (21)
FIG. 7.

Probability of memory retention in model II. Each line represents the probability of a memory with the given valence value (x axis) being retained after T time steps (denoted in the legend). Since the model depends only on the order statistics of the valence values, there are only T different values of the valences. In order to compare models, we take v_k = k/T, k = 1, …, T. Distributions are estimated using 10 000 simulations of model II with random order statistics.


From the last two equations, we obtain θ = 1 − 1/e, which, in turn, implies that the number of items that are retained in memory after time T is T/e. Hence, model II predicts that the number of remaining items in memory grows linearly with time and that each memory has a finite probability 1/e of being available after an arbitrary time since acquisition, i.e., the retention curve is flat. We conclude that while the model is very interesting from a mathematical point of view, it does not provide a good account of the experimental properties of forgetting. We, therefore, turn to model III, which is another, more successful attempt to fix model I by increasing the speed of accumulation of items in memory. Model III, as opposed to the other two models, is characterized by one free parameter, namely, the dimensionality n of the valence measure for memory items.
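The threshold behavior and the 1 − 1/e value are easy to see in a direct simulation of model II (a sketch with our own parameter choices):

import numpy as np
import bisect

rng = np.random.default_rng(7)
T, trials = 1_000, 300
threshold = 1 - 1 / np.e
retained_fraction = np.zeros(trials)
kept_low = kept_high = 0                     # retained items with valence below / above 1 - 1/e

for tr in range(trials):
    v = rng.random(T)
    memory = []                              # stored valences, kept sorted
    for x in v:
        if memory and memory[0] < x:         # model II: erase the single weakest item below the new one
            memory.pop(0)
        bisect.insort(memory, x)
    retained_fraction[tr] = len(memory) / T
    retained = set(memory)
    kept_low += sum(1 for x in v if x < threshold and x in retained)
    kept_high += sum(1 for x in v if x >= threshold and x in retained)

print("retained fraction:", retained_fraction.mean(), " vs 1/e =", 1 / np.e)
print("P(retained | valence < 1-1/e):", kept_low / (trials * T * threshold))
print("P(retained | valence > 1-1/e):", kept_high / (trials * T * (1 - threshold)))

For finite T, the low-valence retention probability stays above zero because recently presented items have not yet been erased; the split sharpens as T grows, as in Fig. 7.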

The closed-form analytical solution for the retention curve in model III can be obtained by noting that if the valence components are distributed uniformly between 0 and 1, then for an item with valences u = (u_1, …, u_n), the probability for a subsequent item to erase it is given by ∏_{i=1}^{n}(1 − u_i); hence, the probability for it to survive t consecutive items is [1 − ∏_{i=1}^{n}(1 − u_i)]^t. Averaging this expression over u results in the following expression for the retention curve:

RC_n(t) = \int_{0}^{1}\!\cdots\!\int_{0}^{1}\left[1-\prod_{i=1}^{n}(1-u_i)\right]^{t} du_1\cdots du_n.   (22)

Alternatively, one can use the following recursive equation for this function:9 

RC_n(t) = \frac{1}{t+1}\sum_{k=1}^{t+1} RC_{n-1}(k-1).   (23)

To derive this equation inductively, consider an item acquired at time 0 and followed by t other items. Let k be the rank of the original item among this group of t + 1 items along the last valence dimension, i.e., k − 1 of the subsequently acquired t items have a higher valence along this dimension, while the rest have a lower valence and, hence, cannot erase the original item regardless of the other dimensions. In order for the original item to survive for t time steps, it has to survive the k − 1 potentially “dangerous” items thanks to the first n − 1 dimensions, the probability of which is RC_{n−1}(k − 1) (by definition of the retention function). Since all values of k from 1 to t + 1 are equally likely and, hence, each has probability 1/(t + 1), the total retention probability, averaged over the possible values of k, is given by Eq. (23). One can use this equation to calculate the asymptotic behavior of the RC for very large t,9,20

RC_n(t) \simeq \frac{(\ln t)^{\,n-1}}{(n-1)!\;t},   (24)

which has the same scaling as in the one-dimensional case [Eq. (16)] with a logarithmic correction.
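The recursion of Eq. (23) and a direct Monte Carlo simulation of model III are easy to compare numerically (a sketch; RC_1(t) = 1/(t + 1) is used as the one-dimensional base case):

import numpy as np
from functools import lru_cache

@lru_cache(maxsize=None)
def rc(n, t):
    # Retention curve via the recursion RC_n(t) = (1/(t+1)) * sum_{j=0}^{t} RC_{n-1}(j),
    # with the one-dimensional case RC_1(t) = 1/(t+1) as the base.
    if n == 1:
        return 1.0 / (t + 1)
    return sum(rc(n - 1, j) for j in range(t + 1)) / (t + 1)

def rc_monte_carlo(n, t, trials=20_000, seed=8):
    # Probability that an item with uniform n-dimensional valences survives t later items,
    # i.e., that no later item exceeds it in every dimension.
    rng = np.random.default_rng(seed)
    u = rng.random((trials, n))
    later = rng.random((trials, t, n))
    dominated = (later > u[:, None, :]).all(axis=2).any(axis=1)
    return 1.0 - dominated.mean()

for t in (1, 5, 20, 50):
    print(t, round(rc(5, t), 4), round(rc_monte_carlo(5, t), 4))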

As shown in Fig. 3 above, our experimental data with random lists of common nouns support the five-dimensional version of model III. Interestingly, our later experiments showed that this behavior is not universal: performing the same experiments with different types of items (verbs, short sentences, and sketches) results in retention curves better described by four-dimensional (for verbs) and seven-dimensional (sketches and sentences) variants of model III (Ref. 24).

Many properties of model III that could have experimental relevance beyond the retention curve remain open. Here, we illustrate some of them with numerical simulations. A set of items in model III can be viewed as a particular instance of a partially ordered set (poset)25 with a product order, if we say that two items are ordered if and only if one of them has a higher valence in all dimensions (i.e., one item erases the other one if presented at a later time). Since any partial order can be extended to a total one,20 one can always arrange the items such that all of them remain in memory. The distribution of the smallest number of retained items over different realizations of the valences, for lists of 500 items generated by simulating model III with n = 5, is shown in Fig. 8. This distribution can be compared with the distribution (over the realizations of the valences) of the average number of retained items over different orderings of the items in the list, shown in the same figure.

FIG. 8.

Distribution of minimal and average retention in model III. Each histogram represents the probability of retaining a specific number of items (x axis; after 500 steps, n = 5), with items presented either in random order or in an order chosen to retain the minimal possible number of items. 100 000 simulations of model III were used to construct the histograms.


Another interesting feature of the model is the possible dependencies between the retention of items at different positions. In Fig. 9, we show several examples of simulated matrices of item-to-item correlations after the list presentation, obtained with n = 1, 3, 5, 7. As mentioned above, different values of n correspond to different types of memory items used in the experiments (namely, verbs, nouns, and visual sketches), so these correlations could potentially be measured in future experiments. While most of the correlations are quite weak, one can see interesting patterns of correlations developing, especially for n = 3.
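A sketch of the correlation analysis behind Fig. 9 (our own implementation): simulate model III, record which presentation positions survive to the end of the list, and correlate the binary retention indicators across trials.

import numpy as np

def retention_indicators(T, n, trials, seed=9):
    # Binary matrix: entry [trial, t] = 1 if the item presented at position t
    # is still in memory at the end of the list (model III survival rule).
    rng = np.random.default_rng(seed)
    kept = np.zeros((trials, T))
    for tr in range(trials):
        v = rng.random((T, n))
        for t in range(T):
            later = v[t + 1:]
            kept[tr, t] = not (later > v[t]).all(axis=1).any()   # survives iff no later item dominates it
    return kept

kept = retention_indicators(T=50, n=3, trials=5_000)
# The very last position is always retained (zero variance), so drop it before correlating.
corr = np.corrcoef(kept[:, :-1], rowvar=False)
print(np.round(corr[:5, :5], 3))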

FIG. 9.

Correlations in model III. Shown are correlation coefficients between the retention of memories at the lags specified on the axes, obtained from 10^6 simulations. To obtain the correlations, we simulated model III and stored the presentation positions of the retained memories. Afterward, each simulation trial was represented by a sequence of “0”s (the memory at this presentation time was forgotten by the end of presentation) and “1”s (the memory was retained). Each point on a graph corresponds to the correlation coefficient, computed across simulation trials, between these binary indicators at the two lags from the end of presentation.


We presented a number of mathematical models of human memory that are based on a small set of clear and intuitive principles and provide a good account of some of the experimental results involving recall and recognition of lists of randomly assembled words or sentences. The models have a simple formulation and almost no free parameters; nevertheless, their mathematics is nontrivial and only partially explored. All of the models are formulated as deterministic discrete processes driven by certain intrinsic characterizations of memory items, such as measures of their valence and inter-item similarity. Limiting the study to experiments with random lists of items allowed us to consider these measures as coming from simple statistical ensembles, so that the mathematical results are statistical in nature, expressed as probability distributions and moments of the memory performance measures in question. This statistical nature of the results should not obscure the deterministic nature of the hypothesized memory processes, which is crucial for understanding our results. Since the principles that govern the underlying processes assumed in our analysis are quite general and intuitive, they could be relevant in different contexts, such as random games.19 Whether the models considered in this study can be extended to describe memory for more natural types of information, such as meaningful narratives, is an open question for future studies.

We would like to thank Dr. Noga Alon for helpful discussions and Dr. Andrei Kupavski and Dr. Ehud Friedgut for help with mathematical derivations. This research has received funding from the European Union’s Horizon 2020 Framework Programme for Research and Innovation under Specific Grant Agreement No. 785907 (Human Brain Project SGA2), the Israel Science Foundation (Grant No. 1657/19), EU-M-GATE 765549, and the Foundation Adelis.

The authors have no conflicts to disclose.

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

We proceed with a direct calculation. First, we transform the probability distribution (6),

(A1)

Using this form, we can express the statistic as follows:

(A2)
(A3)
(A4)
(A5)
(A6)
(A7)
(A8)

We can immediately see that (6) is a distribution, since ⟨1⟩ = 1 [f(k) = 1 and f(Mn) − f(Mn − 1) = 0]. The mean value is

(A9)

Using results on Ramanujan Question 294,15,21

(A10)
(A11)

we can write the exact formula for the mean given in Eq. (7),

(A12)

which asymptotically converges to Eq. (10).15 

Next, we can compute

(A13)
(A14)
(A15)

from which Eq. (9) follows,

(A16)
(A17)
(A18)
(A19)
1. R. N. Shepard, “Recognition memory for words, sentences, and pictures,” J. Verbal Learn. Verbal Behav. 6(1), 156–163 (1967).
2. L. Standing, “Learning 10000 pictures,” Q. J. Exp. Psychol. 25(2), 207–222 (1973).
3. B. B. Murdock, Jr., “The immediate retention of unrelated words,” J. Exp. Psychol. 60(4), 222 (1960).
4. M. J. Kahana, “Associative retrieval processes in free recall,” Mem. Cognit. 24(1), 103–109 (1996).
5. R. M. Shiffrin, “Memory search,” in Models of Memory, edited by D. A. Norman (Academic Press, New York, 1970), pp. 375–447.
6. R. M. Shiffrin and M. Steyvers, “A model for recognition memory: REM—Retrieving effectively from memory,” Psychon. Bull. Rev. 4(2), 145–166 (1997).
7. M. J. Kahana, “Computational models of memory search,” Annu. Rev. Psychol. 71(3), 107–138 (2020).
8. M. Katkov, S. Romani, and M. Tsodyks, “Memory retrieval from first principles,” Neuron 94(5), 1027–1032 (2017).
9. A. Georgiou, M. Katkov, and M. Tsodyks, “Retroactive interference model of forgetting,” J. Math. Neurosci. 11(1), 4 (2021).
10. M. Naim, M. Katkov, S. Romani, and M. Tsodyks, “Fundamental law of memory recall,” Phys. Rev. Lett. 124(1), 018101 (2020).
11. W. A. Roberts, “Free recall of word lists varying in length and rate of presentation: A test of total-time hypotheses,” J. Exp. Psychol. 92(3), 365 (1972).
12. J. T. Wixted and E. B. Ebbesen, “On the form of forgetting,” Psychol. Sci. 2(6), 409–415 (1991).
13. M. J. Kahana and M. Adler, “Note on the power law of forgetting” (published online, 2022).
14. J. T. Wixted, “The psychology and neuroscience of forgetting,” Annu. Rev. Psychol. 55, 235–269 (2004).
15. P. Flajolet, P. J. Grabner, P. Kirschenhofer, and H. Prodinger, “On Ramanujan’s Q-function,” J. Comput. Appl. Math. 58(1), 103–116 (1995).
16. S. Romani, I. Pinkoviezky, A. Rubin, and M. Tsodyks, “Scaling laws of associative memory retrieval,” Neural Comput. 25(10), 2523–2544 (2013).
17. B. Harris, “Probability distributions related to random mappings,” Ann. Math. Stat. 31, 1045–1062 (1960).
18. P. Flajolet and R. Sedgewick, Analytic Combinatorics, 1st ed. (Cambridge University Press, 2009).
19. N. Alon, K. Rudov, and L. Yariv, “Dominance solvability in random games,” arXiv:2105.10743v1.
20. A. R. D. Mathias, “The order extension principle,” Proc. Symp. Pure Math. 13(2), 179–183 (1974).
21. S. Ramanujan, “Question 294,” J. Indian Math. Soc. 3(128), 4 (1911).
22. J. Zuiddam, private communication (2020).
23. N. Alon, “On a random model of forgetting,” arXiv:2203.02614 (2022).
24. A. Georgiou et al., “Forgetting dynamics for items of different categories” (to be published).
25. B. Dushnik and E. W. Miller, “Partially ordered sets,” Am. J. Math. 63(3), 600–610 (1941).