Synaptic strength can be seen as the probability of propagating an impulse, and synaptic plasticity suggests that a function could exist from propagation activity to synaptic strength. If this function satisfies constraints such as continuity and monotonicity, a neural network under external stimulus will always converge to a fixed point, and there can be a one-to-one mapping between the external stimulus and the synaptic strengths at the fixed point. In other words, the neural network “memorizes” the external stimulus in its synapses. A biological classifier is proposed to utilize this mapping.

Known experimental results show that synaptic connections strengthen or weaken over time in response to increases or decreases in impulse propagation.1 It is also postulated that “neurons that fire together wire together”.2,3 This biochemical mechanism, called synaptic plasticity,4,5 is believed to play a critical role in memory formation,6–9 although it is still debated whether the synapse is the sole locus of learning and memory.10,11 Meanwhile, a synapse propagates impulses stochastically,12–14 which means that synaptic strength can be measured as the probability of propagating an impulse successfully. With this probabilistic treatment we find that, in the plasticity process, a synapse’s strength is inevitably attracted towards the same fixed point regardless of its initial strength, and that for a neural network there could exist a one-to-one mapping between the external stimulus from the environment and the synapses’ strengths at the fixed point. This one-to-one mapping serves the very purpose of ideal memory: to develop a different stable neural state for each different stimulus from the environment, and to develop the same stable neural state for the same stimulus no matter what state it is initialized with. It follows that synapses alone could sufficiently give rise to persistent memory: they could be the sole locus of learning and memory.

The remainder of the paper is organized as follows. Section II identifies the constraints under which the synaptic plasticity of one synaptic connection leads to its fixed state and a one-to-one stimulus-state mapping (memory). Section III extends the concepts of fixed state and one-to-one mapping to a neural network consisting of many synaptic connections. Section IV proposes a simple neural classifier utilizing this memory to classify handwritten digit images.

Let us start with one synaptic connection, as shown in FIG 1. In nature, synapses are known to be plastic, low-precision and unreliable.15 This stochasticity allows us to treat synaptic strength s as the probability (reliability) of propagating a nerve impulse, instead of a weight (usually an unbounded real number) as in an Artificial Neural Network16 (ANN). It follows immediately that y = xs, where x, s, y∈[0, 1]. Now we treat synaptic plasticity, i.e. the relation between synaptic strength s and simultaneous firing probability y, as a function

s*=λ(y).
(1)
FIG. 1.

A synaptic connection with strength s is directed from neuron 1 to neuron 2. The stimulus from the environment or upstream neurons stimulates neuron 1 to fire an action potential with probability x. The synaptic connection propagates the nerve impulse (action potential) to neuron 2. As a result, neuron 2 is stimulated to fire with probability y. That is, neurons 1 and 2 fire simultaneously (“fire together”) with probability y.


Here s*∈[0, 1] represents the target value that a connection’s strength will be strengthened or weakened to if the connection is under constant simultaneous firing probability y (while s in y = xs represents the current strength). By y = xs and Eq. (1), we have s* = λ(xs), stating that, under constant stimulus probability x, a connection initialized with strength s will evolve towards s*.
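The evolution towards s* can be sketched numerically: strength steps toward the target λ(xs) until it settles. The following minimal sketch assumes an illustrative linear λ (our choice for demonstration, not prescribed by the model) and a small fixed step size.

```python
def lam(y):
    """Illustrative target strength function (an assumption): lambda(y) = 0.9y + 0.05."""
    return 0.9 * y + 0.05

def evolve(x, s0, step=1e-3, iters=20000):
    """Step strength s toward the target lambda(x*s) under constant stimulus x."""
    s = s0
    for _ in range(iters):
        target = lam(x * s)
        if target > s:
            s = min(s + step, 1.0)
        elif target < s:
            s = max(s - step, 0.0)
    return s

# Different initial strengths converge to the same fixed point:
# solving s = 0.9*(0.8*s) + 0.05 gives s+ = 0.05/0.28, roughly 0.179.
print(evolve(0.8, 0.1))
print(evolve(0.8, 0.9))
```

Both runs end near 0.179, illustrating that the initial strength is forgotten while the stimulus x determines the final state.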

Function λ of Eq. (1) truly links “firing together” and “wiring together”. For comparison, the Hebbian learning rule2 treats synaptic plasticity, in the context of ANN, as a function Δw = ηxy to learn connection weights from training patterns; that function translates “firing together” into “a neuron’s input and output both being positive or negative”. Unlike ANN, our model makes no assumption of the neuron being a computational unit, and aims to show that with λ a stimulus could sufficiently and precisely control the enduring fixed state of a synaptic connection. The following reasoning hinges on this “target strength function” λ, and we will put constraints on this uncharted function to see how they affect the dynamics of connection strength and, most importantly, how stimulus is one-to-one mapped to strength at the fixed state.

Here is our first constraint: λ is continuous in y. This constraint is neurobiologically plausible regarding synaptic plasticity, since a sufficiently small change in impulse probability should result in an arbitrarily small change in synaptic strength. In that case, given any x, λ(xs) is a continuous function of s from the unit interval [0, 1] to itself, and according to Brouwer’s fixed-point theorem17 there must exist a fixed point s+∈[0, 1] such that s+ = λ(xs+): connection strength at s+ evolves to s+ and hence fixates, no longer strengthened or weakened. Brouwer’s theorem is a fixed-point theorem in topology which states that, for any continuous function f(t) mapping a compact convex set (e.g. the interval [0, 1] in our case; it could be multi-dimensional) to itself, there is always a point t+ such that f(t+) = t+. Moreover, as illustrated in FIG 2, given any initial value the strength is always attracted towards a fixed point. Therefore, the gentle constraint of continuity on λ suffices to drive a synaptic connection to a fixed state.
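In one dimension, Brouwer’s guarantee reduces to the intermediate value theorem: g(s) = λ(xs) − s satisfies g(0) ≥ 0 and g(1) ≤ 0, so a root exists and can be located by bisection. The sketch below assumes the same illustrative linear λ as before.

```python
def fixed_point(lam, x, tol=1e-10):
    """Locate a root of g(s) = lam(x*s) - s on [0, 1] by bisection.

    Since lam maps [0, 1] to [0, 1], g(0) >= 0 and g(1) <= 0, so a
    sign change is guaranteed; with several roots, bisection finds one."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lam(x * mid) - mid > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For lambda(y) = 0.9y + 0.05 and x = 0.8, the fixed point solves
# s = 0.72s + 0.05, i.e. s+ = 0.05/0.28.
s_plus = fixed_point(lambda y: 0.9 * y + 0.05, 0.8)
print(s_plus)
```

This is only an existence check; stability of the located fixed point is a separate question, addressed by the tendencies in FIG 2.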

FIG. 2.

Two examples of λ(xs) are depicted as bold red lines, and their fixed points as blue dots. (a) Given any initial s1 < λ(xs1), there must exist a fixed point s+∈(s1, 1]; strength s tends to increase from s1 as long as the target strength λ(xs) > s. Given any initial s2 > λ(xs2), there must exist a fixed point s+∈[0, s2); strength s tends to decrease from s2 as long as the target strength λ(xs) < s. Controlled by these two tendencies, s will reach and stay at a fixed point s+ such that s+ = λ(xs+). (b) There are three fixed points s1+, s2+, s3+. Starting from any initial s1 ∈ (s1+, s3+), strength decreases to s1+. Starting from any initial s2 ∈ (s3+, s2+), strength increases to s2+. Thus strength tends to leave the unstable fixed point s3+ for the stable s1+ or s2+. Note that if countably many fixed points exist for λ(xs), one of them must be stable.


To verify the connection strength’s tendency towards fixed points, we design Algorithm 1 to simulate our connection model. In this simulation,18 recent simultaneous firings are recorded and their rate approximates the simultaneous firing probability y; the connection updates its strength by a small step Δs = 10^−4 each iteration in the direction of the target strength. As shown in FIG 3, we run the simulation for four typical target strength functions, and the resulting strength trajectories show that the continuity constraint ensures the tendency towards fixed points from any initial strength.

ALGORITHM 1

Connection strength’s tendency to fixed points.

Input: stimulus probability x, initial synaptic strength s0, target strength function λ, strength step Δs, and iterations I
Output: trajectory of strength s
1: initialize fire-together recorder (10^4-entry array): recorder ⇐ 0.
2: initialize fire-together recorder pointer: p ← 0.
3: initialize current strength: s ← s0.
4: for i = 0 to I do
5:   preset current pointed entry of recorder: recorder[p] ← 0.
6:   pick random r1 and r2 from uniform distribution Unif(0, 1).
7:   if x > r1 and s > r2 then
8:     neuron 1 and 2 fire together: recorder[p] ← 1.
9:   end if
10:  if recorder has been traversed once (i ≥ 10^4) then
11:    set y to the proportion of 1-entries in recorder.
12:    set target strength: s* ← λ(y).
13:    if s* > s then
14:      step-increase current strength: s ← min(s + Δs, 1).
15:    end if
16:    if s* < s then
17:      step-decrease current strength: s ← max(0, s − Δs).
18:    end if
19:  end if
20:  forward recorder pointer: p ← (p + 1) mod 10^4.
21: end for
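Algorithm 1 can be transcribed directly into Python. The sketch below follows the listing, with one efficiency change: a running count of 1-entries replaces re-summing the recorder each iteration. The window size 10^4 is the listing’s; the λ, step size, and iteration count below are illustrative choices.

```python
import random

def simulate(x, s0, lam, ds=1e-4, iters=2 * 10**5, window=10**4, seed=0):
    """Algorithm 1: track one connection's strength toward its fixed point.

    A ring buffer records the last `window` fire-together outcomes; its
    mean estimates the simultaneous firing probability y, and strength
    steps by `ds` toward the target lam(y) once the buffer is full."""
    rng = random.Random(seed)
    recorder = [0] * window
    count = 0          # running number of 1-entries in recorder
    s = s0
    p = 0
    traj = []
    for i in range(iters):
        count -= recorder[p]
        r1, r2 = rng.random(), rng.random()
        recorder[p] = 1 if (x > r1 and s > r2) else 0  # fired together?
        count += recorder[p]
        if i >= window:                 # recorder traversed at least once
            y = count / window          # proportion of 1-entries
            target = lam(y)
            if target > s:
                s = min(s + ds, 1.0)
            elif target < s:
                s = max(0.0, s - ds)
        p = (p + 1) % window
        traj.append(s)
    return traj

traj = simulate(x=0.8, s0=0.5, lam=lambda y: 0.9 * y + 0.05)
print(traj[-1])  # drifts toward the fixed point near 0.179
```

Because y is a lagged, noisy estimate, the trajectory approaches the fixed point gradually and then hovers around it, as in the trajectories of FIG 3.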
FIG. 3.

The simulation results of Algorithm 1 for four typical λ functions. For each λ, eleven trials parameterized with incremental initial strength s0 are run for 10^5 iterations; all trials share the same stimulus probability x = 0.8. In each λ’s subfigure, the left chart depicts λ(y) as a black line, its horizontally scaled λ(xs) = λ(0.8s) as a red line, and fixed points as blue dots; the right chart shows the strength trajectories starting from incremental s0. (a) λ(y) = 0.9y + 0.05. There exists a single fixed point for λ(0.8s); all strength trajectories converge to it. (b) λ(y) = 0.5sin(4πy) + 0.5. There are three fixed points for λ(0.8s), two of which are stable ones that the trajectories converge to with oscillation. (c) λ(y) = −y + 1. All trajectories converge to a single fixed point. (d) λ(y) is discontinuous at y = 0.5. There is no fixed point, since there is no s∈[0, 1] such that λ(0.8s) = s, and consequently the trajectories don’t converge; instead, they fluctuate within a “fixed interval”, which as we will see is a usable compromise of a fixed point.


Our goal is to establish a one-to-one mapping between the stimulus and the connection strength at fixed point. Specifically, we want to (1) given any stimulus x∈[0, 1], identify the fixed point s+ of connection strength without ambiguity; and (2) given any strength s+∈[0, 1] at fixed point, identify the stimulus x without ambiguity. Among the four target strength functions in FIG 3, λ(y) = 0.9y + 0.05 and λ(y) = −y + 1 can lead to a one-to-one stimulus-strength mapping. Given any stimulus x, a synaptic connection equipped with one of these functions has a single fixed point of strength regardless of its initial strength, so the relation between stimulus and fixed-point strength can be treated as a function s+ = θ(x). In FIG 4, simulation shows that θ can be strictly monotonic and hence a one-to-one mapping from x to s+, such that θ(x) has a one-to-one inverse function θ−1(s+). By contrast, FIG 5 shows that λ(y) = 0.5sin(4πy) + 0.5 cannot ensure the uniqueness of the fixed point and thus there is no such one-to-one θ(x); FIG 6 shows that there is no θ either for the discontinuous λ function in FIG 3(d).

FIG. 4.

The simulation results of Algorithm 1 revealing the relation between stimulus probability x and fixed-point strength s+. For each λ, the simulation is parameterized with incremental x (rather than x = 0.8 as in FIG 3) and randomized s0. Ten trials are run for each incremental x, and the ten converged s values are averaged to give the s+ value corresponding to the input x. The red line represents the averaged s+ values from simulation, while the blue line represents the true s+ = θ(x). (a) Simulation parameterized with λ(y) = 0.9y + 0.05; the results match θ(x) = 0.05/(1−0.9x), which is monotonically increasing. (b) Simulation parameterized with λ(y) = −y + 1; the results match θ(x) = 1/(1 + x), which is monotonically decreasing.

FIG. 5.

The simulation results of Algorithm 1 for λ(y) = 0.5sin(4πy) + 0.5 of FIG 3(b), identifying the relation between x and s+. (a) Given x = 0.3, there is a single fixed point regardless of initial strength s0. (b) As with x = 0.8 in FIG 3(b), given x = 0.9 there are two stable fixed points: higher s0 converges to the higher fixed point; lower s0 converges to the lower one; nothing converges to the middle unstable fixed point. (c) Trials with incremental x are run and the averaged s+ values are depicted as in FIG 4. From x = 0.65 upwards, there are two possible stable fixed points to converge to, depending on the randomized initial strength, which means there exists no θ function from x to s+.

FIG. 6.

The simulation results in search of the θ function for the discontinuous λ of FIG 3(d). When x≳0.6, the strength can end up anywhere within a “fixed interval” each time the simulation is run. The absence of a fixed point rules out the existence of θ.


In fact, we can pinpoint further constraints on λ as the conditions for θ to be a one-to-one mapping. In addition to continuity, let λ(y) be strictly monotonic on [0, 1] and hence one-to-one, and let λ(0)≠0 to rule out the fixed point s+ = 0. In that case, λ has an inverse function λ−1(s) which is strictly monotonic between λ(0) and λ(1), and given any fixed-point strength s+ in between we can identify the stimulus x = λ−1(s+)/s+. That is, the function θ−1(s+) = λ−1(s+)/s+ exists. Further, let λ−1(s)/s be strictly monotonic between λ(0) and λ(1). Then given any stimulus x∈[0, 1] there is a single fixed point s+ such that x = λ−1(s+)/s+; that is, the function s+ = θ(x) exists. Both λ(y) = 0.9y + 0.05 and λ(y) = −y + 1 obey all these constraints, and their one-to-one θ functions are verified by the simulation results in FIG 4, whereas λ(y) = 0.5sin(4πy) + 0.5 is not even strictly monotonic. However, neither λ(y) = 0.9y + 0.05 nor λ(y) = −y + 1 is ideal for our purpose. Guided by these constraints, we choose λ functions carefully such that the derived θ(x) is monotonically increasing and its range spans nearly the entire [0, 1] interval, as shown in FIG 7. Of all the λ constraints, continuity and strict monotonicity are reasonable consistency requirements on the neurobiological process of synaptic plasticity, whereas λ(0)≠0 and strict monotonicity of λ−1(s)/s are rather specific and peculiar claims. Admittedly, these λ constraints need to be supported by experimental evidence.
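The closed forms implied by these constraints can be checked for the linear example: for λ(y) = 0.9y + 0.05, solving s = λ(xs) gives θ(x) = 0.05/(1 − 0.9x), and λ−1(s) = (s − 0.05)/0.9 gives θ−1(s+) = λ−1(s+)/s+. The round trip below is a sketch for this one λ, not a general proof.

```python
def theta(x):
    """Fixed-point strength for lambda(y) = 0.9y + 0.05: solve s = 0.9*x*s + 0.05."""
    return 0.05 / (1.0 - 0.9 * x)

def theta_inv(s_plus):
    """Recover the stimulus via theta^{-1}(s+) = lambda^{-1}(s+) / s+."""
    lam_inv = (s_plus - 0.05) / 0.9
    return lam_inv / s_plus

for x in [0.1, 0.5, 0.9]:
    s_plus = theta(x)
    # theta_inv recovers x exactly: the mapping is one-to-one
    print(x, s_plus, theta_inv(s_plus))
```

θ here is strictly increasing, so each fixed-point strength identifies a unique stimulus, matching FIG 4(a).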

FIG. 7.

(a) Our choices of λ functions are λL(y) = 0.99y + 0.01 in blue and λT(y) = 2/(1 + e^−4.4(y+0.01)) − 1 in green. Here λT is a segment of a shifted and scaled sigmoid function. Both obey the λ constraints discussed previously. (b) Simulation results show that λL leads to a linear-like θL (blue) such that θL(x) ≈ x, and λT leads to a threshold-like θT (green).


Now we have the one-to-one (continuous and strictly monotonic) functions λ, λ−1, θ and θ−1, and in these functions the initial strength s0 is irrelevant. Given s+ we can identify x and y without ambiguity, and vice versa. Our interpretation of these mappings is that a synaptic connection at fixed point precisely “memorizes” what (stimulus) it senses and how it responds (with impulse propagation).

Now let us turn to the neural network shown in FIG 8. As it turns out, a neural network can be treated as an “aggregate connection”. We shall see that the definitions and reasoning for the neural network align well with those for the single synaptic connection in the last section.

FIG. 8.

A neural network (of one or multiple agents) consists of n ≥ 2 neurons and c ≥ 1 directed synaptic connections. An example with n = 8 and c = 7 is depicted. Each neuron receives stimulus from the environment with some probability and propagates nerve impulses through its synaptic connections; e.g., triggered by stimulus, neuron 1 propagates impulses stochastically down the directed paths 1⇝7⇝8⇝2 and 1⇝7⇝8⇝5⇝3. Cyclic paths (e.g. 3⇝8⇝5⇝3) are allowed but loops (e.g. 3⇝3) are not. Each neuron may have outbound or inbound connections, neither, or both.


As with the synaptic connection, we can describe a neural network by defining (1) the external stimulus as an n-dimensional vector X∈[0, 1]^n in which each xi is the probability of neuron i receiving stimulus; (2) the strength of all connections as a c-dimensional vector S∈[0, 1]^c in which each sij is the strength of the connection from neuron i to neuron j (denoted i⇝j); (3) the simultaneous firing probabilities over all connections as a c-dimensional vector Y∈[0, 1]^c in which each yij is the simultaneous firing probability over i⇝j. In fact, a single neural connection is a special case of a neural network with c = 1 and n = 2.

Stimulus and strength uniquely determine impulse propagation within the neural network, so there exists a mapping Ψ:(X, S) → Y. Presumably, the mapping Ψ is continuous in S. By Eq. (1), there exists a mapping Λ:Y → S* such that sij* = λij(yij) for each yij in Y and its counterpart sij* in S*. Here S*∈[0, 1]^c is the c-dimensional vector of connections’ target strengths, and the mapping Λ can be visualized as a vector of target strength functions with entry Λij being λij. With Ψ and Λ we have the composite mapping Λ○Ψ:(X, S) → S*. If each λij is continuous in its yij, the mapping Λ○Ψ must be continuous in S, and according to Brouwer’s fixed-point theorem, given X there must exist a fixed point S+∈[0, 1]^c such that Λ○Ψ(X, S+) = S+. Under constant stimulus X, the neural network goes to the fixed point S+ as each connection i⇝j goes to its fixed point sij+. Our simulation verifies this tendency, as shown in FIG 9. In this simulation, impulses traverse the neural network stochastically such that each neuron fires at most once per iteration; synaptic connections update their strength as in Algorithm 1.
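The composite mapping Λ○Ψ can be sketched for a toy chain 1⇝2⇝3 stimulated only at neuron 1. Two caveats: the propagation model yij = pi·sij (neuron 2 firing only via 1⇝2) is our assumption for this toy Ψ, and iterating S ← Λ(Ψ(X, S)) synchronously is a simplification of the paper’s step-wise strength dynamics, though both share the same fixed points.

```python
def lam(y):
    """Illustrative target strength function (an assumption)."""
    return 0.9 * y + 0.05

def step(x1, s12, s23):
    """One synchronous application of Lambda∘Psi for the chain 1⇝2⇝3."""
    p1 = x1            # neuron 1 fires with its stimulus probability
    y12 = p1 * s12     # connection 1⇝2 fires together with probability p1*s12
    p2 = y12           # neuron 2 fires only via 1⇝2 in this toy network
    y23 = p2 * s23     # connection 2⇝3
    return lam(y12), lam(y23)

s12, s23 = 0.9, 0.1    # arbitrary initial strengths
for _ in range(10000):
    s12, s23 = step(0.8, s12, s23)
print(s12, s23)        # the network's unique fixed point S+
```

Starting from any initial S, the iteration contracts to the same S+, the vector analogue of the single-connection tendency in FIG 2.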

FIG. 9.

Simulation results of the neural network’s tendency for the four typical λ functions of FIG 3. The neural network has n = 8 and c = 19. The following observations hold for any external stimulus and connection configuration: (a) If all connections are equipped with λ(y) = 0.9y + 0.05, the whole neural network has a single fixed point, and the trajectories of the mean of all connections’ strengths converge to one point. (b) λ(y) = 0.5sin(4πy) + 0.5. Because each connection has two stable fixed points, there are 2^19 stable fixed points for the whole neural network and 20 possible convergence points of the strength mean. (c) λ(y) = −y + 1. There is a single fixed point for the neural network; the trajectories converge to one point. (d) Discontinuous λ. The neural network has no fixed point, as each synaptic connection has no fixed point; the trajectories don’t converge.


Generally, the number of stable fixed points for a neural network is the product Πij fij over all c connections, where each fij is the number of stable fixed points of connection i⇝j. As in FIG 9(b), this product can be enormous when each fij ≥ 2. As with the synaptic connection, our goal is to establish a one-to-one mapping between stimulus X and fixed point S+ for the neural network, while keeping the initial strength S0 out of the picture. λ’s continuity alone cannot ensure the uniqueness of the fixed point, and S0 can then determine which fixed point is reached. With all the λ constraints, however, we have: (1) Λ is a one-to-one mapping and thus has an inverse mapping Λ−1:S* → Y; (2) there exists a mapping Θ:X → S+, because under stimulus X the neural network goes to the same unique fixed point S+ no matter what initial strength S0 it begins with; (3) if Θ is a one-to-one mapping, Θ has an inverse mapping Θ−1:S+ → X. With the mappings Λ, Λ−1, Θ and Θ−1 being one-to-one, given S+ we can identify X and Y without ambiguity, and vice versa. Therefore, the same interpretation as for the synaptic connection applies here: the neural network at fixed point precisely “memorizes” the information about the stimulus on many neurons and the impulse propagation across many connections.

Nevertheless, even all of the λ constraints together are not sufficient to secure a one-to-one Θ:X → S+ for a neural network, unlike for the single connection. Here is a case. For Θ to be one-to-one, all neurons must have outbound connections. Otherwise, e.g., for a neural network with three neurons (say 0, 1 and 2) and two connections (say 0⇝1 and 1⇝2), the stimuli X1 = (1, 1, 0) and X2 = (1, 1, 1) result in the same fixed point, because the stimulus on neuron 2, whatever it is, affects no connection. Equivalently, for Θ to be one-to-one, the definition of X should include only the neurons with outbound connections, such that X’s dimension dim(X) ≤ n. From the perspective of information theory,19 a many-to-one Θ introduces equivocation to the neural network at fixed point, as if information loss occurred over a noisy channel. If dim(X) > dim(S) = c, the mapping Θ conducts “dimension reduction” on the stimulus X, and information loss is bound to occur.

Here is a trivial case regarding stimulus dependence. Consider a neural network with connections 0⇝2, 1⇝2 and 2⇝3, and stimulus X = (x0, x1). When the neural network is at fixed point, x2 = x0s02+ + x1s12+ − x0Pr(1|0)s02+s12+, where Pr(1|0) is the probability of neuron 1 being stimulated conditional on neuron 0 being stimulated. Pr(1|0) ≠ x1 if the stimuli on neurons 0 and 1 are not independent. Pr(1|0) affects s23+ and hence S+; in other words, the neural network at fixed point gains the hidden information of Pr(1|0). However, if Pr(1|0) varies, given mere X there will be uncertainty about S+, such that the mapping Θ does not exist unless the stimulus X is “augmented” to X = (x0, x1, Pr(1|0)).
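The inclusion-exclusion formula for x2 can be checked with numbers: the last term subtracts the probability that both inbound connections propagate, which requires neurons 0 and 1 to be stimulated jointly. All numeric values below are illustrative assumptions.

```python
def x2(x0, x1, p1_given_0, s02, s12):
    """Firing probability of neuron 2 in the network 0⇝2, 1⇝2, 2⇝3:
    x2 = x0*s02 + x1*s12 - x0*Pr(1|0)*s02*s12 (inclusion-exclusion)."""
    return x0 * s02 + x1 * s12 - x0 * p1_given_0 * s02 * s12

s02, s12 = 0.4, 0.6          # illustrative fixed-point strengths
# Independent stimuli: Pr(1|0) = x1 = 0.8
independent = x2(0.5, 0.8, 0.8, s02, s12)
# Fully coupled stimuli: Pr(1|0) = 1
coupled = x2(0.5, 0.8, 1.0, s02, s12)
print(independent, coupled)  # different x2, hence different s23+ downstream
```

The same X = (0.5, 0.8) yields different x2 depending on Pr(1|0), which is precisely why Θ requires the augmented stimulus.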

Ideally, a neural network with memory of stimulus X (formally, the mapping Θ casts the memory of stimulus X as the fixed point S+) should respond to stimulus X more “intensely” than a neural network with a different memory does. Memory manifests itself as impulse propagation throughout an ensemble of neurons.2,20–22 Thus, it is natural to differentiate responses by counting the neurons fired or the synaptic connections propagated through. Given the reasoning that synapses could be the sole locus of memory,10,11 we adopt the count of synaptic connections propagated as a macroscopic measure of how intensely a memory responds to a stimulus, or a stimulus “recalls” a memory. Accordingly, we propose a classifier consisting of g neural networks, which classifies a stimulus into one of g classes by the decision criterion of which neural network gets the most synaptic connections propagated. Reminiscent of supervised learning,23 each neural network of our classifier is trained to its fixed point by its particular training stimulus, and then a testing stimulus is tested on all g neural networks independently to see which gets the most connections propagated. For simplicity we assume testing itself doesn’t jeopardize the fixed points of the neural networks. Most importantly, we assume that for each neural network, given any stimulus, there is a single fixed point, such that the mapping Θ:X → S+ exists.

Consider a neural network in the classifier trained by X̌ to fixed point S+ and then tested by X; in other words, a neural network memorizing X̌ as S+ is tested by X. Because impulses propagate across the neural network stochastically, the count of synaptic connections propagated in one test is a random variable; let it be ZX̌X. Then for the neural network in FIG 8, ZX̌X = Σij zij summed over all c connections, where each r.v. zij ∼ Bernoulli(xisij+), i.e., connection i⇝j is propagated with probability xisij+ in the test such that Pr(zij = 1) = xisij+. Each zij has expected value E[zij] = xisij+ and variance Var(zij) = xisij+(1 − xisij+). By the central limit theorem, Z’s distribution could tend towards a Gaussian-like bell curve as c increases, even if the zij are not independent and identically distributed. We have

E[ZX̌X] = Σij E[zij] = Σij xisij+, summing over all c connections i⇝j.
(2)

And when c is large,

Var(ZX̌X) ≈ Σij Var(zij) = Σij xisij+(1 − xisij+), summing over all c connections i⇝j.
(3)

For any connection i⇝j: in the training stage, because S+ = Θ(X̌), we have sij+ = θij(X̌); in the testing stage, xi is uniquely determined by S+ and X, such that xi is a function of X̌ and X.
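Eqs. (2) and (3) can be sketched numerically. The per-connection probabilities xisij+ below are illustrative assumptions, and the Monte Carlo check treats the zij as independent Bernoulli draws, which in a real network they are not (they are correlated through shared propagation paths).

```python
import random

# Illustrative per-connection propagation probabilities x_i * s_ij+
probs = [0.7 * 0.3, 0.5 * 0.9, 0.8 * 0.6, 0.2 * 0.95]

# Eq. (2): E[Z] is the sum of the Bernoulli means
mean_Z = sum(probs)
# Eq. (3): for weakly dependent z_ij, Var(Z) is close to the sum of variances
var_Z = sum(p * (1 - p) for p in probs)

# Monte Carlo check under the independence assumption
rng = random.Random(1)
samples = [sum(rng.random() < p for p in probs) for _ in range(200000)]
emp_mean = sum(samples) / len(samples)
print(mean_Z, emp_mean)  # analytic and empirical means agree closely
```

With many connections, the histogram of such samples takes the Gaussian-like shape seen in FIG 12.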

We experiment with this classifier on handwritten digit images.24 Ten identical neural networks (hence g = 10) of FIG 10, each designated for a digit from 0 to 9, are trained to their fixed points by their training images in FIG 11 as stimulus, and then testing images, also as stimulus, are classified into the digit whose designated neural network gets the largest Z value. We run many tests to evaluate classification accuracy, and collect Z values to approximate the distribution of the r.v. Z. With all synaptic connections equipped with λL of FIG 7, the classifier has accuracy ∼44%, and ∼51% with λT. Note that, equipped with λL or λT, the neural network of FIG 10 has a one-to-one ΘL or ΘT according to the last section. FIG 12 and FIG 13 show that, in positive testing (e.g. a digit-6 image tested on the neural network trained by digit-6 images), Z’s expected value (sample mean) can be considerably larger than in negative testing (e.g. a digit-6 image tested on the neural network trained by digit-1 images), so as to discriminate digit-6 images from the others. Given the same testing image, the classification outcome can differ from test to test, since the ten Z outcomes are random. To improve classification accuracy, we should distance the distribution of the positive-testing Z as far as possible from those of the negative-testing Z. We present another two special neural networks in FIG 14 to demonstrate how our classifier utilizes memory to classify images and how to improve its accuracy in a neurobiological way.

FIG. 10.

The depicted neural network is basically the general one of FIG 8, except that, to mimic a real-life nervous system, an array of sensor neurons is specialized for receiving stimulus from the environment only, not from other neurons. There are 64 sensor neurons to accommodate an 8×8-pixel image, and the rest form a cluster of 50 neurons. Each sensor neuron has 6 outbound connections towards the cluster, and each cluster neuron has 5 outbound connections within the cluster. Connections are randomly placed between neurons before training.

FIG. 11.

A digit image has 8×8 = 64 pixels, and each pixel's grayscale is normalized to a value between 0 and 1 (by dividing by 16) as its stimulus probability. The upper row shows samples of digit images, and the lower row shows the better-written “average images”, each of which is the pixel-wise average of a set of images of one digit. Each neural network is trained in each iteration with the same “average image”, or equivalently in each iteration with an image randomly drawn from the set of images.
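The normalization and “average image” construction described in this caption can be reproduced directly from the dataset of ref. 24. A minimal sketch:

```python
# Pixel normalization and per-digit "average images" as described above,
# using the scikit-learn digits dataset (ref. 24): 1797 images, 8x8 pixels,
# grayscale 0..16.
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
X = digits.images / 16.0          # normalize pixel grayscale to [0, 1]

# Pixel-wise average image for each digit class 0..9.
avg_images = np.stack([X[digits.target == d].mean(axis=0) for d in range(10)])

print(avg_images.shape)  # (10, 8, 8)
```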

FIG. 12.

The histogram (in probability density form) of Z. To collect Z values, a digit-6 image is tested many times on each of the ten trained neural networks. All connections are equipped with λT. $Z_{66}$ of positive testing is in red, and the other nine $Z_{k6}$ of negative testing, where k = 0, 1, 2, 3, 4, 5, 7, 8, 9, are in gray. Z's sample mean for each digit is depicted as a vertical dotted line.

FIG. 13.

The histograms of Z for all ten digits. For each digit, a randomly drawn testing image, instead of the same one, is used in each test. From digit 0 to 9, classification accuracy is approximately 70%, 41%, 56%, 42%, 53%, 33%, 77%, 51%, 57% and 32%. Generally, better Z-distribution separation between positive and negative testing results in higher classification accuracy.

FIG. 14.

These two neural networks inherit the sensor-cluster structure of FIG 10. (a) Each sensor neuron connects to one single cluster neuron, such that each pixel stimulus $x_i$ only affects one single connection. Then $s_i^+ = \theta_i(\check{x}_i)$. By Eq. (2) and Eq. (3), we have $\mathbb{E}[Z_{\check{X}X}] = \sum_{i=1}^{64} x_i \theta_i(\check{x}_i)$ and $\mathrm{Var}(Z_{\check{X}X}) = \sum_{i=1}^{64} x_i \theta_i(\check{x}_i)\,[1 - x_i \theta_i(\check{x}_i)]$. (b) Each sensor neuron connects to its own dedicated cluster of many neurons and synaptic connections, and the clusters are of different sizes. In that case, in a test each $x_i$ causes $\omega_i$ (instead of just one) synaptic connections to be propagated with probability $x_i s_i^+$, or none with probability $1 - x_i s_i^+$. When each $\omega_i$ is a nonrandom variable, we have $\mathbb{E}[Z_{\check{X}X}] = \sum_{i=1}^{64} x_i \theta_i(\check{x}_i)\,\omega_i$ and $\mathrm{Var}(Z_{\check{X}X}) = \sum_{i=1}^{64} x_i \theta_i(\check{x}_i)\,[1 - x_i \theta_i(\check{x}_i)]\,\omega_i^2$.
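The mean and variance expressions for case (a) follow from Z being a sum of independent Bernoulli trials, one per pixel, each succeeding with probability $x_i \theta_i(\check{x}_i)$. A Monte-Carlo sketch with illustrative (randomly chosen) stimulus and strength values:

```python
# Monte-Carlo check of the case (a) formulas: Z is a sum of 64 independent
# Bernoulli trials with success probabilities p_i = x_i * theta_i(x_check_i).
# x and s below are illustrative values, not data from the paper.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 64)    # testing pixel stimulus x_i
s = rng.uniform(0, 1, 64)    # fixed-point strengths theta_i(x_check_i)
p = x * s                    # per-connection propagation probability

mean_theory = p.sum()
var_theory = (p * (1 - p)).sum()

# Simulate many tests; each row is one test of all 64 connections.
Z = (rng.uniform(size=(100_000, 64)) < p).sum(axis=1)

print(abs(Z.mean() - mean_theory) < 0.1)  # True: sample mean matches theory
print(abs(Z.var() - var_theory) < 0.5)    # True: sample variance matches too
```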


When the classifier adopts ten neural networks of FIG 14(a) and equips all connections with λL of FIG 7, classification accuracy is ∼31%, and Z's distribution for testing digit-6 images is shown in FIG 15(a). We already know that λL makes θL(x) ≈ x. Then for one test we have

\[
\mathbb{E}[Z_{\check{X}X}] = \sum_{i=1}^{64} x_i \theta_i(\check{x}_i) = \sum_{i=1}^{64} x_i \theta_L(\check{x}_i) \approx \sum_{i=1}^{64} x_i \check{x}_i = \check{X} \cdot X. \tag{4}
\]
FIG. 15.

The Z histogram of testing digit-6 images for three different classifier settings. For each classifier setting, values of $Z_{66}$ and $Z_{k6}, k \ne 6$ are transformed to z-scores (i.e. the number of standard deviations a data point is from the mean) with respect to the distribution of all $Z_{k6}, k \ne 6$ values combined. The distance between the distributions of $Z_{66}$ and $Z_{k6}, k \ne 6$ is approximately evaluated by $\mathbb{E}[Z_{66}]$ and the standard deviation $\sigma(Z_{66})$. (a) Classifier with neural networks of FIG 14(a) and λL: $\mathbb{E}[Z_{66}] \approx 0.93$ and $\sigma(Z_{66}) \approx 0.98$. (b) Classifier with neural networks of FIG 14(a) and λT: $\mathbb{E}[Z_{66}] \approx 1.33$ and $\sigma(Z_{66}) \approx 0.9$. (c) Classifier with neural networks of FIG 14(b), λL and $\omega_i(\check{x}_i) = 100\check{x}_i^3$: $\mathbb{E}[Z_{66}] \approx 1.39$ and $\sigma(Z_{66}) \approx 0.87$.


Here $\check{X} \cdot X$ is the dot product of the training vector $\check{X} \in [0,1]^{64}$ and the testing vector $X \in [0,1]^{64}$. Generally, the dot product of two vectors, a scalar value, is essentially a measure of similarity between the vectors. The bigger $\mathbb{E}[Z_{\check{X}X}]$ is, the more intensely the neural network with memory of training $\check{X}$ responds to testing $X$, and the more similar $\check{X}$ and $X$ are to each other. Therefore, Eq. (4) links otherwise unrelated neural response intensity and stimulus similarity. By comparing the ten $\mathbb{E}[Z_{\check{X}X}]$ values, we can tell which $\check{X}$ is the most similar to $X$ and hence which digit is the classification target. However, the $Z_{\check{X}X}$ value from a test deviates randomly around the true $\mathbb{E}[Z_{\check{X}X}]$, which makes it a usable yet unreliable classification criterion.
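The similarity reading of Eq. (4) is easy to check numerically: with the near-identity θL, the expected response to a stimulus close to the memorized one exceeds the expected response to a clearly dissimilar one. The vectors below are synthetic, not digit images.

```python
# Eq. (4) as a similarity measure: E[Z] is the dot product of the testing
# stimulus with theta_L applied to the memorized stimulus. A perturbed copy
# of the memorized stimulus scores higher than an inverted (dissimilar) one.
import numpy as np

rng = np.random.default_rng(1)
x_train = rng.uniform(0, 1, 64)                              # memorized X_check
x_similar = np.clip(x_train + rng.normal(0, 0.05, 64), 0, 1)  # similar test
x_inverted = 1.0 - x_train                                    # dissimilar test

theta_L = lambda y: 0.99 * y + 0.01   # linear-like target strength function

def expected_Z(x_test, x_memo):
    # E[Z] of Eq. (4): sum over pixels of x_i * theta_L(x_check_i).
    return float(np.sum(x_test * theta_L(x_memo)))

print(expected_Z(x_similar, x_train) > expected_Z(x_inverted, x_train))  # True
```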

When the classifier equips all connections with the threshold-like λT of FIG 7, classification accuracy rises to ∼44%. Comparing FIG 15(b) with FIG 15(a), the distance between the distribution of $Z_{66}$ and the other nine $Z_{k6}, k \ne 6$ is bigger with the threshold-like λT than with the linear-like λL. This accuracy improvement can be explained conveniently with a true threshold function (or step function)

\[
\theta_{\mathrm{step}}(x) = \begin{cases} 0, & 0 \le x < x_{\mathrm{step}}, \\ 1, & x_{\mathrm{step}} \le x \le 1. \end{cases}
\]

Of the sum terms in $\sum_{i=1}^{64} x_i \check{x}_i$ of Eq. (4), $\theta_{\mathrm{step}}$ basically diminishes small $\check{x}_i \in [0, x_{\mathrm{step}})$ to 0 and enhances big $\check{x}_i \in [x_{\mathrm{step}}, 1]$ to 1, such that most probably $\mathbb{E}[Z_{66}]$ would increase by having $\check{x}_i = 1$ in the sum terms with big $x_i$, while the other nine $\mathbb{E}[Z_{k6}], k \ne 6$ would decrease by having $\check{x}_i = 0$ in the sum terms with big $x_i$, so as to preferably increase $\mathbb{E}[Z_{66}] - \mathbb{E}[Z_{k6}], k \ne 6$. Likewise, $\mathrm{Var}(Z)$ would most probably decrease. As a result, $\theta_{\mathrm{step}}$ increases the distance between the distributions of $Z_{66}$ and $Z_{k6}, k \ne 6$ and thus better separates them.
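This separation effect can be seen on a toy example. The two 8-pixel "images" and the threshold $x_{\mathrm{step}} = 0.5$ below are illustrative assumptions, chosen only so that the bright pixels of one image are the dark pixels of the other:

```python
# Toy illustration: binarizing the memorized stimulus with theta_step widens
# the gap between the matching and mismatching expected responses E[Z].
# digit_a / digit_b and x_step = 0.5 are hypothetical values.
import numpy as np

digit_a = np.array([0.9, 0.8, 0.9, 0.1, 0.2, 0.1, 0.8, 0.9])
digit_b = np.array([0.1, 0.2, 0.1, 0.9, 0.8, 0.9, 0.2, 0.1])

theta_step = lambda x: (x >= 0.5).astype(float)  # step at x_step = 0.5
theta_L = lambda x: 0.99 * x + 0.01              # linear-like, for comparison

def gap(theta):
    # E[Z] when memory matches the test image minus E[Z] when it does not,
    # testing digit_a against memories of digit_a and digit_b.
    return np.sum(digit_a * theta(digit_a)) - np.sum(digit_a * theta(digit_b))

print(gap(theta_L) < gap(theta_step))  # True: the step function separates better
```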

FIG 14(b) provides another type of neural network that improves classification accuracy without adopting a threshold-like λ function for all synaptic connections. Let the linear-like λL be equipped again, and take $\omega_i = 100\check{x}_i^3$ simply for example. With this setting our classifier has accuracy ∼47%. Here we have $\mathbb{E}[Z_{\check{X}X}] \approx \sum_{i=1}^{64} x_i (100\check{x}_i^4)$, where $100\check{x}_i^4$, like $\theta_{\mathrm{step}}$, transforms $\check{x}_i \in [0, 0.1^{1/4})$ (here $0.1^{1/4} \approx 0.56$) to within $[0, 10)$ and transforms $\check{x}_i \in [0.1^{1/4}, 1]$ to across $[10, 100]$: again, the strong training pixel-stimuli are greatly weighted while the weak ones are relatively suppressed. As shown in FIG 15(c), the distance between the distributions of $Z_{66}$ and $Z_{k6}, k \ne 6$ is increased compared to FIG 15(a). Our neurobiological interpretation of $\omega_i = 100\check{x}_i^3$ is that the training stimulus affects not only synaptic strength, but also the growth of the neuron cluster through the replication of neuron cells and the formation of new synaptic connections. Again, this claim needs to be supported by evidence.
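The effective pixel weight $100\check{x}_i^4$ and its step-like behavior can be checked numerically. A minimal sketch, sampling training stimulus levels on a grid:

```python
# The cluster-size weighting omega_i = 100 * x_check_i**3 combined with the
# near-identity theta_L gives an effective pixel weight of ~100 * x_check_i**4,
# which suppresses weak training pixels and amplifies strong ones.
import numpy as np

x_check = np.linspace(0, 1, 11)   # training pixel stimulus levels 0.0 .. 1.0
omega = 100 * x_check**3          # per-pixel cluster sizes
weight = x_check * omega          # effective weight ~ theta_L(x) * omega

# Levels below 0.1**0.25 (~0.56) map into [0, 10); stronger ones reach 100.
print(weight[x_check < 0.1**0.25].max() < 10)  # True
print(weight[-1])                              # 100.0
```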

TABLE I summarizes the performance of our classifier with different types of neural networks and target strength functions. The four typical λ functions in FIG 3 are also evaluated to demonstrate how these somewhat “pathological” target strength functions affect classification.

TABLE I.

Classification accuracy on different classifier settings. In each setting, neural network can be the one illustrated in FIG 10, FIG 14(a) or 14(b), and target strength function can be one of those in FIG 3 and FIG 7. The accuracy listed is the average of many outcomes taken from the same trained classifier, and thus could fluctuate slightly from one training to another.

λ or θ function                              FIG 10    FIG 14(a)    FIG 14(b)
λ_L(y) = 0.99y + 0.01                        44%       31%          47%
λ_T(y) = 2/(1 + e^{−4.4(y+0.01)}) − 1        51%       44%          51%
θ_step                                       48%^a     60%^b
λ(y) = 0.9y + 0.05                           14%       16%          19%
λ(y) = 0.5 sin(4πy) + 0.5                    5%^c      6%           2%
λ(y) = −y + 1                                4%        5%           1%^d
Discontinuous λ                              23%       20%          28%

^a x_step is set to 0.6.
^b x_step is set to 0.2.
^c Accuracy under 10% is actually worse than wild guessing.
^d If the classification criterion is changed to “which neural network gets the fewest synaptic connections propagated”, the accuracy will be ∼40%.

By Eq. (4), the classification of handwritten digit images could be simplified to a task of restricted linear classification:23 given ten classes, each with its discriminant function $\delta_i(X) = \check{X}_i \cdot X$ where $\check{X}_i, X \in [0,1]^{64}$, image $X$ is classified to the class with the largest $\delta_i$ value. Our neural classifier simply takes over the computation of the vectors' dot product $\check{X}_i \cdot X$ and adds randomness to the ten results. To parameterize the ten $\delta_i$ with their $\check{X}_i$, the “supervisors” could train the neural networks in the classifier with the images they deem best: “average images” in our case, or digit learning cards in a teacher's case. Our neural classifier is rather unreliable and primitive compared to an ANN, which is also capable of linear classification. On one hand, given the same image an ANN always outputs the same prediction. On the other hand, an ANN is not only a classifier but also, more importantly, a “learner”, which learns from all kinds of handwritten digits to find the optimal $\check{X}_i$ for the ten $\delta_i$; an ANN with optimal $\check{X}_i$ is more tolerant of poor handwriting, and thus has less misclassification and better prediction accuracy. However, an ANN's learning of the optimal $\check{X}_i$, an optimization process of many iterations, requires massive computational power to carry out, which is unlikely to be provided by a real-life nervous system: there is no evidence that an individual neuron can even conduct basic arithmetic operations. Despite its weaknesses, our neural classifier has merit in its biological nature: it reduces the computation of the vectors' dot product to the simple counting of propagated synaptic connections; its training and testing could be purely neurobiological development and activity in which no arithmetic operation is involved; and its classification criterion, i.e. “deciding” or “feeling” which neural (sub)network has the most connections propagated, could be an intrinsic capability of intelligent agents. This classifier might, hopefully, offer new insights into the neural reality.
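The restricted linear classification described above can be sketched end to end on the dataset of ref. 24. This is a simplified stand-in for the paper's networks: the memorized $\check{X}_i$ are taken directly as the per-digit average images (i.e. assuming θ is the identity), and the randomized response Z is simulated as one Bernoulli trial per pixel-connection.

```python
# Sketch of the ten-discriminant classifier delta_i(X) = X_check_i . X on the
# scikit-learn digits data. Simplifying assumption: theta(x) = x, so the
# memorized vectors are just the per-digit average images.
import numpy as np
from sklearn import datasets

rng = np.random.default_rng(0)
digits = datasets.load_digits()
X = digits.images.reshape(len(digits.images), -1) / 16.0  # stimuli in [0, 1]
y = digits.target

# "Training": pixel-wise average image per digit as memorized X_check_i.
x_check = np.stack([X[y == d].mean(axis=0) for d in range(10)])

def classify(x, noisy=True):
    p = np.clip(x * x_check, 0, 1)   # (10, 64) propagation probabilities
    if noisy:
        # Z counts propagated connections: one Bernoulli trial per pixel.
        z = (rng.uniform(size=p.shape) < p).sum(axis=1)
    else:
        z = p.sum(axis=1)            # noiseless E[Z] = delta_i(X)
    return int(np.argmax(z))

# Noiseless dot-product classification on the whole dataset.
acc = np.mean([classify(x, noisy=False) == t for x, t in zip(X, y)])
print(acc > 0.2)  # well above the 10% chance level
```

With `noisy=True` the prediction for a fixed image varies from call to call, mirroring the unreliability of the randomized criterion discussed above.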

This paper proposes a mathematical theory to explain how memory forms and works. It all begins with synaptic plasticity. We find that synaptic plasticity is more than impulses affecting synapses; it actually acts as a force that can eventually drive a neural network to a long-lasting state. We also find that, under certain conditions, there is a one-to-one mapping between the neural state and the external stimulus that the neural network is exposed to. With the mapping, given the stimulus we know exactly what the neural state will be; given the neural state we know precisely what the stimulus has been. The mapping is essentially a link between a past event and the neural present; between the short-lived and the enduring. In that sense, the mapping itself is memory, or the mapping casts memory into the neural network. Next, we study how memory affects the neural network's response to stimulus. We find that a neural network with memory of a stimulus responds more intensely to similar stimuli than to stimuli of less similarity, if response intensity is evaluated by the number of synaptic connections propagated by impulses. That is to say, a neural network with memory is able to classify stimuli. To verify this ability, we experiment with a classifier consisting of ten neural networks, which turn out to have considerable accuracy in classifying handwritten digit images. The classifier proves that neurons could collectively provide fully biological computation for classification.

Our reasoning takes root in the mathematical treatment of synaptic plasticity as a target strength function λ from impulse frequency to synaptic strength. We put hypothetical constraints on this λ function to ensure that the ideal one-to-one mapping exists. Although these constraints are necessary to keep our theory mathematically sound, they raise concerns. Firstly, they could be overly restrictive. Take the continuity constraint, for example. Even the discontinuous function of FIG 3(d), whose nonexistent θ function would map certain stimulus to any point within a “fixed interval” instead of a specific fixed point as shown in FIG 6, can be a usable λ in our classifier according to TABLE I. In this case, the fixed point per se doesn't have to exist; a mere tendency to seek it out could serve the purpose. Secondly, as discussed in Section II, those λ constraints have yet to be supported by neurobiological evidence. Above all, evidence that reveals the true λ is vital to clarify the uncertainty.

We thank the anonymous reviewers for their comments that improved the manuscript.

1. J. Hughes, “Post-tetanic potentiation,” Physiol. Rev. 38(1), 91–113 (1958).
2. D. Hebb, The Organization of Behavior: A Neuropsychological Theory (John Wiley & Sons, 1949).
3. S. Lowel and W. Singer, “Selection of intrinsic horizontal connections in the visual cortex by correlated neuronal activity,” Science 255(5041), 209–212 (1992).
4. G. Berlucchi and H. Buchtel, “Neuronal plasticity: Historical roots and evolution of meaning,” Experimental Brain Research 192(3), 307–319 (2009).
5. A. Citri and R. Malenka, “Synaptic plasticity: Multiple forms, functions, and mechanisms,” Neuropsychopharmacology 33, 18–41 (2008).
6. S. Martin, P. Grimwood, and R. Morris, “Synaptic plasticity and memory: An evaluation of the hypothesis,” Annu. Rev. Neurosci. 23, 649–711 (2000).
7. E. Takeuchi, A. Duszkiewicz, and R. Morris, “The synaptic plasticity and memory hypothesis: Encoding, storage and persistence,” Philos. Trans. R. Soc. Lond. B Biol. Sci. 369(1633) (2014).
8. S. Nabavi, R. Fox, C. Proulx, J. Lin, R. Tsien, and R. Malinow, “Engineering a memory with LTD and LTP,” Nature 511, 348–352 (2014).
9. Y. Yang, D. Liu, W. Huang, J. Deng, Y. Sun, Y. Zuo, and M. Poo, “Selective synaptic remodeling of amygdalocortical connections associated with fear memory,” Nat. Neurosci. 19, 1348–1355 (2016).
10. P. Trettenbrein, “The demise of the synapse as the locus of memory: A looming paradigm shift?,” Front. Syst. Neurosci. 10 (2016).
11. J. Langille and R. Brown, “The synaptic theory of memory: A historical survey and reconciliation of recent opposition,” Front. Syst. Neurosci. 12 (2018).
12. C. Laing and G. Lord, Stochastic Methods in Neuroscience (Oxford University Press, 2010).
13. G. Deco, E. Rolls, and R. Romo, “Stochastic dynamics as a principle of brain function,” Progress in Neurobiology 88(1), 1–16 (2009).
14. T. Branco and T. Staras, “The probability of neurotransmitter release: Variability and feedback control at single synapses,” Nat. Rev. Neurosci. 10(5), 373–383 (2009).
15. C. Baldassi, F. Gerace, H. Kappen, C. Lucibello, L. Saglietti, E. Tartaglione, and R. Zecchina, “Role of synaptic stochasticity in training low-precision neural networks,” Phys. Rev. Lett. 120(26) (2018).
16. J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks 61, 85–117 (2015).
17. I. Istratescu, Fixed Point Theory: An Introduction (Springer, 1981).
18. Source code can be found at https://github.com/lansiz/neuron.
19. C. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal 27(3), 379–423 (1948).
20. G. Buzsaki, “Neural syntax: Cell assemblies, synapsembles and readers,” Neuron 63, 362–385 (2010).
21. C. Butler, Y. Wilson, J. Gunnersen, and M. Murphy, “Tracking the fear memory engram: Discrete populations of neurons within amygdala, hypothalamus and lateral septum are specifically activated by auditory fear conditioning,” Learn. Mem. 22, 370–384 (2015).
22. C. Butler, Y. Wilson, J. Oyrer, T. Karle, S. Petrou, J. Gunnersen, M. Murphy, and C. Reid, “Neurons specifically activated by fear learning in lateral amygdala display increased synaptic strength,” eNeuro (2018).
23. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning (Springer, 2009).
24. The dataset of 1797 handwritten digit images can be obtained with the Python code “from sklearn import datasets; datasets.load_digits()”.