Here, we describe a general-purpose prediction model. Our approach requires three matrices of equal size and uses two equations to determine how a system behaves relative to two possible outcomes. We use an example based on photon-pixel coupling data to show that, in humans, this approach can indicate a predisposition to disease. An implementation of this model is made available in the supplementary material.
A novel prediction method is described, implemented, and tested. The model revolves around three known states: two extreme outcomes (A and B) and one measurement (P). These states are represented by matrices that contain sets of homologous parameters. An information spectrum is described as a series of predicted states (M1, M2, M3,…, Md) generated between the two extreme outcomes (A and B). The predicted states are compared with the known state (P) from the measurements to generate a similarity index. The trend formed by the values of the similarity index indicates how a system may behave relative to these two extreme outcomes.
Nonlinear behavior is part of all the phenomena we know in nature, from weather to living beings and beyond. In biological systems, nonlinear mechanisms remain partially uncharted and have been a subject of general scientific interest for several decades.1,2 In medicine, perhaps the most important research area is the prediction of disease. Accurately predicting the occurrence of a disease has long been regarded as a central goal of clinical practice. Genetic diseases, which can be triggered by environmental factors, are inherently nonlinear, and their onset is unpredictable. Diabetes is one such disease. Neural networks (NNs) were the hope for prediction in the medical field in the 1980s, and this trend has been reborn in recent years.3–5 However, the medical context in which they are used and the partial subjectivity of their training have gradually eroded their much-anticipated value for certain specific tasks.6,7 Essentially, NNs are classifiers of information with prior or “in-flight” adaptation to the environment. Any pattern recognition strategy uses this classification approach. Nevertheless, a classification based on prior adaptation to the environment does not necessarily imply a prediction. We are inclined to believe that NNs are misused in some cases and that classification is often confused with prediction. For instance, in a previous study, we struggled with a fundamental problem when we tried to use NNs to predict the onset of diabetes.8 There, our NN correctly classified a new patient into one of two classes, namely, type 1 diabetes (T1D) or type 2 diabetes (T2D). However, such a classification was a direct indication of the current state of the disease. In other words, it was merely a medical diagnosis. We then asked whether the NN could predict the evolution of a human subject over time, in the hope of predicting the onset of the disease.
Our experimental data indicated that our NN classified the dataset fairly well but provided no valuable information about the evolution of a subject over time.9,10 In this respect, predictions based on Markov chains or classical statistical approaches have shown more reliable results in biology and medicine.11–13 Here, we propose a novel method of analysis, with an implementation (see the supplementary material), as an alternative to classical NNs. Note: the word spectral refers to a series of predicted states arranged linearly between two known states.
MATERIALS AND METHODS
In order to test our model, we used the electrical activity signals of the human skin collected in our most recent experiment.8–10
Datasets and context
The electrical activity on the skin surface of the trunk was measured by using 200 sensors in three groups: a control group A (18 normal subjects), a group B (18 diabetic subjects), and a test group C (20 normal subjects: ten with a confirmed family predisposition for T2D and ten without a family predisposition for T2D).8 The electrical signals were collected using the photon-pixel coupling method and were stored as numerical values (0…100) in a 10 × 20 matrix for each subject.8,9 An average was taken across the 18 subjects of group A and, separately, across the 18 subjects of group B, yielding one 10 × 20 matrix per group, namely, matrix A and matrix B.8 These average matrices represent the main characteristics of each group. A state space was defined between matrices A and B. The number of states was established by a distance index (d), and each state in this spectrum was represented by a matrix M. Each subject in group C was then evaluated by consecutively comparing their matrix P with each matrix M in the spectrum.
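The group averaging step can be sketched as an element-wise mean over equally sized subject matrices. This is a minimal illustration, not the authors' supplementary implementation; the function name `average_matrices` is hypothetical, and the toy 2 × 3 matrices stand in for the 10 × 20 sensor matrices.

```python
from typing import List

Matrix = List[List[float]]

def average_matrices(group: List[Matrix]) -> Matrix:
    """Element-wise mean of equally sized matrices (e.g., 18 subjects of 10 x 20).

    Hypothetical helper: each output cell is the mean of the homologous cells.
    """
    if not group:
        raise ValueError("group must contain at least one matrix")
    rows, cols = len(group[0]), len(group[0][0])
    for m in group:
        if len(m) != rows or any(len(r) != cols for r in m):
            raise ValueError("all matrices must have the same shape")
    n = len(group)
    return [[sum(m[i][j] for m in group) / n for j in range(cols)]
            for i in range(rows)]

# Toy example: two 2 x 3 "subjects" with values in 0..100, as in the dataset
subjects = [
    [[10, 20, 30], [40, 50, 60]],
    [[30, 40, 50], [60, 70, 80]],
]
print(average_matrices(subjects))  # [[20.0, 30.0, 40.0], [50.0, 60.0, 70.0]]
```

Applied once to group A and once to group B, this yields the two reference matrices A and B.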
The spectral forecast model
In our approach, we used three known matrices: A, B, and P. A matrix M was then used to formulate the entire spectrum of unknown information between matrices A and B [Fig. 1(a)]. For this calculation, we devised a novel equation shown in (1),
where Mij represents the predicted matrix at discrete step d, Aij represents the matrix of the normal group, and Bij is the matrix associated with the diabetic group [Figs. 1(b) and 1(c)]. Here, d stands for distance: it indexes the discrete steps taken from matrix A toward matrix B, and its maximum value sets the total number of steps. Because M is indexed by i, j, and d, Mijd can be considered a 3D tensor-like structure.
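The sketch below assumes that Eq. (1) is a linear interpolation, Mij(d) = (1 − d/dmax)·Aij + (d/dmax)·Bij, with dmax the maximum distance (100 in this study). This form is our assumption, chosen because it reproduces the behavior described in the text (state 0 equals A, state 100 equals B, and d = 50 is an equal mix); it may differ from the published equation. The names `spectral_state` and `spectrum` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def spectral_state(a: Matrix, b: Matrix, d: int, d_max: int = 100) -> Matrix:
    """Predicted matrix M at step d, ASSUMING Eq. (1) is a linear blend of A and B.

    d = 0 returns A, d = d_max returns B, and intermediate d mixes the two.
    """
    if not 0 <= d <= d_max:
        raise ValueError("d must lie in [0, d_max]")
    w = d / d_max  # weight of B; (1 - w) is the weight of A
    return [[(1 - w) * a_ij + w * b_ij for a_ij, b_ij in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def spectrum(a: Matrix, b: Matrix, d_max: int = 100) -> List[Matrix]:
    """All states M_0 .. M_dmax; indexed by (d, i, j) this is the 3D tensor-like structure."""
    return [spectral_state(a, b, d, d_max) for d in range(d_max + 1)]

A = [[0.0, 20.0], [40.0, 60.0]]
B = [[100.0, 80.0], [60.0, 40.0]]
print(spectral_state(A, B, 50))  # [[50.0, 50.0], [50.0, 50.0]]
print(len(spectrum(A, B)))       # 101 states: 0, 1, ..., 100
```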
The evolution of P was predicted by repeatedly comparing P with matrix M at every discrete step (2). This comparison was made by using the similarity index,
where S is the similarity index and represents the normalized dot-product of Mij and Pij. Mij stands for the predicted matrix at every discrete step, and Pij is the matrix obtained from a newly measured individual. The similarity index can take values between 0 and 1. As the similarity between the corresponding i,j elements of matrices M and P increases, the similarity index S tends to 1. Conversely, as differences between the corresponding i,j elements of matrices M and P become more frequent, the similarity index S tends to 0 [Fig. 1(d)]. The main result of the method is the trend formed by the values of the similarity index [Fig. 1(d)]. This trend was taken as the evolutionary route of the disease. In the supplementary material, we provide a ready-to-use implementation of the method.
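A minimal sketch of the similarity index, assuming that "normalized dot-product" means the cosine-style normalization S = Σ MijPij / (‖M‖·‖P‖). For non-negative data such as the 0…100 skin-signal values, this quantity lies in [0, 1]. The function name is hypothetical, and the convention of returning 0.0 for an all-zero matrix is our own choice.

```python
import math
from typing import List

Matrix = List[List[float]]

def similarity_index(m: Matrix, p: Matrix) -> float:
    """Cosine-style normalized dot product of two equally sized, non-negative matrices.

    ASSUMPTION: S = sum(M_ij * P_ij) / (||M|| * ||P||), with Frobenius norms.
    """
    dot = sum(x * y for rm, rp in zip(m, p) for x, y in zip(rm, rp))
    norm_m = math.sqrt(sum(x * x for row in m for x in row))
    norm_p = math.sqrt(sum(y * y for row in p for y in row))
    if norm_m == 0 or norm_p == 0:
        return 0.0  # convention for an all-zero matrix
    return dot / (norm_m * norm_p)

X = [[1.0, 2.0], [3.0, 4.0]]
print(similarity_index(X, X))                     # 1.0 up to floating-point rounding
print(similarity_index([[1.0, 0.0]], [[0.0, 1.0]]))  # 0.0: no overlapping signal
```

One design consequence of this reading is worth noting: a cosine measure is insensitive to overall scale, so a matrix P that is proportional to M also yields S = 1.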
Note: The total number of discrete steps was chosen arbitrarily. In this specific case, the maximum value of the distance (d) was set to 100 for convenience. A higher number of discrete steps increases the resolution of the prediction, which is desirable in many situations. For instance, in some cases, the trend showed both ascending and descending features. At low resolutions (i.e., a maximum d below 10), many of these fluctuating features remained undetectable, and the results became much less informative.
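The effect of resolution can be illustrated with a toy example. The sketch assumes the linear form of Eq. (1) and the cosine form of the similarity index (both assumptions, not taken from the article), uses flattened vectors instead of 10 × 20 matrices, and builds a synthetic subject P located 37% of the way from A to B. On a fine grid, the peak of the similarity trend is located precisely; on a coarse grid, it can only fall on a multiple of 1/dmax. All names and data are hypothetical.

```python
import math

def blend(a, b, w):
    """Linear mix of two flat vectors; w = d / d_max (ASSUMED form of Eq. (1))."""
    return [(1 - w) * x + w * y for x, y in zip(a, b)]

def cosine(u, v):
    """Normalized dot product (ASSUMED form of the similarity index)."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def peak_position(a, b, p, d_max):
    """Return the fraction d / d_max at which the similarity trend peaks."""
    trend = [cosine(blend(a, b, d / d_max), p) for d in range(d_max + 1)]
    best_d = max(range(d_max + 1), key=lambda d: trend[d])
    return best_d / d_max

A = [10.0, 80.0, 30.0, 60.0]  # toy "healthy" profile (flattened matrix)
B = [90.0, 20.0, 70.0, 10.0]  # toy "diseased" profile
P = blend(A, B, 0.37)         # synthetic subject 37% of the way from A to B

print(peak_position(A, B, P, 100))  # fine grid: recovers 0.37
print(peak_position(A, B, P, 4))    # coarse grid: limited to multiples of 0.25
```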
RESULTS AND DISCUSSION
Discretization is a practical approach used by many prediction algorithms and computational solutions. Here, we used a discretization strategy to increase the resolution of the spectrum between two groups: a healthy group and a diabetic group [Figs. 2(a) and 2(b)]. The data from the healthy group were taken as the initial state (state 0), and the data from the diabetic group formed the final state (state 100). The number of intermediate states (state 1 to state 99) was dictated by the distance d, and the properties of each intermediate state were formulated by matrix M [Figs. 2(c) and 2(d)]. To predict the evolution of a third group, a comparison was made along this spectrum [Figs. 2(c) and 2(d)]. Thus, matrix P from a new individual was compared with each matrix M to obtain a series of similarity index values [Figs. 2(e) and 2(f)]. To test the method, we used our most recent data, collected in a previous study.8–10 The predisposition trend of the individuals in group C was correctly predicted in 100% of cases (Fig. 3). The subjects with a family predisposition for T2D showed an average similarity index of 0.877 ± 0.024493, whereas normal individuals showed an average similarity index of 0.68 ± 0.024499.
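The whole procedure (build the spectrum from state 0 to state 100, compare P with every state, and collect the similarity values) can be sketched as follows. This is an illustration under two labelled assumptions, a linear form for Eq. (1) and a cosine form for the similarity index, and not the implementation from the supplementary material. The function name and the toy 2 × 2 matrices are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

def _flat(m: Matrix) -> List[float]:
    return [x for row in m for x in row]

def similarity_trend(a: Matrix, b: Matrix, p: Matrix, d_max: int = 100) -> List[float]:
    """Similarity index of P against every state M_0..M_dmax of the A -> B spectrum.

    ASSUMPTIONS: Eq. (1) is M(d) = (1 - d/d_max) A + (d/d_max) B, and the
    similarity index is the cosine-style normalized dot product of M and P.
    """
    fa, fb, fp = _flat(a), _flat(b), _flat(p)
    norm_p = math.sqrt(sum(x * x for x in fp))
    trend = []
    for d in range(d_max + 1):
        w = d / d_max
        m = [(1 - w) * x + w * y for x, y in zip(fa, fb)]  # state d (matrix M)
        norm_m = math.sqrt(sum(x * x for x in m))
        if norm_m == 0 or norm_p == 0:
            trend.append(0.0)  # convention for an all-zero matrix
            continue
        trend.append(sum(x * y for x, y in zip(m, fp)) / (norm_m * norm_p))
    return trend

A = [[80.0, 10.0], [70.0, 20.0]]  # toy "healthy" average (state 0)
B = [[20.0, 90.0], [10.0, 60.0]]  # toy "diabetic" average (state 100)
P = [[25.0, 85.0], [15.0, 55.0]]  # toy new subject that resembles B

trend = similarity_trend(A, B, P)
print(len(trend))             # 101 values, one per state
print(trend[100] > trend[0])  # True: the trend rises toward the diabetic extreme
```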
In the normal group, the similarity index had a mean of 0.68 ± 0.111, a maximum of 0.8784, and a minimum of 0.4592 [Fig. 3(e)]. In the T2D predisposition group, the similarity index had a mean of 0.87741 ± 0.08, a maximum of 0.9626, and a minimum of 0.63086 [Fig. 3(f)]. The method has been implemented and can be found in the supplementary material.
The meaning of the trend
Based on known clinical information and the observations made on each individual of the two groups, the trend of the similarity index values indicated whether a newly measured individual showed a predisposition to, or protection against, T2D [Figs. 3(e) and 3(f)]. We speculate that the difference between the lower and upper limits of the trend may represent a risk score for T2D. At this stage of the investigation, we can indicate whether the newly measured subject tends toward the disease [Figs. 3(e) and 3(f)]. Our experiments showed that the trend shaped by the similarity index is not necessarily purely ascending or purely descending: ascending and descending features can coexist within the same plot. One example can be seen in Fig. 3(e), where one individual in the normal group shows both ascending and descending features. In the future, we will try to find the meaning of such a distribution, because we speculate that it might be of particular importance for the prediction process.
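The three properties of a trend discussed above can be summarized programmatically. In this hypothetical sketch, `spread` is the difference between the upper and lower limits of the trend (the quantity speculated to act as a risk score), `toward_disease` compares the similarity at the disease end with that at the healthy end, and `turning_points` counts changes between ascending and descending segments. These names and this particular summary are our own illustration, not a validated clinical score.

```python
from typing import Dict, List

def summarize_trend(trend: List[float], tol: float = 1e-12) -> Dict[str, object]:
    """Summarize a similarity-index trend (index 0 = healthy state, last = disease state).

    All keys are hypothetical: 'spread' = max - min of the trend, 'toward_disease'
    = whether the trend ends higher than it starts, 'turning_points' = number of
    switches between ascending and descending steps (flat steps are ignored).
    """
    steps = [b - a for a, b in zip(trend, trend[1:])]
    signs = [1 if s > tol else -1 if s < -tol else 0 for s in steps]
    signs = [s for s in signs if s != 0]  # drop flat steps
    turning_points = sum(1 for s1, s2 in zip(signs, signs[1:]) if s1 != s2)
    return {
        "spread": max(trend) - min(trend),
        "toward_disease": trend[-1] > trend[0],
        "turning_points": turning_points,
    }

# A toy trend that rises, dips once, and rises again (two turning points)
print(summarize_trend([0.60, 0.70, 0.80, 0.75, 0.90]))
```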
A link between two unrelated datasets of the same dimension
The core of our method is represented by Eq. (1), which can be applied to a wide range of values. One such use would be a normalization between two unrelated matrices of the same dimension. For instance, the elements of matrix A may contain integers between 1 and 2 × 10^6, and the elements of matrix B may contain probability values. In this case, Eq. (1) mixes the two matrices based on distance d. In the case of two probability matrices, Eq. (1) performs a normalization in favor of one of the matrices based on distance d. In other words, as matrix M moves closer to matrix A, the elements of matrix M become more similar to the homologous elements of matrix A than to those of matrix B. Likewise, as matrix M moves closer to matrix B, matrix M becomes more similar to matrix B. Consequently, with the maximum distance set to 100, d = 50 yields a matrix M that is an equal mix of, and equally similar to, matrices A and B.
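The "equally similar at d = 50" property can be made explicit with a short derivation. It assumes, as a labelled assumption rather than the published form, that Eq. (1) is the linear blend below with maximum distance D; under that assumption, the distance from each element of M to its homologous element in A or B scales linearly with d.

```latex
% ASSUMED linear form of Eq. (1), with maximum distance D (D = 100 here)
M_{ij}(d) = \left(1 - \frac{d}{D}\right) A_{ij} + \frac{d}{D}\, B_{ij}

% Element-wise distances to the two extremes
\left| M_{ij}(d) - A_{ij} \right| = \frac{d}{D} \left| B_{ij} - A_{ij} \right|,
\qquad
\left| M_{ij}(d) - B_{ij} \right| = \left(1 - \frac{d}{D}\right) \left| B_{ij} - A_{ij} \right|

% The two distances are equal exactly when d/D = 1/2, i.e., d = 50 for D = 100:
M_{ij}(50) = \tfrac{1}{2} A_{ij} + \tfrac{1}{2} B_{ij}
```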
Thoughts for the future
Of particular importance are the cases that show a peak of the similarity index between the two extreme states. We suggest that these peak values may directly indicate the state of the subject before the onset of the disease. To predict the onset, we wish to establish a link between the timeline of the disease and the states generated along the spectrum. Variations of the method may be constructed, and we are eager to apply it to other datasets in the same format (A, B, P). Future applications may include meteorology, medical diagnostics, forensics, economic forecasting, or genetics, for establishing relationships between species. In biology, we also believe that Eq. (1) can be used to predict tissue structure based on two groups of histological slides.
Here, we have shown the use of a novel prediction model. We proposed a simple method that provides insight into the evolution of natural processes. To demonstrate the method, our example considered the predisposition to disease in human subjects based on two known groups. With this approach, we correctly predicted the evolution of new subjects by using our previous data recorded from normal subjects and T2D subjects. Other applications may further indicate the conditions under which our method is appropriate, or the limits of its precision in predicting various metabolic diseases. In the future, we will try to associate a timeline with the steps of the spectrum to indicate the time until the disease is triggered, in days, months, or years. A ready-to-use implementation, which can also be used for other types of data, is provided in the supplementary material.
See the supplementary material for the implemented version of the spectral forecast model.
The authors would like to thank two anonymous referees for their constructive comments, which helped improve the manuscript. The authors declare no competing financial interests. This study was funded through No. PN-III-P1-1.2-PCCDI-2017-0797: “Pathogenic mechanisms and personalized treatment in pancreatic cancer using multi-omics technologies.”