The thermal deformation of flexible protein structures affects the protein characteristics. However, the overall effects of deformation have not been fully investigated. In this study, the behaviors of tetrapeptides in active areas of a thermally deformed papain structure were analyzed by using a deep neural network, genetic programming, and computer simulations. Fifteen tetrapeptides were found to be compatible with the thermally deformed structure, and over half of them were incompatible with the structure without thermal deformation. A decision tree was used to show the attributes that governed the suitability of tetrapeptides in active areas.
I. INTRODUCTION
Flexible protein structures are deformed by thermal vibrations, and structural deformation affects the protein characteristics.1–4 Because of their specific characteristics, proteins have practical applications in a broad range of industries,5–7 and overall variations in the characteristics must be considered. The deformed structures of proteins must be precisely identified, and structure-based characteristic analysis should be repeated extensively. The obtained characteristic information is valuable for expanding the usage environments of biomedical devices such as biosensors and bioreactors.8,9
The protein papain breaks down a large variety of peptides and is widely used in areas such as medical procedures, food handling, and cosmetic products.10–12 Characteristic analysis of the structure of thermally deformed papain is therefore of interest in a broad range of industries. Papain is a protease, and studies of the binding characteristics of proteases to inhibitors have contributed to our understanding of the structural characteristics of papain.13–17 However, the detailed characteristics of the thermally deformed papain structure remain unclear.
The aim of this research was to identify peptides that are compatible with the structure of thermally deformed papain and to obtain details of the thermal impact on the papain structure. There are a large number of possible structural patterns of peptides, and it is difficult to prioritize them. The selection and extraction of important features were achieved by using two different types of artificial intelligence, namely, a deep neural network (DNN)18 and genetic programming (GP).19 Structural analysis of proteins in their normal states is commonly performed by molecular dynamics (MD) and docking simulations.13,14,20–22 The thermally deformed structure of papain was determined by MD simulations, and the compatibility of peptides with the thermally deformed structure was evaluated by docking simulations.
Tetrapeptides compatible with the papain structure at ambient temperature were identified by GP and from the simulations in our previous study.23 There were many structural patterns of tetrapeptides, whose molecular weights were relatively close to that of the cysteine protease inhibitor E64. In this study, the tetrapeptides compatible with the thermally deformed papain structure were explored by using a DNN, GP, and simulations.
II. METHODS
The selection of candidates is faster for a DNN than for GP; on the other hand, important features can be obtained by GP but not by DNN. The candidates for tetrapeptides compatible with the thermally deformed papain structure were selected by a DNN and were evaluated by simulations. The important features were obtained by GP from the evaluated results.
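The division of labor described above can be sketched as a loop in which a model proposes candidates and simulations label them. This is a minimal illustration only: `explore`, `toy_select`, and `toy_evaluate` are hypothetical names, the selector and evaluator are placeholders for the DNN and the MD/docking simulations, and the batch sizes 12 and 10 are simply those reported for the first two rounds in Sec. III.

```python
from typing import Callable, Dict, List, Sequence

Peptide = str  # e.g. "Lys-Pro-Glu-Ala"


def explore(
    initial: Dict[Peptide, int],
    batch_sizes: Sequence[int],
    select: Callable[[Dict[Peptide, int], int], List[Peptide]],
    evaluate: Callable[[Peptide], int],
) -> Dict[Peptide, int]:
    """Grow a labelled data set by alternating selection and evaluation.

    initial     -- peptides already evaluated, mapped to a binding label
    batch_sizes -- number of new candidates requested in each round
    select      -- hypothetical stand-in for the DNN: (data, k) -> k peptides
    evaluate    -- hypothetical stand-in for MD + docking: peptide -> label
    """
    data = dict(initial)
    for k in batch_sizes:
        # Ask the model for k candidates, skipping any already evaluated.
        candidates = [p for p in select(data, k) if p not in data]
        for peptide in candidates:
            # Each evaluated candidate is added to the training data.
            data[peptide] = evaluate(peptide)
    return data


def toy_select(data: Dict[Peptide, int], k: int) -> List[Peptide]:
    """Placeholder selector: invents k unseen peptide names."""
    start = len(data)
    return [f"pep{start + i}" for i in range(k)]


def toy_evaluate(peptide: Peptide) -> int:
    """Placeholder evaluator: always returns the same label."""
    return 3


seed = {f"pep{i}": 3 for i in range(18)}  # 18 initial training peptides
result = explore(seed, [12, 10], toy_select, toy_evaluate)
print(len(result))  # 18 + 12 + 10 = 40
```

Passing the selector and evaluator as callables keeps the loop independent of any particular learning library or simulation package.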
The targets were narrowed down by using a convolutional neural network (CNN),24 which is a class of DNN. The CNN was constructed by using the deep learning library TensorFlow.25 Its structure consisted of an input layer, first convolutional layer, first pooling layer, second convolutional layer, second pooling layer, fully connected layer, and output layer. A tetrapeptide consists of four amino acids, and the amino acids include a variety of atoms. The number of each type of atom in each amino acid was counted, and the numbers were summarized in the input layer. If the tetrapeptide was stably bound to papain, the distances between the atoms in the peptide and the active center of papain were calculated, and the atoms at distances of less than 4.0 Å were counted. On the basis of the number of atoms, the binding states were classified as follows: NA1 ≥ 3 atoms, 3 atoms > NA2 ≥ 2 atoms, 2 atoms > NA3 ≥ 1 atom, and 1 atom > NA4. Sequential numbers from 0 to 3 were assigned to NAα (α = 1–4), and these were summarized in the output layer. Eighteen tetrapeptides were compatible with active sites in free papain at ambient temperature,23 and these were used as the first training data. Machine learning was executed on the data, and the next candidates were selected. The next candidates were evaluated by MD and docking simulations, and the results were added to the training data. Many tetrapeptides were derived from iterations of this process.
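The classification of binding states can be written as a small threshold function. The sketch below is illustrative: the function and dictionary names are hypothetical, and the mapping NA1 → 0 through NA4 → 3 assumes that the sequential numbers follow the order of the classes, which the text implies but does not spell out. The count is taken as a float because averaged counts need not be integers.

```python
def binding_class(n_atoms: float) -> str:
    """Classify a binding state from the number of peptide atoms
    lying within 4.0 Å of the active center."""
    if n_atoms < 0:
        raise ValueError("atom count cannot be negative")
    if n_atoms >= 3:
        return "NA1"
    if n_atoms >= 2:
        return "NA2"
    if n_atoms >= 1:
        return "NA3"
    return "NA4"


# Assumed order of the sequential output labels (0 to 3).
CLASS_TO_LABEL = {"NA1": 0, "NA2": 1, "NA3": 2, "NA4": 3}


def binding_label(n_atoms: float) -> int:
    """Return the integer label used in the output layer."""
    return CLASS_TO_LABEL[binding_class(n_atoms)]


for count in (4, 2.5, 1, 0.5):
    print(count, binding_class(count), binding_label(count))
```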
The important features that decide the behavior of peptides were extracted by using GP.19 The hyperparameters required for this approach were derived from our previous research,26 and the decision tree was optimized by evolutionary computation. The tree optimization method was improved in our previous research.27 In this research, the following approach was introduced to further improve it. First, a small decision tree was generated by the usual GP procedure. Next, the size of the decision tree was expanded appropriately, and a partial tree in the decision tree was optimized by the same procedure. This partial optimization process was repeated until the tree satisfied the data. Each output of the decision tree corresponded to one NAα (α = 1–4), and these were converted to sequential numbers from 0 to 3. The four amino acids in a tetrapeptide were ordered from the N-terminus to the C-terminus (amiβ, β = 1–4). The number of each atom in each amino acid was denoted by appending the atomic symbol to the label amiβ. For example, ami1S means the number of sulfur atoms in the N-terminus amino acid. By using this method, 20 attributes were represented as amiβC, amiβN, amiβO, amiβS, and amiβAC (β = 1–4).23 The 21 types of operation (Add, Sub, Mul, Div, Fmod, Log, Log10, Sin, Cos, If, Equal, NotEqual, GT, GE, And, Or, Not, Fmod2, Sqrt, IntegLog, and IntegLog10) were inherited from our previous research.23,27 The 12 types of constant consisted of the integer constants CSTγ with value γ (γ = 0–9) and the Boolean values TRUE (1) and FALSE (0). The decision tree was generated by combining members of the 20 attributes, 21 operations, and 12 constants.
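The building blocks listed above can be enumerated in a few lines, which also confirms the counts of 20 attributes, 21 operations, and 12 constants. The sketch below is an illustration under stated assumptions: the nested-tuple tree encoding and the `evaluate` function are hypothetical, and only operations whose meaning is unambiguous (Add, Sub, Mul, Sin, Cos, with angles assumed to be in radians) are implemented, because the definitions of the others (for example Fmod2 and IntegLog) are not given in this section.

```python
import math
from itertools import product

# 20 attributes: ami1C ... ami4AC
ATTRIBUTES = [f"ami{b}{s}" for b, s in product(range(1, 5), ("C", "N", "O", "S", "AC"))]

# 21 operation names as listed in the text
OPERATIONS = [
    "Add", "Sub", "Mul", "Div", "Fmod", "Log", "Log10", "Sin", "Cos", "If",
    "Equal", "NotEqual", "GT", "GE", "And", "Or", "Not", "Fmod2", "Sqrt",
    "IntegLog", "IntegLog10",
]

# 12 constants: CST0 ... CST9 plus TRUE (1) and FALSE (0)
CONSTANTS = {f"CST{g}": g for g in range(10)}
CONSTANTS.update({"TRUE": 1, "FALSE": 0})

# Only operations with an unambiguous meaning are implemented here.
SIMPLE_OPS = {
    "Add": lambda a, b: a + b,
    "Sub": lambda a, b: a - b,
    "Mul": lambda a, b: a * b,
    "Sin": math.sin,
    "Cos": math.cos,
}


def evaluate(node, features):
    """Evaluate a tree given as an attribute name, a constant name,
    or a tuple (operation, child, ...)."""
    if isinstance(node, tuple):
        op, *children = node
        if op not in SIMPLE_OPS:
            raise NotImplementedError(f"semantics of {op} are not defined in this sketch")
        return SIMPLE_OPS[op](*(evaluate(child, features) for child in children))
    if node in CONSTANTS:
        return CONSTANTS[node]
    return features[node]  # attribute lookup; KeyError if the name is unknown


# Subtree matching nodes γ35–γ38 of Table III: Add(3, Cos(ami3C))
subtree = ("Add", "CST3", ("Cos", "ami3C"))
value = evaluate(subtree, {"ami3C": 3})
print(len(ATTRIBUTES), len(OPERATIONS), len(CONSTANTS), value)
```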
The thermally deformed structure was produced by performing high-temperature MD simulations with the software AMBER 12.0.28 The temperature of the system was increased from 5 K to 310 K in 140 ps and was maintained until significant changes in the structure disappeared (100 ns). The simulations were conducted with the ff03.r1 force field.29 The structural data for papain were obtained from the Protein Data Bank (PDB) (PDB ID: 1ppn), and the structure was solvated with 11 540 TIP3P water molecules30 in a box. A simulation protocol similar to that used in our previous research23 was followed. The thermal deformation of papain was monitored at intervals of 10 ps in the MD simulation. When the significant changes disappeared, the generated structures were averaged over a 100 ps window within that time range.
The compatibility of the peptides with the averaged structure of thermally deformed papain was evaluated from their binding properties, which were obtained by using AutoDock Vina.31 This software package uses the Iterated Local Search global optimizer algorithm,32,33 the Broyden–Fletcher–Goldfarb–Shanno method,34 and a scoring function, which was derived by machine learning. Structural data for the tetrapeptides were acquired by using the LEaP program in AMBER 12.0,28 and the areas near the active center of papain were designated as the search space. The simulations were performed 10 times for each tetrapeptide, and the most-stable pattern was chosen. For the pattern, atoms in the tetrapeptide were counted when they were situated near the active center of papain. The counted numbers were averaged if there were several members in the most-stable pattern.
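The counting and averaging rule described above can be sketched as follows. The names `count_near` and `most_stable_count` are hypothetical, as is the tolerance `tol` used to decide when two poses tie for most stable. The sketch assumes that a more negative docking score means a more stable pose, which is how AutoDock Vina reports binding affinities, and it uses the 4.0 Å cutoff stated in the text.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
Pose = Tuple[float, Sequence[Point]]  # (docking score, peptide atom coordinates)


def count_near(atoms: Sequence[Point], center: Point, cutoff: float = 4.0) -> int:
    """Count atoms strictly closer than `cutoff` (in Å) to the active center."""
    return sum(1 for atom in atoms if math.dist(atom, center) < cutoff)


def most_stable_count(
    poses: Sequence[Pose], center: Point, cutoff: float = 4.0, tol: float = 1e-9
) -> float:
    """Average the near-atom count over all poses that share the best score."""
    best = min(score for score, _ in poses)  # most negative = most stable (assumed)
    counts = [
        count_near(atoms, center, cutoff)
        for score, atoms in poses
        if score - best <= tol
    ]
    return sum(counts) / len(counts)


center = (0.0, 0.0, 0.0)  # stand-in for the position of S-Cys25
poses = [
    (-6.0, [(1.0, 0.0, 0.0), (3.9, 0.0, 0.0), (5.0, 0.0, 0.0)]),
    (-6.0, [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (0.0, 0.0, 4.0)]),
    (-5.0, [(0.0, 0.0, 1.0)]),
]
print(most_stable_count(poses, center))  # 2.5
```

In this toy input, the two poses with a score of -6.0 tie for most stable and contribute 2 and 3 atoms, respectively, so the averaged count is 2.5. The atom at exactly 4.0 Å is excluded because the criterion is a strict inequality.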
III. RESULTS
The system was held at 310 K via MD simulations to thermally deform the structure of papain, and significant changes in the structure were examined at several time steps. The root-mean-square deviation (RMSD) of the structure between successive time steps was used to examine the significant changes. The significant changes disappeared after 96 200 ps, and the average structure was calculated in the range from 99 901 to 100 000 ps. Figure 1 shows the average structure of thermally deformed papain. The α-helix structure near the active center of papain was deformed by thermal shock. At 300 K, 18 tetrapeptides [Table S1(a)] were compatible with the sites near the active center of free papain.23 These tetrapeptides were assigned to be the first candidates, and their docking simulations were performed within the region near the active center of the average structure of papain. The most-stable states among the docking trials were selected.
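The RMSD-based check and the structure averaging described above can be illustrated with a short sketch. It rests on several assumptions that are not taken from the text: frames are already superimposed (no fitting is done), the structure counts as stable once every later successive RMSD stays below a user-chosen `threshold`, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Frame = Sequence[Point]


def rmsd(a: Frame, b: Frame) -> float:
    """Root-mean-square deviation between two pre-aligned frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and of equal length")
    squared = sum(
        (p - q) ** 2 for atom_a, atom_b in zip(a, b) for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(a))


def first_stable_index(frames: Sequence[Frame], threshold: float) -> Optional[int]:
    """Index i such that every successive RMSD from frame i onward is below
    `threshold`; None if the last step is still above it."""
    steps = [rmsd(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
    stable_from = None
    for i in range(len(steps) - 1, -1, -1):  # walk backward from the end
        if steps[i] >= threshold:
            break
        stable_from = i
    return stable_from


def average_structure(frames: Sequence[Frame]) -> List[Point]:
    """Average each atom's coordinates over the given frames."""
    n = len(frames)
    return [
        tuple(sum(frame[i][k] for frame in frames) / n for k in range(3))
        for i in range(len(frames[0]))
    ]


# Toy single-atom trajectory: one large jump, then small fluctuations.
frames = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)], [(3.1, 0.0, 0.0)], [(3.1, 0.1, 0.0)]]
start = first_stable_index(frames, threshold=0.5)
print(start, average_structure(frames[start:]))
```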
Papain is a cysteine protease. The thiol (SH) group of Cys25 in a cysteine protease has a key role in peptide dissolution. The sulfur atom of this thiol group (S-Cys25) was therefore designated as the active center. To identify desirable peptides for thermally deformed papain, the atoms nearer than 4.0 Å to S-Cys25 in the most-stable binding state were counted. If there were several most-stable binding states, the obtained numbers were averaged. These results were labeled on the basis of the previously described classification method and are listed in Table S2. Among the first 18 tetrapeptides, two were compatible with sites near S-Cys25 of thermally deformed papain. Machine learning with the CNN was executed on the results of the 18 tetrapeptides, and the next 12 candidates [Table S1(b)] were selected by the CNN. The 12 tetrapeptides were evaluated by docking simulations (Table S2), and two were compatible. Further machine learning with the CNN was executed on the results of the 30 tetrapeptides (the previous 18 and the current 12), and the next 10 candidates [Table S1(c)] were selected. The 10 tetrapeptides were evaluated by simulations (Table S2), and two were identified as compatible. A similar protocol was run four more times, and 40 further tetrapeptides were derived (Table S2). The results of these evaluations for the 80 tetrapeptides showed that 15 tetrapeptides (Table I) were compatible with the sites near S-Cys25 of deformed papain and 15 other tetrapeptides (Table S3) were not especially compatible.
To extract important features of the peptides, a decision tree was constructed from these 30 tetrapeptides. The decision tree contained 43 nodes, and its configuration is shown in Tables II and III. There were 11 attributes in the decision tree that reflected the features of the peptide structure: three ami4N, one ami1C, one ami2C, one ami2N, one ami2O, one ami3AC, one ami3C, one ami3N, and one ami4C. These attributes strongly affect the affinity between the peptides and the active site of thermally deformed papain. The docking characteristics of the 15 compatible tetrapeptides were similarly evaluated for free papain at 300 K (Table S4), and seven tetrapeptides [Table I(a)] were also compatible with free papain at 300 K.
(a) Compatible with both thermally deformed papain and free papain at 300 K |
---|
Lys–Pro–Glu–Ala |
Leu–Tyr–Ser–Phe |
Thr–Pro–Glu–Ile |
Ile–Ile–Glu–Ser |
Met–Val–Glu–Ala |
Val–Ser–Ile–Thr |
Asp–Ser–Glu–Ile |
(b) Compatible with thermally deformed papain but not with free papain at 300 K |
---|
Lys–Ile–Glu–Gln |
Glu–Lys–Asp–Val |
Thr–Asp–Pro–Glu |
Ser–Asp–Thr–Val |
Ile–Asp–Thr–Ala |
Ser–Glu–Ser–Glu |
Ser–Asp–Asp–Val |
Val–Val–Asp–Thr |
Node | Operation, attribute, or constant, with connections |
---|---|
γ0 | Fmod2[α0 = β of γ1][β = Solution] |
γ1 | If[α0 = β of γ2][α1 = β of γ8][α2 = β of γ17][β = α0 of γ0] |
γ2 | NotEqual[α0 = β of γ3][α1 = β of γ6][β = α0 of γ1] |
γ3 | Div[α0 = β of γ4][α1 = β of γ5][β = α0 of γ2] |
γ4 | ami2C[β = α0 of γ3] |
γ5 | 6[β = α1 of γ3] |
γ6 | Log10[α0 = β of γ7][β = α1 of γ2] |
γ7 | ami1C[β = α0 of γ6] |
γ8 | Add[α0 = β of γ9][α1 = β of γ13][β = α1 of γ1] |
γ9 | Div[α0 = β of γ10][α1 = β of γ12][β = α0 of γ8] |
γ10 | Cos[α0 = β of γ11][β = α0 of γ9] |
γ11 | ami4N[β = α0 of γ10] |
γ12 | ami3AC[β = α1 of γ9] |
γ13 | Sub[α0 = β of γ14][α1 = β of γ15][β = α1 of γ8] |
γ14 | 7[β = α0 of γ13] |
γ15 | IntegLog[α0 = β of γ16][β = α1 of γ13] |
γ16 | ami4N[β = α0 of γ15] |
γ17 | If[α0 = β of γ18][α1 = β of γ23][α2 = β of γ25][β = α2 of γ1] |
γ18 | Equal[α0 = β of γ19][α1 = β of γ21][β = α0 of γ17] |
γ19 | Log[α0 = β of γ20][β = α0 of γ18] |
γ20 | ami2O[β = α0 of γ19] |
γ21 | Cos[α0 = β of γ22][β = α1 of γ18] |
Node | Operation, attribute, or constant, with connections |
---|---|
γ22 | ami4N[β = α0 of γ21] |
γ23 | Cos[α0 = β of γ24][β = α1 of γ17] |
γ24 | 8[β = α0 of γ23] |
γ25 | Sqrt[α0 = β of γ26][β = α2 of γ17] |
γ26 | Sub[α0 = β of γ27][α1 = β of γ42][β = α0 of γ25] |
γ27 | If[α0 = β of γ28][α1 = β of γ39][α2 = β of γ41][β = α0 of γ26] |
γ28 | Equal[α0 = β of γ29][α1 = β of γ35][β = α0 of γ27] |
γ29 | Div[α0 = β of γ30][α1 = β of γ31][β = α0 of γ28] |
γ30 | ami4C[β = α0 of γ29] |
γ31 | Add[α0 = β of γ32][α1 = β of γ34][β = α1 of γ29] |
γ32 | Log10[α0 = β of γ33][β = α0 of γ31] |
γ33 | 3[β = α0 of γ32] |
γ34 | ami2N[β = α1 of γ31] |
γ35 | Add[α0 = β of γ36][α1 = β of γ37][β = α1 of γ28] |
γ36 | 3[β = α0 of γ35] |
γ37 | Cos[α0 = β of γ38][β = α1 of γ35] |
γ38 | ami3C[β = α0 of γ37] |
γ39 | Cos[α0 = β of γ40][β = α1 of γ27] |
γ40 | 1[β = α0 of γ39] |
γ41 | 9[β = α2 of γ27] |
γ42 | ami3N[β = α1 of γ26] |
IV. DISCUSSION
The compatibility of 80 tetrapeptides with sites near S-Cys25 of thermally deformed papain was evaluated by using a DNN and by MD and docking simulations. Fifteen tetrapeptides (Table I) that fitted the sites near S-Cys25 were identified. Eight of these 15 tetrapeptides [Table I(b)] did not fit the sites near S-Cys25 of papain at 300 K; their suitability was improved by thermal deformation. The remaining seven tetrapeptides [Table I(a)] were compatible with both deformed and non-deformed papain. The suitability of these seven tetrapeptides was not governed by thermal deformation, and they could therefore withstand structural deformations of the protein. Because DNNs identify compatible peptides effectively, they are a valuable tool for finding potential candidates in drug design. A similar approach could also be used to investigate protein deformation induced by pH variations.
In the group of 18 tetrapeptides that were compatible with sites near S-Cys25 of free papain at 300 K, only two tetrapeptides were compatible with these sites in free papain at 310 K. Most of the features at ambient temperature were suppressed by thermal deformation, and only a fraction of the features were retained in both states. In the group of 18 tetrapeptides, the number of tetrapeptides that were compatible with sites near S-Cys25 of papain at 300 K was zero on a flat Si surface26 and four in a nanocubic space.27 The level of suppression of the features of thermally deformed papain was therefore greater than that for adsorption in a nanocubic space and smaller than that for adsorption on a flat Si surface. The RMSD of the structure of free papain at 310 K from that at 300 K was evaluated (Table S5). The structural changes at sites near the active center of thermally deformed papain were larger than those for adsorption in a nanocubic space and smaller than those for adsorption on a flat Si surface. The degree to which the papain features are suppressed can thus be understood from the amount of structural change at sites near its active center. Recently developed Si-processing techniques enable fabrication of a variety of nanostructured arrays with regulated surface conditions.35–37 The thermal effect fell between the effects of two different Si surfaces; therefore, the fabrication technique can enable formation of virtual thermally deformed states of proteins at ambient temperature.
The decision tree was constructed from the 15 compatible tetrapeptides and 15 highly incompatible tetrapeptides and consisted of 43 nodes. The nodes comprised 11 attributes reflecting features of the peptide structure, 25 operations, and seven constants (Tables II and III). The 11 attributes were one ami1C, one ami2C, one ami2N, one ami2O, one ami3AC, one ami3C, one ami3N, one ami4C, and three ami4N. This result suggests that compatibility with the active areas of thermally deformed papain is strongly affected by nitrogen atoms in the C-terminus amino acid of the tetrapeptide and little affected by sulfur atoms in any of its amino acids. Furthermore, the impact on fitness appears to be weaker for the N-terminus amino acid of the tetrapeptide than for its other parts. As described above, the configuration of the decision tree enables evaluation of the attributes governing the fitness of the tetrapeptide at sites near the active center of the thermally deformed papain structure. The 25 operations were 5 Cos, 3 Add, 3 Div, 3 If, 2 Equal, 2 Log10, 2 Sub, 1 Fmod2, 1 IntegLog, 1 Log, 1 NotEqual, and 1 Sqrt. Among the 21 types of operation, Cos was most frequently used, and nine operations (Mul, Fmod, Sin, GT, GE, And, Or, Not, and IntegLog10) were not needed to construct the tree. The behavior of the tetrapeptides on thermally deformed papain would be periodically altered by their constituent elements, and the behavior could not be represented in binary form.
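The node counts quoted above can be checked mechanically against Tables II and III. The sketch below transcribes the symbol of each node γ0–γ42 in order and tallies them. The classification rule (symbols beginning with `ami` are attributes, digit strings are constants, everything else is an operation) and all variable names are assumptions made for this illustration.

```python
from collections import Counter

# Symbols of nodes γ0 ... γ42, transcribed in order from Tables II and III.
NODES = [
    "Fmod2", "If", "NotEqual", "Div", "ami2C", "6", "Log10", "ami1C",
    "Add", "Div", "Cos", "ami4N", "ami3AC", "Sub", "7", "IntegLog",
    "ami4N", "If", "Equal", "Log", "ami2O", "Cos", "ami4N", "Cos",
    "8", "Sqrt", "Sub", "If", "Equal", "Div", "ami4C", "Add",
    "Log10", "3", "ami2N", "Add", "3", "Cos", "ami3C", "Cos",
    "1", "9", "ami3N",
]


def kind(symbol: str) -> str:
    """Classify a node symbol as attribute, constant, or operation."""
    if symbol.startswith("ami"):
        return "attribute"
    if symbol.isdigit():
        return "constant"
    return "operation"


kinds = Counter(kind(s) for s in NODES)
operations = Counter(s for s in NODES if kind(s) == "operation")
attributes = Counter(s for s in NODES if kind(s) == "attribute")
# Attributes per residue position: the character after "ami" is β (1 to 4).
per_position = Counter(int(s[3]) for s in attributes.elements())

print(len(NODES), dict(kinds))
print(operations.most_common())
print(attributes.most_common())
print(sorted(per_position.items()))
```

The tally per residue position shows one attribute for the first amino acid against three or four for each of the others, and no sulfur attribute appears, which matches the interpretation given in the text.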
It would be important to check experimentally the docking of peptides to sites of interest in the deformed protein. LUCK, the Laser UV Cross-linKing method, has been used to characterize protein–protein interactions in living cells.38 In that work, upon irradiation of HeLa cells under controlled conditions, cross-linked products of glyceraldehyde-3-phosphate dehydrogenase (GAPDH) were detected. This method is very flexible, and it might be used to measure the docking of peptides and to verify the stability, deformations, and dynamics of proteins in cases similar to papain.
V. CONCLUSIONS
A rational combination of a DNN, GP, and MD and docking simulations was achieved and enabled exploration of the detailed characteristics of thermally deformed papain. Fifteen tetrapeptides were found to be compatible with its active areas. Over half of the 15 tetrapeptides were incompatible with papain without thermal deformation, and thermal deformation gave the protein new functions. The remaining tetrapeptides were compatible with both states, and the structures of the tetrapeptides could provide valuable information for designing inhibitors of viral proteins with frequent mutations. The decision tree showed the attributes that determine the fitness of tetrapeptides in the active areas of thermally deformed papain. This tree provides a powerful tool for extracting important factors from complex situations such as biological phenomena.
SUPPLEMENTARY MATERIAL
See the supplementary material for raw data.
DATA AVAILABILITY
The data that support the findings of this study are available within the article and its supplementary material.
ACKNOWLEDGMENTS
This study was supported by the National Institute of Technology (KOSEN).