Conventional intelligent fault diagnosis approaches for electric submersible pumps (ESPs) depend heavily on manual expertise for feature extraction, while a conventional convolutional neural network (CNN) has an excess of parameters and requires a substantial volume of training samples. In this paper, a fault diagnosis algorithm for ESPs based on a Bayesian optimization-one-dimensional convolutional neural network-support vector machine (BO-1DCNN-SVM) is proposed by combining a 1DCNN with an SVM and using Bayesian optimization (BO) to tune the improved 1DCNN. First, the method uses the automatic feature extraction capability of the 1DCNN to solve the problem that traditional diagnosis methods over-rely on manual experience for feature extraction. Meanwhile, the final Softmax layer of the CNN is replaced by the SVM to handle small-sample data effectively, which improves the accuracy and generalization ability of ESP fault classification. Then, the Bayesian optimization algorithm is used to find the optimal combination of hyperparameters for the improved 1DCNN-SVM model to further improve the prediction performance of the fault diagnosis model. Finally, the experimental results show a classification accuracy of 96.64%, which is 5% higher than that of existing CNN approaches on data samples of similar scale. The proposed method also proved to be highly accurate and robust in fault diagnosis.

## I. INTRODUCTION

As core equipment in oil industry production, the electric submersible pump (ESP) system is mainly composed of an electric motor, a protector, a gas–liquid separator, a multi-stage centrifugal pump, a power cable, a transformer, a pressure sensor, and other components.^{1} It works by placing the tubing and the pump together in the well, connecting the built-in motor to the ground power supply through a special cable, and then driving the centrifugal pump through the operation of the motor to generate centrifugal force, which lifts the crude oil from the well to the ground.^{2} ESPs are prone to failure because they operate in geological structures and well conditions that are complicated and changeable, and because their individual components are strongly correlated. Under high-pressure environments, the sealing performance of the pump is affected, leading to leakage and mechanical wear. Under high-temperature environments, electric motors are susceptible to thermal damage, necessitating a cooling circulation system to ensure their stable operation. When the state of the medium changes, the resulting change in gas content affects pump cavitation. Once a failure occurs, it can lead to shutdown for well repair operations, which is time-consuming and expensive.^{3} An effective fault diagnosis of an ESP can assist production managers in identifying faults quickly and accurately, reducing equipment maintenance time, enhancing the effectiveness of maintenance, delaying equipment obsolescence, preventing a series of domino effects, and enhancing the safety and durability of the entire system. Therefore, it is of great research significance to conduct troubleshooting studies on ESPs.

At present, ESP fault diagnosis technology is mainly divided into traditional and modern methods. Traditional ESP fault diagnosis mainly includes current card diagnosis and holding pressure diagnosis. Zhang *et al.*^{4} identified the fault type of an electric submersible pump from the fault characteristics of the current card under different working conditions and analyzed in detail how the typical characteristics of the current card are formed. Experts or technicians manually diagnose ESP faults based on long-accumulated experience with current cards, which can lead to judgment errors that compromise real-time performance and accuracy. The holding pressure diagnostic method is based on quickly closing the production valve at the wellhead during normal operation and shutdown of the ESP system, monitoring the oil pressure changes at the wellhead, and plotting the holding pressure curve. The fault diagnosis analysis is then carried out according to the changes in the characteristics of this curve. The holding pressure method diagnoses few types of faults, and the accuracy of its results is also poor. Advances in machine learning and enhanced computational resources have led to the rapid development of modern ESP fault diagnosis. At present, the three most commonly used methods are the neural network, fuzzy mathematics, and fault tree analysis.^{5} Among them, the backpropagation (BP) neural network is the most frequently used method in fault diagnosis, and many experts and scholars have started to study the application of neural networks to current card identification,^{6} which avoids the differences in fault identification caused by the individual experience involved in manual identification.
The fuzzy mathematical diagnosis technique takes fuzzy mathematics as the theoretical basis, constructs the fuzzy relationship matrix of fault conditions and symptoms, and finds the set of fault degrees according to the set of affiliation degrees of the fuzzy matrix.^{7} Fault tree analysis is one of the essential analysis methods in system safety engineering,^{8} which represents the logical relationship between the possible fault conditions and the causes of failure of ESP systems in a tree structure.^{9} In recent years, with the rapid development of machine learning theory and technology, machine learning methods have brought more new ideas to ESP fault diagnosis. The increasing number of ESP sensors allows downhole data to be recorded in real time at certain time intervals, making it easier to monitor production conditions and providing a database for ESP multi-parameter diagnostics. Deep learning algorithms have also emerged to turn large amounts of raw data into high-dimensional and more abstract features through nonlinear mapping to improve data differentiation.^{10,11} Deep learning is currently an intelligent method widely used for troubleshooting mechanical equipment. Due to its special network structure and ability to handle complex recognition tasks, it has attracted many academics to study its theory and application. For example, Thomas *et al.*^{12} submitted the training set to an automatic model-free learning system based on Bayesian belief networks and compared it with a reference support vector machine classifier; experiments were presented for three different condition classes, using sophisticated statistical evaluation methodologies to measure the classifier performance. Carlos *et al.*^{13} used a neural network to recognize ESP airlock events. Castellanos *et al.*^{14} used a decision tree to classify different faults by processing the raw signal characteristics of the ESP.
Yang *et al.*^{15} proposed an ESP fault recognition method based on the combination of unsupervised feature extraction and migration learning. Chen *et al.*^{16} provided an ESP fault prediction and classification method combining back propagation neural nets with artificial feature extraction. Yang *et al.*^{17} proposed an operational fault diagnosis method for electric submersible pumps based on the fusion of SPC rules and *a priori* knowledge in response to the popularization and application of the ESP real-time monitoring parameter acquisition system. Yang *et al.*^{18} used a combination of Principal Component Analysis (PCA) and the marginal distance method for fault diagnosis of electric submersible pump tubing leakage. Peng *et al.*^{19} evaluated PCA as an unsupervised machine learning technique to detect the causes of broken shafts in electric submersible pumps.

These methods play a role in ESP fault diagnosis, but still some areas need to be improved:

The classifiers employed in previous studies may lead to overfitting and poor generalization ability of the diagnostic model. This can make it difficult to distinguish some similar classes.

In ESP fault diagnosis algorithms based on deep learning, the hyperparameters are set manually. They are selected by trial and error: model performance is assessed, and the parameters are gradually refined and improved. Manually adjusting the hyperparameters takes much effort, and the results may not be as expected.

To address the above deficiencies, this paper proposes an ESP fault diagnosis algorithm with automatic hyperparameter optimization. The algorithm combines CNN and SVM and uses Bayesian optimization to optimize the hyperparameters of the model, saving hyperparameter adjustment workload and time. The main contributions are made as follows:

In this paper, the ESP sample data have many features, and a Batch Normalization (BN) layer is added inside the CNN to make the distribution of each feature of the same batch similar and shorten the fault diagnosis time. A 1DCNN-SVM hybrid model is introduced to be applied to the fault diagnosis of ESP. Adaptive feature extraction of faulty data is performed using CNN, and then, the Softmax layer in the conventional CNN is replaced with an SVM classifier to classify the output of the fully connected layer. In order to further improve the SVM classification performance, five-fold cross-validation is used for the ESP data to achieve the optimal selection of SVM parameters. The experimental results show that the classification accuracy of the 1DCNN-SVM model proposed in this paper improves by 2.2% over the traditional CNN, and the diagnosis time is reduced by 37.1%.

By employing Bayesian optimization, we improved the 1DCNN-SVM fault diagnosis model. This allowed us to fine-tune hyperparameters, such as the learning rate and batch size, to discover the optimal configuration, resulting in the model achieving its highest predictive performance. This approach helps us save significant time that would otherwise be spent on manual parameter tuning. This paper also compares the accuracy of 1DCNN-SVM model with grid search (GS), particle swarm optimization (PSO), and Bayesian optimization (BO) for hyper-parameter tuning in turn and concludes that the BO-1DCNN-SVM model is more accurate, and its results are also better.

The rest of this paper is presented as follows: In Sec. II, an electric submersible pump’s data source is examined, the raw data are dealt with, and linear interpolation and data normalization are used to remove duplicates and missing data. Section III introduces the structure of the CNN model and the improved 1DCNN-SVM diagnostic model used in this paper, as well as the Bayesian optimization principle. Section IV of this paper focuses on comparing and analyzing the experimental results. First, it compares the evaluation metrics of the traditional 1DCNN-Softmax model with the improved 1DCNN-SVM model. Second, the improved 1DCNN-SVM model is optimized using Bayesian optimization to find the optimal hyperparameter combination. Finally, BO-1DCNN-SVM is compared with other existing techniques, including GS-1DCNN-SVM and PSO-1DCNN-SVM. The experimental results demonstrate that the model proposed in this paper achieves the highest accuracy, with classification accuracies exceeding 90% for all three types of faulty and normal operating conditions. This validates the practicality of the proposed method.

## II. DATA PROCESSING AND ANALYSIS OF ESP

The data were obtained from the China National Offshore Oil Platform 119 System Development Production Database, which contains large-scale production data from different ESP sensors. The data record normal conditions and three different categories of abnormal operating conditions in ESP operation. All data are recorded once a day and are described by 15 production variables, as shown in Table I.

| Variables | Unit |
|---|---|
| Test fluid production | m^{3}/day |
| Test water production | m^{3}/day |
| Test oil production | m^{3}/day |
| Test gas production | 10^{4} m^{3}/day |
| Water content | % |
| Wellhead pressure | MPa |
| Wellhead temperature | °C |
| Sleeve pressure | MPa |
| Pump frequency | Hz |
| Pump current | A |
| Pump voltage | V |
| Pump inlet pressure | MPa |
| Pump outlet pressure | MPa |
| Pump inlet temperature | °C |
| Pump motor temperature | °C |


The following are the main reasons for the occurrence of three fault conditions in ESP systems:

Pump shaft breakage: the impeller or separator is blocked, the casing is deformed, or the pump is seriously stuck, so that excessive torque breaks the pump shaft.

Motor overheating: caused by frequent starting within a short period of time, high viscosity of well fluid, high specific gravity, improper pump selection, current and voltage imbalance, contaminated or missing motor oil, insufficient fluid supply, long-term overload or underload operation, etc.

Insufficient fluid supply from the formation: caused by poor fluid supply capacity of the formation, followed by low pump hanging depth and oversized pump selection.

During ESP operation, these undesired operating conditions may interact or occur simultaneously. Early intervention is necessary to prevent temporary well shutdowns. Therefore, our aim is to predict the occurrence and type of failure in advance and to reduce the substantial economic losses associated with fault shutdowns for maintenance.

### A. Data pre-processing

Data collected in the industrial field usually suffer from one or more of data duplication, missing data, and data exceptions. In the daily production data recorded by the ESP sensors, some data are measured several times a day, resulting in repeated records. This would affect subsequent experimental research, so duplicate data must be removed: where two or more records exist for a day, only the latest sample recorded that day is kept. Environmental factors and professional knowledge must be considered when handling outliers and missing values of ESP data. During ESP production, outliers may be generated by factors such as fault shutdowns, well shutdowns, and typhoons. In these cases, the production time of the ESP is less than 24 h, and the abnormal samples with a production time of less than 24 h must be removed. Missing data can be filled in by linear interpolation, quadratic spline, cubic spline, or a nearest neighbor algorithm. In this paper, linear interpolation is used to fill in the missing data. Linear interpolation is a widespread numerical method for estimating values between two known data points, and it is simpler and more convenient than the other methods.

The records of the ESP sensors are not available every day, and to some extent, they are irregular. Data may be missing for two or three consecutive days, or for five or six. The missing values were estimated from the adjacent feature data. Taking well 1621 as an example, most of the characteristic data for this well were not recorded for the four days from December 19, 2021, to December 22, 2021, so the data could not be used directly for subsequent troubleshooting predictions. Linear interpolation was used to estimate the values of the data for these four days, and Fig. 1 shows linear interpolation of characteristic data, such as wellhead temperature, pump motor temperature, pump inlet pressure, and pump outlet pressure.

Missing data in daily oil and gas well production are restored using linear interpolation, preserving data integrity and aiding subsequent experiments.
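The two pre-processing steps described above (keeping the latest record of each day and filling missing days by linear interpolation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `(day, timestamp, value)` record layout, and the sample values are hypothetical.

```python
from datetime import date, timedelta

def keep_latest_per_day(records):
    """Keep only the latest record of each day.

    records: iterable of (day, timestamp, value) tuples (hypothetical layout).
    Returns a dict mapping day -> value of the latest record on that day.
    """
    latest = {}
    for day, ts, value in records:
        if day not in latest or ts > latest[day][0]:
            latest[day] = (ts, value)
    return {day: value for day, (ts, value) in latest.items()}

def fill_missing_days(daily):
    """Fill absent days between known days by linear interpolation.

    daily: non-empty dict mapping date -> value for the days that were recorded.
    Returns a dict with one value per calendar day from the first to the last day.
    """
    days = sorted(daily)
    filled = {}
    for left, right in zip(days, days[1:]):
        gap = (right - left).days
        for k in range(gap):
            frac = k / gap  # 0 at the left known day, approaching 1 at the right one
            filled[left + timedelta(days=k)] = daily[left] + frac * (daily[right] - daily[left])
    filled[days[-1]] = daily[days[-1]]
    return filled

# Hypothetical values for one feature; December 19-22, 2021 are missing.
known = {date(2021, 12, 18): 60.0, date(2021, 12, 23): 65.0}
restored = fill_missing_days(known)
```

Each feature column would be processed independently in the same way; samples with a production time below 24 h would be dropped before interpolation.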

### B. Data analysis

The daily production data of an ESP can reflect the production status of the well and can be used to monitor the performance status of the ESP. Significant disparities exist among variables, directly impacting the construction of the ESP fault diagnosis model and adding to algorithmic complexities. Rapidly achieving the model’s optimal solution presents challenges. Therefore, this study employs the max–min normalization method to scale each variable into the [0,1] interval, simplifying the processing. Figure 2 represents the data trends of production parameters for well 1621 at different working conditions, where the data are collected continuously. Obviously, there are significant differences in each parameter at different working conditions.
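Max–min normalization maps each variable to [0, 1] through x' = (x − x_min)/(x_max − x_min), computed per variable. A minimal sketch is given below; the function name and the handling of a constant column (mapped to zeros) are assumptions made for illustration.

```python
def min_max_scale(column):
    """Scale one variable (a list of numbers) into the [0, 1] interval."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information and is mapped to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical pump-current readings in amperes.
scaled = min_max_scale([10.0, 15.0, 20.0])
```

In practice, the minimum and maximum would be taken from the training set only and reused for the test and validation sets, so that no information leaks from held-out data.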

Our goal is to obtain accurate prediction results before the failure phase occurs. Since wells are usually shut down for servicing and maintenance immediately after a failure, we categorize normal and failure sample data. The type of failure that occurs can be accurately predicted before the failure occurs based on the trend of each parameter. This allows for advanced fault prediction in the case of abnormal changes in data trends, giving workers enough time to take preventive measures.

## III. DESIGN OF FAULT PREDICTION MODEL

The flow chart of the ESP fault prediction method based on BO-1DCNN-SVM is shown in Fig. 3. There are four parts: analysis of daily output data of ESP, Bayesian optimization algorithm, convolutional neural network classification prediction, and fault diagnosis model evaluation.

The steps of fault diagnosis for ESP are as follows:

Daily production data analysis: In this paper, the daily production data of offshore oil and gas water wells are collected, in which the feature factors include 15 features, such as pump current, pump voltage, pump inlet temperature, pump outlet pressure, and pump motor temperature. In this paper, we take well 1621 as an example, normalize its daily production data, interpolate the missing data, and divide the data into the training set and test set.

The Bayesian optimization algorithm: The search ranges of the batch size, learning rate, and L_{2}-regularization coefficient are bounded. The probability distribution of the objective function is estimated by building a Gaussian process model, the next sampling point is selected according to this probability distribution, and the Gaussian process model is continuously sampled and updated to gradually approximate the optimal solution of the function. A better combination of hyperparameters is found within a smaller number of samples, thus saving training time and computational resources.

Fault detection: The 15 features of each sample are arranged as a 15 × 1 × 1 input layer, and the convolutional layers extract features from the data automatically, without relying on manual experience for feature extraction. The maximum pooling layer is then used to select the features with the greatest relevance for dimensionality reduction, and the fully connected layer maps the features to the output. The output of the fully connected layer is used as the input of the SVM for feature classification, which is more robust and more effective.

Evaluation of detection results: The precision, recall, and F1-score, which are three indicators, are used to judge the accuracy of the model prediction results by comparing the traditional 1DCNN-Softmax model and the improved 1DCNN-SVM model. The confusion matrix of BO-1DCNN-SVM is obtained, as well as comparing the accuracy under different fault diagnosis models, which shows that the fault diagnosis model of BO-1DCNN-SVM proposed in this paper has the best accuracy.

### A. Convolutional neural network (CNN) model

A convolutional neural network contains a feature extractor consisting of a convolutional layer and a pooling layer, which draws out the deep sample information layer by layer. The model structure of CNN usually includes an input layer, a convolutional layer, an activation layer, a pooling layer, and an output layer (fully connected layer and Softmax layer),^{20–22} and Fig. 4 shows the structure of a convolutional neural network.

In the convolutional layer, $Z^{l}$ is the output of the $l$-th layer, $x_{i}^{l-1}$ is the output of the $i$-th channel of the $(l-1)$-th layer, $c^{l-1}$ is the $c$-th channel of the $(l-1)$-th layer, $w_{i,c}^{l}$ is the weight matrix of the convolution kernel of the $l$-th layer, $b_{i}^{l}$ is the bias term, and $*$ is the convolution operation.

In the activation layer, $x^{l}$ is the output feature map obtained from the convolution operation and $y^{l}$ is the output value of $x^{l}$ after the ReLU activation operation.

In the pooling layer, the pooled output belongs to the $i$-th channel of the $(l+1)$-th layer, $S$ is the size of the pooling kernel, $j$ is the step size, and $q_{i}^{l}(t)$ is the output value of the $t$-th neuron of the $i$-th channel in layer $l$.

In the fully connected layer, $z^{k}$ is the output of the $k$-th layer, $w^{k}$ is the weight, $x^{k-1}$ is the input value of the $k$-th layer, $b^{k}$ is the bias term, and $f(\cdot)$ is the ReLU activation function.

In the Softmax layer, $p(x_{i})$ is the probabilistic output of the output-layer neuron, $x_{i}$ is the activation value of the $i$-th neuron in the output layer, and $C$ is the number of categories.

In the loss function, $m$ denotes the number of input samples, $n$ denotes the number of categories of samples, $Y_{ij}$ is the probability distribution of the true labels corresponding to the samples, and $X_{ij}$ is the probability distribution of the sample recognition results. The CNN parameters are tuned using an optimizer that minimizes the loss function, and the Adam optimization algorithm is used in this paper.

In Adam, the update of the parameter $\theta$ is approximately bounded by the learning rate $\alpha$, i.e., $|\Delta\theta_{t,k}| \lesssim \alpha$, where $\Delta\theta_{t,k}$ is the $k$-th component of $\Delta\theta_{t}$. In practice, the number of updates required to approximate the optimal solution for the parameter can be roughly inferred from the order of magnitude of $\alpha$. The estimates $m_{t}$ and $u_{t}$ of the first-order and second-order moments of the gradient $g_{t}$ are calculated at each step.

Here, $\beta_{1}, \beta_{2} \in [0, 1)$ are the decay constants. The default settings are $\beta_{1} = 0.9$ and $\beta_{2} = 0.999$, and both $m_{0}$ and $u_{0}$ are initialized as $d$-dimensional zero vectors; the resulting biases in $m_{t}$ and $u_{t}$ are then corrected, respectively.
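For reference, the symbols defined above fit the following standard textbook forms of the ReLU activation, the fully connected layer, the Softmax output, the cross-entropy loss, and the Adam update. This is a sketch under that assumption and not necessarily the authors' exact notation; the small constant $\epsilon$ that avoids division by zero and the bias-corrected estimates $\hat{m}_{t}$ and $\hat{u}_{t}$ are introduced here following the usual Adam conventions.

$$
y^{l} = \max\left(0, x^{l}\right), \qquad
z^{k} = f\left(w^{k} x^{k-1} + b^{k}\right), \qquad
p(x_{i}) = \frac{e^{x_{i}}}{\sum_{c=1}^{C} e^{x_{c}}}
$$

$$
L = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{n} Y_{ij} \log X_{ij}
$$

$$
m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1})\, g_{t}, \qquad
u_{t} = \beta_{2} u_{t-1} + (1 - \beta_{2})\, g_{t}^{2}
$$

$$
\hat{m}_{t} = \frac{m_{t}}{1 - \beta_{1}^{t}}, \qquad
\hat{u}_{t} = \frac{u_{t}}{1 - \beta_{2}^{t}}, \qquad
\theta_{t} = \theta_{t-1} - \alpha\, \frac{\hat{m}_{t}}{\sqrt{\hat{u}_{t}} + \epsilon}
$$

Because $|\hat{m}_{t}| / \sqrt{\hat{u}_{t}}$ is typically of order one or smaller, each component of the step $\Delta\theta_{t}$ stays close to or below $\alpha$ in magnitude, which is the bound stated in the text.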

### B. Support vector machine (SVM)

The support vector machine (SVM)^{23–26} classification architecture has been widely used in many different fields, including machine fault diagnosis, and is now considered one of the most powerful methods for solving multi-class classification problems in machine learning. Compared with the traditional 1DCNN using Softmax for classification, SVM has better robustness and classification performance under small-sample conditions.^{27,28} Its basic model is a linear classifier with a maximum interval (margin) defined on the feature space; the maximum margin distinguishes it from a perceptron, and its decision boundary is the maximum-margin hyperplane solved for the learned samples. When the dataset is linearly inseparable, a soft interval can be introduced to solve the problem, after which the fault classification prediction is completed.

In the mathematical expression of the SVM, $x_{i} \in R^{n}$ is the input data, $y_{i} \in \{-1, +1\}$ is the corresponding class label, $\xi_{i}$ is the relaxation factor, $C$ is the penalty factor, $b$ is the bias, $w$ is the optimization parameter, and $\varphi(x_{i})$ is a nonlinear function that maps $x_{i}$ to a high-dimensional feature space. In the kernel function, $\sigma$ is the kernel parameter.
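For reference, the symbols above match the standard soft-margin SVM primal problem and the radial basis function (RBF) kernel, written below as an assumed textbook form rather than the authors' exact equations. The index $N$ (number of training samples) and the $1/(2\sigma^{2})$ normalization of the kernel are conventions chosen here for illustration.

$$
\min_{w,\, b,\, \xi} \; \frac{1}{2} \lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_{i}
\quad \text{s.t.} \quad
y_{i} \left( w^{T} \varphi(x_{i}) + b \right) \ge 1 - \xi_{i}, \qquad \xi_{i} \ge 0
$$

$$
K(x_{i}, x_{j}) = \varphi(x_{i})^{T} \varphi(x_{j}) = \exp\left( -\frac{\lVert x_{i} - x_{j} \rVert^{2}}{2 \sigma^{2}} \right)
$$

A larger $C$ penalizes margin violations more strongly, while a smaller $\sigma$ makes the decision boundary more local. Under this convention, an RBF parameter written as $g$ in the form $\exp(-g \lVert x_{i} - x_{j} \rVert^{2})$ corresponds to $g = 1/(2\sigma^{2})$.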

In this paper, the enhanced 1DCNN-SVM network model is designed to arrange the 15 features from the input layer in the form of a 15 × 1 × 1 matrix. The 1DCNN network architecture comprises two convolutional layers, two maximum pooling layers, and a fully connected layer. Incremental convolutional kernel settings and pooling operations are used to handle features of different scales, and batch normalization layers are added to speed up the convergence rate. The specific structure is shown in Table II.

| No. | Layer | Activation function | Dimension |
|---|---|---|---|
| 1 | Input layer | ⋯ | 1 × sample num |
| 2 | Convolutional layer | ReLU | 1 × 16 |
| 3 | Batchnorm | ⋯ | 16 |
| 4 | Maxpool layer | ⋯ | 2 × 1 |
| 5 | Convolutional layer | ReLU | 1 × 32 |
| 6 | Batchnorm | ⋯ | 32 |
| 7 | Maxpool layer | ⋯ | 2 × 1 |
| 8 | Flatten | ⋯ | ⋯ |
| 9 | SVM | ⋯ | ⋯ |
| 10 | Classification layer | ⋯ | ⋯ |


Layer 9 of the 1DCNN-Softmax model is the Softmax layer, and the remaining layers have the same structure as in 1DCNN-SVM. In both structures, the Adam gradient descent algorithm is used to train the convolutional neural network, which accelerates training and improves the accuracy of the model. The structure described above is employed in this paper because the architecture comprising two convolutional layers, two pooling layers, and a fully connected network is commonly used in practical applications. This structure strikes a reasonable balance, enabling gradual learning from local to global features, enhancing feature perception, and avoiding excessive computational costs to a certain extent.

### C. BO-1DCNN-SVM model design

Bayesian optimization (BO)^{29–31} is introduced to tune the hyperparameters of the 1DCNN-SVM model. BO is a very effective global optimization algorithm whose objective is to find the global optimal solution, and it relies mainly on a probabilistic proxy (surrogate) model and an acquisition function. BO fits the real objective function with the proxy model and actively selects the most promising point for evaluation based on the fitting result, avoiding unnecessary sampling. The specific optimization steps are as follows:

Step 1: Maximize the acquisition function to obtain the next evaluation point,

Step 2: Calculate the objective function value of the evaluation point,

Step 3: Integrate the new observations into the historical observation set, update the probabilistic proxy model, and determine whether the termination condition has been reached. If the condition is satisfied, then output; otherwise, return to step 1 and repeat iteration.
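The three steps above can be sketched on a one-dimensional toy problem. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it uses a zero-mean Gaussian process with a squared-exponential kernel as the proxy model and expected improvement as the acquisition function (the paper does not state which acquisition function was used), and `toy_objective` is a hypothetical stand-in for the validation accuracy of the 1DCNN-SVM as a function of a single hyperparameter scaled to [0, 1]. All function names and default values are hypothetical.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential covariance between two sets of 1-D points."""
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of the GP proxy model at x_new."""
    y_mean = y_obs.mean()  # center the observations so a zero-mean prior is reasonable
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(k_oo, k_on)  # K^{-1} k(x_obs, x_new)
    mean = y_mean + solved.T @ (y_obs - y_mean)
    var = 1.0 - np.sum(k_on * solved, axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def bayes_opt(objective, low, high, n_init=3, n_iter=15, seed=0):
    """Maximize `objective` on [low, high] with a fixed iteration budget."""
    rng = np.random.default_rng(seed)
    x_obs = rng.uniform(low, high, n_init)
    y_obs = np.array([objective(x) for x in x_obs])
    grid = np.linspace(low, high, 501)  # candidate evaluation points
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, grid)
        # Step 1: maximize the acquisition function to get the next point.
        x_next = grid[np.argmax(expected_improvement(mean, std, y_obs.max()))]
        # Step 2: evaluate the objective at that point.
        y_next = objective(x_next)
        # Step 3: add the observation; the proxy model is refit on the next pass.
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, y_next)
    best = np.argmax(y_obs)
    return x_obs[best], y_obs[best]

def toy_objective(x):
    """Hypothetical smooth score with its maximum at x = 0.3."""
    return -(x - 0.3) ** 2

x_best, y_best = bayes_opt(toy_objective, 0.0, 1.0)
```

Here the termination condition is simply a fixed number of iterations. For the three hyperparameters tuned in this paper, the same loop would run over a three-dimensional search space, with each objective evaluation being one training run of the 1DCNN-SVM.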

The BO-1DCNN-SVM model designed in this paper has the following main improvements compared with the traditional convolutional neural network:

The SVM classifier is more interpretable than Softmax. SVM provides support vectors and decision boundaries, which make the classification decisions easier to understand. In contrast, the Softmax layer provides only probability distributions, from which the classification decisions are difficult to explain. In particular, SVM performs better than Softmax on small-sample and high-dimensional datasets.

Random selection of the learning rate, batch size, and regularization coefficient may leave the improved 1DCNN-SVM with insufficient capacity for ESP fault diagnosis. Bayesian optimization can find the optimal solution in fewer iterations than traditional grid search or random search methods, thus saving time and computational resources.

### D. Model evaluation indicators
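The precision, recall, and F1-score used in Sec. IV are assumed here to follow their usual per-class definitions: precision = TP/(TP + FP), recall = TP/(TP + FN), and F1 = 2 · precision · recall/(precision + recall), where TP, FP, and FN are the true positives, false positives, and false negatives of a class. A minimal sketch that computes them from a confusion matrix is given below; the function name and the example matrix are hypothetical.

```python
def per_class_metrics(confusion):
    """Return (precision, recall, f1) for every class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(confusion[k]) - tp                       # true k, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics

# Hypothetical two-class confusion matrix.
scores = per_class_metrics([[8, 2], [1, 9]])
```

Averaging each metric over the classes gives macro-averaged values of the kind reported in an "Average" row.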

## IV. EXPERIMENTAL RESULTS AND ANALYSIS

### A. Experimental data

In this paper, the daily production data of pump 1622 are interpolated to fill in missing values and then normalized. A total of 759 data points were recorded for the pump, including 330 points under normal operating conditions, 147 points of the motor overheating fault, 143 points of the insufficient fluid supply fault, and 139 points of the ESP shaft breakage fault. The dataset of the ESP is shown in Table III.

| Label | Condition | Number of samples |
|---|---|---|
| 1 | Normal | 330 |
| 2 | Motor overheating | 147 |
| 3 | Insufficient fluid supply | 143 |
| 4 | Electric pump shaft broken | 139 |


The data of well 1622 were randomly divided into a training set, a test set, and a validation set in the ratio of 4:2:2, where the training set had 400 sets of data, the test set had 179 sets, and the validation set had 180 sets.
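A random split with the set sizes stated above can be sketched as follows. The function name and the fixed seed are assumptions for illustration; the paper does not say how the random division was implemented or whether it was stratified by fault class.

```python
import random

def split_indices(n_samples, n_train, n_test, seed=0):
    """Shuffle sample indices and cut them into train, test, and validation parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    validation = idx[n_train + n_test:]
    return train, test, validation

# 759 samples in total: 400 for training, 179 for testing, the remaining 180 for validation.
train_idx, test_idx, val_idx = split_indices(759, 400, 179)
```

The three index lists are disjoint and together cover every sample exactly once.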

### B. Comparison and results analysis of fault diagnosis model

According to the improved method proposed in this paper, the 759 sets of data provided in Table III were constructed as a 759 × 15 matrix as the input to the 1DCNN-SVM model. The Adam optimizer is used in this experiment, and the activation function is ReLU. Five-fold cross-validation on the dataset gave an SVM penalty coefficient of C = 10 and a radial basis function parameter of g = 0.01.

In this paper, the precision rate, recall rate, and F1-score are used as comprehensive evaluation metrics for fault recognition results. Table IV shows the evaluation index comparing the diagnostic results of the traditional CNN-Softmax model and the improved CNN-SVM model on the testing dataset.

| Number of fault | Precision of traditional CNN | Precision of CNN-SVM | Recall of traditional CNN | Recall of CNN-SVM | F1-score of traditional CNN | F1-score of CNN-SVM |
|---|---|---|---|---|---|---|
| 1 | 0.905 | 0.933 | 0.893 | 0.933 | 0.899 | 0.933 |
| 2 | 0.917 | 0.971 | 0.892 | 0.919 | 0.904 | 0.944 |
| 3 | 0.969 | 0.914 | 0.912 | 0.941 | 0.940 | 0.927 |
| 4 | 0.838 | 0.882 | 0.939 | 0.909 | 0.886 | 0.895 |
| Average | 0.907 | 0.925 | 0.909 | 0.926 | 0.907 | 0.925 |


As can be seen from Table IV, the average precision, average recall, and average F1-score of the improved 1DCNN-SVM model are higher than those of the traditional 1DCNN model. In addition, the total duration of the traditional 1DCNN and the improved 1DCNN-SVM was 51.3908 and 32.2838 s, respectively, so the diagnosis time was reduced by 37.1%. Thus, using SVM for classification did not increase the diagnosis time. The improved 1DCNN-SVM model offers higher accuracy, faster diagnosis, lower time cost, and good robustness.

In this paper, a traditional 1DCNN-Softmax model test and a 1DCNN-SVM model test are performed with the same dataset. In addition, the accuracy curve and loss curve are used to compare the diagnosis results of the two methods.

As seen in Figs. 5 and 6, the traditional 1DCNN-Softmax model reaches an accuracy of 90.50%, while replacing the last Softmax layer of the CNN with the SVM layer, as proposed in this paper, raises the accuracy of the improved 1DCNN-SVM to 92.7%. The accuracy of the improved 1DCNN-SVM model also rises faster than that of the conventional model. In terms of loss, the traditional 1DCNN-Softmax model shows a slower decrease, a higher final loss, and larger fluctuations. The experiments show that the improved 1DCNN-SVM proposed in this paper improves the accuracy and has better diagnostic performance.

### C. Experimental results analysis of BO-1DCNN-SVM

Manually adjusting the hyperparameters of a convolutional neural network to improve its accuracy is time-consuming, may not achieve the expected results, and still depends on manual experience. In this paper, the 1DCNN-SVM network model is therefore optimized using a Bayesian optimization method. The maximum number of iterations is set to 30, and three hyperparameters are designated for optimization; the search returns the best feasible point observed. The accuracy of the proposed BO-1DCNN-SVM is compared with that of BO-1DCNN-Softmax and with that of the 1DCNN-SVM model tuned by the grid search (GS) and particle swarm optimization (PSO) algorithms. The model accuracies for BO-1DCNN-Softmax, GS-1DCNN-SVM, PSO-1DCNN-SVM, and BO-1DCNN-SVM are 92.73%, 89.9%, 91.1%, and 96.64%, respectively, so the method proposed in this paper achieves the highest accuracy. Furthermore, a comparison of computation times with BO-1DCNN-Softmax indicates that the proposed method is more time-efficient. The constraint ranges and search results of the four methods are presented in Table V.

Hyperparameters | Minimum value | Maximum value | BO-1DCNN-Softmax search results | GS-1DCNN-SVM search results | PSO-1DCNN-SVM search results | BO-1DCNN-SVM search results |
---|---|---|---|---|---|---|
Batch size | 40 | 200 | 146 | 200 | 125 | 110 |
Learning rate | 10 × 10^{−3} | 1 | 0.010 637 | 1 | 0.014 543 | 0.002 154 |
Regularization coefficient L_{2} | 10 × 10^{−10} | 10 × 10^{−2} | 4.8532 × 10^{−8} | 0.01 | 5.2746 × 10^{−4} | 4.4895 × 10^{−5} |
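To make the search space of Table V concrete, the sketch below encodes the three ranges and draws candidate configurations from them. It is a plain random-sampling stand-in, not Bayesian optimization and not the authors' code. The bounds are read as 10^-3 to 1 for the learning rate and 10^-10 to 10^-2 for the L2 coefficient; this reading is an assumption, chosen because it places every search result reported in Table V inside its range. Sampling the learning rate and the L2 coefficient on a log scale is also an assumption, and all names are illustrative.

```python
import math
import random

# Assumed reading of the Table V bounds (see the note in the text).
SEARCH_SPACE = {
    "batch_size": (40, 200),          # integer, linear scale
    "learning_rate": (1e-3, 1.0),     # float, log scale (assumption)
    "l2_coefficient": (1e-10, 1e-2),  # float, log scale (assumption)
}


def log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    value = math.exp(rng.uniform(math.log(low), math.log(high)))
    # Clamp to guard against floating-point round-off at the edges.
    return min(max(value, low), high)


def sample_config(rng):
    """Draw one hyperparameter configuration from SEARCH_SPACE."""
    low, high = SEARCH_SPACE["batch_size"]
    return {
        "batch_size": rng.randint(low, high),
        "learning_rate": log_uniform(rng, *SEARCH_SPACE["learning_rate"]),
        "l2_coefficient": log_uniform(rng, *SEARCH_SPACE["l2_coefficient"]),
    }


def in_bounds(config):
    """Check that every value lies inside its range."""
    return all(
        SEARCH_SPACE[name][0] <= value <= SEARCH_SPACE[name][1]
        for name, value in config.items()
    )


# BO-1DCNN-SVM search result reported in Table V.
reported_best = {
    "batch_size": 110,
    "learning_rate": 0.002154,
    "l2_coefficient": 4.4895e-5,
}

rng = random.Random(0)
# 30 draws, matching the iteration budget stated in the text.
candidates = [sample_config(rng) for _ in range(30)]
print(in_bounds(reported_best), all(in_bounds(c) for c in candidates))
```

In a Bayesian optimization loop, the 30 independent draws would be replaced by 30 sequential proposals, each chosen using the results of the earlier evaluations.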


The optimal combination of hyperparameters obtained by Bayesian optimization can be seen in Table V: the batch size is 110, the learning rate is 0.002 154 1, and the regularization factor is 4.4895 × 10^{−5}. Figure 7 compares the observed minimum of the objective function with the predicted minimum as the iterations advance. At the 29th iteration, the two curves converge, indicating that further evaluations are no longer expected to improve the objective function: the maximum of the acquisition function is close to 0, and the global optimum is considered to have been found.
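The stopping argument can be illustrated with expected improvement (EI), a common acquisition function in Bayesian optimization. Whether the authors used EI is an assumption; the sketch only shows why an acquisition value near zero signals that no candidate is expected to beat the current best. Here `mu` and `sigma` stand for the surrogate model's predicted mean and standard deviation at a candidate point, `best` is the lowest objective value observed so far, and the numbers passed in are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """EI for minimization: E[max(best - Y, 0)] with Y ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * normal_cdf(z) + sigma * normal_pdf(z)


# Early in the search: an uncertain prediction close to the current best.
ei_early = expected_improvement(mu=0.10, sigma=0.05, best=0.08)

# Late in the search: a confident prediction no better than the current best.
ei_late = expected_improvement(mu=0.04, sigma=0.001, best=0.035)

print(f"EI early: {ei_early:.3e}")
print(f"EI late:  {ei_late:.3e}")
```

Once the largest EI over the whole search space is as small as the late-stage value here, another evaluation is unlikely to lower the observed minimum, which is the situation described for the 29th iteration.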

The accuracy of the BO-1DCNN-SVM fault diagnosis model on the training and validation sets is shown in Fig. 8. After 200 training cycles, the fault diagnosis accuracy exceeds 80% on both the training set and the validation set, and it continues to increase in subsequent training. The model stabilizes after 600 iterations with a validation accuracy of 98.89%, which indicates that the model has high accuracy and good robustness.

In this paper, we introduce the confusion matrix of the test set, illustrated in Fig. 9. This matrix provides a comprehensive representation of information such as the types of faults that were misclassified and the accuracy rate.

In Fig. 9, the vertical coordinates indicate the actual fault labels and the horizontal coordinates indicate the predicted fault labels. There are four operating conditions: 71 samples of the first type (normal operation), 40 samples of the second type (motor overheating fault), 35 samples of the third type (insufficient fluid supply fault), and 33 samples of the fourth type (ESP partial shaft breakage fault). The entries on the diagonal are the numbers of correctly diagnosed samples of each type. As seen from Fig. 9, three faults in the second category, one fault in the third category, and two faults in the fourth category were diagnosed as normal working conditions. The diagnosis accuracy reached 96.64%, which indicates good diagnostic performance. By comparison, in Fig. 10 the number of correct diagnoses is 166, which corresponds to the 92.73% accuracy of BO-1DCNN-Softmax. Notably, the total elapsed time for BO-1DCNN-SVM is 885.0444 s, while for BO-1DCNN-Softmax it is 1196.5453 s. The diagnosis time with BO-1DCNN-SVM is therefore ∼26.03% shorter, making it the more time-efficient option.
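The counts described for Fig. 9 are enough to rebuild the confusion matrix and recompute the headline accuracy. The sketch below assumes that the misclassifications listed in the text are the only ones, so that all 71 normal samples are classified correctly. The function names are illustrative, and this is not the authors' evaluation code.

```python
# Rows are actual classes, columns are predicted classes, in the order:
# normal, motor overheating, insufficient fluid supply, partial shaft breakage.
confusion = [
    [71, 0, 0, 0],
    [3, 37, 0, 0],
    [1, 0, 34, 0],
    [2, 0, 0, 31],
]


def accuracy(matrix):
    """Share of samples on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall(matrix, k):
    """Fraction of actual class-k samples that are predicted as class k."""
    return matrix[k][k] / sum(matrix[k])


def precision(matrix, k):
    """Fraction of class-k predictions that are actually class k."""
    return matrix[k][k] / sum(row[k] for row in matrix)


total_samples = sum(sum(row) for row in confusion)
acc = accuracy(confusion)

print(f"samples: {total_samples}, accuracy: {acc:.4f}")
for k in range(len(confusion)):
    print(f"class {k + 1}: precision {precision(confusion, k):.3f}, "
          f"recall {recall(confusion, k):.3f}")
```

Because every error in this reconstruction is a fault predicted as normal, the normal class has a recall of 1 but a precision of only 71/77, while the three fault classes have a precision of 1 and a recall below 1. In practice this means the remaining errors are missed faults rather than false alarms.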

Figures 11 and 12 show the confusion matrices of the improved 1DCNN-SVM network tuned with the grid search and particle swarm algorithms, respectively.

As evident from the diagonals, GS-1DCNN-SVM and PSO-1DCNN-SVM correctly classified 161 and 163 of the 179 test samples, respectively. Among the four optimized models, the Bayesian-optimized 1DCNN-SVM model proposed in this paper achieved the highest accuracy.

To evaluate the performance of the proposed model, several other models are compared with it on the same amount of data, using fault diagnosis accuracy as the evaluation criterion. The experimental results are shown in Fig. 13.

Classical deep learning models, such as fully connected neural networks (FCNNs), Long Short-Term Memory (LSTM) networks, and CNN, are compared with SVM, 1DCNN-SVM, BO-SVM, and the proposed method in this paper. As shown in Fig. 13, SVM and BO-SVM exhibit lower accuracy on the test set, averaging only 62.15% and 63.96%, respectively. BPNN, LSTM, and CNN achieve average accuracies of 80.98%, 87.01%, and 90.78% on the test set, respectively. 1DCNN-SVM attains an average accuracy of 92.18% on the test set. The BO-1DCNN-SVM method proposed in this paper achieves the highest accuracy of all, surpassing 95% on the test set. This results in improved fault diagnosis with reduced misclassification and enhanced stability, making the method more suitable for fault diagnosis of electric submersible pump systems on offshore platforms.

## V. CONCLUSIONS

In this paper, a novel ESP fault diagnosis approach based on BO-1DCNN-SVM is proposed. First, the original data are preprocessed using linear interpolation to address missing and invalid data in the dataset. The CNN-SVM model is then created by substituting an SVM classifier for the conventional 1DCNN Softmax layer, which combines the automatic feature extraction capability of the CNN with the strength of the SVM in small-sample settings. Bayesian optimization is then used to tune the CNN-SVM hyperparameters, increasing the model's predictive accuracy. The experimental findings demonstrate that the proposed method can accurately identify the failure state of ESP systems: the average prediction accuracy of the BO-1DCNN-SVM model is greater than 95%, and the diagnosis time is shorter, which reduces the time cost.

## ACKNOWLEDGMENTS

This work was supported by the National Key R&D Program of China (Grant No. 2019YFC0312303-05).

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.
