Conventional intelligent fault diagnosis for electric submersible pumps (ESPs) depends heavily on manual expertise for feature extraction, while a conventional convolutional neural network (CNN) has a large number of parameters and requires a substantial volume of training samples. This paper proposes an ESP fault diagnosis algorithm based on a Bayesian optimization-one-dimensional convolutional neural network-support vector machine (BO-1DCNN-SVM), which combines a 1DCNN with an SVM and uses Bayesian optimization (BO) to tune the improved 1DCNN. First, the method uses the automatic feature extraction capability of the 1DCNN to remove the reliance of traditional diagnosis methods on manually extracted features. The Softmax layer at the end of the CNN is replaced by an SVM so that small-sample data are handled effectively, which improves the accuracy and generalization ability of ESP fault classification. Then, Bayesian optimization is used to find the optimal combination of hyperparameters for the improved 1DCNN-SVM model, further improving the prediction performance of the fault diagnosis model. Finally, the experiments yield a classification accuracy of 96.64%, which is 5% higher than existing CNN approaches on data samples of similar scale. The proposed method also proved to be highly accurate and robust in fault diagnosis.

As core equipment in oil production, the electric submersible pump (ESP) system is mainly composed of an electric motor, a protector, a gas–liquid separator, a multi-stage centrifugal pump, a power cable, a transformer, a pressure sensor, and other components.1 The tubing and the pump are placed together in the well, the built-in motor is connected to the surface power supply through a special cable, and the motor drives the centrifugal pump to generate centrifugal force, which lifts the crude oil from the well to the surface.2 ESPs are prone to failure because they operate in complicated and changeable geological structures and well conditions and because their individual components are strongly coupled. Under high pressure, the sealing performance of the pump is degraded, leading to leakage and mechanical wear. Under high temperature, electric motors are susceptible to thermal damage, so a cooling circulation system is needed to keep them running stably. When the state of the medium changes, the resulting change in gas content affects pump cavitation. Once a failure occurs, it can lead to a shutdown for well repair operations, which is time-consuming and expensive.3 Effective fault diagnosis of an ESP helps production managers identify faults quickly and accurately, reduces equipment maintenance time, makes maintenance more effective, delays equipment obsolescence, prevents cascading failures, and improves the safety and durability of the entire system. Therefore, research on ESP fault diagnosis is of great significance.

At present, ESP fault diagnosis technology is mainly divided into traditional and modern methods. Traditional ESP fault diagnosis mainly includes current card diagnosis and holding pressure diagnosis. Zhang et al.4 identified the fault type of an electric submersible pump from the fault characteristics of the current card under different working conditions and analyzed in detail how the typical characteristics of the current card are formed. Experts or technicians diagnose ESP faults manually, based on long-accumulated experience with current cards, which can lead to judgment errors that compromise real-time performance and accuracy. The holding pressure method quickly closes the production valve at the wellhead during normal operation and shutdown of the ESP system, monitors the oil pressure changes at the wellhead, and plots the holding pressure curve. The fault diagnosis analysis is then carried out according to the characteristic changes reflected in the curve. The holding pressure method covers only a few fault types, and its diagnostic accuracy is poor. With advances in machine learning and computational resources, modern ESP fault diagnosis has also developed rapidly. At present, the three most commonly used methods are the neural network, fuzzy mathematics, and fault tree analysis.5 Among them, the backpropagation (BP) neural network is the most frequently used in fault diagnosis, and many experts and scholars have studied the application of neural networks to current card identification,6 which avoids the differences in fault identification caused by the individual experience of whoever performs the manual identification.
The fuzzy mathematical diagnosis technique takes fuzzy mathematics as its theoretical basis, constructs the fuzzy relationship matrix of fault conditions and symptoms, and finds the set of fault degrees according to the set of membership degrees of the fuzzy matrix.7 Fault tree analysis is one of the essential analysis methods in system safety engineering;8 it represents the logical relationship between the possible fault conditions of ESP systems and their causes in a tree structure.9 In recent years, the rapid development of machine learning theory and technology has brought new ideas to ESP fault diagnosis. The increasing number of ESP sensors allows downhole data to be recorded in real time at fixed intervals, making it easier to monitor production conditions and providing a database for ESP multi-parameter diagnostics. Deep learning algorithms have also emerged that turn large amounts of raw data into high-dimensional, more abstract features through nonlinear mapping to improve data differentiation.10,11 Deep learning is now widely used for troubleshooting mechanical equipment. Its special network structure and ability to handle complex recognition tasks have attracted many researchers to study its theory and application. For example, Thomas et al.12 submitted a training set to an automatic model-free learning system based on Bayesian belief networks and compared it with a reference support vector machine classifier; experiments were presented for three different condition classes, using sophisticated statistical evaluation methodologies to measure the classifier performance. Carlos et al.13 used a neural network to recognize ESP airlock events. Castellanos et al.14 used a decision tree to classify different faults by processing the raw signal characteristics of the ESP.
Yang et al.15 proposed an ESP fault recognition method that combines unsupervised feature extraction with transfer learning. Chen et al.16 provided an ESP fault prediction and classification method that combines backpropagation neural networks with manual feature extraction. Yang et al.17 proposed an operational fault diagnosis method for electric submersible pumps based on the fusion of SPC rules and a priori knowledge, in response to the spread of ESP real-time monitoring parameter acquisition systems. Yang et al.18 combined Principal Component Analysis (PCA) with the marginal distance method to diagnose tubing leakage in electric submersible pumps. Peng et al.19 evaluated PCA as an unsupervised machine learning technique to detect the causes of broken shafts in electric submersible pumps.

These methods play a role in ESP fault diagnosis, but still some areas need to be improved:

  1. The classifiers employed in previous studies may lead to overfitting and poor generalization ability of the diagnostic model. This can make it difficult to distinguish some similar classes.

  2. In ESP fault diagnosis algorithms based on deep learning, the hyperparameters are set manually. They are selected by trial and error: model performance is assessed, and the parameters are gradually refined. Manual hyperparameter tuning takes much effort, and the results may not be as expected.

To address the above deficiencies, this paper proposes an ESP fault diagnosis algorithm with automatic hyperparameter optimization. The algorithm combines a CNN with an SVM and uses Bayesian optimization to tune the hyperparameters of the model, reducing the workload and time of hyperparameter adjustment. The main contributions are as follows:

  1. A 1DCNN-SVM hybrid model is introduced for ESP fault diagnosis. Because the ESP sample data have many features, a Batch Normalization (BN) layer is added inside the CNN so that each feature has a similar distribution within a batch, which shortens the fault diagnosis time. The CNN performs adaptive feature extraction on the fault data, and the Softmax layer of the conventional CNN is replaced with an SVM classifier that classifies the output of the fully connected layer. To further improve the SVM classification performance, five-fold cross-validation is applied to the ESP data to select the SVM parameters. The experimental results show that the classification accuracy of the proposed 1DCNN-SVM model is 2.2% higher than that of the traditional CNN, and the diagnosis time is reduced by 37.1%.

  2. Bayesian optimization is used to improve the 1DCNN-SVM fault diagnosis model. It tunes hyperparameters such as the learning rate and batch size to find the configuration with the best predictive performance, saving the significant time that manual parameter tuning would otherwise require. This paper also compares the accuracy of the 1DCNN-SVM model when its hyperparameters are tuned by grid search (GS), particle swarm optimization (PSO), and Bayesian optimization (BO) in turn and concludes that the BO-1DCNN-SVM model is the most accurate of the three.

The rest of this paper is organized as follows. Section II examines the data source of the electric submersible pump and preprocesses the raw data: duplicates are removed, missing data are filled by linear interpolation, and the data are normalized. Section III introduces the structure of the CNN model and the improved 1DCNN-SVM diagnostic model used in this paper, as well as the principle of Bayesian optimization. Section IV compares and analyzes the experimental results. First, it compares the evaluation metrics of the traditional 1DCNN-Softmax model with those of the improved 1DCNN-SVM model. Second, the improved 1DCNN-SVM model is optimized using Bayesian optimization to find the optimal hyperparameter combination. Finally, BO-1DCNN-SVM is compared with other existing techniques, including GS-1DCNN-SVM and PSO-1DCNN-SVM. The experimental results demonstrate that the proposed model achieves the highest accuracy, with classification accuracies exceeding 90% for the normal operating condition and all three fault conditions. This validates the practicality of the proposed method.

The data were obtained from the China National Offshore Oil Platform 119 System Development Production Database, which contains large-scale production data from different ESP sensors. The data record normal conditions and three different categories of abnormal operating conditions in ESP operation. All data are recorded once a day and are described by 15 production variables, as shown in Table I.

TABLE I. Sample variables.

| Variables | Unit |
| --- | --- |
| Test fluid production | m³/day |
| Test water production | m³/day |
| Test oil production | m³/day |
| Test gas production | 10⁴ m³/day |
| Water content | |
| Wellhead pressure | MPa |
| Wellhead temperature | °C |
| Sleeve pressure | MPa |
| Pump frequency | Hz |
| Pump current | |
| Pump voltage | |
| Pump inlet pressure | MPa |
| Pump outlet pressure | MPa |
| Pump inlet temperature | °C |
| Pump motor temperature | °C |

The following are the main reasons for the occurrence of three fault conditions in ESP systems:

  1. Pump shaft breakage: the impeller or separator is blocked, the casing is deformed, or the pump is seriously stuck, so that excessive torque breaks the pump shaft.

  2. Motor overheating: caused by frequent starting within a short period of time, high viscosity of well fluid, high specific gravity, improper pump selection, current and voltage imbalance, contaminated or missing motor oil, insufficient fluid supply, long-term overload, underload operation, etc.

  3. Insufficient fluid supply from the formation: poor fluid supply capacity of the formation, followed by low pump hanging depth and oversized pump selection.

During ESP operation, these undesired operating conditions may interact or occur simultaneously. Early intervention is necessary to prevent temporary well shutdowns. Therefore, our aim is to predict the occurrence and type of failure in advance and to reduce the substantial economic losses associated with fault shutdowns for maintenance.

Data collected in the industrial field usually suffer from one or more of data duplication, missing data, and data anomalies. In the daily production data recorded by the ESP sensors, some variables are measured several times a day, which produces repeated records. These would affect the subsequent experiments, so duplicate data must be removed: when two or more records exist for a day, only the latest record of that day is kept. Environmental factors and professional knowledge must be considered when handling outliers and missing values of an ESP. During ESP production, outliers may be generated by factors such as fault shutdown, well shutdown, and typhoons. On such days the production time of the ESP is less than 24 h, and these abnormal samples must be removed. Missing data can be filled by linear interpolation, quadratic spline, cubic spline, or a nearest neighbor algorithm. In this paper, linear interpolation is used to fill in the missing data. Linear interpolation is a widespread numerical method for estimating values between two known data points, and it is simpler and more convenient than the other methods.
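The duplicate-removal rule described above can be sketched as follows. This is a minimal illustration only: the `(timestamp, row)` record layout, the function name `keep_latest_per_day`, and the sample values are hypothetical and do not come from the production database.

```python
from datetime import datetime

def keep_latest_per_day(records):
    """Keep only the most recent record for each calendar day.

    `records` is an iterable of (timestamp, row) pairs; this layout is a
    hypothetical stand-in for the real database export.
    """
    latest = {}
    for timestamp, row in records:
        day = timestamp.date()
        # Overwrite the stored record when a later timestamp appears for that day.
        if day not in latest or timestamp > latest[day][0]:
            latest[day] = (timestamp, row)
    # Return the surviving records in chronological order.
    return [latest[day] for day in sorted(latest)]

# Hypothetical example: two measurements on 18 December, one on 19 December.
records = [
    (datetime(2021, 12, 18, 8, 0), {"pump_current": 41.0}),
    (datetime(2021, 12, 18, 16, 0), {"pump_current": 42.5}),
    (datetime(2021, 12, 19, 9, 0), {"pump_current": 43.0}),
]
cleaned = keep_latest_per_day(records)
```

Samples with a production time below 24 h would be filtered in a similar pass, given a column that records the daily production time.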

The records of the ESP sensors are not available every day and are, to some extent, irregular. Gaps of two or three days, or even five or six days, can occur. Missing values are estimated from the adjacent data of the same feature. Taking well 1621 as an example, most of the feature data for this well were not recorded for the four days from December 19, 2021, to December 22, 2021, so the data could not be used directly for subsequent fault prediction. Linear interpolation was used to estimate the values for these four days, and Fig. 1 shows the linear interpolation of feature data such as wellhead temperature, pump motor temperature, pump inlet pressure, and pump outlet pressure.

FIG. 1.

Linear interpolation of well 1621 parameters.

Missing data in daily oil and gas well production are restored using linear interpolation, preserving data integrity and aiding subsequent experiments.
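A minimal sketch of the gap filling, assuming one value per day with `NaN` marking an unrecorded day. The function name `fill_missing_linear` and the temperature values are illustrative assumptions, not data from well 1621.

```python
import numpy as np

def fill_missing_linear(values):
    """Fill NaN entries of a daily series by linear interpolation."""
    values = np.asarray(values, dtype=float)
    days = np.arange(values.size)
    known = ~np.isnan(values)
    filled = values.copy()
    # np.interp draws a straight line between the nearest known neighbours.
    filled[~known] = np.interp(days[~known], days[known], values[known])
    return filled

# Hypothetical wellhead temperatures with a four-day gap.
wellhead_temp = [60.0, np.nan, np.nan, np.nan, np.nan, 65.0]
filled = fill_missing_linear(wellhead_temp)
```

Note that `np.interp` holds the first or last known value constant for gaps at the very start or end of a series, so leading and trailing gaps are not extrapolated.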

The daily production data of an ESP reflect the production status of the well and can be used to monitor the performance of the ESP. The variables differ greatly in magnitude, which directly affects the construction of the ESP fault diagnosis model, adds to the algorithmic complexity, and makes it harder for the model to reach its optimal solution quickly. Therefore, this study employs max–min normalization to scale each variable into the [0,1] interval, simplifying the processing. Figure 2 shows the trends of the production parameters for well 1621 under different working conditions, where the data are collected continuously. The parameters differ significantly between working conditions.
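The max–min scaling can be sketched as below, applied column by column so that each variable is mapped to [0,1] independently. The function name and the sample matrix are assumptions for illustration; the guard for constant columns is an added safeguard that the text does not mention.

```python
import numpy as np

def min_max_scale(X):
    """Scale every column of X into [0, 1] using its own minimum and maximum."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Added safeguard: a constant column would otherwise divide by zero.
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range

# Hypothetical samples: column 0 in degrees Celsius, column 1 in MPa.
X = np.array([[50.0, 0.5],
              [60.0, 1.5],
              [70.0, 2.5]])
X_scaled = min_max_scale(X)
```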

FIG. 2.

Trend of partial ESP production parameters (normalized) for the overall failure.

Our goal is to obtain accurate prediction results before the failure phase occurs. Since wells are usually shut down for servicing and maintenance immediately after a failure, we categorize normal and failure sample data. The type of failure that occurs can be accurately predicted before the failure occurs based on the trend of each parameter. This allows for advanced fault prediction in the case of abnormal changes in data trends, giving workers enough time to take preventive measures.

The flow chart of the ESP fault prediction method based on BO-1DCNN-SVM is shown in Fig. 3. There are four parts: analysis of daily output data of ESP, Bayesian optimization algorithm, convolutional neural network classification prediction, and fault diagnosis model evaluation.

FIG. 3.

Flow chart of the ESP fault prediction model.

The steps of fault diagnosis for ESP are as follows:

  1. Daily production data analysis: The daily production data of offshore oil, gas, and water wells are collected. Each record contains 15 features, such as pump current, pump voltage, pump inlet temperature, pump outlet pressure, and pump motor temperature. Taking well 1621 as an example, the missing data are interpolated, the daily production data are normalized, and the data are divided into a training set and a test set.

  2. Bayesian optimization algorithm: The search ranges of the batch size, learning rate, and L2-regularization coefficient are bounded. A Gaussian process model is built to estimate the probability distribution of the objective function, the next sampling point is selected according to this distribution, and the Gaussian process model is updated with each new sample so that it gradually approaches the optimum of the function. A better combination of hyperparameters is thus found with fewer samples, saving training time and computational resources.

  3. Fault detection: The 15 features are arranged as a 15 × 1 × 1 input, and the convolutional layers extract features automatically, without relying on manual experience. The maximum pooling layer then keeps the most relevant features for dimensionality reduction, and the fully connected layer maps the features to the output. The output of the fully connected layer is used as the input of the SVM for classification, which is more robust and more effective.

  4. Evaluation of detection results: Precision, recall, and F1-score are used to assess the prediction results of the traditional 1DCNN-Softmax model and the improved 1DCNN-SVM model. The confusion matrix of BO-1DCNN-SVM is obtained, and the accuracy of different fault diagnosis models is compared, which shows that the BO-1DCNN-SVM fault diagnosis model proposed in this paper has the best accuracy.

A convolutional neural network contains a feature extractor consisting of a convolutional layer and a pooling layer, which draws out the deep sample information layer by layer. The model structure of CNN usually includes an input layer, a convolutional layer, an activation layer, a pooling layer, and an output layer (fully connected layer and Softmax layer),20–22 and Fig. 4 shows the structure of a convolutional neural network.

FIG. 4.

The structure of CNN.

The convolutional layer is the core of the convolutional neural network and extracts local features of the input data through convolutional kernels. Each kernel slides over the input feature maps according to the step size and performs a local convolution at every position; once the kernel has traversed the input, the corresponding output feature map is produced. Because of the parameter sharing mechanism, each convolution kernel outputs one feature map, so the number of convolution kernels is the depth of the output feature map. The specific formula for the convolution operation is as follows:
$$Z^{l} = \sum_{i=1}^{c^{l-1}} w_{i,c}^{l} * x_{i}^{l-1} + b_{i}^{l}, \tag{1}$$
where $Z^{l}$ is the output of the $l$th layer, $x_{i}^{l-1}$ is the output of the $i$th channel of the $(l-1)$th layer, $c^{l-1}$ is the number of channels of the $(l-1)$th layer, $w_{i,c}^{l}$ is the weight matrix of the convolution kernel of the $l$th layer, $b_{i}^{l}$ is the bias term, and $*$ is the convolution operation.
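As a rough illustration of Eq. (1), the sketch below computes a 1D convolutional layer by summing the per-channel products over the input channels and adding a bias. It follows the usual deep learning convention of sliding the kernel without flipping it (cross-correlation). The kernel size of 3, the absence of padding, and all array values are assumptions, since the text does not state them.

```python
import numpy as np

def conv1d_forward(x, w, b, stride=1):
    """Forward pass of a 1D convolutional layer.

    x: (c_in, length) input feature maps
    w: (c_out, c_in, k) kernels, one per output channel
    b: (c_out,) biases
    Returns an array of shape (c_out, out_len).
    """
    c_in, length = x.shape
    c_out, _, k = w.shape
    out_len = (length - k) // stride + 1
    z = np.zeros((c_out, out_len))
    for o in range(c_out):
        for t in range(out_len):
            window = x[:, t * stride : t * stride + k]
            # Sum over input channels and kernel taps, then add the bias.
            z[o, t] = np.sum(w[o] * window) + b[o]
    return z

# Hypothetical input: one channel holding 15 feature values.
x = np.arange(15, dtype=float).reshape(1, 15)
w = np.ones((16, 1, 3))   # 16 kernels of assumed size 3
b = np.zeros(16)
z = conv1d_forward(x, w, b)
```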
The activation function is usually applied after the convolutional layer to transform the output feature maps nonlinearly, which improves the nonlinear representation ability of the network and helps prevent gradients from exploding or vanishing during training. The ReLU function is currently a commonly used activation function for the convolutional layers of a CNN, and its mathematical expression is as follows:
$$y^{l} = f_{\mathrm{ReLU}}(x^{l}) = \max(0, x^{l}), \tag{2}$$
where $x^{l}$ is the output feature map obtained from the convolution operation and $y^{l}$ is the output value of $x^{l}$ after the ReLU activation.
Pooling is also called down sampling, as opposed to up sampling. The feature map obtained from convolution generally passes through a pooling layer, which reduces the amount of data while highlighting informative features, thus reducing the number of parameters and helping to avoid over-fitting. The two most commonly used variants are average pooling and maximum pooling. This paper uses maximum pooling, which outputs the maximum value of the input feature map within the receptive field of the pooling kernel. Its mathematical expression is
$$p_{i}^{l+1} = \max_{(j-1)S+1 \le t \le jS}\left\{q_{i}^{l}(t)\right\}, \tag{3}$$
where $p_{i}^{l+1}$ is the output value of the $i$th channel of the $(l+1)$th layer, $S$ is the size of the pooling kernel, $j$ is the step index of the pooling window, and $q_{i}^{l}(t)$ is the output value of the $t$th neuron of the $i$th channel in layer $l$.
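Equations (2) and (3) can be sketched together as follows. The pooling function assumes non-overlapping windows, that is, a stride equal to the kernel size S, and drops any trailing values that do not fill a whole window; both choices are assumptions for illustration.

```python
import numpy as np

def relu(x):
    """Eq. (2): element-wise max(0, x)."""
    return np.maximum(0.0, x)

def max_pool1d(q, size):
    """Eq. (3) with non-overlapping windows of length `size`.

    q: (channels, length) feature maps. Trailing values that do not fill
    a complete window are dropped.
    """
    channels, length = q.shape
    n_windows = length // size
    trimmed = q[:, : n_windows * size]
    # Group each channel into windows and keep the maximum of each window.
    return trimmed.reshape(channels, n_windows, size).max(axis=2)

# Hypothetical single-channel feature map.
q = np.array([[-1.0, 2.0, 3.0, -4.0, 5.0]])
activated = relu(q)
pooled = max_pool1d(activated, size=2)
```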
The fully connected layer is generally located in the last part of the network and serves to integrate and extract again the features extracted by the alternating convolutional and pooling layers. The expression for the fully connected layer is as follows:
$$Z^{k} = f\left((w^{k})^{T} x^{k-1} + b^{k}\right), \tag{4}$$
where $Z^{k}$ is the output of the $k$th layer, $w^{k}$ is the weight, $x^{k-1}$ is the input value of the $k$th layer, $b^{k}$ is the bias term, and $f(\cdot)$ is the ReLU activation function.
For classification problems, the activation function at the last level is often chosen to be a “Softmax” classifier, and the Softmax function is given by the following equation:
$$p(x_{i}) = \frac{e^{x_{i}}}{\sum_{k=1}^{C} e^{x_{k}}}, \tag{5}$$
where $p(x_{i})$ is the probabilistic output of the output layer neuron, $x_{i}$ is the activation value of the $i$th neuron in the output layer, and $C$ is the number of categories.
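A short sketch of Eqs. (4) and (5). Subtracting the maximum before exponentiating is a standard numerical-stability step that leaves the Softmax result unchanged; it is an addition here rather than something the text describes. The weights and inputs are made-up values.

```python
import numpy as np

def dense_relu(x, w, b):
    """Eq. (4): fully connected layer followed by ReLU."""
    return np.maximum(0.0, w.T @ x + b)

def softmax(x):
    """Eq. (5): turn activations into class probabilities."""
    shifted = x - np.max(x)   # stability shift, does not change the result
    e = np.exp(shifted)
    return e / e.sum()

# Hypothetical two-input, two-output layer.
x = np.array([1.0, 2.0])
w = np.array([[1.0, -1.0],
              [1.0, -1.0]])
b = np.zeros(2)
hidden = dense_relu(x, w, b)

# Four equal activations give four equal probabilities.
probs = softmax(np.array([1.0, 1.0, 1.0, 1.0]))
```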
To measure how good a model is during training, a function is used to measure the difference between the predicted and true values of the model; it is called the loss function or cost function. This paper uses the cross-entropy loss function, the most commonly used loss function for classification tasks. Cross-entropy is a concept from information theory that measures the difference between two probability distributions. The cross-entropy loss function is also known as the Softmax loss function, and the expression is
$$L_{\mathrm{crossEntropy}} = -\frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{n} Y_{ij}\ln X_{ij}, \tag{6}$$
where $m$ denotes the number of input samples, $n$ denotes the number of categories of samples, $Y_{ij}$ is the probability distribution of the true labels corresponding to the samples, and $X_{ij}$ is the probability distribution of the sample recognition results. The CNN parameters are tuned using an optimizer that minimizes the loss function, and the Adam optimization algorithm is used in this paper.
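The double sum in the cross-entropy loss can be sketched as below. Two details are assumptions rather than statements from the text: the sum is averaged over the number of samples, which is the common convention, and a small `eps` is added inside the logarithm to avoid `log(0)`. The labels and predictions are invented.

```python
import numpy as np

def cross_entropy(Y, X, eps=1e-12):
    """Cross-entropy between one-hot labels Y and predicted probabilities X.

    Y, X: (samples, classes). The result is averaged over samples here,
    which is an assumed normalisation.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    return -np.sum(Y * np.log(X + eps)) / Y.shape[0]

# Hypothetical batch of two samples and four classes.
Y = [[1, 0, 0, 0],
     [0, 1, 0, 0]]
X = [[0.70, 0.10, 0.10, 0.10],
     [0.25, 0.25, 0.25, 0.25]]
loss = cross_entropy(Y, X)
```

Only the probability assigned to the true class of each sample contributes, so a confident correct prediction (0.70) costs less than an undecided one (0.25).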
The Adam algorithm is one of the most commonly used optimization algorithms in machine learning and serves as an alternative to traditional first-order stochastic gradient descent. Adam incorporates the idea of moment estimation and adjusts the learning rate dynamically by calculating and correcting the first-order and second-order moments of the gradient in each round. With Adam, the parameter change is approximately bounded by the learning rate $\alpha$, that is, $|\Delta\theta_{t,k}| \lesssim \alpha$, where $\Delta\theta_{t,k}$ is the $k$th component of $\Delta\theta_{t}$. In practice, the number of updates required to approach the optimal solution for a parameter can be roughly inferred from the order of magnitude of $\alpha$. The estimates $m_{t}$ and $u_{t}$ of the first-order and second-order moments of the gradient $g_{t}$ are calculated as follows:
$$m_{t} = \beta_{1} m_{t-1} + (1 - \beta_{1}) g_{t}, \tag{7}$$
$$u_{t} = \beta_{2} u_{t-1} + (1 - \beta_{2}) g_{t}^{2}, \tag{8}$$
where $\beta_{1}, \beta_{2} \in [0, 1)$ are the decay constants, with default settings $\beta_{1} = 0.9$ and $\beta_{2} = 0.999$. Both $m_{0}$ and $u_{0}$ are initialized as $d$-dimensional zero vectors, so the resulting bias in $m_{t}$ and $u_{t}$ is corrected, respectively, by
$$\hat{m}_{t} = \frac{m_{t}}{1 - \beta_{1}^{t}}, \tag{9}$$
$$\hat{u}_{t} = \frac{u_{t}}{1 - \beta_{2}^{t}}. \tag{10}$$
Adam's parameter update equation is as follows:
$$\theta_{t+1} = \theta_{t} - \frac{\alpha}{\sqrt{\hat{u}_{t}}}\,\hat{m}_{t}. \tag{11}$$
Adam has the advantage of being good at dealing with non-smooth targets and is suitable for high-dimensional spaces.
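A single Adam update following Eqs. (7)–(11) can be sketched as below. The small `eps` in the denominator is the usual safeguard against division by zero and is an addition to the update rule as written; the quadratic objective and the learning rate of 0.1 are illustrative assumptions.

```python
import numpy as np

def adam_step(theta, grad, m, u, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based iteration counter."""
    m = beta1 * m + (1 - beta1) * grad          # Eq. (7)
    u = beta2 * u + (1 - beta2) * grad ** 2     # Eq. (8)
    m_hat = m / (1 - beta1 ** t)                # Eq. (9)
    u_hat = u / (1 - beta2 ** t)                # Eq. (10)
    # Eq. (11), with eps added for numerical safety.
    theta = theta - alpha * m_hat / (np.sqrt(u_hat) + eps)
    return theta, m, u

# Hypothetical objective f(theta) = theta^2 with gradient 2 * theta.
theta = np.array([1.0])
m = np.zeros(1)
u = np.zeros(1)
theta_after_first, m, u = adam_step(theta, 2 * theta, m, u, t=1, alpha=0.1)

theta_run = theta_after_first
for t in range(2, 201):
    theta_run, m, u = adam_step(theta_run, 2 * theta_run, m, u, t=t, alpha=0.1)
```

On the first step the bias-corrected ratio equals the sign of the gradient, so the parameter moves by almost exactly the learning rate, which illustrates the bound on the parameter change discussed above.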
In the past decade, the support vector machine (SVM)23–26 classification architecture has been widely used in many different fields, including machine fault diagnosis, and is now considered one of the most powerful methods for solving multi-class classification problems in machine learning. Compared with the traditional 1DCNN that uses Softmax for classification, SVM has better robustness and classification performance under small-sample conditions.27,28 Its basic model is a linear classifier with the maximum margin defined on the feature space. The maximum margin distinguishes it from a perceptron: its decision boundary is the maximum-margin hyperplane solved for the training samples. When the dataset is not linearly separable, a soft margin can be introduced to solve the problem, after which the fault classification prediction is completed. Its mathematical expression is as follows:
$$\min \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{N}\xi_{i} \quad \text{s.t.} \quad y_{i}\left(w^{T}\varphi(x_{i}) + b\right) \ge 1 - \xi_{i}, \tag{12}$$
where $x_{i} \in R^{n}$ is the input data, $y_{i} \in \{-1, +1\}$ is the corresponding class label, $\xi_{i}$ is the relaxation factor, $C$ is the penalty factor, $b$ is the bias, $w$ is the parameter to be optimized, and $\varphi(x_{i})$ is a nonlinear function that maps $x_{i}$ to a high-dimensional feature space.
The kernel function used in this paper is the radial basis kernel function (RBF), and its functional expression is
$$k(X_{1}, X_{2}) = \exp\left(-\frac{\|X_{1} - X_{2}\|^{2}}{2\sigma^{2}}\right), \tag{13}$$
where $\sigma$ is the parameter of the kernel function.
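The RBF kernel of Eq. (13) can be evaluated for whole batches of feature vectors as sketched below. The function name, the test points, and the value of `sigma` are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma):
    """Eq. (13) for every pair of rows in X1 and X2."""
    X1 = np.atleast_2d(np.asarray(X1, dtype=float))
    X2 = np.atleast_2d(np.asarray(X2, dtype=float))
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dist = (
        np.sum(X1 ** 2, axis=1)[:, None]
        + np.sum(X2 ** 2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)   # clip tiny negative rounding errors
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# Two hypothetical points at Euclidean distance 5.
points = [[0.0, 0.0], [3.0, 4.0]]
K = rbf_kernel(points, points, sigma=5.0)
```

Many SVM libraries parametrize the same kernel as exp(-gamma * ||X1 - X2||^2), which corresponds to gamma = 1 / (2 * sigma^2). Whether a given library parameter matches the σ used here is an assumption to check against that library's documentation.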

In this paper, the enhanced 1DCNN-SVM network model is designed to arrange the 15 features from the input layer in the form of a 15 × 1 × 1 matrix. The 1DCNN network architecture comprises two convolutional layers, two maximum pooling layers, and a fully connected layer. Incremental convolutional kernel settings and pooling operations are used to handle features of different scales, and batch normalization layers are added to speed up the convergence rate. The specific structure is shown in Table II.

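TABLE II. The specific structure of 1DCNN-SVM.

| No. | Layer | Activation function | Dimension |
| --- | --- | --- | --- |
| 1 | Input layer | ⋯ | 1 × sample num |
| 2 | Convolutional layer | ReLU | 1 × 16 |
| 3 | Batchnorm | ⋯ | 16 |
| 4 | Maxpool layer | ⋯ | 2 × 1 |
| 5 | Convolutional layer | ReLU | 1 × 32 |
| 6 | Batchnorm | ⋯ | 32 |
| 7 | Maxpool layer | ⋯ | 2 × 1 |
| 8 | Flatten | ⋯ | ⋯ |
| 9 | SVM | ⋯ | ⋯ |
| 10 | Classification layer | ⋯ | ⋯ |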

In 1DCNN-Softmax, layer number 9 is the Softmax layer, and the rest of the layers have the same structure as in 1DCNN-SVM. The Adam optimizer is used to train the convolutional neural network in both structures, which accelerates training and improves the accuracy of the model. This structure is employed because an architecture of two convolutional layers, two pooling layers, and a fully connected network is commonly used in practical applications. It strikes a reasonable balance: it learns gradually from local to global features, enhances feature perception, and avoids excessive computational cost to a certain extent.
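To make the layer sequence concrete, the sketch below traces how the length of a 15-feature input changes through two convolution and pooling stages with 16 and 32 kernels. The kernel size of 3 and the padding of 1 are assumptions, since the text gives only the number of kernels and the 2 × 1 pooling size, so the resulting lengths are illustrative rather than the authors' actual dimensions.

```python
def conv_out_len(length, kernel, stride=1, padding=0):
    """Output length of a 1D convolution."""
    return (length + 2 * padding - kernel) // stride + 1

def pool_out_len(length, size):
    """Output length of non-overlapping pooling."""
    return length // size

def trace_shapes(n_features=15, kernel=3, padding=1):
    """Return (layer name, channels, length) for each stage."""
    shapes = [("input", 1, n_features)]
    channels, length = 1, n_features
    for filters in (16, 32):
        length = conv_out_len(length, kernel, padding=padding)
        channels = filters
        shapes.append((f"conv{filters}+batchnorm", channels, length))
        length = pool_out_len(length, 2)
        shapes.append(("maxpool", channels, length))
    # Flatten all channels into one feature vector for the classifier.
    shapes.append(("flatten", 1, channels * length))
    return shapes

shapes = trace_shapes()
```

Under these assumptions the flattened vector passed on to the classifier has 96 entries (32 channels of length 3).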

Convolutional neural network parameters are generally divided into two major categories, model parameters and model hyperparameters. The hyperparameters can only be determined heuristically, yet they directly affect the performance of the model, so determining them well is important. Therefore, the Bayesian optimization (BO) algorithm29–31 is introduced. BO is a very effective global optimization algorithm whose objective is to find the global optimal solution; it relies mainly on a probabilistic surrogate model and an acquisition function. BO fits the real objective function with the surrogate model and, based on the fitting result, actively selects the most promising point for the next evaluation, avoiding unnecessary sampling. The specific optimization steps are as follows:
  • Step 1: Maximize the acquisition function to obtain the next evaluation point,

$$x_{i+1} = \arg\max_{x \in X} \alpha(x, D_{1:i}). \tag{14}$$
  • Step 2: Calculate the objective function value of the evaluation point,

$$y_{i+1} = f(x_{i+1}) + \varepsilon_{i+1}. \tag{15}$$
  • Step 3: Integrate the new observations into the historical observation set, update the probabilistic surrogate model, and determine whether the termination condition has been reached. If the condition is satisfied, output the result; otherwise, return to step 1 and repeat the iteration.
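The three steps can be sketched with a small Gaussian process surrogate and an expected improvement acquisition function. Everything specific here is an assumption for illustration: the text does not say which acquisition function was used, the objective is a toy stand-in for validation accuracy as a function of the log10 learning rate, and the kernel length scale, candidate grid, and fixed iteration budget are arbitrary.

```python
import math
import numpy as np

def rbf(a, b, length_scale=0.5):
    """Squared-exponential kernel between two 1D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf(x_obs, x_obs) + noise * np.eye(x_obs.size)
    K_s = rbf(x_obs, x_new)
    mu = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """Expected improvement over the best value seen so far (maximization)."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

def bayes_opt(objective, candidates, x_init, n_iter=10):
    x_obs = np.array(x_init, dtype=float)
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        mu, sigma = gp_posterior(x_obs, y_obs, candidates)
        ei = expected_improvement(mu, sigma, y_obs.max())
        x_next = candidates[np.argmax(ei)]      # Step 1: maximize acquisition
        y_next = objective(x_next)              # Step 2: evaluate objective
        x_obs = np.append(x_obs, x_next)        # Step 3: extend observation set
        y_obs = np.append(y_obs, y_next)
    return x_obs[np.argmax(y_obs)], y_obs.max()

def toy_objective(log_lr):
    """Hypothetical score that peaks at log10(learning rate) = -2.5."""
    return -(log_lr + 2.5) ** 2

candidates = np.linspace(-4.0, -1.0, 61)
best_x, best_y = bayes_opt(toy_objective, candidates, x_init=[-4.0, -1.0])
```

In a real run the objective would train the 1DCNN-SVM with the proposed hyperparameters and return a validation score, and the search space would cover batch size, learning rate, and the L2-regularization coefficient jointly.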

The BO-1DCNN-SVM model designed in this paper has the following main improvements compared with the traditional convolutional neural network:

  1. The SVM classifier is easier to interpret than Softmax. SVM provides support vectors and decision boundaries, and this information helps explain the classification decisions. In contrast, the Softmax layer provides only probability distributions, from which the classification decisions are difficult to explain. In particular, SVM performs better than Softmax on small-sample and high-dimensional data sets.

  2. Random selection of the learning rate, batch size, and regularization coefficient may leave the improved 1DCNN-SVM with insufficient capacity for ESP fault diagnosis. Bayesian optimization is therefore used, because it can find the optimal solution in fewer iterations than traditional grid search or random search methods, thus saving time and computational resources.

In this study, based on the confusion matrix, an F1-score is introduced as the evaluation index of the model. The F1-score is calculated as follows:
$$F1\ \mathrm{score} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}. \tag{16}$$
In Eq. (16), recall and precision are defined as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN}, \tag{17}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}. \tag{18}$$
In Eqs. (17) and (18), TP is the number of positive samples predicted to be positive, FN is the number of positive samples predicted to be negative, and FP is the number of negative samples predicted to be positive. The higher the recall, the better the model recognizes positive samples. The higher the precision, the fewer negative samples the model misclassifies as positive. The F1-score combines the two, and the higher the F1-score, the more robust the established classification model. In this study, the accuracy rate was also chosen as an evaluation index for fault diagnosis of electric submersible pumps. The accuracy rate is the ratio of the number of correctly predicted samples to the total number of predicted samples.
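The metrics can be computed per class from a confusion matrix as sketched below. The convention that rows are true classes and columns are predicted classes is an assumption, and the 2 × 2 matrix is invented for illustration.

```python
import numpy as np

def f1_from(precision, recall):
    """Eq. (16)."""
    return 2 * recall * precision / (recall + precision)

def per_class_metrics(cm):
    """Precision, recall, F1 per class, plus overall accuracy.

    cm[i, j] counts samples of true class i predicted as class j
    (assumed convention).
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp      # true class i, predicted as something else
    fp = cm.sum(axis=0) - tp      # predicted as i, actually another class
    recall = tp / (tp + fn)       # Eq. (17)
    precision = tp / (tp + fp)    # Eq. (18)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1_from(precision, recall), accuracy

# Hypothetical two-class confusion matrix.
cm = [[8, 2],
      [1, 9]]
precision, recall, f1, accuracy = per_class_metrics(cm)
```

As a check against the reported results, inserting the first-row values of Table IV for the traditional CNN (precision 0.905, recall 0.893) into Eq. (16) gives approximately 0.899, which matches the F1-score listed in the table.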

In this paper, missing values in the daily production data of pump 1622 are filled by interpolation, and the data are then normalized according to the standardized sample data. The pump recorded a total of 759 data points: 330 under the normal operating condition, 147 under the motor overheating fault, 143 under the insufficient fluid supply fault, and 139 under the ESP shaft breakage fault. The ESP dataset is shown in Table III.
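The preprocessing described above can be sketched as follows. This is a minimal illustration that assumes linear interpolation between neighboring valid readings and min-max scaling to [0, 1]; the exact normalization applied to the dataset is not specified here, and the function names and sample readings are hypothetical.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between known neighbors.

    Gaps at the start or end of the series take the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled


def min_max_normalize(values):
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical daily readings of one ESP channel with two missing days.
raw = [50.0, None, 54.0, 58.0, None, 60.0]
filled = interpolate_missing(raw)
print(filled)
print(min_max_normalize(filled))
```

In practice each of the monitored channels would be processed independently, so that channels with different physical units end up on a comparable scale.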

TABLE III. Electric submersible pump fault data set.

| Label | Condition | Number of samples |
| --- | --- | --- |
|  | Normal | 330 |
|  | Motor overheating | 147 |
|  | Insufficient fluid supply | 143 |
|  | Electric pump shaft broken | 139 |

The data from well 1622 were randomly divided into a training set, a test set, and a validation set in a ratio of approximately 4:2:2: the training set had 400 sets of data, the test set had 179 sets, and the validation set had 180 sets.
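A random split of this kind can be sketched in a few lines of Python. The sketch assumes a plain shuffle of sample indices with a fixed seed; whether the original split was stratified by fault class is not stated, and the function name and seed are hypothetical.

```python
import random


def split_indices(n_samples, n_train, n_test, seed=0):
    """Shuffle sample indices and cut them into train/test/validation parts."""
    if n_train + n_test > n_samples:
        raise ValueError("train and test sizes exceed the dataset size")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # whatever remains
    return train, test, validation


# Sizes reported in the text: 759 samples, 400 for training, 179 for testing.
train, test, validation = split_indices(759, n_train=400, n_test=179)
print(len(train), len(test), len(validation))
```

Because every index lands in exactly one subset, no sample is shared between training, testing, and validation.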

According to the improved method proposed in this paper, the 759 sets of data provided in Table III were arranged as a 759 × 15 matrix and used as the input to the 1DCNN-SVM model. The Adam optimizer is used in this experiment, and the activation function is ReLU. Five-fold cross-validation on the dataset gave an SVM penalty coefficient of C = 10 and a radial basis function kernel parameter of g = 0.01.
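To make the role of the two SVM parameters concrete, the sketch below evaluates a radial basis function kernel with g = 0.01 and the decision value of a single binary SVM whose dual coefficients are bounded by C = 10. It is only an illustration: the support vectors, coefficients, bias, and three-dimensional feature space are hypothetical, and the way the four-class problem is decomposed into binary classifiers is not specified here.

```python
import math


def rbf_kernel(x, z, g=0.01):
    """Radial basis function kernel K(x, z) = exp(-g * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-g * squared_distance)


def decision_value(x, support_vectors, dual_coefs, bias, g=0.01):
    """Binary SVM decision value f(x) = sum_i coef_i * K(sv_i, x) + bias.

    Each coef_i equals alpha_i * y_i, so its magnitude is bounded by C.
    """
    return bias + sum(
        coef * rbf_kernel(sv, x, g) for sv, coef in zip(support_vectors, dual_coefs)
    )


C = 10     # penalty coefficient reported in the text
G = 0.01   # RBF kernel parameter reported in the text

# Hypothetical support vectors in a 3-dimensional feature space.
support_vectors = [[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]]
dual_coefs = [C, -C]  # hypothetical: both coefficients sit at the bound C
bias = 0.0            # hypothetical

sample = [0.5, 0.5, 0.5]
score = decision_value(sample, support_vectors, dual_coefs, bias, G)
print("class +1" if score > 0 else "class -1", round(score, 4))
```

A small g makes the kernel decay slowly with distance, so each support vector influences a wide region of feature space, while C limits how strongly any single training sample can shape the boundary.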

In this paper, precision, recall, and F1-score are used as comprehensive evaluation metrics for the fault recognition results. Table IV compares the evaluation indices of the diagnostic results of the traditional CNN-Softmax model and the improved CNN-SVM model on the test dataset.

TABLE IV. Traditional CNN and improved CNN-SVM model diagnostic result evaluation index.

| Number of fault | Precision of traditional CNN | Precision of CNN-SVM | Recall of traditional CNN | Recall of CNN-SVM | F1-score of traditional CNN | F1-score of CNN-SVM |
| --- | --- | --- | --- | --- | --- | --- |
|  | 0.905 | 0.933 | 0.893 | 0.933 | 0.899 | 0.933 |
|  | 0.917 | 0.971 | 0.892 | 0.919 | 0.904 | 0.944 |
|  | 0.969 | 0.914 | 0.912 | 0.941 | 0.940 | 0.927 |
|  | 0.838 | 0.882 | 0.939 | 0.909 | 0.886 | 0.895 |
| Average | 0.907 | 0.925 | 0.909 | 0.926 | 0.907 | 0.925 |

As can be seen from Table IV, the average precision, average recall, and average F1-score of the improved 1DCNN-SVM model are higher than those of the traditional 1DCNN model. In addition, the total durations of the traditional 1DCNN and the improved 1DCNN-SVM were 51.3908 and 32.2838 s, respectively, so the diagnosis time was reduced by 37.1%. Using an SVM for classification therefore did not increase the diagnosis time. Overall, the improved 1DCNN-SVM model offers higher accuracy, faster diagnosis, and good robustness.

In this paper, a traditional 1DCNN-Softmax model test and a 1DCNN-SVM model test are performed with the same dataset. In addition, the accuracy curve and loss curve are used to compare the diagnosis results of the two methods.

FIG. 5. Accuracy comparison curve.

FIG. 6. Loss rate comparison curve.

As seen in Figs. 5 and 6, the traditional 1DCNN-Softmax model reaches an accuracy of 90.50%, whereas replacing the last Softmax layer of the CNN with an SVM layer, as proposed in this paper, raises the accuracy to 92.7%. The accuracy of the improved 1DCNN-SVM model also rises faster than that of the conventional model. In terms of loss rate, the traditional 1DCNN-Softmax model shows a slower decrease, a higher loss rate, and fluctuations. The experiments show that the improved 1DCNN-SVM proposed in this paper improves the accuracy and has better diagnostic performance.

Manually adjusting the hyperparameters of a convolutional neural network to improve its accuracy is time-consuming, may not achieve the expected results, and still depends on manual experience. In this paper, the 1DCNN-SVM network model is therefore optimized using a Bayesian optimization method. The maximum number of iterations is set to 30, and three variables are designated for optimization to find the best feasible point observed. The accuracy of the proposed BO-1DCNN-SVM is compared with that of BO-1DCNN-Softmax and with that of the 1DCNN-SVM model tuned by the grid search (GS) and particle swarm optimization (PSO) algorithms. The model accuracies for BO-1DCNN-Softmax, GS-1DCNN-SVM, PSO-1DCNN-SVM, and BO-1DCNN-SVM are 92.73%, 89.9%, 91.1%, and 96.64%, respectively. The method proposed in this paper achieves the highest accuracy. Furthermore, a comparison of computation times with BO-1DCNN-Softmax indicates that the proposed method is more time-efficient. The constraint ranges and optimization search results of the four methods are presented in Table V.

TABLE V. Hyperparameter ranges and optimization search results for four methods.

| Hyperparameters | Minimum value | Maximum value | BO-1DCNN-Softmax search results | GS-1DCNN-SVM search results | PSO-1DCNN-SVM search results | BO-1DCNN-SVM search results |
| --- | --- | --- | --- | --- | --- | --- |
| Batch size | 40 | 200 | 146 | 200 | 125 | 110 |
| Learning rate | 10 × 10^−3 |  | 0.010 637 |  | 0.014 543 | 0.002 154 |
| Regularization coefficient L2 | 10 × 10^−10 | 10 × 10^−2 | 4.8532 × 10^−8 | 0.01 | 5.2746 × 10^−4 | 4.4895 × 10^−5 |

The optimal combination of hyperparameters obtained by Bayesian optimization can be seen in Table V: the batch size is 110, the learning rate is 0.002 154 1, and the regularization coefficient is 4.4895 × 10^−5. As the number of iterations advances, the observed minimum of the objective function is compared with the predicted minimum in Fig. 7. At the 29th iteration, the two curves converge, indicating that further evaluations can no longer improve the objective function and that the maximum of the acquisition function is close to 0; the global optimum is then considered to have been found.
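The stopping behavior described above can be made concrete with a worked formula. The acquisition function used in this study is not named, so the expression below assumes a Gaussian process surrogate and the expected improvement (EI) criterion for a minimization problem, which is one common choice; the symbols are introduced here for illustration only.

```latex
\begin{aligned}
\mathrm{EI}(x) &= \mathbb{E}\left[\max\bigl(0,\; f_{\min} - f(x)\bigr)\right] \\
               &= \bigl(f_{\min} - \mu(x)\bigr)\,\Phi(z) + \sigma(x)\,\varphi(z),
\qquad z = \frac{f_{\min} - \mu(x)}{\sigma(x)}, \quad \sigma(x) > 0 .
\end{aligned}
```

Here $f_{\min}$ is the smallest objective value observed so far, $\mu(x)$ and $\sigma(x)$ are the posterior mean and standard deviation of the surrogate at the candidate hyperparameters $x$, and $\Phi$ and $\varphi$ are the standard normal cumulative distribution and density functions. Under this assumption, a maximum of $\mathrm{EI}(x)$ close to 0 means that no candidate is both predicted to beat $f_{\min}$ and uncertain enough to be worth exploring, which is consistent with the observed and predicted minima converging.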

FIG. 7. Min objective vs number of function evaluations.

The accuracy of the BO-1DCNN-SVM fault diagnosis model on the training and validation sets is shown in Fig. 8. The fault diagnosis accuracy exceeds 80% on both the training set and the validation set after 200 training cycles and continues to increase in subsequent training. The model stabilizes after 600 iterations, with a validation accuracy of 98.89%, which indicates that the model has high accuracy and good robustness.

FIG. 8. The accuracy of BO-1DCNN-SVM.

In this paper, we introduce the confusion matrix of the test set, illustrated in Fig. 9. This matrix provides a comprehensive representation of information such as the types of faults that were misclassified and the accuracy rate.

FIG. 9. BO-1DCNN-SVM confusion matrix.

In Fig. 9, the vertical axis indicates the actual fault labels and the horizontal axis indicates the predicted fault labels. There are four operating conditions in the test set: 71 samples of the first type (normal operating condition), 40 samples of the second type (motor overheating fault), 35 samples of the third type (insufficient fluid supply fault), and 33 samples of the fourth type (ESP partial shaft breakage fault). The entries on the diagonal are the numbers of correctly diagnosed samples of each type. As seen from Fig. 9, three faults of the second type, one fault of the third type, and two faults of the fourth type were diagnosed as normal working conditions. The diagnostic accuracy reached 96.64%, indicating good diagnostic performance. In Fig. 10, the number of correct diagnoses is 166, giving an accuracy of 92.73%. Notably, the total elapsed time is 885.0444 s for BO-1DCNN-SVM and 1196.5453 s for BO-1DCNN-Softmax, so the diagnosis time with BO-1DCNN-SVM is ∼26.03% shorter, making it the more time-efficient option.
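The figures quoted above can be cross-checked with a short Python sketch that rebuilds the BO-1DCNN-SVM confusion matrix from the class totals and the misdiagnoses listed in the text. It assumes that the six listed errors are the only ones, so that every normal sample is classified correctly; the variable names are hypothetical.

```python
labels = ["normal", "motor overheating", "insufficient fluid supply", "shaft breakage"]
class_totals = [71, 40, 35, 33]   # test samples per actual class (from the text)
predicted_normal = [0, 3, 1, 2]   # faults misdiagnosed as normal (from the text)

# Rows are actual classes, columns are predicted classes.
cm = [[0] * 4 for _ in range(4)]
for k in range(4):
    cm[k][k] = class_totals[k] - predicted_normal[k]
    if k != 0:
        cm[k][0] = predicted_normal[k]

n_test = sum(class_totals)
n_correct = sum(cm[k][k] for k in range(4))
accuracy = n_correct / n_test
recall = [cm[k][k] / class_totals[k] for k in range(4)]
precision_normal = cm[0][0] / sum(cm[k][0] for k in range(4))

# Total elapsed times in seconds, as reported in the text.
t_svm, t_softmax = 885.0444, 1196.5453
time_saving = (t_softmax - t_svm) / t_softmax

print(f"correct: {n_correct}/{n_test}, accuracy: {accuracy:.4f}")
for name, r in zip(labels, recall):
    print(f"recall[{name}] = {r:.3f}")
print(f"precision[normal] = {precision_normal:.3f}")
print(f"time saving = {time_saving:.4f}")
```

Under this reconstruction all errors are fault samples labeled as normal, so the recall of the normal class is perfect while its precision is reduced by the six missed faults.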

FIG. 10. BO-1DCNN-Softmax confusion matrix.

Figures 11 and 12 show the confusion matrices of the improved 1DCNN-SVM network optimized using the grid search and particle swarm algorithms, respectively.

FIG. 11. GS-1DCNN-SVM confusion matrix.

FIG. 12. PSO-1DCNN-SVM confusion matrix.

As evident from the diagonals, GS-1DCNN-SVM and PSO-1DCNN-SVM correctly classified 161 and 163 of the 179 sets of test data, respectively. Among the four optimized models, the Bayesian optimization 1DCNN-SVM model proposed in this paper achieved the highest accuracy.
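The ranking of the four optimized models follows directly from the numbers of correct test predictions, as the short sketch below shows. The counts for BO-1DCNN-Softmax, GS-1DCNN-SVM, and PSO-1DCNN-SVM are those quoted in the text; the count of 173 for BO-1DCNN-SVM is inferred here from the six misdiagnoses described for Fig. 9 and should be read as an assumption.

```python
n_test = 179

# Correct test predictions per model.
correct = {
    "BO-1DCNN-SVM": 173,      # inferred: 179 test samples minus 6 misdiagnoses
    "BO-1DCNN-Softmax": 166,  # quoted in the text
    "PSO-1DCNN-SVM": 163,     # quoted in the text
    "GS-1DCNN-SVM": 161,      # quoted in the text
}

accuracy = {name: 100 * count / n_test for name, count in correct.items()}
ranking = sorted(accuracy, key=accuracy.get, reverse=True)

for name in ranking:
    print(f"{name}: {accuracy[name]:.2f}%")
```

The computed percentages agree with the reported accuracies to within rounding in the second decimal place.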

To evaluate the performance of the proposed model, several deep learning models are compared with it on the same amount of data, using fault diagnosis accuracy as the evaluation criterion. The experimental results are shown in Fig. 13.

FIG. 13. Accuracy of fault diagnosis results for different wells in different models.

Classical deep learning models, such as fully connected neural networks (FCNNs), long short-term memory (LSTM) networks, and CNN, together with SVM, 1DCNN-SVM, and BO-SVM, are compared with the method proposed in this paper. As shown in Fig. 13, SVM and BO-SVM exhibit lower accuracy on the test set, averaging only 62.15% and 63.96%, respectively. BPNN, LSTM, and CNN achieve average accuracies of 80.98%, 87.01%, and 90.78% on the test set, respectively, and 1DCNN-SVM attains an average accuracy of 92.18%. The BO-1DCNN-SVM method proposed in this paper achieves the highest accuracy, surpassing 95% on the test set. It thus provides improved fault diagnosis with fewer misclassifications and better stability, making it more suitable for fault diagnosis of electric submersible pump systems on offshore platforms.

In this paper, a novel ESP fault diagnosis approach based on BO-1DCNN-SVM is proposed. First, the original data are preprocessed using linear interpolation to address missing and invalid data in the dataset. The CNN-SVM model is then created by substituting an SVM classifier for the Softmax layer of the conventional 1DCNN, which combines the automatic feature extraction capability of the CNN with the predictive strength of the SVM on small-sample data. Bayesian optimization is then used to tune the CNN-SVM hyperparameters, increasing the model's predictive accuracy. The experimental findings demonstrate that the suggested method can accurately identify the failure state of ESP systems: the average prediction accuracy of the BO-1DCNN-SVM model is greater than 95%, and the diagnosis time is shorter, which saves time cost.

This work was supported by the National Key R&D Program of China (Grant No. 2019YFC0312303-05).

The authors have no conflicts to disclose.

The data that support the findings of this study are available from the corresponding author upon reasonable request.
