Cancer diagnostics is an important field of cancer recovery and survival with many expensive procedures needed to administer the correct treatment. Machine Learning (ML) approaches can help with the diagnostic prediction from circulating tumor cells in liquid biopsy or from a primary tumor in solid biopsy. After predicting the metastatic potential from a deep learning model, doctors in a clinical setting can administer a safe and correct treatment for a specific patient. This paper investigates the use of deep convolutional neural networks for predicting a specific cancer cell line as a tool for label free identification. Specifically, deep learning strategies for weight initialization and performance metrics are described, with transfer learning and the accuracy metric utilized in this work. The equipment used for prediction involves brightfield microscopy without the use of chemical labels, advanced instruments, or time-consuming biological techniques, giving an advantage over current diagnostic methods. In the procedure, three different binary datasets of well-known cancer cell lines were collected, each having a difference in metastatic potential. Two different classification models were adopted (EfficientNetV2 and ResNet-50) with the analysis given for each stage in the ML architecture. The training results for each model and dataset are provided and systematically compared. We found that the test set accuracy showed favorable performance for both ML models with EfficientNetV2 accuracy reaching up to 99%. These test results allowed EfficientNetV2 to outperform ResNet-50 at an average percent increase of 3.5% for each dataset. The high accuracy obtained from the predictions demonstrates that the system can be retrained on a large-scale clinical dataset.

Cancer is one of the leading causes of premature mortality worldwide.1,2 Progression of cancer is characterized by uncontrollable division of cells, which enables cell immortality, angiogenesis formation, sustained proliferative signaling, and circulating tumor cell (CTC) invasion and metastasis.3,4 Tumor heterogeneity presents a huge challenge in understanding the disease at the molecular level. Numerous studies available suggest that intratumoral heterogeneity is responsible for the evolution of cancers and stimulations of therapeutic resistance.5,6 In the initial stages of cancer, the tumor cells that have entered the blood stream have epithelial markers, such as cytokeratin and epithelial cell adhesion molecule (EpCAM). However, as the disease progresses, these cells may lose their epithelial characteristics and gain mesenchymal surface proteins, such as N-cadherin and vimentin. This process, known as the epithelial-to-mesenchymal transition (EMT), is believed to play an important role in motility of cancer.7–9 

There are numerous methods available for isolation and enumeration of CTCs from normal blood cells, with CellSearch® as the only U.S. Food and Drug Administration (FDA) approved system for isolation in a clinical setting.10,11 The system has proved to be reliable for the detection of CTCs in the blood of patients suffering from metastatic breast cancer.12 However, detection and isolation of CTCs alone are not sufficient to make informed decisions about treatment. To issue a correct and safe treatment, doctors in a clinical setting need to know how invasive a cancer is, which requires identifying the subtype of the captured CTCs. Hence, it is necessary to monitor the real-time protein expression of CTCs to develop personalized medicines based on an individual’s CTC profile.13 

Large volumes of molecular datasets are required to construct a precise molecular profile of a particular patient with a specific cancer type.14,15 Machine learning (ML) approaches offer promise to help oncologists assess metastatic risk and relapse time of cancer by analyzing such large-scale datasets for better prediction of a disease profile.16,17 Furthermore, other approaches for cancer subtype identification involve convolutional neural networks (CNNs), one of the most established ML paradigms for processing grid-patterned data, such as images.18 In particular, Toratani et al. used a well-known CNN, VGG-16, to distinguish between phase-contrast microscopic images of squamous cell carcinoma and cervical carcinoma cells.19 The researchers performed a thorough exploration with t-distributed stochastic neighbor embedding (t-SNE) and achieved a model accuracy of 96%. In another investigation in Ref. 20, Ahmed et al. used a handcrafted CNN to classify acute and chronic leukemia subtypes from microscopic images of blood cells, also distinguishing between lymphoid and myeloid subtypes. Additionally, studies reported by Wang et al. and Moallem et al. have applied image analysis and ML approaches for detecting CTCs from blood cells.21,22 Recent works on cancer cell identification using ML are summarized in Table S1. Furthermore, numerous recent studies suggest that the EMT occurs simultaneously with solid tumor progression, mobility, invasiveness, and metastatic activity.8,23,24 More recently, Kashkooli et al. have demonstrated that intratumoral heterogeneity causes poor transport of therapeutic and diagnostic agents into the tumor tissue, emphasizing the need for a detection system that distinguishes between these heterogeneous cells.25,26 While these previous works distinguish between types of a specific cancer and have led to important findings, they did not attempt to classify the EMT, an important mechanism for cancer diagnosis. 
In our work, we address this gap by using ML to classify cultured cell lines with lower and higher EpCAM expression, thereby applying this diagnostic mechanism directly.

With the rapid development of CNNs, many different architectures have been reported, including EfficientNet, an influential model that delivers impressive performance due to compound scaling, which yields a uniformly scaled model.27 Recently, the authors of EfficientNet released an update, named EfficientNetV2, that uses additional ML techniques, such as the new Fused-MBConv operation, and resulted in even better performance.28,29 Additionally, the ResNet-50 model has provided a smooth training process and high performance with previous cell datasets. Given the advantages of EfficientNetV2, we used this architecture in our work and compared it with another influential classification model, i.e., ResNet-50, for CTC classification. Specifically, we apply these classifiers to three different binary datasets that differ in metastatic potential: PC3 and DU145 (DU145 vs PC3), LnCAP and PC3 (LnCAP vs PC3), and normal and drug-resistant SKOV3 (nSKOV3 vs drSKOV3), all based on microscopic images collected at 50× magnification.30 Each dataset is divided into two classes based on their EpCAM expression level (DU145 vs PC3 and LnCAP vs PC3) or cancer drug resistance (nSKOV3 vs drSKOV3). The images represent a dataset with different levels of metastatic potential, e.g., a different level of EpCAM expression on the cell surface, indicating an important classification problem for clinical applications. As bright field microscopic images are the only necessary component for data collection, a procedural setup consisting of a camera mount takes a fraction of the time compared to other biological techniques, such as advanced heterogeneous cell capture from microfluidic devices. On the other hand, this combined computer vision and microscopy approach is complicated by variations in images from different laboratories. These differences in contrast and brightness can lead to trained models that perform well with one dataset but not with a dataset taken with a different microscope. 
However, image augmentation, a prominent machine learning technique for increasing accuracy and performance, forces a trained model to be more generalizable to data not in the training set, i.e., real-world images. A well-augmented dataset can mitigate these variations between microscopes, and therefore, we apply this technique in our own work. Experimental results show that our developed CNN models yield high classification accuracy for all datasets, revealing the potential of deep CNNs in cancer diagnosis and prediction in clinical patients.

This paper is structured as follows: Sec. II presents the methodology followed for the collection of microscopic images and model training with a review on weight initialization and classification performance metrics. We provide a comprehensive study on ML architectures utilized in the study with an analysis of various layers used in the model. Additionally, results obtained from the three datasets and two CNN architectures are displayed for a thorough comparison between the two models. Section III gives the final remarks and a discussion on future strategies we plan to explore related to this area of research. Finally, Sec. IV details the experimental methods and materials used for cell culture and image analysis.

To illustrate the potential of deep learning for tumor analysis based on image classification, we used typical cancer cell lines to resemble a clinical scenario with a more invasive and a less invasive cancer cell (two cancer subtypes). For example, these two subtypes can be viewed as two cancer patients with the same branch of cancer, where one patient has tumor cells that are metastatic and inherently more migratory. The experimental procedure in Fig. 1 describes the workflow for the cell classification, which consists of four independent steps. The data are first collected in step 1, i.e., the different cell lines are cultured, trypsinized, and imaged at 50× magnification to create a dataset large enough for a deep learning task. In step 2, the data are preprocessed by cropping the images to a pixel size of 224 × 224 × 3 for the height, width, and number of channels, respectively. After preprocessing, the deep CNN is developed and trained with the image data. The weights are determined through gradient descent, and the output of the final sigmoid layer gives the probability that a cell image belongs to a specific class. Step 3 is to evaluate the classification performance of our trained CNN model using standard evaluation metrics. Specifically, we use accuracy, precision, and recall to measure the performance of the trained CNN model. Finally, step 4 represents the cancer identification tool to be applied in a future clinical study. A large-scale clinical study with our developed tools would therefore be advantageous for determining the severity of tumors for diagnosed cancer patients from a liquid or solid biopsy. For instance, liquid biopsies from a cancer patient’s blood can serve as a training set for CTC classification with high and low EpCAM expression. This would uncover the capability of deep CNNs, as CTCs are directly related to the invasiveness of a tumor and its potential to spread to secondary organs.

FIG. 1.

Workflow for the cell classification project. The steps include gathering datasets, training the model, evaluating the performance, and implementation of a final progression tool for the invasiveness of the cancer.

Image datasets were collected by gathering a small sample of cells and placing the cancer cells on a transparent platform under a microscope at 50× magnification. Each cell batch (≈500 µl) was used to gather around 50 images and then discarded to ensure that the microscope images did not contain duplicated cells. After several batches of cells were imaged, many unprocessed gray-scale images had been captured for that experiment. Images were captured over multiple subculture passages to construct a generalized training set. A typical microscopic image of PC3 cells taken at 50× magnification is shown in Fig. 2(a). To create a training set of images for single cell classification, the individual cells are cropped from the unprocessed image at 50× magnification and resized to a dimension of 224 × 224 × 3 as the height, width, and number of channels, respectively. The inset in Fig. 2(a) shows this cropping and resizing with the individual cells depicted on the bottom.
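
The cropping and resizing step can be illustrated with a minimal sketch in plain Python. The text does not state the interpolation method or how a gray-scale image becomes a three-channel tensor, so nearest-neighbour sampling and channel replication are assumptions here, and the function names, bounding box, and synthetic frame are hypothetical.

```python
def crop(image, top, left, height, width):
    """Return a height x width window of a 2-D gray-scale image (list of rows)."""
    return [row[left:left + width] for row in image[top:top + height]]


def resize_nearest(image, out_h, out_w):
    """Resize a 2-D image with nearest-neighbour sampling (assumed method)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def gray_to_three_channels(image):
    """Copy each gray value into three identical channels (assumed conversion)."""
    return [[[v, v, v] for v in row] for row in image]


def preprocess_cell(image, box, size=224):
    """Crop one cell given box = (top, left, height, width), then resize."""
    top, left, height, width = box
    cell = crop(image, top, left, height, width)
    return gray_to_three_channels(resize_nearest(cell, size, size))


# Hypothetical 300 x 400 gray-scale frame filled with a synthetic gradient.
frame = [[(r + c) % 256 for c in range(400)] for r in range(300)]
cell = preprocess_cell(frame, box=(50, 80, 90, 90))
print(len(cell), len(cell[0]), len(cell[0][0]))  # 224 224 3
```

In practice, an image library would perform the same two operations; the sketch only makes the shape bookkeeping explicit.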

FIG. 2.

Data collection procedure (a), class representation (b), and live–dead staining (c) for all datasets accumulated. The original microscopic image of PC3 cells taken at 50× magnification is on top, while the subsequently cropped and resized (224 × 224 × 3) cell is on the bottom of the data collection procedure. Representative images of each class of the three datasets are shown in (b), including DU145vsPC3 (top), LnCAPvsPC3 (middle), and SKOV3drvsn (bottom). Overlaid images of the live–dead cell staining with 4′,6-diamidino-2-phenylindole (DAPI) in blue and propidium iodide in red are displayed in (c). The white scale bar for all cell types is shown in nSKOV3 at 20 µm.

In this work, we started with a dataset containing DU145 and PC3 as classes one and two, respectively. PC3 represents a more invasive cancer with a lower EpCAM expression level, whereas DU145 has a higher EpCAM expression level. Two representative training set images of these two classes are illustrated in Figs. 2(bi) and 2(bii) for DU145 and PC3 cells, respectively. The two classes cannot be distinguished by the naked eye, as the cell sizes are not shown and the cells have similar shapes. To explore the potential of deep CNNs in subtype classification, we further introduced two other cancer cell lines: (1) LnCAP, another CTC with high EpCAM expression, and (2) SKOV3, an ovarian cancer cell line. The PC3 cells have a lower EpCAM expression level than LnCAP and therefore represent a more metastatic cancer. Furthermore, the SKOV3 cells were modified to resist anti-cancer drugs, representing a late-stage cancer in which drugs are no longer effective in killing the metastatic CTCs. Training examples for the two other binary classification models are displayed in Figs. 2(biii)–2(bvi), comprising the four classes of LnCAP, PC3, drug-resistant SKOV3 (drSKOV3), and normal SKOV3 (nSKOV3). Each of these three classification datasets represents two cancer cells at different stages in the epithelial-to-mesenchymal transition (EMT) cycle. To confirm that the subtype of the cells was preserved and the cells were still viable for imaging experiments, a live–dead cell stain was performed for all five cell types. The live–dead cell stain in Fig. 2(c) displays the overlaid 4′,6-diamidino-2-phenylindole (DAPI) and propidium iodide stain for the different cancer cells. Here, the drSKOV3 image shows only one cell with a red stain, indicating that propidium iodide penetrated the phospholipid bilayer of this cell. All other cells in the five images show only the DAPI stain from the nucleus, indicating that propidium iodide did not penetrate and the cells were viable for data collection.

Random augmentations were applied to the original dataset to construct a generalizable model that performs well in all imaging conditions. A series of random augmentations were applied to each image, including a brightness adjustment with multiplication factor 0.15, a contrast adjustment with contrast factor between 0.5 and 1.5, and a left–right flip with a 50% chance. These augmentations were then concatenated with the original dataset to give a total of two training examples per cropped cell image. Since the image classes cannot be determined by the naked eye, we decided that a larger dataset would be necessary to provide better performance for binary classification. A detailed summary of the three datasets after augmentation is described in Table I where the total number of classes and images is shown in rows and columns, respectively. The DU145 and PC3 dataset on the top (DU145vsPC3) includes the largest number of images with a total of 10 471 images for training, validation, and testing. The LnCAP and PC3 dataset in the middle (LnCAPvsPC3) includes a total of 3077 images, while drSKOV3 and nSKOV3 at the bottom (SKOV3drvsn) include 4365 images.
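
A minimal sketch of this augmentation pipeline, in plain Python on gray-scale images stored as nested lists with values in [0, 1], is given below. Two points are assumptions rather than details from the text: the "multiplication factor 0.15" is read as a brightness scale drawn from [0.85, 1.15], and contrast is adjusted about the image mean, a common definition. All function names are hypothetical.

```python
import random


def _clip(value):
    """Keep a pixel value inside the valid [0, 1] range."""
    return min(1.0, max(0.0, value))


def adjust_brightness(image, factor):
    """Scale every pixel by `factor` and clip to [0, 1]."""
    return [[_clip(v * factor) for v in row] for row in image]


def adjust_contrast(image, factor):
    """Stretch pixels about the image mean: (v - mean) * factor + mean."""
    pixels = [v for row in image for v in row]
    mean = sum(pixels) / len(pixels)
    return [[_clip((v - mean) * factor + mean) for v in row] for row in image]


def flip_left_right(image):
    """Mirror the image horizontally."""
    return [row[::-1] for row in image]


def random_augment(image, rng):
    """Apply one random brightness, contrast, and flip augmentation."""
    # Assumption: brightness factor drawn from [1 - 0.15, 1 + 0.15].
    out = adjust_brightness(image, rng.uniform(0.85, 1.15))
    # Contrast factor between 0.5 and 1.5, as stated in the text.
    out = adjust_contrast(out, rng.uniform(0.5, 1.5))
    # Left-right flip with a 50% chance.
    if rng.random() < 0.5:
        out = flip_left_right(out)
    return out


def augment_dataset(images, seed=0):
    """Concatenate the originals with one augmented copy of each image."""
    rng = random.Random(seed)
    return images + [random_augment(img, rng) for img in images]
```

Calling `augment_dataset` on a list of N cropped cells returns 2N images, which matches the two training examples per cropped cell image described above.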

TABLE I.

Model dataset summary before augmentation.

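| Label/set | Total count | Training | Validation | Test |
| --- | --- | --- | --- | --- |
| DU145vsPC3 | | | | |
| DU145 | 5292 | 4211 | 546 | 535 |
| PC3 | 5179 | 4165 | 501 | 513 |
| Combined | 10 471 | 8376 | 1047 | 1048 |
| LnCAPvsPC3 | | | | |
| LnCAP | 1507 | 1222 | 141 | 144 |
| PC3 | 1570 | 1239 | 166 | 165 |
| Combined | 3077 | 2461 | 307 | 309 |
| SKOV3drvsn | | | | |
| SKOV3dr | 2130 | 1723 | 218 | 189 |
| SKOV3n | 2235 | 1768 | 219 | 248 |
| Combined | 4365 | 3491 | 437 | 437 |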
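
The split proportions implied by Table I can be checked with a short sketch. It uses only the training, validation, and test counts listed in the table and recomputes the totals from them; the variable and function names are hypothetical.

```python
# (training, validation, test) counts per class, copied from Table I.
splits = {
    "DU145vsPC3": {"DU145": (4211, 546, 535), "PC3": (4165, 501, 513)},
    "LnCAPvsPC3": {"LnCAP": (1222, 141, 144), "PC3": (1239, 166, 165)},
    "SKOV3drvsn": {"SKOV3dr": (1723, 218, 189), "SKOV3n": (1768, 219, 248)},
}


def summarize(classes):
    """Return (total, train fraction, validation fraction, test fraction)."""
    train = sum(counts[0] for counts in classes.values())
    val = sum(counts[1] for counts in classes.values())
    test = sum(counts[2] for counts in classes.values())
    total = train + val + test
    return total, train / total, val / total, test / total


for name, classes in splits.items():
    total, f_train, f_val, f_test = summarize(classes)
    print(f"{name}: total={total}, train={f_train:.2f}, "
          f"val={f_val:.2f}, test={f_test:.2f}")
```

For all three datasets, the recomputed totals match the combined counts in the table, and the fractions come out close to an 80/10/10 split between training, validation, and test images.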

The accuracy and speed of model training are often dependent on the initial values of trainable parameters.31,32 A well-known strategy in which all the weights are set to zero before training, known as zero initialization, undesirably forces the gradients to be symmetric, so the nodes in a layer learn identical features. To avoid this complication, the weights are drawn at random, and each activation function requires a specific initialization distribution for gradient descent to travel smoothly toward a minimum of the loss.

Therefore, with standard neural network activation functions, the common approach for weight initialization is Xavier or He initialization. Determined from both empirical and theoretical research, Xavier initialization is used for layers that contain a sigmoid or tanh activation function. It is defined as a random number drawn from a uniform distribution,
$$
W_{\mathrm{Xav}} = U(\min, \max) = U\!\left(-\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}\right), \tag{1}
$$
where W is the weight of the node, U is a uniform probability distribution, and n is the number of inputs to the node.33 Typical deep learning activation functions, e.g., the rectified linear unit (ReLU) or the sigmoid linear unit (SiLU), often have issues with Xavier initialization due to vanishing gradients.34,35 Achieved by sampling from a normal distribution, He initialization is used for layers that contain these activation functions,
$$
W_{\mathrm{He}} = G(\mu, \sigma) = G\!\left(0, \sqrt{\frac{2}{n}}\right), \tag{2}
$$
where μ is the mean, σ is the standard deviation, and G is the Gaussian probability distribution.35 When training a deep learning model for the first time, the architects will use He or Xavier initialization depending on the type of nonlinear functions used in the network. Most layers in ResNet-50 and EfficientNetV2 have ReLU or SiLU nonlinear functions; therefore, the developers (Microsoft and Google, respectively) utilized He initialization in these layers.
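
The two initialization schemes can be sketched in a few lines of plain Python. The sketch reads Eq. (1) as a uniform distribution on [-1/√n, 1/√n] and Eq. (2) as a Gaussian with zero mean and standard deviation √(2/n); the function names and layer sizes are hypothetical.

```python
import math
import random


def xavier_uniform(n_inputs, rng):
    """Draw one weight from U(-1/sqrt(n), 1/sqrt(n)), following Eq. (1)."""
    bound = 1.0 / math.sqrt(n_inputs)
    return rng.uniform(-bound, bound)


def he_normal(n_inputs, rng):
    """Draw one weight from G(0, sqrt(2/n)), following Eq. (2)."""
    return rng.gauss(0.0, math.sqrt(2.0 / n_inputs))


def init_layer(n_inputs, n_outputs, sampler, seed=0):
    """Build an n_outputs x n_inputs weight matrix with the given sampler."""
    rng = random.Random(seed)
    return [[sampler(n_inputs, rng) for _ in range(n_inputs)]
            for _ in range(n_outputs)]


# Hypothetical layer with 256 inputs and 64 nodes.
sigmoid_layer = init_layer(256, 64, xavier_uniform)  # for sigmoid or tanh
relu_layer = init_layer(256, 64, he_normal)          # for ReLU or SiLU
```

With n = 256, every Xavier weight lies within ±1/16, while the He weights have a standard deviation near √(2/256) ≈ 0.088.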

A simpler and more efficient approach to weight initialization, applicable once a model has already been trained, is known as transfer learning. Transfer learning is the process of solving a downstream task by reusing the weights from a pretrained model, typically those released when the model was created.31 The main benefit of this approach is that it facilitates a powerful training process that results in a more effective and accurate model, enabling developers to circumvent the need for large amounts of data. The pretrained model is typically trained with a much larger dataset, e.g., the ImageNet dataset for object recognition tasks containing either 1000 or 21 000 object classes.36 Since its release in 2009, ImageNet has become a common benchmark dataset for classification tasks. We employed transfer learning in this work by initializing the weights of the ResNet-50 and EfficientNetV2 models with the shared pre-trained ImageNet weights, allowing for more efficient and comprehensive training.37,38

The most critical metric for image classification tasks is accuracy, defined as the number of correct predictions over the total number of predictions. However, precision and recall, two additional metrics associated with the confusion matrix, are also useful. Precision can be defined as the proportion of positive identifications that are correct and can be calculated by taking the ratio of true positives over the total number of positive predictions,
$$
\mathrm{precision} = \frac{TP}{TP + FP}, \tag{3}
$$
where TP and FP are the number of true positive and false positive predictions, respectively. On the other hand, recall is the proportion of actual positive examples that the model predicted correctly,
$$
\mathrm{recall} = \frac{TP}{TP + FN}, \tag{4}
$$
where FN is the number of false negative predictions.39 Precision and recall are two important model evaluation metrics. While precision refers to the percentage of positive predictions that are relevant, recall refers to the percentage of all relevant examples that the algorithm classified correctly. For this reason, we used accuracy, precision, and recall in our evaluation to determine model performance.
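
As a minimal sketch, the three metrics can be computed from the four entries of a binary confusion matrix. Accuracy is taken here in its standard form, the fraction of correct predictions among all predictions, and precision and recall follow Eqs. (3) and (4). The counts in the example are hypothetical and are not results from this work.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical test set of 200 images (illustrative numbers only).
acc, prec, rec = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(f"accuracy={acc:.3f}, precision={prec:.3f}, recall={rec:.3f}")
# accuracy=0.925, precision=0.900, recall=0.947
```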

We employed two deep CNNs, specifically ResNet-50 and EfficientNetV2 in this work.28,38 The two model architectures were trained and tested with the TensorFlow library on all three datasets collected (DU145vsPC3, LnCAPvsPC3, and SKOV3drvsn).37 All operations were performed on an NVIDIA Tesla P100 or T4 GPU offered by Google Colaboratory Pro platform. A TensorFlow checkpoint was employed by monitoring the validation accuracy at the end of each epoch to save the best model weights after training completed. Both ResNet-50 and EfficientNetV2 model architectures were trained for 150 epochs, which take ∼9 and 5 h, respectively. ResNet-50 training resulted in the best weights saved at epoch 135, 147, and 90 for DU145vsPC3, LnCAPvsPC3, and SKOV3drvsn datasets, respectively. EfficientNetV2 training resulted in the best weights saved at epoch 69, 23, and 122 for DU145vsPC3, LnCAPvsPC3, and SKOV3drvsn datasets, respectively.
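
The checkpoint logic amounts to keeping the weights from the epoch with the highest validation accuracy. The sketch below shows that selection rule in plain Python; the accuracy values are hypothetical. In TensorFlow, this behavior is presumably obtained with `tf.keras.callbacks.ModelCheckpoint` using `monitor="val_accuracy"` and `save_best_only=True`, which is an assumption about the setup rather than a detail stated in the text.

```python
def best_epoch(val_accuracies):
    """Return (epoch, accuracy) with the highest validation accuracy.

    Epochs are 1-indexed. Only a strict improvement replaces the stored
    best, so ties keep the earliest epoch.
    """
    best_idx, best_acc = 0, float("-inf")
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            best_idx, best_acc = epoch, acc
    return best_idx, best_acc


# Hypothetical validation accuracies for six epochs.
history = [0.71, 0.84, 0.80, 0.91, 0.89, 0.91]
print(best_epoch(history))  # (4, 0.91)
```

Because the validation images are not used for gradient updates, selecting on validation accuracy avoids keeping weights from a late epoch that merely fits the training set better.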

The ResNet-50 model, a popular CNN architecture released by Microsoft in 2015, obtained favorable performance on the ImageNet dataset due to the introduction of shortcut connections.38,40 The network structure of the feature extractor and binary classifier is displayed in Fig. 3(a). The feature extractor on the top includes five separate stages identifying key features in an image tensor, whereas the binary classification step at the bottom provides a final prediction with the sigmoid layer (S(x)). Each of the last four stages of the feature extractor (stages 2–5) contains a convolutional block and several identity blocks, as demonstrated in Fig. S1.38,41 Each convolutional and identity block, illustrated below the stages, includes one skip connection with three convolutional and batch normalization layers. With the vanishing and exploding gradient problem stabilized by skip connections, the model performs better with a larger number of layers, yielding richer learned features and better final accuracy. After the higher-level features are determined in the ResNet-50 feature extractor, the image tensor is flattened before entering the final fully connected (FC) and dropout (Drop) layers at the bottom of Fig. 3(a). The global average pooling (AVG POOL) layer is referred to as flattening since the output of the feature extractor (7 × 7 × 2048) is reduced to a one-dimensional array of 2048 values by averaging over the first two dimensions. Likewise, the FC and dropout layers are referred to as FC since the output of the flattened layer (2048 nodes) is reduced to a one-dimensional array of 512 nodes. The last FC layer with a sigmoid activation gives the final prediction of a CTC with high or low invasiveness.
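
The pooling and final activation can be illustrated with a small sketch in plain Python. A toy 7 × 7 × 4 tensor stands in for the 7 × 7 × 2048 feature map, the intermediate 512-node FC and dropout layers are skipped for brevity, and the weights and function names are hypothetical.

```python
import math


def global_average_pool(feature_map):
    """Average an H x W x C tensor (nested lists) over H and W.

    Returns a list of C values, one per channel.
    """
    h, w, c = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [
        sum(feature_map[i][j][k] for i in range(h) for j in range(w)) / (h * w)
        for k in range(c)
    ]


def sigmoid(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Toy feature map: channel k holds the constant value k at every position.
fmap = [[[float(k) for k in range(4)] for _ in range(7)] for _ in range(7)]
pooled = global_average_pool(fmap)
print(pooled)  # [0.0, 1.0, 2.0, 3.0]

# Hypothetical weights and bias of a single sigmoid output node.
weights, bias = [0.5, -0.25, 0.0, 0.0], 0.0
probability = sigmoid(sum(w * x for w, x in zip(weights, pooled)) + bias)
```

The returned probability is thresholded (commonly at 0.5) to assign the cell image to one of the two classes.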

FIG. 3.

ResNet-50 model architecture (a) and training results from the three CTC classification models (b). The architecture illustrates a forward pass through the ResNet-50 feature extractor on the top and the binary classifier on the bottom. The training results portray the model loss on the left and the accuracy obtained on the right. Each row consists of a CTC dataset, including DU145vsPC3, LnCAPvsPC3, and SKOV3drvsn. The blue color represents the training set, and the orange color represents the validation set.

The training results in Fig. 3(b) illustrate the model accuracy (left) and loss (right) for each dataset and epoch during training. The validation set in orange fluctuates considerably, while the training set in blue follows a generally monotonic upward or downward trend. However, with a TensorFlow checkpoint established, we were able to determine the best-performing epoch by monitoring a specific metric during training. Therefore, we kept only the model weights with the highest validation accuracy, as the validation images were not used in training, resulting in an improved test set accuracy.

The second model we constructed, EfficientNetV2, is a state-of-the-art classification model released by Google Brain in 2021 that was designed with a technique called neural architecture search (NAS).28,42,43 NAS is an influential form of architecture engineering that searches for the best model by sampling different ML architectures and calculating a reward, which is fed back to a controller to find the architecture with the highest reward. EfficientNetV2, derived from EfficientNetV1, uses this technique to optimize the training speed and includes a Fused-MBConv operation in the earlier layers of the model. The Fused-MBConv layer differs from the main block of EfficientNetV1 in that a regular 3 × 3 convolution is used instead of a depthwise convolution, as described in Fig. S2. The authors have also used techniques from several previous works to make the ML architecture more efficient in training, while keeping the floating-point operations (FLOPs) and the number of parameters reasonably low.27,34,42,44 Specifically, a newer method for large-scale training, e.g., on ImageNet, adjusts the regularization to be consistent with the image resolution during training. This algorithm, termed progressive learning with adaptive regularization, limits the regularization and image size during earlier epochs but increases both in later epochs. Although progressive learning was not applied to the model in this work, transfer learning and the techniques to decrease FLOPs and parameters yielded a training time about 4 h shorter than that of the well-known ResNet-50 model.
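
The structural difference between the two blocks can be made concrete by counting convolution weights. The sketch below is a simplified illustration that ignores biases, batch normalization, and squeeze-and-excitation layers and assumes an expansion factor greater than 1; the channel sizes and function names are hypothetical.

```python
def mbconv_weights(c_in, c_out, expansion, k=3):
    """Convolution weights in an MBConv block (simplified count)."""
    c_mid = c_in * expansion
    expand = c_in * c_mid        # 1 x 1 expansion convolution
    depthwise = k * k * c_mid    # k x k depthwise convolution, one filter per channel
    project = c_mid * c_out      # 1 x 1 projection convolution
    return expand + depthwise + project


def fused_mbconv_weights(c_in, c_out, expansion, k=3):
    """Convolution weights in a Fused-MBConv block (simplified count)."""
    c_mid = c_in * expansion
    fused = k * k * c_in * c_mid  # one regular k x k conv replaces expand + depthwise
    project = c_mid * c_out       # 1 x 1 projection convolution
    return fused + project


# Hypothetical block with 32 input and output channels and expansion factor 4.
print(mbconv_weights(32, 32, 4))        # 9344
print(fused_mbconv_weights(32, 32, 4))  # 40960
```

In this count, the fused block carries more weights than the depthwise version; the EfficientNetV2 authors nevertheless report that regular convolutions run faster on accelerators in the early, low-channel stages, which is consistent with the fused operation being restricted to the earlier layers.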

The main layers of the EfficientNetV2 model, known as the mobile inverted bottleneck (MBConv) and Fused-MBConv, are repeated several times in the main model. The main blocks of the architecture are described in Fig. 4(a), with the feature extraction (top) and binary classification (bottom) summarized as different segments in the graphic. The EfficientNetV2 feature extractor comprises a stem, with a regular convolution layer, and six separate blocks, each consisting of multiple repeats of a Fused-MBConv or MBConv layer.29 After the features are extracted into a 7 × 7 × 1280 tensor, the values are globally averaged, with each channel resulting in one value. Finally, the classification output is computed after three layers, two with a ReLU activation and one with a sigmoid activation. Each thin rectangle under a specific block of the network denotes the number of repetitions of the Fused-MBConv or MBConv layer.

FIG. 4.

EfficientNetV2B0 model architecture (a) and training results from the three CTC classification models (b). The architecture illustrates a forward pass through the EfficientNetV2 feature extractor on the top and the binary classifier on the bottom. The expansion factor (E) in each block denotes the factor by which the channels are expanded in the mobile inverted bottleneck (MBConv) block. The training results portray the model loss on the left and the accuracy obtained on the right. Each row consists of a CTC dataset, including DU145vsPC3, LnCAPvsPC3, and SKOV3drvsn. The blue color represents the training set, and the orange color represents the validation set.

The training results in Fig. 4(b) illustrate the model accuracy (left) and loss (right) for each dataset and epoch during training. Here, the DU145vsPC3 dataset is shown on the top, LnCAPvsPC3 in the middle, and SKOV3drvsn on the bottom, with the training set in blue and the validation set in orange. It is apparent that training with the EfficientNetV2 model gave better stability in model accuracy and loss. Notably, the validation loss stays below 0.75 for all three datasets, while the loss exceeds 400 with ResNet-50. Additionally, training was around 50% faster than ResNet-50 training due to the several optimizations introduced by utilizing NAS. We established another TensorFlow checkpoint by monitoring the validation accuracy during training and, thus, saved the weights with the best performance.

The ResNet-50 test set performance with all binary confusion matrices is displayed in Fig. 5. The top left gives the confusion matrix for DU145vsPC3, the top right for LnCAPvsPC3, and the bottom left for SKOV3drvsn. The bottom right presents the accuracy, precision, and recall for all datasets, showing favorable performance. The positive label was taken as the most invasive CTC (PC3 and drSKOV3) for the precision and recall computation. Notably, the dataset consisting of PC3 and DU145 CTCs gave the lowest accuracy despite having the highest number of training examples. As the highest number of training examples should ordinarily give the highest accuracy, we concluded that this could be due to PC3 and DU145 cells having a closer EpCAM expression level compared to that of PC3 and LnCAP.45 Studies show that LnCAP has a significantly higher expression level than DU145 and PC3, which leaves the DU145 and PC3 classes harder to separate and results in a more difficult descent to a local minimum of the loss.46 Moreover, the model may be overfitting from too many training examples, leading to a higher error rate in the test set. Future work is needed to explore this aspect further and determine why this dataset gives the lowest accuracy of the three.

FIG. 5.

Binary confusion matrices from the ResNet-50 model describing the test set results for the different classification models that were run. Each confusion matrix has the true labels for both classes on the vertical axis and the predicted labels on the horizontal axis. The accuracy is then calculated for each dataset as the number of correct predictions over the total number of predictions. The precision and recall are illustrated below the accuracy table with the positive label taken as the more invasive CTC (PC3 and drSKOV3).


The performance of the EfficientNetV2 model on our cancer cell datasets was systematically compared using the binary confusion matrices, which show the false predictions in the test set. Figure 6 shows the binary confusion matrices with the accuracy, precision, and recall in the bottom right. Here, the percent increase in accuracy for the DU145vsPC3 dataset (7.6%) shows that the EfficientNetV2 classification gives a substantially better prediction on the test set than the ResNet-50 model does. The accuracy is also slightly better for the other two datasets, giving an average percent increase of 3.5% over all three. The architecture obtained from NAS resulted in a more stable, faster, and better-performing model, allowing for higher accuracy in the classification of all datasets. Because correct classification is so important for cancer prediction, even this modest increase in accuracy can be useful in future applications. More importantly, the high accuracy obtained on these datasets reveals the potential of deep CNNs for CTCs and presents a new research strategy in subtype identification.
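The percent-increase comparison can be sketched as follows, assuming it denotes the relative change in test accuracy from ResNet-50 to EfficientNetV2 (the text does not state whether a relative change or a difference in percentage points is meant). The function names and the accuracy pairs are hypothetical and are not the reported results.

```python
from typing import Iterable, Tuple


def percent_increase(baseline: float, improved: float) -> float:
    """Relative change from baseline to improved, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


def mean_percent_increase(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average the per-dataset percent increases over (baseline, improved) pairs."""
    increases = [percent_increase(b, i) for b, i in pairs]
    if not increases:
        raise ValueError("at least one dataset is required")
    return sum(increases) / len(increases)


# Hypothetical (ResNet-50, EfficientNetV2) test accuracies for three datasets.
accuracies = [(0.90, 0.95), (0.96, 0.98), (0.95, 0.97)]
for base, new in accuracies:
    print(f"{base:.2f} -> {new:.2f}: {percent_increase(base, new):.1f}%")
print(f"mean: {mean_percent_increase(accuracies):.1f}%")
```

Averaging the per-dataset increases, rather than comparing the mean accuracies, weights each dataset equally regardless of its baseline.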

FIG. 6.

Binary confusion matrices from the EfficientNetV2 model describing the test set results for each dataset. Each confusion matrix has the true labels for both classes on the vertical axis and the predicted labels on the horizontal axis. The accuracy is then calculated for each dataset as the number of correct predictions (true positives plus true negatives) over the total number of predictions. The precision and recall are listed below the accuracy in the table, with the positive label taken as the more invasive cancer cell (PC3 and drSKOV3).


CTC subtype identification is an important issue in cancer diagnosis due to different levels of tumor invasiveness. Many of the systems currently available to diagnose cancer rely on complex procedures, and therefore, a less expensive tool is required to examine the disease progression. Deep learning offers a unique approach, separating images into different classes with CNNs, and shows promise for determining how metastatic a tumor is. In this work, two separate deep CNN architectures, ResNet-50 and EfficientNetV2, were explored to illustrate the potential for future clinical applications. ResNet-50, a popular image classification model, utilizes skip connections to learn high-level features in input images, while EfficientNetV2, a recent update to EfficientNetV1, utilizes MBConv layers to give better results with faster training. We systematically compared training and test set performance between the two models with the TensorFlow API and the Google Colaboratory platform. Three separate binary datasets were collected, representing DU145 with PC3 cells (DU145vsPC3), LnCAP with PC3 cells (LnCAPvsPC3), and SKOV3n with SKOV3dr cells (SKOV3drvsn). Each of these datasets was collected by manually cropping microscopic images, taken at 50× magnification, that contained multiple cells.

Training results show fluctuating loss and accuracy, but with a checkpoint implemented during training, the best model weights were utilized for testing. EfficientNetV2, which is optimized for fast training and has a low number of parameters, gave a roughly 50% increase in training speed. With the use of transfer learning for weight initialization, EfficientNetV2 showed positive results with high accuracy. Specifically, the average percent increase with the EfficientNetV2 model was 3.5% over the three datasets used. One constraint of this approach is that the cells need to be located first, whether captured in a liquid biopsy or removed from a solid tumor. Additionally, cells from a solid tumor would need to be surgically removed to gather microscopic images of the cells for metastatic diagnosis. Furthermore, the solid tumor needs to be decomposed into individual cells, and this process can affect the cells’ natural physiological conditions, resulting in inaccurate downstream analysis.47 Recent work has also reported the discovery of cancer cell clusters in blood circulation.48,49 Future work will include this cluster analysis; specifically, we will train the model on both single cancer cells and cancer clusters containing several non-cancer cells. Furthermore, with cultured cells showing positive performance, this study suggests that a model trained on a large-scale clinical dataset could give similar results. We plan to expand this research to clinical applications and to CTC identification methods to provide simpler diagnostic approaches.

Ham’s F-12K (Kaighn’s), Eagle’s Minimum Essential Medium (EMEM), Roswell Park Memorial Institute (RPMI)-1640, and McCoy’s 5A modified growth media were purchased from Fisher Scientific. DAPI and propidium iodide cell stains were purchased from Thermo Fisher. Paclitaxel (PTX) drug was purchased from Thermo Fisher. Trypsin–ethylenediaminetetraacetic acid (EDTA) was purchased from Invitrogen. The Petri dishes used as transparent containers for cell imaging were purchased from Antylia Scientific.

Human prostate cancer cell line PC3 (ATCC CRL-1435™, VA) was cultured at 37°C in an F-12K growth medium containing 10% FBS and 1% penicillin/streptomycin with the media being changed every two days. Human prostate cancer cell line DU145 (ATCC HTB-81™, VA) was cultured at 37°C in an EMEM growth medium containing 10% FBS and 1% penicillin/streptomycin with the media being changed every two days. Human prostate cancer cell line LnCAP (ATCC CRL-1740™, VA) was cultured at 37°C in an RPMI-1640 growth medium containing 10% FBS and 1% penicillin/streptomycin with the media being changed every two days. Human ovarian cancer cell line SKOV3 (ATCC HTB-77™, VA) was cultured at 37°C in a McCoy’s 5A modified growth medium containing 10% FBS and 1% penicillin/streptomycin with the media being changed every two days. The drug-resistant SKOV3 cancer cells were cultured at 37°C in a McCoy’s 5A modified growth medium containing 0.25 µM PTX, 10% FBS, and 1% penicillin/streptomycin with the media being changed every two days. The drug-resistant SKOV3 cells were acclimated to PTX by gradually increasing the concentration of the drug until the cells became resistant. After cells reached around 80% confluency, they were released from culture flasks by a 0.05% trypsin–EDTA solution at 37°C.

The cells were centrifuged and resuspended to a high concentration so that many cells could be seen in one microscopic image. Images were captured with an Olympus BX53 microscope with an XM10 CCD camera. After the micrograph images containing multiple cells were stored on a computer, they were manually cropped with a custom Python program using the OpenCV imaging library.50 Many of the cells in the original captured images were stuck together in doublets or triplets due to the inherent adhesion property of EpCAM. For these cells, we used a program to crop out an individual cell and filled the corners of those images with the average pixel, as shown in Fig. S3. Since research shows that clinical CTCs form clusters when entering the bloodstream, we included the cells cropped this way to make the ML model more generalizable to clusters. Although every class (PC3, DU145, LnCAP, SKOV3dr, and SKOV3n) included this special cropping, most of the cells used were singlets and did not require it.
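One way the special cropping for doublets and triplets might look is sketched below. This is an assumption-laden illustration and not the authors' OpenCV program: it uses NumPy only, assumes a grayscale image, treats the "corners" as the region outside a circle inscribed in the square crop, and takes the "average pixel" to be the mean intensity inside that circle. The function name and parameters are hypothetical.

```python
import numpy as np


def crop_single_cell(image: np.ndarray, center_xy, radius: int) -> np.ndarray:
    """Crop a square patch around one cell and fill its corners with the mean pixel.

    image: 2-D grayscale array; center_xy: (x, y) cell center in pixels.
    Pixels outside the inscribed circle are replaced by the mean intensity
    of the pixels inside the circle (an assumed definition of "average pixel").
    """
    cx, cy = center_xy
    y0, y1 = cy - radius, cy + radius + 1
    x0, x1 = cx - radius, cx + radius + 1
    if y0 < 0 or x0 < 0 or y1 > image.shape[0] or x1 > image.shape[1]:
        raise ValueError("crop window falls outside the image")
    # astype returns a copy, so the source image is left untouched.
    patch = image[y0:y1, x0:x1].astype(np.float64)
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    inside = xx**2 + yy**2 <= radius**2
    patch[~inside] = patch[inside].mean()
    return patch


# Synthetic 10x10 image standing in for a micrograph (illustration only).
demo = np.arange(100, dtype=np.float64).reshape(10, 10)
print(crop_single_cell(demo, center_xy=(5, 5), radius=2).shape)
```

Masking the corners in this way hides parts of neighboring cells that would otherwise appear in the square crop of a doublet or triplet.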

See the supplementary material for a comprehensive analysis of the ResNet blocks and MBConv layers used in the classification models. Additionally, more experimental analysis is presented on the cropping of single cancer cells.

W.L. acknowledges support from the National Science Foundation (CBET, Grant No. 1935792) and the National Institutes of Health (IMAT, Grant No. 1R21CA240185-01). Q. Lu acknowledges the startup fund from Texas Tech University.

The authors have no conflicts to disclose.

Karl Gardner: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Supervision (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Rutwik Joshi: Writing – original draft (equal); Writing – review & editing (equal). Md Nayeem Hasan Kashem: Data curation (equal); Writing – original draft (equal); Writing – review & editing (equal). Thanh Quang Pham: Data curation (equal); Writing – original draft (equal); Writing – review & editing (equal). Qiugang Lu: Conceptualization (equal); Formal analysis (equal); Writing – original draft (equal); Writing – review & editing (equal). Wei Li: Project administration (equal); Writing – review & editing (equal).

We believe that open-source code allows for greater innovation and cutting-edge research because others can incorporate additional features and testing. Scientific knowledge should not be contained; rather, it should be accessible for others to use freely in their own experiments. Furthermore, we would like to be transparent with this analysis and assist other researchers with more AI-inspired cancer cell projects. For this reason, our adoption of the ResNet-50 model and images from all datasets can be found at https://github.com/karl-gardner/cell_classification. Here, one can access the data associated with training and testing for all models (DU145vsPC3, LnCAPvsPC3, and SKOV3drvsn) in a Google Colaboratory notebook and a shared Google Drive folder.51 We encourage others to contribute or use our trained model for testing or production purposes.

1. F. Bray et al., “The ever‐increasing importance of cancer as a leading cause of premature death worldwide,” Cancer 127(16), 3029–3030 (2021).
2. J. Ferlay et al., “Cancer statistics for the year 2020: An overview,” Int. J. Cancer 149(4), 778–789 (2021).
3. D. Hanahan and R. A. Weinberg, “Hallmarks of cancer: The next generation,” Cell 144(5), 646–674 (2011).
4. H. Hamidi and J. Ivaska, “Every step of the way: Integrins in cancer progression and metastasis,” Nat. Rev. Cancer 18(9), 533–548 (2018).
5. D. Juric et al., “Convergent loss of PTEN leads to clinical resistance to a PI(3)Kα inhibitor,” Nature 518(7538), 240–244 (2015).
6. E. L. Kwak et al., “Molecular heterogeneity and receptor coamplification drive resistance to targeted therapy in MET-amplified esophagogastric cancer,” Cancer Discovery 5(12), 1271–1281 (2015).
7. M. Poudineh et al., “Profiling functional and biochemical phenotypes of circulating tumor cells using a two‐dimensional sorting device,” Angew. Chem., Int. Ed. 56(1), 163–168 (2017).
8. P. Mehlen and A. Puisieux, “Metastasis: A question of life or death,” Nat. Rev. Cancer 6(6), 449–458 (2006).
9. I. Pastushenko and C. Blanpain, “EMT transition states during tumor progression and metastasis,” Trends Cell Biol. 29(3), 212–226 (2019).
10. K.-A. Hyun et al., “Isolation and enrichment of circulating biomarkers for cancer screening, detection, and diagnostics,” Analyst 141(2), 382–392 (2016).
11. H. Cho et al., “Microfluidic technologies for circulating tumor cell isolation,” Analyst 143(13), 2936–2970 (2018).
12. S. Riethdorf et al., “Detection of circulating tumor cells in peripheral blood of patients with metastatic breast cancer: A validation study of the CellSearch system,” Clin. Cancer Res. 13(3), 920–928 (2007).
13. M. Labib and S. O. Kelley, “Circulating tumor cell profiling for precision oncology,” Mol. Oncol. 15(6), 1622–1646 (2021).
14. C. J. Creighton, “The molecular profile of luminal B breast cancer,” Biologics 6, 289–297 (2012).
15. N. H. Tran et al., “Precision medicine in colorectal cancer: The molecular profile alters treatment strategies,” Ther. Adv. Med. Oncol. 7(5), 252–262 (2015).
16. B. Jiang et al., “Machine learning of genomic features in organotropic metastases stratifies progression risk of primary tumors,” Nat. Commun. 12(1), 6692 (2021).
17. J. Barretina et al., “The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity,” Nature 483(7391), 603–607 (2012).
18. K. Fukushima, “A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biol. Cybern. 36, 193–202 (1980).
19. M. Toratani et al., “A convolutional neural network uses microscopic images to differentiate between mouse and human cell lines and their radioresistant clones,” Cancer Res. 78(23), 6703–6707 (2018).
20. N. Ahmed et al., “Identification of leukemia subtypes from microscopic images using convolutional neural network,” Diagnostics 9(3), 104 (2019).
21. S. Wang et al., “Label-free detection of rare circulating tumor cells by image analysis and machine learning,” Sci. Rep. 10(1), 12226 (2020).
22. G. Moallem et al., “Detection of live breast cancer cells in bright-field microscopy images containing white blood cells by image analysis and deep learning,” J. Biomed. Opt. 27(7), 076003 (2022).
23. D. Ribatti, R. Tamma, and T. Annese, “Epithelial-mesenchymal transition in cancer: A historical overview,” Transl. Oncol. 13(6), 100773 (2020).
24. M. Iwatsuki et al., “Epithelial–mesenchymal transition in cancer development and its clinical significance,” Cancer Sci. 101(2), 293–299 (2010).
25. F. M. Kashkooli et al., “A spatiotemporal multi-scale computational model for FDG PET imaging at different stages of tumor growth and angiogenesis,” Sci. Rep. 12(1), 10062 (2022).
26. M. A. Abazari, M. Soltani, and F. M. Kashkooli, “Targeted nano-sized drug delivery to heterogeneous solid tumor microvasculatures: Implications for immunoliposomes exhibiting bystander killing effect,” Phys. Fluids 35(1), 011905 (2023).
27. M. Tan and Q. V. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” arXiv Cornell University v5, 6105–6114 (2019), https://arxiv.org/abs/1905.11946.
28. M. Tan and Q. V. Le, “EfficientNetV2: Smaller models and faster training,” arXiv Cornell University v3, 10096–10106 (2021), https://arxiv.org/abs/2104.00298.
29. M. Tan et al., Google Brain AutoML, 2021, Google Brain: https://github.com/google/automl.
30. R. Cypess, R. Cheng, and R. Gowin, American Type Culture Collection (ATCC), 2023.
31. A. Ng, DeepLearning.AI, 2022 [cited 2022; Education Technology Company]. Available from: https://www.deeplearning.ai/.
32. S. Yadav, Weight initialization techniques in neural networks, 2018, https://towardsdatascience.com/.
33. X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, edited by T. Yee Whye and T. Mike (PMLR: Proceedings of Machine Learning Research, 2010), pp. 249–256.
34. P. Ramachandran, B. Zoph, and Q. V. Le, “Swish: A self-gated activation function,” arXiv:1710.05941v1 (2017).
35. K. He et al., “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015 (IEEE, 2015).
36. J. Deng et al., “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2009).
37. M. Abadi et al., “TensorFlow: A system for large-scale machine learning,” in Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (USENIX Association, Savannah, GA, 2016), pp. 265–283.
38. K. He et al., “Deep residual learning for image recognition,” arXiv:1512.03385 (2015).
39. Google Developers, Classification: Precision and Recall, 2022; Available from: https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall.
40. M. Raissi, Applied Deep Learning Course, 2023: Github.
41. P. Dwivedi, Understanding and coding a ResNet in Keras, 2019, Towards Data Science: https://towardsdatascience.com/.
42. M. Tan et al., “MnasNet: Platform-aware neural architecture search for mobile,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, 15–20 June 2019 (IEEE, 2019).
43. B. Zoph et al., “Learning transferable architectures for scalable image recognition,” arXiv:1707.07012 (2017).
44. M. Sandler et al., “MobileNetV2: Inverted residuals and linear bottlenecks,” arXiv Cornell University v4, 4510–4520 (2018), https://arxiv.org/abs/1801.04381.
45. Y. Xu, H. Zhao, and J. Hou, “Correlation between overexpression of EpCAM in prostate tissues and genesis of androgen-dependent prostate cancer,” Tumour Biol. 35(7), 6695–6700 (2014).
46. B. J. Green et al., “Phenotypic profiling of circulating tumor cells in metastatic prostate cancer patients using nanoparticle-mediated ranking,” Anal. Chem. 91(15), 9348–9355 (2019).
47. B. Arneth, “Tumor microenvironment,” Medicina 56(1), 15 (2020).
48. I. Krol et al., “Detection of clustered circulating tumour cells in early breast cancer,” Br. J. Cancer 125(1), 23–27 (2021).
49. E. Schuster, R. Taftaf, C. Reduzzi, M. K. Albert, I. Romero-Calvo, and H. Liu, “Better together: Circulating tumor cell clustering in metastatic cancer,” Trends Cancer 7(11), 1020–1032 (2021).
50. G. Bradski, The OpenCV Library. Dr. Dobb’s Journal of Software Tools, 2000.
51. K. Gardner, Cell Classification Repository, 2023, Github: https://github.com/karl-gardner/cell_classification.
