Atomic force microscope (AFM)-based nanomanipulation has proven to be a viable method for assembling various nanoparticles into complex patterns and devices. To achieve efficient and fully automated nanomanipulation, nanoparticles on the substrate must be identified precisely and automatically. This work focuses on an automatic detection method for flexible nanowires based on a deep learning technique. An instance segmentation network built on You Only Look Once version 3 (YOLOv3) and a fully convolutional network (FCN) is applied to segment all movable nanowires in AFM images. Combined with follow-up image morphology and fitting algorithms, this enables the postures and positions of nanowires to be detected at a high level of abstraction. Benefiting from these algorithms, our program automatically detects nanowires of different morphologies with nanometer resolution and achieves over 90% reliability on the testing dataset. The detection results are less affected by image complexity than those of existing methods, demonstrating the robustness of the algorithm.

  • A deep learning algorithm based on YOLOv3 and an FCN is applied to detect nanowires in AFM images.

  • Separate nanowires are abstracted as polygonal line segments for further manipulation with the AFM probe.

  • Experimental results prove the efficiency of the algorithm in automated nanomanipulation.

The assembly of nanoparticles, including a variety of low-dimensional materials, represents one of the most important areas of current nanotechnology. Nanoparticles can be used as building blocks of nanodevices or to make nanostructures with enhanced properties.1,2 In particular, as one-dimensional nanoparticles, nanowires have shown great potential for use in wearable sensors and other nanoelectronic devices.3–8 Consequently, the ability to achieve precise and rapid manipulation of nanowires has countless implications for both fundamental research and industrial applications. Since its invention in the 1980s,9 atomic force microscopy (AFM) has become the most popular tool to image and manipulate nanoparticles with nanometer resolution. However, moving a nanoparticle over a long distance (e.g., 10 μm) with an AFM probe often requires hundreds or even thousands of manipulation steps, which is rather time-consuming. To increase the efficiency of AFM-based manipulation, especially of nanowires, we have developed an automated nanomanipulation technique.10,11 Precise nanowire detection is the premise for successful manipulation, and therefore the degree of precision determines the extent to which manipulation can be automated. In addition, morphological abstraction of nanowires needs to be examined carefully, since inappropriate geometrical simplification may lead to significant displacement errors or failure of the whole manipulation. The alternative is manual handling of nanowires, which has very low efficiency. Therefore, great effort needs to be put into establishing a reliable and efficient nanowire detection algorithm to replace hands-on image processing.

Although many image processing techniques have been applied successfully to the detection of spherical nanoparticles,12–14 the detection of nanowires remains a more challenging task. This is mainly because most nanowires are flexible and hence can appear in different shapes. Their complex morphology makes them difficult to identify and abstract. Classical image morphology algorithms can barely meet the requirements for detecting nanowires under the complex conditions encountered in practice, because they rely on low-dimensional features and rigid, hand-crafted rules. Therefore, deep learning techniques have been proposed as possible solutions. Since 2012, deep learning15 methods based on neural networks, especially convolutional neural networks (CNNs), have dominated many computer vision applications, such as object detection,16 semantic segmentation,17 and human pose estimation.18 Neural networks can learn high-dimensional hierarchical features of targets from training datasets, and it is this that underlies the high performance of deep learning algorithms. Originating in computer science, deep learning techniques have found image processing applications in a wide range of fields, such as medical diagnosis,19 biological analysis,20 and material classification.21,22 Our team has adopted these techniques to accurately detect movable nanowires, with the aim of improving the automation of nanomanipulation procedures.

In this paper, we train an instance segmentation network based on You Only Look Once version 3 (YOLOv3)23–25 and a fully convolutional network (FCN)26 to segment movable nanowires in AFM images at a high level of abstraction. The process of detection using this network involves two stages, as does its training. First, an object detector simplified from YOLOv3 locates nanowires in the image with bounding boxes. Then, a tiny FCN is applied to the images cropped by the boxes to segment objects within them. This region-based segmentation approach outperforms a fully convolutional approach, which is directly applied to the initial image, because the object detector eliminates unnecessary pixels, which may interfere with the segmentation and cause errors. Combination of the network with follow-up morphology and fitting algorithms enables the posture and position of movable nanowires to be detected robustly. In this project, the training dataset and the testing dataset contain images of silver nanowires captured in our previous nanomanipulation experiments. The algorithm details and the test results will be introduced in the following sections.

Unlike an optical microscope, an AFM captures an image by sampling the atomic forces between a probe and the specimen in fixed steps along the x and y axes over a given region. With this mechanical method, it is possible to manipulate nanoparticles using the forces produced by pushing them with the AFM probe. The manipulation process involves two phases. First, the probe approaches the substrate until it comes into contact with it. In the second phase, the probe moves on the substrate along a path to the selected target position. If this path passes through a nanoparticle, the nanoparticle can be pushed by the probe toward the target position. However, this process of manipulation is time-consuming, because imaging and manipulation with an AFM cannot be executed simultaneously. An image must be captured after each manipulation path in order to check whether the nanoparticle has reached its target position. For several reasons, this process unavoidably has to be repeated many times, and hence the efficiency of nanomanipulation is very low. One reason is that a manually selected path may not actually pass through the nanoparticle. Even when the path is appropriate, the nanoparticle may slip and lose contact with the probe during the movement. For nanowire manipulation in particular, nanowires may break or become entangled when they adhere to the AFM probe. Consequently, automated nanomanipulation technology aims to enhance the efficiency of manipulating objects with AFM by decreasing the frequency of imaging required between rounds of manipulation.

The technique involves four steps, as shown in Fig. 1. First, an image is captured by AFM, and the movable nanowires in the image are automatically detected by a computer program. In the second step, a target position is assigned for each nanowire. In the third step, the order of moving the nanowires and the manipulation paths are generated according to special rules defined after modelling and simulation of the behavior of the nanowires. Finally, all manipulation paths are continuously executed by AFM without manual intervention. In this way, numerous paths are implemented in one round of manipulation, and consequently the imaging steps are less frequent. This technique greatly improves the efficiency of manipulating objects on the nanoscale and enables assembly of complex nanostructures in a relatively short time. For instance, Fig. 1(f) shows a case in which the number “1895” is assembled by nanomanipulation. The nanowire detection algorithm discussed in this paper is devised for the first step of our automated nanomanipulation technique.

FIG. 1.

Demonstration of the automated nanomanipulation technique: (a) AFM image of silver nanowires of diameter 30 nm lying on the silicon substrate; (b) detection of nanowires (marked in red); (c) target positions of nanowires (marked in white); (d) paths generated automatically for manipulation; (e) ideal manipulation result using AFM; (f) a case of manipulation in which the number “1895” is assembled using 20 nanowires of diameter 30 nm.


The input of the algorithm is a grayscale image of nanowires captured by a commercial AFM, which reflects the local topography of the sample. Nanowires that are tangled or overlap with other nanowires are excluded, since they cannot be manipulated predictably with AFM probes. Movable nanowires that lie completely within the scanned image area are the targets to be detected. For the output of the algorithm, we use a polygonal line to represent a nanowire, since most nanowires are flexible and can be straight, folded, or curved. In this way, the posture and position of a nanowire can be defined by the nodes of a polygonal line. Overall, the algorithm obtains these polygonal lines from a two-stage segmentation network followed by a series of morphology and fitting operations applied to the network output.

The architecture of the present network is illustrated schematically in Fig. 2. As an instance-first segmentation network, our network utilizes a simplified YOLOv3 to predict nanowires and obtain their bounding boxes. Then, a tiny FCN is used to generate object masks from the images cropped by the bounding boxes.

FIG. 2.

Architecture of the present segmentation network.


Based on Darknet-53,27 YOLOv3 is one of the fastest object detectors offering high detection precision. It integrates residual blocks (ResNet)28 to build a deep end-to-end network and uses convolutional layers with a stride of 2 for downsampling instead of pooling layers.29 Three output feature maps of different sizes predict the attributes of the bounding boxes. The width and height of the bounding boxes are expressed relative to those of designated anchor boxes. In total, nine anchor boxes obtained by K-means clustering30 on the Visual Object Classes (VOC)31 and Common Objects in Context (COCO)32 datasets are adopted in the original YOLOv3 network. Upsampling layers and concatenation operations are used to combine output arrays from different layers for prediction in the later feature maps.25 Except for the layers producing the output feature maps, each convolutional layer is followed by a rectified linear unit (ReLU) activation33 and batch normalization.34

In our PyTorch35 implementation of YOLOv3, only one feature map and one anchor box are retained for detection. As shown in Fig. 2, there are four downsampling layers behind the first convolutional layer, and hence the output feature map is one-sixteenth the size of the input image. These downsampling layers are followed by one, two, eight, and eight residual blocks, respectively. Finally, seven convolutional layers produce the output feature map.
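To make the structure concrete, the following is a minimal PyTorch sketch of a backbone organized in this way: one initial convolution, four stride-2 downsampling convolutions interleaved with 1, 2, 8, and 8 residual blocks, and a seven-layer head producing a single feature map. The channel widths and the six output channels per cell (four box offsets, an object score, and a class score) are illustrative assumptions rather than the exact values of our implementation.

```python
import torch
import torch.nn as nn

def conv_bn(c_in, c_out, k, s):
    # Convolution + batch normalization + ReLU, as used throughout the backbone.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=s, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class Residual(nn.Module):
    # Darknet-style residual block: 1x1 bottleneck, 3x3 convolution, skip connection.
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            conv_bn(channels, channels // 2, 1, 1),
            conv_bn(channels // 2, channels, 3, 1),
        )

    def forward(self, x):
        return x + self.block(x)

class TinyYOLOBackbone(nn.Module):
    """Sketch of the simplified detector: one initial convolution, four stride-2
    downsampling convolutions (so the map is 1/16 of the input) interleaved with
    1, 2, 8, and 8 residual blocks, then a seven-layer head. Channel widths and the
    per-cell output layout are illustrative assumptions."""
    def __init__(self, num_outputs=6):  # tx, ty, tw, th, object score, class score
        super().__init__()
        chans = [16, 32, 64, 128, 256]          # assumed widths
        blocks = [1, 2, 8, 8]
        layers = [conv_bn(1, chans[0], 3, 1)]   # AFM images are single-channel
        for i, n in enumerate(blocks):
            layers.append(conv_bn(chans[i], chans[i + 1], 3, 2))  # stride-2 downsampling
            layers += [Residual(chans[i + 1]) for _ in range(n)]
        self.body = nn.Sequential(*layers)
        head = [conv_bn(chans[-1], chans[-1], 3, 1) for _ in range(6)]
        head.append(nn.Conv2d(chans[-1], num_outputs, 1))          # 7th conv of the head
        self.head = nn.Sequential(*head)

    def forward(self, x):
        return self.head(self.body(x))

# A 416x416 input yields a 26x26 feature map (416 / 16).
print(TinyYOLOBackbone()(torch.zeros(1, 1, 416, 416)).shape)  # torch.Size([1, 6, 26, 26])
```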

As illustrated in Fig. 3, if the size of the input image is 112 pixels × 112 pixels, then the size of the output feature map will be 7 cells × 7 cells, which indicates that the object detector has divided the image into 7 × 7 regions. Each cell of the feature map is responsible for detecting the bounding box of a nanowire whose center lies in the corresponding region. For instance, the red region in Fig. 3 detects the nanowire in the image. A bounding box prediction comprises three types of parameters: box coordinates, object score, and class score. The four parameters (tx, ty, tw, th) are transformed into the coordinates (bx, by, bw, bh) of the box, expressed in units of the region size, by the following equations:

bx = σ(tx) + cx,  (1)
by = σ(ty) + cy,  (2)
bw = aw exp(tw),  (3)
bh = ah exp(th),  (4)
σ(t) = 1/[1 + exp(−t)].  (5)

(bx, by) are the center coordinates of the bounding box relative to the input image, (bw, bh) are the width and height of the box, (cx, cy) is the index of the cell in the output feature map, and (aw, ah) are the width and height of the anchor box.
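As an illustration, the standard YOLOv3 decoding of Eqs. (1)–(5) can be written as a small Python function; the variable names and example values are arbitrary:

```python
import numpy as np

def decode_box(t, cell_index, anchor):
    """Convert the raw predictions (tx, ty, tw, th) of one cell into box coordinates
    (bx, by, bw, bh), expressed in units of the feature-map cell size [Eqs. (1)-(4)]."""
    tx, ty, tw, th = t
    cx, cy = cell_index
    aw, ah = anchor
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))   # Eq. (5)
    bx = sigmoid(tx) + cx          # the box centre stays inside its cell
    by = sigmoid(ty) + cy
    bw = aw * np.exp(tw)           # width and height rescale the anchor box
    bh = ah * np.exp(th)
    return bx, by, bw, bh

# Example: a prediction in cell (3, 4) with a 2x2-cell anchor box.
print(decode_box((0.2, -0.1, 0.3, 0.0), (3, 4), (2.0, 2.0)))
```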

FIG. 3.

Concepts of the object detector: (a) image of a nanowire; (b) feature map; (c) bounding box prediction.


The object score p0 gives the probability that the bounding box contains a target object. The class score p1 represents the probability that the object is a nanowire. Although only one category of objects is detected in this project, the class score is retained for use in future work, in case categories of nanoparticles other than nanowires need to be detected.

After the bounding boxes of the nanowires have been acquired, a tiny FCN is applied to the cropped regional images to obtain the masks of the objects within them. An FCN is a classical deep learning network for semantic segmentation; we use one here because it has a simple structure and is sufficient for the relatively simple segmentation task in this project. FCNs are normally based on the convolutional part of Visual Geometry Group (VGG) networks.36 Given the limited complexity of the task, we employ a shorter and lighter FCN, whose performance we validated on the testing dataset. Our FCN model retains the structure of VGG-16 up to the third max-pooling layer.36 As illustrated in stage 2 of Fig. 2, seven convolutional layers and three max-pooling layers are involved in the encoding process. The numbers of output channels of the convolutional layers are 16, 16, 32, 32, 64, 64, and 64, which are also smaller than those of the corresponding layers in VGG-16. In the decoding process, three deconvolutional layers37 combine the outputs of the different max-pooling layers to generate a final pixelwise output of the same size as the input image. Since segmentation here is a binary classification problem, the output has two channels. To avoid overfitting on the training dataset, the FCN generates the mask of all ridge-like objects in the image region rather than that of nanowires alone, since other nanoparticles may also fall within the cropped region. Further operations, discussed in later sections, then select the mask of the nanowire.
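A minimal PyTorch sketch of an FCN with this layout is given below. The precise skip connections and the widths of the transposed convolutions are assumptions made for illustration; only the encoder channel counts and the two-channel output follow the description above.

```python
import torch
import torch.nn as nn

def conv(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True))

class TinyFCN(nn.Module):
    """Sketch of the lightweight FCN: a VGG-16-like encoder truncated after the third
    max-pooling layer with channel widths 16-16-32-32-64-64-64, and three transposed
    convolutions that fuse the pooled feature maps into a two-channel, full-resolution
    output. The exact skip wiring is an assumption."""
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(conv(1, 16), conv(16, 16))                  # -> pool1 (1/2)
        self.block2 = nn.Sequential(conv(16, 32), conv(32, 32))                 # -> pool2 (1/4)
        self.block3 = nn.Sequential(conv(32, 64), conv(64, 64), conv(64, 64))   # -> pool3 (1/8)
        self.pool = nn.MaxPool2d(2)
        self.up3 = nn.ConvTranspose2d(64, 32, 2, stride=2)   # 1/8 -> 1/4
        self.up2 = nn.ConvTranspose2d(32, 16, 2, stride=2)   # 1/4 -> 1/2
        self.up1 = nn.ConvTranspose2d(16, 2, 2, stride=2)    # 1/2 -> full size, 2 classes

    def forward(self, x):
        p1 = self.pool(self.block1(x))
        p2 = self.pool(self.block2(p1))
        p3 = self.pool(self.block3(p2))
        d3 = self.up3(p3) + p2          # fuse with earlier pooled feature maps
        d2 = self.up2(d3) + p1
        return self.up1(d2)             # (N, 2, H, W) pixelwise scores

print(TinyFCN()(torch.zeros(1, 1, 256, 256)).shape)  # torch.Size([1, 2, 256, 256])
```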

The two stages of the segmentation network are trained separately. In the first stage, the weights of the object detector are randomly initialized with the default method of PyTorch. To prevent overfitting, we use data augmentation38 by resizing the input images every 10 batches and randomly transposing them. The default size of the input images is 416 pixels × 416 pixels, and the random sizes are 320, 352, 384, 416, 448, 480, and 512 pixels square. The network is trained with the Adam optimizer39 using the recommended default parameters. The batch size is four, and the learning rate is 0.001. The loss function comprises seven parts:

Loss = kobj Lobj + Lclass + Lx + Ly + Lw + Lh + knoobj Lnoobj.  (6)

For cells in the target feature map that contain a nanowire, the loss comprises six terms: Lobj is the loss of the object score between detected cells and target cells, and kobj is its weight; Lclass is the loss of the class score; and Lx, Ly, Lw, and Lh are the coordinate losses between the predicted bounding boxes and the target boxes. For target cells containing no object, only Lnoobj, representing the loss of the object score, is counted, and the other losses are ignored; knoobj is the weight of Lnoobj and is set to be much larger than kobj in this project. Among these losses, Lobj, Lnoobj, and Lclass are cross-entropy losses,40 and Lx, Ly, Lw, and Lh are mean squared losses.
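The sketch below shows how such a loss could be assembled in PyTorch. The tensor layout, the use of binary cross entropy, and the masking details are assumptions for illustration rather than our exact implementation.

```python
import torch
import torch.nn.functional as F

def yolo_loss(pred, target, obj_mask, k_obj=1.0, k_noobj=100.0):
    """Sketch of the seven-part loss in Eq. (6). `pred` and `target` are float
    (N, 6, H, W) tensors holding (tx, ty, tw, th, objectness, class) per cell, and
    `obj_mask` is a boolean (N, H, W) map marking cells that contain a nanowire
    centre. The layout and weight values are assumptions."""
    obj, noobj = obj_mask, ~obj_mask
    # Cross-entropy terms for the object, no-object, and class scores.
    l_obj = F.binary_cross_entropy_with_logits(pred[:, 4][obj], target[:, 4][obj])
    l_noobj = F.binary_cross_entropy_with_logits(pred[:, 4][noobj], target[:, 4][noobj])
    l_class = F.binary_cross_entropy_with_logits(pred[:, 5][obj], target[:, 5][obj])
    # Mean-squared coordinate terms, counted only for cells responsible for a nanowire.
    l_x, l_y, l_w, l_h = (F.mse_loss(pred[:, i][obj], target[:, i][obj]) for i in range(4))
    return k_obj * l_obj + l_class + l_x + l_y + l_w + l_h + k_noobj * l_noobj
```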

The datasets for the first stage comprise 220 AFM images, which are randomly separated into two groups: 170 images for training and 50 images for validation. These images are labeled by hand with self-devised labeling software.

In the second stage, the weights of the FCN are also randomly initialized by the default method of PyTorch, and the optimizer is SGD41 with a learning rate of 0.001 and a momentum of 0.7. The batch size is 16. To avoid overfitting, the input images are randomly rotated or transposed before training. The images in these datasets are cropped from the AFM images annotated in the first stage. There are 896 regional images in total, 673 of which are used for training. Since these images are cropped by bounding boxes of varying size, they are first resized to 256 pixels × 256 pixels to facilitate batch training. The pixel values of the input images are normalized with a mean of 0.15 and a standard deviation of 0.15. The target masks for segmentation are one-hot encoded for training; they are annotated with black and white pixels using Photoshop. The loss function between the predictions and the target masks is the cross-entropy loss.
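The following sketch illustrates this stage-2 preprocessing under the stated settings; the exact augmentation recipe (90° rotations and transposition) is an assumption.

```python
import random
import torch
import torch.nn.functional as F

def preprocess_region(crop, mask, train=True):
    """Sketch of the stage-2 preprocessing: region images of arbitrary size are resized
    to 256x256 pixels, randomly rotated or transposed during training, and normalized
    with mean 0.15 and std 0.15. `crop` and `mask` are 2-D float tensors; the exact
    augmentation recipe is an assumption."""
    crop = F.interpolate(crop[None, None], (256, 256), mode="bilinear", align_corners=False)[0, 0]
    mask = F.interpolate(mask[None, None], (256, 256), mode="nearest")[0, 0]
    if train:
        k = random.randint(0, 3)                 # random 90-degree rotation
        crop, mask = torch.rot90(crop, k), torch.rot90(mask, k)
        if random.random() < 0.5:                # random transpose
            crop, mask = crop.t(), mask.t()
    crop = (crop - 0.15) / 0.15                  # normalization used in training
    return crop, mask.long()                     # integer class map for cross-entropy

# Training then uses SGD (lr = 0.001, momentum = 0.7), a batch size of 16, and a
# pixelwise cross-entropy loss between the 2-channel output and the target mask.
```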

As illustrated in Figs. 4(a) and 4(b), the segmentation network predicts the bounding boxes of nanowires and generates the masks of objects within the boxes. However, the masks may form more than one continuous white pixel area, such as the small white pixel areas marked by the red circles in Fig. 4(b). The mask of the target nanowire is considered to be the largest continuous white pixel area, and other smaller areas are eliminated, as shown in Fig. 4(c). A morphological filling operation removes any black pixel area within the selected mask. The skeletons of the nanowires are then refined by an image thinning algorithm, and after being pruned the skeletons involve only two endpoints, as illustrated in Fig. 4(d).
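A compact sketch of this mask post-processing, using SciPy and scikit-image and omitting the skeleton pruning step, could look as follows:

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize

def nanowire_skeleton(mask):
    """Sketch of the mask post-processing: keep the largest continuous white pixel
    area, fill any black pixel areas inside it, and thin it to a one-pixel-wide
    skeleton. `mask` is a boolean 2-D array; branch pruning is omitted here."""
    labels, n = ndimage.label(mask)                      # connected white areas
    if n == 0:
        return np.zeros_like(mask)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    largest = labels == (np.argmax(sizes) + 1)           # largest area = nanowire mask
    filled = ndimage.binary_fill_holes(largest)          # remove holes inside the mask
    return skeletonize(filled)                           # one-pixel-wide skeleton
```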

FIG. 4.

Posture and position abstraction of nanowires: (a) AFM image of nanowires; (b) masks of objects obtained by our segmentation network; (c) masks of nanowires; (d) skeletons of nanowires after thinning and features of the skeletons; (e) polygonal lines representing nanowires after the application of the fitting algorithm. The scale bar indicates 2 μm.


As shown in the magnified local image within the green circle in Fig. 4(d), the nanowire skeleton is an array of white pixels possessing the following two features:

  1. For two adjacent pixels, each is in the 8-connected neighborhood of the other.

  2. For two white pixels both adjacent to the same white pixel, neither is in the 4-connected neighborhood of the other.

Given the coordinates of the skeleton pixels, a nanowire can be abstracted as a polygonal line by separating the coordinates into groups and fitting a line to each group. We use the least squares method to fit these lines. Finally, by calculating the endpoints and the intersections between adjacent lines, the skeleton is simplified to the nodes of the polygonal line representing the posture and position of the nanowire. All the polygonal lines and their corresponding bounding boxes are plotted in the initial image, as shown in Fig. 4(e). The three magnified regional images are respectively examples of a straight nanowire, a folded nanowire, and a curved nanowire. A straight nanowire is abstracted with two endpoints, while a folded or curved nanowire requires more nodes for its representation.
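For illustration, the sketch below splits ordered skeleton coordinates into straight segments using total-least-squares line fits; the recursive splitting strategy and the tolerance value are assumptions, since the text specifies only that the segments are fitted by least squares.

```python
import numpy as np

def fit_line_tls(points):
    # Total-least-squares line fit: returns the centroid and the unit direction vector.
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[0]

def fit_polyline(points, tol=2.0):
    """Recursively split ordered skeleton coordinates wherever a single least-squares
    line fits poorly; the segment boundaries become the polyline nodes."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return [tuple(points[0]), tuple(points[-1])]
    centroid, direction = fit_line_tls(points)
    normal = np.array([-direction[1], direction[0]])
    dist = np.abs((points - centroid) @ normal)        # perpendicular distance to the fit
    i = 1 + int(np.argmax(dist[1:-1]))                 # split only at an interior pixel
    if dist[i] <= tol:
        return [tuple(points[0]), tuple(points[-1])]   # one straight segment suffices
    return fit_polyline(points[: i + 1], tol)[:-1] + fit_polyline(points[i:], tol)

# Example: an L-shaped skeleton collapses to a few nodes marking its two straight arms.
skeleton = [(x, 0) for x in range(20)] + [(19, y) for y in range(1, 15)]
print(fit_polyline(skeleton))
```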

For evaluation of the first stage, we report several standard evaluation metrics: precision, recall, and F1 score. For each of these metrics, a higher score indicates better performance. Precision P is the fraction of instances predicted to be a nanowire that are correct:

P = Σi 1(ŷi = yi = nanowire) / Σi 1(ŷi = nanowire).  (7)

Recall R is the fraction of instances with ground truth being a nanowire that are predicted to be a nanowire:

R = Σi 1(ŷi = yi = nanowire) / Σi 1(yi = nanowire).  (8)

Here, 1(·) denotes the indicator function, ŷi is the predicted class label for each bounding box, and yi is the corresponding ground truth class label; the only class is the nanowire. Equivalently, precision is the ratio of true positives to all (true and false) positives, and recall is the ratio of true positives to all ground-truth positives (true positives plus false negatives). Precision and recall are both essential metrics for evaluating a model, but neither of them alone can attest to its robustness.

Accordingly, the F1 score, which balances the influence of precision and recall, provides an overall evaluation of a model:

F1 = 2PR/(P + R).  (9)

For the evaluation of the FCN model, the intersection over union (IoU) metric for masks is applied. IoUmask is the ratio of the number of pixels shared by the predicted and target masks (true positives) to the number of pixels in their union (true positives, false positives, and false negatives):

IoUmask = Σi 1(ŝi = 1 and si = 1) / Σi 1(ŝi = 1 or si = 1),  (10)

where ŝi and si indicate pixels of the predicted mask and of the target mask, respectively.
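A direct NumPy implementation of Eq. (10) for a pair of binary masks is straightforward:

```python
import numpy as np

def mask_iou(pred, target):
    """IoU between a predicted and a target binary mask [Eq. (10)]: the number of
    pixels in their intersection over the number of pixels in their union."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    return np.logical_and(pred, target).sum() / union if union else 1.0
```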

In our testing dataset, 50 AFM images of nanowires are used to evaluate the object detector. Bounding boxes are considered correct when their object confidence and their IoU with the target boxes exceed the given thresholds. The threshold of IoU is denoted by THRIoU and that of object confidence by THRobj_conf. As listed in Table I, precision, recall, and F1 score are calculated with THRIoU ranging from 0.5 to 0.7 in steps of 0.1 and with THRobj_conf taking values of 0.001, 0.01, 0.1, 0.3, 0.5, and 0.7. Three F1 scores exceed 0.95, obtained with THRIoU of 0.5 and THRobj_conf of 0.01, 0.1, or 0.3 (Table I). When THRIoU is 0.5 and THRobj_conf is 0.01, the detector performs best, with a precision of 0.955 and a recall of 0.967 on the testing set. Such small values of THRobj_conf are effective because kobj was set to 1 and knoobj to 100 during training. As a result, the object confidence of cells predicting no nanowire in the feature map is suppressed toward 0, so even a small object confidence can be taken to signify the presence of a nanowire. Given that the small training and testing datasets may introduce bias, we recommend setting THRobj_conf to 0.3 in practical applications to obtain more reliable detection results. At the best-performing setting, both the precision and recall of this object detector exceed 0.95 on the testing dataset.

TABLE I.

Evaluation of the object detector.

THRIoU   Metric       THRobj_conf
                      0.001   0.01    0.1     0.3     0.5     0.7
0.5      Precision    0.931   0.955   0.966   0.966   0.966   0.972
         Recall       0.967   0.967   0.941   0.935   0.922   0.922
         F1 score     0.948   0.961   0.954   0.950   0.943   0.946
0.6      Precision    0.912   0.935   0.946   0.945   0.945   0.952
         Recall       0.948   0.948   0.922   0.915   0.902   0.902
         F1 score     0.929   0.942   0.934   0.930   0.923   0.926
0.7      Precision    0.906   0.929   0.946   0.945   0.945   0.952
         Recall       0.941   0.941   0.922   0.915   0.902   0.902
         F1 score     0.923   0.935   0.934   0.930   0.923   0.926

The output map of the FCN model has two channels, since background is regarded as a class in semantic segmentation. The second channel is used to generate the mask of objects in the regional images. The nanowire mask is then obtained by selecting the largest continuous area in this object mask and filling any black pixel areas within it. The pixel values of the output map range from 0 to 1, and the predicted mask is obtained by thresholding them: if the value of a pixel exceeds the threshold, it is set to 1; otherwise, it is set to 0. There are 223 images in the testing dataset to validate the FCN model. IoUmask of all objects and IoUmask of the target nanowire were calculated to assess the performance of the model, and their averages are listed in Table II. In general, both IoUmask of all objects and IoUmask of the target nanowire exceed 0.82, even when the threshold of pixel values is set as strictly as 0.9. We recommend that the threshold be set to 0.1 in practical applications of the algorithm.
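A sketch of this mask extraction step is given below; the softmax normalization of the two output channels is an assumption about how the pixel values are mapped to the range 0–1.

```python
import torch

def predict_mask(fcn_output, threshold=0.1):
    """Sketch of the mask extraction: the two-channel FCN output (N, 2, H, W) is turned
    into per-pixel object probabilities, and pixels above the chosen threshold (0.1 is
    the value recommended in the text) are set to 1. The softmax step is an assumption
    about how the two channels are normalized."""
    probs = torch.softmax(fcn_output, dim=1)[:, 1]   # second channel = object class
    return (probs > threshold).to(torch.uint8)       # binary mask, 1 = object pixel
```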

TABLE II.

Evaluation of the FCN model.

Threshold of     Average IoUmask    Average IoUmask
pixel values     of all objects     of target nanowire
0.1              0.860              0.843
0.3              0.856              0.837
0.5              0.852              0.832
0.7              0.847              0.828
0.9              0.842              0.823

In effect, the segmentation performed by the FCN plays the same role as the combination of two classical image processing operations: edge detection and hole filling. However, the deep-learning-based method gives better segmentation results. Figures 5(a)–5(d) are regional images cropped by the bounding boxes, in each of which a nanowire dominates. Figures 5(a′)–5(d′) are the masks obtained by edge detection and hole filling. We use the Canny operator42 with thresholds of 50 and 100 to detect edges; before that, the pixel values of the cropped image are rescaled into the range 0–255 to ensure that the thresholds are effective. Figures 5(a*)–5(d*) show the masks generated by our FCN model. For a simple image such as Fig. 5(a), the traditional approach gives a segmentation result [Fig. 5(a′)] comparable to that of the deep learning approach [Fig. 5(a*)]. However, when the substrate is scattered with small nanoparticles, as in Fig. 5(b), the traditional approach cannot remove these objects and hence gives a result like that in Fig. 5(b′). When the body of a nanowire extends partly beyond the boundary of the image, as in Fig. 5(c), edge detection cannot obtain closed edges, and the hole filling step is therefore ineffective. When the image conditions are more complex, as in Fig. 5(d) with conspicuous contamination of the substrate, the result [Fig. 5(d′)] of the traditional approach is poor: the mask of the contamination dominates the image, and selecting the largest white pixel area would cause the nanowire to be ignored. Comparison of the two approaches on these examples indicates that our FCN, as a deep-learning-based approach, can segment objects at a high level of abstraction by extracting high-dimensional hierarchical features from the training datasets. Nevertheless, morphology algorithms such as hole filling, erosion, and dilation remain useful for amending the segmentation masks obtained by the FCN model and thereby help to refine the skeletons of the nanowires.
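For reference, the traditional baseline can be sketched with OpenCV as follows; the morphological closing and contour-filling details are assumptions about how the hole filling is realized.

```python
import cv2
import numpy as np

def edge_fill_mask(crop):
    """Sketch of the traditional baseline: rescale the cropped AFM image to 0-255,
    detect edges with the Canny operator (thresholds 50 and 100), and fill the
    enclosed regions to obtain a mask. The closing and contour-filling steps are
    assumptions about how the hole filling is realized."""
    img = cv2.normalize(crop, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    edges = cv2.Canny(img, 50, 100)
    # Close small gaps so that nanowire outlines form closed contours, then fill them.
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8))
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    mask = np.zeros_like(img)
    cv2.drawContours(mask, contours, -1, 255, thickness=cv2.FILLED)
    return mask
```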

FIG. 5.

Segmentation results: (a)–(d) regional images cropped by bounding boxes; (a′)–(d′) masks obtained by edge detection and hole filling; (a*)–(d*) masks generated by our FCN model. Scale bars indicate 500 nm.


The present nanowire detection algorithm is applied to the AFM images in Figs. 6(a)–6(f), giving the results shown in Figs. 6(a′)–6(f′), which include the bounding boxes and the polygonal lines representing the nanowires. Six movable nanowires are correctly detected in the image in Fig. 6(a′). In the image in Fig. 6(b′), three nanowires are precisely detected, including a folded nanowire bent to an angle of nearly 0°, as marked with the blue ellipse. All movable nanowires in the image in Fig. 6(c) are detected, and the incomplete nanowires circled with the blue ellipses are successfully eliminated. In the image in Fig. 6(d), the two pairs of nanowires labeled with the blue ellipses lie quite close to each other, and these are also correctly identified. In the image in Fig. 6(e), the conditions are more complex: the nanowires have low contrast owing to the high obstacles at the top left corner of the image, and there are also several incomplete nanowires at the top of the image. Even so, all the nanowires are well detected by our algorithm. In the image in Fig. 6(f), 18 movable nanowires lie on the substrate together with much contamination, and some of them are very close to each other. Our algorithm finds 17 of them precisely; only the one nanowire marked in the image is detected incorrectly. Taking human annotation as the reference, the precision and recall of our algorithm on the testing dataset are 0.914 and 0.902, respectively. In the validation of the first stage, the precision and recall are 0.966 and 0.941, so the precision loss of 0.052 and the recall loss of 0.039 are caused by errors in the second stage. With an average of three nanowires per image, the average detection time on the testing dataset is 1.88 s, which is acceptable; the detailed time costs are listed in Table III. This algorithm greatly enhances the automation of the nanomanipulation technique. All of the above results indicate that our deep-learning-based nanowire detection algorithm is robust.

FIG. 6.

Detection using the nanowire detection algorithm: (a)–(f) AFM images of nanowires; (a′)–(f′) detection results of bounding boxes and polygonal lines representing nanowires. Scale bars indicate 2 μm.

TABLE III.

Average time cost.

Size (pixels)   Average time cost on YOLO (s)   Average time cost on the whole detection (s)
256 × 256       0.08                            1.60
416 × 416       0.12                            1.75
728 × 728       0.21                            1.87

In this paper, we have proposed a nanowire detection algorithm based on YOLOv3 and an FCN to segment movable nanowires in AFM images. With follow-up image morphology algorithms and a special fitting method, nanowires are detected at a high level of abstraction. We have demonstrated the high performance and the robustness of the algorithm. This data-driven approach expands the reach of traditional morphology algorithms to more complex nanowire features that have until now been difficult to treat in an automated fashion. The algorithm increases the efficiency of automated nanomanipulation, enabling it to meet the requirements of laboratory applications and increasing its potential for industrial use. This algorithm can generally be applied to images from other sources, not just AFM images. It can also be transplanted to other detection projects aimed at wire-shaped objects.

This research was supported by the National Natural Science Foundation of China (Grant No. 61973233). The open-source project PyTorch-YOLOv3 was essential to this work.43

1. Khan I, Saeed K, Khan I. Nanoparticles: properties, applications and toxicities. Arab J Chem 2019;12(7):908–931.
2. Li JF, Zhang YJ, Ding SY, et al. Core–shell nanoparticle-enhanced Raman spectroscopy. Chem Rev 2017;117(7):5002–5069.
3. Park J, Kim J, Kim K, et al. Wearable, wireless gas sensors using highly stretchable and transparent structures of nanowires and graphene. Nanoscale 2016;8(20):10591–10597.
4. You B, Han CJ, Kim Y, et al. A wearable piezocapacitive pressure sensor with a single layer of silver nanowire-based elastomeric composite electrodes. J Mater Chem A 2016;4(27):10435–10443.
5. Hashemi P, Ali K, Alexander R. Nanowire transistor structures with merged source/drain regions using auxiliary pillars. U.S. patent 9,257,527. 9 February 2016.
6. Wu WZ, Wang ZL. Piezotronic nanowire-based resistive switches as programmable electromechanical memories. Nano Lett 2011;11(7):2779–2785.
7. Mongillo M, Spathis P, Katsaros G, et al. Multifunctional devices and logic gates with undoped silicon nanowires. Nano Lett 2012;12(6):3074–3079.
8. Mai LQ, Tian XC, Xu X, et al. Nanowire electrodes for electrochemical energy storage devices. Chem Rev 2014;114(23):11828–11862.
9. Binnig G, Quate CF, Gerber C. Atomic force microscope. Phys Rev Lett 1986;56(9):930.
10. Liu HZ, Wu S, Zhang JM, et al. Strategies for the AFM-based manipulation of silver nanowires on a flat surface. Nanotechnology 2017;28(36):365301.
11. Wu S, Bai H, Jin F. Automated manipulation of flexible nanowires with an atomic force microscope. 2017 IEEE International Conference on Manipulation, Manufacturing and Measurement on the Nanoscale (3M-NANO). IEEE. 2017.
12. Wilson RS, Yang L, Dun A, et al. Automated single particle detection and tracking for large microscopy datasets. R Soc Open Sci 2016;3(5):160225.
13. Nicholson WV, Glaeser RM. Review: automatic particle detection in electron microscopy. J Struct Biol 2001;133(2–3):90–101.
14. Jak MJJ, Konstapel C, van Kreuningen A, et al. Automated detection of particles, clusters and islands in scanning probe microscopy images. Surf Sci 2001;494(2):43–52.
15. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–444.
16. Zhao ZQ, Zheng P, Xu ST, et al. Object detection with deep learning: a review. IEEE Trans Neural Network Learn Syst 2019;30(11):3212–3232.
17. Garcia-Garcia A, Orts-Escolano S, Oprea AO, et al. A review on deep learning techniques applied to semantic segmentation. arXiv:1704.06857 (2017).
18. Liu Z, Zhu JK, Bu JJ, et al. A survey of human pose estimation: the body parts parsing based methods. J Vis Commun Image Represent 2015;32:10–19.
19. Litjens G, Kooi T, Bejnordi BE, et al. A survey on deep learning in medical image analysis. Med Image Anal 2017;42:60–88.
20. Falk T, Mai D, Bensch R, et al. U-Net: deep learning for cell counting, detection, and morphometry. Nat Methods 2019;16(1):67–70.
21. Masubuchi S, Watanabe E, Seo Y, et al. Deep-learning-based image segmentation integrated with optical microscopy for automatically searching for two-dimensional materials. npj 2D Mater Appl 2020;4:1–9.
22. DeCost BL, Lei B, Francis T, et al. High throughput quantitative metallography for complex microstructures using deep learning: a case study in ultrahigh carbon steel. Microsc Microanal 2019;25(1):21–29.
23. Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). arXiv:1506.02640 (2016).
24. Redmon J, Farhadi A. YOLO9000: better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
25. Redmon J, Farhadi A. YOLOv3: an incremental improvement. arXiv:1804.02767 (2018).
26. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015.
27. Darknet: open source neural networks in C, http://pjreddie.com/darknet/.
28. He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016.
29. Scherer D, Müller A, Behnke S. Evaluation of pooling operations in convolutional architectures for object recognition. In: Artificial Neural Networks – ICANN 2010. Berlin, Heidelberg: Springer Berlin Heidelberg; 2010. pp. 92–101.
30. Na S, Liu X, Guan Y. Research on k-means clustering algorithm: an improved k-means clustering algorithm. 2010 Third International Symposium on Intelligent Information Technology and Security Informatics. IEEE. 2010.
31. Everingham M, van Gool L, Williams CKI, et al. The PASCAL visual object classes (VOC) challenge. Int J Comput Vis 2010;88(2):303–338.
32. Lin TY, Maire M, Belongie S, et al. Microsoft COCO: common objects in context. In: Computer Vision – ECCV 2014. Cham: Springer International Publishing; 2014. pp. 740–755.
33. Hara K, Saito T, Shouno H. Analysis of function of rectified linear unit used in deep learning. 2015 International Joint Conference on Neural Networks (IJCNN). IEEE. 2015.
34. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167 (2015).
35. Paszke A, Gross S, Massa F, et al. PyTorch: an imperative style, high-performance deep learning library. arXiv:1912.01703 (2019).
36. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014).
37. Shi W, Caballero J, Theis L, et al. Is the deconvolution layer the same as a convolutional layer? arXiv:1609.07009 (2016).
38. Perez L, Wang J. The effectiveness of data augmentation in image classification using deep learning. arXiv:1712.04621 (2017).
39. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv:1412.6980 (2014).
40. Zhang Z, Sabuncu MR. Generalized cross entropy loss for training deep neural networks with noisy labels. arXiv:1805.07836 (2018).
41. Bottou L. Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT'2010. Heidelberg: Physica-Verlag HD; 2010. pp. 177–186.
42. Ding LJ, Goshtasby A. On the Canny edge detector. Pattern Recognit 2001;34(3):721–725.
43. Linder-Norén E. PyTorch-YOLOv3, https://github.com/eriklindernoren/PyTorch-YOLOv3; 2019 (accessed May 2019).

Huitian Bai was born in Shenyang, China in 1996. He received a bachelor's degree in Precision Instrument Engineering from Tianjin University, China in 2018. From January to May 2018, he studied as a visiting student at TU Ilmenau, Germany. He is currently working toward a master's degree at the same university. His research focuses on machine learning and its application to nanomanipulation.

Sen Wu was born in Wuhan, China in 1982. He received his B.E. and Ph.D. from the School of Precision Instruments and Optoelectronics Engineering at Tianjin University in 2004 and 2012, respectively. From February 2008 to February 2009, he studied at TU Ilmenau, Germany as a visiting student. From 2012 to 2016, he worked as an assistant professor in the same school of Tianjin University, and in 2016, he was promoted to associate professor. Dr. Wu’s research interests include atomic force microscopy and auto-nanomanipulation technology. He won the Best Conference Paper award of IEEE 3M-NANO in 2017.