Video microscopy has a long history of providing insight and breakthroughs for a broad range of disciplines, from physics to biology. Image analysis to extract quantitative information from video microscopy data has traditionally relied on algorithmic approaches, which are often difficult to implement, time-consuming, and computationally expensive. Recently, alternative data-driven approaches using deep learning have greatly improved quantitative digital microscopy, potentially offering automated, accurate, and fast image analysis. However, the combination of deep learning and video microscopy remains underutilized, primarily because of the steep learning curve involved in developing custom deep-learning solutions. To overcome this issue, we introduce the software DeepTrack 2.0 to design, train, and validate deep-learning solutions for digital microscopy. We use this software to exemplify how deep learning can be employed for a broad range of applications, from particle localization, tracking, and characterization to cell counting and classification. Thanks to its user-friendly graphical interface, DeepTrack 2.0 can be easily customized for user-specific applications, and thanks to its open-source, object-oriented programming, it can be easily expanded to add features and functionalities, potentially introducing deep-learning-enhanced video microscopy to a far wider audience.

During the last century, the quantitative analysis of microscopy images has provided important insights for various disciplines, ranging from physics to biology. An early example is the pioneering experiment performed by Perrin in 1910 that demonstrated beyond any doubt the physical existence of atoms.1 Perrin manually tracked the positions of microscopic colloidal particles in a solution by projecting their image on a sheet of paper [Fig. 1(a)] and, despite a time resolution of just 30 s, he managed to quantify their Brownian motion and connect it to the atomic nature of matter. In the following decades, several scientists followed in Perrin's footsteps, improving the time resolution of the experiment down to just seconds2,8 [Fig. 1(b)]. Despite these improvements, manual tracking of particles intrinsically limits the time resolution of conceivable experiments.

FIG. 1.

Brief history of quantitative microscopy and particle tracking. (a)–(b) 1910–1950: The manual analysis era. (a) Examples of manually tracked trajectories of colloids in a suspension from Perrin's experiment that convinced the world of the existence of atoms.1 The time resolution is 30 s. (b) Kappler manually tracked the rotational Brownian motion of a suspended micromirror to determine the Avogadro number.2 (c)–(e) 1951–2015: The digital microscopy era. (c) Causley and Young developed a computerized microscope to count particles and cells using a flying-spot microscope and an analog analysis circuit.3 (d) Geerts et al. developed an automated method to track single gold nanoparticles on the membranes of living cells.4 (e) Crocker and Grier kickstarted modern particle tracking, achieving high accuracy using a largely setup-agnostic approach.5 (f)–(h) 2015–2020: The deep-learning-enhanced microscopy era. (f) Ronneberger et al. developed the U-Net, a variation of a convolutional neural network that is particularly suited for image segmentation and has been very successful for biomedical applications.6 (g) Helgadottir et al. developed software to track particles using convolutional neural networks (DeepTrack 1.0) and demonstrated that it can achieve higher tracking accuracy than traditional algorithmic approaches.7 (h) This article presents DeepTrack 2.0, which provides an integrated environment to design, train, and validate deep-learning solutions for quantitative digital microscopy. (b) Reprinted with permission from E. Kappler, Ann. Phys. 403, 233–256 (1931). Copyright 1931 John Wiley and Sons.2 (c) Reprinted with permission from Causley and Young, Nature 176, 453–454 (1955). Copyright 1955 Nature Publishing Group.3 (d) Reprinted with permission from Geerts et al., Biophys. J. 52, 775–782 (1987). Copyright 1987 Elsevier.24 (e) Reprinted with permission from Crocker and Grier, J. Colloid Interface Sci. 179, 298–310 (1996). Copyright 1996 Elsevier.5 (f) Reprinted with permission from Ronneberger et al., Int. Conf. Med. Image Comput. Comput. Assist. Interv. 234–241 (2015). Copyright 2015 Nature Springer.6


In the 1950s, analog electronics provided some tools to increase acquisition and analysis speed. According to Preston,9 the history of digital microscopy began in Britain in 1951 with an unlikely actor: the British National Coal Committee convened to investigate “the possibility of making a machine to replace the human observer” to measure coal dust in mining operations.10 In 1955, Causley and Young developed a flying-spot microscope to count and size particles and cells.3 The flying-spot microscope used a cathode-ray tube to scan a sample pixel by pixel, while the cells were counted and sized by a simple analog analysis circuit [Fig. 1(c)]. This device counted over an order of magnitude faster than human operators while maintaining the same accuracy.

During the 1950s and in earnest in the 1960s, researchers started employing digital computers to add speed and functionalities to microscopic image analysis, with a growing focus on biomedical applications. In 1965, Prewitt and Mendelsohn managed to distinguish cells in a blood smear by using a computer to analyze images obtained with a flying-spot microscope and recorded as 8-bit values on a magnetic tape.11 In the following years, digital microscopy went from research laboratories to clinical settings with the development of the computerized tomography scanner (CT scanner) in 197212 and the automated flow cytometer in 1974.13 

In soft matter physics, despite the early success of Perrin's experiment,1 most studies focused on the ensemble behavior of colloidal particles employing methods such as selective photobleaching and image correlation.14–16 These methods can resolve fast dynamics, but they can only measure the average behavior of a homogeneous colloidal solution.16 To overcome these limitations, Geerts et al. automated particle tracking in 1987, developing what is now known as single-particle tracking, and used it to track individual gold nanoparticles on the surface of living cells from images acquired with a differential interference contrast (DIC) microscope4 [Fig. 1(d)]. In the decades that followed, researchers have also used fluorescent molecules17–19 and quantum dots20,21 as tracers within biological systems.

It quickly became evident that highly accurate tracking algorithms were needed to analyze the collected data. In 1996, Crocker and Grier proposed an algorithm to determine particle positions based on the measurement of the centroids of their images5 [Fig. 1(e)]. The main advantage of this algorithm is that it is largely setup-agnostic, i.e., it does not depend on the specific properties of the imaging system or the particle. Other setup-agnostic approaches have been proposed in more recent years, e.g., the Fourier transform of the particle image,22 or its radial symmetry.23 Other algorithms, instead, modeled the image based on the properties of the imaging system and of the particle.16,24–29 These alternative methods were less general and often more computationally expensive, but they often achieved higher accuracy and could also provide quantitative information about the particle, such as its size30 or its out-of-plane position.31–33 Despite the large number of methods being introduced, digital video microscopy remained a hard problem, requiring the development of ad hoc algorithms tuned to the needs of each experiment. In fact, a 2014 study comparing 14 tracking methods on several simulated scenarios found that no single algorithm performed best in all of them.34

Only in the last few years has machine learning started to be employed for the analysis of images obtained from digital microscopy. This comes in the wake of the deep-learning revolution,35 thanks to which computer-vision tasks such as image recognition,36 semantic segmentation,37 and image generation38 are now automated with relative ease. Recent results have demonstrated the potential of applying deep learning to microscopy, vastly improving techniques for particle tracking,7,39,40 cell segmentation and classification,6,41–44 particle characterization,39,45,46 object counting,47 depth-of-field extension,48 and image resolution.49,50 In 2015, Ronneberger et al. developed a special kind of neural network (U-Net) for the segmentation of cell images6 [Fig. 1(f)], which is now widely used for the segmentation of biomedical images. In particle tracking, Hannel et al. employed deep learning to track and measure colloids from their holographic images,39 Newby et al. demonstrated how deep learning can be used for the simultaneous tracking of multiple particles,40 and Helgadottir et al. achieved tracking accuracy surpassing that of standard methods7 [Fig. 1(g)]. These early successes clearly demonstrate the potential of deep learning to analyze microscopy data. However, they also point to a key factor limiting the development and deployment of deep-learning solutions in microscopy: the availability of high-quality training data. In fact, training data often need to be experimentally acquired specifically for each application and manually annotated by experts, especially for biomedical applications, which is expensive, time-consuming, and potentially biased by the annotation process.51

In this article, we provide a brief review of the applications of deep learning to digital microscopy and introduce comprehensive software [DeepTrack 2.0, Fig. 1(h)] to design, train, and validate deep-learning solutions for quantitative digital microscopy. In Sec. II, we review the main applications of deep learning to microscopy and the most frequently employed neural-network architectures. In Sec. III, we introduce DeepTrack 2.0, which greatly expands the functionalities of the particle-tracking software DeepTrack 1.0,7 and features a user-friendly graphical interface and a modular (object-oriented) architecture that can be easily expanded and customized for specific applications. Finally, in Sec. IV, we demonstrate the versatility and power of deep learning and DeepTrack 2.0 by using it to tackle a variety of physical and biological quantitative digital microscopy challenges, from particle localization, tracking, and characterization to cell counting and classification.

In this section, we will start by providing an overview of machine learning and deep learning, in particular, introducing and comparing the deep-learning models that are most commonly used in microscopy: fully connected neural networks, convolutional neural networks, convolutional encoder-decoders, U-Nets, and generative adversarial networks. Subsequently, we will review some key applications of deep learning in microscopy, focusing on three key areas: image segmentation, image enhancement, and particle tracking.

Image segmentation partitions an image into multiple segments, each corresponding to a specific object (e.g., separating cells from the background or classifying cells of different kinds). In this context, deep learning has been very successful, especially in the segmentation of biological and biomedical images. However, one limiting factor is the need for high-quality training datasets, which usually must be acquired from experiments and manually annotated by experts (e.g., by drawing the cell edges on the images), a time-consuming and tedious task.

Image enhancement includes tasks such as noise reduction, deaberration, refocusing, and super-resolution. Deep learning, in this case, has been widely employed in the last few years, especially in the context of computational microscopy. Differently from image segmentation, image enhancement can often utilize training datasets that are directly acquired from experiments, without the need for manual annotation.

Particle tracking deals with the localization of objects (often microscopic colloidal particles or tracer molecules) in 2D or 3D. Deep-learning-powered solutions are more accurate than traditional approaches, work in extremely difficult environments with poor signal-to-noise ratios, and can extract quantitative information about the particles. Even though particle-tracking models can in principle be trained on experimentally acquired data, it is often difficult to determine the corresponding ground-truth values with sufficient accuracy. Therefore, it is often convenient to employ simulated training data, obtained from physical simulations of the required images, for which the ground truth is known exactly.

In contrast to standard computer algorithms where the user is required to define explicit rules to process the data, machine-learning algorithms can learn patterns and rules to perform specific tasks directly from a series of data. In supervised learning, machine-learning algorithms learn by adjusting their behavior according to a set of input data and the corresponding desired outputs (the ground truth). These input–output pairs constitute the training dataset, which can be obtained either from experiments or from simulations.

Deep learning is a kind of machine learning built on artificial neural networks (ANNs).35 ANNs were originally conceived to emulate the capabilities of the brain, specifically its ability to learn.52–54 They consist of interconnected artificial neurons (simple computing units often just returning a non-linear function of their inputs). Often, these artificial neurons are organized in layers (each typically containing tens or hundreds of artificial neurons). In the most commonly employed architectures, each layer receives the output of the previous layer, computes some transformation, and feeds the result into the next layer. In many machine-vision applications, the number of layers is in the tens (this number is the "depth" of the ANN; hence the term "deep learning").

The weights of the connections between artificial neurons and layers are the parameters that are adjusted in the training process. The training can be broken down into the following steps (referred to as the error backpropagation algorithm):55 First, the ANN receives an input and calculates a predicted output based on its current weights. Second, the output is compared to the true, desired output, and the error is measured using a loss function. Third, the ANN propagates this error backward, calculating for each weight whether it should be increased or decreased in order to reduce the error (i.e., a local estimate of how the error changes as a function of that weight). Finally, the weights are updated using an optimizer, which determines how much each weight should be changed. As the network is fed additional training data, its performance typically improves, gradually converging toward an optimal weight configuration.
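As a minimal illustration (a sketch, not part of DeepTrack 2.0), these four stages map directly onto a single training step written with Keras/TensorFlow; the model, loss, and data below are placeholders.

    import tensorflow as tf

    # Placeholder model, loss, and optimizer; only the structure of the
    # training step matters here.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
        tf.keras.layers.Dense(1),
    ])
    loss_fn = tf.keras.losses.MeanSquaredError()
    optimizer = tf.keras.optimizers.Adam()

    def train_step(x, y_true):
        with tf.GradientTape() as tape:
            y_pred = model(x, training=True)   # 1. forward pass with current weights
            loss = loss_fn(y_true, y_pred)     # 2. measure the error with a loss function
        # 3. backpropagate: gradient of the error with respect to each weight
        grads = tape.gradient(loss, model.trainable_weights)
        # 4. let the optimizer decide how much to change each weight
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        return loss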

In microscopy applications, the most commonly employed ANN architectures are dense neural networks, convolutional neural networks, convolutional encoder-decoders, U-Nets, and generative adversarial networks (Table I).

TABLE I.

A comparison of common deep-learning architectures. Advantages and disadvantages of deep-learning architectures commonly employed for microscopy, i.e., the dense neural network (DNN), the convolutional neural network (CNN), the convolutional encoder-decoder (CED), the U-Net, and the generative adversarial network (GAN). For each model, we also show a miniature example of the architecture, where gray lines with orange circles represent dense layers, blue rectangles represent convolutional layers, red rectangles represent pooling layers, and magenta rectangles represent deconvolutional layers. The arrows depict the forward concatenation steps.

Architecture: Dense neural network (DNN)
Advantages: • Can use all available information. • Can represent any transformation between the input and output. • Input and output can easily have any dimensions.
Disadvantages: • The number of weights increases quickly with the number of layers and the dimension of the input. • The input and output dimensions must be known in advance.

Architecture: Convolutional neural network (CNN)
Advantages: • Can be constructed with a limited number of weights. • Highly effective at extracting local information from images. • Analysis is position independent.
Disadvantages: • Cannot access global information. • Can be computationally expensive. • Difficult to retain an exact output shape.

Architecture: Convolutional encoder-decoder (CED)
Advantages: • Only the number of features needs to be known in advance. • Returns an image in output, which can be more interpretable by humans. • Can be trained as an auto-encoder without annotated data.
Disadvantages: • Positional information is lost during downsampling. • Can be hard to annotate data.

Architecture: U-Net
Advantages: • Retains positional information. • Only the number of features needs to be known in advance. • Returns an image in output, which can be more interpretable by humans.
Disadvantages: • Can quickly grow large. • Forward concatenation layers disallow use as an auto-encoder.

Architecture: Generative adversarial network (GAN)
Advantages: • Can create very realistic images. • Encourages high-frequency predictions. • Can be trained without annotated data.
Disadvantages: • Very hard to train. • The outputs are designed to look correct, not to be correct. • Output quality very sensitive to the details of the architecture.

The workhorse of ANNs is the dense neural network (DNN), which consists of a series of fully connected layers in sequence. While a sufficiently large DNN can approximate any function,56 the number of weights required quickly grows to unmanageable levels, especially for large inputs such as images. Furthermore, DNNs have a rigid structure in which the dimensions of both the input and the output are fixed. Therefore, when analyzing images, they are rarely used on their own; instead, they are often employed as the final stage of another network, generating the final output from already pre-processed data.

In contrast, convolutional neural networks (CNNs) are particularly useful to analyze images. They are primarily built upon convolutional layers. In each convolutional layer, a series of 2D filters (k × k matrices, with k being the filter size) are convolved with the input image, producing a series of feature maps as output. The size of the filters with respect to the input image determines the features that can be detected in each layer. To gradually detect larger features, the feature maps are downsampled after each convolutional layer. The downsampled feature maps are then fed as input to the next network layer. There is often a dense top after the last convolutional layer, i.e., a relatively small DNN that integrates the information contained in the output feature maps of the last layer to determine the sought-after result.
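As an illustrative sketch (the layer sizes, filter counts, and input shape are arbitrary placeholders), such a CNN with a dense top can be written in Keras as follows.

    import tensorflow as tf

    # A small CNN: convolution + downsampling blocks followed by a dense top.
    # All sizes are illustrative placeholders.
    cnn = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same",
                               input_shape=(64, 64, 1)),  # 3x3 filters -> feature maps
        tf.keras.layers.MaxPooling2D(2),                  # downsample to detect larger features
        tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(32, activation="relu"),     # small dense top
        tf.keras.layers.Dense(2),                         # e.g., an (x, y) position
    ])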

Convolutional encoder-decoders are convolutional neural networks composed of two paths. First, there is the encoder path, which reduces the dimensionality of the input through a series of convolutional and downsampling layers, therefore encoding the information about the original image. Then, there is the decoder path, which uses the encoded information to reconstruct either the original image or some transformed version of it (e.g., in segmentation tasks). Therefore, when trained to reconstruct the input image at the output, the information at the end of the encoder path can serve as a compressed version of the input image. When trained to reconstruct a transformed version of the input image, the encoded information can serve as a powerful representation of the input image useful for the specific task at hand.

U-Nets are an especially useful evolution of convolutional encoder-decoders. In addition to the encoder and decoder convolutional paths, U-Nets also feature forward concatenation steps between corresponding levels of these two paths. This allows them to preserve positional information that would otherwise be lost when the image resolution is reduced. They have been particularly successful in analyzing and segmenting biological and biomedical images.44,57–59
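A minimal single-level sketch (filter counts and depths are placeholders) shows the forward concatenation that distinguishes a U-Net from a plain encoder-decoder.

    import tensorflow as tf
    from tensorflow.keras import layers

    # One encoder level, a bottleneck, and one decoder level; the Concatenate
    # layer is the skip connection that forwards positional information.
    inputs = layers.Input(shape=(64, 64, 1))
    enc = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    down = layers.MaxPooling2D(2)(enc)                                # encoder downsampling
    bottleneck = layers.Conv2D(32, 3, activation="relu", padding="same")(down)
    up = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(bottleneck)
    skip = layers.Concatenate()([up, enc])                            # forward concatenation
    dec = layers.Conv2D(16, 3, activation="relu", padding="same")(skip)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(dec)          # e.g., a segmentation mask
    unet = tf.keras.Model(inputs, outputs)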

Differently from the previous architectures, generative adversarial networks (GANs) combine two networks, a generator and a discriminator, regardless of their specific architectures.60 The generator manufactures data, usually images, from some input. The discriminator, in turn, classifies its input as either real data or synthetic data created by the generator. The term adversarial refers to the fact that these two networks compete against each other: the generator tries to fool the discriminator with manufactured data, while the discriminator tries to expose the generator. The generator can be trained either to transform images, by feeding it a real image as input, or to generate images from scratch, by feeding it a random input. The generator is typically either a convolutional encoder-decoder or a U-Net, while the discriminator is often a convolutional neural network. While GANs are a breakthrough for data generation and offer many benefits, they are difficult to train and highly sensitive to hyperparameter tuning: slight changes in their overall architecture can lead to vanishing gradients, lack of convergence, and a generator loss that is uncorrelated with image quality.61,62
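The adversarial objective can be sketched as a single alternating training step; the tiny generator and discriminator below are placeholders standing in for, e.g., a U-Net generator and a CNN discriminator.

    import tensorflow as tf

    # Placeholder networks; in practice the generator is typically a CED or
    # U-Net and the discriminator a CNN.
    generator = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(16,)),
        tf.keras.layers.Dense(28 * 28, activation="sigmoid"),
        tf.keras.layers.Reshape((28, 28, 1)),
    ])
    discriminator = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),                        # real/fake logit
    ])
    bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
    g_opt = tf.keras.optimizers.Adam(1e-4)
    d_opt = tf.keras.optimizers.Adam(1e-4)

    def adversarial_step(real_images, batch_size=32):
        noise = tf.random.normal((batch_size, 16))
        with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
            fake_images = generator(noise, training=True)
            real_logits = discriminator(real_images, training=True)
            fake_logits = discriminator(fake_images, training=True)
            # The discriminator tries to expose the generator...
            d_loss = (bce(tf.ones_like(real_logits), real_logits)
                      + bce(tf.zeros_like(fake_logits), fake_logits))
            # ...while the generator tries to fool the discriminator.
            g_loss = bce(tf.ones_like(fake_logits), fake_logits)
        d_grads = d_tape.gradient(d_loss, discriminator.trainable_weights)
        g_grads = g_tape.gradient(g_loss, generator.trainable_weights)
        d_opt.apply_gradients(zip(d_grads, discriminator.trainable_weights))
        g_opt.apply_gradients(zip(g_grads, generator.trainable_weights))
        return d_loss, g_loss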

Deep learning has been extremely successful at segmentation tasks, especially for biomedical applications,44,64–73 but also in materials science.74,75 Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation assigns a label to every pixel in an image such that pixels with the same label share certain characteristics (e.g., represent objects of the same type).

Generally, deep-learning models performing image segmentation are trained using experimental images that need to be manually annotated by experts. In some cases, to alleviate the need for annotated images, pre-trained neural networks are employed (i.e., neural networks that have been trained for classification tasks on a large dataset of different images, often not directly related to the task at hand) and fine-tuned using a relatively small set of manually annotated data.44,64

If the exact topography of the sample is not needed, one can downsample the image using several convolutional layers, obtaining a coarse classification of its various regions. For example, this approach was used by Coudray et al.42 to distinguish cancerous lung cells from normal tissue by fine-tuning a pre-trained neural network for image analysis and object detection (Inception v363) [Fig. 2(a)].

FIG. 2.

Image segmentation with deep learning. (a) Lung cell classification and mutation prediction,42 using Inception v3:63 Non-overlapping tiles in the input are analyzed, returning a low-resolution segmentation mask, containing either just a binary classification as tumor or healthy, or a complete prediction of the mutation type. (b) The U-Net architecture as originally proposed by Ronneberger6 differs from the convolutional encoder-decoder (CED) by the addition of forward concatenation steps between the encoder and decoder parts, which allow the network to forward positional information lost during encoding. The network is used to segment nearly overlapping cells. (c) A cell segmentation software, based on a model closely resembling the U-Net,64 can be automatically retrained by feeding it additional fluorescence images. (a) Reprinted with permission from Coudray et al., Nat. Med. 24, 1559–1567 (2018). Copyright 2018 Nature Springer.42 (b) Reprinted with permission from Ronneberger et al., Int. Conf. Med. Image Comput. Comput. Assist. Interv. 234–241 (2015). Copyright 2015 Nature Springer.6 (c) Reprinted with permission from Sadanandan et al., Sci. Rep. 7, 7860 (2017). Copyright 2017 Author(s), licensed under a Creative Commons Attribution (CC-BY) license.64


Much more frequently, image segmentation uses convolutional encoder-decoders and U-Nets.44,57–59 In fact, U-Nets were originally developed for cell segmentation, where one of the key requirements is to clearly mark the cell edges, such that neighboring cells can be distinguished6 [Fig. 2(b)].

High-quality image segmentation annotations are time-consuming to obtain, which is why many researchers opt to design networks that can be retrained for a specific task using a much smaller dataset. For example, Sadanandan et al. developed a neural network that can automatically be retrained using fluorescently labeled cells64 [Fig. 2(c)]. With such an approach, the neural network can easily be adapted to different experimental setups, even though the process requires some experimental effort in acquiring the additional training data.

Segmentation has also been used for three-dimensional images. For example, Li et al. used a three-dimensional convolutional neural network to reconstruct the interconnections between biological neurons.76 Similar approaches have also been employed for the volumetric reconstruction of organs.77,78

Deep learning has been widely employed for image enhancement. This is particularly interesting because it makes it possible to analyze and transform images in ways that would be extremely difficult or impossible with conventional microscopy because of intrinsic physical limitations.

Deep learning has been employed to achieve super-resolution by using diffraction-limited images to reconstruct images beyond the diffraction limit. For example, Ouyang et al. trained a GAN to imitate the output of the standard super-resolution method photoactivated localization microscopy (PALM),79 significantly improving the resolution of fluorescence images50 [Fig. 3(a)].

FIG. 3.

Image enhancement with deep learning. (a) Fluorescence super-resolution localization microscopy using deep learning:50 It uses sparse PALM79 [optionally together with a widefield (WF) image], to construct a super-resolved image. We see how the quality of the produced image increases with the acquisition time. (b) A GAN is used to transform holography images to brightfield.80 From top to bottom, we see holographic images of pollen, backpropagated to the focal plane, and finally transformed into bright-field images. The bottom set of images are the real bright-field images for comparison. (c) A GAN is used to convert quantitative phase images (in-line holography) into virtual tissue stainings, mimicking histologically stained bright-field images.81 (a) Reprinted with permission from Ouyang et al., Nat. Biotechnol. 36, 460–468 (2018). Copyright 2018 Nature Springer.50 (b) Reprinted with permission from Wu et al., Light Sci. Appl. 8, 25 (2019). Copyright 2019 Author(s), licensed under a Creative Commons Attribution (CC-BY) license.80 (c) Reprinted with permission from Rivenson et al., Light Sci. Appl. 8, 2047 (2019). Copyright 2019 Author(s), licensed under a Creative Commons Attribution (CC-BY) license.81 


An interesting application of deep learning is to realize cross-modality analysis, where a neural network learns how to translate the output of a certain optical device to that of another. For example, Wu et al. used a U-Net to translate between holography and bright-field microscopy, enabling volumetric imaging without the speckle and artifacts associated with holography80 [Fig. 3(b)]. This method uses experimental pairs of images collected simultaneously by two different optical devices.

Further, deep learning can be used to generate images that cannot be obtained directly from a particular optical device or sample. For example, Rivenson et al. used phase information obtained from holography to create a virtually-stained sample corresponding to a histologically-stained bright-field image81 [Fig. 3(c)].

Image-enhancement techniques typically train networks using experimental images that do not need any manual annotation. Either the target is calculated using known methods, or it is collected simultaneously using an alternate path for the light. While this reduces the amount of manual labor required, both approaches have their drawbacks. A network trained to imitate a traditional method is unlikely to improve upon it on primary metrics; instead, it can improve by tolerating less ideal inputs or by reducing the execution time. On the other hand, using a dual microscope leads to networks specialized for the optical devices used to acquire the training images. Moreover, such a dual-purpose microscope is usually nonstandard, requiring the user to alter and customize their setup.

Single-particle tracking has become a crucial tool for probing the microscopic world. Standard approaches are typically limited by the complexity of the system: higher particle densities, higher levels of noise, and more complex point-spread functions often lead to worse results. Developments using deep learning have shown that it is possible to largely overcome these limitations. A major advantage of deep-learning solutions for particle tracking is that simulated data can frequently be used to train the networks.

Newby et al. demonstrated that deep learning can be used for the detection of particles in high-density, low-signal-to-noise-ratio images.40 Their method uses a small CNN to construct a pixel-by-pixel classification probability map of background vs particle [Fig. 4(a)]. Standard algorithms can then be applied to this probability map to track the particles.

FIG. 4.

Particle tracking with deep learning. (a) Particle detection in dense images of varying diffraction patterns,40 where a relatively small network of three convolutional layers estimates a pixel-by-pixel probability map of background vs particle. (b) High-accuracy, single-particle localization using a CNN with a dense top.7 The network is scanned across the image to detect and localize all particles (dots) and bacteria (circles) in the image. (c) Particle tracking and characterization in terms of radius and refractive index using in-line holography images,46 where bounding boxes for each particle in the field of view are extracted and fed to a CNN. They showcase accurate measurements on data for particles between 0.5 and 1.5 μm. (d) Particle characterization in terms of radius and refractive index using off-axis holography images.45 A CNN with latent space temporal averaging is used to measure multiple observations of a single particle to improve accuracy. This allows characterization of particles down to around 0.2 μm. (a) Reprinted with permission from Newby et al., Proc. Natl. Acad. Sci. U. S. A. 115, 9026–9031(2018). (c) Reprinted with permission from Altman and Grier, J. Phys. Chem. B 124, 1602–1610 (2020). Copyright 2020 American Chemical Society.46 (d) Reprinted with permission from Midtvedt et al., ACS Nano (2021). Copyright 2021 Authors, licensed under a Creative Commons Attribution (CC-BY) license.45 


Helgadottir et al. achieved a tracking accuracy surpassing that of traditional methods using a convolutional neural network with a dense top to detect particle centroids in challenging conditions7 [Fig. 4(b)].

Along with particle localization, deep learning has also been used to measure other characteristics of particles. For example, Altman and Grier used a convolutional neural network to measure the radius and refractive index of images of colloids acquired by an in-line holographic microscope46 [Fig. 4(c)]. Midtvedt et al. used an off-axis microscope and a time-average convolutional neural network to measure the radius and refractive index of even smaller particles45 [Fig. 4(d)].

Moreover, deep learning has been used for micro-tubule tracking,82 3D tracking of fluorescence images,83,84 intra-cellular particle detection,85,86 nanoparticle sizing in transmission electron microscopy (TEM),87 frame-to-frame linking,88 and single-particle anomalous diffusion characterization.89–91 

In this section, we introduce DeepTrack 2.0, which is an integrated software environment to design, train, and validate deep-learning solutions for digital microscopy.92 DeepTrack 2.0 builds on the particle-tracking software package DeepTrack, which we introduced in 2019,7 and greatly expands it beyond particle tracking toward a whole new range of quantitative microscopy applications, such as classification, segmentation, and cell counting.

To accommodate users with any level of experience in programming and deep learning, we provide access to the software through several channels, from a high-level graphical user interface that can be used without any programming knowledge, to scripts that can be adapted for specific applications, and even to a low-level set of abstract classes for implementing new functionalities. Furthermore, we provide tutorials for using the software at each level of complexity, including several video tutorials that guide the user through each step of a deep-learning analysis for microscopy: defining the training-image generation routine, choosing the neural-network model, training and validating the network, and applying the trained network to real data.

As the main entry point, we provide a completely stand-alone graphical user interface, which delivers all the power of DeepTrack 2.0 without requiring programming knowledge. This is available for Windows and for macOS.93 In fact, we recommend that all users start with the graphical user interface, which provides a visual approach to deep learning and an intuitive feel for how the various software components interact.

When more precise control is desired, we recommend that users explore the available Jupyter notebooks,92 which provide complete examples of how to write scripts for DeepTrack 2.0.

For most applications, DeepTrack 2.0 already includes all necessary components. However, if more advanced functionalities are required, it is easy to extend DeepTrack 2.0 by building on its framework of abstract objects and its native communication with the popular deep-learning package Keras.94 In fact, we expect the most advanced users to expand the functionalities of DeepTrack 2.0 according to their needs.

All users are also welcome to report any bugs and to propose additions to DeepTrack 2.0 through its GitHub page.92 

The graphical user interface of DeepTrack 2.0 provides an intuitive way to perform the various steps that are necessary for the realization of a deep-learning analysis for microscopy. Through the graphical user interface, users can define and visualize image generation pipelines [Fig. 5(a)], train models [Figs. 5(b)–5(c)], and analyze experimental data [Fig. 5(d)].

FIG. 5.

DeepTrack 2.0 graphical user interface. (a) The main interface: (1) The image generation pipeline is defined using drag and drop components. (2) An image created using the pipeline and the corresponding label are shown. (3) A comparison image is also shown to help ensure that the generated image is similar to experimental images. (b) The training loss and validation loss over time can be monitored in real time during training. It is also possible to monitor custom metrics, or any metric as a function of some property of the image (e.g., particle size, signal-to-noise ratio, aberration strength). (c) The model prediction on individual images in the validation set can be compared to the corresponding target in real time during training, providing another way to concretely visualize the improvement of the model performance over time. (d) Finally, the model can be evaluated on experimental images also during training, which can help quickly hone in on a model that correctly handles specific experimental data.


The following is a typical workflow:

  1. Define the image generation pipeline, e.g., a pipeline to generate images of a particle corrupted by noise.

  2. Define the ground-truth training target, e.g., the particle image without noise (image target), or the particle position (numeric target).

  3. Define a deep-learning model, e.g., U-Net.

  4. Train and evaluate the deep-learning model.

  5. Apply the deep-learning model to the user's experimental data.

Projects realized with DeepTrack 2.0 can be saved and subsequently loaded, which is useful for archival purposes, as well as to share deep-learning models and results between users and platforms. Furthermore, projects can be automatically exported as Python code, which can then be executed to train a network from the command line or imported into an existing project.

At a more advanced level, it is possible to extend the capabilities of the graphical user interface by adding new Python-coded objects. We envision that this possibility will motivate users to create and share additional software compatible with DeepTrack 2.0.

We provide several Jupyter notebooks, both as examples of how to write scripts using DeepTrack 2.0, and as a foundation to create customized solutions. To facilitate this, we provide several video tutorials92 that detail how the solutions are constructed and how they can be modified.

The software architecture of DeepTrack 2.0 (Fig. 6) is built on four main components: features, properties, images, and deep-learning models.

FIG. 6.

DeepTrack 2.0 framework. (a) An example of an image generation pipeline composed of five features. Five distinct ellipses are generated and passed to a fluorescence microscope simulator, which produces an image to which a constant background and Poisson noise are added. (b) The positions of the ellipses are imprinted on the image, allowing us to create a ground-truth label, where each particle is represented by a small circle. (c) A generator that can continuously create images and corresponding labels is used to train a deep-learning model. Typically, many thousands of images are created to train the neural network. (d) The trained model is then able to analyze experimental data. (The image shown here is not experimental and is only used for demonstration purposes.)

  • Features: They are the foundations on which DeepTrack 2.0 is built. They receive a list of images as input, and either apply some transformation to all of them (e.g., adding noise), or add a new image to the list (e.g., adding a scatterer), or merge them into a single image (e.g., imaging a list of scatterers through an optical device). By defining a set of features and how they connect, we produce a single feature that defines the entire image creation process.

  • Properties: Properties are the parameters that determine how features operate. For example, a property can control the position of a particle, the intensity of the added noise, or the amount by which an image is rotated. They can have a constant value or be defined by a function (e.g., to place a particle at a random position), which can depend on the value of other properties, either from the same feature or from other features.

  • Images: Images are the objects on which the features operate. They behave like n-dimensional NumPy arrays (ndarray) and can therefore be directly used with most Python packages. They contain a list of the property values used by the features that created the image, which can be used to generate ground-truth labels for training neural networks, as well as to evaluate how the error in deep-learning models depends on the properties defining the image (e.g., signal-to-noise ratio, background gradient, and illumination wavelength).

  • Models: These are the deep-learning models. A series of standard models is already provided in DeepTrack 2.0, including models for DNNs, CNNs, U-Nets, and GANs. In each case, the parameters of the model (e.g., the number of layers and the number of artificial neurons) can be defined by the user.

DeepTrack 2.0 solutions depend on the interactions between these objects. In general, a feature can perform three typical operations. The first operation is to add an image to a list of images (this is notably done by Scatterer features),

\[
\big[I_1[P_1], \ldots, I_n[P_n]\big] \;\xrightarrow{F(P)}\; \big[I_1[P_1], \ldots, I_n[P_n], I[P]\big].
\]

Here, a list of n images is fed to the feature F. Each of these images has a list of properties Pi, which describe the process used to create that image. The feature is controlled by some properties P and returns a new list of images. The first n images are unchanged, but a new image I, on which the properties P are imprinted, is appended to the end.

The second operation is to transform all images in the list in some way (the standard behavior of features, including noise, augmentations, and most mathematical operations),

\[
\big[I_1[P_1], \ldots, I_n[P_n]\big] \;\xrightarrow{F(P)}\; \big[I_1[P_1, P], \ldots, I_n[P_n, P]\big].
\]

Here, the feature returns a list of the same length, but each image is altered (e.g., some noise is added or it is rotated). The properties P characterizing this alteration are imprinted on all images.

The third operation is to merge several images into a single image (this is notably done by optical devices),

\[
\big[I_1[P_1], \ldots, I_n[P_n]\big] \;\xrightarrow{F(P)}\; \big[I[P_1, \ldots, P_n, P]\big].
\]

Here, all the properties of the input images, as well as the feature's own properties, are imprinted on the resulting image.

A typical complete image generation pipeline can look something like

\[
[\,] \;\xrightarrow{F_{s1}(P_{s1})}\; \big[I_{s1}[P_{s1}]\big] \;\xrightarrow{F_{s2}(P_{s2})}\; \big[I_{s1}[P_{s1}], I_{s2}[P_{s2}]\big] \;\xrightarrow{F(P)}\; \big[I_{s1}[P_{s1}, P], I_{s2}[P_{s2}, P]\big] \;\xrightarrow{F_o(P_o)}\; \big[I_o[P_{s1}, P_{s2}, P, P_o]\big].
\]

Here, the start is an empty list. Two initial features (Fs1,Fs2) append images to that list, creating a list of two images (Is1,Is2) (e.g., these could be two scattering particles in the field of view). Each such image is modified by a feature F (e.g., by adding some noise), before being merged into a single image by Fo (e.g., representing the output of a microscope). Note that P is not added to the list of properties twice; the list is in fact a set and as such cannot contain duplicate properties.

We show an even more concrete example in Fig. 6(a). Here, we have an initial feature Ellipse which creates a single image of an ellipse. We follow this by the feature Duplicate, which creates a fixed number of duplicates (here five). (Note that Duplicate duplicates the feature Ellipse, not the generated image, which is why it can create several different ellipses, for example with different radius, intensity, or in-image position.) This list of images is sent to the feature Fluorescence, which images them through a simulated fluorescence microscope. After this, a background offset is added and Poisson noise is introduced.

Since the positions of all ellipses are stored as properties, they are imprinted on the final image. This allows us to create a segmented mask, shown in Fig. 6(b), that we can use as the ground-truth label to train the deep-learning model. These two image-creation pipelines (one for the data, one for the label) are passed to a generator that continuously creates new images by updating the properties that control the features using user-defined update rules. These images are fed to a neural network to train it [Fig. 6(c)], resulting in a trained model that can analyze experimental data [Fig. 6(d)].
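To make this pipeline concrete, the sketch below mirrors Fig. 6(a) in code. The feature names (Ellipse, Duplicate, Fluorescence, Poisson noise) are those discussed in the text, but the argument names, values, and chaining syntax are illustrative assumptions; the bundled notebooks and video tutorials document the exact API.

    import numpy as np
    import deeptrack as dt

    # An ellipse whose properties are functions, so each duplicate is resampled.
    ellipse = dt.Ellipse(
        position=lambda: np.random.uniform(0, 128, 2),   # random in-image position
        radius=lambda: np.random.uniform(1e-6, 2e-6),
        intensity=100,
    )
    particles = dt.Duplicate(ellipse, num_duplicates=5)   # five distinct ellipses

    # A simulated fluorescence microscope that images the list of scatterers.
    optics = dt.Fluorescence(
        NA=0.7, wavelength=680e-9, magnification=10,
        resolution=1e-6, output_region=(0, 0, 128, 128),
    )

    # Image the particles and add Poisson noise; a constant background offset
    # would be inserted as another feature in the same chain.
    pipeline = optics(particles) >> dt.Poisson(snr=20)

    # Resolve one synthetic image; the ellipse positions are imprinted on it
    # and can be used to build the ground-truth mask of Fig. 6(b).
    image = pipeline.update().resolve()

In a complete script, this pipeline is wrapped in a generator that continuously resolves new image–label pairs to feed a Keras model during training, as in Figs. 6(c) and 6(d).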

Writing code directly using the DeepTrack 2.0 framework allows the user to extend the capabilities of the package. In most cases, it is sufficient to use the Lambda feature, which allows any external function to be incorporated into the framework. However, certain scenarios may require the user to write custom features. For example, the user can extend the feature Optics (features that simulate optical devices) to create a new imaging modality, the feature Scatterer (features that represent some object in the sample) to create a custom scatterer, or the feature Augmentation (features that augment an image to cheaply broaden the input space) to expand the range of available augmentations. It is also straightforward to add new neural-network models: any Keras model can be directly merged with DeepTrack 2.0 without any configuration, while models from other packages can easily be wrapped.
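As a minimal sketch of such an extension (the Feature base class and its get hook follow the package's object-oriented design, but the exact signature shown here is an assumption; the video tutorials cover the details):

    import deeptrack as dt

    class InvertContrast(dt.Feature):
        """Toy custom feature that inverts the contrast of every image it receives."""

        def get(self, image, vmax=1.0, **kwargs):
            # `vmax` is a property of the feature; like all properties, it can be
            # a constant or a function and is imprinted on the resulting image.
            return vmax - image

For one-off transformations, the Lambda feature mentioned above offers a lighter-weight route, since it wraps an arbitrary external function without requiring a new class.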

To help users get started writing code using DeepTrack 2.0, we provide several comprehensive video tutorials,92 ranging in scope from implementing custom features to writing complete solutions.

In this section, we use DeepTrack 2.0 to exemplify how deep learning can be employed for a broad range of microscopy applications. We start with a standard benchmark for image classification: the MNIST digit recognition challenge95 (Sec. IV A). Following this, we employ DeepTrack 2.0 to analyze microscopy images. First, we develop a model to track particles whose images are acquired by bright-field microscopy, training a single-particle tracker whose accuracy surpasses that of standard algorithmic approaches, especially in noisy imaging conditions7 (Sec. IV B). Then, we expand this example to also extract quantitative information about the particle, namely its size and refractive index (Sec. IV C). Deep learning is especially powerful in tracking multiple particles in noisy environments. As a demonstration of this, we develop a model that can detect quantum dots on a living cell imaged by fluorescence microscopy (Sec. IV D). Again, we expand this example to demonstrate three-dimensional tracking of multiple particles whose images are acquired using holography (Sec. IV E). We also develop a neural network to count the number of cells in fluorescence images (Sec. IV F). Finally, we train a GAN to create synthetic images of cells from a semantic mask (Sec. IV G). These examples are available both as project files for the DeepTrack 2.0 graphical user interface92 and as Jupyter notebooks,93 and they are complemented by video tutorials.

Recognizing handwritten digits of the MNIST dataset is a classical benchmark for machine learning.95 The task consists of recognizing handwritten digits from 0 to 9 in 28 × 28 pixel images. In the dataset, there are 6 × 10⁴ training images and 1 × 10⁴ validation images, some examples of which are provided in Fig. 7(a).

FIG. 7.

A dense neural network to classify handwritten digits. (a) Three example images from the MNIST dataset with their corresponding labels. (b) The network architecture consists of five fully connected layers of decreasing size, with the final layer having ten nodes, whose outputs correspond to classification probabilities. (c) Examples of augmented training images: The network is trained on a set of 6 × 10⁴ (28 × 28 pixel) images augmented by translations, rotations, shear, and elastic distortions, using a categorical cross-entropy loss. The validation loss (magenta line) is significantly lower than the training loss (orange line), likely due to augmentations making the training set harder than the validation set. (d) Confusion matrix showing how the 1 × 10⁴ validation images are classified by the network: The diagonal represents the correctly classified digits, constituting the vast majority of digits. The off-diagonal cells represent incorrectly classified digits.


Since this is a relatively simple task, we employ a DNN [Fig. 7(b)]. The architecture of the network is that of LeCun et al., which achieved the best results among the DNN-based attempts listed on the MNIST webpage.95 As a loss function, we use categorical cross-entropy, which is a standard loss function for classification tasks.
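To make the architecture concrete, the following is a minimal Keras sketch of such a dense classifier with five fully connected layers of decreasing size and a ten-node softmax output, compiled with categorical cross-entropy; the layer widths shown here are illustrative assumptions, not the exact values used in the DeepTrack 2.0 example.

    # Minimal sketch of a five-layer dense classifier for 28 x 28 digit images.
    # Layer widths are illustrative; the DeepTrack 2.0 example defines its own.
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Flatten(input_shape=(28, 28)),      # 784 input values per image
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(10, activation="softmax"),    # classification probabilities
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])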

We train the network on the 6 × 10⁴ training images, augmented with affine transformations and elastic distortions, as exemplified in Fig. 7(c). We train it for 1000 epochs, where one epoch represents one pass through all training images, resulting in a validation loss of 0.05 as compared to a training loss of 0.20 [Fig. 7(c)]. The higher training loss is likely due to the augmentations, which make the training data harder than the validation data. It is, as such, unlikely that the network has overfitted the training set.
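As a rough illustration of the affine part of such an augmentation, one could use the Keras ImageDataGenerator as sketched below; the specific ranges are assumptions, and elastic distortions would require a custom preprocessing function (DeepTrack 2.0 provides its own augmentation features for this purpose).

    # Sketch of affine augmentation (translation, rotation, shear) with Keras;
    # elastic distortions would need a custom preprocessing_function.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    augmenter = ImageDataGenerator(
        rotation_range=15,        # degrees
        width_shift_range=0.1,    # fraction of the image width
        height_shift_range=0.1,   # fraction of the image height
        shear_range=10,           # shear angle in degrees
    )
    # Example usage (x_train shaped (N, 28, 28, 1), y_train one-hot labels):
    # batches = augmenter.flow(x_train, y_train, batch_size=32)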

Finally, we validate the trained network on the 1 × 10⁴ validation images. The network achieves an accuracy of 99.34%, which is in line with the best performance achieved by DNNs on the MNIST digit recognition task.95 The confusion matrix [Fig. 7(d)] shows that the incorrectly classified digits consist mainly of 9s classified as 7s and 4s classified as 9s.

Determining the position of objects within an image is a fundamental task for digital microscopy. In this example, we aim at localizing an optically trapped particle with very high accuracy (silica, diameter 1.98 μm, wavelength 532 nm, power 2.5 mW at the sample). Two videos are captured of the same particle in the same optical trap, one with good image quality and one with poor image quality (produced by substituting the high-quality LED illumination with a low-power incandescent light bulb, which results in flickering illumination and a high level of electronic noise), from which we want to extract the particle's x and y positions [Fig. 8(a)].

FIG. 8.

A convolutional neural network to track a single particle. (a) Frames of the same particle held in the same optical trap, but with different illumination which results in a low-noise video (left) and a high-noise video (right). (b) The network architecture consists of three convolutional layers, each followed by a pooling layer. The resulting tensor is flattened and passed through three fully connected layers, which return the predicted x and y position of the particle. (c) Five examples of the outputs of the image generation pipeline with increasing signal-to-noise ratio (SNR). The pixel tracking error for 1000 images using the DeepTrack model (orange markers) and the radial-center algorithm (gray markers). The DeepTrack model systematically outperforms the radial-center algorithm, especially for low SNR. The model was trained for 110 epochs on a set of 1 × 10⁴ synthetic images. The validation loss (magenta line) and the training loss (orange line) remain similar for the whole training session. (d) The predicted position of the particle in the low-noise video (top panel) and the high-noise video (bottom panel) as found by the radial center algorithm (gray line) and by the DeepTrack model (dotted orange line). In the low-noise case, they overlap within a fraction of a pixel, while for the high-noise case, the radial center algorithm produces erratic predictions.


To analyze these images, we first use a CNN to transform the 51 × 51 pixel input image into a 6×6×64 tensor. Subsequently, we pass this result to a DNN, which outputs an estimate of the particle's in-plane position [Fig. 8(b)]. This model is based on the one described by Helgadottir et al.7 We use the mean absolute error (MAE) as the loss function. [Alternatively, we could use the mean squared error (MSE), which delivers equally accurate results.]
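A minimal Keras sketch of this convolutional-plus-dense regression architecture is shown below; the filter and unit counts are illustrative assumptions rather than the exact values of the model described in Ref. 7, but the three conv/pool stages reduce the 51 × 51 input to a 6 × 6 × 64 tensor, as stated above.

    # Sketch of the conv + dense regression architecture (illustrative sizes).
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Conv2D(16, 3, padding="same", activation="relu",
                      input_shape=(51, 51, 1)),
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),            # -> 6 x 6 x 64 tensor
        layers.Flatten(),
        layers.Dense(32, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(2),                   # predicted x and y position
    ])
    model.compile(optimizer="adam", loss="mae")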

The network is trained purely on synthetic data generated using DeepTrack 2.0. The generation of synthetic microscopy data for training a network generally entails the following steps. First, the optical properties of the instrument used to capture the data are replicated (e.g., numerical aperture (NA), illumination spectra, magnification, and pixel size). This ensures that the point-spread function of the simulated optical system closely matches that of the experimental setup. Second, the properties of the sample are specified, including the radius and refractive index of the particle. As a final step, noise is added to the simulated images so that they are representative of experimental data. During training, each parameter of the simulation (e.g., the optical properties, the sample properties, and the noise strength) is stochastically varied around the expected experimental values to make the network more robust. In Fig. 8(c) we show a few outputs from the image generation pipeline with signal-to-noise ratio (SNR) increasing from left to right. We also demonstrate that the network outperforms the radial center algorithm.23 This is achieved by training the network for 110 epochs on a set of 1 × 10⁴ synthetic images. The validation set consists of 1000 images. We can see that the loss is still decreasing at this point, but the gain is minimal [Fig. 8(c)]. No signs of overfitting can be seen.
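The sketch below illustrates these three steps (optics, sample, noise, each with stochastically varied parameters) in plain NumPy; it is not the DeepTrack 2.0 API, and the Gaussian-spot model used here is only a stand-in for the properly simulated point-spread function.

    # Generic sketch of the three steps: (1) optics, (2) sample, (3) noise,
    # with each parameter drawn at random around its nominal value.
    import numpy as np

    rng = np.random.default_rng()

    def synthetic_image(im_size=51):
        # 1. Optics: effective PSF width (in pixels) drawn around its nominal value
        sigma = rng.uniform(2.0, 3.0)
        # 2. Sample: particle position and brightness drawn at random
        x0, y0 = rng.uniform(15, im_size - 15, size=2)
        intensity = rng.uniform(0.5, 1.0)
        y, x = np.mgrid[0:im_size, 0:im_size]
        image = intensity * np.exp(-((x - x0) ** 2 + (y - y0) ** 2)
                                   / (2 * sigma ** 2))
        # 3. Noise: additive Gaussian noise with random strength
        image += rng.normal(0, rng.uniform(0.01, 0.2), image.shape)
        return image, (x0, y0)   # image and ground-truth particle position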

Finally, we use the network to track the two videos of the optically trapped particle. In Fig. 8(d), we see that in the low-noise video, the radial center method and the DeepTrack model agree, while, in the high-noise video, the radial center method makes large, sporadic jumps. Since the videos are of the same particle in the same optical trap, we expect the dynamics of the particle to be similar. The fact that only the DeepTrack method gives consistent dynamics in the two cases indicates that it is better able to track the more difficult video. A more detailed discussion of this example can be found in Ref. 7.

Microscopy images contain quantitative information about the morphology and optical properties of the sample. However, extracting this information using conventional image analysis techniques is extremely demanding. Deep learning has proven to offer a remedy to this.45,46 In this example, we employ DeepTrack 2.0 to develop a model to quantify the radius and refractive index of particles based on their complex-valued scattering patterns. As experimental verification, we record the scattering patterns of a heterogeneous mixture of two particle populations (polystyrene beads with 150 nm and 230 nm radius flowing in a microfluidic channel, illumination wavelength 633 nm) using an off-axis holographic microscope.

In line with the previous example, we use a CNN to downsample the 64×64×2 pixel input (the two channels correspond to the real and imaginary parts of the field) to an 8×8×128 tensor. Subsequently, we pass this tensor to a DNN, which outputs an estimate of the particle's radius and refractive index [Fig. 9(b)]. The number of channels in each layer is doubled compared to the previous example, which may help capture subtle changes in the scattered field. We use the MAE as the loss function.

FIG. 9.

A convolutional neural network to measure the radius and refractive index of a single particle. (a) The real and imaginary parts of the scattered field are used to measure the radius and refractive index of a single particle. The field is captured using an off-axis holographic microscope (633 nm) and numerically propagated such that the particle is in focus. The total dataset consists of roughly 8 × 10³ such observations, belonging to 352 individual polystyrene particles with 150 nm or 230 nm radius. (b) The network architecture consists of three convolutional layers, each followed by a pooling layer. The resulting tensor is flattened and passed through three fully connected layers which return the predicted radius and refractive index of the particle. (c) Three pairs of real and imaginary parts of the scattered field from a single particle. The network is trained for 110 epochs on 1000 (64 × 64 pixel) images, using MAE loss. The validation loss (magenta line) stops decreasing significantly after only 20 epochs, while the training loss (orange line) keeps decreasing. (d) Measured radius vs measured refractive index for an ensemble of particles. There are two clearly distinguished populations, which closely match the modal characteristics of the particles (shown by the two circles).


To account for imperfections in the experimental system, we approximate the experimental point-spread function (PSF) for the simulated images by adding coma aberrations with random strength. In Fig. 9(c) we show three examples of outputs from the image generation pipeline. The network is trained for 110 epochs on a set of 1 × 10⁴ synthetic images. The validation set consists of 1000 images. The validation loss diverges from the training loss after only 20 epochs, suggesting that the training could be terminated earlier, or that a larger training set could be beneficial.

Finally, we evaluate the model on the experimental dataset. In each frame, all particles are roughly localized using a standard tracking algorithm and brought into focus using the criteria described in Ref. 96. These observations are subsequently linked from frame to frame to form traces. We use the fact that we observe each particle multiple times to improve the accuracy of the sizing. Specifically, we predict the size and refractive index of a particle using an image from each observation of that particle. We then average the results to obtain the final prediction for that particle. (This deviates slightly from the method proposed in Ref. 45, where the averaging is performed in the latent space, which results in a more accurate method.) We can see the results in Fig. 9(d), showing the radius vs the refractive index of each measured particle. We clearly distinguish two populations, which closely match the modal characteristics of the particles (shown by the two circles).
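The per-particle averaging step amounts to grouping the per-frame predictions by trace and taking their mean, as in the short sketch below; the variable names (predictions, trace_id) are hypothetical stand-ins for the outputs of the network and of the linking step.

    # Sketch: average the per-frame (radius, refractive index) predictions of
    # each linked trace to obtain one estimate per particle.
    import numpy as np

    def per_particle_average(predictions, trace_id):
        # predictions: (N, 2) array of per-frame estimates
        # trace_id: (N,) array of particle labels from the linking step
        results = {}
        for pid in np.unique(trace_id):
            results[pid] = predictions[trace_id == pid].mean(axis=0)
        return results   # {particle id: (mean radius, mean refractive index)}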

The previous examples have focused on analyzing a single particle at a time. Frequently, however, microscopy involves detecting multiple particles at once. In this example, we extract the positions of quantum dots situated on the surface of a living cell. A small slice of an image is shown in Fig. 10(a), with each particle circled in white.

FIG. 10.

A U-Net to detect quantum dots in fluorescence images. (a) A small slice of an image depicting quantum dots situated on a living cell, imaged through a fluorescence microscope (data kindly provided by Carlo Manzo). The quantum dots in the image are circled in white. (b) The network architecture is a small U-Net. A final convolutional layer outputs a single image, where each particle in the input is represented by a circle of 1s. (c) Examples of synthetic images used in the training process. The network is trained on 2000 (128 × 128 pixel) images for 110 epochs using binary cross-entropy loss. The validation loss (magenta line) and the training loss (orange line) are similar in magnitude for the entire training session. After 10 epochs, both losses start decreasing more rapidly, which is due to a change in the weighting of the loss function, as explained in the text. (d) A single frame tracked using the trained model. It detects all obvious particles, as well as a few that are hard to conclusively verify as real observations.


We train a U-Net model to transform the input into a binarized representation, where each pixel within 3 pixels of a particle in the input is set to 1, and every other pixel is set to 0, as shown in Fig. 10(b). The network returns a probability for each pixel, which is thresholded into the binary representation. (Note that in this example, we can use a network that is smaller than the original U-Net because the information is highly localized; however, if, for example, the data were aberrated, a deeper network would be better.) The network is compiled with a binary cross-entropy loss.
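The construction of the binary target and the thresholding of the network output can be sketched as follows in NumPy; positions are given as (x, y) pixel coordinates, and the function names are illustrative.

    # Sketch: build the binary target (1 within 3 pixels of each particle) and
    # threshold the network's per-pixel probabilities into a detection mask.
    import numpy as np

    def target_mask(positions, shape, radius=3):
        # positions: list of (x, y) pixel coordinates of the particles
        y, x = np.mgrid[0:shape[0], 0:shape[1]]
        mask = np.zeros(shape, dtype=np.float32)
        for (px, py) in positions:
            mask[(x - px) ** 2 + (y - py) ** 2 <= radius ** 2] = 1.0
        return mask

    def detections(probability_map, threshold=0.5):
        # boolean detection mask from the network's per-pixel probabilities
        return probability_map > threshold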

The network is trained purely on synthetic data, simulating the appearance of a quantum dot as the PSF of the optical device. In Fig. 10(c), we show several examples of the outputs from the image generation pipeline. The network is trained using 2000 (128 × 128 pixel) images in two sessions. The first session consists of 10 epochs where the loss is weighted such that setting a pixel value of 1 to 0 is penalized 10 times more than setting a value of 0 to 1. This helps the network avoid the very simple local optimum of setting every pixel to 0. For the following 100 epochs, the two types of errors are penalized equally. This explains the sudden change in the rate at which the losses decrease after 10 epochs, seen in Fig. 10(c). The validation set consists of 256 images, and the validation loss shows no signs of overfitting.
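A generic implementation of such a class-weighted binary cross-entropy is sketched below (the function name is an assumption); compiling the model with weight 10 for the first session and weight 1 for the second reproduces the two-stage schedule described above.

    # Sketch of a class-weighted binary cross-entropy: predicting 0 where the
    # target is 1 is penalized `weight` times more than the opposite error.
    import tensorflow as tf

    def weighted_bce(weight):
        def loss(y_true, y_pred):
            y_pred = tf.clip_by_value(y_pred, 1e-7, 1 - 1e-7)
            per_pixel = -(weight * y_true * tf.math.log(y_pred)
                          + (1 - y_true) * tf.math.log(1 - y_pred))
            return tf.reduce_mean(per_pixel)
        return loss

    # model.compile(optimizer="adam", loss=weighted_bce(10))  # first 10 epochs
    # model.compile(optimizer="adam", loss=weighted_bce(1))   # remaining epochs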

In Fig. 10(d), we show a single image tracked using the trained network. It detects all obvious particles, as well as a few that are hard to verify as real observations. However, most of these are detected again in the next frame, indicating that they are real observations rather than random noise. (This method of verifying the tracking is not conclusive, however: quantum dots are known to flicker frequently, so they are not guaranteed to be visible in subsequent frames. Conversely, two detections in a row do not necessarily correspond to a real particle; they can be the product of optical effects that are consistent between frames.)

Similarly to single-particle analysis, multi-particle analysis can be extended to extract quantitative information about the particles. In this example, we locate spherical particles in 3D space from the intensity of the scattered field captured by an in-line holographic microscope [Fig. 11(a)]. In order to validate the out-of-plane positioning with ground-truth experimental data, we capture the experimental data using an off-axis holographic microscope, which allows us to accurately track the particles in 3D space using standard methods.96 Off-axis holographic microscopes, unlike their in-line counterparts, retrieve the entire complex field instead of just the field intensity. We approximate the conversion from off-axis to in-line holographic microscopy by squaring the amplitude of the field. As in the previous example, we represent each particle in the input by a region of pixel values of 1 in the output. The difference is that this network returns a volume, with each particle instead represented by a sphere with a radius of 3 pixels [Fig. 11(b)]. The last dimension of the output represents the out-of-plane position of the particle, ranging from 2 to 30 μm. The network is slightly larger than in the previous example since it needs to extract more information about the particles. Just as in the previous example, binary cross-entropy is used as the loss function.
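The conversion to an in-line-like intensity and the construction of the volumetric target can be sketched as follows; the number of z-slices (32) and the function names are illustrative assumptions.

    # Sketch: (1) approximate an in-line hologram by squaring the amplitude of
    # the off-axis complex field; (2) build a target volume in which each
    # particle is a sphere of 1s, with the z-axis spanning 2 to 30 micrometers.
    import numpy as np

    def inline_intensity(complex_field):
        return np.abs(complex_field) ** 2

    def target_volume(particles, shape=(256, 256), n_slices=32,
                      z_range=(2.0, 30.0), radius=3):
        # particles: list of (x, y, z) with x, y in pixels and z in micrometers
        volume = np.zeros(shape + (n_slices,), dtype=np.float32)
        y, x = np.mgrid[0:shape[0], 0:shape[1]]
        for (px, py, pz) in particles:
            zi = (pz - z_range[0]) / (z_range[1] - z_range[0]) * (n_slices - 1)
            for k in range(n_slices):
                r2 = radius ** 2 - (k - zi) ** 2   # cross-section of the sphere
                if r2 > 0:
                    volume[..., k][(x - px) ** 2 + (y - py) ** 2 <= r2] = 1.0
        return volume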

FIG. 11.

A U-Net to track spherical particles in three dimensions. (a) A sample network input, consisting of scattering patterns of several spherical particles. The sample contains a mixture of 150 and 230 nm polystyrene particles. (b) The network architecture is a small U-Net. A final convolutional layer outputs a volume, where each particle in the input is represented by a sphere of 1s. The out-of-plane direction spans 2 to 30 μm. (c) Examples of synthetic images used in the training process. The network is trained on 2000 (256 × 256 pixel) images for 100 epochs using a binary cross-entropy loss. The validation loss (magenta line) diverges from the training loss (orange line) after roughly 10 epochs. (d) A single particle tracked using the DeepTrack model (dotted orange line) and off-axis holography (gray line), showing the x, y, and z positions over time. The two methods almost perfectly overlap. Moreover, we show the predicted out-of-plane position of all detections as found using the DeepTrack model vs the off-axis holography. Most observations fall close to the central line, with a few outliers and some deviations near the edges of the range.


The network is trained purely on synthetic data. We replicate the optical properties of the instrument used to capture the data, simulating the appearance of a particle using Mie theory. Each parameter of the simulation is stochastically varied around the experimentally expected values, making the network more robust. Additionally, we approximate the experimental PSF by adding coma aberrations with random strength. In Fig. 11(c), we show a few images from this pipeline. The network is trained for 100 epochs on a set of 1 × 10³ synthetic images, with a validation set of 256 images. The validation loss diverges from the training loss after 10 epochs, suggesting that the training could be terminated earlier, or that a larger training set could be beneficial [Fig. 11(c)].

In Fig. 11(d), we show a single particle tracked in three dimensions, with the position found using the trained network in orange and the off-axis method in gray. The two methods overlap almost exactly. Moreover, we also show the out-of-plane positioning found by the off-axis method and the trained model for all detections. We see that most observations fall very close to the central line, with few outliers. Moreover, we see a divergence from the central line at the edges of the range, due to a limitation of how the position is extracted from the binarized image. A sphere close to the edge of the volume will not be entirely contained within the image, so that its centroid will not be the center of the sphere, resulting in a bias.
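A minimal way to extract positions from the thresholded output volume is to label connected components and take their centroids, as sketched below with scipy.ndimage; spheres cut by the edge of the volume then yield biased centroids, which is exactly the edge effect discussed above.

    # Sketch: threshold the predicted volume, label connected components, and
    # take each component's centroid as the particle position.
    import numpy as np
    from scipy import ndimage

    def positions_from_volume(predicted_volume, threshold=0.5):
        binary = predicted_volume > threshold
        labels, n = ndimage.label(binary)
        centroids = ndimage.center_of_mass(binary, labels, range(1, n + 1))
        return np.array(centroids)   # one (row, column, z-slice) triplet per particle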

DeepTrack 2.0 is not limited to particle analysis. Counting the number of cells in an image has traditionally been a tedious task performed manually by trained experts. In this example, we count the number of U2OS cells (cells cultivated from the bone tissue of a patient suffering from osteosarcoma98) in the fluorescence images shown in Fig. 12(a). For evaluation, we use the BBBC039 dataset, which contains nuclei of U2OS cells in a chemical screen, with approximately 23 000 single samples manually annotated as ground truth by the data provider.97

FIG. 12.

A U-Net to count cells in a fluorescence image. (a) Two slices from the BBBC039 dataset containing nuclei of U2OS cells in a chemical screen,97 with the corresponding number of cells in the image. (b) The network architecture is a small U-Net. A final convolutional layer outputs an image with a single feature, where each cell in the input is represented by a Gaussian distribution with a standard deviation of 10 pixels and whose intensity integrates to one. Thus, the integration of the intensity of the output corresponds to the number of cells in the image. (c) Examples of the cell images created by the image simulation pipeline, followed by a sample input-output pair containing six cells. The network is trained on 1000 (256 × 256 pixel) images, and validated on 150 (512 × 688 pixel) images, using MAE loss. The validation loss (magenta line) is consistently higher than the training loss (orange line), but follows a similar curve. (d) The number of cells as found by the DeepTrack model compared to a naive approach based on the summation of the values of the pixels of the image. Each data point represents a 256 × 256 pixel slice of one of the 50 images in the test set. Three points are circled and have their corresponding input-output pair shown on the right.


We once again use a U-Net model. This time, we represent each cell by a Gaussian distribution with a standard deviation of 10 pixels, whose intensity values integrate to one [Fig. 12(b)]. In this way, the integral of the output intensity corresponds to the number of cells in the image. By representing each cell by a Gaussian profile, we also reduce the emphasis on the absolute positioning of the cell, while retaining the ability for a human to validate the output visually. We compile the network using MAE loss.
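Concretely, the target density map and the counting step can be sketched as follows; the function names are illustrative, and the normalization makes each Gaussian integrate to one so that summing the network output yields the cell count.

    # Sketch: build the target by placing a unit-integral Gaussian (sigma = 10 px)
    # at each cell position; the predicted count is the sum of the output image.
    import numpy as np

    def density_target(cell_positions, shape, sigma=10.0):
        y, x = np.mgrid[0:shape[0], 0:shape[1]]
        target = np.zeros(shape, dtype=np.float32)
        for (cx, cy) in cell_positions:
            g = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            target += g / (2 * np.pi * sigma ** 2)   # integrates to one per cell
        return target

    def count_cells(network_output):
        return float(np.sum(network_output))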

The training data consist of synthetic data generated by imaging cell-like objects through a simulated fluorescence microscope. A few example cells, as well as a single training input-output pair, are shown in Fig. 12(c). The network is trained for 190 epochs on a set of 1000 synthetic images. Since the training set of the BBBC039 dataset is not used for training, we merge the training set and the validation set, and use the merged set for validation. The validation loss is consistently higher than the training loss but follows a very similar curve [Fig. 12(c)]. This suggests that the synthetic data are a decent approximation of the experimental images. The offset can largely be explained by a few images in the validation set that are particularly hard for the network.

For large images, errors can average out, which can result in deceptively accurate counting. To eliminate this concern, we show the predicted number of cells vs the true number of cells for smaller slices of images (256 × 256 pixels) in the BBBC039 dataset in Fig. 12(d). The network predicts the correct number of cells to within just a few percent. As a comparison, we show that the images cannot be analyzed by simply integrating the intensity of the input images [Fig. 12(d)]. To show a best-case scenario for the sum-of-pixels method, we transform each sum by an affine transformation that minimizes the squared error on the test set itself. It is apparent that this is not sufficient to achieve an acceptable counting accuracy.
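The best-case sum-of-pixels baseline amounts to a least-squares fit of count = a · sum + b on the test set itself, as sketched below (function name assumed).

    # Sketch of the best-case baseline: least-squares fit of count ~ a * sum + b
    # on the test set itself, then apply the fitted affine map to each pixel sum.
    import numpy as np

    def affine_sum_baseline(images, true_counts):
        sums = np.array([img.sum() for img in images])
        a, b = np.polyfit(sums, true_counts, deg=1)   # minimizes the squared error
        return a * sums + b                           # calibrated count estimates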

DeepTrack 2.0 can also handle cases where the training set is derived directly from experimentally captured data instead of being simulated. In this example, we combine the two approaches by using experimental data to train a GAN to create new data from a semantic representation of the image [Fig. 13(a)]. More specifically, the GAN creates images of the Drosophila melanogaster third instar larva ventral nerve cord from a semantic representation of background, membrane, and mitochondria. This GAN, once trained, can subsequently be used as a part of an image simulation pipeline, just as any other DeepTrack feature.

FIG. 13.

A conditional GAN is used to create cell images from a semantic mask. (a) Example masks (left) from which images of the Drosophila melanogaster third instar larva ventral nerve cord (right) are generated using the segmented anisotropic ssTEM dataset.99 (b) The network architecture is a conditional generative adversarial network. The generator transforms a semantic mask into a realistic cell image, using a U-Net architecture with the most condensed layer replaced by two residual network blocks.100 The discriminator is designed similarly to the PatchGAN discriminator101 and receives both the mask and an image as input. The generator is trained using an MAE loss between the experimental image and the generated image, as well as an MSE loss on the discriminator output. Conversely, the discriminator is trained with an MSE loss. (c) Examples of masks and the corresponding experimental images. The loss of the generator (left) and of the discriminator (right) are shown over 1000 training epochs, each of which consists of 16 mini-batches of 7 samples. We see that the generator loss increases toward the end of the training, a signature that continuing training beyond this point destabilizes the generator. (d) Mask images from a validation set, and the corresponding generated and real images. The generated images are qualitatively similar to the real images.


The architecture of the neural network we employ is shown in Fig. 13(b). The model is composed of a generator, which learns the mapping between the input mask and its corresponding cell image, and a discriminator, which, given the semantic segmentation, determines whether the generated image could plausibly have been drawn from a real sample.

The generator follows a U-Net design with symmetric encoder and decoder paths connected through skip connections. The encoder consists of convolutional blocks followed by strided convolutions for downsampling. Each convolutional block contains two consecutive 3 × 3 convolutional layers. At each step of the encoding path, we increase the number of feature channels by a factor of 2.

The encoder connects to the decoder through two residual network (ResNet) blocks,100 each with 1024 feature channels. For upsampling, we use bilinear interpolation, followed by a convolutional layer (stride = 1). This operation is followed by concatenation with the corresponding feature map from the encoding path. Furthermore, we add two convolutional blocks with 16 feature channels at the final layer of the decoder. We use a 1 × 1 convolutional layer to map each 16-component feature vector to the output image. Here, a hyperbolic tangent activation (tanh) is employed to transform the output to the range [−1, 1]. Every layer in the generator, except the last, is followed by an instance normalization and a LeakyReLU activation layer [defined as Φ(x) = α · x, with α = 1 for x > 0 and α = 0.2 for x < 0].
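The decoder-side building blocks can be sketched in Keras as follows; the helper names are assumptions, and instance normalization is omitted here for brevity (it is available, for instance, as InstanceNormalization in tensorflow_addons).

    # Sketch of the decoder-side building blocks: bilinear upsampling followed
    # by a stride-1 convolution with LeakyReLU (slope 0.2), and a final 1 x 1
    # convolution with tanh output. Instance normalization omitted for brevity.
    from tensorflow.keras import layers

    def upsample_block(x, filters):
        x = layers.UpSampling2D(size=2, interpolation="bilinear")(x)
        x = layers.Conv2D(filters, 3, strides=1, padding="same")(x)
        return layers.LeakyReLU(0.2)(x)

    def output_block(x):
        x = layers.Conv2D(16, 3, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
        return layers.Conv2D(1, 1, activation="tanh")(x)   # output in [-1, 1]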

The discriminator follows a PatchGAN architecture,101 which divides the input image into overlapping patches and classifies each patch as real or fake, rather than producing a single descriptor for the whole image. This splitting arises naturally as a consequence of the discriminator's convolutional architecture.62 The discriminator's convolutional blocks consist of 4 × 4 convolutional layers with a stride of 2, which halve the width and height of the input. In all layers, we use instance normalization (with no learnable parameters) and LeakyReLU activation. Finally, the network outputs an 8×8 single-channel tensor containing the predicted probability for each patch.
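A minimal Keras sketch of such a PatchGAN-style discriminator is shown below; the filter counts are illustrative assumptions, and instance normalization is again omitted for brevity. The mask and the image are concatenated and reduced by 4 × 4, stride-2 convolutions until an 8 × 8 single-channel patch map remains.

    # Sketch of a PatchGAN-style discriminator (illustrative filter counts).
    from tensorflow.keras import layers, models

    def build_discriminator(image_shape=(256, 256, 1)):
        mask = layers.Input(shape=image_shape)
        image = layers.Input(shape=image_shape)
        x = layers.Concatenate()([mask, image])
        for filters in (64, 128, 256, 512, 512):     # 256 -> 8 after 5 halvings
            x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
            x = layers.LeakyReLU(0.2)(x)
        patch_scores = layers.Conv2D(1, 4, padding="same")(x)   # 8 x 8 x 1 output
        return models.Model([mask, image], patch_scores)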

The training data consist of experimental data from the segmented anisotropic ssTEM dataset.99 Each sample is normalized between −1 and 1, and augmented by mirroring, rotating, shearing, and scaling. Moreover, Gaussian-distributed random noise, with standard deviation randomly sampled (per image) between 0 and 0.1, is added to the mask. Adding noise to the mask qualitatively improves the image quality. Specifically, without adding noise, the network is prone to tiling very similar internal structures, especially far away from the border of a mask. This occurs because there is no internal structure in the input, making two nearby regions of the input virtually identical from the point of view of the network. By introducing some internal structure in the form of noise, we help the network distinguish otherwise very similar regions in the input. An additional benefit is that it is possible to generate many images from a single mask just by varying the noise. A few example training input–output pairs are shown in Fig. 13(c). For this example, we define the loss function of the generator as l_G = γ · MAE{z_label, z_output} + [1 − D(z_output)]^2 and that of the discriminator as l_D = D(z_output)^2 + [1 − D(z_label)]^2, where D(·) denotes the discriminator prediction, z_label refers to the ground-truth cell image, and z_output is the generated image. Note that the generator loss function l_G aims to minimize the MAE between the generator output image and its target, weighted by the regularization parameter γ, which is set to 0.8. For training, we use the Adam optimizer with a learning rate of 0.0002 and β1 = 0.5 (the exponential decay rate for the first-moment estimates) for 1000 epochs, each consisting of 16 mini-batches.
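The following is a sketch of a single training step implementing the stated losses l_G and l_D with γ = 0.8 and Adam(learning_rate = 2e-4, β1 = 0.5); the generator and discriminator objects and the function name are assumptions, and this is only an illustration of the loss structure, not the authors' exact implementation.

    # Sketch of one GAN training step with the stated generator/discriminator losses.
    import tensorflow as tf

    gamma = 0.8
    g_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
    d_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

    def train_step(mask, real_image, generator, discriminator):
        with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
            fake_image = generator(mask, training=True)
            d_fake = discriminator([mask, fake_image], training=True)
            d_real = discriminator([mask, real_image], training=True)
            # l_G = gamma * MAE(z_label, z_output) + (1 - D(z_output))^2
            g_loss = (gamma * tf.reduce_mean(tf.abs(real_image - fake_image))
                      + tf.reduce_mean((1.0 - d_fake) ** 2))
            # l_D = D(z_output)^2 + (1 - D(z_label))^2
            d_loss = (tf.reduce_mean(d_fake ** 2)
                      + tf.reduce_mean((1.0 - d_real) ** 2))
        g_opt.apply_gradients(zip(
            g_tape.gradient(g_loss, generator.trainable_variables),
            generator.trainable_variables))
        d_opt.apply_gradients(zip(
            d_tape.gradient(d_loss, discriminator.trainable_variables),
            discriminator.trainable_variables))
        return g_loss, d_loss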

The resulting model is able to create new images from masks it has never seen before. We show five such cases in Fig. 13(d). The generated images are not identical to the real cell images in terms of texture and appearance, which is expected since the masks only contain spatial information about the cells' structures. However, the generated images are qualitatively similar to images from the experimental dataset.

The adoption of new deep-learning methods for the analysis of microscopy data is extremely promising, but it has been hampered by difficulties in generating high-quality training datasets. While manually annotated experimental data ensure that the training set is representative of the validation set, it is not guaranteed that the trained network can correctly analyze data obtained with another setup or annotated by another operator. Moreover, manual annotation limits the network to human-level accuracy, which is not sufficient for tasks, such as single-particle tracking, that demand higher accuracy. Synthetically generated data bypass these issues because the ground truth can be known exactly, and the networks can be trained with parameters that exactly match each user's setup.

Thanks to ever-increasing inference speeds, it will become easier to perform real-time analysis of microscopy data. This can be used to make real-time decisions, from simple experiment control (e.g., controlling the sample flow speed) to more complex decisions (e.g., real-time sorting and optical force feedback systems). For example, one could imagine a completely automated experimental feedback system that applies optical forces to optimize imaging parameters and to acquire the best possible measurements of the quantities of interest.

In this article, we have introduced DeepTrack 2.0, which provides a software environment to develop neural-network models for quantitative digital microscopy from the generation of training datasets to the deployment of deep-learning solutions tailored to the needs of each user. We have shown that DeepTrack 2.0 is capable of training neural networks that perform a broad range of tasks using purely synthetic training data. For tasks where it is infeasible to simulate the training set, DeepTrack 2.0 can augment images on the fly to expand the available training set. Moreover, DeepTrack 2.0 is complemented by a graphical user interface, allowing users with minimal programing experience to explore and create deep-learning models.

We envision DeepTrack 2.0 as an open-source project, where contributors with different areas of expertise can help improve and expand the framework to cover the users' needs. Interesting directions for the future expansion of DeepTrack 2.0 include, for example, tools for the analysis of time sequences using recurrent neural networks, models to understand physical processes using reservoir computing, and even support for physical implementations of neural networks for greater execution speed and higher energy efficiency.

Deep learning has the potential to revolutionize how we do microscopy. However, there are still many challenges to overcome, not least of which is figuring out how to obtain enough training data for the model to generalize. We believe that physical simulations will play a crucial role in overcoming this roadblock. As such, we strongly encourage researchers and community collaborators to contribute objects and models in their areas of expertise: from specialized in-sample structures and improved optics simulation methods to new and exciting neural network architectures.

The authors would like to thank Carlo Manzo for providing the experimental images used in the fourth case study, Jose Alvarez for designing the logo of DeepTrack 2.0, as well as the European Research Council (Grant No. 677511), the Knut and Alice Wallenberg Foundation (Grant No. 2019.0079), and Vetenskapsrådet (Grant Nos. 2016–03523 and 2019–05071) for funding this research.

The data and software that support the findings of this study are openly available in GitHub.92,93

1. J. Perrin, "Mouvement brownien et molécules," J. Phys.: Theor. Appl. 9, 5–39 (1910).
2. E. Kappler, "Versuche zur Messung der Avogadro-Loschmidtschen Zahl aus der Brownschen Bewegung einer Drehwaage," Ann. Phys. (Berlin) 403, 233–256 (1931).
3. D. Causley and J. Z. Young, "Counting and sizing of particles with the flying-spot microscope," Nature 176, 453–454 (1955).
4. H. Geerts, M. De Brabander, R. Nuydens, S. Geuens, M. Moeremans, J. De Mey, and P. Hollenbeck, "Nanovid tracking: A new automatic method for the study of mobility in living cells based on colloidal gold and video microscopy," Biophys. J. 52, 775–782 (1987).
5. J. C. Crocker and D. G. Grier, "Methods of digital video microscopy for colloidal studies," J. Colloid Interface Sci. 179, 298–310 (1996).
6. O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in Int. Conf. Med. Image Comput. Comput. Assist. Interv. (Springer, 2015), pp. 234–241.
7. S. Helgadottir, A. Argun, and G. Volpe, "Digital video microscopy enhanced by deep learning," Optica 6, 506–513 (2019).
8. I. Nordlund, "Eine neue Bestimmung der Avogadroschen Konstante aus der Brownschen Bewegung kleiner, in Wasser suspendierten Quecksilberkügelchen," Z. Phys. Chem. 87U, 40–62 (2017).
9. K. Preston, "Digital image processing in the United States," in Digital Processing of Biomedical Images (Springer, 1976), pp. 1–10.
10. W. H. Walton, "Automatic counting of microscopic particles," Nature 169, 518–520 (1952).
11. J. M. S. Prewitt and M. L. Mendelsohn, "The analysis of cell images," Ann. N. Y. Acad. Sci. 128, 1035–1053 (1966).
12. G. N. Hounsfield, Brit. J. Radiol., Tech. Rep. (Central Research Laboratories of EMI Limited, Hayes, Middlesex, 1973).
13. H. P. Mansberg, A. M. Saunders, and W. Groner, "The Hemalog D white cell differential system," J. Histochem. Cytochem. 22, 711–724 (1974).
14. D. Magde, E. Elson, and W. W. Webb, "Thermodynamic fluctuations in a reacting system measurement by fluorescence correlation spectroscopy," Phys. Rev. Lett. 29, 705–708 (1972).
15. D. Axelrod, P. Ravdin, D. E. Koppel, J. Schlessinger, W. W. Webb, E. L. Elson, and T. R. Podleski, "Lateral motion of fluorescently labeled acetylcholine receptors in membranes of developing muscle fibers," Proc. Natl. Acad. Sci. U.S.A. 73, 4594–4598 (1976).
16. C. Manzo and M. F. Garcia-Parajo, "A review of progress in single particle tracking: From methods to biophysical insights," Rep. Prog. Phys. 78, 124601 (2015).
17. T. Schmidt, G. J. Schütz, W. Baumgartner, H. J. Gruber, and H. Schindler, "Imaging of single molecule diffusion," Proc. Natl. Acad. Sci. U.S.A. 93, 2926–2929 (1996).
18. G. J. Schütz, G. Kada, V. Pastushenko, and H. Schindler, "Properties of lipid microdomains in a muscle cell membrane visualized by single molecule microscopy," EMBO J. 19, 892–901 (2000).
19. D. Alcor, G. Gouzer, and A. Triller, "Single-particle tracking methods for the study of membrane receptors dynamics," Eur. J. Neurosci. 30, 987–997 (2009).
20. M. Dahan, S. Lévi, C. Luccardini, P. Rostaing, B. Riveau, and A. Triller, "Diffusion dynamics of glycine receptors revealed by single-quantum dot tracking," Science 302, 442–445 (2003).
21. F. Pinaud, S. Clarke, A. Sittner, and M. Dahan, "Probing cellular events, one quantum dot at a time," Nat. Methods 7, 275–285 (2010).
22. B. Yu, D. Chen, J. Qu, and H. Niu, "Fast Fourier domain localization algorithm of a single molecule with nanometer precision," Opt. Lett. 36, 4317–4319 (2011).
23. R. Parthasarathy, "Rapid, accurate particle tracking by calculation of radial symmetry centers," Nat. Methods 9, 724–726 (2012).
24. R. E. Thompson, D. R. Larson, and W. W. Webb, "Precise nanometer localization analysis for individual fluorescent probes," Biophys. J. 82, 2775–2783 (2002).
25. R. J. Ober, S. Ram, and E. S. Ward, "Localization accuracy in single-molecule microscopy," Biophys. J. 86, 1185–1200 (2004).
26. B. Zhang, J. Zerubia, and J. C. Olivo-Marin, "Gaussian approximations of fluorescence microscope point-spread function models," Appl. Opt. 46, 1819–1829 (2007).
27. A. V. Abraham, S. Ram, J. Chao, E. S. Ward, and R. J. Ober, "Quantitative study of single molecule location estimation techniques," Opt. Express 17, 23352–23373 (2009).
28. S. Stallinga and B. Rieger, "Accuracy of the Gaussian point spread function model in 2D localization microscopy," Opt. Express 18, 24461–24476 (2010).
29. S. Stallinga and B. Rieger, "Position and orientation estimation of fixed dipole emitters using an effective Hermite point spread function model," Opt. Express 20, 5896–5921 (2012).
30. S. H. Lee, Y. Roichman, G. R. Yi, S. H. Kim, S. M. Yang, A. van Blaaderen, P. van Oostrum, and D. G. Grier, "Characterizing and tracking single colloidal particles with video holographic microscopy," Opt. Express 15, 18275–18282 (2007).
31. L. Holtzer, T. Meckel, and T. Schmidt, "Nanometric three-dimensional tracking of individual quantum dots in cells," Appl. Phys. Lett. 90, 053902 (2007).
32. H. Deschout, F. Cella Zanacchi, M. Mlodzianoski, A. Diaspro, J. Bewersdorf, S. T. Hess, and K. Braeckmans, "Precisely and accurately localizing single emitters in fluorescence microscopy," Nat. Methods 11, 253–266 (2014).
33. W. J. Godinez and K. Rohr, "Tracking multiple particles in fluorescence time-lapse microscopy images via probabilistic data association," IEEE Trans. Med. Imag. 34, 415–432 (2015).
34. N. Chenouard, I. Smal, F. De Chaumont, M. Maška, I. F. Sbalzarini, Y. Gong, J. Cardinale, C. Carthel, S. Coraluppi, M. Winter, A. R. Cohen, W. J. Godinez, K. Rohr, Y. Kalaidzidis, L. Liang, J. Duncan, H. Shen, Y. Xu, K. E. G. Magnusson, J. Jaldén, H. M. Blau, P. Paul-Gilloteaux, P. Roudot, C. Kervrann, F. Waharte, J. Y. Tinevez, S. L. Shorte, J. Willemse, K. Celler, G. P. Van Wezel, H. W. Dan, Y. S. Tsai, C. O. De Solórzano, J. C. Olivo-Marin, and E. Meijering, "Objective comparison of particle tracking methods," Nat. Methods 11, 281–289 (2014).
35. Y. Lecun, Y. Bengio, and G. Hinton, "Deep learning," Nature 521, 436–444 (2015).
36. D. Ciresan, U. Meier, and J. Schmidhuber, "Multi-column deep neural networks for image classification," in 25th IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR 2012) (IEEE, 2012), pp. 3642–3649.
37. E. Shelhamer, J. Long, and T. Darrell, "Fully convolutional networks for semantic segmentation," IEEE Trans. Pattern Anal. Mach. Intell. 39, 640–651 (2017).
38. M. Li, W. Zuo, and D. Zhang, "Convolutional network for attribute-driven and identity-preserving human face generation," arXiv preprint arXiv:1608.06434 (2016).
39. M. D. Hannel, A. Abdulali, M. O'Brien, and D. G. Grier, "Machine-learning techniques for fast and accurate feature localization in holograms of colloidal particles," Opt. Express 26, 15221–15231 (2018).
40. J. M. Newby, A. M. Schaefer, P. T. Lee, M. G. Forest, and S. K. Lai, "Convolutional neural networks automate detection for tracking of submicron-scale particles in 2D and 3D," Proc. Natl. Acad. Sci. U.S.A. 115, 9026–9031 (2018).
41. C. L. Chen, A. Mahjoubfar, L. C. Tai, I. K. Blaby, A. Huang, K. R. Niazi, and B. Jalali, "Deep learning in label-free cell classification," Sci. Rep. 6, 21471 (2016).
42. N. Coudray, P. S. Ocampo, T. Sakellaropoulos, N. Narula, M. Snuderl, D. Fenyö, A. L. Moreira, N. Razavian, and A. Tsirigos, "Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning," Nat. Med. 24, 1559–1567 (2018).
43. L. Zhang, L. Lu, I. Nogues, R. M. Summers, S. Liu, and J. Yao, "DeepPap: Deep convolutional networks for cervical cell classification," IEEE J. Biomed. Health Inform. 21, 1633–1643 (2017).
44. T. Falk, D. Mai, R. Bensch, Ö. Çiçek, A. Abdulkadir, Y. Marrakchi, A. Böhm, J. Deubner, Z. Jäckel, K. Seiwald, A. Dovzhenko, O. Tietz, C. Dal Bosco, S. Walsh, D. Saltukoglu, T. L. Tay, M. Prinz, K. Palme, M. Simons, I. Diester, T. Brox, and O. Ronneberger, "U-net: Deep learning for cell counting, detection, and morphometry," Nat. Methods 16, 67–70 (2019).
45. B. Midtvedt, E. Olsén, F. Eklund, F. Höök, C. B. Adiels, G. Volpe, and D. Midtvedt, "Fast and accurate nanoparticle characterization using deep-learning-enhanced off-axis holography," ACS Nano (published online, 2021).
46. L. E. Altman and D. G. Grier, "CATCH: Characterizing and tracking colloids holographically using deep neural networks," J. Phys. Chem. B 124, 1602–1610 (2020).
47. W. Xie, J. A. Noble, and A. Zisserman, "Microscopy cell counting and detection with fully convolutional regression networks," Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 6, 283–292 (2018).
48. Y. Wu, Y. Rivenson, Y. Zhang, Z. Wei, H. Günaydin, X. Lin, and A. Ozcan, "Extended depth-of-field in holographic imaging using deep-learning-based autofocusing and phase recovery," Optica 5, 704–710 (2018).
49. E. Nehme, L. E. Weiss, T. Michaeli, and Y. Shechtman, "Deep-STORM: Super-resolution single-molecule microscopy by deep learning," Optica 5, 458–464 (2018).
50. W. Ouyang, A. Aristov, M. Lelek, X. Hao, and C. Zimmer, "Deep learning massively accelerates super-resolution localization microscopy," Nat. Biotechnol. 36, 460–468 (2018).
51. F. Xing, Y. Xie, H. Su, F. Liu, and L. Yang, "Deep learning in microscopy image analysis: A survey," IEEE Trans. Neural Netw. Learn. Syst. 29, 4550–4568 (2018).
52. A. M. Zador, "A critique of pure learning and what artificial neural networks can learn from animal brains," Nat. Commun. 10, 3770 (2019).
53. B. Mehlig, "Artificial neural networks," arXiv preprint arXiv:1901.05639 (2019).
54. A. K. Jain, J. Mao, and K. M. Mohiuddin, "Artificial neural networks: A tutorial," Computer 29, 31–44 (1996).
55. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature 323, 533–536 (1986).
56. G. Cybenko, "Approximation by superpositions of a sigmoidal function," Math. Control, Signals, Syst. 2, 303–314 (1989).
57. F. Lateef and Y. Ruichek, "Survey on semantic segmentation using deep learning techniques," Neurocomputing 338, 321–348 (2019).
58. F. Long, "Microscopy cell nuclei segmentation with enhanced U-net," BMC Bioinformatics 21, 1–8 (2020).
59. E. Moen, D. Bannon, T. Kudo, W. Graf, M. Covert, and D. Van Valen, "Deep learning for cellular image analysis," Nat. Methods 16, 1233–1246 (2019).
60. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Adv. Neural Inf. Process Syst. (2014), pp. 2672–2680.
61. A. Yadav, S. Shah, Z. Xu, D. Jacobs, and T. Goldstein, "Stabilizing adversarial nets with prediction methods," arXiv preprint arXiv:1705.07364 (2017).
62. D. Foster, Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play (O'Reilly Media, 2019).
63. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in 29th IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR 2016) (2016), pp. 2818–2826.
64. S. K. Sadanandan, P. Ranefall, S. L. Guyader, and C. Wählby, "Automated training of deep convolutional neural networks for cell segmentation," Sci. Rep. 7, 7860 (2017).
65. Y. Al-Kofahi, A. Zaltsman, R. Graves, W. Marshall, and M. Rusu, "A deep learning-based algorithm for 2-D cell segmentation in microscopy images," BMC Bioinformatics 19, 365 (2018).
66. Y. Song, E. L. Tan, X. Jiang, J. Z. Cheng, D. Ni, S. Chen, B. Lei, and T. Wang, "Accurate cervical cell segmentation from overlapping clumps in pap smear images," IEEE Trans. Med. Imaging 36, 288–300 (2017).
67. S. U. Akram, J. Kannala, L. Eklund, and J. Heikkilä, "Cell segmentation proposal network for microscopy image analysis," in Deep Learning and Data Labeling for Medical Applications (Springer, 2016), pp. 21–29.
68. A. Arbelle and T. R. Raviv, "Microscopy cell segmentation via adversarial neural networks," in 2018 IEEE 15th Int. Symp. Biomed. Imaging (ISBI 2018) (IEEE, 2018), pp. 645–648.
69. N. Hatipoglu and G. Bilgin, "Cell segmentation in histopathological images with deep learning algorithms by utilizing spatial relationships," Med. Biol. Eng. Comput. 55, 1829–1848 (2017).
70. A. Arbelle and T. R. Raviv, "Microscopy cell segmentation via convolutional LSTM networks," in 2019 IEEE 16th Int. Symp. Biomed. Imaging (ISBI 2019) (IEEE, 2019), pp. 1008–1012.
71. J. B. Lugagne, H. Lin, and M. J. Dunlop, "DeLTA: Automated cell segmentation, tracking, and lineage reconstruction using deep learning," PLoS Comput. Biol. 16, e1007673 (2020).
72. S. E. A. Raza, L. Cheung, D. Epstein, S. Pelengaris, M. Khan, and N. M. Rajpoot, "MIMO-Net: A multi-input multi-output convolutional neural network for cell segmentation in fluorescence microscopy images," in 2017 IEEE 14th Int. Symp. Biomed. Imaging (ISBI 2017) (IEEE, 2017), pp. 337–340.
73. R. Hollandi, A. Szkalisity, T. Toth, E. Tasnadi, C. Molnar, B. Mathe, I. Grexa, J. Molnar, A. Balind, M. Gorbe, M. Kovacs, E. Migh, A. Goodman, T. Balassa, K. Koos, W. Wang, J. C. Caicedo, N. Bara, F. Kovacs, L. Paavolainen, T. Danka, A. Kriston, A. E. Carpenter, K. Smith, and P. Horvath, "nucleAIzer: A parameter-free deep learning framework for nucleus segmentation using image style transfer," Cell Syst. 10, 453–458 (2020).
74. B. Ma, X. Ban, H. Huang, Y. Chen, W. Liu, and Y. Zhi, "Deep learning-based image segmentation for Al-La alloy microscopic images," Symmetry 10, 107 (2018).
75. S. M. Azimi, D. Britz, M. Engstler, M. Fritz, and F. Mücklich, "Advanced steel microstructural classification by deep learning methods," Sci. Rep. 8, 2128 (2018).
76. R. Li, T. Zeng, H. Peng, and S. Ji, "Deep learning segmentation of optical microscopy images improves 3-D neuron reconstruction," IEEE Trans. Med. Imaging 36, 1533–1541 (2017).
77. A. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, "3D U-net: Learning dense volumetric segmentation from sparse annotation," in Int. Conf. Med. Image Comput. Comput. Assist. Interv. (Springer, 2016), pp. 424–432.
78. J. Kleesiek, G. Urban, A. Hubert, D. Schwarz, K. Maier-Hein, M. Bendszus, and A. Biller, "Deep MRI brain extraction: A 3D convolutional neural network for skull stripping," NeuroImage 129, 460–469 (2016).
79. E. Betzig, G. H. Patterson, R. Sougrat, O. W. Lindwasser, S. Olenych, J. S. Bonifacino, M. W. Davidson, J. Lippincott-Schwartz, and H. F. Hess, "Imaging intracellular fluorescent proteins at nanometer resolution," Science 313, 1642–1645 (2006).
80. Y. Wu, Y. Luo, G. Chaudhari, Y. Rivenson, A. Calis, K. De Haan, and A. Ozcan, "Bright-field holography: Cross-modality deep learning enables snapshot 3D imaging with bright-field contrast using a single hologram," Light Sci. Appl. 8, 25 (2019).
81. Y. Rivenson, T. Liu, Z. Wei, Y. Zhang, K. De Haan, and A. Ozcan, "PhaseStain: The digital staining of label-free quantitative phase microscopy images using deep learning," Light Sci. Appl. 8, 2047–7538 (2019).
82. S. Masoudi, A. Razi, C. H. G. Wright, J. C. Gatlin, and U. Bagci, IEEE Trans. Med. Imaging, Tech. Rep. (2019).
83. P. Zelger, K. Kaser, B. Rossboth, L. Velas, G. J. Schütz, and A. Jesacher, "Three-dimensional localization microscopy using deep learning," Opt. Express 26, 33166–33179 (2018).
84. S. Franchini and S. Krevor, "Cut, overlap and locate: A deep learning approach for the 3D localization of particles in astigmatic optical setups," Exp. Fluids 61, 140 (2020).
85. T. Wollmann, C. Ritter, J. N. Dohrke, J. Y. Lee, R. Bartenschlager, and K. Rohr, "Detnet: Deep neural network for particle detection in fluorescence microscopy images," in 2019 IEEE 16th Int. Symp. Biomed. Imaging (ISBI 2019) (IEEE, 2019), pp. 517–520.
86. C. Ritter, T. Wollmann, J. Y. Lee, R. Bartenschlager, and K. Rohr, "Deep learning particle detection for probabilistic tracking in fluorescence microscopy images," in 2020 IEEE 17th Int. Symp. Biomed. Imaging (ISBI 2020) (IEEE, 2020), pp. 977–980.
87. A. B. Oktay and A. Gurses, "Automatic detection, localization and segmentation of nano-particles with deep learning in microscopy images," Micron 120, 113–119 (2019).
88. R. Spilger, A. Imle, J. Y. Lee, B. Muller, O. T. Fackler, R. Bartenschlager, and K. Rohr, "A recurrent neural network for particle tracking in microscopy images using future information, track hypotheses, and multiple detections," IEEE Trans. Image Process. 29, 3681–3694 (2020).
89. N. Granik, L. E. Weiss, E. Nehme, M. Levin, M. Chein, E. Perlson, Y. Roichman, and Y. Shechtman, "Single-particle diffusion characterization by deep learning," Biophys. J. 117, 185–192 (2019).
90. S. Bo, F. Schmidt, R. Eichhorn, and G. Volpe, "Measurement of anomalous diffusion using recurrent neural networks," Phys. Rev. E 100, 010102(R) (2019).
91. P. Kowalek, H. Loch-Olszewska, and J. Szwabiński, "Classification of diffusion modes in single-particle tracking data: Feature-based versus deep-learning approach," Phys. Rev. E 100, 032410 (2019).
92. B. Midtvedt, S. Helgadottir, A. Argun, J. Pineda, D. Midtvedt, and G. Volpe, "DeepTrack-2.0," https://github.com/softmatterlab/DeepTrack-2.0 (2020).
93. B. Midtvedt, S. Helgadottir, A. Argun, J. Pineda, D. Midtvedt, and G. Volpe, "DeepTrack-2.0-app," https://github.com/softmatterlab/DeepTrack-2.0-app (2020).
94. F. Chollet et al., "Keras," https://keras.io (2015).
95. Y. LeCun, C. Cortes, and C. J. Burges, "MNIST handwritten digit database," http://yann.lecun.com/exdb/mnist/ (2010).
96. D. Midtvedt, F. Eklund, E. Olsén, B. Midtvedt, J. Swenson, and F. Höök, "Size and refractive index determination of subwavelength particles and air bubbles by holographic nanoparticle tracking analysis," Anal. Chem. 92, 1908–1915 (2020).
97. V. Ljosa, K. L. Sokolnicki, and A. E. Carpenter, "Annotated high-throughput microscopy image sets for validation," Nat. Methods 9, 637 (2012).
98. J. Pontén and E. Saksela, "Two established in vitro cell lines from human mesenchymal tumours," Int. J. Cancer 2, 434–447 (1967).
99. S. Gerhard, J. Funke, J. Martel, A. Cardona, and R. Fetter, "Segmented anisotropic ssTEM dataset of neural tissue," https://figshare.com/articles/dataset/Segmented_anisotropic_ssTEM_dataset_of_neural_tissue/856713/1 (2013).
100. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (2016), pp. 770–778.
101. P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (2017), pp. 1125–1134.