Vectorial phase retrieval in super-resolution polarization microscopy

In single molecule orientation localization microscopy, valuable information about the orientation and longitudinal position of each molecule is often encoded in the shape of the point spread function (PSF). This shape, though, can be affected significantly by aberrations and other imperfections in the imaging system, leading to erroneous estimation of the measured parameters. A basic solution is to model the aberrations as a scalar mask in the pupil plane that is characterized through phase retrieval algorithms. However, this approach is not suitable for cases involving polarization-dependent aberrations, introduced either through unintentional anisotropy in the elements or by using birefringent masks for PSF shaping. Here, this problem is addressed by introducing a fully vectorial model in which the polarization aberrations are represented via a spatially-dependent Jones matrix, commonly used to describe polarization-dependent elements. It is then shown that these aberrations can be characterized from a set of PSF measurements at varying focal planes and for various polarization projections. This PZ-stack of PSFs, which contains both phase and polarization projection diversity, is used in a phase retrieval algorithm based on nonlinear optimization to determine the aberrations. This methodology is demonstrated with numerical simulations and experimental measurements. The pyPSFstack software developed for the modeling and characterization is made freely available.


I. INTRODUCTION
Fluorescence microscopy is a widely-used imaging modality in biological research 1 given its strong signal, selective labeling within complex systems 2 and compatibility with super-resolution methods. 3Moreover, this technique also allows access to the sample's structural properties 4,5 , making it very useful for studying biomechanics at the molecular level.For example, in single-molecule orientation localization microscopy (SMOLM), 3D spatial localization can reach a precision of a few nanometers, while allowing simultaneously the characterization of the molecule's 3D orientational behavior (e.g.mean orientation and degree of wobble). 5Common SMOLM techniques include polarization channel splitting [6][7][8][9] and point spread function (PSF) engineering, [10][11][12][13][14] which can be used together or separately.The shape of the PSF can change considerably with the emitter's orientation and longitudinal position, so it is crucial to take this into account to enable a full estimation of the parameters 13,15,16 and to avoid localization biases. 17In that respect, PSF engineering techniques have the aim of enhancing these shape changes.Nevertheless, any optical aberration, polarization distortion or misalignment in the imaging system can affect the final shape of the PSFs and thus lead to inaccurate estimation of the parameters.In particular, polarization aberrations are delicate to correct for, as they require additional adaptive strategies that account for the vectorial nature of light propagation. 18,19 common solution to this problem is to perform a set of calibration measurements, 20,21 and use them in a phase a) Electronic mail: rodrigo.gutierrez-cuevas@espci.fr b) Electronic mail: miguel.alonso@fresnel.frretrieval optimization algorithm to determine the aberrations present in the system.For this approach to work, it is important to have an accurate model for light's propagation from a known source to the camera, and to incorporate phase diversity, e.g. through measurements at varying focal planes, both to avoid falling into local minima and to accelerate convergence. 22For single-molecule localization microscopes, the standard approach is to measure the PSFs generated by fluorescent nanobeads for varying focal planes and use this Z-stack of images in a phase retrieval algorithm. 23Initial approaches relied on scalar models assuming a point source emitting a spherical wavefront along with a scalar pupil representing the aberrations.In this simplified case, Gerchberg-Saxton iterative algorithms 24,25 can be used, thus reducing the complexity of the implementation.However, these algorithms are less flexible since the pupil model cannot be chosen freely, and they are not directly applicable to full vectorial models that consider the dipolar emission pattern and polarization distortions. 20,21,26A more flexible approach is offered by casting the phase retrieval problem as a nonlinear optimization routine where most parameters in the model can be found during the optimization process, although this requires providing analytical formulas for the gradients in these parameters, hence complicating implementation. 25More accurate models that take into account the vector nature of the emitted light have also been proposed. 27,289][30] However, these approaches assume a scalar mask to characterize all the remaining aberrations thus preventing them from correcting any residual polarization-dependent effects.Moreover, the polarized components of the emitted fields are eventually summed in order to model an unpo-larized measurement, which is not directly applicable to polarized PSFs in SMOLM.
Recently, new SMOLM techniques have been proposed that use birefringent elements either to encode efficiently the 3D orientation and 3D localization information of the emitting dipole into the shape of the two polarization components of the PSF, 10,12,13,31 or to understand the intensity and/or shape of their different polarization projections. 8,9,32For such approaches, it is essential to take into account the emission pattern of the dipole source, its interaction with the interface between the embedded medium and coverslip, and any polarization-dependent aberrations. 33To address these issues, a model is used here where all vector aspects of the propagation of the light emitted by the source to the back focal plane (BFP) are taken into account, and where the aberrations and polarization distortions are represented by a birefringent distribution at the pupil plane (BDPP) modeled with a spatially-varying Jones matrix. 34It is shown that in order to properly characterize this BDPP, it is necessary to introduce polarization projection diversity, obtained by projecting the PSFs onto various polarization states, in addition to the phase diversity given by changing the location of the focal plane.The PSF images generated with these two diversities form a PZ-stack that is fed into a nonlinear optimization algorithm that allows retrieving the unknown BDPP.This approach allows including many parameters that are necessary for a proper characterization, such as photo-bleaching amplitudes, background illumination, or diversity-dependent tilts.Accurate and computationally amenable models for the light produced by fluorescent beads (commonly used for calibration measurements) are also included.These models take into account the unpolarized nature of the emitted light and the blurring due to bead size. 35The software package pyPSFstack used for the modeling of PZ-stacks and for the phase retrieval process can be found in Ref. 36.The retrieval algorithm was implemented with the neural network framework PyTorch, 37 which greatly simplifies its implementation and flexibility due to the automatic computation of all gradients, and its integration with GPU if available.While the emphasis of this work is on the characterization of BDPPs, both the theory and code are equally applicable to scalar-only aberration pupil distributions as shown in the supplementary material (SM).

A. Field at the back-focal plane
In order to properly characterize a given system, one must first derive an accurate model.Here, the situation depicted in Fig. 1 is considered: the incoherent light emitted by a fluorescent bead is collected by an immersion microscope objective with a high numerical aper-ture (NA).The bead is assumed to be placed at an axial position z 0 from the interface between its embedding medium with refractive index n i and the coverslip assumed to have the same refractive index n f as that of the immersion liquid.(Because the bead is behind this interface, z 0 < 0.) The index mismatch between these media introduces extra aberrations (see SM Secs.SI and SII for more details) and polarization-dependent transmission following the Fresnel coefficients.9][40] All of these effects can be encapsulated into the Green tensor for a dipolar source at the BFP of the microscope objective, which can be written as 29,[41][42][43] where and g is a 2 × 3 matrix (see the SM Sec.SI for the explicit form) that includes the effect of the Fresnel coefficients for the interface and depends only on the normalized pupil coordinates at the BFP, u = (u x , u y ) whose maximum value is limited by the NA through ∥u∥ max = NA/n f , and k = 2π/λ with λ being the wavelength of the emitted light.(Note that the definition of u differs from that in Ref. 35 by a factor of n f .)For simplicity, it was assumed that the source is centered at the optical axis.The position of the focal plane z f shown in Fig. 1 requires specifying two parameters: where α is a dimensionless parameter fixing the position chosen for the reference focal plane (RFP) and ∆ the position of the focal plane with respect to the RFP.The parameter α is generally taken to be the one producing the best focus when ∆ = 0, although its definition is not unique (one option being the one that minimizes the wavefront error 44 ).As detailed in the SM, minimizing the root-mean-square (rms) spot size with fourth-order corrections for the wavefront difference between the SAF and defocus produces the simple expression with n r = n f /n i , which gives satisfactory results for both water/oil (n r = 1.1391) and air/oil (n r = 1.515) interfaces.This value is taken as the default in this work, but it can be changed to match other experimental configurations.In the SM Sec.SII we include other possible criteria for defining the best focus plane in our scenario: index matching between the objective immersion liquid and the coverslip.For situations in which this is not the case so that the coverslip has to be considered as a parallel slab of a finite thickness, expressions for the best focal plane can be found in Ref. 44.

B. Birefringence distribution at the pupil plane
As mentioned in the introduction, the goal of this work is to characterize any residual aberrations and polarization-dependent effects due to stress and/or interfaces, the use of masks aimed at tailoring the PSF, or a combination thereof.These residual effects produce a birefringent distribution at the BFP, shown as a mask in Fig. 1, which can be represented by a 2 × 2 spacedependent Jones matrix, 34 where W represents a scalar aberration function, and the scalar pupil functions q j are real.This matrix can be made unitary by enforcing the condition j q 2 j = 1, and otherwise it can include apodization effects.The birefringence distribution J M can be used to represent both a mask introduced to shape the PSFs, such as a stress-engineered optic (SEO) 13,45,46 or a q-plate, 12,47,48 and the scalar and polarization aberrations of the system.(The simplest case of a scalar mask corresponds to q 0 = 1, q 1 = q 2 = q 3 = 0.) This Jones matrix acts on the Green tensor of the dipolar source at the BFP and the result is then propagated to the image plane via where ρ denotes the transverse position at the image plane, and M is the total magnification of the system.Note that for setups using a relay system the coordinates of G should be flipped, u → −u.
For a fully polarized dipole oriented along the unit vector µ = (µ x , µ y , µ z ), the electric field distribution at the image plane is given by Therefore, the three columns of the Green tensor represent the field distribution at the pupil plane produced by a dipole along each of the three coordinate axes.Since the information about the orientation of the dipole is encoded into the components of the Green tensor, in order to retrieve the dipole's orientation from the shape of the PSF it is necessary to spatially separate its projections into two appropriately chosen orthogonal polarization states, such as horizontal and vertical linear or left and right circular.These polarization projections can be represented by two matrices P 1 and P 2 , thus for a fully polarized dipole the PSF pair is given by with j = 1, 2. For unpolarized emitters such as fluorescent beads used for characterization, the PSFs are given by the incoherent sum of the components of the final Green tensor, which amounts to the incoherent sum of the dipoles oriented along the three coordinates axes.In this case the pair of PSFs is given by Note that if no polarization projection is used the PSF is given by the sum of the pair of PSFs I IP,1 and I IP,2 .
The goal of this work is to provide a method for determining from a set of calibration PSFs the system's BDPP, represented by a Jones matrix of the form in Eq. ( 4).The various functions in this expression must first be expanded in terms of a basis whose expansion coefficients are then determined through nonlinear optimization.Common basis choices for optimization are the Zernike polynomials 49 and a direct pixel representation, 28 both of which result in comparable speeds since they employ the same number of discrete Fourier transforms, the most costly operation even when implemented through the fast Fourier transform (FFT) algorithm.
In what follows, a decomposition in the Zernike polynomial basis is used given that its elements are easy to interpret and allow an accurate description with fewer parameters (although examples using the pixel-based approach are shown in SM Sec.SVI).The components of the Jones matrix are then decomposed as where j = 0, . . ., 3, and a single index notation was used for the basis elements Z l (e.g. the Fringe notation 50 ).Note that ′ in the expression for W indicates that the terms corresponding to piston and defocus are excluded.The piston term would simply add a global phase that is unimportant and cannot be determined from intensity measurements, while the defocus term would be redundant with the more accurate defocus parameter ∆ in Eq. ( 1) which is also included as an optimization parameter.Also, note that any misalignment of the PSFs with respect to the optical axis is corrected by the scalar tilts present in W .The number of Zernike polynomials included in the decomposition depends on the expected spatial dependence of the BDPP.In this work, we found that the first 15 polynomials are usually sufficient, even when retrieving discontinuous BDPPs such as a qplate (see Fig. 3).It should also be mentioned that the smooth description provided by the Zernike model works despite fast variations due to SAF radiation since these are already included in the propagation model.[53]

B. Phase and polarization diversity
It is common practice in phase retrieval algorithms for optical microscopes to assume access to a stack of intensity images for varying focal distances ∆ ζ from the location of the best focus (a Z-stack).The phase diversity provided by the varying focal distances is taken into account by the phase factor D (∆ ζ ) (u) defined in Eq. ( 2) within the Green tensor in Eq. ( 1).This additional information, referred to as phase diversity, helps the algorithm converge to an appropriate solution without falling into local minima, as well as discriminate between vortices with opposite topological charge.
While a Z-stack is sufficient to determine scalar masks and aberrations, it is not so for BDPPs since it does not discriminate between the true J and its unitary transformations J U = U • J, where U is a constant unitary matrix, as exemplified in the following section.Therefore, it is necessary to include information about the polarization dependence of the PSFs used for the retrieval.This additional information is obtained by introducing a polarization analyzer after the birefringent mask (see Fig. 1) composed of a combination of waveplates and polarizers, where at least one element rotates to generate various polarization projections of the output.This polarization diversity is modeled by a set of constant Jones matrices P (p) that are applied to the Green tensor at the BFP along with the defocus terms for the phase diversity in order to generate a PZ-stack of Green tensors It is important to notice that the constant matrices P (p) cannot be unitary since they would give a unitary transformation and thus have no effect of the shape of the PSFs.The simplest nonunitary matrix to implement is a projection matrix obtained by placing a linear polarizer at the end of the waveplate sequence.This PZ-stack of Green tensors is then propagated to the image plane via and its components are added incoherently (modeling an unpolarized source) to obtain a PZ-stack of PSFs like the one shown in Fig. 2. The polarization projections P 1 and P 2 used to extract the orientation information can also be used to define the polarization diversity as discussed in Sec.V.It is worth noting that while experimentally the polarization diversity happens at the BFP, computationally it is better to perform it at the image plane in order to avoid the computation of unnecessary FFTs.However, as discussed in Sec.VI, there are some situations in which this is not possible.

C. Modeling the measured PSFs
Before comparing the PZ-stack computed by the model presented thus far with one measured experimentally, it is necessary to account for other effects.First, depending on the size of the fluorescent bead it might be necessary to include a blurring effect.As shown in Ref. 35, the exact three-dimensional blurring corresponds to a superposition of two-dimensional convolutions between the PSFs generated by point sources located along the longitudinal diameter of the bead with a kernel that weights more heavily the contributions near the center of the bead and vanishes for those at the poles.This exact blurring cannot be rewritten as a three-dimensional convolution due to the lack of translation invariance along the axis of propagation, but it can be approximated through a semi-analytic method based on a Taylor expansion.By keeping the first term of the expansion, we obtain a twodimensional blurring model based on a two-dimensional convolution of the PSFs.If instead we keep two terms in the expansion we get a model for the blurring along both the transverse and longitudinal directions, which is computed via two-dimensional convolutions of the PSFs and their second-derivatives with respect to the bead's axial position thus giving a three-dimensional blurring model.Both of these approximate models are computationally more efficient than the exact one and have been implemented in pyPSFstack.Moreover, they can be used during the BDPP retrieval process, as shown in Sec.V B, albeit at a computational cost due to the need of supplementary Fourier transforms.Nonetheless, they allow for the analytic computation of the gradients, hence significantly limiting the computational slowdown; using readily available functions to blur images, on the other hand, such as the one used in Ref. 28 with Gaussian kernel, would require the gradients to be computed via finite differences.
The other two effects that must be considered are photobleaching of the fluorescent beads and background illumination.Photobleaching causes the number of photons emitted by the bead to diminish with time.Its effect can be taken into account by implementing an overall amplitude factor a (p,ζ) that depends on both the phase and polarization diversities.Background illumination is then added incoherently to the photobleached PSF stack.The simplest model is to assume that the background illumination is determined by a constant term b (p,ζ) that depends on the phase and polarization diversities.Extra terms for a spatially-dependent background can be added if needed. 54The modeled PZ-stack to be compared to the experimental measurements is then given by where B denotes the blurring operation that depends on the bead radius.In SM Sec.SIII we include a short derivation of the blurring operation when beads are modeled as solid spheres, following Ref.35.

D. Assessing the accuracy with a cost function
The last piece of the forward model to be considered is the choice of cost function used to tune the parameters so that the modeled PSFs, I exp .In the absence of noise, any choice of cost function that has a minimum when the two quantities are the same should provide the same result.However, noise is always present in experimental measurements and thus must be taken account.In single molecule fluorescent microscopy one is normally limited by shot noise following a Poisson distribution, in which case the log-likelihood cost function, 55 should be used.Here, w denotes a binary window function used to represent the region considered for the optimization due to a smaller size of the experimental data, and/or to exclude bad pixels of the camera.Another common option for the cost function is the sum of differences squared, which is appropriate when the noise follows a Gaussian distribution.Both of these options are implemented in pyPSFstack.Note that for the choice of cost function to be consistent, the values of exp must actually follow the assumed distribution.This means that the images should not be denoised and that the offset of the camera should be removed.Compute gradients ▷ Uses automatic differentiation

5:
Update optimization parameters ▷ Takes a step size of η using the gradients 6: end for

IV. IMPLEMENTING THE NONLINEAR OPTIMIZATION
The goal of the nonlinear optimization routine is to find the set of optimization parameters in the forward model that minimize the cost function assessing the differences between the measured and modeled PZ-stacks.This is achieved by supplying an optimization algorithm, such as Adam 56 or L-BFGS, 57 with a function that uses all the current values of the parameters to compute the forward model all the way to the value of the cost function.Additionally, one should supply the optimization algorithm with another function that performs a backward computation to obtain the gradients of the cost function with respect to all the optimization parameters.These gradients are used to change the parameter values until a minimum is reached.The gradient computation is straightforward but tedious, and can be achieved by following the rules outlined in Ref. 58.However, an advantage of implementing the nonlinear optimization with the neural network framework PyTorch is that only the forward model must be implemented explicitly, since the backward model for computing the gradients of the various parameters is computed automatically.This framework also offers the most common optimization algorithms.
The implementation using PyTorch greatly simplifies the vectorial phase retrieval algorithm as shown in Algorithm 1. First, the system parameters and the initial values for the optimization parameters are used to initialize the forward model used to compute the esti-mated PSFs and evaluate the current value of the cost function.For all the examples presented in this work, the BDPP was initialized as the identity matrix, i.e. c (0) 0 = 1 and c for all j, l ̸ = 0.However, this can be changed if there is a priori knowledge about the BDPP.After initialization, the phase retrieval algorithm enters a For loop in which the forward model is used to compute the cost function using the current values of the parameters.Then, automatic differentiation is called to obtain the gradients.Finally, hyperparameters of the chosen optimization algorithm and the values of the gradients are used to update the values for all the optimization parameters.This process is iterated N iter times after which the process is completed.Note that other criteria can be used to terminate the process: for example, the process could be terminated when the change of the cost function falls below a set limit.All the results presented in this work were obtained using the Adam optimizer since it is more robust than simple gradient descent and faster and less memory intensive than L-BFGS.While this optimization algorithm has several parameters that can be tuned, in practice, it was found that choosing the appropriate learning rate was sufficient to obtain satisfactory solutions.The convergence was assessed by analyzing the evolution of the cost function.A more detailed explanation can be found in the examples of the repository. 36o implement the forward model, all calculations are performed numerically using FFTs, requiring all spatial quantities to be sampled consistently with the camera's pixel pitch p cam , so that the size at the BFP is fixed to L pupil = λM/(p cam n f ).The total size at the BFP is divided into N × N points used for the computation, and the resulting two-dimensional sampling is labeled with the lexicographic index ℓ which takes over all spatial dependence in terms of u and ρ, for instance G 0 (u) → G 0 (ℓ).All the parameters and steps necessary to compute the forward model are summarized in Algorithm 2.

V. NUMERICAL EXPERIMENTS A. Polarization diversity VS more phase diversity
To exemplify the implementation of the phase retrieval algorithm and, in particular, the need for polarization diversity to properly characterize a BDPP, we consider the retrieval of two masks used recently for estimating the position and orientation of single emitters: an SEO 45,46 with and the parameter c = 1.25π, and a q-plate with unit topological charge, 12,47,48  , c Procedure Forward model • G(ℓ) ▷ Apply birefringent mask to Green tensor ▷ Compute intensity both of which are shown in Fig. 3. Following the strategy outlined thus far, pyPSFstack is used to model a PZ-stack for each BDPP such as the one shown in Fig. 2 for the SEO.For the phase diversity, images are taken from −250nm to 250nm of the RFP with a step size of 50nm.For the polarization diversity, a quarter wave plate (QWP) is rotated from 0 to 3π/8 with a step of π/8 and is followed by a Wollaston prism that projects the output onto horizontal and vertical linear polarizations.This choice is inspired by the setup used in Ref. 13  where the SEO is followed by a QWP and a Wollaston prism to project the PSF into left and right circular polarizations.It is also assumed that 10000 photons arrive on average to the camera to form the PSFs, to which an additional 50 photons per pixel are added as background.Noise following a Poisson distribution is incorporated.Note that the choice on diversity follows the results presented in the SM Sec.SV, where the accuracy of the retrieval is studied as a function of measured photons for various combinations of phase and polarization diversity.For simplicity, we consider a small fluorescent bead with R b = 10nm so that spatial blurring is negligible.Nonetheless, the exact blurring model introduced in Ref. 35 was used to compute the PZ-stacks for testing the BDPP retrieval algorithm.Moreover, a random error of the order of 20nm was introduced to the distance between the bead and the coverslip, and one of the order of 25nm to the location of the RFP.The algorithm was run on a desktop computer with an AMD Ryzen 9 3900X 12-Core CPU, and a NVIDIA GeForce RTX 2080 GPU.Using a pupil sampling of 128 × 128 points the full retrieval routine consisting of 200 iterations took only a few seconds (see Table I for more details).
The results of the procedure are shown in Fig. 3, where the Jones matrix for the ground truth is compared to the ones retrieved from the PZ-stacks.This figure also shows the PSFs formed by dipoles oriented along the three Cartesian axes and by an unpolarized dipole (constructed as an incoherent mixture of the three orthogonal dipoles).
The PSFs shown are modeled using the standard projectors P 1 and P 2 (Eqs.( 7) and ( 8)) for each birefringent mask: for the SEO the output is projected onto left and right circular polarizations, while for the q-plate the output is projected onto the horizontal and vertical polarizations.To assess the accuracy of the retrieval, both the correlation between the retrieved and true BDPPs as well as the root-mean square (RMS) error, ϵ RMS , between the retrieved and true PSFs are shown in Fig. 3.For the SEO, the retrieved BDPP and the corresponding PSFs are almost indistinguishable from those corresponding to the ground truth.However, for the q-plate there are appreciable differences between the original and retrieved BDPPs.The deviation at the center is due to the chosen model, since the Zernike polynomials struggle to reproduce the singularity at the center of the q-plate.
In general, this type of singularity does not arise as an aberration, but rather by design of the mask, so that it can be incorporated as prior information.For instance, on top of the Zernike decomposition, a q-plate could be added to the model with its center and orientation as optimization parameters.Another noticeable difference for the case of the q-plate is that the phase between the two rows of the Jones matrix is not correct.This error happens only for birefringent masks that generate rotationally symmetric PSFs for unpolarized emitters for all polarization projections.For these masks, the algorithm cannot determine the global phase between the rows of the corresponding Jones matrix.This problem can be solved by placing the device that introduces the polarization diversity before the birefringent mask.Even when this solution is not used, the correlation (when neglecting the phase difference between the rows) is quite high and the reproduced PSFs for any dipole orientation are again indistinguishable from the true ones.Therefore, these differences are inconsequential for the purposes of single molecule fluorescence microscopy.Additionally, the error introduced for z 0 and α has no impact on the retrieval as long as the defocus ∆ is included as an optimization parameter.Similar results using a pixel-based model are shown in the SM Sec.VI, as is a comparison of the convergence of the cost function between the two models.
For comparison, the retrieval for the same birefringent mask is also performed without using polarization diversity.To make this comparison consistent and fair, the number of photons reaching the camera and those in the background illumination are doubled, since there is no polarization separation.Moreover, the phase diversity images are now taken from −250nm to 250nm of the RFP with a reduced step size of 5nm so that the total number of PSFs used is larger than in the previous case.The results are shown in Fig. 3, where the algorithm is seen to fail to retrieve the appropriate BDPP, and thus it cannot generate the correct PSFs when they are projected onto a given polarization state.These results show the need for polarization diversity when characterizing a BDPP.A more detailed comparison between the scalar and vector models is provided in the SM Sec.IV.Note that runtime is much longer due to the increase in the number of FFTs required.(This runtime should be comparable with that of a model that includes tilts but no blurring; see Table I).

B. Incorporating blurring due to size to the birefringent distribution retrieval
As an additional test of the present implementation, the retrieval of a BDPP from a highly blurred PZ-stack like the one shown in Fig. 4 is considered.Using the SEO as an example, a PZ-stack is constructed as in the previous section with the significant difference of increasing the radius of the nanobead, which is chosen randomly from a uniform distribution between 150nm and 170nm.Here again the distance to the coverslip is taken to be equal to the radius of the bead.As shown in Fig. 4, this bead size causes a significant blur in the resulting PSFs.The blurred PSFs were computed with the exact blurring presented in Ref. 35, but this approach turns out to be computationally expensive and not necessary for the retrieval procedure as shown in what follows.
A first attempt to retrieve the BDPP is performed by completely neglecting the blurring effect.However, as shown in Fig. 4, this approach fails to retrieve the appropriate BDPP, showing that bead size effects cannot be neglected in this case.Next, as discussed in the SM Sec.SIII, the two approximate models presented in Ref. 35  were used for the retrieval where the radius of the bead is used as an optimization parameter for the blurring with an initial value of 150nm.The first of these approaches is a 2D convolution with a kernel given by a spherical Bessel function.This approach produces a satisfactory result, whose retrieved BDPP has a correlation of 98.6% with the true one.The second is the semi-analytic model described in Section III C for reproducing the effects of the three-dimensional blurring, based on a Taylor expansion around the center of the bead.In this case, a BDPP indistinguishable from the true one is again retrieved with a slightly better correlation of 99.5%.As shown in Table I, both blurring models significantly increase the time per iteration due to the required supplementary Fourier transforms.This increase is particularly important for the 3D blurring model, which requires the computation of the first and second derivatives of the Green tensor with respect to the bead's axial position, hence tripling the number of Fourier transforms, in addition to two convolutions at the image plane.Therefore, the small gain provided by the 3D model might not justify the extra computational resources needed for it.Nonetheless, both models are able to perform the full retrieval in well under a minute with the CPU and in a few seconds with the GPU.

VI. CHARACTERIZATION FROM EXPERIMENTAL DATA
After validating the proposed retrieval procedure on simulated data, we apply it to retrieve a BDPP distribution from a PZ-stack measured experimentally.We use a sparse sample of fluorescent nanobeads with a diameter of 20 nm (orange carboxylate-modified FluoSpheres), immobilized on the surface of a poly-L-lysine-coated coverslip and embedded in water.The sample is mounted on a XYZ piezo stage (Physik Instrumente), and is excited by a continuous wave laser emitting at 561 nm (Oxxius L4Cc) in a wide-field illumination configuration using an oil immersion objective lens (APO TIRF ×100, NA= 1.49, Nikon).The emitted fluorescence is collected by the same objective lens, and then passes through two multiband dichroics (Semrock, R405/488/561/635-t1-25x36) and a fluorescence filter (Semrock, 605/40).A telescopic relay system (composed of two achromatic doublets with f = 250 mm so that the magnification is unity) is used to access the BFP of the objective.The different polarization projections are taken by using a QWP (AQWP05M-600, Thorlabs) placed on a rotating mount (Newport, PR50CC), followed by a quartz Wollaston polarizing 2.2°beamsplitter (Edmunds, 68-820).The final images are measured using an ORCA Fusion-Digital CMOS C14440-20 UP (1024×1024 pixels, 6.5 × 6.5 µm pixel size, Hamamatsu).PZ-stacks were acquired with a step size of 50 nm, and a rotation step for the QWP of 30 • .Figure 5 shows part of the experimental PZ-stack.
Before launching the retrieval on the experimental PZstack, it should be noted that there are several factors that lead to the introduction of a diversity-dependent phase tilt at the BFP.First, the use of a Wollaston prism to separate spatially the two polarization components onto different sections of the camera might make it difficult to have the same center for the PSFs for each polarization component.Second, any slight wedge on the rotating QWP introduces a tilt that rotates with it.Finally, any slight misalignment between the stage moving the sample and the optical axis defined by the microscope objective leads to a defocus-dependent tilt.Therefore, it is best to introduce in the forward model outlined in Sec.IV extra optimization parameters to independently adjust these tilts at the BFP for each combination of diversities.Specifically, steps 4 and 5 in Algorithm 2 should be reversed in order to apply the polarization diversity before propagating to the image plane, and the following additional step should be added after applying the polarization diversity: 4b.Apply a phase tilt T (ℓ) = exp i2πt (ζ,p) • ℓ to each diversity.Optimization parameters: t (ζ,p) = (t ).
The downside of including these extra parameters is that the number of FFTs needed for each iteration increases by a factor equal to the number of polarization projection steps (see Table.I).However, the total run time remains quite manageable even if the 2D blurring model is used.
All the optimization parameters for the retrieval procedure were used, except for the bead radius (for the blurring) since it can be neglected due to the small bead size (R b = 10nm).The results of the retrieval process are shown in Fig. 5, where the strong agreement between the measured PZ-stack and the one modeled with the retrieved BDPP (see SM Sec.SVIII for a full comparison) can be appreciated.Moreover, the presence of an SEO is visible from the retrieved BDPP.In this case, since it is known that there is an SEO at the BFP, it is worth trying to separate the total retrieved BDPP into a misaligned ideal SEO and a BDPP one containing the scalar and polarization aberrations of the system.It should be noted that scalar tilts and defocus are not shown as part of the scalar aberrations.From this decomposition it can be seen that the largest birefringent contribution comes from the SEO, but the aberrations are not negligible and must be taken into account (see SM Sec.SVIII for more details).Note also that the polarization aberrations show larger variations that the scalar ones, which are almost flat, showing the need to consider polarization aberrations when using polarization-dependent systems.Given that for a retrieval from experimental data the ground truth is not known a priori, the same retrieval was performed for other beads within the field of view to corroborate the results.The retrieved BDPPs are indeed highly correlated to the one shown in Fig. 5 (see SM Sec.SVIII for full details).

VII. CONCLUSIONS
A methodology and phase retrieval algorithm were presented for the characterization of birefringent distributions at the pupil plane (BDPP) from stacks of PSFs.In particular, it was shown that, for polarization-dependent systems, aberrations should be modeled as a BDPP encoding both scalar and polarization deviations from the ideal system, and that the use of polarization diversity is essential for its proper characterization.The software program pyPSFstack created and used for the modeling and retrieval presented in this work is freely available.In particular, the BDPP retrieval implementation based on PyTorch makes this software flexible and customizable.This is shown in this work by incorporating several optimization parameters apart from those used to describe the BDPP, as well as the blurring models presented in Ref. 35.Even when considering these extra parameters and models the runtime remains manageable.Additionally, this software can also be used for the retrieval of scalar BDPPs, as shown in the SM Sec.SVII for a mask known as the tetrapod. 59hile it was assumed that the sources used for the characterization were unpolarized, the retrieval model and algorithm presented here can be easily adapted to sources that are dipolar, partially polarized, or with a fixed orientation such as molecules fixed on a surface 14 or in DNA origami. 60,61Likewise, this model can also be used to find optimal scalar or BDPPs minimizing a particular combination of the Cramèr-Rao bounds for the parameters to be extracted from the shape of the PSFs.Finally, note that there are some modifications that could be implemented in the retrieval algorithm that might allow a faster convergence, such as the use of stochastic gradient descent, where only a given set of measurements are used during each iteration, or spectral initialization. 62ectorial phase retrieval in super-resolution polarization microscopy -Supplementary Material R. Gutiérrez-Cuevas, 1, 2, a) L.A. Alemán-Castañeda, 2 I. Herrera, 2 S. Brasselet, 2 and M.A. Alonso 2, 3, b) This document presents supporting material and proofs for the results presented in the main text.Section SI provides the exact expression of the Green tensor.Section SII presents details about the different criteria used to define the best focus.Section SIII discusses the blurring models used to take into account the bead's size.Section SIV gives a more detailed comparison between the full vectorial phase retrieval algorithm presented in the main text and a scalar version.Section SV examines the effects of noise, as well as the effects of the number and type of diversity measurements on the accuracy of the retrieval.Section SVI presents the results obtained with a pixel-based model and compares them to those based on Zernike polynomials.Section SVII provides an example of the retrieval of a scalar mask.Section SVIII provides supplementary results that corroborate the retrieved birefringent distribution at the pupil (BDPP) from experimental data presented in the main text.

SI. EXPRESSIONS FOR THE GREEN TENSOR AT THE BACK FOCAL PLANE
A closed-form for the Green tensor at the back-focal plane for a dipolar source placed close to an interface can be obtained, as outlined in Ref. 1.In particular the components of the tensor g in Eq. ( 1) of the main text are given by where with being the Fresnel coefficients for p and s polarized light, and

SII. CHOOSING THE BEST FOCUS LOCATION
The index mismatch between the embedding medium and the coverslip, assumed to be the same as that of the immersion liquid of the objective, leads to a shift of the paraxial focus and higherorder circularly symmetric aberrations, such as spherical.The presence of these aberrations makes a) Electronic mail: rodrigo.gutierrez-cuevas@espci.fr b) Electronic mail: miguel.alonso@fresnel.fr4. the minimization of the rms wavefront error when it is approximated up to sixth order, which gives α = n 5 r 36624n 4 r + 9352n 6. the minimization of the rms spot size when the wavefront error is approximated up to fourth order, which gives 7. the minimization of the rms spot size when the wavefront error is approximated up to sixth order, which gives 8. the circle of least confusion, where the marginal ray intersects the caustic, for which α is given by the solution of the following equation: As shown in Fig. S1, the PSF generated at the location of the rms spot size with a fourthorder approximation to the wavefront error provides a good starting point and a simple formula.Therefore, this value is used as the default in pyPSFstack and in all computations performed in the main text.The best focus locations just described are derived in an index-matching scenario between the objective immersion liquid and the coverslip, which is common in super-resolution microscopes, and simplifies the calculations since only one interface needs to be considered.However, as mentioned in the main manuscript, for setups without index-matching where the coverslip must be considered as a parallel slab of a finite thickness, expressions for the best focal plane can be found in Ref. 2. For comparison, note that options 2 and 5, i.e. the ones minimizing the rms wavefront error either in its paraxial approximation or its full expression, correspond respectively to the so-called "Gaussian" and "diffraction" planes in Ref. 2.

SIII. BLURRING MODELS A. Exact model
As a reminder, the PSF of an unpolarized point dipole is given by However, fluorescent nanobeads used for characterizing the performance of a microscope can have a non-negligible size that must be taken into account since it leads to the blurring of the PSFs.As outlined in Ref. 3, the blurred PSF generated by a fluorescent bead can be obtained by integrating the PSF of unpolarized point dipoles over its volume with r = ρ + ẑz, and where r 0 denotes the location of the center of the bead with respect to the optical axis and interface, R b is the bead's diameter, and circ(r) = 1 if 0 ≤ r ≤ 1 and 0 otherwise.Note that it was assumed that the bead emits as a transparent sphere in which all points within the volume emit light.Another shell-based emission model was also presented in Ref. 3, but is not discussed here since a full sphere model is deemed more adequate for the fluorescent beads used.
Given the three-dimensional nature of the beads, their PSF is not expressible as a single convolution since the system is not translation invariant along the z axis.Nonetheless, by leveraging the transverse translation invariance I IP (ρ; r 0 + ρ) = I IP (ρ − M ρ; r 0 ) and separating the integral into its transverse an longitudinal components, it is possible to express it as the integral of a two-dimensional convolution: where ρM = M ρ (hence can be seen as a variable at the detector/image plane), the bead is assumed to be located along the optical axis so that r 0 = ẑz 0 , and we defined ζ = z − z 0 .This greatly simplifies the numerical calculation since only PSFs for varying distances z need to be computed.Using the convolution theorem, and remembering that the Fourier conjugate variable to the pupil coordinate u is the scaled transverse coordinate ρ′ = n f ρ/M λ, the convolution can be rewritten as where the blurring kernel in pupil coordinates is given by and the Fourier transform is defined as B. Semi-analytic model Following Ref. 3, instead of having to compute the intensity distribution for several values of ζ, i.e. for several slices of the bead, we can instead perform a Taylor expansion of the intensity around ζ = 0, that is the center of the bead, The ζ dependence comes from the phase factor giving rise to the angle fluorescence (SAF) radiation, and assuming that |ζ| < |z 0 | (the bead cannot be embedded in the coverslip) then |z 0 + ζ| = |z 0 | + ζ and where With this expansion, the intensity distribution of the bead can be written as The advantage of this form is that the integral in ζ can be computed analytically; it vanishes for odd m while for even m it is given by Having performed this integral, the intensity distribution produced by the bead can be written as a series of two-dimensional convolutions: Moreover, the expression of the blurring kernels in the Fourier plane can also be computed analytically where j n is a spherical Bessel function of order n.As discussed in Ref. 3, for small size beads, the zeroth order approximation suffices and hence only the first blurring Kernel must be used with the intensity distribution of an unpolarized source at the center of the bead.This is the 2D model used in the main text.Note that this result differs from other approaches generally used in the literature, such as convolutions with a hard disk 4 or a Gaussian 5,6 .For the 3D model, it suffices to include some of the next terms in the series expansion.

SIV. SCALAR VS VECTORIAL MODEL
In addition to the results of the numerical experiments presented in the main text, here a more in-depth comparison between the scalar and vector models for the phase retrieval is presented.Note that the aberrations and polarization distortions are represented by a birefringent distribution at the pupil plane (BDPP) modeled with a spatially-varying Jones matrix.Using the PSF stack generated with an SEO, the four following cases are considered: The results are presented in Fig. S2.Like for the results presented in the main text, the PZ-stacks are obtained by taking images between −250nm to 250nm of the RFP with a step size of 50nm and projecting the polarization by rotating a quarter wave plate (QWP) from 0 to 3π/8 with a step of π/8, and then separating in horizontal and vertical linear polarizations.The Z-stacks are obtained by taking images between −250nm to 250nm of the RFP with a reduced step size of 5nm.
The first thing to notice is that a scalar model for the BDPP fails to accurately reproduce the Zand PZ-stacks generated with the SEO, whether polarization diversity is used or not.This shows that a scalar model is generally not sufficient to reproduce the PSFs generated by birefringent widows.In comparison, when a fill birefringent model is used the Z-stack can be accurately retrieved whether polarization diversity is used or not, as seen in Fig. S2 (second column) in which both the root-mean square (rms) error, ε rms , are comparable.However, in order to reproduce the PZ-stack accurately, i.e. for different polarization projections, polarization diversity is necessary; as shown in Fig. S2 (third column) the conventional retrieval without polarization diversity fails and hence its overall ε rms is much higher.This failure is not surprising since, without polarization diversity, the phase retrieval algorithm cannot distinguish between unitary transformations of the true BDPP and thus will, most likely, converge to a wrong solution.

SV. EFFECTS OF NOISE AND OF NUMBER AND TYPE OF DIVERSITY MEASUREMENTS
An important choice to be made for the vector phase retrieval algorithm presented in the main text is the number of phase and polarization diversity measurements used.To determine some guidelines, the performance of the retrieval of an SEO is studied as a function of the number of detected photons, which controls the noise, for various combinations of phase and polarization diversity measurements.For a given number of detected photons (taken to be the number of incident photons in the brightest pixel), 50 retrievals are performed using PZ-stacks with five phase diversity measurements (obtained by taking images between −100nm to 100nm of the RFP with a step size of 50nm) and with two (quarter-wave plate at π/4 followed by Wollaston prism), four (quarter-wave plate at 0 and π/4 followed by Wollaston prism) and eight (quarter-wave plate at 0, π/8, π/4 and 3π/8 followed by Wollaston prism) polarization projection diversity.Similar PZstacks are used but with eleven phase diversity measurements (obtained by taking images between −250nm to 250nm of the RFP with a step size of 50nm).For every case, the background number of photons is equal to 50 for every pixel.To assess the accuracy of the retrieval, two metrics are used: the correlation between the retrieved BDPP and the true one, and the rms error between the retrieved and true PSFs generated by point dipoles along each of the three Cartesian axes plus an unpolarized one (as those shown in Fig. 3 in the main text).
The results are shown in Fig. S3, where similar trends can be appreciated for both metrics.The first thing to notice is that a diversity of just two polarization projections is not sufficient to fully determine a BDPP, no matter how many photons are detected.With just two polarization projections not all the elements of the Jones matrix can be determined, and thus the algorithm FIG.S2.Scalar vs vector phase retrievals.Results of the phase retrieval methods using the scalar (first and second rows) or birefringent (third and fourth rows) models for the BDPP without (first and third rows) and with (second and fourth rows) polarization diversity.The ground truth (fifth row) is shown for comparison.For each case, the retrieved BDPP (first column) along with samples of the Z-stack (second column) and PZ-stack (third column) are shown.To assess the performance of the retrieval, the rms error, εrms, between the retrieved and true Z-stack and PZ-stack are also reported.
almost always converges to the wrong BDPP.When the number of polarization projections is increased to four, the performance improves significantly, and accuracy increases with the number of detected photons.However, a successful retrieval is not guaranteed.We observe that by using an eight polarization projection diversity and eleven phase diversity measurements is a successful retrieval achieved when the number of incident photons in the brightest pixel is larger than 250, which is the case for the results presented in the main text.Therefore, an increase in the number of diversity measurements and of incident photons generally improves the performance of the algorithm.Each point is obtained by averaging the results of 50 retrieval runs using the PZ stack generated by an SEO.The reported detected photon number represent the maximum number of photons incident in the brightest pixel and the background number of photons is equal to 50 for every pixel.

SVI. PIXEL-BASED MODEL
When using a pixel-based model for the BDPP, it is still convenient to use a decomposition as the one in Eq. ( 4) of the main text.Then, instead of expanding the components W and q j into Zernike polynomials, these are discretized using the same sampling outline for the numerical implementation, and each value labeled by the lexicographic index ℓ becomes an optimization parameter with all those falling outside the aperture being set to zero.A technical point for this model is that, for the optimization, the defocus parameter ∆ should be removed since it will be automatically taken care of.The defocus term can then be removed from the retrieved BDPP by fitting to it the appropriate function.
This discretized model is also implemented in pyPSFstack, and Fig. S4 shows the results obtained when it is used for the retrieval of the SEO and q-plate with the same parameters as those used to obtain the results presented in Fig. 3 in the main text.While the general shape of the true BDPPs can be identified from the retrieved ones, there is a distinctive granularity with fast variation from on pixel to the next.This granularity decreases the correlation between the retrieved BDPP and the ground truth, but does not affect the modeling of the PSFs which still present a small error (see the rms errors reported in Fig. S4).The granularity of the retrieved BDPP is a consequence of the increased number of parameters used to model the BDPP, which makes the optimization prone to fall into local minima.Nonetheless, these fast variations in the BDPP have little to no impact on the resulting PSFs since they mostly send light outside the regions used to compute the cost function, which are controlled by w in Eq. ( 14) of the main text.This is shown in the first column of Fig. S5 where the same PSFs shown in Fig. S4 for the SEO are plotted over a larger area and in log scale.The dashed square represents the area used for the retrieval and is the one used to compute the rms error in Fig. S4.Even in log scale it is hard to see any the difference between the retrieved and true PSFs lying inside the dashed square.The largest differences happen outside, where a speckle pattern is formed by the rapid fluctuations of the retrieved BDPP.Therefore, these retrieved BDPPs effectively behave as low-pass filtered versions of themselves within the region used for the retrieval.One way to obtain a BDPP that is closer to the true one is then to filter out the high frequencies.Another is to use a larger area for the retrieval which results in the PSFs shown in the second column of Fig. S5.
Let us now consider the convergence of the algorithm and the evolution of the BDPP for both the Zernike and pixel models.Figure S6 shows the evolution of the BDPPs, the cost function and the correlation between retrieved BDPP and the true one (an SEO) with respect to the iteration number.Both models converge approximately at the same rate and produce similar values for the cost function.However, the granularity of the pixel-based pupil prevents the correlation from approaching unity.Moreover, from the evolution of this pupil, it can be seen that the granularity appears from the beginning.Therefore, early stopping would not solve this issue, which it is not due to over-fitting but to an increase of the number of local minima.

SVII. RETRIEVAL OF A SCALAR PHASE MASK
As mentioned in the main text, pyPSFstack can also be used for the retrieval of scalar phase distributions at the pupil using both Zernike-and pixel-based models, like those mentioned for BDPPs when q j = 0 for j = 1, 2, 3. Nonetheless, these have been implemented as separate models.As an example, the retrieval of a tetrapod phase mask 7 is considered.Figure S7 shows a phase mask designed to optimize the localization for defocus ranging from −1.5µm to 1.5µm and the Z-stack used for the retrieval.Also shown are the results of the retrieval with both the Zernike and pixel-based models, along with the Z-stacks modeled with the retrieved BDPPs.We see that both models provide accurate results, with the slight difference of obtaining a smoother BDPP for the Zernike-based model.Additionally, it should be mentioned that the runtime for both models is quite similar when using the same sampling since, as mentioned in the main manuscript, the main bottleneck is the number of FFTs required, which is the same for both models, and the gradients for all parameters are computed analytically.

SVIII. VERIFICATION OF EXPERIMENTAL RETRIEVAL
To further support the results of the retrieval of a BDPP from experimental measurements, Fig. S8 compares the full measured PZ-stack to the one generated using the retrieved BDPP.Excellent agreement between the two can be appreciated across all the polarization and phase diversity measurements.Moreover, to show the need for a proper characterization of the system for single molecule orientation localization microscopy Fig. S9 compares the PSFs generated by point dipoles along each of the three Cartesian axes as well as an unpolarized dipole, when only a misaligned SEO is used and when the SEO plus the full polarization aberrations are considered.From their shape and the rms error reported it is clear that without the full polarization characterization the appropriate PSFs cannot be reproduced.Additionally, from the values retrieved from the algorithm it seems that photobleaching must be taken into account since there are significant variations in the number of photons emitted by a single bead.However, these numbers of photons do not follow a downward trend with respect the acquisition time but rather they oscillate.This behavior can be explained by the fact that the beads contain many fluorophores so that only some of them are photobleached at a given time.
When retrieving a BDPP from experimental measurements, the ground truth is not known.Therefore, in order to corroborate these results the same retrieval is carried out using the PZ-stack generated by nine more beads positioned across the field of view.The retrieved BDPPs are then compared to the reference one, shown in the main text.The bead leading to the reference BDPP produced the brightest PZ-stack which increases the confidence in the retrieval.Moreover, as can be seen from the results presented in Fig. S10, most PZ-stacks lead to BDPPs that are highly correlated to the reference one.Note that some variations are expected due to field-dependent aberrations.Only the BDPP retrieved from bead 6 significantly differs from the reference one, but its PZ stack has less than a third of the photons of those for the reference, impacting the accuracy of the retrieval as shown in Fig. S3.Nonetheless, if the phase retrieval algorithm is run again, this time setting as a starting point the reference BDPP then it converges to a BDPP, with a lower cost function and with a correlation of 91.5%.

FIG. 1 .
FIG.1.Schematic of the experimental setup for the collection and shaping of the emission by a source.(a) Position of the fluorescent nanobead of radius R b , embedded in a medium of refractive index ni, with respect to the interface created by the coverslip and the immersion liquid of the microscope objective, with refractive index n f , and the focal plane located at z f , z f < 0 (z f > 0) if the focal plane is in the medium with refractive index ni (n f ).(b) Schematic of the collection arm composed of a microscope objective (MO), followed by a birefringent mask (BM) and a polarization analyzer (PA) at the back focal plane (BFP).The light at the BFP is then focused onto the camera by the tube lens (TL) of focal length f tl .

FIG. 2 .
FIG. 2. PZ-stack for an unpolarized emitter shaped with an SEO element and modeled with pyPSFstack.Only the PSFs at the initial, middle and final values of the defocus parameter ∆ ζ used for the phase diversity in the retrieval of the BDPP shown in Fig. 3 are shown.For each ∆ ζ we show the PSFs for all polarization projections of the diversity.This polarization diversity is generated with a rotating quarter wave plate followed by a projection onto linear horizontal or vertical polarization states.
FIG.3.BDPP retrieval with and without polarization diversity.(first row) Elements of the Jones matrix for the ground truth and the retrieved BDPP with and without polarization diversity for (left) an SEO and (right) a q-plate.Also shown are the PSFs generated by point dipoles (second to fourth row) oriented along each of the three Cartesian axes and (last row) an unpolarized dipole, all for each of the standard projectors used for each type of birenfringent window.For the SEO, the PSFs are projected onto left-and right-circular polarizations, while for the q-plate they are projected onto linear horizontal and vertical polarizations.Also given are the correlation between the retrieved and the true BDPP, as well as the RMS error between the retrieved and true PSFs generated by point dipoles along each of the three Cartesian axes and an unpolarized dipole.

FIG. 4 .
FIG. 4. BDPP retrieval from highly blurred PSFs.Comparison of PSFs for an SEO at the reference focal plane (∆ ζ = 0) for the same polarization diversities as those shown in Fig. 2 for (first row) a point source and (second row) a fluorescent bead with radius R b = 150 nm.(third row) Elements of the Jones matrix for the ground truth, and the retrieved BDPPs without blurring, with a two-dimensional blurring model, and a three-dimensional semi-analytic blurring model.

FIG. 5 .
FIG. 5. BDPP retrieval from experimental data, taken from fluorescent nanobeads of diameter 20nm.(first row) Measured and (second row) retrieved PZ-stacks, where only the PSFs at the initial, middle and final values of the defocus parameter ∆ used for the phase diversity are shown, for all polarization projections comprising the polarization diversity.(last row) The retrieved elements of the Jones matrix for the BDPP and its decomposition into a misaligned SEO element, and scalar and polarization aberrations.
1. a scalar model for the BDPP without polarization diversity; 2. a scalar model for the BDPP with polarization diversity; 3. a birefringent model for the BDPP without polarization diversity; 4. a birefringent model for the BDPP with polarization diversity.
FIG.S3.Performance of the retrieval process as a function of the number of detected photons for various combinations of phase and polarization diversity measurements.Both the mean correlation (left) between the retrieved and the true BDPP and the rms error (right) between the retrieved and true PSFs generated by point dipoles along each of the three Cartesian axes and an unpolarized dipole are shown for six different combinations of phase and polarization diversity measurements.The phase and polarization diversities are labeled Npp and Nzz, where Np and Nz are the number of measurements used comprising each diversity respectively.Each point is obtained by averaging the results of 50 retrieval runs using the PZ stack generated by an SEO.The reported detected photon number represent the maximum number of photons incident in the brightest pixel and the background number of photons is equal to 50 for every pixel.
FIG. S4.Pixel-based retrieval.(first row) Elements of the Jones matrix for the ground truth and the retrieved BDPPs with a pixel-based model for (left) an SEO and (right) a q-plate.Also shown are the PSFs generated by point dipoles (second to fourth rows) oriented along each of the three Cartesian axes, and (last row) an unpolarized dipole, for each of the standard projectors used for each type of birenfringent window.For the SEO the PSFs are projected onto left-and right-circular polarizations, while for the qplate they are projected onto linear horizontal and vertical polarizations.Also indicated are the correlation between the retrieved and true BDPP and the rms error between the retrieved and true PSFs generated by point dipoles along each of the three Cartesian axes, and an unpolarized dipole are also reported.
FIG. S7.Retrieval of a scalar phase mask.(first row) Z-stack of PSFs modelled with the software pyPSFstack for the tetrapod scalar phase mask.(second row) Scalar pupils for the ground truth and the retrieved pupils with Zernike-and pixel-based models.
FIG. S8.Comparison between the complete measured PZ stack generated with an SEO and the retrieved one.The corresponding retrieved pupil is shown in Fig. 5 of the main text and is the one with the label 0 in Fig. S10.
Implementation of the forward model also known as a vortex wave-Algorithm 2

TABLE I .
Runtime per iteration for a pupil sampling of 128 × 128 points, eleven phase diversities and eight polarization diversities for different blurring models and with and without the diversity-dependent tilts mentioned in Sec.VI.