The dominant eigenfunctions of the Koopman operator characterize the metastabilities and slow-timescale dynamics of stochastic diffusion processes. In the context of molecular dynamics and Markov state modeling, they allow for a description of the location and frequencies of rare transitions, which are hard to obtain by direct simulation alone. In this article, we reformulate the eigenproblem in terms of the ISOKANN framework, an iterative algorithm that learns the eigenfunctions by alternating between short burst simulations and a mixture of machine learning and classical numerics, which naturally leads to a proof of convergence. We furthermore show how the intermediate iterates can be used to reduce the sampling variance by importance sampling and optimal control (enhanced sampling), as well as to select locations for further training (adaptive sampling). We demonstrate the usage of our proposed method in experiments, increasing the approximation accuracy by several orders of magnitude.
I. INTRODUCTION
Many real-world stochastic processes contain rare events, for example, folding and binding events in molecular systems. The analysis of the frequency and mechanism of these events often takes into account operators associated with the process and their dominant invariant subspaces.^{1–5} Usually, this type of analysis leads to a chicken-and-egg problem: in order to compute the dominant invariant subspace of the Koopman operator of the process, one has to somehow also “sample those events that are usually rare.” However, in order to know how to bias the process so as to observe these events, one would need information from the dominant invariant subspace of the Koopman operator.
The key idea of this article is to take an iterative algorithm that approximates the dominant eigenfunctions of the operator and to use the intermediate approximations for sampling along the reaction paths and generating an optimal bias to observe the relevant events. The algorithm for approximating eigenfunctions is ISOKANN^{4} (Invariant subspaces of Koopman operators with artificial neural networks), and the bias is computed according to the theory of optimal importance sampling and the change of path measures.^{6}
Briefly speaking, ISOKANN can be thought of as an Arnoldi-like method that uses neural networks as function representations and replaces the subspace projections by a transformation suitable for application to neural networks. Instead of computing the eigenfunctions themselves, it stores linear combinations of them, the so-called χ-functions. These span an invariant subspace of the Koopman operator and can be used to reconstruct the eigenfunctions. Furthermore, the χ-functions themselves allow for an interpretation as reaction coordinates indicating the locations of rare events and their reaction paths.^{7,8}
Using this interpretation, the χ-functions obtained during previous iterations can be used to adapt the sample locations for further iterations. For example, this is done in Sec. II F by χ-stratified sampling, which provides better global coverage and facilitates exploration. The theory of optimal importance sampling, on the other hand, allows us to exploit the information locally by decreasing the variance of the samples required to approximate the action of the Koopman operator. This in turn allows ISOKANN to arrive at results either more quickly or more precisely.
It is due to this feedback loop, coupling local and global information, together with the representation of the χ-functions as neural networks, that we expect ISOKANN to perform well even for complex high-dimensional systems. While these general ideas are far from being formalized yet, in this article we try to construct the basic building blocks in a way amenable to future analysis.
In the following, we start by recalling basic knowledge about the eigenfunctions of Koopman operators (Sec. II A) and provide an abstract formulation of the ISOKANN problem in terms of χ-functions (Sec. II B) encoding the invariant subspaces of the Koopman operator. This abstract formulation is incomplete without the choice of an adequate transformation, which we discuss in Sec. II C. In particular, we show how the explicit solution to PCCA^{+} with two eigenfunctions leads to 1D-ISOKANN, which corresponds to the classical ISOKANN algorithm of Ref. 4. After proving convergence of 1D-ISOKANN, we show how to reconstruct the eigenfunctions of the Koopman operator from the χ-functions in Sec. II E. We then conclude Sec. II by providing an algorithmic description in the form of Algorithm 1 and discussing the actual sampling and computation procedure (Sec. II F). Section III starts with an introduction to the theory of optimal importance sampling of classical random variables (Sec. III A) before restating the result for the optimal sampling of path observables for diffusion processes with Girsanov reweighting in Sec. III B. After showing how to apply this result to obtain a zero-variance sampler for the evaluation of the Koopman operator (Sec. III C), we discuss how to integrate the result into the ISOKANN framework (Sec. III D). In the end (Sec. III E), we apply the 1D-ISOKANN algorithm to a one-dimensional double-well potential and observe the improvement in accuracy of the controlled vs the uncontrolled case.
Input: N – the number of power iterations
M – the number of x-samples
K – the number of Koopman Monte Carlo samples
χ_{0} – initial neural network
Output: χ_{N} – approximates a χ function [cf. Eq. (5)]
1: for n = 0 to N − 1 do
2:   for m = 1 to M do
3:     x_{m} ← SAMPLEX0() ⊳ sample training points
4:     for k = 1 to K do
5:       y_{k} ← SAMPLEXT(x_{m}) ⊳ simulate trajectories
6:     κ_{m} ← (1/K) Σ_{k} χ_{n}(y_{k}) ⊳ Koopman approximation
7:   s ← S(κ) ⊳ transformed target data
8:   Δ ← ∇_{θ_{n}} Σ_{m} (χ_{n}(x_{m}) − s_{m})² ⊳ compute loss gradient
9:   χ_{n+1} ← OPTIM(χ_{n}, Δ) ⊳ train the neural network
10: return χ_{N}
Subroutines:
SAMPLEX0: subroutine sampling the starting points x_{m} – either uniform or χ-stratified (Sec. II F).
SAMPLEXT: SDE solver, e.g., Euler–Maruyama – either uncontrolled or controlled (Sec. III D).
S: empirical shift-scale or PCCA^{+} (Sec. II D or Sec. II C).
OPTIM: gradient-based optimization of the neural network (e.g., 100 SGD steps).
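To make the loop structure concrete, here is a minimal, self-contained NumPy sketch of Algorithm 1 for a one-dimensional double-well potential. This is not the authors' implementation: the neural network is replaced by a two-parameter logistic model trained by hand-coded gradient descent, SAMPLEX0 is plain uniform sampling, SAMPLEXT is an uncontrolled Euler–Maruyama integrator, and S is the empirical shift-scale; all names and parameter values are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def drift(x):
    # b = -U' for the double-well potential U(x) = (x^2 - 1)^2
    return -4.0 * x * (x**2 - 1.0)

def sample_X_T(x0, T=0.5, dt=0.01, sigma=1.0):
    """Stand-in for SAMPLEXT: uncontrolled Euler-Maruyama endpoint."""
    x = x0
    for _ in range(int(T / dt)):
        x = x + drift(x) * dt + sigma * np.sqrt(dt) * rng.normal()
    return x

def chi(theta, x):
    """Toy stand-in for the neural network: a two-parameter logistic."""
    return 1.0 / (1.0 + np.exp(-(theta[0] * x + theta[1])))

def shift_scale(kappa):
    """Empirical shift-scale S: affinely map [min, max] onto [0, 1]."""
    lo, hi = kappa.min(), kappa.max()
    return (kappa - lo) / (hi - lo)

theta = np.array([0.1, 0.0])          # initial "network" chi_0
N, M, K, lr = 20, 30, 10, 0.2
for n in range(N):                    # power iterations
    xs = rng.uniform(-2.0, 2.0, size=M)            # SAMPLEX0: training points
    kappa = np.array([np.mean([chi(theta, sample_X_T(x)) for _ in range(K)])
                      for x in xs])                # Monte Carlo Koopman estimate
    s = shift_scale(kappa)                         # transformed target data
    for _ in range(100):                           # OPTIM: gradient steps on MSE
        p = chi(theta, xs)
        g = 2.0 * (p - s) * p * (1.0 - p) / M      # d(mean sq. loss)/d(logit)
        theta = theta - lr * np.array([np.sum(g * xs), np.sum(g)])
```

Over the power iterations the logistic χ typically sharpens into a sigmoid separating the two wells, taking values near 0 in one well and near 1 in the other.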
II. ISOKANN THEORY
In this chapter, we recall the basics of Koopman operator theory and summarize the ISOKANN method while also complementing it with some new results. In particular, we provide a new dimension-agnostic formulation (6), which naturally leads to a possible extension of ISOKANN to higher dimensions. We then show how the classical 1D-ISOKANN can be seen as a special case of this formulation and prove convergence of the algorithm to a χ function (Theorem 1). After showing how to reconstruct the dominant eigenfunctions from the ISOKANN result (Proposition 1), we conclude this section with a discussion of the actual implementation of ISOKANN, suggesting a new adaptive sampling scheme. In the next chapter, we will then introduce the key idea of this paper, namely, optimal importance sampling in order to estimate the Koopman expectations with less variance.
A. The Koopman operator
The dominant eigenfunctions are of particular interest as they decay the slowest and hence dominate the long-time behavior of the system. The number d of eigenfunctions of interest depends on the time scales of the system and is usually chosen up to a spectral gap.
There exist different approaches to estimate the eigenfunctions. Many depend on a discretization of the state space into cells, leading to a matrix representation of K. A classical method starts trajectories in each such cell and counts how many of them end up in each other cell. This sample-driven method can be interpreted as an approximate Ulam/Galerkin discretization onto the indicator functions of the cells.^{9} The eigenfunctions of K are then approximated by the eigenfunctions of its (dense) matrix approximation. Unfortunately, this scheme breaks down in high dimensions, as the number of cells in a structured grid increases exponentially.
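As an illustration of this count-based Ulam/Galerkin discretization, the following sketch builds a row-stochastic matrix for a one-dimensional double-well diffusion; the grid size, lag time, and trajectory counts are arbitrary choices of ours, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def endpoint(x0, T=0.5, dt=0.01):
    """Euler-Maruyama endpoint of dX = -U'(X) dt + dW, U(x) = (x^2 - 1)^2."""
    x = x0
    for _ in range(int(T / dt)):
        x = x + (-4.0 * x * (x**2 - 1.0)) * dt + np.sqrt(dt) * rng.normal()
    return x

edges = np.linspace(-2.0, 2.0, 21)            # 20 cells covering [-2, 2]
centers = 0.5 * (edges[:-1] + edges[1:])
counts = np.zeros((20, 20))
for i, x0 in enumerate(centers):              # start trajectories in each cell
    for _ in range(50):
        j = np.clip(np.searchsorted(edges, endpoint(x0)) - 1, 0, 19)
        counts[i, j] += 1                     # count where they end up
P = counts / counts.sum(axis=1, keepdims=True)   # row-stochastic Ulam matrix
evals = np.sort(np.linalg.eigvals(P).real)[::-1]
# evals[0] = 1 (invariance); evals[1] encodes the slow well-hopping timescale
```

The second eigenvalue stays close to, but strictly below, one: a signature of the metastable well-hopping that makes direct sampling expensive.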
One possible remedy to this problem is the square-root approximation method (SQRA).^{10} It approximates the infinitesimal generator of the Koopman operator by a finite-volume approximation, where the volumes are implicitly defined by the Voronoi tessellation induced by some sample points. Using only evaluations of the potential U at these samples, it can be understood as a semiparametric method resulting in a sparse matrix representation, which in turn can be used for the computation of the eigenfunctions.
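A minimal sketch of the SQRA idea on a regular one-dimensional grid (so the Voronoi cells are just intervals) might look as follows; as an assumption of ours, we set the geometric flux factor between adjacent cells to 1, which fixes the time scale only up to a constant.

```python
import numpy as np

beta = 1.0                                  # inverse temperature
x = np.linspace(-2.0, 2.0, 50)              # centers of a regular 1D grid
U = (x**2 - 1.0) ** 2                       # potential evaluations only
pi = np.exp(-beta * U)                      # unnormalized Boltzmann weights

# SQRA rate between adjacent cells: Q_ij = sqrt(pi_j / pi_i) (flux set to 1)
n = len(x)
Q = np.zeros((n, n))
for i in range(n - 1):
    Q[i, i + 1] = np.sqrt(pi[i + 1] / pi[i])
    Q[i + 1, i] = np.sqrt(pi[i] / pi[i + 1])
Q[np.diag_indices(n)] = -Q.sum(axis=1)      # generator: rows sum to zero
evals = np.sort(np.linalg.eigvals(Q).real)[::-1]
# evals[0] = 0 (stationarity); evals[1] < 0 is the slow relaxation rate
```

Note that only potential evaluations at the sample points enter, and the resulting matrix is sparse (tridiagonal here), in contrast to the dense count matrix of the Ulam approach.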
All these classical approaches, however, depend on a discretization before solving the eigenproblem. As an alternative, we will now summarize ISOKANN, a recent matrix-free approach that learns a linear combination of the eigenfunctions with neural networks.
B. ISOKANN: computing the dominant eigenspace
The ISOKANN algorithm^{4} uses a nonparametric representation in the form of a neural network in order to learn the dominant invariant subspace, interleaving an Arnoldi-like power iteration with the approximation of the Koopman operator by Monte Carlo simulations.
Let us note that this presentation of ISOKANN differs from the original variant^{4} in that the iteration in the form of (6) allows general d-dimensional vector-valued χ functions. However, we will see that for the case of d = 2 this is equivalent to the original variant, which by slight abuse of notation we will refer to as 1D-ISOKANN.
1. Illustrative example
C. Choice of the transformation S
In the previous description, we spoke of an “appropriately chosen” linear transformation S or a sequence of S_{n}. We will now explain the role that S plays in ISOKANN and suggest a way of determining a suitable S.
In principle, the map S can be chosen arbitrarily such that (5) has a solution in χ. However, to obtain a useful result and for the ISOKANN iterations to converge with the corresponding choice of S_{n}, we require some specific properties.
In order to understand the role of S, let us for now consider the classical Arnoldi method for finding the dominant eigenvectors of a matrix. In principle, the Arnoldi method also takes the form of (1), where the application of S_{n} corresponds to a Gram–Schmidt orthonormalization. Here, the orthogonalization ensures that the leading eigenvectors are projected out from the subsequent ones, and the following normalization step then ensures that the eigenvectors do not decay over multiple iterations.
For ISOKANN we want S to fulfill the same role: The goal of each linear transformation S_{n} is to counteract the decay of the dominant eigenfunction components contained in χ_{n} after application of the Koopman operator K.
However, we are interested in linear combinations of the eigenfunctions over a continuous space represented by a neural network. In this setting, orthonormalization is a hard problem, involving integration over the whole state space. In principle, one might resort to restricting the χ functions to a finite number of fixed points and orthonormalizing with respect to the resulting vectors. We expect this to work well as long as one includes those points where the eigenfunctions differ strongly, i.e., the individual metastabilities, which are usually not known a priori.
On the other hand, orthonormality is a strong assumption that is not necessarily required. If we manage to choose S such that it amplifies the first d eigenfunctions so that they stay bounded away from zero, we will obtain a representation of the dominant subspace, which in turn allows for the reconstruction of the eigenfunctions (cf. Sec. II E).
We now motivate a heuristic approach, based on the PCCA^{+} algorithm,^{11} to construct a transformation S and will prove its convergence in the 1Dsetting.
Let us briefly summarize the idea of the PCCA^{+} methodology. In the context of metastable systems, the metastabilities are those regions where one of the dominant eigenfunctions becomes approximately constant as well as extremal. In practice, when plotting the image of the eigenfunction components v_{i}, the resulting set often resembles a simplex in which the metastable regions accumulate at its vertices. PCCA^{+} can be understood as a method to identify this simplex structure and construct the linear transformation (in the space of eigenfunction components) mapping this set into the unit simplex (see also Fig. 4), such that the result becomes “as big as possible” (specified by an optimization problem).
We can imagine a similar picture for the χ functions: After sufficient time propagation (or power iterations), the χ functions are mainly composed of the dominant eigenfunctions. The image of χ will thus be close to the image of v modulo a linear transformation, i.e., a change of basis. We can therefore apply PCCA^{+} to our intermediate Kχ_{n} in order to find a transformation such that the resulting χ_{n+1} fills this unit simplex. This inhibits the exponential decay of the nontrivial eigenfunctions by maintaining a set of d linearly independent components. Since the dominant eigenfunctions decay more slowly than the following nondominant ones, they will dominate the behavior and prevail for n → ∞.
Even though preliminary results have shown that using PCCA^{+} to construct S in higher dimensions works well (with a suitable stabilization against permutations), the focus of this paper is on the optimal control, so we reserve a more detailed report for future work and hence give a proof only for the simpler case of computing the first nontrivial eigenfunction.
D. 1D-ISOKANN with explicit PCCA^{+}
In the case where one is interested merely in the first nontrivial eigenfunction, v_{2}, the PCCA^{+} solution can be computed explicitly, which in turn allows us to prove convergence of ISOKANN for d = 2 and to specify S explicitly.
This representation allows us to solve the two-dimensional ISOKANN problem with just one scalar function, approximated by a series of scalar neural networks $\bar{\chi}_n : X \to \mathbb{R}$. This scalar representation is the approach taken in Ref. 4, and in the following we will refer to it as 1D-ISOKANN (since ISOKANN for d = 1 would otherwise only denote the trivial solution χ ≡ 1).
Let us start by constructing the explicit PCCA^{+} solution. Recall that v_{1} ≡ 1 and note that $v_2(X) = [\min(v_2), \max(v_2)]$. Thus the image of X forms a line segment, i.e., a one-dimensional simplex, in the v_{1}-v_{2} plane. PCCA^{+} then constructs the unique map S which maps this simplex onto the unit simplex $\{(x, y) \mid x + y = 1, x \geq 0, y \geq 0\}$ (see Fig. 4).
Using $\bar{S}$ to learn only the first component of the χ function, we can now formulate the following explicit iterative 1D-ISOKANN procedure:
To summarize, we have shown how the representation of ISOKANN for d = 2 as a scalar problem reproduces the classical 1D-ISOKANN procedure,^{4} where the role of the linear map S (determined by PCCA^{+}) is replaced by the (explicitly given) affine-linear $\bar{S}$. This simpler representation allowed us to show convergence to a χ-function and thus to solve the ISOKANN problem.
E. Restoring the eigenfunctions
F. Computational procedure
With these theoretical considerations in place, let us now discuss how to apply the algorithm in practice, i.e., using neural networks as function approximators and Monte Carlo (MC) simulations for the Koopman evaluations.
What now remains is the application of the shift-scale S_{n}. In the case of 1D-ISOKANN, the action of S_{n} is determined by the shift-scale $\bar{S}$ from (12), which depends on the (global) extrema of the input function k. In practice, we thus use the empirical extrema over the observed data {κ_{m}} to directly compute $s_m = \bar{S}(\kappa)_m$ without explicitly constructing S_{n}. In higher dimensions, we compute the matrix S_{n} using PCCA^{+} to find the transformation that maps the columns of the matrix K = [κ_{1} … κ_{M}] into the unit simplex. Note that the use of the empirical extrema requires that the training points x_{m} indeed cover the areas where Kχ_{n} becomes (approximately) extremal.
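In code, the empirical shift-scale is a one-liner. The following hypothetical helper (name and wrapper are ours) mirrors the description above:

```python
import numpy as np

def shift_scale(kappa):
    """Hypothetical helper: the empirical shift-scale applied to the data.

    Affinely maps [min kappa, max kappa] onto [0, 1], computing the targets
    s_m directly from the observed Koopman estimates without constructing
    the transformation S_n explicitly.
    """
    lo, hi = np.min(kappa), np.max(kappa)
    return (kappa - lo) / (hi - lo)

kappa = np.array([0.30, 0.50, 0.90, 0.45])   # toy Koopman estimates
s = shift_scale(kappa)
# the smallest estimate maps to 0, the largest to 1, order is preserved
```

Since the map is affine and order-preserving, the orientation of χ is kept across iterations; only the extremal training points determine the scaling, which is why they must be covered by the sampling.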
This brings us to the choice of the M training points x_{m}. In principle, ISOKANN can be applied to find the χ functions of a system based on a fixed set of precomputed or assimilated trajectories (replacing the SDE integration). However, its iterative nature makes it especially useful in the synthetic-data regime, where the trajectories are computed online, as this allows adapting the training points x_{m} (and, as we will see, the trajectory simulations too) to the information obtained so far.
Heuristically speaking, we obtain samples x_{m} which are uniform in χ and hence provide good coverage, or “bridges,” along the transition region, thus facilitating an efficient “flow of information” during the power-iteration process. Furthermore, the regions with a higher variation of χ, i.e., those that are “harder to learn,” will also obtain more samples, which in turn is also beneficial for the training of the neural network itself.
Last but not least, the samples obtained this way follow the system’s ergodic dynamics and will therefore approximate the stationary distribution (restricted to each level set of χ). This allows us to evade the curse of dimensionality by restricting the sampling to physically meaningful samples along the reaction paths.
To summarize, χ-stratified sampling allows us to sample uniformly along χ and stationarily conditioned on it, without much additional cost and adapted to the learning process.
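One way such a χ-stratified selection could be sketched (our own minimal variant, not the paper's implementation): partition [0, 1] into M strata and pick, for each stratum midpoint, the candidate point whose current χ value is closest. In ISOKANN the candidate pool would consist of endpoints of previously simulated trajectories; here we use synthetic data and a toy χ.

```python
import numpy as np

rng = np.random.default_rng(2)

def chi_stratified(pool, chi_vals, M):
    """Pick M pool points roughly uniform in their chi values (a sketch).

    For each of M stratum midpoints in [0, 1], take the pool point with the
    closest chi value; points drawn from the dynamics thus stay stationary
    on each chi level set while covering the whole transition.
    """
    targets = (np.arange(M) + 0.5) / M
    idx = [int(np.argmin(np.abs(chi_vals - t))) for t in targets]
    return pool[idx]

pool = rng.uniform(-2.0, 2.0, size=1000)         # candidate states
chi_vals = 1.0 / (1.0 + np.exp(-5.0 * pool))     # toy chi: a steep sigmoid
xs = chi_stratified(pool, chi_vals, M=10)
# the selected xs concentrate in the transition region where chi varies fast
```

Because χ is flat inside the metastable sets and steep in between, uniform-in-χ selection automatically places most points along the transition region, exactly where the power iteration needs information.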
The main ISOKANN routine can be summarized by three loops (N power iterations, M training points, K trajectories), which can be loosely tied to the three main ingredients of ISOKANN:
1. the power iteration learning the dominant subspace,
2. the neural-network approximation of the next χ iterate,
3. the Monte Carlo simulation of the Koopman evaluation.
In the outermost loop (1), we perform the power iteration from χ_{n} to χ_{n+1}. Whereas in theory we have convergence for n → ∞, we choose to terminate after a fixed number of iterations N. This could be replaced by classical convergence criteria such as relative and absolute tolerances. Note that the rate of convergence, and hence the required number of power iterates N, depends on the eigenvalues of the Koopman operator, where a bigger spectral gap implies faster decay of the nondominant spectrum and hence faster convergence.
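The dependence of the convergence rate on the spectral gap can be seen on a small stand-in: for a 3-state stochastic matrix (our own toy example, not from the paper), the error of a power iteration against the dominant 2-dimensional subspace contracts by roughly a factor λ_3/λ_2 per step.

```python
import numpy as np

# Toy 3-state row-stochastic matrix standing in for a discretized Koopman
# operator with two metastable end states and a transition state in between.
P = np.array([[0.98, 0.02, 0.00],
              [0.02, 0.96, 0.02],
              [0.00, 0.02, 0.98]])
evals = np.sort(np.linalg.eigvals(P).real)[::-1]   # [1.0, 0.98, 0.94]
# the nondominant component decays like (lambda_3 / lambda_2)^n relative to
# the dominant 2-dim subspace, so a larger gap means fewer iterations N
rate = evals[2] / evals[1]
```

Here the gap is small (0.94 vs 0.98), so the contraction factor per iteration is only about 0.96 and many power iterations would be needed; a larger gap would shrink this factor.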
When training the neural network (2), we loop over a batch of labeled training data at the M training points x_{m}, which in turn (3) require K individual trajectory simulations each. Both the number of training points M and the number of trajectories per point K depend on the step sizes chosen for the neural network optimizer. Since in practice the evaluation of the Koopman operator is rather expensive compared to the neural network update, it may be efficient to perform multiple update steps on the same batch of data before proceeding to the next iteration.
Note that the variance of the training data, scaling with K^{−1}, is particularly high for metastable systems due to the impact of rare transitions. Whereas above we proposed the use of χ-stratified subsampling as a heuristic to deal with sampling in X space, we will now address the problem of variance in the “Kχ_{n} direction” using the techniques of optimal control and importance sampling.
III. OPTIMAL SAMPLING OF KOOPMAN EIGENFUNCTIONS AND χ-FUNCTIONS
In this chapter, we first recall importance sampling before showing how the theory allows us to better sample eigenfunctions of the Koopman operator and ISOKANN χ-functions. We conclude this chapter with a numerical example.
A. Importance sampling for random variables
B. Optimal importance sampling for diffusion processes
Note that even though the expected value of this estimator is the same for any control u, its variance will vary. Analogously to the case above, there exists an optimal measure corresponding to an optimal control u* for which the controlled estimator exhibits zero variance:^{12,13}
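The unbiasedness of the reweighted estimator for an arbitrary (non-optimal) control can be checked numerically. The sketch below, with names of our own choosing, estimates E[h(X_T)] for a drift-free diffusion dX = dW using a constant control and the discretized Girsanov weight exp(−∫u dW − ½∫|u|² dt):

```python
import numpy as np

rng = np.random.default_rng(3)

def reweighted_estimate(h, u, T=1.0, dt=0.01, n_samples=20000):
    """Estimate E[h(X_T)] for dX = dW, X_0 = 0, by sampling the controlled
    process dX = u(X, t) dt + dW and attaching the discretized Girsanov
    weight exp(-int u dW - 0.5 int u^2 dt) to every sample path."""
    x = np.zeros(n_samples)
    log_w = np.zeros(n_samples)
    for k in range(int(T / dt)):
        dW = np.sqrt(dt) * rng.normal(size=n_samples)
        uk = u(x, k * dt)
        log_w += -uk * dW - 0.5 * uk**2 * dt   # accumulate the log weight
        x += uk * dt + dW                      # Euler-Maruyama, controlled
    return float(np.mean(h(x) * np.exp(log_w)))

# a constant push to the right: paths end near 1, but the weights compensate
pushed = reweighted_estimate(h=lambda x: x, u=lambda x, t: 1.0)
plain = reweighted_estimate(h=lambda x: x, u=lambda x, t: 0.0)
# both estimate E[X_T] = 0; only the variance of the estimator differs
```

For this particular h and constant u, the pushed paths systematically overshoot while their weights systematically shrink, and the two effects cancel in expectation; the optimal control of the theorem is the one for which they cancel pathwise, yielding zero variance.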
C. Optimal sampling of eigenfunctions of the Koopman operator
We will now show how this optimal control theorem can be used to evaluate the Koopman operator.
This result shows that in order to compute the optimal control for evaluating Kh, we need access to the derivatives of Kh. This conundrum is in line with the general optimal importance sampling result and comes as no surprise. In Sec. III D, we will argue how this result can still be of use for ISOKANN, where the convergence of the χ_{n} provides us with an approximate description which we will use to compute the control.
This simple example provides a good opportunity to get a feeling for how optimal importance sampling works. We can see that the control pushes the system in the direction of (relative) maximal ascent. If the system follows the forcing, its expected evaluation increases, but the path taken also has an increased probability, resulting in a decreasing Girsanov weight. Both increments happen on a commensurate “relative scale”: the expectation value gets pushed by an amount relative to its current value $(\frac{\nabla v}{v})$, whereas the reweighting is adjusted relatively in magnitude due to the exponentiated integral (which may be seen as an infinite product over all time points) in (34). In this way, the increase of the observable is balanced with the decreasing weight exactly, so that no matter the path taken these always equalize and one obtains a zero-variance sampler.
Unfortunately, however, the control becomes singular whenever v_{i}(x) = 0. According to the Perron–Frobenius theorem, every nontrivial eigenfunction changes sign, i.e., crosses zero at some point, so we have to find a way around this problem. We can alleviate it by shifting and [anticipating the form of $\bar{S}$ in (12)] also rescaling the eigenfunction.
In this case, we see that the control is time-dependent (which makes sense, as the relative contributions of the dominant eigenfunctions to the expectation value change over time). From λ_{i}(t) ≤ λ_{i}(0) = 1, we can conclude that $u^*(\cdot, t) = \gamma \sigma^\top \frac{\nabla \chi}{\chi}$ with a function γ monotonically increasing with γ(1) = 1, i.e., the control points in the same direction as in the pure eigenfunction case, starting weaker and increasing until hitting full magnitude at t = T.
Note that the requirement for χ functions to satisfy 0 ≤ χ ≤ 1, which so far was merely motivated by their interpretation as macrostates, now also facilitates their optimally controlled importance sampling.
D. Application to ISOKANN
In Sec. III C, we have shown how to obtain a zero-variance sampler for the Koopman operator in terms of the gradient of its solution, either for general observables h or for (shift-scaled) eigenfunctions v_{i} (χ). In either case, the solution has to be known a priori in order to compute the control. We will now argue how to integrate this result into the ISOKANN procedure.
The main idea of using optimal importance sampling in ISOKANN is to use the intermediate results χ_{n} and S_{n} to compute a pseudo-optimal control so as to lower the sampling variance. We therefore have to assume that using an approximation to the optimal control indeed leads to a variance reduction. Whereas we do not know of any proof of this statement, it was shown that the objective of the associated optimal control problem is indeed convex in the control,^{14} which leads us to conjecture that the variance should be well-behaved for approximately optimal controls as well.
Note that the importance sampler (35) is unbiased for any control. Thus, as long as the variance does not become unbounded, ISOKANN will still converge for a bad control (under the usual conditions for stochastic gradient descent convergence, i.e., a decaying learning rate). Note that the control can always be bounded artificially by clipping, which bounds the Girsanov reweighting term and in turn also the overall sampling variance.
Note that in the general d-dimensional case, the expectation values in the ISOKANN iterations K^{T}χ_{n} are vector valued. Optimal importance sampling, however, works only in the scalar case. Therefore, we have to compute an individual control for sampling each component $(K^T \chi_n)_i$.
In the case of 1D-ISOKANN, the action of the Koopman operator on χ_{n} converges to a shift-scale as in (43), and we can therefore estimate the parameters α, β, and λ_{2} from the extrema of Kχ_{n−1} in order to apply the explicit control for the shift-scaled eigenfunction (44).
In this way (and under the above assumption), we obtain a feedback loop where a better approximation of the χ functions results in a better approximation of the action of K and hence in a better approximation of the optimal control. This pseudo-optimal control in turn decreases the sampling variance, which facilitates a better approximation of the power iterates, i.e., the χ function.
As a proof of concept, we will now illustrate the reduction of variance using the classic double-well potential.
E. Example: Controlled 1D-ISOKANN for the double well
Let us consider the controlled process as stated in (33) for the double-well potential (9), which is the simplest problem exhibiting metastable behavior and hence a challenging sampling variance. In our experiments, we compare the training performance of the ISOKANN algorithm both with and without the control (44).
We start with a randomly initialized fully connected network $\chi_0 : \mathbb{R} \to \mathbb{R}$ with sigmoidal activation functions and 2 hidden layers, each of size 5 (i.e., with layer sizes 1 × 5 × 5 × 1). For each network generation n, we compute Monte Carlo approximations of the Koopman expectation at M = 30 positions. These are initially drawn uniformly from the interval [−2, 2] and subsequently obtained by χ-stratified subsampling as described in Sec. II F. From each starting position, we then simulate K = 20 trajectories using the SROCK2 SDE integrator of strong order 1 with step size Δt = 0.001. The next generation n + 1 is trained against these M training points by L = 500 stochastic gradient descent steps using the ADAM optimizer (with learning rate η = 0.001). We repeat this evaluation–training procedure (corresponding to a single power iteration) for a total of N = 50 iterations.
Let us now compare the uncontrolled with the controlled experiment. Figure 5 shows the square root of our training loss together with the standard deviation of the Monte Carlo estimator for the two cases of study. In Fig. 5(a), we observe that the uncontrolled system quickly (after 3 iterations) approaches its plateau at an error of about 10^{−1}, but afterwards the training loss does not decrease any further. This comes as no surprise, since the training data exhibit noise of the same magnitude, and we cannot expect the average loss to be lower than the noise in the data. Note, however, that even though the loss itself seems to have leveled off, this does not necessarily mean that the solution does not improve. Whilst the empirical loss will necessarily remain at the level of the noise, the solution could still converge due to the inherent averaging of that noise in the SGD method.
Looking at the controlled experiment, we observe in Fig. 5(b) that the loss behaves similarly to the uncontrolled experiment for the first 3 iterations, reaching a value of 10^{−1}. From there on, however, the loss decreases further, getting close to 10^{−3} after 50 power iterations and still not having hit a plateau. Notice that the training noise, in strong contrast to the uncontrolled case, decreases rapidly from the beginning of the training. It is furthermore interesting to see that the training loss seems to follow the noise level closely, indicating that the sampling variance is indeed of high importance.
Looking more closely at these performance plots, we can furthermore identify the individual training batches: The mean standard deviation (MSTD) is piecewise constant along each such batch, since the training data (and hence its standard deviation) is updated only in between the neural network training loops. Whereas the root-mean-square error (RMSE) plateaus decrease continuously during each batch, we can observe how they jump up slightly at the beginning of each new batch. These jumps are caused by overfitting to the previous batch (the plateau) without generalization to the following batch (the flank). We suspect that the performance of the training could be improved by taking the plateaus as indicators for the generation of new training data.
Let us finally look at Fig. 6, which shows the different learned χ functions after the application of the ISOKANN algorithm, together with the evaluations of SKχ_{n−1}(x_{m}) at the M random locations x_{m}. The error bars represent the standard deviation of the individual Monte Carlo estimators, i.e., the noise in the training data. We see that both learned χ functions qualitatively match the expectation. The uncontrolled case, however, has problems reaching 0 and 1, respectively, at the boundaries, which can be understood as a result of the noise and the subsequent noisy estimation of the empirical shift-scale. Lastly, we notice that the Monte Carlo standard deviation for the Koopman evaluation at the χ-sampled positions is considerably lower for the controlled approach.
IV. CONCLUSION
In this article, we started by enhancing ISOKANN with new theoretical results that underpin its strengths, namely a convergence proof (Theorem 1) and a method for reconstructing eigenfunctions from χ-functions (Proposition 1). We also proposed a new adaptive sampling strategy called χ-stratified sampling, which complements ISOKANN well and deserves further investigation. By formulating ISOKANN in terms of the transformation S (6), we paved the way for higher-dimensional χ-functions while generalizing the original 1D-ISOKANN. However, whereas we argued for using PCCA^{+} for the construction of S for d > 1, a more detailed study and a proof of convergence for this case remain open for future work.
The second main contribution in this article is the introduction of importance sampling into ISOKANN. Whereas we know that the resulting estimator is unbiased, we argue only heuristically why the variance should not explode. A proof of the convexity of the variance in the control, or even better, a proof of the convergence of the control in ISOKANN is still missing. Note, furthermore, that the concept of optimal importance sampling may be useful for the iterative solution of Koopman evaluations in general (cf. Corollary 2.1).
An important next step would be to apply controlled ISOKANN to an actual molecular dynamics (MD) system to test how well the introduced techniques fare with the complexities of real-world problems. This, however, requires a way to run many trajectories with different starting locations and low overhead, as well as to inject the optimal control into the MD simulations. We hope that once these interfaces are implemented, ISOKANN will enhance the research on molecular systems.
ACKNOWLEDGMENTS
We thank Luca Donati and Luzie Helfmann for their support in proofreading and for insightful discussions. This research has been funded by the Deutsche Forschungsgemeinschaft (DFG) through Grant CRC 1114 “Scaling Cascades in Complex Systems,” Project No. 235221301, Project A05 “Probing scales in equilibrated systems by optimal nonequilibrium forcing.”
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
A. Sikorski: Conceptualization (lead); Formal analysis (lead); Methodology (lead); Software (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (lead). E. Ribera Borrell: Conceptualization (supporting); Formal analysis (supporting); Methodology (supporting); Writing – original draft (supporting); Writing – review & editing (supporting). M. Weber: Conceptualization (supporting); Funding acquisition (lead); Methodology (supporting); Supervision (lead); Writing – review & editing (supporting).
DATA AVAILABILITY
The data that support the findings of this study are openly available in GitHub at https://github.com/axsk/OptImpSampling.jl/tree/jmp.^{15}