The inverse mechano-electrical problem in cardiac electrophysiology is the attempt to reconstruct electrical excitation or action potential wave patterns from the heart’s mechanical deformation, which occurs in response to electrical excitation. Because heart muscle cells contract upon electrical excitation due to the excitation–contraction coupling mechanism, the resulting deformation of the heart should reflect macroscopic action potential wave phenomena. However, whether the relationship between macroscopic electrical and mechanical phenomena is well-defined and unique enough to be utilized for an inverse imaging technique, in which mechanical activation mapping is used as a surrogate for electrical mapping, has yet to be determined. Here, we provide a numerical proof-of-principle that deep learning can be used to solve the inverse mechano-electrical problem in phenomenological two- and three-dimensional computer simulations of the contracting heart wall, or *elastic excitable media*, with muscle fiber anisotropy. We trained a convolutional autoencoder neural network to learn the complex relationship between electrical excitation, active stress, and tissue deformation during both focal and reentrant chaotic wave activity and, consequently, used the network to successfully estimate or reconstruct electrical excitation wave patterns from mechanical deformation in sheet- and bulk-shaped tissues, even in the presence of noise and at low spatial resolutions. We demonstrate that even complicated three-dimensional electrical excitation wave phenomena, such as scroll waves and their vortex filaments, can be computed from mechanical deformation with very high reconstruction accuracies of about 95% using autoencoder neural networks, and we provide a comparison with results that were obtained previously with a physics- or knowledge-based approach.

The beating of the heart is triggered by electrical activity, which propagates through the heart tissue and initiates heart muscle contractions. Abnormal electrical activity, which induces irregular heart muscle contractions, is the driver of life-threatening heart rhythm disorders, such as atrial or ventricular tachycardia or fibrillation. Presently, this abnormal electrical activity cannot be visualized in full, as imaging technology that can penetrate the heart muscle tissue and resolve the inherently three-dimensional electrical phenomena within the heart walls has yet to be developed. A better understanding of the abnormal electrical activity and the ability to visualize it non-invasively and in real-time is necessary for the advancement of therapeutic strategies. In this paper, we demonstrate that machine learning can be used to reconstruct the electrical activity from the mechanical deformation, which occurs in response to the electrical activity, in a simplified computer model of a piece of the heart wall. Our study suggests that, in the future, machine learning algorithms could be used in combination with a high-speed 3D ultrasound, for instance, to determine the hidden three-dimensional electrical wave phenomena inside the heart walls and to non-invasively image heart rhythm disorders or other electromechanical dysfunctions in patients.

## I. INTRODUCTION

The heart’s function is routinely assessed using either electrocardiography or echocardiography. Both measurement techniques provide complementary information about the heart. Electrocardiographic imaging^{1–4} and other electrical techniques, such as catheter-based electro-anatomic mapping,^{5–8} electrode contact mapping,^{9–11} and high-density electrical mapping,^{12,13} provide insights into the heart’s electrophysiological state at relatively high spatial and temporal resolutions on the inner or outer surface of the heart chambers. Echocardiography, in contrast, provides information about the heart’s mechanical state,^{14} capturing mechanical contraction and deformation that occurs in response to electrical excitation throughout the entire heart and the depths of its walls. Therefore, electrical imaging provides information that mechanical imaging does not provide and vice versa. The integrated assessment of both cardiac electrophysiology and mechanics,^{15,16} either through simultaneous multi-modality imaging or the processing and interpretation of one modality in the context of the other, could greatly advance diagnostic capabilities and provide a better understanding of cardiac function and pathophysiology. In particular, the analysis of cardiac muscle deformation in the context of electrophysiological activity could help to fill in the missing information that is not accessible with current electrical imaging techniques.

Because heart muscle cells contract in response to an action potential due to the excitation–contraction coupling mechanism,^{17} see Fig. 1, the resulting deformation on the whole-organ level should reflect the underlying electrophysiological dynamics. In anticipation of this, it has been proposed to compute action potential wave patterns from the heart’s mechanical deformations using inverse numerical schemes.^{18} Furthermore, it was recently demonstrated that cardiac tissue deformation and electrophysiology can be strikingly similar even during heart rhythm disorders.^{19} More specifically, it was shown that focal or rotational electrophysiological wave phenomena, such as action potential or calcium waves, visible on the heart surface during ventricular arrhythmias^{20–22} induce focal or rotational mechanical wave phenomena within the heart wall, and it was demonstrated that these phenomena can be resolved using high-resolution 4D echocardiography.^{19} The experimental evidence supports the notion of electromechanical waves^{23,24} that propagate as coupled voltage, calcium, and contraction waves through the heart muscle.^{19} The high correlation between electrical and mechanical phenomena^{16,19} and their wave-like nature^{19,23,24} has recently motivated the development of an inverse mechano-electrical numerical reconstruction technique,^{25} which exploits, in particular, the wave-like nature of strain phenomena propagating through the heart muscle. Observations of these wave-like strain phenomena are assimilated into a computer model, which is continuously adapted or synchronized to the observations such that it develops electrical excitation wave patterns that, in turn, cause the model to deform and reproduce the observed strain patterns; the overall dynamics of the model then constitute a reconstruction of the observed dynamics, including the excitation wave dynamics.
Consequently, it was demonstrated *in silico* with synthetically generated data that even complicated three-dimensional electrical excitation wave patterns, such as fibrillatory scroll waves and their corresponding vortex filaments, can be reconstructed inside a bulk tissue solely from observing its mechanical deformations.

In this study, we present an alternative approach to solving the inverse mechano-electrical problem using deep learning. The field of machine learning and artificial intelligence has seen tremendous progress over the past decade,^{26} and techniques such as convolutional neural networks (CNNs) have been widely adopted in the life sciences.^{27,28} Convolutional neural networks^{29} are a deep learning technique with a particular neural network architecture specialized in learning features in image or time-series data. They have been widely used to classify images,^{30} segment images,^{31} or recognize features in images, e.g., faces,^{32} with very high accuracies and have been applied in biomedical imaging to segment cardiac magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound recordings.^{33} Convolutional neural networks are supervised machine learning techniques, as they are commonly trained on labeled data. Autoencoders are an unsupervised deep learning technique with a particular network architecture, which consists of two convolutional neural network blocks: an encoder and a decoder. The encoder translates an input into an abstract representation in a higher dimensional feature space, the so-called *latent space*, from which the decoder translates it back into a lower dimensional output; in this way, an autoencoder can generally translate an arbitrary input into an arbitrary output [see Fig. 2(a)].
Convolutional autoencoders employ convolutional neural network layers and are, therefore, particularly suited to process and translate image data: autoencoders are typically used for image denoising,^{34,35} image segmentation,^{31} image restoration or inpainting,^{36} or for enhancing image resolution.^{37} Autoencoders and other machine or deep learning techniques, such as reservoir computing or echo state networks,^{38–40} were recently used to replicate and predict chaotic dynamics in complex systems.^{41–45} In particular, combinations of autoencoders with conditional random fields,^{44} as well as echo state neural networks,^{43} were recently applied to predict the evolution of electrical spiral wave chaos in excitable media and to “cross-predict” one dynamic variable from observing another. Moreover, it was recently shown that convolutional neural networks can be used to detect electrical spiral waves in excitable media,^{46} and, lastly, deep learning was used to estimate the deformation between two cardiac MRI images^{47} and the ejection fraction in ultrasound movies of the beating heart.^{48} The recent progress and applications of deep learning in excitable media and cardiac deformation quantification suggest that deep learning can also be applied to solving the inverse mechano-electrical problem.

Here, we apply an autoencoder-based deep learning approach to reconstruct excitation wave dynamics in elastic excitable media by observing and processing mechanical deformation that was caused by the excitation. We developed a neural network with a 2D and 3D convolutional autoencoder architecture, which is capable of learning mechanical spatiotemporal patterns and translating them into corresponding electrical spatiotemporal patterns. We trained the neural network on a large set of image- and video-image pairs, showing on one side mechanical deformation and on the other side electrical excitation patterns, to have the network learn the complex relationship between excitation, active stress, and deformation in computer simulations of an elastic excitable medium with muscle fiber anisotropy (see Fig. 2). We consequently show that it is possible to predict or reconstruct electrical excitation wave patterns, even complicated two- and three-dimensional spiral or scroll wave chaos, from the deformations that they have caused and, therefore, provide a numerical proof-of-principle that the inverse mechano-electrical problem can be solved using machine learning.

## II. MATERIALS AND METHODS

We generated two- and three-dimensional synthetic data of excitation waves in an accordingly deforming elastic excitable medium with muscle fiber anisotropy and used the data to train a convolutional autoencoder neural network [see Fig. 2(c)] to learn the complex relationship between excitation and direct kinematic quantities, such as tissue displacements, as the medium deforms due to the excitation via the excitation–contraction coupling mechanism (see Sec. II B). The trained network was then used to estimate time-varying spatial distributions of electrical excitation that caused a particular time-varying mechanical deformation (see Fig. 2). The neural network uses static or dynamic images or videos of mechanical deformation as input, finds the corresponding distributions of excitation that had caused the deformation, and returns them as output. We then compared the estimated excitation to the ground truth excitation. In addition to excitation, active stress can also be estimated independently from mechanical deformation.

### A. Neural network architecture

We developed a convolutional autoencoder neural network comprising convolutional layers, activation layers, and maxpooling^{49} layers, with an encoding stage that uses a two- or three-dimensional vector field of tissue displacements $\vec{u} \in \mathbb{R}^2, \mathbb{R}^3$ as input and a decoding stage that outputs a two- or three-dimensional scalar-valued spatial distribution of estimates for the excitation $\tilde{V} \in \mathbb{R}$, respectively, which are approximations of the ground truth excitation $V \in \mathbb{R}$ that had originally caused the deformation. In addition to the electrical excitation, the neural network can also be trained to estimate the distribution of active stress $T_a \in \mathbb{R}$ (see also Sec. II B). More precisely, the input vector field can optionally be (i) a static two-dimensional vector field $\vec{u}_{x,y}(x,y)$ describing a deformation in a two-dimensional space, (ii) a static three-dimensional vector field $\vec{u}_{x,y,z}(x,y,z)$ describing a deformation in a three-dimensional space, or (iii) a time-varying two-dimensional vector field $\vec{u}_{x,y}(x,y,t)$ describing motion and deformation in a two-dimensional space. The latter vector fields are given as short temporal sequences of two- or three-dimensional vector fields, respectively, e.g., $\{\vec{u}_{t_{-2}}, \vec{u}_{t_{-1}}, \vec{u}_{t_0}\}$, with the temporal offset $\tau = |t_0 - t_{-1}|$ between vector fields being about 5% of the dominant period of the activity, e.g., the spiral period. The vector fields describe displacements $\vec{u}_r$ of the tissue with respect to the stress-free, undeformed mechanical reference configuration $\chi_0$ or, alternatively, instantaneous shifts $\vec{u}_i$ from frame to frame with respect to either the stress-free, undeformed configuration $\chi_0$ or an arbitrary configuration $\chi_t$ [see Figs. 3(c) and 10(c)]. The basic network architecture is a convolutional autoencoder with three stages in the encoding and decoding parts, respectively [see Fig. 1(b)].
Each stage corresponds to a convolutional layer, an activation layer, and a maxpooling or upsampling layer, respectively. Table I summarizes the different autoencoder models used in this work, which all share a similar autoencoder network architecture. In the following, we distinguish the different autoencoder models by whether they process two- or three-dimensional data, by whether they process static or time-varying data, and by the spatial input resolution and filter numbers of the convolutional layers. In addition, we distinguish the models by the type of data that they were exposed to during training (see Sec. II C, Table I, and Fig. 4). The notation (s) or (f) at the end of a model name indicates that the model was trained only on spiral chaos or focal wave data, respectively. Otherwise, it was trained on both data types.

**TABLE I.** Overview of the autoencoder models used in this work.

| Model | Model param. | Convolutional layers—number of filters | Input ch. | Input size | Accuracy | Training duration | Training data type |
|---|---|---|---|---|---|---|---|
| 2Dt-A3 | 1 332 033 | 64↓128↓256↓–256↑128↑64↑ | 6 | 128 × 128 | 96.2% ± 3.1% | 15–20 min | Focal, spiral, chaos |
| 2Dt-A3f | 1 332 033 | 64↓128↓256↓–256↑128↑64↑ | 6 | 128 × 128 | 98.6% ± 1.4% | 15–20 min | Focal |
| 2Dt-A3s | 1 332 033 | 64↓128↓256↓–256↑128↑64↑ | 6 | 128 × 128 | 95.1% ± 2.6% | 15–20 min | Spiral, chaos |
| 2Dt-A2s | 1 330 881 | 64↓128↓256↓–256↑128↑64↑ | 4 | 128 × 128 | 94.8% ± 2.6% | 15–20 min | Spiral, chaos |
| 2Ds-A1s | 1 329 729 | 64↓128↓256↓–256↑128↑64↑ | 2 | 128 × 128 | 93.4% ± 3.2% | 15–20 min | Spiral, chaos |
| 2Ds-A1 | 1 329 729 | 64↓128↓256↓–256↑128↑64↑ | 2 | 128 × 128 | 95.9% ± 3.3% | 15–20 min | Focal, spiral, chaos |
| 2Dt′-A2s | 1 330 881 | 64↓128↓256↓–256↑128↑64↑ | 4 | 128 × 128 | 94.8% ± 2.6% | 15–20 min | Spiral, chaos (inst.) |
| 2Dt′-A1s | 1 329 729 | 64↓128↓256↓–256↑128↑64↑ | 2 | 128 × 128 | 93.5% ± 3.2% | 15–20 min | Spiral, chaos (inst.) |
| 2Dt*-A2s | 1 330 881 | 64↓128↓256↓–256↑128↑64↑ | 4 | 128 × 128 | 94.8% ± 2.6% | 15–20 min | Spiral, chaos (ref. $\chi_t$) |
| 2Ds*-A1s | 1 329 729 | 64↓128↓256↓–256↑128↑64↑ | 2 | 128 × 128 | 93.5% ± 3.0% | 15–20 min | Spiral, chaos (ref. $\chi_t$) |
| 2Dt-B3s | 334 241 | 32↓64↓128↓–128↑64↑32↑ | 6 | 128 × 128 | 94.5% ± 2.7% | 5–10 min | Spiral, chaos |
| 2Ds-B1s | 333 089 | 32↓64↓128↓–128↑64↑32↑ | 2 | 128 × 128 | 92.4% ± 3.5% | 5–10 min | Spiral, chaos |
| 2Dt-C3s | 229 329 | 64↓128↓–128↑64↑ | 6 | 128 × 128 | 92.7% ± 3.4% | 5–10 min | Spiral, chaos |
| 2Dt-D3s | 741 953 | 128↓256↓–256↑128↑64↑ | 6 | 64 × 64 | 94.1% ± 3.1% | 5–10 min | Spiral, chaos |
| 2Ds-E1s | 518 337 | 64↓–256↑128↑64↑ | 2 | 32 × 32 | 89.9% ± 4.3% | 5–10 min | Spiral, chaos |
| 2Dt-F3s | 520 641 | 64–256↑128↑64↑ | 6 | 16 × 16 | 90.8% ± 4.7% | 5–10 min | Spiral, chaos |
| 2Ds-F1s | 518 337 | 64–256↑128↑64↑ | 2 | 16 × 16 | 87.3% ± 5.0% | 5–10 min | Spiral, chaos |
| 3Ds-A1 | 1 000 131 | 32↓64↓128↓–128↑64↑32↑ | 3 | 104 × 104 × 24 | 95.7% ± 0.5% | 2.5 h | Chaos |
| 3Ds-B1 | 1 000 131 | 32-64↓128↓–128↑64↑32↑ | 3 | 52 × 52 × 12 | 95.7% ± 0.6% | 1–2 h | Chaos |
| 3Ds-C1 | 947 331 | 64↓128↓–128↑64↑32↑ | 3 | 52 × 52 × 12 | 94.8% ± 0.6% | 1–2 h | Chaos |
| 3Ds-D1 | 1 000 131 | 32-64-128↓–128↑64↑32↑ | 3 | 26 × 26 × 6 | 94.8% ± 0.7% | 1–2 h | Chaos |
| 3Ds-E1 | 731 139 | 128↓–128↑64↑32↑ | 3 | 26 × 26 × 6 | 92.6% ± 0.9% | 1–2 h | Chaos |
| 3Ds-F1 | 1 000 131 | 32-64-128–128↑64↑32↑ | 3 | 13 × 13 × 3 | 89.4% ± 1.4% | 1–2 h | Chaos |
| 3Ds-G1 | 288 387 | 128↑64↑32↑ | 3 | 13 × 13 × 3 | 85.1% ± 1.4% | 1–2 h | Chaos |


The models for (time-varying) two-dimensional data use padded two-dimensional convolutional layers (2D-CNN) with filter size $3 \times 3$ and the rectified linear unit^{50} (ReLU) as activation function. All convolutional layers of models 3Ds-xx are padded three-dimensional convolutions (3D-CNN) with filter size $3 \times 3 \times 3$, each followed directly by a batch normalization^{51} layer (to accelerate the training and improve the accuracy of the network) and a ReLU activation layer. Some two-dimensional models (e.g., 2Dt-A3/s/f and 2Dt-B3s) read a series of three subsequent two-dimensional video images as input, the individual images showing mechanical deformations induced by focal (f) and/or spiral wave chaos (s) at time $t_0$ and two previous time steps $t_{-1}$ and $t_{-2}$ (see also Sec. III C). Note, therefore, that when we refer to “video images/frames” or “samples,” each sample may refer to a single frame or a short series of two to three frames in the case that the network analyzes spatiotemporal input. Model 2Ds-A1 reads only a single video image (two input channels) and, therefore, processes only static deformation data. The number of input channels of the first network layer is $n_c = n_d \cdot n_T$ for all models, where $n_d$ is the number of dimensions of each input displacement vector and $n_T$ is the number of time steps that are used for the reconstruction, e.g., $2 \cdot 3 = 6$ input channels correspond to the $u_x$- and $u_y$-components of a two-dimensional vector field [see Fig. 3(d)] for three time steps. The spatial input size of the models is given in Table I and depends on whether spatial subsampling of the dataset is used [see Sec. III G and Fig. 13(d)]. The last network layer of all models is a single convolutional filter with a sigmoid activation function, and the spatial output size is $128 \times 128$ for 2D-CNNs and $104 \times 104 \times 24$ for 3D-CNNs (see Sec. II C).

For instance, model 2Dt-B3s processes time-varying two-dimensional (2D+t) video data and has $334\,241$ trainable parameters (see Table I). The network consists of one input layer of size $128 \times 128 \times 6$, with six channels for three video frames, each frame containing two components for the $u_x$- and $u_y$-displacements, respectively, and seven convolutional layers in total. The encoding part consists of three convolutional layers of sizes $128 \times 128 \times 32$, $64 \times 64 \times 64$, and $32 \times 32 \times 128$, each followed by a maxpooling layer of filter size $2 \times 2$ reducing the spatial size by a factor of 2. The decoding part contains three convolutional layers of sizes $16 \times 16 \times 128$, $32 \times 32 \times 64$, and $64 \times 64 \times 32$, each followed by an upsampling layer of filter size $2 \times 2$ increasing the spatial size by a factor of 2, and a final convolutional layer of size $128 \times 128 \times 1$ with a sigmoid activation function. All networks were implemented in Keras^{52} with the Tensorflow^{53} backend.
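As a cross-check on the layer specification above, the trainable-parameter count of model 2Dt-B3s can be reproduced from the filter numbers alone (a minimal sketch; it counts only the convolutional kernels and biases, since the maxpooling and upsampling layers contain no trainable parameters):

```python
def conv2d_params(in_ch, out_ch, k=3):
    """Trainable parameters of a conv layer: k*k*in_ch weights per filter plus one bias."""
    return k * k * in_ch * out_ch + out_ch

# Channel sequence of 2Dt-B3s: input 6 -> encoder 32, 64, 128 -> decoder 128, 64, 32 -> output 1.
channels = [6, 32, 64, 128, 128, 64, 32, 1]
total = sum(conv2d_params(cin, cout) for cin, cout in zip(channels, channels[1:]))
print(total)  # 334241, matching the entry for 2Dt-B3s in Table I
```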

### B. Training data generation using computer simulations of elastic excitable media

Two- and three-dimensional deformation and excitation wave data were generated using computer simulations. The source code for the computer simulations is available in Ref. 25. In short, nonlinear waves of electrical excitation, refractoriness, and active stress were modeled in two- or three-dimensional simulation domains representing the electrical part of an elastic excitable medium using partial differential equations and an Euler finite differences numerical integration scheme, as described previously.^{54–56}
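The displayed model equations appear to be missing from this excerpt. As a hedged reconstruction, the three-variable Aliev–Panfilov-type formulation with active stress commonly used in the cited literature reads approximately as follows (the parameters $\epsilon$, $\mu_1$, $\mu_2$, and $b$ are standard in that formulation but are not defined in the surrounding text and should be checked against Refs. 54–56):

```latex
\begin{align}
\frac{\partial V}{\partial t} &= \nabla \cdot (D \nabla V) - k V (V - a)(V - 1) - V r + I_s, \tag{1}\\
\frac{\partial r}{\partial t} &= \Big(\epsilon + \frac{\mu_1 r}{\mu_2 + V}\Big)\big(-r - k V (V - b - 1)\big), \tag{2}\\
\frac{\partial T_a}{\partial t} &= \epsilon(V)\,(k_T V - T_a). \tag{3}
\end{align}
```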

Here, $V$, $r$, and $T_a$ are dimensionless, normalized dynamic variables for excitation (voltage), refractoriness, and active stress, respectively, and $k$ and $a$ are electrical parameters, which influence properties of the excitation waves, such as their wavelengths. Together with the diffusive term in Eq. (1), which includes the diffusion constant $D$, the model produces nonlinear waves of electrical excitation (anisotropic in a two-dimensional model) followed by waves of active stress and contraction, respectively. The contraction strength or magnitude of active stress is regulated by the parameter $k_T$.

The electrical model facilitates stretch-activated mechano-electrical feedback via a stretch-induced ionic current $I_s$, which modulates the excitatory dynamics upon stretch. The current strength depends on the area $A$ of one cell of the deformable medium, the equilibrium potential (here, $E_s = 1$), and the parameter $G_s$, which regulates the maximal conductance of the stretch-activated ion channels. With excitation–contraction coupling and stretch-activated mechano-electrical feedback, the model is coupled in both the forward and backward directions.

Soft-tissue mechanics were modeled using a two- or three-dimensional mass-spring damper particle system with tunable fiber anisotropy and a Verlet finite differences numerical integration scheme, similarly as described previously.^{25,57–59} In short, the elastic simulation domain consists of a regular grid of quadratic or hexahedral cells (pixels/voxels), respectively, each cell being defined by vertices and edges or faces, respectively. The vertices are connected by passive elastic springs along the cell edges. At the center of each cell is a set of two or three orthogonal springs that can be arbitrarily oriented. One of the springs is an active spring, which contracts upon excitation or active stress and is aligned along a defined fiber orientation. In the two-dimensional model, muscle fibers are aligned uniformly or linearly transverse and can be oriented in any arbitrary direction (0°–360°). In the three-dimensional model, muscle fibers are organized in an orthotropic stack of sheets, which are stacked in the $e_z$-direction with the fiber orientation rotating by a total angle of 90° throughout the stack. The muscle fiber organization leads to highly anisotropic contractions and deformations of the sheet or bulk tissues. The elastic part of the model exhibits large deformations. The size of the two- and three-dimensional simulation domains was $200 \times 200$ cells/pixels and $100 \times 100 \times 24$ cells/voxels, respectively, where one electrical cell corresponds to one mechanical cell.
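The Verlet scheme mentioned above advances each particle from its current and previous positions without storing velocities explicitly. A minimal one-particle sketch of the position-Verlet update (the spring stiffness and time step values here are illustrative, not taken from the paper):

```python
def verlet_step(x, x_prev, accel, dt):
    """Position (Stoermer-)Verlet update: x_{n+1} = 2 x_n - x_{n-1} + a(x_n) dt^2."""
    return 2.0 * x - x_prev + accel(x) * dt ** 2

# Illustrative: an undamped spring pulling the particle toward the origin.
k_spring = 4.0                      # spring stiffness (illustrative value)
accel = lambda x: -k_spring * x     # a(x) = -k x

dt, x_prev, x = 0.01, 1.0, 1.0      # start at rest at x = 1
for _ in range(1000):
    x, x_prev = verlet_step(x, x_prev, accel, dt), x
# x now oscillates harmonically with bounded amplitude |x| <= ~1
```

In the mass-spring damper system of the paper, the same update is applied per vertex, with the acceleration assembled from the passive and active spring forces and damping.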

Figure 1(a) shows an example of an excitation wave propagating through an accordingly deforming tissue (from left to right). Depolarized or excited tissue is shown in white, and repolarized or resting tissue is shown in black (normalized units on the interval [0, 1]). The excitation wave is followed by an active stress wave, which exerts a contractile force along the fibers (red arrows). Note that the contraction sets in at the tail of the excitation wave, since there is a short electromechanical delay between excitation and active stress. The numerical simulation was implemented in C++, runs on a CPU, and uses multi-threading for parallel computation on multiple cores.

### C. Training data and training procedure

Using the computer simulation described in Sec. II B, we generated two- and three-dimensional training datasets consisting of corresponding mechanical and electrical data. The two-dimensional training data include two datasets, one with focal wave data and the other with single spiral wave and spiral wave chaos data [see Figs. 4(a) and 4(b)]. The focal dataset includes data from 200 different simulations of 200 focal or target waves originating from randomly chosen stimulation sites, as shown in Fig. 4(a). The simulations were initiated with randomized electrical parameters $a \in [0.05, 0.1]$ (in steps of 0.01), $k \in \{7, 8, 9\}$, $G_s \in [0, 1]$ (in steps of 0.1), and randomized values for the diffusion constant $D \in [0.1, 1]$ (in steps of 0.1) and fiber angle $\alpha \in [0^\circ, 90^\circ]$ (in steps of $1^\circ$). Each focal wave sequence comprises 80 video frames, which were saved every 50 simulation time steps. Using data augmentation, the size of the focal training data was increased to $\approx 63\,000$ frames, and then $20\,000$ frames were randomly chosen for training; see also Fig. 11(c). Data augmentation obviated computing additional training data in computer simulations. During data augmentation, the two-dimensional data were first rotated by 90° and then flipped both horizontally and vertically such that all fiber alignments from 0° to 360° were equally likely to be present in the data. The spiral wave and spiral wave chaos dataset includes data from ten different simulations with either single stationary or meandering spiral waves or persistent or decaying spiral wave chaos, as shown in Fig. 4(b).
Each of the ten datasets was initiated with slightly different electrical and/or mechanical parameters with a fiber alignment of 0°, 15°, 30°, or 45° (and, accordingly, 180°, 195°, 210°, and 225°, respectively) and consists of 300–800 video frames, which were saved every 50 simulation time steps. The ten datasets contained in total about 7000 video frames. Using data augmentation, the size of the spiral chaos training data was increased to $\approx 57\,000$ frames, and then $20\,000$ frames were randomly chosen for training [see also Fig. 11(c)]. During data augmentation, the two-dimensional data were first rotated by 90° such that, in addition to the 0°, 15°, 30°, and 45° fiber alignments (see Sec. II B), the data also included 90°, 105°, 120°, and 135° fiber alignments. Then, the data were flipped both horizontally and vertically such that all fiber alignments from 0° to 360° in steps of 15°, including 60°, 75°, 150°, and 165°, and their corresponding inverse vectors were present in the training data. For training on both focal and spiral chaos data, $20\,000$ frames were randomly chosen from both datasets, with 50% of the frames being focal and 50% being spiral chaos data. Each video frame of both the focal and spiral chaos datasets was resized from $200 \times 200$ cells to $128 \times 128$ pixels before being provided as input to the network.
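Augmenting displacement-vector fields is slightly subtler than augmenting scalar images, because flipping or rotating the grid must also transform the vector components (the "corresponding inverse vectors" mentioned above). A minimal pure-Python sketch of the two operations, for a field stored as rows of $(u_x, u_y)$ tuples (the conventions, axis 0 as $y$ and axis 1 as $x$, are assumptions for illustration):

```python
def flip_horizontal(field):
    """Mirror the grid left-right and negate the x-component of each vector."""
    return [[(-ux, uy) for (ux, uy) in reversed(row)] for row in field]

def rot90(field):
    """Rotate the grid by 90 degrees and co-rotate the vectors:
    grid (x, y) -> (y, W-1-x), vectors (ux, uy) -> (uy, -ux)."""
    h, w = len(field), len(field[0])
    out = []
    for i in range(w):          # rows of the rotated field
        row = []
        for j in range(h):      # columns of the rotated field
            ux, uy = field[j][w - 1 - i]
            row.append((uy, -ux))
        out.append(row)
    return out
```

Applying `rot90` four times, or `flip_horizontal` twice, returns the original field, which is a convenient consistency check on the component transforms.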

The three-dimensional training datasets initially contain 8400 volume frames selected randomly from a set of 11 different chaotic simulations containing 2000 volume frames each. Using data augmentation, we increased the size of each three-dimensional training dataset to $\approx 67\,000$ frames by randomly flipping frames along the $e_x$-, $e_y$-, and/or $e_z$-axes. Validation was performed after training on a different dataset containing $16\,500$ volume frames from 11 different simulations with 1500 volume frames each. Each training or validation simulation uses a different value for the mechano-electrical feedback strength $G_s \in [0, 2]$ (in steps of 0.2) [see Fig. 7(e)]; the electrical parameters $a = 0.05$ and $k = 8$ and the diffusion constant $D = 0.05$ are the same for all simulations. The fiber angle $\alpha$ rotates linearly between 0° (bottom) and 90° (top) of the thin three-dimensional bulk. The simulation data were padded with zeros from the simulation domain of $100 \times 100 \times 24$ cells to $104 \times 104 \times 24$ voxels to fit the network architecture, and the neural network predictions are truncated back to $100 \times 100 \times 24$ voxels for analysis.

The two- and three-dimensional training data were specifically not used during the reconstruction. For validation purposes (e.g., determining the reconstruction error), the two-dimensional reconstructions were performed on a validation dataset comprising $20\,000$ video frames. All validation datasets were separate datasets, such that the network always exclusively estimated excitation patterns from data that it had not seen during training. A fraction of 20% of the frames of the training datasets was used for testing during training. To simulate noisy mechanical measurement data, we added noise to a fraction of the mechanical training data (see Fig. 12). The noise, normally distributed with varying average amplitudes, was added to the individual displacement vector components independently in each frame and independently over time.
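The noise model described above can be sketched in a few lines (pure Python; the amplitude `sigma` and the field layout are illustrative, with each frame stored as rows of $(u_x, u_y)$ tuples):

```python
import random

def add_displacement_noise(field, sigma):
    """Add i.i.d. Gaussian noise, drawn independently for every displacement
    component of every pixel, so repeated calls also decorrelate over time."""
    return [[(ux + random.gauss(0.0, sigma), uy + random.gauss(0.0, sigma))
             for (ux, uy) in row] for row in field]
```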

The 2D-CNNs were trained with a batch size of $128$ using the Adadelta^{60} optimizer with a learning rate of $0.001$ and binary cross entropy as loss function, whereas the 3D-CNNs were trained with a batch size of $16$ using the Adam^{61} optimizer with a learning rate of $0.001$ and mean squared error as loss function. All models were typically trained for $50$ epochs, if not stated otherwise (see also Fig. 11). With the models 2Dt-xx, the training procedure takes roughly $20$ s per epoch or 15–20 min in total with $\sim 20000$ frames or samples, respectively. With model 3Ds-A1, the training procedure takes roughly $160$ s per epoch or $2.5$ h in total for $50$ epochs with $8400$ samples, which are augmented during training. Training and reconstructions were performed on a Nvidia GeForce RTX 2080 Ti graphics processing unit (GPU).
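For reference, the two loss functions used above can be written out explicitly; a minimal NumPy sketch (the training itself used the frameworks' built-in implementations; the function names here are illustrative):

```python
import numpy as np

def binary_cross_entropy(v_true, v_pred, eps=1e-7):
    """Loss used for the 2D models; V is normalized to [0, 1]."""
    p = np.clip(v_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(v_true * np.log(p)
                           + (1.0 - v_true) * np.log(1.0 - p))))

def mean_squared_error(v_true, v_pred):
    """Loss used for the 3D models."""
    return float(np.mean((v_true - v_pred) ** 2))
```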

### D. Reconstruction accuracy and error

The overall reconstruction error of a neural network model (2D/3D) was determined by calculating the absolute differences between all estimated excitation values $\tilde{V}$ and original ground truth excitation values $V$,

$$\Delta V = |\tilde{V} - V|, \qquad (6)$$

and calculating the average over all pixels or voxels and all frames (mean absolute error: MAE), respectively. The reconstruction accuracy is implicitly given as $1-\langle\Delta V\rangle$. Throughout this study, the uncertainty of the reconstruction error or accuracy is stated either as (i) the standard deviation $\sigma_{\langle V\rangle}$ of the per-frame average of the reconstruction error across frames or (ii) the standard deviation $\sigma_V$ of all absolute difference values over all pixels or voxels in all frames (stated in brackets) [cf. Figs. 11(b) and 11(c), black and gray error bars, respectively]. For two-dimensional data, the reconstruction error was calculated using $N=20000$ estimations or frames.
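The error and uncertainty measures defined above can be summarized in a short sketch (the function and key names are ours):

```python
import numpy as np

def reconstruction_metrics(v_true, v_pred):
    """v_true, v_pred: arrays of shape (frames, ...) with values in [0, 1].

    Returns the mean absolute error <dV>, the accuracy 1 - <dV>, and the
    two uncertainty measures used in the text: (i) the std of the
    per-frame mean error and (ii) the std of all per-pixel (or per-voxel)
    absolute differences.
    """
    dv = np.abs(v_pred - v_true)
    per_frame = dv.reshape(dv.shape[0], -1).mean(axis=1)
    mae = float(dv.mean())
    return {"mae": mae,
            "accuracy": 1.0 - mae,
            "sigma_frame": float(per_frame.std()),
            "sigma_pixel": float(dv.std())}
```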

## III. RESULTS

Autoencoder neural networks can be used to compute electrical excitation wave patterns from mechanical motion and deformation in generic two-dimensional sheet- or three-dimensional bulk-shaped numerical simulations of cardiac muscle tissue with very high reconstruction accuracies of 90%–98% (see Table I). Various two- and three-dimensional electrical excitation wave phenomena, such as focal waves, planar waves, spiral and scroll waves, and spatiotemporal chaos, can be reconstructed from mechanical motion and deformation, even in the presence of noise, at low resolution, and using either spatiotemporal or static mechanical data (see Figs. 5–12 and movies 1–10 in the supplementary material).

### A. Recovery of two-dimensional electrical excitation wave patterns from mechanical deformation

Figures 5 and 6 demonstrate that various two-dimensional electrical excitation wave patterns, such as focal waves originating from random stimulation sites, as well as linear waves, single- and multi-spiral wave patterns, and even spiral wave chaos, can be reconstructed with very high precision from the resulting mechanical deformation using an autoencoder neural network (see also movies 1 and 7 in the supplementary material). Note that the neural network solely processes and analyzes tissue displacements; it does not analyze pre-computed strains or stresses, and it has no direct knowledge of the underlying cardiac muscle fiber alignment. Neither during training nor during the reconstruction was the underlying muscle fiber direction known to the network. However, the contractile forces triggered by the electrical excitation wave patterns act along an underlying linearly transverse muscle fiber organization, leading to highly anisotropic macroscopic contractions and deformations of the tissue, which are reflected in the data. The top panels in Figs. 5(a), 5(b), and 6(a) show the deformations that were analyzed by the network, the two top rows in Figs. 5(a), 5(b), and 6(b) show the reconstructed electrical excitation patterns $\tilde{u}(x,y)$, and the two bottom rows in Figs. 5(a), 5(b), and 6(c) show the original ground truth excitation patterns $u(x,y)$ during focal activity and during spiral wave chaos, respectively. The different muscle fiber alignments are indicated by black arrows: $40^\circ$ in Fig. 5(a), $1^\circ$ in Fig. 5(b), and $0^\circ$ or linearly transverse in the $e_y$-direction, $90^\circ$ or linearly transverse in the $e_x$-direction, and $30^\circ$ and $120^\circ$ from left to right in Fig. 6(a).

In many cases, the autoencoder’s reconstructions are almost visually indistinguishable from the original excitation patterns, e.g., as seen in Fig. 5(a) or 6, with reconstruction errors $\langle\Delta V\rangle$ on the order of 2%–6%. On spiral chaos data (model 2Dt-A3s, see Table I), the reconstruction accuracy is $95.1% \pm 2.6%$ ($\pm 8.5%$); on focal data (model 2Dt-A3f), the reconstruction accuracy is $98.6% \pm 1.4%$ ($\pm 4.9%$); and on a mix of spiral chaos and focal data (50%–50% mix, model 2Dt-A3), the reconstruction accuracy is $96.2% \pm 3.1%$ ($\pm 7.8%$), respectively (see Figs. 5 and 6 and also Secs. III C–III E for a more detailed discussion). Figure 6(d) shows the absolute difference between reconstructed and ground truth excitation $\Delta V=|\tilde{V}(x,y)-V(x,y)|$ during spiral wave chaos. One sees that residual reconstruction errors occur mostly at the wave fronts and backs, but that the reconstructed and original excitation patterns are largely congruent. In Fig. 5(b), one notices that the reconstructions at times also contain clearly visible artifacts, particularly for focal waves with longer wavelengths. However, the larger artifacts are relatively rare; see also movie 7 in the supplementary material for an impression of their prevalence over time and across stimuli. Accordingly, the standard deviation $\sigma_{\langle V\rangle}$ across frames is on the order of $\pm 2%$ to $\pm 5%$, whereas the standard deviation $\sigma_V$ across all pixels is on the order of $\pm 5%$ to $\pm 10%$ for both focal and spiral wave chaos data. Overall, the accuracy of the inverse mechano-electrical reconstruction is very high and consistently greater than $90%$, regardless of the particular neural network model (see Table I and Secs. III C–III G). Even spiral waves with both long and short and changing wavelengths during breathing instabilities or alternans can be recovered by analyzing the correspondingly induced mechanical deformation (see third and fourth columns in Fig. 6).
During spiral wave chaos, the reconstruction error does not change significantly over time, regardless of whether single spiral waves, multiple spiral waves, or spiral wave chaos are estimated with the same autoencoder network (e.g., model 2Dt-A3s) [see also Fig. 8(c)]. Training the network models on either just focal data or just spiral chaos data, as shown in Figs. 4(a) and 4(b), or on a mixture of both data types may be advantageous; see Sec. III E for a more detailed discussion of training bias. The estimation of a two-dimensional excitation wave pattern ($128\times128$ pixels) can be performed in less than $0.8$ ms by the neural network (model 2Dt-A3, average over $1000$ frames, Nvidia GPU, see Sec. II). The data shown in Figs. 5, 6, and 10–12 were not seen by the network during the training procedure.

### B. Recovery of three-dimensional electrical excitation wave patterns from mechanical deformation

Figures 7 and 8 demonstrate that various three-dimensional electrical excitation wave patterns, such as planar or spherical waves, single scroll waves, or even scroll wave chaos, can be reconstructed from the resulting mechanical deformation using an autoencoder neural network with three-dimensional convolutional layers (see also movies 2–4 in the supplementary material). The neural network solely processes and analyzes three-dimensional tissue motion [see Fig. 7(a)]. Panel (a) shows the deformation of the bulk surface, but three-dimensional kinematic data are used for the reconstruction (the displayed deformation is exaggerated to facilitate viewing of length changes; displacements were multiplied by a factor of $2$). Movie 3 in the supplementary material shows the corresponding deformation over time (original, without exaggerating the motion’s magnitude). As in the two-dimensional tissue, the contractile forces act along muscle fibers, which rotate throughout the three-dimensional bulk by a total angle of $120^\circ$ (see Sec. II B). Figure 7(b) shows volume renderings of the original three-dimensional electrical excitation wave dynamics $V(x,y,z)$ that led to the deformations of the bulk. These deformations were then analyzed by the autoencoder network. The electrical wave dynamics correspond to scroll wave chaos that originated from spherical and planar waves through a cascade of wave breaks. Panel (c) shows volume renderings of the reconstructed electrical excitation wave pattern $\tilde{V}(x,y,z)$ that was estimated from the bulk’s deformations. The reconstructed electrical scroll wave pattern is visually indistinguishable from the original electrical scroll wave pattern. The reconstruction error $\langle\Delta V\rangle$ is on the order of $4% \pm 1%$, similar to the two-dimensional reconstructions in Sec. III A.
The reconstruction error remains small for a broad range of electrical parameters, e.g., for mechano-electrical feedback strengths ranging from $G_s=0.0,\ldots,2.0$ [see Fig. 7(e)], even if the network was trained on only part of the $G_s$ range (gray: $G_s\in[0,1]$ or $G_s\in[1,2]$; white: $G_s$ from the whole range $[0,2]$ included in training; the error is largely unaffected). None of the data used for reconstruction and determining the error were used during training (see also Sec. II C). The evolution of the reconstructed electrical scroll wave dynamics appears smooth (see also movie 2 in the supplementary material), even though each three-dimensional volume frame was estimated or reconstructed individually by the neural network (cf. Sec. III C). Individual time series show that the action potential shapes and upstrokes of the electrical excitation are reconstructed robustly with minor deviations from the ground truth [see Figs. 9(a) and 9(b)]. The reconstruction error does not fluctuate much over time ($\sigma_{\langle V\rangle}<1%$), even though the sequence contains planar waves, single scroll waves, as well as fully turbulent scroll wave chaos [see Figs. 8(c), 8(d), and 9(c)]. Figure 8 shows a comparison of the reconstruction accuracy obtained with the autoencoder with that obtained with a physics-based mechano-electrical reconstruction approach, which we published recently.^{25} The difference maps in panel (d) show substantially larger residual errors than with the autoencoder in panel (c). Correspondingly, the reconstruction error shown in Fig. 9(c) is more than threefold larger and fluctuates much more with the physics-based approach than with the autoencoder used in this work. The estimation of a single three-dimensional excitation wave pattern ($100\times100\times24$ voxels) can be performed within $20$ ms by the neural network (model 3Ds-A1, Nvidia GPU, see Sec. II). The data shown in Figs. 7 and 8 were not seen by the network during the training procedure.

#### 1. Recovery of electrical vortex filaments from 3D mechanical deformation data

We computed electrical vortex filaments, or three-dimensional electrical phase singularities, which describe the cores or rotational centers of electrical scroll waves, from both the original and reconstructed excitation wave patterns via the Hilbert transform [see panel (d) in Fig. 7 and movie 2 in the supplementary material]. Both filament structures are almost identical, with an average distance between original and reconstructed filaments of $1.1\pm0.2$ voxels [see Fig. 7(f)]. The mismatch is on the order of the precision with which the spatial locations of the filaments can be computed. One voxel corresponds to about $1%$ of the medium size ($100\times100\times24$ cells). Except for a few short-term fluctuations, the mean distance between original and reconstructed electrical vortex filaments does not decrease or increase much over time [see Fig. 9(d)] and stays small for different dynamics and over a broad range of mechano-electrical feedback strengths $G_s=0.0,\ldots,2.0$ [see Fig. 7(f)]. The filament data show that the topology of the electrical excitation wave pattern can be reconstructed with very high accuracy from deformation.
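To illustrate the Hilbert-transform step, the following is a minimal two-dimensional sketch of phase-singularity detection (vortex filaments are stacks of such 2D singularities); the synthetic rotating test pattern and all names are illustrative and not the paper's actual pipeline:

```python
import numpy as np
from scipy.signal import hilbert

def phase_singularities(v):
    """Locate phase singularities in an excitation movie v of shape
    (nx, ny, nt): per-pixel Hilbert transform -> instantaneous phase,
    then the winding number around every 2x2 plaquette of pixels."""
    phase = np.angle(hilbert(v - v.mean(axis=-1, keepdims=True), axis=-1))
    p = phase[..., v.shape[-1] // 2]  # one snapshot in time

    def wrap(d):
        return (d + np.pi) % (2.0 * np.pi) - np.pi

    # Sum of wrapped phase differences around each plaquette:
    # +-2*pi at a singularity, ~0 elsewhere.
    w = (wrap(p[1:, :-1] - p[:-1, :-1]) + wrap(p[1:, 1:] - p[1:, :-1])
         + wrap(p[:-1, 1:] - p[1:, 1:]) + wrap(p[:-1, :-1] - p[:-1, 1:]))
    return np.argwhere(np.abs(w) > np.pi)

# Synthetic rotating pattern with a core between the four central pixels
# (an illustrative test case, not data from the simulations):
nx, ny, nt = 32, 32, 128
x, y = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
theta = np.arctan2(y - (ny - 1) / 2, x - (nx - 1) / 2)
t = 2.0 * np.pi * 5 * np.arange(nt) / nt  # five full rotation periods
v = np.cos(t[None, None, :] - theta[..., None])
cores = phase_singularities(v)  # one plaquette, at index (15, 15)
```

In three dimensions, applying this plaquette test slice by slice and linking the resulting singular points across slices traces out the filament curves.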

### C. Reconstruction from spatiotemporal mechanical deformation data and with arbitrary reference frames

Both static and short spatiotemporal sequences of mechanical deformation patterns can be processed by the autoencoder, depending on the number of input layers [see Figs. 2(b) and 2(c)], and both lead to accurate reconstructions of the underlying excitation wave patterns (see Fig. 10). Panel (a) shows that the reconstruction achieves an accuracy of $93.4% \pm 3.2%$ ($\pm 11.0%$) when processing a single mechanical frame (using model 2Ds-A1s with two input channels for the $u_x$- and $u_y$-components of the displacement fields, respectively), which contains displacements $\vec{u}_r$ that describe the motion of each of the tissue’s material segments with respect to their original position in the stress-free, undeformed reference configuration $\chi_0$ (only the $u_y$-component shown with scale $[-4,4]$ pixels); see also the schematic (left) in Fig. 10(c). The accuracy can be further improved to $94.8% \pm 2.6%$ ($\pm 8.7%$) and $95.1% \pm 2.6%$ ($\pm 8.5%$) when feeding two or three consecutive mechanical frames (or two or three consecutive sets of $\vec{u}_r=\{\vec{u}_r(t_0),\vec{u}_r(t_{-1}),\ldots\}$) as input into the network (models 2Dt-A2s with four or 2Dt-A3s with six input channels), respectively [see also movie 5 in the supplementary material and Fig. 10(c)]. Analyzing spatiotemporal mechanical data improves the reconstruction’s accuracy and robustness (cf. models 2Ds-B1s and 2Dt-B3s in Table I), particularly with low resolution mechanical data (cf. models 2Ds-F1s and 2Dt-F3s in Sec. III G).

Next, the autoencoder can also successfully process and obtain highly accurate reconstructions from instantaneous displacement data $\vec{u}_i$, which describe the motion of the tissue from the previous time step(s) to the current time step (temporal offset $\sim 5%$ of the spiral period) [see also Fig. 3(c)]. Figure 10(b) shows that the reconstruction (model 2Dt$'$-A1s) achieves an accuracy of $93.5% \pm 3.2%$ ($\pm 11.0%$) when processing a single mechanical frame, which contains instantaneous frame-to-frame displacements (only the $u_y$-component shown with scale $[-2,2]$ pixels) calculated as the difference between two subsequent displacement vectors $\vec{u}_i(t_0)=\vec{u}_r(t_0)-\vec{u}_r(t_{-1})$ in time [see the schematic (center) in Fig. 10(c)]. The accuracy increases slightly further to $94.8% \pm 2.6%$ ($\pm 9.2%$) when processing two consecutive mechanical frames (model 2Dt$'$-A2s) containing instantaneous frame-to-frame displacements $\vec{u}_i=\{\vec{u}_i(t_0),\vec{u}_i(t_{-1})\}=\{\vec{u}_r(t_0)-\vec{u}_r(t_{-1}),\vec{u}_r(t_{-1})-\vec{u}_r(t_{-2})\}$, respectively, which were calculated as the difference between subsequent displacement vectors in time [see the schematic (center) in Fig. 10(c)]. Analyzing displacements $\vec{u}_r$ or instantaneous frame-to-frame displacements $\vec{u}_i$ yields almost equal reconstruction accuracies.

Note that in some imaging situations, particularly during arrhythmias, the tissue’s stress-free undeformed mechanical configuration $\chi_0$ may not be known, and accordingly, displacement data $\vec{u}_r$ would not be readily available. Nevertheless, the autoencoder neural network can also correctly reconstruct excitation wave patterns, even if the input to the network is displacement vectors $\vec{u}_r^{\,*}$ that indicate motion and deformation of the tissue segments with respect to an arbitrary deformed configuration $\chi_t$ in an arbitrary frame [see the schematic (right) in Fig. 10(c)]. Analyzing the tissue’s relative motion with respect to an arbitrary deformed configuration $\chi_t$, the inverse mechano-electrical reconstruction nevertheless achieves reconstruction accuracies of $93.5% \pm 3.0%$ ($\pm 10.7%$) and $94.8% \pm 2.6%$ ($\pm 9.1%$) when analyzing one (model 2Ds$^*$-A1s) or two (model 2Dt$^*$-A2s) mechanical frames, respectively, each displacement vector indicating shifts of the tissue $\vec{u}_r^{\,*}$ relative to a deformed configuration $\chi_{t-2}$ a few time steps prior to the current frame. In more detail, in a series of three subsequent mechanical frames, the current ($t_0$) and/or previous frame ($t_{-1}$), which indicate motion with respect to the tissue’s configuration in the first frame of that series ($t_{-2}$), was/were analyzed by the network. The network models were trained accordingly.

The data demonstrate that the inverse mechano-electrical reconstruction can be performed robustly with both static and time-varying mechanical input data, as well as with absolute or instantaneous frame-to-frame displacement data, which can alternatively describe motion with respect to the undeformed, stress-free, or an arbitrary reference configuration. In principle, similar kinematic data can be retrieved with numerical motion tracking in imaging experiments.
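The three kinds of kinematic input discussed in this section differ only in a simple pre-processing of the tracked displacements; a sketch under these assumptions (function names are ours):

```python
import numpy as np

def to_instantaneous(u_r):
    """Frame-to-frame displacements u_i(t0) = u_r(t0) - u_r(t-1).

    u_r: array of shape (nt, nx, ny, 2), displacements with respect to
    the stress-free, undeformed reference configuration chi_0.
    """
    return u_r[1:] - u_r[:-1]

def to_relative(u_r, t_ref):
    """Displacements u_r* with respect to an arbitrary deformed
    configuration chi_{t_ref} (used when chi_0 is unknown)."""
    return u_r - u_r[t_ref]
```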

### D. Effect of training duration, training dataset size, and neural network size on reconstruction accuracy

The training duration and the size of the training dataset are generally important parameters in data-driven modeling. Figure 11 shows how the training duration, measured in training epochs, and the size of the training dataset affect the reconstruction error for two-dimensional data (model 2Dt-A3s, see Table I). The image series in Fig. 11(a) demonstrates how the reconstruction accuracy increases with training duration. The excitation wave pattern can already be identified after two to five epochs, albeit with substantial distortions. After 10–20 epochs, the reconstruction progressively recovers and resolves finer details, and after 40–50 epochs, residual artifacts are also removed to a large extent (see also movie 6 in the supplementary material). The ground truth excitation wave pattern $V(x,y)$ is shown on the right. Accordingly, panel (b) shows the mean absolute error, see Eq. (6), plotted over the number of training epochs. The reconstruction error $\langle\Delta V\rangle$ quickly decreases to below $10%$ after just a few training epochs and approaches a mean absolute error of about 5%–10% after 10–20 training epochs. The error continuously decreases and saturates at about 4%–5% after 40–50 epochs. After $200$ training epochs, the reconstruction accuracy (model 2Dt-A3s) is $96.8% \pm 1.6%$ ($\pm 5.7%$) compared to $95.1% \pm 2.6%$ ($\pm 8.5%$) after $50$ epochs, respectively. All reconstructions in this study were obtained with $50$ training epochs if not stated otherwise. Note that the (gray) error bars in Figs. 11(b) and 11(c) reflect the larger residual errors. Overall, the reconstruction error $\langle\Delta V\rangle$ is comparable in two- and three-dimensional tissues [cf. Figs. 7(e) and 9(c)]. Panel (c) demonstrates how the size of the training dataset determines the reconstruction accuracy. Generally, the larger the training dataset, the better the reconstruction.
However, the reconstruction accuracy does not appear to improve significantly with more than $20000$ frames. In our study, the number of network model parameters affected the network’s performance only slightly. For instance, smaller two-dimensional models with about $300000$ trainable model parameters achieve a slightly lower reconstruction accuracy (e.g., model 2Dt-B3s: $94.5% \pm 2.7%$) than larger models with more than $1000000$ model parameters on the same data (e.g., model 2Dt-A3s: $95.1% \pm 2.6%$).

### E. Training bias

We found that the training data can influence the model’s reconstruction accuracy or generate a bias toward the training data. With the two-dimensional neural network models, the reconstruction error worsened substantially if the training was performed, for instance, only with focal wave patterns, and the reconstruction was subsequently applied to spiral wave patterns. In the following, we distinguish different network models based on the type of data they were trained on, focal (f) and/or spiral chaos (s) data (see also Table I), and assess their reconstruction errors. If the network (model 2Dt-A3) was trained on both focal and spiral wave chaos data (50%/50% ratio in $20000$ samples), its reconstruction error for a 50%–50% mix of both focal and spiral wave chaos data is $3.8% \pm 3.1%$ ($\pm 7.8%$); for focal wave data, it is $1.6% \pm 1.3%$ ($\pm 4.4%$); and for spiral wave chaos data, it is $6.0% \pm 2.8%$ ($\pm 9.9%$), respectively. However, if the network (model 2Dt-A3f) was trained just on focal (f) data, it performs very well on focal data, as expected, its reconstruction error being $1.4% \pm 1.4%$ ($\pm 4.9%$), but poorly on spiral chaos data, on which the reconstruction error becomes $20.7% \pm 6.9%$ ($\pm 25.7%$), respectively. Accordingly, the same network yields a mediocre overall reconstruction error of $11.0% \pm 10.8%$ ($\pm 20.9%$) with a 50%–50% mix of both focal and spiral wave chaos patterns, where the error largely results from analyzing the spiral wave patterns, while the focal patterns are reconstructed accurately. Vice versa, if the network (model 2Dt-A3s) was trained just on spiral chaos data, the network’s reconstruction error is $4.9% \pm 2.6%$ ($\pm 8.5%$) for spiral chaos data, $11.3% \pm 7.8%$ ($\pm 15.5%$) for focal data, and $5.8% \pm 6.2%$ ($\pm 12.3%$) for a 50%–50% mix of both patterns, respectively.
The results suggest, first, that the careful selection of electrical and mechanical training data will be critical in future imaging applications and, second, that specialized neural networks trained exclusively on particular types of rhythms or arrhythmias may achieve higher reconstruction accuracies in specialized applications than general networks (cf. model 2Dt-A3f vs 2Dt-A3 on focal data). However, in some situations, the increase in accuracy associated with specialization might be negligible, as general networks trained on various rhythms may already achieve similarly high reconstruction accuracies (cf. model 2Dt-A3 vs 2Dt-A3s on spiral wave chaos data).

### F. Robustness against measurement noise

Mechanical measurement data obtained in imaging experiments are likely to contain noise (from noisy image data or intrinsic noise of numerical tracking algorithms). To strengthen the network’s ability to reconstruct excitation wave patterns from noisy mechanical input data, we added Gaussian white noise to the mechanical training data that otherwise contained smooth motion or displacement vector fields produced by the computer simulations. Since denoising is one of the particular strengths of autoencoders,^{34,35} they should be particularly suited to handle noise as they perform mechano-electrical reconstructions. Indeed, Fig. 12 shows that if the two-dimensional network models are trained with noisy mechanical input data, they develop the capability to estimate the excitation despite the presence of noise in the mechanical input data. The data also show that, on the other hand, when presenting noisy mechanical input data to a network that was not trained with noise, the reconstruction accuracy quickly deteriorates. Figure 12(a) shows four curves representing the reconstruction errors $\langle\Delta V\rangle$ that were obtained with increasing mechanical noise $\xi$ with four different network models, one trained without noise (black: $\tilde{\xi}=0.0$) and three trained with noise (gray: $\tilde{\xi}=0.05, 0.1, 0.3$), respectively. The noise is normally distributed and was added onto the $u_x$- and $u_y$-components of the displacement vectors [see panel (c)]. All stated values correspond to the standard deviation $\sigma$ of the noise. Noise with a magnitude of $\xi\approx0.1$ pixels corresponds to $\approx10%$ of the distance between displacement vectors. Without noisy training, the reconstruction error increases steeply if noise is added during the reconstruction procedure, almost twofold for small to moderate noise levels of $\xi\approx0.05$.
In contrast, training with noise (dark gray curves: $\tilde{\xi}=0.05, 0.1, 0.3$) flattens the error curve substantially and yields acceptable reconstruction errors ($\langle\Delta V\rangle<6%$) up to the noise levels that were used during training. However, training with noise comes at the cost of an increased error at baseline (light gray curve: $\langle\Delta V\rangle>6%$ with $\tilde{\xi}=0.3$). Figure 12(b) shows the corresponding reconstructions with different noise levels $\xi=0.0, 0.08, 0.16, 0.24$, and $0.32$ during estimation (horizontal) for the four different models with training noise $\tilde{\xi}=0.0, 0.05, 0.1$, and $0.3$ (vertical). Based on the results in Fig. 12, we used training datasets that contained both noisy and noise-free mechanical data.
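The noise augmentation described in this section amounts to drawing a noise level per training sample (possibly zero, so that noise-free samples remain in the set) and perturbing each displacement component independently; a minimal sketch under these assumptions:

```python
import numpy as np

def augment_with_noise(u_batch, levels, rng):
    """Add zero-mean Gaussian noise to displacement fields.

    u_batch: (n, nx, ny, 2) displacement samples (u_x, u_y components)
    levels:  candidate noise standard deviations in pixels; may include
             0.0 to keep noise-free samples in the training set
    """
    sigmas = rng.choice(levels, size=len(u_batch))
    noisy = np.stack([u + rng.normal(0.0, s, size=u.shape)
                      for u, s in zip(u_batch, sigmas)])
    return noisy, sigmas
```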

### G. Lower spatial resolution of mechanical data

Mechanical measurement data obtained in imaging experiments may have a lower spatial resolution than desired. Since enhancing the resolution of image data (superresolution) is a particular strength of autoencoders,^{37} they should be particularly suited to perform inverse mechano-electrical reconstructions also with sparse or low resolution mechanical data. Figure 13 demonstrates that our autoencoder network is able to provide sufficiently accurate reconstructions, even if the mechanical input data have a lower spatial resolution, and illustrates the degree to which the reconstruction accuracy deteriorates with increasingly lower spatial resolutions. The original size of the two-dimensional electrical or mechanical data used throughout this study for training and reconstructions is $128\times128$ pixels. To emulate lower spatial resolutions of the mechanical data, we downsized them to $64\times64$, $32\times32$, $16\times16$, and $8\times8$ displacement vectors in each time step, e.g., by repeatedly subsampling, i.e., leaving out every second displacement vector, and adjusted the input layers of the autoencoder accordingly, while keeping the output layer for the electrics constant at $128\times128$ pixels. The downsized mechanical displacement data were scaled accordingly (e.g., by a factor of $0.5$ with $64\times64$ pixels). The number of convolutional and downsampling (maxpooling) or upsampling layers in the encoding and decoding parts of the original network (model 2Dt-A3s) was modified to achieve the desired upsampling of lower resolved mechanical data to excitation data with a size of $128\times128$ pixels [see Fig. 13(d)]. For instance, mechanical data with size $64\times64$ were read by a network (model 2Dt-D3s) with an input layer of size $64\times64\times6$ and encoded by two stages with two $32\times32\times128$ and $16\times16\times256$ convolutional layers and two maxpooling layers in between (see Table I).
Mechanical data with size $16\times16$ were provided directly after a $16\times16\times64$ convolutional layer to the latent space (e.g., models 2Dt-Fxx), and data with size $8\times8$ were first upsampled and then sent through a $16\times16\times64$ convolutional layer before being provided to the latent space [see Fig. 13(d)]. Figure 13(b) shows the reconstructions, all $128\times128$ pixels in size, of various electrical excitation wave patterns for different mechanical input sizes ($64\times64$ to $4\times4$ vectors or subsampling/downsizing factors of $2\times$ to $32\times$) provided to the autoencoder network. Remarkably, the autoencoder achieved reconstruction accuracies of $94.1% \pm 3.1%$ ($\pm 10.2%$), $92.1% \pm 3.7%$ ($\pm 11.8%$), $90.8% \pm 4.7%$ ($\pm 13.6%$), and $83.2% \pm 6.1%$ ($\pm 17.9%$) when analyzing low resolution mechanical data consisting of $64\times64$, $32\times32$, $16\times16$, and $8\times8$ vectors, respectively [see also Fig. 13(b)]. Lower mechanical resolutions ($4\times4$) are too coarse for the two-dimensional autoencoder, and the reconstruction accuracy accordingly drops to $72.6% \pm 7.4%$ ($\pm 17.9%$). Confirming the results in Sec. III C, the reconstruction accuracy decreases to $89.9% \pm 4.3%$ ($\pm 13.7%$) (model 2Ds-E1s) and $87.3% \pm 5.0%$ ($\pm 15.7%$) (model 2Ds-F1s) with $32\times32$ and $16\times16$ vectors, respectively, if only static mechanical data are analyzed by the network [see also Fig. 13(c)]. Remarkably, with three-dimensional mechanical data, the reconstruction accuracy remains the same at $2\times$ lower mechanical resolution, $95.7% \pm 0.5%$ ($\pm 6.8%$) (model 3Ds-B1) vs $95.7% \pm 0.6%$ ($\pm 6.7%$) (model 3Ds-A1). The accuracy decreases to $94.8% \pm 0.7%$ ($\pm 8.1%$) and $89.4% \pm 1.4%$ ($\pm 14.4%$) at $4\times$ and $8\times$ lower mechanical resolutions (models 3Ds-B1 and 3Ds-F1), respectively (see Table I and also movie 8 in the supplementary material).
The results demonstrate that autoencoders are very effective at interpolating from sparse data and that, accordingly, an autoencoder-based inverse mechano-electrical reconstruction approach is effective even when mechanics are analyzed at low spatial resolutions.
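The resolution-reduction step used in this section (subsampling plus rescaling of the displacement vectors into coarse-grid units) can be sketched as follows (the function name is ours):

```python
import numpy as np

def downsample_displacements(u, factor):
    """Emulate low resolution mechanical data: keep every `factor`-th
    displacement vector and rescale the displacements to the coarser
    pixel units (e.g., factor 2: 128x128 -> 64x64, vectors scaled by 0.5).

    u: displacement field of shape (nx, ny, 2)
    """
    return u[::factor, ::factor] / factor
```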

### H. Reconstruction of active stress (or calcium wave) patterns from mechanical deformation

The neural network can be trained to estimate independently either the electrical excitation $V$ or the active stress $T_a$ from mechanical deformations (see Fig. 14). Estimating the active stress variable $T_a$, see Eq. (3), from mechanics after the network (model 2Dt-A3) was trained with mechanical and active stress data, the reconstruction accuracy is $1-\langle\Delta T_a\rangle=96.8% \pm 2.4%$ ($\pm 3.2%$), which is slightly (insignificantly) better than when $V$ is estimated with the same network [see Fig. 14(b) and Table I]. To allow the comparison, the active stress variable $T_a$ was scaled by a factor of $2$ before training (all $T_a<0.45$ [n.u.]) and normalized (with a fixed factor $\sim 0.45^{-1}$) before calculating the reconstruction accuracy. Afterward, the estimated active stress $\tilde{T}_a$ can be used to indirectly estimate $V$ [see Fig. 14(c)] using a second network that performs the corresponding cross-estimation between the two scalar patterns $T_a \to V$ [accuracy: $97.9% \pm 1.4%$ ($\pm 3.8%$) with $n_d=1$ in the input and output layer]. If the second network was trained with ground truth values of $T_a$ and $V$ (top), the two subsequent estimations $\chi_t \to T_a \to V$ yield an overall reconstruction accuracy of $93.9% \pm 4.6%$ ($\pm 12.2%$). If instead the second network was trained with estimations $\tilde{T}_a$ and ground truth values of $V$ (bottom, asterisk), the overall reconstruction accuracy after $\chi_t \to \tilde{T}_a \to V$ increases slightly to $95.9% \pm 3.7%$ ($\pm 9.6%$).

## IV. DISCUSSION

Our study demonstrates that autoencoders are a simple, yet powerful machine learning technique that can be used to solve the inverse mechano-electrical problem in numerical models of cardiac muscle tissue. We showed that neural networks with a generic convolutional autoencoder architecture are able to learn the complex relationship between electrical excitation, active stress, and mechanical deformation in simulated two- and three-dimensional cardiac tissues with muscle fiber anisotropy and can generate highly accurate reconstructions of electrical excitation or active stress patterns through the analysis of the deformation that occurs in response to the excitation or active stress, respectively. In the future, similar deep learning techniques could be used to estimate and visualize intramural action potential or calcium wave dynamics during the imaging of heart rhythm disorders or other heart diseases, both in patients and in basic research. Given adequate training data, autoencoder-based convolutional neural networks and extensions or combinations thereof could be applied to analyze the contractile motion and deformation of the heart in imaging data (e.g., obtained with ultrasound^{19} or magnetic resonance imaging) to compute reconstructions of electrophysiological wave phenomena in real-time. Our network is able to analyze a single volumetric mechanical frame and reconstruct a three-dimensional excitation wave pattern within that volume (containing $100\times100\times24=240000\approx 60^3$ voxels) in less than $20$ ms, which suggests that performing the computations in real-time at a rate of $50$ volumes/s could in principle be achieved in the near future (e.g., with an Acuson sc2000 ultrasound system from Siemens using the matrix-array transducer 4Z1c, which provides volumes with $\sim 80^3$ voxels at that imaging rate).

We previously demonstrated—also in numerical simulations—that it is possible to reconstruct complex excitation wave patterns from mechanical deformation using a physics- or knowledge-based approach.^{25} In contrast to the data-driven modeling approach reported in this study, the physics-based approach required a biophysical model to enable the reconstruction of the excitation from mechanics. The biophysical model contained generic descriptions of and basic assumptions about the underlying physiological processes, including cardiac electrophysiology, elasticity, and excitation–contraction coupling, and it required the careful optimization of both electrical and mechanical model parameters for the reconstruction to succeed. In contrast, the data-driven approach in this study does not require knowledge about the biophysical processes or a physical model at all, which not only reduces computational costs, but also obviates the need to construct a model and select model parameters, both of which could potentially introduce bias or inaccurate assumptions about the physiological dynamics. Data-driven approaches generally circumvent these problems but do require adequate training data that include all or many of the problem's important features and allow the model to generalize. Our model was trained on a relatively homogeneous synthetic dataset and proved to be robust in its ability to reconstruct excitation waves from deformation in the presence of noise, at lower spatial resolutions, with arbitrary mechanical reference configurations, and in parameter regimes that it was not trained on. The robustness of the approach may indicate that similar deep learning techniques could be applied to solve the inverse mechano-electrical problem with experimental data, which are typically more heterogeneous, less systematic, and of lower overall quality.
Nevertheless, in future research, one of the main challenges may be to generate training data that enable solving the real-world mechano-electrical problem with imaging data and that, at the same time, capture the large variability and heterogeneity of diseases and disease states in patients.

In this study, we showed that autoencoder neural networks can reconstruct excitation wave patterns robustly and with very high accuracies from deformation, even though both the electrical excitation and the underlying muscle fiber orientation are completely unknown. Nevertheless, at the same time, the autoencoder did not provide a better understanding of the relationship between excitation, active stress, and deformation. Similarly, we showed that the excitation wave pattern can be recovered, even though the underlying dynamical equations and system parameters that describe the relationship between electrics and mechanics are completely unknown to the reconstruction algorithm, but we did not gain insights into the mechanisms that underlie the reconstruction itself. Instead, the autoencoder learned the relationship automatically, in an unsupervised manner, and behaves like a “black box” during prediction, a caveat that is typically associated with artificial intelligence. It is accordingly difficult to assess whether it will be possible to develop a better understanding of the mutual coupling between voltage, calcium, and mechanics in the heart using machine learning and whether it will be feasible to predict under which circumstances machine learning algorithms may fail to provide accurate reconstructions. Hybrid approaches or combinations of machine learning and physics-based modeling^{62} may be able to address these issues in the future.

We made a critical simplification in this study in that we assumed that the coupling between electrics and mechanics would be homogeneous and intact throughout the medium, and we did not consider electrical or elastic heterogeneity, such as scar tissue, heterogeneity in the electromechanical delay, the possibility of electromechanical dissociation, or fragmented, highly heterogeneous calcium wave dynamics during fibrillation. We aim to investigate the performance of our approach in the presence of heterogeneity and dissociation-related electromechanical phenomena more systematically in a future study. For a more detailed discussion about potential future biophysical or physiology-related limitations, such as the degeneration of the excitation–contraction coupling mechanism or dissociation of voltage from calcium cycling during atrial or ventricular arrhythmias, we refer the reader to our previous publications^{19,25} and to related literature.^{63–69}

## V. CONCLUSIONS

We provided a numerical proof-of-principle that cardiac electrical excitation wave patterns can be reconstructed from mechanical deformation using machine learning. Our mechano-electrical reconstruction approach can easily recover even complex two- and three-dimensional excitation wave patterns in simulated cardiac muscle tissue with reconstruction accuracies close to or better than $95\%$. At the same time, the approach is computationally efficient and easy to implement, as it employs a generic convolutional autoencoder neural network. The results suggest that machine or deep learning techniques could be used in combination with high-speed imaging, such as ultrasound, to visualize electrophysiological wave phenomena in the heart.

## SUPPLEMENTARY MATERIAL

See the supplementary material for noisy mechanical deformation data that are analyzed by a neural network autoencoder (see also movie 10). The inverse mechano-electrical reconstruction achieves high reconstruction accuracies even with noisy data (here shown for noise with a magnitude of 0.3); see also Fig. 12.

The movies display the following:

- the reconstruction of two-dimensional chaotic electrical excitation wave dynamics from mechanical deformation in an elastic excitable medium with muscle fiber anisotropy;
- the reconstruction of three-dimensional electrical excitation wave dynamics from deformation in a deforming bulk tissue with muscle fiber anisotropy;
- electrical scroll wave chaos and the deformation induced by the scroll wave chaos, shown separately for the top layer of the bulk;
- the three-dimensional reconstruction of the dataset used in Ref. 25;
- reconstruction with network model 2Dt-A3s, which uses a temporal sequence of three mechanical frames to reconstruct one electrical frame, showing an estimate of an electrical excitation wave pattern;
- the improvement of the reconstruction with increasing training duration;
- the reconstruction of two-dimensional focal excitation wave dynamics from mechanical deformation in an elastic excitable medium with muscle fiber anisotropy;
- the reconstruction of three-dimensional electrical excitation wave dynamics from deformation at $4\times$ lower spatial resolution;
- the reconstruction of two-dimensional electrical excitation wave dynamics from deformation at lower resolutions; and
- noisy mechanical displacement data (with a noise magnitude of 0.3).

## ACKNOWLEDGMENTS

This research was funded by the German Center for Cardiovascular Research (DZHK e.V.), Partner Site Göttingen (to J.C.). We would like to thank S. Herzog and U. Parlitz for fruitful discussions on neural networks and S. Luther and G. Hasenfuß for continuous support.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## REFERENCES

*Annals of Biomedical Engineering* (Springer, 2010), pp. 3112–3123.

*Advances in Neural Information Processing Systems* (Curran Associates, Inc., 2012), Vol. 25, pp. 1097–1105.

*Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015*, Lecture Notes in Computer Science (Springer International Publishing, 2015), pp. 234–241.

*Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS'16)* (Curran Associates, Inc., 2016), pp. 2810–2818.

*Statistical Atlases and Computational Models of the Heart. Multi-Sequence CMR Segmentation, CRT-EPiggy and LV Full Quantification Challenges* (Springer International Publishing, 2020), pp. 186–194.

*2007 IEEE Conference on Computer Vision and Pattern Recognition* (IEEE, 2007).

*Proceedings of the 32nd International Conference on Machine Learning*, Proceedings of Machine Learning Research Vol. 37, edited by F. Bach and D. Blei (PMLR, Lille, France, 2015), pp. 448–456.

*Computer Animation and Simulation 2000*, Eurographics (Springer, Vienna, 2000), pp. 113–123.