The extraordinary success of convolutional neural networks (CNNs) in various computer vision tasks has revitalized the field of artificial intelligence. The out-sized expectations created by this extraordinary success have, however, been tempered by a recognition of CNNs' fragility. Importantly, the magnitude of the problem is unclear due to a lack of rigorous benchmark datasets. Here, we propose a solution to the benchmarking problem that reveals the extent of the vulnerabilities of CNNs and of the methods used to provide interpretability to their predictions. We employ cellular automata (CA) to generate images with rigorously controllable characteristics. CA allow for the definition of both extraordinarily simple and highly complex discrete functions and allow for the generation of boundless datasets of images without repeats. In this work, we systematically investigate the fragility and interpretability of three popular CNN architectures using CA-generated datasets. We find a sharp transition from a learnable phase to an unlearnable phase as the latent space entropy of the discrete CA functions increases. Furthermore, we demonstrate that shortcut learning is an inherent trait of CNNs. Given a dataset with an easy-to-learn and strongly predictive pattern, a CNN will consistently learn the shortcut even if the pattern occupies only a small fraction of the image. Finally, we show that widely used attribution methods aiming to add interpretability to CNN outputs are strongly CNN-architecture specific and vary widely in their ability to identify input regions of high importance to the model. Our results provide significant insight into the limitations of both CNNs and the approaches developed to add interpretability to their predictions and raise concerns about the types of tasks that should be entrusted to them.
INTRODUCTION
Following the excitement about expert systems in the 1980s, the study of artificial intelligence (AI) entered another "winter" period.1,2 The development of, first, deep neural networks and, then, convolutional neural networks (CNNs) and their success at computer vision tasks created a wave of interest in AI that has not yet subsided. CNNs have been reported to achieve super-human performance in object classification and face recognition.3,4 Because of such successes, they have been applied to a variety of domains that strongly influence lives and society, including detecting diseases from medical images,5 self-driving cars,6 job application screening,7 urban surveillance systems,8 and scientific discovery.9–11
Some of the excitement and trust in CNNs may be due to claims of a strong analogy between how they work and how information is processed in the human brain. It is hypothesized, if not outright assumed, that CNNs develop representations of objects through their training.12,13 Despite the successes and excitement, there is still a wide gap between the practical success and the theoretical understanding of CNNs.
Many studies have found that because CNNs are over-parametrized, they overfit the data and are susceptible to identifying “shortcuts,” i.e., powerful predictors that are easy to learn but unrelated to the task at hand.14–16 Consequently, those models can become very sensitive to small changes that are unremarkable to humans.15,17 Concerns about the fragility of CNNs have led to the development of adversarial training strategies.18,19 However, many studies have also demonstrated that adversarial training is not a panacea.20–23
The concern about the black-box nature of CNN models and their fragility has spurred the development of computational approaches to the interpretability of a model’s predictions.24–29 While there have been attempts at benchmarking attribution approaches (see Refs. 30–32 and the references therein), those attempts are limited by the difficulty in creating flexible, controllable, and rigorous datasets that are sufficiently large and require no human labeling.
Analyzing CNNs from a theoretical perspective is also not without challenges due to the astronomical number of degrees of freedom of these systems. Some studies have suggested that the extraordinary performance of over-parametrized CNNs can be explained using renormalization group methods33,34 or information theory perspectives.35,36
Attempts at addressing these heterogeneous challenges have also called upon experimental investigations. Such studies suggest that CNNs—unlike humans, who attend to the global shape of an object—preferentially learn "texture" instead of object shape.37–39 However, progress is hindered because the available datasets are, to a great extent, convenience samples with uncharacterizable properties that make controlled experiments impossible.40,41
We present here a solution to the lack of flexible, controlled, rigorously characterizable datasets. Specifically, we use cellular automata (CA)42,43 to generate images with rigorously controllable characteristics. This choice is motivated by the nature of images, where the interactions between pixels (rather than the values of individual pixels) determine the macroscopic pattern. The great success of CNNs has been attributed to their capability to capture the relations among multiple local pixels through convolutional operations.44–46 CA offer a minimal setting in which to test this hypothesis. In CA, macroscopic patterns arise from the interactions of a predefined number of pixels and a set of evolutionary rules. Moreover, CA allow for the definition of both extraordinarily simple and extraordinarily complex discrete functions and for the generation of limitless datasets of images without repeats. Cellular automata have long been an object of study in AI. Wolfram investigated the Turing universality of cellular automata,42 and Crutchfield and Mitchell used them to study genetic algorithms.43,47 More recently, Gilpin studied the equivalence between CNNs and two-dimensional CA,48 and Mordvintsev et al. developed neural cellular automata for pattern regeneration.49
RESULTS
Cellular automata
Cellular automata are discrete functions of a finite number of discrete inputs—"rules"—that update the state of a system synchronously (see "Methods"). The most well-studied class of CA are denoted elementary cellular automata (ECA). They are applied to one-dimensional systems where each component's state is a binary variable and the state of each component is updated according to its state and those of its two nearest neighbors [Fig. 1(a)]. Despite their apparent simplicity, CA are capable of generating rich phenomena, including stationary, periodic, and chaotic patterns [see Figs. 1(b) and S1 for examples of patterns].42 Importantly, if one starts from a random state, a single ECA rule can generate an essentially limitless set of patterns—a system with L = 100 binary components can take 2^100 ≈ 1.3 × 10^30 distinct states.
Similarly, the number of distinct CA can be vast. An update rule that returns one of ns possible values and takes as inputs its own state and those of its k nearest neighbors allows for the definition of ns^(ns^(k+1)) distinct rules. For example, k = 2 and ns = 3 yields 3^27 ≈ 7.6 × 10^12 distinct rules. Finally, and perhaps most importantly, CA build correlations within the local neighborhood of a pixel, thus generating objects with a shape. These characteristics make CA ideal generative processes for datasets of images used to probe how CNNs learn about the organization of the neighborhood of a given pixel.
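To make these counts concrete, the following minimal Python sketch (our own illustration; the helper names are assumptions) computes the number of distinct rules and the latent space entropy defined in "Methods":

```python
import math

def n_rules(ns: int, k: int) -> int:
    # An update reads k + 1 cells (a cell's own state plus its k nearest
    # neighbors), so there are ns**(k + 1) input configurations, each of
    # which can map to any of ns output values.
    return ns ** (ns ** (k + 1))

def latent_space_entropy(ns: int, k: int) -> float:
    # S = (k + 1) ln(ns): the logarithm of the number of input configurations.
    return (k + 1) * math.log(ns)

print(n_rules(2, 2))                        # 256: the elementary CA
print(f"{n_rules(3, 2):.2e}")               # ~7.63e+12 rules for ns = 3, k = 2
print(f"{latent_space_entropy(3, 2):.2f}")  # 3.30
```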
Trainability of CNNs on CA-generated images
We first investigate the extent to which CNNs are able to learn CA rules [Fig. 1(a)]. To align with the classification nature of most vision tasks, we formulate our experiments as a classification problem. For each experiment, we first select the values of ns and k and then select a rule R(ns, k) at random from the set of possible rules for those parameters and generate images by iterating R. For most experiments, we consider a state vector of length L = 224 and iterate the rule 223 times for each of 4000 + 1000 + 1000 randomly selected initial configurations for training, validation, and testing, respectively. We generate negative instances by shuffling the values in each of the CA-generated images (see "Methods"). Thus, the key difference between a positive (CA-generated) image and a negative (shuffled) instance will be the correlations within the neighborhood of a pixel.
We hypothesized that CNNs would readily learn to identify CA rules because of the similarity between the convolutional operations and the definition of local rules. Indeed, Gilpin showed that arbitrary CA may be represented analytically by convolutional filters.48 We test the effectiveness of three state-of-the-art CNN architectures—VGG19,46 ResNet18,50 and GoogleNet51—in learning CA with different characteristics [Fig. 1(c)]. We find little qualitative difference in performance whether we fine-tune the fully connected layers of the three CNNs or retrain both the feature layers and the fully connected layers [Fig. 1(d)]. Thus, we conclude that these CNN architectures are highly effective at learning simple CA rules.
Learning limits of CNNs
Next, we systematically study the ability of CNNs to learn arbitrarily complex CA rules. To this end, we randomly select five CA rules for values of ns in the range of 2–6 and values of k in the range of 2–20 [Fig. 2(a)]. For each rule Ri, we generate a dataset consisting of 4000 + 4000 images for training, 1000 + 1000 images for validation (to prevent over-fitting during training), and 1000 + 1000 test images never seen during training (for estimating performance). We detail the training process in the "Methods" section (also see Fig. S2 for learning curves).
While it is beyond the scope of this study to fully investigate the characteristics of this transition from a learnable to an unlearnable phase, we recognize its similarity to the transition reported between P and NP phases for some classes of problems.53–55 We also note that the dataset sizes used appear to be above the threshold needed for a CA rule to be learned if it is learnable (Fig. S5).
Significantly, latent space entropy may not be the only factor affecting the learnability of CA rules. To test this possibility, we systematically investigate the loss of predictability of VGG19 for every ECA rule. For each of the 256 ECA, we generate images of size (50, 50, 3). As before, each dataset comprises an equal number of cellular automaton images and negative images—8000 images for training, 2000 images for validation, and 2000 images for testing.
We find that fine-tuned VGG19 exhibits varying degrees of difficulty in learning the 256 ECA (Fig. S1). For four rules belonging to the class of ECA rules capable of producing chaotic patterns,56 we find less than perfect performance for a fine-tuned model, whereas a fully retrained model yields perfect performance in all cases (Fig. S3). This finding suggests that some learning situations may require full retraining of the CNN.
The simplest strongly predictive pattern is an attractor of the learning dynamics
It is well known that CNNs are prone to overfitting the training data and that they frequently learn "shortcuts" instead of task-related patterns (see the work by Wang16 for an example of this). However, the field's understanding of the factors determining shortcut learning remains primarily qualitative and ad hoc. Armed with the wide spectrum of difficulty afforded by our CA-generated datasets, we are now able to design controlled experiments that probe the characteristics of shortcut learning in CNNs.
Following on the work of Wang,16 we first investigate the impact of signal destruction on a CNN's ability to learn the relevant pattern. Generalizing the approach in the work of Wang,16 we generate datasets as before but shuffle pixel values outside of a central square occupying a fraction fs of the CA-generated image [Fig. 3(a)]. Our exploratory analysis confirms that for easy-to-learn patterns (CA rules drawn from small latent spaces), the CNN can learn the pattern even when it comprises a very small fraction, fs → 0, of the image. For higher-complexity CA rules, the learning performance decreases more gradually with decreasing fs, recapitulating the results in the work of Wang.16
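A minimal NumPy sketch of this perturbation (the function and argument names are ours, for illustration), operating on a single-channel L × L image:

```python
import numpy as np

def destroy_signal_outside_square(img: np.ndarray, fs: float,
                                  rng: np.random.Generator) -> np.ndarray:
    """Shuffle pixel values outside a central square covering a fraction
    fs of the image area, so the CA pattern survives only in the square."""
    L = img.shape[0]
    side = int(round(L * np.sqrt(fs)))  # side length of the preserved square
    lo = (L - side) // 2
    hi = lo + side
    out = img.copy()
    mask = np.ones((L, L), dtype=bool)
    mask[lo:hi, lo:hi] = False          # keep the central square intact
    vals = out[mask]
    rng.shuffle(vals)                   # permute all pixels outside the square
    out[mask] = vals
    return out
```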
Next, we address the more realistic case of competition between multiple patterns that—if learned correctly—would yield high predictability but which would have different intrinsic levels of learning difficulty. Specifically, we generate patterns independently using pairs of CA with staggered latent space entropies and then concatenate the images. To specify the CA, we draw rules for values of ns and k that yield a set of 18 latent space entropies (see Table S2). Then, if we denote by CAl (CAr) the CA used to generate the leftmost (rightmost) columns of the image, CAl uses rules with increasing values of S, whereas CAr uses rules with decreasing values of S. Again, images in the datasets are generated with sizes of (224, 224, 3), and each dataset comprises 8000 images for training, 2000 images for validation, and 2000 images for testing.
If CNNs preferentially learn “shortcuts” (i.e., the easiest-to-learn strongly predictive pattern), then their performance should track the performance on the CA with the lowest latent space entropy. As the data shown in Fig. 3(c) demonstrate, this is indeed what happens but with a caveat. GoogleNet’s performance on the entire image more closely tracks the expected performance for the CA with the lowest latent space entropy regardless of its location or of the fraction of the image it covers. However, when the CA have similar learning difficulties, GoogleNet displays a lower performance than what would be expected for the simplest CA to learn. We find similar behaviors for VGG19 and ResNet18 (Fig. S8) as well as for horizontally split images (Fig. S9). Thus, we conclude that the simplest strongly predictive patterns are unavoidable attractors of the training dynamics.
Benchmarking attribution methods
A criticism frequently leveled at CNNs is that they are uninterpretable "black boxes." In response, researchers have developed several approaches for estimating the importance of specific inputs to the CNN model. However, due to a lack of standard, rigorous, and controllable benchmarking datasets, we currently lack a deep qualitative and quantitative understanding of the strengths and weaknesses of each of those attribution methods. Fortunately, CA offer near limitless opportunities for altering inputs in a controlled manner so as to truly measure the performance of competing attribution approaches and gain generalizable insights.
To rigorously quantify the performance of an attribution approach, we start by considering the 256 easy-to-learn ECA. We then alter the images created using a given ECA rule in order to destroy the pattern imposed by the rule. Specifically, we divide each ECA-generated image into four quadrants. One quadrant we leave unaltered; a second (third) quadrant we perturb by shuffling the sections of the columns (rows) falling within the quadrant; and a fourth quadrant we perturb by shuffling all the pixel values within the quadrant.
Our expectation is that during training, the CNN will assign the greatest importance to the inputs in the unaltered quadrant and the least importance to the inputs in the shuffled quadrant because the latter stores no information, whereas the former stores full information about the pattern. Similarly, because the rules are applied synchronously to an entire row, we expect that the CNN will assign greater importance to the quadrant with shuffled rows than to the quadrant with shuffled columns. These considerations effectively create sanity tests for attribution methods.57
We generate, according to each ECA rule, datasets of (50, 50, 3) images—8000 images for training, 2000 images for validation, and 2000 images for testing—and consider two treatments. In the first, which we denote fixed quadrants, the positioning of the different types of perturbations is fixed. As shown in Fig. 4(a), the top left quadrant is where all the pixels are kept untouched, the top right is where rows are shuffled, the bottom left is where columns are shuffled, and the bottom right is where both rows and columns are shuffled.
Because in real-world conditions the patterns to be learned are not consistently within a fixed region of an image, we also consider a second treatment, which we denote stochastic quadrants. In this treatment, the positioning of the four types of perturbation is selected at random for each individual image [Fig. 4(c)].
We find that models still perform extraordinarily well for these datasets with test accuracies close to 100%. The question, thus, is how well different attribution methods work. Figure 4(a) shows two input images and the corresponding attribution maps obtained with guided backpropagation28 and deconvolution24 for VGG19 and GoogleNet (see the supplementary material for ResNet18, other attribution methods, and other ECA rules). It is visually apparent that, unlike guided backpropagation, deconvolution cannot uncover the fact that the retrained VGG19 model must be finding different levels of information in different quadrants.
Figure 4(b) shows the fractional importance of each region averaged over all the images in the datasets for fine-tuned and fully retrained VGG19 and GoogleNet models and for several attribution methods. Except for deconvolution, all the other attribution methods pass our sanity test for VGG19, but only two pass the sanity test for retrained GoogleNet and ResNet18 models (Figs. S10 and S11).
Next, we define a signal-to-noise ratio (S/N) using the fractional importances of the signal ("CA") and shuffled-both quadrants to quantitatively compare the performance of the different attribution methods. We find that several attribution methods show a strong performance (S/N ≫ 1) for VGG19, especially for the fully retrained models.
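A minimal sketch of how fractional importances and the S/N can be computed from an attribution map, assuming the fixed quadrants layout of Fig. 4(a) (the helper names are ours):

```python
import numpy as np

def fractional_importance(attr: np.ndarray) -> dict:
    """Return each quadrant's share of the total absolute attribution of a
    2D map, assuming the fixed layout: top left = CA, top right =
    shuffled-rows, bottom left = shuffled-columns, bottom right = both."""
    a = np.abs(attr)
    h, w = a.shape[0] // 2, a.shape[1] // 2
    quads = {"CA": a[:h, :w], "shuffled-rows": a[:h, w:],
             "shuffled-columns": a[h:, :w], "shuffled-both": a[h:, w:]}
    total = a.sum()
    return {name: q.sum() / total for name, q in quads.items()}

def signal_to_noise(frac: dict) -> float:
    # S/N: importance of the intact quadrant relative to the fully shuffled one.
    return frac["CA"] / frac["shuffled-both"]
```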
The outcomes are more sobering for the stochastic quadrants treatment. First, we uncovered that fine-tuned VGG19 (pre-trained on ImageNet) has an inbuilt preference for features in the top left corner (see Fig. S12). Second, signal-to-noise ratios decrease across the board (but not as much for fully retrained models). For example, S/N approaches 1 for deconvolution, showing that it fails our sanity test.
For the other attribution methods, we see that either the CNNs are no longer able—or "willing" (because of shortcut learning)—to glean the partial information in the quadrants with shuffled rows or columns, or the attribution methods are not sensitive enough to the differences in importance. In view of our results for shortcut learning (Fig. 3), we believe that learning from shuffled data is more challenging and thus less likely to occur, making the former explanation more likely.
We also extended this analysis to three local attribution methods—occlusion,24 LIME,58 and feature permutation.59 All three fail our sanity tests (Fig. S10). In addition, our analysis confirms that attribution methods are highly architecture-dependent. Successful attribution methods in VGG19 fail to attribute high scores to the regions where the signal is located when used on other CNN architectures for both treatments (Fig. S11).
DISCUSSION
The lack of flexible, controlled, and rigorous benchmarking datasets has hindered our ability to understand the limitations of CNNs in tasks such as image classification. For the most part, we do not know whether there are limits to the complexity of the patterns that a CNN can learn, whether shortcut learning occurs only for certain types of patterns, or whether the interpretability of CNN predictions can be accomplished with the current attribution methods. This study significantly advances our understanding of all of these critical matters.
By using CA, a class of discrete functions operating on a finite number of discrete inputs, we can generate nearly limitless numbers of images displaying nearly limitless patterns. Significantly, by controlling the number of possible outputs of the function (with the parameter ns) and the number of potential input combinations (with the parameters ns and k), we can tune both the size of the neighborhood that the CNN has to learn to consider and the number of possible distinct patterns. For example, for CA that have three distinct outputs and consider the states of the two nearest neighbors (S = 3.30), the CNN has to learn to recognize the relevant 4-pixel combinations. CNNs accomplish this quite easily. However, the CNNs considered in this study start failing to learn when the latent space entropy approaches 10. Making the situation even more acute, not all S ≤ 10 CA rules are likely to be equally learnable. It is unlikely that CNNs will be able to conquer all the problems that we pose to them in a task-appropriate manner.
This brings us to the problem of shortcut learning. Our results strongly suggest that CNN learning dynamics are attracted toward the simplest strongly predictive pattern, i.e., the one encompassing the smallest neighborhood. This also explains why previous studies found CNNs biased toward local texture (fewer pixels) rather than the global shape of objects, which involves many pixels.37,38 Without access to controlled experimentation or robust attribution methods, one will likely not be able to determine whether a CNN's learning used a shortcut or not. This implies that uninterpretable CNNs are extraordinarily dangerous when used for critical matters. Indeed, studies have demonstrated that while CNNs appear to be able to accurately recognize faces, they show high error rates for faces of individuals from marginalized groups,60 and that while CNNs appear able to select good applicants from standard resumes, they incur unacceptable levels of false negatives for members of under-represented groups.61
While the use of attribution methods may ameliorate concerns about what a CNN is learning (see the remarkable study by DeGrave et al.15), this functionality relies on demonstrating that the attribution method is truly highlighting the inputs that contribute the most to the CNN's predictions. While prior studies have suggested that attribution methods should be used exclusively with the CNN architectures for which they were developed, the extent to which some attribution methods may fail to appropriately identify important regions of an image has remained unclear. By perturbing CA-generated images, we are able to demonstrate that attribution methods optimized for a specific CNN architecture should not be used in conjunction with other CNN architectures and that attribution methods require retraining of the model—even a model with high performance—in order to better capture the features that are important to the model. For the attribution methods developed for VGG19, our results demonstrate beyond any doubt that local methods are utterly inadequate and that not all gradient methods are equally accurate.
Despite being task-specific, the insights gained from our study may extend to other areas of computer vision and beyond. They reveal the nature of model fragility, of learning limits in high entropy latent spaces, of the inevitability of shortcut learning, and of the limitations of current attribution methods that aim to make CNN predictions interpretable. Our study also demonstrates how synthetic datasets with rigorously quantifiable properties can advance our understanding of over-parameterized learning algorithms and open new opportunities for future research. Indeed, our approach to the generation of rigorously quantifiable datasets can be extended to other areas of computer vision, allowing greater insights into tasks such as segmentation and de-noising. Outside of computer vision, finite state automata have been used for modeling language production,62,63 suggesting that the approach illustrated here may be extendable to that domain as well.
The strength of these conclusions must be tempered by recognizing some of the potential limitations of our study. First, it is not clear that CA can capture all the heterogeneities present in real-world photos. Nonetheless, CA can display extraordinarily complex and heterogeneous patterns that are able to demonstrate practical limits to the degree of complexity that a CNN can learn. Second, many of our experiments rely on CNNs pre-trained on ImageNet. These pre-trained models were primarily exposed to color natural images that differ quite dramatically from the black and white CA-generated images. We note, however, that the similarity of the results obtained using fine-tuned vs retrained models increases the robustness of our findings and the confidence one can have in our conclusions about the limitations of CNN learning. Indeed, while fully retrained models display higher performances for a range of intermediate pattern complexities, retraining does not remove the sharp transition from a learnable to an unlearnable phase. Third, we do not investigate how these results would change when using adversarial training approaches, an important open question that requires further study. Finally, our study focuses on CNNs and, thus, does not explore other approaches to deep learning, such as vision transformers.64 Although alternative approaches are becoming increasingly popular and have demonstrated a strong performance in various computer vision tasks, much is still unknown about their ability to learn arbitrarily complex patterns, their interpretability, or their fragility. We believe that synthetic image generators, including CA, will also advance our understanding of the capabilities and limitations of those approaches.64
METHODS
CNN training
We use PyTorch (version 1.4.9)65 and PyTorch Lightning (version 1.4.9)66 implementations of the three CNNs. We train the models on a Tesla A100 GPU to minimize the cross entropy loss with a batch size of 256 and a learning rate of 10^−4.
For most of the results shown in the main text, we use VGG19, ResNet18, and GoogleNet models pre-trained on ImageNet and fine-tune them for each CA rule. Specifically, we fine-tune the fully connected layers with a learning rate equal to 0.0001 for 1000 epochs. During training, we monitor the validation loss and select the best model as the one with the lowest validation loss. We estimate the model performance on the test set, which is never seen by the model during training.
For increased robustness of our conclusions, we also retrain the three CNNs. We initialize weights with the values from the pre-trained models but then train both the feature layers and the fully connected layers with a learning rate equal to 0.0001 for 1000 epochs. During training, we again monitor the validation loss and select the best model as the one with the lowest validation loss, and we estimate the model performance on the held-out test set.
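The two regimes can be sketched as follows in PyTorch; the choice of the Adam optimizer and the final-layer replacement are illustrative assumptions (the text specifies only the loss, batch size, and learning rate):

```python
import torch
import torch.nn as nn
from torchvision import models

def build_model(retrain: bool = False) -> nn.Module:
    model = models.vgg19(pretrained=True)
    # Replace the final layer for the binary CA-vs-negative classification.
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)
    if not retrain:
        # Fine-tuning: freeze the convolutional feature layers and train
        # only the fully connected layers.
        for p in model.features.parameters():
            p.requires_grad = False
    return model

model = build_model(retrain=False)
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()
# The training loop (batch size 256, up to 1000 epochs) monitors the
# validation loss and keeps the checkpoint that minimizes it.
```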
Cellular automata
A transition rule of a cellular automaton with ns distinct states, Xs = {0, 1, …, ns − 1}, and k nearest neighbors is defined as the set of mappings from the distinct configurations of an array of length k + 1, in which each position takes a value in Xs, to values also in Xs. Each set of mappings corresponds to one rule. For a particular set of parameters (ns, k), there are ns^(k+1) configurations, and each can map to ns possible values; thus, there are ns^(ns^(k+1)) distinct sets of mappings ("rules"). Elementary cellular automata (ECA) are the simplest among all possible nontrivial cellular automata (ns = 2 and k = 2); they comprise 256 different rules.
Each rule R(ns, k) contains ns^(k+1) mappings. For example, take ECA rule 30. The set of mappings is R30(2, 2) = {111 ↦ 0, 110 ↦ 0, 101 ↦ 0, 100 ↦ 1, 011 ↦ 1, 010 ↦ 1, 001 ↦ 1, 000 ↦ 0}. The number 30 is the decimal representation of the sequence of new states, 00011110, read as a binary numeral.
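The decoding of a rule number into its mapping table amounts to a change of base; a minimal sketch (our own helper, following the convention that input configurations are listed from highest to lowest):

```python
from itertools import product

def rule_table(rule_number: int, ns: int = 2, k: int = 2) -> dict:
    """Decode a rule number into its mapping table: input configurations of
    length k + 1 are listed from highest (e.g., 111) to lowest (000), and
    the rule number is read as a base-ns numeral over their outputs."""
    configs = list(product(range(ns), repeat=k + 1))[::-1]
    digits = []
    x = rule_number
    for _ in range(ns ** (k + 1)):
        digits.append(x % ns)   # digits[0] is the output for 000
        x //= ns
    outputs = digits[::-1]      # most significant digit first, for 111
    return dict(zip(configs, outputs))

table = rule_table(30)
# {(1,1,1): 0, (1,1,0): 0, (1,0,1): 0, (1,0,0): 1,
#  (0,1,1): 1, (0,1,0): 1, (0,0,1): 1, (0,0,0): 0}
```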
We use the Python package CellPyLib (version 2.3.1)67 to generate the CA datasets studied. Each image is generated by iterating a specific rule L − 1 times for an array with length L and randomly assigned initial values from Xs. We then stack the same L × L CA-generated image three times to get an array with three channels (L, L, 3).
For instance, as shown in Fig. 1(a), the generation of the 10 × 10 array starts with an initial row of 10 random values. The subsequent row is generated by applying the transformations defined by R30(2, 2) under periodic boundary conditions. Each element in the second row is computed from specific mappings of neighboring pixels in the preceding row: for example, the first element "1" is derived from the pixels at positions 10, 1, and 2 in the first row (010), while the second "1" stems from the pixels at positions 1, 2, and 3 (100). This process iterates across the entire row, resulting in the second row. The procedure is repeated sequentially eight more times to generate a 10 × 10 array. Subsequently, this 10 × 10 array is stacked three times to form a three-dimensional image of dimensions (10, 10, 3).
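The iteration itself can be sketched in plain NumPy (in practice, we use CellPyLib; this standalone version assumes an even k, with k/2 neighbors on each side, and the rule_table helper above):

```python
import numpy as np

def evolve_ca(rule: dict, L: int, ns: int, k: int,
              rng: np.random.Generator) -> np.ndarray:
    """Iterate a CA rule L - 1 times from a random initial row under
    periodic boundary conditions and stack into an (L, L, 3) image."""
    rows = [rng.integers(0, ns, size=L)]
    r = k // 2                             # neighbors on each side
    for _ in range(L - 1):
        prev = rows[-1]
        new = np.empty(L, dtype=int)
        for i in range(L):
            neigh = tuple(prev[(i + j) % L] for j in range(-r, r + 1))
            new[i] = rule[neigh]           # look up the mapping
        rows.append(new)
    img = np.stack(rows)                   # (L, L) array of states
    return np.repeat(img[:, :, None], 3, axis=2)  # replicate to 3 channels

rng = np.random.default_rng(0)
img = evolve_ca(rule_table(30), L=10, ns=2, k=2, rng=rng)  # as in Fig. 1(a)
```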
Negative instance
To avoid introducing unintended differences that could act as “learning shortcuts,” we shuffled the values in each of the CA-generated images to ensure the same probability distribution of states in CA-generated images and negative instances. In this way, the key difference between a CA-generated image and a negative instance is the correlation between pixel values within a local neighborhood.
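A minimal sketch of this construction (the function name is ours):

```python
import numpy as np

def negative_instance(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly permute all pixel values of one channel of a CA-generated
    (L, L, 3) image: the distribution of states is preserved, but the local
    correlations imposed by the rule are destroyed."""
    flat = img[:, :, 0].flatten()          # the channels are identical copies
    rng.shuffle(flat)
    neg = flat.reshape(img.shape[:2])
    return np.repeat(neg[:, :, None], 3, axis=2)
```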
Latent space entropy
We quantify the complexity of a rule R(ns, k) by its latent space entropy, S = ln(ns^(k+1)) = (k + 1) ln ns, i.e., the logarithm of the number of distinct input configurations (mappings) that define the rule. For example, ECA (ns = 2 and k = 2) have S = 3 ln 2 ≈ 2.08, whereas ns = 3 and k = 2 yields S = 3 ln 3 ≈ 3.30.
Shortcut learning experiments
To investigate the impact of uninformative features on the learning of CNNs, we generate two sets of rules with different latent space entropies (S = 2.08: k = 2 and ns = 2, rules 8, 18, 4, 22, and 19; S = 7.62: k = 10 and ns = 2, rules 1 021 279, 997 448, 1 058 637, 1 010 286, and 1 049 629).
To study the effect of competing predictive patterns, we generated datasets consisting of images where a fraction fl of the columns was generated using a rule from CAl, while the remaining fraction 1 − fl was generated using a rule from CAr. We generate 18 competing predictive patterns with increasing (decreasing) latent space entropies—3.47, 4.83, 5.38, 5.49, 6.93, 7.69, 8.05, 9.01, 9.7, 9.89, 10.4, 11.27, 11.78, 12.08, 14.28, 15.25, 16.48, and 18.02 (reversed)—for CAl (CAr). We have listed the datasets in Table S2. To prevent shuffling information from the left and right sides together in negative controls, we generated CAl, CAr, and their corresponding negative controls independently and then concatenated them.
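A minimal sketch of the concatenation step (assuming full-size images img_l and img_r that were generated, or shuffled for negative controls, independently from the two rules):

```python
import numpy as np

def competing_image(img_l: np.ndarray, img_r: np.ndarray,
                    fl: float) -> np.ndarray:
    """Keep the leftmost fraction fl of the columns from the CA_l image
    and the remaining 1 - fl of the columns from the CA_r image."""
    L = img_l.shape[1]
    split = int(round(fl * L))
    return np.concatenate([img_l[:, :split], img_r[:, split:]], axis=1)
```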
Robust least squares
To get robust estimates of the transition threshold Sx and avoid the impact of large fluctuations near the transition region (especially for retrained VGG19), we use robust regression to penalize outliers with large residuals.68 Robust least squares curves in our results minimize the soft L1 loss, ρ(r_i) = 2[(1 + r_i^2)^(1/2) − 1], instead of the r_i^2 used in ordinary least squares, where {r_i} are the residuals.
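SciPy implements this loss directly; a minimal sketch of such a fit (the sigmoidal functional form, parameter names, and data points below are illustrative assumptions, not our measurements):

```python
import numpy as np
from scipy.optimize import least_squares

def sigmoid(S, acc_max, Sx, width):
    # Test accuracy vs latent space entropy: a smoothed step centered at Sx
    # that decays from acc_max to the chance level of 0.5.
    return 0.5 + (acc_max - 0.5) / (1.0 + np.exp((S - Sx) / width))

def residuals(theta, S, acc):
    return sigmoid(S, *theta) - acc

S = np.array([2.1, 3.3, 5.5, 7.6, 9.9, 12.1, 15.3, 18.0])     # illustrative
acc = np.array([1.0, 1.0, 0.98, 0.92, 0.61, 0.52, 0.5, 0.5])  # illustrative
fit = least_squares(residuals, x0=[1.0, 10.0, 1.0], args=(S, acc),
                    loss="soft_l1")  # rho(z) = 2(sqrt(1 + z) - 1), z = r^2
acc_max, Sx, width = fit.x           # Sx estimates the transition threshold
```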
Attribution methods
To create benchmark images for attribution methods, we divided each image into four quadrants. In one quadrant, we left the image unaltered, which we labeled as “CA”. In one negative control region, we shuffled the rows within the quadrant (“shuffled-rows”), while we shuffled the columns within the quadrant (“shuffled-columns”) in another. In the final region, we shuffled both the rows and columns within the quadrant (“shuffled-both”).
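A minimal NumPy sketch of this construction for a single-channel image with an even side length (the names are ours; in the stochastic quadrants treatment, the placement of the four blocks is additionally permuted per image):

```python
import numpy as np

def quadrant_benchmark(img: np.ndarray, rng: np.random.Generator,
                       stochastic: bool = False) -> np.ndarray:
    """Leave one quadrant intact ("CA") and perturb the others by shuffling
    rows, columns, or all pixel values within the quadrant."""
    h, w = img.shape[0] // 2, img.shape[1] // 2
    ca = img[:h, :w]                                # left unaltered
    rows = rng.permutation(img[:h, w:], axis=0)     # "shuffled-rows"
    cols = rng.permutation(img[h:, :w], axis=1)     # "shuffled-columns"
    both = rng.permutation(img[h:, w:].reshape(-1)).reshape(h, w)  # "shuffled-both"
    quads = [ca, rows, cols, both]
    if stochastic:                                  # random placement per image
        quads = [quads[i] for i in rng.permutation(4)]
    return np.block([[quads[0], quads[1]], [quads[2], quads[3]]])
```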
The attribution methods evaluated in this study are implemented in Captum (version 0.4.0).69 In both the fixed quadrants and stochastic quadrants treatments, we calculated the fractional importance for the ECA rules (out of 256) with a test accuracy higher than 90%. For each of these ECA rules, we compute attribution scores for each attribution approach for a set of 25 images—13 generated by the CA model and 12 negative controls—using batches of size 5. We then calculate the average over the high confidence [P(CA) ≥ 0.9] CA-generated images of the fractional importance of each quadrant as estimated by each attribution method.
For the hyperparameters of attribution methods, we use arrays of zeros as baselines for Integrated Gradients, Gradient SHAP, and occlusion. For Integrated Gradients specifically, we used the Gauss–Legendre algorithm with 200 time steps for the integration approximation. For Gradient SHAP, which requires random sampling to estimate gradients, we set the number of random samples to 5. Finally, for occlusion, the size of the patch for shading is set to 3 × 1.
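A minimal Captum sketch with these hyperparameters (the model, batch, and target class index are illustrative stand-ins, not our trained models):

```python
import torch
from torchvision import models
from captum.attr import IntegratedGradients, GradientShap, Occlusion

model = models.vgg19(pretrained=True).eval()  # stand-in for a trained model
inputs = torch.rand(5, 3, 50, 50)             # illustrative batch of size 5
baselines = torch.zeros_like(inputs)          # arrays of zeros as baselines
target = 1                                    # assumed index of the CA class

ig = IntegratedGradients(model)
attr_ig = ig.attribute(inputs, baselines=baselines, target=target,
                       n_steps=200, method="gausslegendre")

gs = GradientShap(model)
attr_gs = gs.attribute(inputs, baselines=baselines, target=target,
                       n_samples=5)

occ = Occlusion(model)
attr_occ = occ.attribute(inputs, baselines=baselines, target=target,
                         sliding_window_shapes=(3, 3, 1))  # 3x1 patch,
                                                           # spanning channels
```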
SUPPLEMENTARY MATERIAL
The supplementary material 1.pdf includes the supplementary experimental results mentioned in the manuscript. The supplementary material 2.zip includes additional examples of the attribution maps used for benchmarking attribution methods.
ACKNOWLEDGMENTS
This research was supported by the National Science Foundation Grant Nos. 1937123 and 2033604.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
W.L., C.Z., and L.A.N.A. conceived and designed the study. W.L. and F.A.O.S. performed the numerical simulations. W.L., C.Z., and L.A.N.A. performed the data analysis. W.L., C.Z., and L.A.N.A. created the figures. W.L., C.Z., F.A.O.S., and L.A.N.A. wrote, read, and approved the final version of the paper.
Weihua Lei: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Cleber Zanchettin: Conceptualization (supporting); Formal analysis (supporting); Investigation (supporting); Methodology (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Flávio A. O. Santos: Conceptualization (supporting); Formal analysis (supporting); Investigation (supporting); Methodology (supporting); Validation (supporting); Visualization (supporting); Writing – original draft (supporting); Writing – review & editing (supporting). Luís A. Nunes Amaral: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Supervision (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The code for generating the synthetic datasets and for reproducing the results in this article is available in the repository https://github.com/amarallab/benchmarkCNNs.