Quantum computing and materials science: A practical guide to applying quantum annealing to the configurational analysis of materials

Using quantum computers for computational chemistry and materials science will enable us to tackle problems that are intractable on classical computers. In this paper, we show how the relative energy of defective graphene structures can be calculated by using a quantum annealer. This simple system is used to guide the reader through the steps needed to translate a chemical structure (a set of atoms) and energy model to a representation that can be implemented on quantum annealers (a set of qubits). We discuss in detail how different energy contributions can be included in the model and what their effect is on the final result. The code used to run the simulation on D-Wave quantum annealers is made available as a Jupyter Notebook. This Tutorial was designed to be a quick-start guide for the computational chemists interested in running their first quantum annealing simulations. The methodology outlined in this paper represents the foundation for simulating more complex systems, such as solid solutions and disordered systems.


I. INTRODUCTION
Quantum chemistry is considered to be one of the first fields that could benefit from the development of quantum hardware and algorithms. 1 The first applications of quantum computing to solve the electronic structure problem date back to 2005 2 and the field has since then received significant attention. [3][4][5][6][7][8][9][10][11][12][13] A detailed overview of the state-of-the-art methods developed in this field can be found in the following two review papers (and references therein): McArdle et al. 3 focus on using digital quantum computers to solve the electronic structure problem ab initio, while Cao et al. 4 is aimed at the expert computational chemist and highlights the advantages quantum computing brings to the field.
At this early technological stage, several strategies to employ quantum effects in computing are being explored. 14 The two most well-developed at present are quantum gate-based 15 and quantum annealing. 16 The former can be thought of as the quantum equivalent of classical computers, where the classical bits are replaced by quantum bits (qubits) and classical gates are replaced by quantum gates, which operate in discrete time. The latter relies on the quantum adiabatic theorem and requires the problem of interest to be mapped to a physical set of qubits, whose states are allowed to evolve by slowly changing the Hamiltonian parameters of the quantum annealer, at all times remaining in the instantaneous ground state. However, current quantum annealers experience coupling to their thermal environment beyond the 10-100 ns timescale. 17 Quantum annealers, therefore, behave in a way that is intermediate between the idealized quantum adiabatic theorem and inevitable experimental limitations.
From a chemical point of view, quantum gate-based algorithms have been employed mainly to solve the Fermion Hamiltonian that describes the motion of electrons in molecules and materials, i.e., the Shrödinger equation. Typically, the terms that account for electron correlation make this equation intractable to solve classically for all but the simplest systems. Thus, post-Hartree-Fock (post-HF) approaches have been developed such as configurational interaction (CI) and ideally applied in the full-CI limit. 18 Several algorithms have proved to be promising in this field. However, results are still limited due to the size and stability of available quantum hardwares. [3][4][5]19 Quantum annealing in its current hardware implementation solves problems that can be formulated as an Ising model. Unfortunately, the electronic Hamiltonian cannot be mapped to the Ising form without an exponential growth of the number of variables required to solve the problem, as discussed from a classical point of view in Xia et al. 20 This introduces an intrinsic limitation in using quantum annealers for solving this type of problem. Nevertheless, several algorithms to solve the electronic structure problem through quantum annealing have been explored for small systems. [8][9][10][21][22][23] Quantum annealing is suited to solving optimization and global minimum search problems. 24 Therefore, it finds applications in chemistry, where several configurations of the same structure are energetically viable. For example, in the analysis of the conformation of transition states, 25 configurations of dense polymers mixtures, 26 protein design, 27,28 screening of chemicals to discover new materials 29 and materials design 30 such as configurational analysis of doped materials. 31 Here, we will focus on this last application using vacancies in graphene as a test model to explore how to map a chemical system to an Ising Hamiltonian and to the quadratic unconstrained binary optimisation (QUBO) model. These models are the respective natural language for QA and chemistry and they both belong to the binary quadratic model (BQM) class of problems. The goal of this Tutorial is to present an easy-to-follow example of how to apply quantum annealing to solve a chemical problem, which can be formulated as follows: "given a periodic lattice of atoms, represented by a unit cell, atom coordinates, and periodic boundary conditions (PBCs), what is the lowest energy configuration (or configurations) when N vac sites are replaced by vacancies." This problem was studied by Carnevali et al. 31 using a discrete quadratic model, relying on two qubits per site, to represent the graphene structure. In this paper, a binary quadratic model, relying on a one-qubit-per-site implementation is used. Furthermore, the energy model introduced in by Carnevali et al. 31 is expanded in Sec. II G to better capture the chemistry of a more complex system.
In this paper, we are interested in providing a description of the procedure to build the energy model. The model system was purposefully chosen to be a simple test case. The energy model we discuss in Sec. II E describes the effect of the vacancy on the energy in a very simplified way. Nevertheless, by tuning the interaction between species (in this case carbon atoms and vacancies), the model can be extended, for example, to the simulation of solid solutions, spin configurations, and stoichiometric nanoparticles.
Despite the simplicity of the problem, it can be seen as the test model for the combinatorial optimization class of problems, which are often found in chemistry and materials science. The configuration space to explore is proportional to the binomial coefficient N sites N vac where N sites is the number of sites in the unit cell and N vac is the number of vacancies. For small values of N sites or N vac , the problem can be treated exactly by completing an exhaustive search, i.e., by calculating the energy of each possible outcome. However, as the size of the cell increases, the problem becomes quickly intractable (for example, for N sites ¼ 50 and N vac ¼ 10, there are more than 1 billion possible configurations). Various heuristic classical algorithms are available to tackle this type of problem, for example, Monte-Carlo methods, 32,33 random walks, 34 tree search optimization, 35 tabu algorithm, 36 and simulated annealing (SA). [37][38][39][40][41] However, these approaches suffer from limitations, such as getting stuck in local minima, sampling the same location more than once, and completely missing the best configurations because of the randomness of the sampling. A method to overcome these limitations is, therefore, required to analyze large configurational spaces encountered in materials science. Simulated annealing borrows ideas from the thermal annealing of metals. A fictional temperature, which decreases as the annealing evolves, is introduced as a tunable parameter. By reducing the temperature, the size of the accessible configurational space decreases during annealing. The process typically stops when either a minimum threshold on the energy or the maximum number of iterations, defined by the user, has been reached. 42 The progress of the SA can be determined by using sophisticated methods such as holding points and checking to see if the holding point is within a local basin. 43 Quantum annealing (QA) is based on the same principles as SA but uses an auxiliary quantum mechanical system to find the minimum energy of the system. Unlike the SA algorithm that uses classical search techniques, QA exploits two quantum mechanical phenomena: superposition and tunneling. 24 The former means that at the beginning of the anneal, the qubits represent all possible final states, corresponding to all combinations of 1 and 0 simultaneously (see Appendix A). The latter means that the system can move from one configuration to another by tunneling through the energy barrier. Therefore, it is possible for QA to escape deep local minima that trap other optimization techniques.
The hardware for the simulation reported in this paper is made commercially available by D-Wave 44 via their quantum cloud service Leap TM . The tools required to build the models and interact with Leap TM are distributed via the open-source project Ocean, hosted on the D-Wave GitHub page. 45 The Advantage_system5.2, which implements a Pegasus Quantum Processing Unit (QPU) Architecture, 46 was used for the results discussed in Sec. II.
The structure of this paper is chosen to guide the reader through (a) the mapping of the problem to the D-Wave hardware and (b) the building of the energy model. The methodology discussed here, together with the code made available at Ref. 47, gives the necessary resources to computational chemists and materials scientists interested in incorporating quantum annealing into their workflow. In Sec. II A, the analogies between simulations on classical and quantum computers are drawn. The Ising and QUBO models are introduced in Sec. II B. In Sec. II C, we discuss the workflow to map the vacancies in the graphene problem to the QA hardware and run the annealing. Section II D explores whether quantum annealing is needed for this type of simulation and what other methods could be used instead. In Sec. II E, the energy model is built first in terms of objectives and then, in Secs. II F and II G, constraints are added. The concept of qubit and its Bloch sphere representations are discussed in Appendix A. The mapping of the problem to the QPU hardware is presented in Appendix B. The idea of a graph representation of the chemical structure is introduced in Appendix C. The testing and tuning of the model parameters can be found in Appendix D. Considerations on the configuration multiplicity are discussed in Appendix E. Finally, the QA-simulated energy is compared to density functional theory (DFT) energies obtained by using the CRYSTAL 48 code and discussed in Appendix F.

II. THE MODEL AND RESULTS
A. Analogies with the "classical" quantum chemical approach The maturity of the technology used in classical computers means that the mapping of the materials chemistry problem to the hardware level is, generally, taken care of by the design of the highlevel programming language used to write the materials and molecular modeling software. On the other hand, being at the dawn of the application of quantum computing to chemistry, we are required to invest some time thinking about how to map the chemical system to the quantum hardware. In this section, we outline the differences between the classical and quantum computational materials chemistry workflows.
Computational chemists and materials scientists are familiar with translating a chemical system into a system of functions that can be implemented on a classical computer. A simplified typical workflow is summarized on the left side of Fig. 1. First, the system is defined, from a chemical point of view, in terms of the number and types of atoms, and their possible positions. Periodic boundary conditions (PBCs) are applied to the unit cell to model extended periodic systems. This step represents the first abstraction. The system then needs to be defined in terms of mathematical functions to be represented on a computer. Although there are, in theory, no constraints on the types of functions that can be used, the properties of some of them are better suited at describing electron distributions, e.g., Slater Type Orbitals (STOs) and Gaussian functions for molecules, and plane waves for crystals. These functions are parameterized in order to describe the position of the atoms and the corresponding electron distribution in the systems under study.
The next step involves selecting the electronic Hamiltonian. These Hamiltonians have been developed and implemented on classical computers for several decades. [48][49][50][51] They each tend to perform better for a specific class of structures or properties, but they have, in general, broad applicability. Therefore, typically, the user will select an out-of-the-box method (HF, 52 LDA, 49 B3LYP, 53,54 etc.) implemented in the code they are using and will not devote time to developing the energy model for each simulation. The outcome of the simulation will be the ground state energy of the system, the singleparticle energy levels and other properties, such as the optimized atomic geometry, vibrations or optical spectra, etc. Now we examine the quantum computational materials chemistry workflow. In the first step, analogous to the classical workflow, the chemical system is defined in terms of atomic numbers, atomic coordinates, and PBCs. Then, the structure needs to be mapped to the quantum hardware. Some recent methods are able to map a classical basis set (e.g., STO-3G) directly to hardware. As discussed in Sec. I, this approach requires a large number of qubits. 55 In this work, we will use a different approach: each site of the structure is mapped to a single logical qubit. The problem can, therefore, be encoded on a binary vector (whose length depends on both the system being investigated and the energy model, as discussed below). A simple mapping function (and reverse mapping function) is developed in Sec. II E to perform this task for our chosen example system. For a problem to be solved on a quantum annealer, it needs to be expressed in a binary quadratic model (BQM) form. This model is discussed in Sec. II B.

B. The ising and QUBO models
As discussed in Sec. II A, we need to map our problem to a binary quadratic problem to be able to solve it on a quantum annealer. Among the BQM class of problems, there are the Ising Hamiltonians (1) and the quadratic unconstrained binary optimisation (QUBO) method The linear term in the right-hand side of Eq. (2) originates from the fact that the x i variables are binary and, since 0 2 ¼ 0 and 1 2 ¼ 1, x 2 i can be replaced by x i . During the quantum annealing, the state of the qubit is represented by a vector that can have any value on a sphere, called the Bloch sphere (see Appendix A for details). At the end of the annealing, the projection of the vector on the z axis of the sphere is measured for all qubits used to map the problem and returned as the result of the calculation. In the Ising model, the variables s i can be either À1 or +1. The term h i is the bias 56 applied to the qubit i and J i,j the coupling between qubits i and j. In the QUBO model, the variables can be either 0 or 1. A problem formulated as an Ising model can easily be converted into a QUBO problem and vice versa by using s i ¼ (2x i À 1). In this work, the QUBO formulation of the problem was chosen because it allows for a matrix representation of the Hamiltonian that is more familiar to the computational chemist.
The challenge is to formulate the problem we are solving as a minimization function, where the Q matrix, also called the objective function, is designed in such a manner that the minimum of Eq. (3) corresponds to the lowest energy of the system of interest. Often, problems of practical interest are subject to constraints. However, as the QUBO acronym suggests, in the formulation of the model we are bound to unconstrained optimizations. Therefore, the constraint will be included in the model as a penalty function. This function is designed to increase the energy of the solutions that do not respect the constraint. The development of both objective functions and constraints is the main topic of Secs. II E and II F.

C. The vacancies in graphene workflow
For the calculation of the relative energy of vacancies in graphene, the quantum workflow can be summarized in the following steps: † step 1: build a graphene supercell large enough to contain the concentration of vacancies that we are interested in simulating.
In this paper, we consider all possible structures using a 3 Â 3 supercell of the two-atom primitive cell of graphene with up to three of the 18 carbon atoms removed to form the vacancies. This structure is shown in Fig. 2, where the labeling of the sites is defined, and can be found in the Jupyter Notebook available at Ref. 47. The pymatgen 57 library was used in this step because it allows for a python representation of the system that can be used with the Ocean tools; 45 † step 2: each site is mapped to a single qubit. This mapping is done via a vector x, with length N sites (in this case 18). The element x i at position i in the x vector will be 1 if the site i in the structure displayed in Fig. 2 is occupied by a C atom and 0 otherwise (vacancy); † step 3: build the BQM representation of the energy model. The details of the model used in this article are discussed in Sec. II B.
If a specific number of vacancies is chosen, then some additional constraints can be included in the model (this is discussed in detail in Sec. II F); † step 4: run the quantum annealer. For the problem to be solved on the quantum annealer, each variable needs to be mapped to the hardware. Because of the limited connectivity of currently available hardware, more than one physical qubit might be needed to encode one variable (for a detailed description of this process, see Appendix B). This mapping is called minor embedding and for the results reported in Secs. II E-II G, the default mapping performed by the D-Wave API was selected. The user is also required to define how many times the quantum annealing will be performed. In this paper, the default value of one thousand times was used. The output of this step will be one x vector for each time the quantum annealing was run. Typically, due to quantum fluctuations and external sources of noise, a distribution of energy and configurations will be returned. If the problem is properly formulated, the lowest energy configuration should correspond to the global minimum of the function. Some of the configurations might be degenerate in energy and, in the specific case discussed in this paper, some configurations might be symmetry equivalent. This will require some post-processing in order to group symmetry-equivalent structures. Finally, by reversing the mapping function, we can go from the configuration of the x vector to a chemical structure containing C atoms and vacancies. Some of the quantum annealing parameters, such as annealing time and chain strength, need to be optimized to improve the quality of the results. This parameter optimization is discussed in Appendix D.
D. Is quantum annealing needed for this problem?
For the graphene structure, we are interested in studying, for relatively small unit cells, an exhaustive search can be performed. In such an approach, the energy of all configurations containing a fixed number (N vac ) of vacancies is calculated. This can be achieved by building all the permutations of the x vector containing N vac zeros and calculating their energy using the QUBO model. In an even larger exercise, the whole vacancy space could be explored. All the x vectors for all configurations with the number of vacancies going from 0 to N sites are built and their energy is calculated. It is unlikely that structures with a large number of vacancies will be of interest since they correspond to non-chemically stable structures. However, if instead of studying vacancies, we were interested in solid solutions, then we might be interested in simulating high concentrations of both species. 58,59 The total number of possible permutations is 2 Nsites , while for a fixed number of vacancies N vac , the number of configurations can be calculated by using the binomial factor N sites The number of configurations to evaluate for a reasonablesized cell and number of vacancies quickly becomes too large to be explored via an exhaustive search. For example, more than 10 17 configurations exist for a 60-atom cell with a 40% level of substitution. For highly symmetrical structures, symmetry reduction can be used to drastically reduce the number of configurations. 60 In the following, we focus on a generic approach that does not require the system to belong to a particular symmetry group.
This kind of large configuration space can be explored by using classical global optimization heuristics, such as Monte-Carlo sampling, 32,33 and simulated annealing. [38][39][40][41] Currently, these heuristic approaches outperform QA. This is a hardware limitation and, as QPUs containing a larger number of more connected qubits are being developed, QA is expected to deliver much-improved performance than classical heuristics. This is because, as discussed in Sec. I, heuristic approaches suffer from several limitations. These limitations become more relevant for large configurational spaces characterized by relatively flat energy surfaces. Therefore, new optimization methods, such as QAs, are needed for the classes of problems encountered in materials science.

Number of bonds
In this section, we will focus on encoding the energy model for the graphene structure as a QUBO problem. The model we are using is based on the total number of bonds. Therefore, we define an arbitrary energy for the presence/absence of a bond caused by the introduction of a vacancy. The contributions to the energy of the system are based on the following assumptions: † the energy of the system only depends on the total number of bonds in the system; † if site i and site j are nearest neighbors and they are both occupied by a C atom, the energy of the system decreases by 1 arbitrary unit; † if site i and site j are nearest neighbors and one or both of them is vacant, the energy of the system is unchanged; † only nearest neighbors are included in the model. The interaction between sites i and site j does not contribute to the energy of the system if they are more than one bond length apart.
The model outlined above can be expressed as a QUBO matrix where A is the adjacency matrix of the structure (A and Q for the structure displayed in Fig. 2 is reported in the the Jupyter Notebook available at Ref. 47). Equation (5) defines Q as a sparse matrix with only 3N sites non-zero elements. This is an important aspect to keep in mind when developing the energy model as explained in detail in Appendix B.

Vacancy-free case
As a starting point, no constraint on the number of vacancies in the system will be imposed. Therefore, the lowest energy configuration is the one where the x vector satisfies the following expression: x argmin The result, in a Pandas dataframe 61 form, as returned by the Ocean API is reported in Fig. 3. The first 18 elements represent the  6) returns only one configuration: the one where no vacancies are present. This is an intuitive and reassuring result since the presence of vacancies increases the energy of the system. The average percentage of broken chains is also reported in Fig. 3. This is a measure of the quality of the annealing that depends on the way the problem is mapped to the hardware. This topic is discussed in detail in Appendix B.

F. Constraints and penalties
In the previous section, we observed how an unconstrained model (i.e., one that minimizes only objectives and is not subject to constraints) results in only one solution for the model problem. In this section, we want to model systems with a well-defined number of vacancies. Only vectors, x, that respect such a constraint will be considered acceptable solutions to the problem. If we are interested in an x vector that contains N vac vacancies, we can formulate the constraint as where the number of atoms is N C ¼ N sites À N vac . Equation (7) can be transformed into Eq. (8), To express the constraint as a penalty function, we look for a function of x whose minimum value is obtained when the constraint is respected. Results corresponding to configurations that do not respect the constraint may also be returned. However, if the model (objective and constraints) is well designed, these should be at noticeably higher energy and therefore both less likely to be returned by the annealer and easy to identify. By squaring Eq. (8), we obtain a function whose minimum value of zero occurs when the condition P Nsites The expression in Eq. (9b) can be simplified to that shown as (9c) because: † the term N 2 C is constant, so does not change the relative energy of solutions corresponding to different configurations; † the x 2 i term can be replaced by the linear equivalent, x i as in Eq. (2).
By combining the objective defined in Eq. (6) and the constraint defined in Eq. (9c), the following equation for the energy (U) of the system is Obtained: The elements of the resulting QUBO matrix can be expressed as The factor λ, also called the Lagrange parameter, determines the weight of the constraint with respect to the objective. The α/λ ratio defines how much the final energy should be affected by not respecting the constraint. There is no simple a priori formula for selecting the α and λ values and a tuning procedure is required (for details see Appendix D).
The QUBO model defined in Eq. (11) corresponds to an upper triangular matrix where the total number of off diagonal elements is For an 18-site model, this results in 135 off diagonal non-zero elements out of a total of 324 elements. Off diagonal terms introduce coupling between qubits. Due to the limited connectivity of the physical qubits in the D-Wave quantum annealer, the increased number of couplings will have an impact on the percentage of broken chains and the ratio of valid solutions (i.e., solutions that respect the constraint). This is discussed in Sec. II F 1 and in Appendix B.

Single-vacancy case
In this section, the QUBO model derived above will be used to simulate graphene structures containing one vacancy. The values of α ¼ 1 and λ ¼ 2 for Eq. (10) were selected following the tuning procedure outlined in Appendix D. The results of the quantum annealing are reported in Fig. 4. Only the configurations that respect the constraint are included.
Comparing the results reported in Figs. 3 and 4, we can draw some interesting conclusions regarding the effect that the constraint (and the increased number of couplings) has on the results. First, 18 different configurations, corresponding to the vacancy in each of the different sites are obtained. All these configurations result in the same energy. An average value of 1.06% broken chains was obtained in the results reported in Fig. 4 compared to 0.0% found in Fig. 3. The reason for this increase in broken chains is explained in detail in Appendix B. By summing the multiplicity of the different configurations, we find that only 73.7% of the times the annealing was run a configuration containing one vacancy was returned. These are the configurations that respect the constraint imposed in Eq. (7) and are defined as feasible. The vectors, x, representing the other configurations (not reported in the table) would contain either zero or more than one zeros and are discarded in the post-processing.
When only one vacancy is introduced, we know that all the 18 configurations will be symmetry equivalent and, therefore, the configurations reported in Fig. 4 can be reduced by symmetry to the structure reported in Fig. 5. In this straightforward example, this can be done manually based on the knowledge we have about the symmetry of the system. For more complex configurations, we used the spglib library 62 for symmetry analysis of periodic structures, as implemented in pymatgen. 57

Two and three vacancies
Graphene structures including two and three vacancies were studied using the same method described above for the one-vacancy structure. The QUBO matrix for this model is the same as the one The presence of more than one vacancy introduces an interesting effect on the energy distribution. Unlike the one-vacancy structure, where all the feasible configurations resulted in the same energy of À24.0 arbitrary units, in the two-vacancy structures, two energies and five symmetry non-equivalent structures are returned. These are summarized in Fig. 6 and depicted in Fig. 7. Structure a is the lowest energy structure. In this structure, the two vacancies are found in neighboring sites. This can be explained by using the model described in Sec. II E 1. When the two vacancies are next to each other, they lead to a total of five broken bonds. On the other hand, the vacancies in structures b-e are found in non-neighboring sites, which leads to a total of six broken bonds in these structures.
All these structures have the same energy because the energy model used only accounts for nearest-neighbor interactions. Therefore, whether the vacancies are next-nearest neighbors (as in structure b) or three bond lengths apart (as in structures d and e), the energy is unchanged according to the energy model used. Interestingly, structure b contains some single-coordinated atoms, which, from a chemical point of view, would be expected to destabilize the system and increase its energy. In Sec. II G, we address this aspect by adding a new objective to the energy model. When the QUBO model is constrained to return results containing three vacancies, 14 symmetry non-equivalent configurations and three energy values are returned. These are reported in Fig. 8. From the lowest to the highest energy, the corresponding structures have three vacancies in neighboring sites, two adjacent vacancies and one removed and three non-neighboring vacancies. One example per structure is depicted in Fig. 9.
In Fig. 10, three structures predicted by the model to have the same energy as structure f in Fig. 9 (À18:0 arbitrary units) are depicted. These configurations were selected specifically to highlight that some of the structures returned by the model will be

Journal of Applied Physics
TUTORIAL scitation.org/journal/jap highly unstable from a chemical point of view. Structure k contains singly bonded C atoms, structure m can be seen as broken into five-atom wide strips and structure n contains isolated carbon atoms. In this context, unstable refers to a configuration where one or more atoms, upon geometry relaxation, would move to occupy a neighboring vacancy site and result in a different configuration. For example, in structures n, the displacement of the isolated carbon atom results in a structure that is equivalent by symmetry to structure a (see Sec. F for more details). This shows the need for an improved energy model that is able to identify these types of structures. This topic is discussed in Sec. II G where the coordination of the atoms is included as an objective in the model.

G. Coordination
In this section, we refine the model described above by adding an extra term to the QUBO formulation in Eq. (11). The model used for the results reported in Secs. II E and II F focused only on the total number of broken bonds. However, as we have seen for the three-vacancy structures, it does not take into account the local character of the vacancies allowing for non-physical structures to be returned.
From a chemical point of view, we know that breaking a bond from a lower coordination atom will be more energy expensive than breaking a bond from the same atom in a higher coordination environment. In order to include this effect in the energy model, we need to start looking at whether the sites surrounding atom i are occupied by atoms or vacancies. This can be achieved by analyzing the nextnearest neighbors for each site. We, therefore, define the B matrix, which is analogous to the adjacency matrix A, but where the element B i,j = 1 if atoms i and j are next-nearest neighbors instead of nearest neighbors. The objective that we are interested in minimizing is where β is a scaling factor. The ratio between α and β will define how much the result is influenced by the coordination objective with respect to the number of broken bonds objective. By adding the coordination objective to Eq. (10) we obtain the following equation: and the corresponding QUBO matrix is In this work, we decided to define coordination as an objective rather than a constraint. This means that all the structures that respect the constraint on the number of vacancies defined in Eq. (11) will be considered acceptable. The coordination terms will only change the relative energy values to take into account the different coordination of the structure. Therefore, we expect that the FIG. 10. Structure of three unstable structures returned by the QUBO model summarized in Eq. (11) (threevacancy structure). These structures were specifically selected to have the same energy as structure f in Fig. 9. Gray circles and white circles represent C atoms and vacancies, respectively. The labeling is the same used in Fig. 8.
chemically unstable structures, such as the ones displayed in Fig. 10 will still be found, but they will have higher energy, for example, than structures b-e in Fig. 9.
Since in the QUBO formulation constraints are defined as penalty functions, the coordination objective can be transformed into a constraint by using the same procedure outlined in Sec. II F. Ensuring the ratio α=β is small enough and only considering valid the results whose energy is below a fixed threshold will result in structures not having, for example, single coordinated atoms. Whether to treat the coordination as an objective or constraint depends on the application and what is the goal of the simulation.
The results reported in Fig. 11 were obtained by using the coefficients α=1.0, β=0.05, and λ=2.0. The quantum annealing still returns 14 structures, but, unlike the results reported in Fig. 8, 7 different energy values are found. In Fig. 12 the 6 structures reported in Figs. 9 and 10 are depicted together with the energy calculated using Eq. (13). When including coordination as an objective, the model is able to identify structures based on the local environment of the vacancy and, therefore, allows for easy identification of the non-physical structures.

III. CONCLUSIONS
In this work, we drew the analogy between the classical and the quantum materials chemistry workflows. They both rely on the abstraction of the structure onto a set of functions and the calculation of the energy of the system by means of a Hamiltonian. The identification of the lowest energy configuration of vacancies in graphene structures was used as a test model. The mapping of the problem to a set of qubits was discussed and the energy model was built in terms of objectives to minimize and constraints to respect. This process was broken down into its different components in order to help the computational chemistry community to build energy Hamiltonians suitable for quantum annealing, i.e., running on a quantum computer. The model we discussed can be used to identify the lowest energy state of a system characterized by several states close in energy. These types of problems are often found in chemistry and materials science. In Appendix D we showed how the parameters used to build the model (α, λ and β values) and in particular, their ratios are crucial in obtaining significant results. These parameters may require some "chemical intuition" or other theoretical input to choose values in more complex problems where the ground-state solution is not immediately known or obvious. The results discussed in this paper aim at showing in a simple manner how quantum annealing is becoming an extra resource in the computational chemist's scientific toolbox. Furthermore, with the development of larger and more connected quantum hardware, these methods are believed to be able to tackle problems that cannot be solved by classical computers. The functions used to map the materials structures onto the D-Wave architecture and to perform the post-processing analysis developed for the results reported in this paper 47 represent the foundation for the development of a python library to be used by the materials chemistry community to directly access quantum annealing simulations much as the ArQTiC code 63 does for gate based quantum computing.

ACKNOWLEDGMENTS
This work was supported by the UKRI EPSRC Grant No. EP/ W00772X/2. We would like to acknowledge the USC-Lockheed Martin Quantum Computing Center for providing access to the D-Wave Advantage system. Via our membership of the UK's HEC Materials Chemistry Consortium, which is funded by EPSRC (EP/R029431 and EP/X035859), this work used the ARCHER2 UK National Supercomputing Service (http://www.archer2.ac.uk).

Conflict of Interest
The authors have no conflicts to disclose.

DATA AVAILABILITY
The data and software that support the findings of this study are openly available at https://github.com/cmc-ucl/quantum_ computing/tree/main/Quantum_computing_and_materials_science.

APPENDIX A: QUBITS AND THE BLOCH SPHERE
Classical bits of information are defined by a two-level classical state. On the other hand, quantum bits of information, called qubits, can be thought of as a two-state quantum-mechanical system. From a vectorial point of view, qubits are represented by a two-dimensional vector whose basis is and any other state can be obtained as a linear combination of j0i and j1i. The Bloch sphere 64,65 is used as a graphical representation of a two-state quantum-mechanical system. In the context of quantum computing, the Bloch sphere is used to represent the state of a qubit, by defining the north and south poles (aligned along the z axis) as corresponding to the standard basis vectors j0i and j1i.
During the calculation, the qubit can take any value on the surface of the Bloch sphere. Its projection on the z axis is proportional to the probability of measuring the state j0i or j1i. The x axis and y axis components are the phase of the qubit. This phase is responsible for some of the unique properties of qubits with respect to classical bits.
Throughout this paper, when talking about the value of the qubit measured at the end of the quantum annealing, we are referring to the projection of the qubit along the z axis, which is returned as a j0i or j1i vector. For the sake of simplicity, when reporting results, such as different configurations, the braket notation is dropped and the states j0i and j1i are reported as 0 and 1, respectively.

APPENDIX B: MINOR EMBEDDING AND MAPPING OF THE PROBLEM TO THE QPU
Once we have formulated a problem in terms of a binary quadratic model, this needs to be mapped to the quantum annealer hardware. For small problems, each variable of the model can be mapped to a single physical qubit. The term physical qubit refers to the physical quantum system used by the quantum processing unit (QPU) to perform the calculation.
Because of the limited connectivity of the current QPU topology in D-Wave annealers, for larger problems, it might not be possible to assign each variable to a single physical qubit. When this happens, two or more physical qubits can be used to represent a single variable. These qubits form a so-called chain. The process of assigning a variable to more than one physical qubit is referred to as minor-embedding and can either be performed manually by the user or heuristically by the D-Wave API through a set of functions called EmbeddingComposite, which rely on the minor-miner library. 66 Therefore, users do not need to worry about performing the minor embedding by hand for each problem. Long chains will impact the reliability of the results. One way to reduce the number of chains is to minimize the number of couplings in our Ising model or off diagonal elements in the QUBO formulation of the problem.
Since the physical qubits in the chain represent one variable, they should all return the same value (all spin up or all spin down). When this is not the case, the chain is said to be broken. Several strategies to recover the value of the variable represented by a broken chain are available. In this work, the majority vote technique was used. This technique assigns the value of the physical qubits that was predominant in the chain to all the qubits in the chain.
From a physical point of view, a chain is formed by applying a ferromagnetic coupling among qubits. The strength of such a magnetic field is called chain strength and it is one of the parameters that need to be tuned. If the chain strength is too strong compared to the fields used to implement the BQM, the problem part of the Hamiltonian becomes negligible and only trivial solutions are returned. On the other hand, if the chain strength is too small, too many broken chains will be found and the solution might become unreliable. It is, therefore, important to take into account the coupling strength defined by the BQM when setting the chain strength. There are several strategies to determine what value will return the best results. The default technique when using the D-Wave Ocean API is the uniform torque compensation (UTC) method. By using the root mean square of the coupling of the BQM, this method attempts to compensate for the torque that would break the chain. The chain strength is calculated as where N c i is the number of couplings (J i,j ) for atom i in the model. The prefactor σ can be selected by the user. Values of σ ranging from 0.1 to 1.0 were tested and are reported in Fig. 15 in Appendix D 2.
The D-Wave API allows the user to visualize, on a graphical interface, the mapping of the problem to the QPU. The images displayed in Fig. 13 are obtained from the D-Wave Problem Inspector. 67 In the top part of Fig. 13, the problem for the vacancyfree graphene structure, whose QUBO model is defined in Eq. (5) is displayed. The left panel shows the mapping of the problem to a set of qubits. Since this QUBO model only includes coupling between nearest neighbors atoms, this is equivalent to the graph representation of the problem discussed in Appendix C. The right panel in the same figure shows the mapping to a set of physical qubits on the QPU. Because of the limited number of coupled nodes, this problem was mapped to a set of 18 physical qubits and no chains are present.
In the bottom part of Fig. 13, the same mapping described above is displayed for the single-vacancy graphene structure. In the energy model for this structure, all the nodes are coupled to each other. This results in 153 edges (compared to the 27 in the vacancy-free case) in the mapping to the logical qubits (bottom left panel in Fig. 13). This problem cannot be mapped directly to the QPU architecture and qubit chains are formed. These are represented by the white lines connecting qubits in the bottom right panel in Fig. 13. Despite having the same number of sites in the structure as the vacancy-free case, due to the increased coupling in the model, to solve this problem, 48 physical qubits were used.

APPENDIX C: VACANCIES IN GRAPHENE AS A GRAPH PARTITIONING PROBLEM
In this section, we present a different approach to thinking about the problem of simulating vacancies in graphene. Graph representations are often used in the quantum computing community to represent a problem. A graph is a mathematical structure defined by nodes and edges (a line connecting two nodes). It is, therefore, easy to map such a structure to a set of qubits on a QPU. A chemical structure can easily be thought of as a graph where the nodes correspond to the atoms and the edges are the bonds between two atoms. However, this representation is not often used in the computational chemistry community. This is because for most applications, as discussed in Sec. II A, the atom types and coordinates can be directly fed into a computational chemistry code implemented on classical computers. In addition, the graph retains information about the connectivity of the system, but the information about the relative distances of the atoms is lost.
The graphene structure represented in Fig. 2 can be transformed into an 18-node graph (one node per atom) and results in 27 edges (3N sites =2 edges since each C atom is connected to three other C atoms). The resulting graph is a 3-regular graph.
When using QA to calculate the energy of a chemical system, the graph representation of the structure can be useful for two reasons: † it helps build the energy model in terms of pairwise interactions (see Table I); † it allows for an additional level of abstraction between the chemical structure and the mapping to the QPU. This can help to visualize the coupling required by the model and how that can be optimized to reduce the number of coupled qubits.
The energy model reported in Eq. (5) can be outlined in Table I, where x i ¼ 1 corresponds to site i being occupied by a C atom, x i ¼ 0 corresponds to site i being occupied by a vacancy. The results reported in the third column of Table I can be obtained by using a coupling J ¼ À1 and is only needed when two nodes of the graph are connected.
The problem of identifying the lowest energy configuration of C atoms and vacancies can be thought of as a graph partitioning problem. The goal of the partitioning is to assign some nodes to group 1 (the C atoms) and others to group 0 (the vacancies) in such a way that minimizes the energy of the system according to Table I. This representation can be useful when a more complicated energy model is used or when more than two species can occupy the sites in the structure.

APPENDIX D: TUNING THE MODEL PARAMETERS
The effect of different λ=α ratios, chain strength, and annealing time were tested. The metrics analyzed are: † ratio of broken chains (see Appendix B for details). These are calculated with respect to the number of times the quantum annealing was run and results that do not respect the constraint were included; † ratio of feasible solutions: how many solutions respect the constraint? † ratio states: of the feasible solutions, how many correspond to the ground state and how many correspond to higher energy states?

Ratio λ/α
Ratios of λ/α ranging from 1 to 10 were obtained by keeping α = 1 and varying the λ value. For these simulations, the chain strength was calculated by using the uniform torque compensation method, as discussed in Sec. D 2 with a prefactor of 0.3 [see Eq. (B1) for details]. This means that a different chain strength is calculated based on the different values of the couplings J i,j to ensure consistency in the results. Each point in the graphs in Fig. 14 represents the mean value of ten different quantum anneals (each run a thousand times). The error bars show the standard deviation for the ten runs.
Based on the results reported in Fig. 14, the ratio λ/α = 2 was used for all the simulations reported in this paper. This value was selected because it results in a low percentage of broken chains while delivering high constraints-respecting solutions. In addition, it ensures a high probability of finding the ground state (third column in Fig. 14). For all α=λ values, the probability of finding excited states is higher than the ground state probability. This can be explained in terms of the multiplicity of states. From a symmetry point of view, the states corresponding to excited states (nonneighboring vacancies) result in exponentially more structures than the ground state ones (neighboring vacancies).

Chain strength
When the problem cannot be directly mapped to the QPU architecture, chains of physical qubits are formed. This topic is discussed in Appendix B. The D-Wave Ocean API allows for different ways to select the chain strength. The default is the so-called Uniform Torque Compensation (UTC) method reported in Eq. (B1). This model uses the residue mean square of the coupling values. It is, therefore, important to use the same energy model for all simulations when testing the effect of the chain strength using this model. For the results reported in this section a λ/α=2 was used. The UTC method allows for the selection of a prefactor, which increases or decreases the chain strength values by a constant amount. Prefactor values ranging from 0.1 to 1.0 were tested. In Fig. 15 the metrics defined above are plotted against values of chain strength. The x axis reports the chain strength value rather than the prefactor that results in such value. The minimum and maximum values of the J i,j terms in the energy model are 3 and 4. The h i terms depend on the number of vacancies in the model [see Eq. (11)]. Values of h i ¼ À66, h i ¼ À64 and h i ¼ À58 were used for the one, two and three-vacancy structures. Following the tests discussed in this section, the prefactor value of 0.3 was chosen for all the simulations reported in this paper.

Annealing time
The D-Wave Ocean API allows the user to define the duration of the quantum annealing. Values ranging from 1 μs to 100 μs were tested. results are reported in Fig. 16. The default value of 20 μs was selected for all calculations because it ensures the highest rate of feasible solutions.

APPENDIX E: CONSIDERATIONS ON THE CONFIGURATION MULTIPLICITY
The Boltzmann distribution can be used, in chemistry and statistical mechanics, to calculate the probability (p i ) of finding the system in state i as a function of the state energy and the temperature, where ϵ i is the energy of state i, T is the temperature of the system, n is the total number of states, and k is the Boltzmann constant.
In the approach outlined in Eqs. (2), (5), (11), and (14), the energy of different configurations depends on the α and β values. These are optimized in order to correctly identify the energy order of different configurations. However, the absolute energy value calculated with this approach is not expected to be representative of the real energy of the structure. Therefore, the energies reported above would not yield sensible results when used in (E1).
When comparing the number of occurrences of different structures having the same number of vacancies and within the same energy bracket, the energy dependence of Eq. (E1) can be removed to obtain: where the superscript E was added to specify we are only interested in states with energy E. We can use Eq. (E2) to study the distribution of symmetry equivalent structures for the one-vacancy graphene structure. For this structure, 18 equivalent configurations are found by the quantum annealer. According to Eq. (E2), these configurations should be found the same number of times (5.55% of the times). In Fig. 17, the result of 100 different quantum annealings, each of them run 1000 times, resulting in a total of 100 000 runs is reported. The reason why it was not run directly 100 000 times in a single time is that we want to allow the minor embedding to map the logical qubits to different physical qubits. This is important because of the possible presence of unwanted biases within the quantum hardware that might bias the result.
The values reported in the "percentage occurrence" column only refer to the feasible solutions, i.e., the ones containing one vacancy. Therefore, their sum is 100%. This is different from the results in the main body of this paper that reported the % of occurrence with respect to all the solutions returned by the quantum annealer including the ones that do not respect the constraint on the number of vacancies. A standard deviation on the percentage of occurrence values of 0.267 with respect to the ideal 5.55% value was found.
The standard deviation was then calculated for different numbers of QA runs to study the dependence of the standard deviation on the number of runs. Results are reported in Fig. 18 where the standard deviation as a function of the increasing number of QA runs is displayed. The standard deviation decreases with the number of runs from 2.092 to 0.267 going from one single run to 100. After a first steep initial decrease of 1.394 going from 1 to 10 runs, the standard deviation decreases constantly and slowly with the increasing number of runs. These results suggest that the standard deviation tends to zero in the limit of an infinite number of runs and the 18 symmetry equivalent structures would be found with the same frequency.

APPENDIX F: COMPARING THE QA ENERGIES TO DFT
The energy of the structures reported in Figs. 3 to 8 was calculated using Density Functional Theory (DFT). All calculations were performed by using the periodic CRYSTAL17 code, which implements localized Gaussian-type basis functions. Thanks to the use of local basis functions instead of plane waves, on which most solid-state simulation codes rely, the graphene structure can be treated as a real 2D system. The PBE functional and a pob-TZVP basis set 68 were used. The truncation criteria for the Coulomb and exchange infinite lattice series were set to 8 (T1-T4) and 16 (T5). An 8 Â 8 Monkhorst-Pack grid was used to sample the reciprocal space. A 16 Â 16 Gilat-net grid, which is used to determine the Fermi energy for conductors, was used. A tolerance of 10 À9 Hartree/cell was selected as a criterion for both the Self Consistent Field (scf ) and geometry optimization energies.
The structure of the vacancy-free graphene was fully optimized (lattice parameters and atom positions). It was then used as the initial geometry for all the other structures containing vacancies. The lattice parameters were fixed at the vacancy-free values in an attempt to simulate a low concentration of vacancies, which would not be able to distort the lattice. In the geometry optimization calculations, the symmetry of the structure was removed by using the SYMMREMO keyword. The atom positions were not manually changed to break the symmetry. This means that the initial geometry for the single-vacancy structure is close to a local minimum and upon optimization, we did not find a Stone-Wales structure, but only a small displacement (0.05 Å) of the atoms surrounding the vacancy.
In Table II, the "E sp" column reports the single point energy on such initial geometry. Then, the atom coordinates of all structures were allowed to relax and their final energies are reported in the "E optgeom" column. For ease of comparing the relative energies, these were normalized with respect to the lowest energy within the structures containing the same number of vacancies.
The following observations can be drawn from Table II: † the energy per atom increases with the number of vacancies in the structures; † for all structures, the relative DFT single point energies agree with the QA calculations. It can, indeed, be observed that the structures where the vacancies are located on neighboring sites have the lowest energy within a fixed number of vacancies group. For example, among the two-vacancy structures, structures b-e have higher energy than structure a; † when the geometry is optimized (column "E optgeom"), the observation outlined in the precedent point does not always hold true. For example, among the three-vacancy structures, structures f and j, which are predicted by the QA to have the highest energy have, according to DFT, the same energy as the lowest energy structure a. This can be explained in terms of the heavy reconstruction these structures undergo. Structures f and j are depicted in Fig. 19 before and after geometry optimization. Both structures start from a sparse initial distribution of vacancies and get to a final structure where the vacancies can be found on three neighboring sites. In order to quantify such reconstruction, the standard deviation (std) on TABLE II. Comparison between the QA calculated energies and the DFT energy for the graphene structures including vacancies discussed in Sec. E to II G. The energies for the quantum annealing are reported as arbitrary units. The column labeled "E sp" refers to the energy of the non-optimized structure. In column "E optgeom," the energies of the geometry-optimized structures are reported. The values of ΔE are normalized with respect to the lowest energy structure containing the same number of vacancies. The standard deviation on the displacements of the atoms and the maximum displacement reported in the last two columns refer to the structure before and after geometry optimization.  Table II. Both structures f and j undergo much larger reconstruction than any other structure. Their std on the atomic displacements is one order of magnitude larger than that for the other structures.