We develop and implement two realizations of quantum graph neural networks (QGNN), applied to the task of particle interaction simulation. The first QGNN is a speculative quantum-classical hybrid learning model that relies on the ability to directly utilize superposition states as classical information in order to propagate information between particles. The second is an implementable quantum-classical hybrid learning model that propagates particle information directly through the parameters of RX rotation gates. A classical graph neural network (CGNN) is also trained on the same task. Both the Speculative QGNN and the CGNN act as controls against the Implementable QGNN. Comparison between the classical and quantum models is based on the loss value and accuracy of each model. Overall, each model showed high learning efficiency, with the loss value rapidly approaching zero during training; however, each model was moderately inaccurate. Comparing performances, our results show that the Implementable QGNN has a potential advantage over the CGNN. Additionally, we show that a slight alteration of hyperparameters in the CGNN notably improves accuracy, suggesting that further fine-tuning could mitigate the moderate inaccuracy of each model.

## I. INTRODUCTION

Irregular graph-based machine learning is inherently difficult due to the lack of symmetry among the nodes and edges that constitute the graph.^{1} Convolutional neural networks (CNN) that thrive in the context of grid-based data are ineffective and inappropriate in this context, as they lack a straightforward method of incorporating irregular data.^{1} This is exemplified in cases where a single node may have hundreds to thousands of edges, whereas its neighbor only has one. Recent pursuit of a neural network model that adapts to the complexities implicit in irregular graph-based systems, e.g., social networks,^{2} molecule structures,^{3} particle interactions,^{4} etc., has resulted in the development of a plethora of graph-based machine learning models, which are all defined under the general term “graph neural network” (GNN).^{5} Though they contain a spectrum of deviations, the core feature of these models is that they exchange information between nodes via vector messages, a process dubbed message passing, and then update via neural network.^{1} It is this core feature that has led to the success of GNNs, as evidenced by their ability to excel at a variety of tasks including predictions ranging from traffic,^{6} to the chemical properties of molecules,^{3} knowledge graph reasoning,^{7} and particle simulation.^{4} In essence, the utility of GNNs encompasses a majority of learning problems that can be represented as a graph containing meaningful connections between nodes. The vagueness here is appropriate due to the extensive number of environments where GNNs are applicable.^{1,5}

Quantum machine learning has likewise emerged recently with developments in quantum hardware sophistication and capacity^{8} and is itself a sub-field of machine learning that has garnered increased interest over the last decade. Being relatively new and having an excess of possible realizations, there is a lack of formal definition for the topic.^{9} In general, the process involves traditional learning via quantum information processing, where either parameters are optimized or a decision function is obtained.^{9} For the purposes of this study, the quantum machine learning aspects are reserved to encoding, processing, and decoding the data via parameterized quantum circuits (PQC), while the remaining parts of the algorithms including calculating loss function, backpropagation, etc. are done classically. Thus, the learning models presented here represent hybrid quantum-classical algorithms.

A general pursuit in quantum algorithms is quantum supremacy.^{10} Yet, before such a lofty claim can even be attempted, it must be ascertained whether a quantum analog of a classical algorithm is obtainable. That is the main pursuit of this study; following confirmation of this, we consider performance comparisons between the classical and quantum cases. We begin by describing and implementing two quantum graph neural network (QGNN) learning models. In particular, each QGNN consists of three sections of parameterized quantum circuits (PQC): encoder, processor, and decoder. The encoder expands the initial superposition state by incorporating additional qubits into it, and the decoder pools that information from the larger number of qubits into the desired smaller number (pooling from six qubits to two). The processor in between is responsible for the message passing, utilizing a quantum-based interaction network (IN) to send information between nodes.^{4} The two developed QGNNs differ in that the first is a speculative model, while the second is implementable. Specifically, the speculative model relies on being able to directly take and store the qubits' superposition amplitude states and to use them classically. This is not possible to implement directly on a quantum computer, as a superposition state can only be approximated via statistical analysis of numerous measurements.^{11} For the speculative circuit, the statistical analysis would have to be implemented seven times for each input vector, where seven is the number of sub-circuits that constitute the overall quantum circuit, excluding the decoder. Supposing 1000 measurements per approximation to reconstruct the superposition state, the speculative circuit would need to be run 7000 times per input vector. Considering the number of input vectors for this study is approximately 19 000, this would require 133 × 10^{6} runs of the quantum computer for a single epoch.
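The run-count estimate above follows from simple arithmetic; a quick sanity check (the 1000 shots per state reconstruction is the assumption stated in the text):

```python
# Sanity check of the run-count estimate for the Speculative QGNN.
sub_circuits = 7            # sub-circuits whose superposition states must be reconstructed
shots_per_state = 1000      # assumed measurements per state approximation
input_vectors = 19_000      # approximate number of input vectors

runs_per_vector = sub_circuits * shots_per_state    # 7000 runs per input vector
runs_per_epoch = runs_per_vector * input_vectors
print(runs_per_epoch)       # 133000000, i.e., 133 x 10^6
```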
Furthermore, each run would require use of a subroutine to determine the sign of each superposition state's amplitude,^{11,12} in addition to needing a subroutine to generate the expectation value that constitutes the quantum circuit's relevant output.^{13} This is all very impractical for actual implementation on quantum hardware; thus, this model is dubbed speculative. Regardless, this quantum algorithm was designed to be the closest analog to the classical and thus acts as a control. The Implementable QGNN is fully realizable on quantum hardware. However, due to considerable gate depth, the results shown in this study were achieved via a quantum circuit simulator. Both the Speculative and Implementable QGNNs, as they will be referred to throughout this paper, correlate the output of a particular qubit with its expectation value, following the application of a Pauli-Z observable to that qubit.^{14,15} Actual use of the Implementable QGNN on a quantum computer requires a non-trivial subroutine to approximate the observable's expectation value.^{13} However, as this is only necessary at the end of the circuit and only requires application for two qubits in this study, it is considered implementable. The utility of these learning models is examined under the context of particle interaction simulation, in particular, in the case of point particles falling under the influence of gravity within a box.

This study contains the following layout. Section II covers the learning algorithms. Section III covers the PQCs of the encoder, processor, and decoder. Additionally, it goes over the method of encoding the data, and it describes the progression of data from input to output for both QGNNs. Section IV covers the results of the QGNNs and classical graph neural network (CGNN). Section V concludes this study by considering the results and offering potential paths for future research.

## II. LEARNING MODEL

### A. Overview

The learning model implemented in this study is based on that used by Sanchez-Gonzalez *et al.*^{16} It includes three sections: the encoder, the processor, and the decoder. The encoder takes the initial vector input and expands it into a higher dimensional latent space. The processor then processes the expanded data through its interaction network (IN) for a select *n* number of steps. Each step corresponds to a particular node receiving a “message” from nodes *n* edges away. Finally, the decoder receives this processed data and outputs a prediction.^{16}
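The encode-process-decode flow can be sketched schematically; the linear maps and dimensions below are illustrative stand-ins, not the learned models of this paper:

```python
import numpy as np

# Schematic of the encoder -> processor (x n) -> decoder pipeline.
rng = np.random.default_rng(0)
W_enc = rng.standard_normal((16, 8))          # lift an 8-vector into a 16-d latent space
W_pro = rng.standard_normal((16, 16)) / 4.0   # one message-passing / processing step
W_dec = rng.standard_normal((2, 16))          # pool the latent state to a 2-d prediction

def encoder(x):
    return np.tanh(W_enc @ x)

def processor(h):
    return np.tanh(W_pro @ h)

def decoder(h):
    return W_dec @ h

h = encoder(rng.standard_normal(8))
for _ in range(3):            # n = 3 processing steps
    h = processor(h)
print(decoder(h).shape)       # (2,)
```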

### B. Classical interaction network

The message passing property of the classical graph neural network (GNN) used in this project is obtained through use of the IN learning model. It is the same as that described by Battaglia *et al.*^{4} and is highly similar to the Graph Network (GN) learning model described by Sanchez-Gonzalez *et al.*^{16} A brief description of the classical IN is provided here, while a more in-depth analysis can be found in the work of Battaglia *et al.*^{4} Following this are descriptions of its quantum analogs, the quantum graph neural networks (QGNN). Appropriately, each QGNN has a unique quantum interaction network, each analogous to the classical one.

As described by Battaglia *et al.*,^{4} the classical IN contains two aggregation functions, *a* and *m*(*G*), which concatenate the data supplied to them.^{4} For *m*(*G*), these data consist of the graph state *G* = ⟨*O*, *R*⟩, which is described as follows: *O* is the set of $N_O$ nodes in the graph with state vector length $O_l$, and *R* is the set of $N_R$ directed edges with state vector length $R_l$. Thus, *O* is a matrix of size $O_l \times N_O$. The set *R* can be further decomposed into the triple ⟨$R_r$, $R_s$, $R_a$⟩, where, for a given pair of nodes connected by a directed edge, $R_r$ is the receiver node, $R_s$ is the sender node, and $R_a$ is the state of that edge. $R_r$, $R_s$, and $R_a$ are the respective matrix representations that contain all the receiver, sender, and edge state information of the graph. $R_r$ and $R_s$ contain only 0 and 1 and are each of size $N_O \times N_R$, where row index *j* corresponds to node $o_j$ and column index *k* corresponds to edge $(R_a)_k$. For $R_r$, a 1 corresponds to a node being the receiver of a particular edge; likewise, for $R_s$, a 1 corresponds to a node being the sender of a particular edge. $R_a$ is a matrix of size $R_l \times N_R$, whose columns are the edges with state vector length $R_l$. The state vector length is arbitrary for both edges and nodes. Thus, *G* defines the state of a graph, containing complete information of the nodes and their connections.

The output *B* of the marshaling function *m*(*G*), with column slices $b_k$, consists of the concatenation of the matrix multiplications $OR_r$ and $OR_s$, combined with $R_a$.^{4} This packages the node and edge information in a convenient way for implementation into the neural network. In particular, *B* is a matrix composed of $OR_r$ stacked on top of $OR_s$, stacked on top of $R_a$. With $OR_r$ and $OR_s$ both taking the shape $O_l \times N_R$ and $R_a$ taking the shape $R_l \times N_R$, the combination results in the *B* matrix taking the shape $(2O_l + R_l) \times N_R$.

*B* is supplied to $\varphi_R$. Its column slices, $b_k$ [see Eq. (3)], are the input, and the total number of columns in *B* is the batch size. $\varphi_R$ predicts the new edge states *E*, which, being a matrix of the edges containing only new edge states, is the same size as $R_a$. This is appropriate, as *E* will replace $R_a$ in the case of multiple processors, i.e., multiple iterations of this learning algorithm. The transformation of *E* into $\bar{E}$, with shape $N_R \times N_O$, is useful in that it combines the information of the new edges and the nodes that this update will affect.^{4} Furthermore, it allows for the equivalency in Eq. (8), such that $(O, \bar{E})$ contains the same relevant information, including node states, edge updates, and nodes impacted, as found within (*G*, *E*).

The aggregation function *a* concatenates the columns of *O* and $\bar{E}$, outputting matrix *C*. For the *k*th column of *C*, the *k*th column of *O* is the top half and the *k*th column of $\bar{E}$ is the bottom half. In effect, *C* is composed of *O* on top of $\bar{E}$. This packages the node and edge information in a convenient way for implementation into the neural network.^{4} Again, *O* has shape $O_l \times N_O$ and $\bar{E}$ has shape $N_R \times N_O$; thus, the size of matrix *C* is $(O_l + N_R) \times N_O$.

*C* is then supplied to the learning function $\varphi_O$. Its column slices, $c_k$ [see Eq. (9)], are the input, and the total number of columns in *C* is the batch size. Just as $\varphi_R$ predicted the new edge states, $\varphi_O$ now predicts the new node states, *P*, which is a matrix the same size as *O* that contains the new node states.^{4} This is appropriate, as *P* will replace *O* in the case of multiple processors. Finally, either the algorithm repeats or the matrix *P* is supplied to the decoder. The outcome depends upon the number of processors in the GNN, with each processor corresponding to one complete run of the IN. For the case of multiple processors, the substitutions $O' = P$ and $R_a' = E$ are made, and the algorithm is repeated with these updated *O* and $R_a$ values. After cycling through all processors, the final *P* is given to the decoder, which then outputs its prediction. The pseudocode for the full classical IN algorithm is shown in Fig. 1. Additionally, a complete explanation of this algorithm can be found in the work of Battaglia *et al.*^{4}
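The matrix bookkeeping of one IN pass can be sketched in NumPy. Random matrices stand in for the learned functions $\varphi_R$ and $\varphi_O$, and the aggregation that produces $\bar{E}$ is left as a random placeholder with the shape stated in the text; this is a schematic of the shapes, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(1)
N_O, N_R = 5, 7            # number of nodes and directed edges
O_l, R_l = 8, 4            # node and edge state vector lengths

O  = rng.standard_normal((O_l, N_O))              # node state matrix
Ra = rng.standard_normal((R_l, N_R))              # edge state matrix
Rr = np.eye(N_O)[:, rng.integers(0, N_O, N_R)]    # one-hot receiver matrix
Rs = np.eye(N_O)[:, rng.integers(0, N_O, N_R)]    # one-hot sender matrix

# Marshaling: B stacks O @ Rr on O @ Rs on Ra.
B = np.vstack([O @ Rr, O @ Rs, Ra])               # (2*O_l + R_l) x N_R
phiR = rng.standard_normal((R_l, B.shape[0]))     # stand-in for phi_R
E = phiR @ B                                      # new edge states, same size as Ra

E_bar = rng.standard_normal((N_R, N_O))           # aggregated effects (placeholder)
C = np.vstack([O, E_bar])                         # (O_l + N_R) x N_O
phiO = rng.standard_normal((O_l, C.shape[0]))     # stand-in for phi_O
P = phiO @ C                                      # new node states, same size as O
print(B.shape, C.shape, P.shape)                  # (20, 7) (15, 5) (8, 5)
```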

To give a brief overview of the classical GNN implemented in this study, it has the same overall design as that utilized by Sanchez-Gonzalez *et al.*^{16} and is described as follows. The node encoder consists of a two layer multilayer perceptron (MLP), with the first layer being size 8 and the second layer being size 9. The edge encoder also consists of a two layer MLP, with the first layer being size 4 and the second layer being size 5. The first layers for these sections were explicitly chosen to correspond to the node and edge input data, which are of size 8 and 4, respectively. Concerning the processors, the node and edge processors each consist of a single perceptron layer of size 9 and 5, respectively. Finally, the decoder contains a single perceptron layer of size 2. After each perceptron layer in the encoders and processors, layer normalization is used.^{17} Additionally, each of these layers utilizes a ReLU activation function, while the decoder does not implement one.
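These layer sizes can be traced in a minimal NumPy sketch; the weight initialization and single-vector (rather than batched) processing are illustrative simplifications:

```python
import numpy as np

rng = np.random.default_rng(2)

def layer_norm(h):
    return (h - h.mean()) / (h.std() + 1e-5)

def mlp_layer(x, out_dim, norm=True, relu=True):
    # A single perceptron layer; layer normalization follows encoder/processor layers.
    W = rng.standard_normal((out_dim, x.size)) * 0.1
    h = W @ x
    if norm:
        h = layer_norm(h)
    return np.maximum(h, 0.0) if relu else h

node, edge = rng.standard_normal(8), rng.standard_normal(4)
h_node = mlp_layer(mlp_layer(node, 8), 9)           # node encoder: sizes 8, then 9
h_edge = mlp_layer(mlp_layer(edge, 4), 5)           # edge encoder: sizes 4, then 5
h_node = mlp_layer(h_node, 9)                       # node processor layer, size 9
h_edge = mlp_layer(h_edge, 5)                       # edge processor layer, size 5
out = mlp_layer(h_node, 2, norm=False, relu=False)  # decoder: size 2, no ReLU
print(out.shape)   # (2,)
```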

### C. Speculative QGNN interaction network

For the Speculative QGNN, the notation shifts relative to the classical IN: *O* is equal to *N* and *R* is equal to *E*. This algorithm contains steps of matrix multiplication, which is where the superposition states are directly implemented. In particular, the outputs of the learning functions at each time step are the superposition states, which are used to construct their corresponding matrices. These are then used to propagate information via matrix multiplication.

Following the classical model, $S_e$ would be a marshaling function that concatenates $A_1$, $A_2$, and $A_3$, shown in Eqs. (13) and (14), into a vector of size $(2N_l + E_l) \times N_e$. However, the quantum circuits do not adjust well to spontaneous changes in vector size, with quantum data compression of the superposition state being a non-trivial task.^{18,19} Thus, it was optimal to alter $S_e$ to represent the application of $A_1$, $A_2$, and $A_3$ in series to $\varphi_e$, which has been decomposed into three separate learning functions $\varphi_{e1}$, $\varphi_{e2}$, and $\varphi_{e3}$. The output of this series of learning functions is *A*, the matrix of updated edge states, which is equivalent to the overall output of $\varphi_e$.

The pair (*G*, *A*) describes the nodal composition of the graph, including the corresponding directed edges and their predicted effects. With the transformation of *A* to $\bar{A}$, $(N, \bar{A})$ contains the same information. Thus, the substitution $(G, A) \rightarrow (N, \bar{A})$ is a convenient method of sorting the data for implementation.

$S_n$ performs the same process as $S_e$, except in the context of the nodal learning function: $S_n$ applies $P_1$ and $P_2$ in series to $\varphi_n$, which has been decomposed into $\varphi_{n1}$ and $\varphi_{n2}$. The output of this series of learning functions is *P*, the updated node states, which can be designated the output of $\varphi_n$.

The next step is to either rerun the algorithm with the updated node and edge states, i.e., Eqs. (23) and (24), or have the updated features proceed to the decoder. This decision is based on the chosen number of runs and which particular run the algorithm is on. For additional insight, Fig. 2 contains the pseudocode for the Speculative QGNN IN algorithm.

It should be noted that for the purposes of this project, only the updated node states were input into the decoder, while the updated edge states were used only in the node state update function and, thus, were confined to this algorithm, i.e., the processor. This is based on the similar process followed by Sanchez-Gonzalez *et al.*^{16} Additionally, in static graphs, the matrices $E_r$ and $E_s$ of the receiver and sender indices remain the same. However, this project relies on dynamic graphs, meaning $E_r$ and $E_s$ change with the time progression of the particle interactions. This progression is described by time steps, each one corresponding to the overall state of the system at a particular moment in time, represented by a graph. Thus, edges are uniquely constructed between nodes for each graph, which is achieved via a nearest neighbor algorithm within a particular “connectivity” radius used for each node at each time step, as implemented by Sanchez-Gonzalez *et al.*^{16}
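The dynamic edge construction can be sketched as a brute-force neighbor search within a connectivity radius (at scale, a spatial index such as a k-d tree would replace the pairwise distance matrix):

```python
import numpy as np

def build_edges(positions, radius):
    """Return directed edges (sender, receiver) between distinct particles
    closer than `radius` at this time step (brute force; illustrative)."""
    diff = positions[:, None, :] - positions[None, :, :]   # pairwise displacements
    dist = np.linalg.norm(diff, axis=-1)
    send, recv = np.nonzero((dist < radius) & (dist > 0))  # exclude self-edges
    return [(int(s), int(r)) for s, r in zip(send, recv)]

pos = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
edges = build_edges(pos, radius=0.5)
print(edges)   # [(0, 1), (1, 0)] - only the close pair is connected, in both directions
```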

### D. Implementable QGNN interaction network

The design of the Implementable QGNN requires further deviation from the initial classical GNN learning model. In particular, this is a result of only being able to utilize slices of larger matrices during a single run-through of the entire algorithm, which is required in order to maintain superposition in the quantum circuit. Whereas the classical GNN and the Speculative QGNN both rely on two marshaling functions, the Implementable QGNN has none. Instead, it applies only a single column of *N*, $E_r$, $E_s$, and $E_a$ at a time, via a series of *RX* gates applied to the qubits corresponding to the edges. The resultant information in the edge state qubits is then transitioned into the node state qubits via decoding unitaries. Overall, this results in a method of information propagation based not on direct matrix multiplication but instead on the application of rotation matrices. Furthermore, it adds the requirement of an additional layer of decoding unitaries. Note that none of this suggests that the Implementable QGNN will necessarily be less effective, nor that it is somehow a worse implementation of GNNs. Instead, these dissimilarities from the Sanchez-Gonzalez *et al.* GNN^{16} make the QGNN model studied here a novel attempt at quantum machine learning.

There are multiple steps in the Speculative QGNN learning model, i.e., a series of matrix multiplications. However, the Implementable QGNN learning model is more aptly defined by its unique design, in which there are no steps implemented outside the context of the quantum circuit. Thus, the entire algorithm is realized in a single quantum circuit and is best explained through examination of the quantum gates that constitute it. The full description of the Implementable QGNN can be found in Sec. III.

## III. METHODS

### A. Data encoding

Each learning model is trained on two datasets, derived from the same classical simulation (see Sec. IV A), to create the initial node and edge state vectors. The initial state vector for a particular node consists of its previous two velocities, $v_{n-1}$ and $v_{n-2}$, and its normalized clipped distances to the boundaries, $b_i$, where these distances are clipped by the connectivity radius.^{16} The boundaries were approximated based on the maximum and minimum positions observed in the simulation's dataset. Each velocity has vector length two, and the vector sum of the clipped distances is length four. Concatenating these features gives the initial node state vector its size of eight, i.e., $(v_{x,n-1}, v_{y,n-1}, v_{x,n-2}, v_{y,n-2}, b_1, b_2, b_3, b_4)$. A vector length of 8 was chosen because values of $2^n$ are naturally easy to work with in a quantum circuit. Likewise, it is ideal to use a small number of qubits to avoid substantial training times and noise. The initial state vector for a particular edge consists of its relative positional displacements, $d_x$ and $d_y$, i.e., the distance between the corresponding sender and receiver given a particular edge, and that distance's corresponding magnitude, *D*; the former is vector length two, and the latter is vector length one. Requiring an input size of $2^n$, a single layer of zero padding was added to each edge state vector. Concatenating these features gives the initial edge vector its size of four, i.e., $(d_x, d_y, D, 0)$. Outside of the zero padding, this composition of data is the same as that used by Sanchez-Gonzalez *et al.*^{16}
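The feature assembly above can be sketched directly; the specific wall ordering in the boundary-distance vector is an assumption for illustration:

```python
import numpy as np

R = 0.5  # connectivity radius, also used to clip/normalize boundary distances

def node_features(v1, v2, pos, bounds):
    """8-vector: the previous two velocities plus clipped, normalized distances
    to the four walls. The wall ordering (xmin, xmax, ymin, ymax) is assumed."""
    xmin, xmax, ymin, ymax = bounds
    d = np.array([pos[0] - xmin, xmax - pos[0], pos[1] - ymin, ymax - pos[1]])
    b = np.clip(d, 0.0, R) / R
    return np.concatenate([v1, v2, b])

def edge_features(sender_pos, receiver_pos):
    """4-vector: relative displacement, its magnitude, and one zero pad."""
    d = receiver_pos - sender_pos
    return np.array([d[0], d[1], np.linalg.norm(d), 0.0])

n = node_features(np.array([0.1, 0.0]), np.array([0.2, -0.1]),
                  np.array([0.3, 0.4]), bounds=(0.0, 1.0, 0.0, 1.0))
e = edge_features(np.array([0.0, 0.0]), np.array([0.3, 0.4]))
print(n.size, e.size)   # 8 4
```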

An initial step in quantum machine learning is encoding classical data into qubits, which can be accomplished using various methods such as qubit encoding,^{20} tensor product encoding, and amplitude encoding.^{21} The third method was implemented using the AmplitudeEmbedding function available in PennyLane, the quantum-compatible Python package used to realize this project's classical-quantum algorithms.^{22} Amplitude encoding consists of embedding classical input values into the amplitudes of a quantum state. This requires transforming the data from its classical format into that of a superposition state. A superposition state consists of $2^n$ values, with *n* being the number of qubits for a given quantum circuit. Thus, the criterion arises for the input data to be of size $2^n$.
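At its core, amplitude encoding is an L2 normalization of a length-$2^n$ vector (PennyLane's AmplitudeEmbedding can perform this normalization internally); a minimal NumPy illustration:

```python
import numpy as np

def amplitude_encode(x):
    """L2-normalize a length-2**n vector so it can serve as the amplitudes
    of an n-qubit state (phase handling is left out of this sketch)."""
    x = np.asarray(x, dtype=float)
    n = int(np.log2(x.size))
    assert 2 ** n == x.size, "input length must be a power of two"
    return x / np.linalg.norm(x)

# An 8-component input (2^3) fits on three qubits.
state = amplitude_encode([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
print(np.isclose(np.sum(state ** 2), 1.0))   # True - probabilities sum to one
```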

### B. Parameterized quantum circuits

The parameterized quantum circuits (PQC) used for the encoder and decoder are the same as those utilized in the quantum convolutional neural network (QCNN) designed by Cong *et al.*,^{23} while the processor is the same as Circuit 15 designed by Hubregtsen *et al.*^{24} The QCNN PQCs are valuable in that they provide a method both for expanding data into a higher dimensional latent space and for pooling information into a desired number of qubits. Conversely, the value of Circuit 15 is more holistic, being a PQC that was shown to be moderately accurate in the context of classification while retaining a low number of required parameters.

The encoder is the reverse QCNN used by Cong *et al.*, which is equivalent to the multiscale entanglement renormalization ansatz (MERA).^{23} Thus, instead of pooling the information, it expands it to a higher dimensional latent space. This reverse QCNN is realized via the repeated application of a two qubit unitary, as shown in Fig. 3(a).^{25} This unitary consists of *RX*, *RY*, and *RZ* gates, with the learning parameters being the corresponding degrees of rotation. Note that the top and bottom *RZ*, *RY*, and *RX* gates labeled $p_6$–$p_8$, respectively, share the same parameters between pairs. Therefore, even though there are 18 rotation gates, there are only 15 parameters in total.

With the encoder's unitary defined, the next step is the method of application. The unitary is applied sequentially to every pair of qubits in the circuit, shown in Figs. 3(b) and 3(c). Note the difference between Figs. 3(b) and 3(c) is simply that Fig. 3(b) is the encoder for the node input, which has an initial state vector length of 8, while Fig. 3(c) is the encoder for the edge input, which has an initial state vector length of 4. Thus, Fig. 3(b) requires three qubits to amplitude encode the data, while Fig. 3(c) needs two qubits. The encoder unitary uses the same parameters across qubit pairs during a single run, i.e., the same 15 parameters applied, for example, to the qubit pair 0 and 1 are applied to the qubit pair 1 and 2 as well; the parameters only change when they are updated via the optimization process.

Concerning the decoder, it is the QCNN used by Cong *et al.*, which is equivalent to the reverse MERA.^{23} Thus, it pools the data into a select number of qubits. For this project, the data are pooled into the final two qubits. The QCNN is realized via the repeated application of the two qubit unitary shown in Fig. 4(c).^{25} This unitary consists of the application of *RX*, *RY*, *RZ*, and CNOT gates, with the degrees of rotation being learned parameters. The total number of parameters is 6. The decoder unitary is applied to pairs of qubits, with the information in the top qubit being pooled into the bottom qubit. For example, here, information from qubits 0 and 1 is pooled into qubits 2 and 3, respectively, as seen in Fig. 4(d). Similar to the encoder unitary, the decoder unitary's parameters are the same in each application to qubit pairs, prior to backpropagation.

The processor is based on the design presented by Hubregtsen *et al.*,^{24} shown in Fig. 4(a). It contains 16 gates but only eight parameters. It consists of two columns of *RY* gates, each followed by cascading CNOT gates. Though the encoder and decoder are utilized only once during the entire circuit, the processor is repeatedly applied based on the desired number of message passing steps. As described in Sec. II (also see Fig. 5), the algorithm requires the processor to be applied a minimum of five times: three for edge processing and two for node processing. As is an inherent trait of GNNs, each repetition of the processor corresponds to a node's message being passed an additional node away. For example, having three repeated instances of the processor in the algorithm corresponds to a node “knowing” about its neighbors up to three edges away.^{1} However, this is in the classical sense, where the processor is a single multilayer perceptron (MLP). In the context of the quantum algorithm used in this study, a single complete run of the interaction network can be considered a single use of the processor. This is why we treat the overall edge and node processors as decomposing into their corresponding processors, as shown in Eqs. (15) and (21). Thus, another more quantitative way to consider this situation is that every five uses of the processor unitary correspond to the completion of a single message pass. Figure 5 gives a more intuitive sense of this, showing a complete run of the algorithm with the incorporation of the PQCs, containing only a single step of message passing. This figure will be explained in more detail in Sec. III C. Note that Table I lists the number of parameters and gates found in each QGNN model, given *P* processors in the learning model. As noted there, the Implementable QGNN does not have an obvious scaling relationship with qubit count.
For parameters, increasing the qubit count means expanding the node section of the circuit, the edge section, or both; however, this is subjective and depends upon the experimenter's choice. Likewise, for gate count, the implementation of the *RX* gates would shift with increased qubit count, as the original series of *RX* gates was explicitly chosen to efficiently distribute information over four qubits when given six parameters. If either of these values changed, the application of *RX* gates would need to be adjusted, with the manner of adjustment being arbitrary and, thus, based on the experimenter's choice. For example, given a case where there are six parameters to apply and a quantum circuit with a node section containing six qubits, the unitaries $U_r$, $U_s$, and $U_a$ of that circuit would need only a single column of *RX* gates to effectively propagate information. This is not true for other qubit amounts. Furthermore, with increasingly large numbers of qubits, it is unclear what form the *RX* unitaries would take once the qubit count has surpassed the *RX* gate count of each unitary, i.e., it is arbitrary how to apply a vector of length 4 to a quantum circuit with seven qubits.
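Loading classical data through rotation angles, as the Implementable QGNN does, can be illustrated with the *RX* matrix directly; the specific column values below are arbitrary example data:

```python
import numpy as np

def rx(theta):
    """Single-qubit RX rotation matrix; the angle carries the classical datum."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

# One column of an E^T matrix supplies three rotation angles; each angle is
# written onto a different edge qubit (shown here one qubit at a time).
column = np.array([0.3, 1.1, -0.7])
states = [rx(theta) @ np.array([1.0, 0.0]) for theta in column]

# The probability of measuring |0> is cos^2(theta/2), so the loaded datum
# is statistically recoverable from repeated measurements.
probs = [abs(psi[0]) ** 2 for psi in states]
print(np.allclose(probs, np.cos(column / 2) ** 2))   # True
```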

### C. Speculative QGNN integration

Referring to Fig. 5, our implemented method relies on only four qubits, with the output of each sub-circuit, i.e., node encoder, $\varphi_{p1}$, etc., saved and input into the next sub-circuit, as designated by the algorithm, with the corresponding rotation parameters likewise saved. The algorithm relies on numerous steps of matrix multiplication to combine information for quantifying origin, target, and magnitude of effects. This is easily done classically, where the output at each sub-circuit can be stored until the entire corresponding matrix is formed. However, doing so in a quantum circuit context would require either directly implementing superposition amplitudes or utilizing additional qubits to store the information. In the case of the latter, with multiple runs of each sub-circuit, this would lead to a quickly increasing number of qubits. This would be highly impractical, if not impossible, to realize on quantum hardware or with the quantum simulation software utilized in this project, i.e., PennyLane.^{26} Thus, the former is necessary, even though it forfeits implementability as explained in Sec. I. However, a QGNN that is fully implementable is realized in Sec. III D of this paper, though it requires deviating from the previously mentioned method of propagating information via matrix multiplication.

Figure 5 begins on the left with two encoders. The top is the node encoder, and the bottom is the edge encoder. Each encoder expands the data from its initial vector length (8 for node input and 4 for edge input) into a vector space of length 16. At this point, the interaction network is implemented. The expanded versions of *N* and $E_a$, as seen at the intersection of the encoders and processors in the figure, correspond to Eq. (12), i.e., the start of the algorithm. These superposition amplitudes are then saved, with *N* directly plugged into the next step of the algorithm: the series of matrix multiplications and circuit applications of the edge processors as described in Eqs. (13)–(16). This overall region of matrix multiplications and learning functions is the edge processor, as indicated in the figure. The expanded $E_a$ is included in the final matrix multiplication of the edge processor, $A_3$, whose output is then applied to $\varphi_{e3}$. The node processor, as seen in the figure, begins with the expanded version of *N* being applied to $\varphi_{p1}$, whose output $P_1'$ is then multiplied by *A*, the output of $\varphi_{e3}$, and the original expanded *N*, to form $P_2$. The final step in the node processor is to apply $P_2$ to $\varphi_{n2}$, producing the updated node state *P* [see Eqs. (19)–(22)]. *P* is then applied to the decoder, which outputs on qubits 2 and 3 the predicted new vertical and horizontal accelerations of the particles.

### D. Implementable QGNN integration

Examining the Implementable QGNN in Fig. 7, the quantum circuit is broken up into two parts: qubits 0–3 represent the node states and qubits 4–7 represent the edge states. It is important to first understand the implementation of the matrices *N*, *E_{r}*, *E_{s}*, and *E_{a}* as described in Eq. (12). Here, the algorithm requires that the transposes of *E_{r}*, *E_{s}*, and *E_{a}* are utilized, which is a consequence of the original shapes of these matrices. *E_{r}* and *E_{s}* are of size $N_n \times N_e$, and *E_{a}* is of size $N_l \times N_e$, which, for this project, means that each *E* matrix is of shape $3 \times N_e$; the extra dimension of padding that *N_{l}* had for the Speculative QGNN is not used here. This is a unique constraint of the Implementable QGNN, in which the *N_{l}* length must be either a factor of or equal to the number of particles in the simulation. This allows for compatibility, as similarly described below. However, the dimension *N_{e}* is variable because it describes the number of edges per given time step. Additionally, *N* is of fixed shape $N_l \times N_n$, i.e., 8 × 3 for this project. Thus, for compatibility and consistency with applying the matrices in unison to the quantum circuit, the transposes of the *E* matrices were required. In short, each applied matrix has a column size of 3, meaning the quantum circuit is run a total of three times per time step, i.e., per set of *N*, $E_r^T$, $E_s^T$, and $E_a^T$, with the variability of dimension *N_{e}* absorbed by the rotation gates (explained below).
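As a concrete check of the bookkeeping above, the following sketch builds dummy matrices with the stated shapes and confirms that one time step yields three circuit runs, one per column; the matrix contents are placeholders:

```python
# Shape check for the Implementable QGNN inputs (dummy contents).
# N is fixed at N_l x N_n = 8 x 3; each E matrix is 3 x N_e, with N_e edges this time step.

N_e = 4                                    # number of edges this time step (variable)
N   = [[0.0] * 3 for _ in range(8)]        # node matrix, 8 x 3
E_r = [[0.0] * N_e for _ in range(3)]      # receiver matrix, 3 x N_e
E_s = [[0.0] * N_e for _ in range(3)]      # sender matrix, 3 x N_e
E_a = [[0.0] * N_e for _ in range(3)]      # edge attribute matrix, 3 x N_e

def transpose(M):
    return [list(col) for col in zip(*M)]

runs = 0
for col in range(3):                       # one circuit run per column, three per time step
    n_col  = [row[col] for row in N]                   # length 8 -> eight RX parameters
    er_col = [row[col] for row in transpose(E_r)]      # length N_e -> RX parameters of U_r
    assert len(n_col) == 8 and len(er_col) == N_e
    runs += 1
assert runs == 3
```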

In the Speculative QGNN, adhering to the classical learning model, information is propagated via matrix multiplication of the matrices *N*, *E_{r}*, *E_{s}*, and *E_{a}*. Here, however, these matrices are applied directly to the quantum circuit via the rotation parameters of rotation gates. Figure 6(a) depicts a cascading series of *RX* gates; this entire cascade is treated as a unitary and is applied for each of the *E^{T}* matrices. The choice of *RX* gate was arbitrary, though the application was purposefully kept uniform for equivalent incorporation of data. For a given *RX* unitary, the rotation values of its *RX* gates are determined by the row values of the given column in use. The *RX* unitaries for $E_r^T$, $E_s^T$, and $E_a^T$ are represented, respectively, by the unitaries in Figs. 6(b)–6(d). Matrix *N* has row vector length 8. Thus, the implementation of matrix *N* in the quantum circuit requires eight *RX* gates, but only a single application for efficient information distribution among the qubits, as shown in Fig. 6(e), with its unitary representation shown in Fig. 6(f).

The row vector length of the *E^{T}* matrices, *N_{e}*, is variable. This variability is due to the calculation of edges per time step, achieved via a nearest neighbor algorithm applied to the particles in the simulation within a certain connectivity radius. For particles within the connectivity radius of each other, two directed edges will be generated between them; each particle influences the other. Regardless of connections, i.e., edges, with other particles, each particle is given a self-edge. In the context of this project's simulation, there are three particles contained within a box. If no particles are within the given interaction radius of one another, then there will only be three edges in that time step, each being a self-edge. However, if every particle is within the given interaction radius of every other particle, then there will be nine edges: three self-edges and three pairs of edges between particles. Though the upper bound is nine edges, this amount never occurred in any of the datasets used in this study. Instead, the maximum edge number observed was 5, meaning that the largest graph observed in the data consisted of two nodes, i.e., the greatest number of neighbors for any particular node was 1. Hence, for this project, the dimension *N_{e}* is bound between 3 and 5. The upper bound is reflected in the number of *RX* gates constituting *U_{r}*, *U_{s}*, and *U_{a}*; furthermore, an additional *RX* gate was added to allow for more complex datasets in future studies. Thus, there are a total of six *RX* gates applied for each of these unitaries. As a consequence of the variability in *N_{e}*, the number of rotation parameters can change between each time step. To account for this, an *RX* gate's rotation parameter assumes a value of zero when one is not provided. For example, if there exist only four edges, then the fourth and fifth *RX* gates' rotation parameters, *c_{4}* and *c_{5}* of the *E^{T}* matrices, would be given a value of zero. Note that Table II contains the full list of parameters of each unitary used in the QGNNs. Furthermore, it indicates whether the unitary's parameters are trainable, i.e., whether they are optimized by the learning algorithm.
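The zero-padding rule for the variable number of edges can be sketched directly; the helper name and constant are ours:

```python
# Pad a column of edge-matrix values to the six RX rotation parameters of U_r/U_s/U_a.
# Missing parameters (when fewer than six edges exist) default to zero, i.e., RX(0) = identity.

N_RX_GATES = 6  # observed maximum of 5 edges, plus one spare gate for future datasets

def rx_parameters(column):
    """Return the six rotation angles for one cascade of RX gates."""
    assert len(column) <= N_RX_GATES
    return column + [0.0] * (N_RX_GATES - len(column))

# Four edges this time step: parameters c4 and c5 fall back to zero.
angles = rx_parameters([0.3, -0.1, 0.7, 0.2])
assert angles == [0.3, -0.1, 0.7, 0.2, 0.0, 0.0]
```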

| Unitary | Parameter count | Trainable |
|---|---|---|
| *U_{E}* | 15 | Yes |
| *U_{P}* | 8 | Yes |
| *U_{D}* | 6 | Yes |
| *U_{r}* | 6 | No |
| *U_{s}* | 6 | No |
| *U_{a}* | 6 | No |
| *U_{N}* | 8 | No |

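Assuming the counts in Table II, a quick tally shows how they combine; the 58-parameter total for the 1 processor circuit (two encoders, two processors, and two decoders, each with unique parameters) matches the figure quoted in Sec. IV D:

```python
# Per-application parameter counts from Table II; only the encoder, processor,
# and decoder unitaries are trained. The RX cascades carry data, not weights.
unitaries = {
    "U_E": (15, True),   # encoder
    "U_P": (8,  True),   # processor
    "U_D": (6,  True),   # decoder
    "U_r": (6,  False),  # edge-matrix RX cascades
    "U_s": (6,  False),
    "U_a": (6,  False),
    "U_N": (8,  False),  # node-matrix RX gates
}
trainable = sum(n for n, t in unitaries.values() if t)
fixed     = sum(n for n, t in unitaries.values() if not t)
assert trainable == 29 and fixed == 26

# One-processor circuit: node and edge encoders, node and edge processors,
# and the two decoders U_D,t and U_D,f, each with unique parameters.
one_processor_total = 2 * 15 + 2 * 8 + 2 * 6
assert one_processor_total == 58  # matches the count quoted in Sec. IV D
```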

The full application of the Implementable QGNN is shown in Fig. 7. In particular, this figure demonstrates a one processor implementation of the QGNN, meaning there is only one step of message passing between nodes using this circuit. For additional steps of message passing, as in the prior GNN implementations, copies of the processor are simply added after the original. Each of the *n* additional copies equals one additional run of the message passing algorithm, corresponding to nodes learning about neighboring nodes *n* additional steps away. Note that these copies each have their own unique parameters. The algorithm begins by encoding the node state and edge state inputs into qubits 0–2 and 4–6, respectively. Immediately after this, the node and edge encoder unitaries, $U_{E,n}$ and $U_{E,e}$, respectively, are applied to their corresponding sections; these are the same unitaries described by Fig. 3(a). Following this, the *RX* unitaries for *N*, $E_r^T$, $E_s^T$, and $E_a^T$, respectively, *U_{N}*, *U_{r}*, *U_{s}*, and *U_{a}*, are applied to the edge section qubits. The edge section processor, $U_{P,e}$, which is identical to that described in Fig. 4(a), is then applied to the same section. The application of the *RX* unitaries and processor is analogous to Eqs. (13)–(16) of the Speculative QGNN learning model. As previously mentioned, a consequence of implementing all parts of the algorithm in a single circuit is that an extra section of decoder unitaries is required. They are used to transfer information from the edge section qubits to the node section qubits. This requirement is inherent to the node feature state update being based upon the edge feature state update, the crux of the GNN algorithm. This transfer is done via the application of the decoder unitary $U_{D,t}$, where *t* denotes the transition from edge to node. This decoder unitary has a set of parameters different from that of the last decoder unitary, $U_{D,f}$, with *f* denoting the final application of the decoder unitary. The next unitary is a reapplication of *U_{r}*; however, this time it is applied to the node section qubits. The use of *U_{r}* here is analogous to Eq. (17) of the Speculative QGNN learning model. The final unitary of the overall processor is the node section processor, $U_{P,n}$. This unitary has parameters unique from those of the edge section processor. Likewise, the application of this unitary is analogous to Eqs. (19)–(22) of the Speculative QGNN. Ending the entire circuit, the decoder unitary $U_{D,f}$ is applied, and a measurement is made on qubits 2 and 3. The expectation values of these measurements are the predicted *x*- and *y*-direction accelerations of each particle. For a single time step, there are three cycles of this complete circuit, meaning the final output is a 3 × 2 matrix where the columns are the *x* and *y* accelerations, and the rows correspond to particular particles.
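The three-cycle structure of a single time step can be outlined as follows; `run_circuit` is a placeholder for the actual quantum circuit evaluation, not part of the paper's code:

```python
# One time step of the Implementable QGNN: three circuit cycles, one per particle/column,
# each yielding the expectation values on qubits 2 and 3 (predicted x and y accelerations).

def run_circuit(n_col, er_col, es_col, ea_col, params):
    """Placeholder for the quantum circuit of Fig. 7; returns (<a_x>, <a_y>)."""
    return (0.0, 0.0)

def predict_time_step(N, ErT, EsT, EaT, params):
    output = []
    for col in range(3):                       # three cycles per time step
        n_col  = [row[col] for row in N]
        er_col = [row[col] for row in ErT]
        es_col = [row[col] for row in EsT]
        ea_col = [row[col] for row in EaT]
        ax, ay = run_circuit(n_col, er_col, es_col, ea_col, params)
        output.append([ax, ay])
    return output  # 3 x 2: rows are particles, columns are x and y accelerations

pred = predict_time_step([[0.0] * 3] * 8, [[0.0] * 3] * 3,
                         [[0.0] * 3] * 3, [[0.0] * 3] * 3, None)
assert len(pred) == 3 and len(pred[0]) == 2
```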

In place of pseudocode, which was used in describing the classical GNN and Speculative QGNN, a simple example of the Implementable QGNN is given in Fig. 8. In particular, Fig. 8(a) shows a simpler case than examined in our simulations: two particles. These two particles make up the entirety of this example's particles and are shown at an instance in which they are interacting. Each node and edge has its own associated vector value. Figure 8(b) shows the matrices *N*, $E_r^T$, $E_s^T$, and $E_a^T$ constructed from the information contained in the graph. Figures 8(c) and 8(d) show the *RX* unitaries *U_{N}*, *U_{r}*, *U_{s}*, and *U_{a}* that implement the matrices depicted in Fig. 8(b). Note that, for brevity, only the first group of *RX* gates in the gate cascade constituting *U_{r}*, *U_{s}*, and *U_{a}* is shown. To complete the cascade, the values and corresponding gates would be adjusted, as previously shown in Fig. 6(a). Again, this example examines two particles, meaning there are two runs of the Implementable QGNN. Each run corresponds to a particular particle, which is why each matrix described in Fig. 8(b) has two columns. The first column of each matrix is utilized in the first run and corresponds to the first particle. Likewise, the second column is utilized in the second run and corresponds to the second particle. Only the *RX* unitaries are given in this example, as they are the only unitaries unique to the Implementable QGNN. The remaining portions of the Implementable QGNN are run as depicted in Fig. 7. Note that the length of the edge vectors is 2. As previously explained, this is for the sake of compatibility, which is evident here. In particular, the vector length of the edges must be a factor of or equal to the number of particles in the simulation; the latter case is shown here. In the case of the former, a simple degree of repetition would need to be incorporated. This could take the form of copying the $E_a^T$ matrix and concatenating it with itself until it is the desired size. This is why it would need to be a factor of the particle count: so that its contents would be fully exhausted in the application of the circuit.
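The repetition scheme for the factor case can be sketched as a simple tiling; the function name and the four-particle case are hypothetical:

```python
# When the edge vector length is a proper factor of the particle count, the E_a^T
# matrix is concatenated with copies of itself until its column count (one column
# per circuit run, i.e., per particle) matches the particle count.

def tile_to_particles(EaT, n_particles):
    cols = len(EaT[0])
    assert n_particles % cols == 0, "edge vector length must divide the particle count"
    reps = n_particles // cols
    return [row * reps for row in EaT]          # repeat each row's entries column-wise

EaT = [[1.0, 2.0],
       [3.0, 4.0]]                              # two columns, edge vector length 2
tiled = tile_to_particles(EaT, 4)               # hypothetical four-particle case
assert tiled == [[1.0, 2.0, 1.0, 2.0], [3.0, 4.0, 3.0, 4.0]]
```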

## IV. RESULTS

### A. Overview

There were two groups of GNN models trained: the first consisted of GNNs with one processor, and the second consisted of GNNs with two processors. The loss value calculated after the application of each additional batch as each model progressed through training is shown in Fig. 9. In particular, Figs. 9(a)–9(c) and 9(d)–9(f) show the loss value, whereas Figs. 9(g)–9(i) show the common logarithm of the loss. The loss function implemented was mean squared error (MSE), meaning the distance between the predicted acceleration of a given particle and its ground-truth acceleration was measured via its MSE value.^{27}
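The batch loss just described reduces to a short computation; the 3 × 2 acceleration matrices and batch size of 4 follow the text, while the values are illustrative:

```python
# MSE loss between predicted and ground-truth accelerations, averaged over a batch
# of four time steps (the batch size used in training).

def mse(pred, target):
    """Mean squared error over a 3 x 2 acceleration matrix."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    return sum((p - t) ** 2 for p, t in zip(flat_p, flat_t)) / len(flat_p)

def batch_loss(batch):
    """Average the per-time-step MSE values over the whole batch."""
    return sum(mse(p, t) for p, t in batch) / len(batch)

step = ([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]],   # predicted accelerations
        [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])   # ground-truth accelerations
loss = batch_loss([step] * 4)                   # batch of four identical time steps
assert abs(loss - 1.0 / 6.0) < 1e-12
```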

The ground-truth data were obtained via a particle simulator found in the open source software Taichi Lang.^{28} The software simulated three particles interacting together in a box as they fell under the influence of gravity. The software provided the positions and velocities of each particle per given time step. The velocity contained in the node state vector input into the GNN models is approximated as the finite difference of the Taichi Lang positions, and the acceleration data that act as the labels of the GNN models are likewise approximated as the finite difference of the Taichi Lang velocities.^{16} Furthermore, the GNN predicted position of each particle was obtained via a Euler integrator that calculates the next position from the current acceleration output of the given GNN model, as implemented by Sanchez-Gonzalez *et al.*^{16} Likewise, based on Sanchez-Gonzalez *et al.*, the optimizer implemented was Adam, featuring a learning rate of 0.01.^{16,29} A batch size of 4 was used, each of the four data points being a time step, with the respective loss value of each time step averaged to calculate the entire batch's loss value. The data points are randomized prior to each epoch. Note that the maximum number of processors used in this project was two. This was chosen with the number of particles in mind. In particular, the simulation consisted of three particles, so the largest possible graph contained nodes with two-step neighbors, i.e., neighbors that are two edges away. Each processor corresponds to a single step; thus, the maximum number of processors needed was two. However, as previously mentioned in the Implementable QGNN Integration section, the largest graph observed in the datasets used in this particular study consisted of two connected nodes.
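The finite-difference preprocessing and Euler integration can be sketched as follows; the one-dimensional toy trajectory is illustrative, and the velocity-then-position update order is an assumption in the spirit of Sanchez-Gonzalez *et al.*:

```python
# Finite-difference preprocessing of the simulator trajectories and the Euler update
# used to roll predictions forward. dt is the ground-truth time increment (Sec. IV D).

dt = 0.0001

def finite_difference(series, dt):
    """Approximate the derivative of a sampled series: velocities from positions,
    accelerations from velocities."""
    return [(b - a) / dt for a, b in zip(series, series[1:])]

def euler_step(position, velocity, acceleration, dt):
    """Advance one time step from the GNN-predicted acceleration
    (assumed order: update velocity first, then position)."""
    velocity = velocity + acceleration * dt
    position = position + velocity * dt
    return position, velocity

positions = [0.0, 0.001, 0.003]                   # toy 1D trajectory samples
velocities = finite_difference(positions, dt)     # ~[10.0, 20.0]
accels = finite_difference(velocities, dt)        # ~[1e5]
p, v = euler_step(positions[-1], velocities[-1], accels[-1], dt)
assert abs(v - 30.0) < 1e-6 and abs(p - 0.006) < 1e-6
```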

The method of numerical comparison between different models is based on the reduction of loss value and the degrees of accuracy, as detailed in Secs. IV B and C. The loss value, MSE accuracy, and percent error accuracy are observed per batch as the models progress through the dataset. Greater reduction in loss value and higher degrees of accuracy correspond to better performance. These criteria are used to relatively rank the models tested in this study. Note that the accuracy measurements are based on the particle positions, which are derived from the GNN outputs, i.e., the acceleration values of each particle in the simulation. Thus, the degrees of accuracy correspond directly to the particle positions, while the loss values correspond to the particle accelerations.

### B. Learning efficiency

Figure 9(a) shows the entire loss value trajectory as training progressed for the GNNs with a single processor. As evident in this figure, there exists considerable overlap between the loss values for the classical GNN (CGNN), Implementable QGNN, and Speculative QGNN. Figures 9(b) and 9(c) show zoomed in views: the former zooms into the y-axis over the range $[0, 1]$, while the latter zooms into the x-axis over the range $[0, 25]$. Figure 9(b) further demonstrates the considerable overlap between loss values, and only in Fig. 9(c) do the differences become clear. The CGNN starts with the highest loss value, the Implementable QGNN starts in the middle, and the Speculative QGNN starts with the lowest. Additionally, the QGNNs approach a near zero loss value approximately five batches prior to the CGNN. Regardless, each GNN approaches this near zero amount within ten batches. It is difficult to compare the learning efficiencies of the 1 processor GNNs in this direct manner; thus, examination of their logarithmic plots was necessary, as shown in Fig. 9(h). It is evident that the Implementable QGNN reaches and maintains the lowest loss values, followed by the CGNN, and last by the Speculative QGNN. Considering each GNN approaches a near zero value within the first one percent of applied batches, using the logarithmic results as criteria for the comparison of learning efficiencies is appropriate. Thus, the Implementable QGNN is most efficient, the CGNN is second, and the Speculative QGNN is least. However, the *de facto* results show the learning efficiency of each 1 processor GNN model is highly similar, with a notable degree of overlap existing between loss values throughout training. Overall, each model is highly efficient in reducing the loss throughout training.

Figure 9(d) shows the entire loss value trajectory as training progresses for the 2 processor GNNs. Additionally, Figs. 9(e) and 9(f) show zoomed in views of the y-axis and x-axis as described in the 1 processor GNNs case. Likewise, analysis of these figures, in addition to the logarithmic plot of the 2 processor GNNs loss values, as seen in Fig. 9(i), will result in the same conclusions made for the 1 processor GNNs case. Specifically, in the context of learning efficiency, the Implementable QGNN is most efficient, the CGNN is second, and the Speculative QGNN is least efficient. However, once again, the *de facto* results show the learning efficiency of each 2 processor GNN model is highly similar, with a notable degree of overlap existing between loss values throughout training. Overall, each model is highly efficient in reducing the loss throughout training. These conclusions are based on the same observations as found in the 1 processor case.

Figure 9(g) shows the logarithmic plot of loss values for each GNN in both the 1 processor and 2 processor cases. As a general trend, it appears that the Implementable QGNNs have the highest learning efficiencies, the classical GNNs the middle, and the Speculative QGNNs the lowest. However, comparing model pairs, the 1 processor GNNs are more efficient than their 2 processor counterparts. This is explained by the observation made in the Implementable QGNN Integration section, which stated that the graphs generated in the ground-truth three-particle simulation consist mainly of one-step graphs, i.e., graphs containing nodes that only share edges with a single neighboring node. This explains the difference in efficiency because 2 processor GNNs used on graphs containing nodes with only one-step neighbors are redundant. In particular, consider that each additional processor corresponds to a particular node learning about a node an additional step away, i.e., message passing. In this situation, the second message pass would be redundant because there would be no two-step neighbor to learn about. Furthermore, the second message pass may cause an over-mixing of the node states, decreasing the distinct features of each node and reducing the useful information in the system.

### C. Accuracy

Though it is positive to observe that each GNN has a high learning efficiency, it is equally important to observe the accuracy of the model predictions. To measure this, two methods were used: the first was taking the MSE of the predicted and target next position values of each particle, and the second was taking the percent error of these same values. The former is based on the work of Sanchez-Gonzalez *et al.*, in which the same means of accuracy measurement was utilized.^{16} The latter is based on the observation that the positions, predicted and actual, of each particle consist of considerably small numbers, lying between $[-1, 1]$ a majority of the time. Thus, MSE measurements will already be near zero values, meaning that the MSE method of estimating accuracy is impractical for visual comparison in the context of this study's results. Regardless, some useful information can still be obtained from viewing the MSE plot of each GNN.

Figures 10(g)–10(i) and 10(j)–10(l) show the MSE plots for the 1 processor and 2 processor GNNs, respectively. Figures 10(h) and 10(k) zoom into the y-axis over the range $[0, 0.10]$, while Figs. 10(i) and 10(l) zoom into the x-axis over the range $[0, 25]$. Similar to the loss values for both the 1 processor and 2 processor cases, the MSE value rapidly decreases to near zero. Likewise, the CGNN begins with the highest MSE values and realigns with the Implementable QGNN and Speculative QGNN at approximately ten batches. However, here, the Implementable QGNN and Speculative QGNN immediately begin with near zero MSE values. Overall, the MSE values of the 1 processor GNNs overlap considerably, as do the MSE values of the 2 processor GNNs. Note that as a general trend, the 2 processor GNNs have a higher MSE value throughout training as compared to the 1 processor GNNs. Furthermore, they take more batches to reach the same near zero MSE values already reached by the 1 processor GNNs.

Figures 10(a)–10(c) and 10(d)–10(f) show the percent error plots for the 1 processor and 2 processor GNNs, respectively. Figures 10(b) and 10(e) moderately zoom into the y-axis over the range $[0, 500]$, while Figs. 10(c) and 10(f) zoom further into the y-axis over the range $[100, 160]$. Note that the graphs of percent error are averages, with the average percent error calculated, i.e., updated, at every batch, and the resulting average percent error plotted. Examining the 1 processor GNNs, it is immediately clear that they deviate from the results of the learning efficiency comparisons. In particular, here, the Speculative QGNN has the highest accuracy throughout training, followed by the Implementable QGNN and last by the CGNN. This is in direct contrast to the Implementable QGNN having the greatest learning efficiency, followed by the CGNN, and last by the Speculative QGNN. However, the 2 processor GNNs do not show this deviation but instead follow the pattern established by their learning efficiencies. This difference is a possible consequence of the occasional redundancy of the 2 processor GNNs in the case of this three-particle simulation, as previously described. It is also a possible consequence of the increase in parameters with the inclusion of an additional processor, which does not increase the parameter count equally in each GNN model. Regardless, as shown in Fig. 11(a), in both the 1 processor and 2 processor cases, the percent errors of the Implementable QGNN, Speculative QGNN, and CGNN all decrease at comparable rates, leveling off in close proximity at approximately 110% error for the 1 processor GNNs and 112% error for the 2 processor GNNs. Note that the high degree of inaccuracy shown in these measurements must be considered in the context of all other results. Thus, these measurements do not indicate that the performances of these models are fruitless; for a full analysis, see the Performance and Hyperparameters section.
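The running-average percent error used in these plots can be sketched as follows; the function names and sample values are ours:

```python
# Running average of percent error, updated once per batch as plotted in Fig. 10.

def percent_error(pred, target, eps=1e-12):
    """Percent error of a predicted position against the ground truth."""
    return 100.0 * abs(pred - target) / max(abs(target), eps)

def running_average(values):
    """Average recomputed after every new batch, matching the plotted curves."""
    averages, total = [], 0.0
    for i, v in enumerate(values, start=1):
        total += v
        averages.append(total / i)
    return averages

per_batch = [percent_error(0.5, 1.0), percent_error(2.0, 1.0)]  # 50.0, 100.0
assert running_average(per_batch) == [50.0, 75.0]
```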

It is worth comparing the percent error of all the GNNs together, as shown in Fig. 11(a). Here, there is no particular pattern to the accuracy rankings. The Speculative QGNN with 1 processor performs the best, while its 2 processor counterpart performs the worst. The Implementable QGNNs perform second and third best, with both the 1 and 2 processor cases performing approximately the same. Likewise, the CGNNs perform fourth and fifth best, with the 2 processor case performing slightly better overall than the 1 processor case. Regardless, these differences are ultimately rather minute, with the percent error of all GNNs lying within approximately 10% of one another.

We find a similar situation when testing the trained models on the validation dataset, which is approximately 30% the size of the training dataset. The resulting percent error measurements of running each model on this validation dataset can be seen in Fig. 13; furthermore, the time progression of sampled position predictions made by each model while using this dataset can be seen in Fig. 12. In particular, Fig. 13(a) shows the percent error in predictions for all models while running the validation dataset. The accuracy rankings given by the training results are comparable to the outcomes here. Figure 13(b) demonstrates this; zooming in on the y-axis, the 1 processor Speculative QGNN performs the best, while its 2 processor counterpart performs the worst. In this case, however, the performance of the remaining models is nearly indistinguishable, being almost completely overlapped. For completeness, Figs. 13(c)–13(f) show the zoomed in views of the Implementable QGNN and CGNN cases. In particular, Fig. 13(c) shows the 1 processor GNNs case, and Fig. 13(d) shows the same except with the CGNN results omitted to show that they overlap with the results of the Implementable QGNN. Likewise, Fig. 13(e) shows the 2 processor GNNs case, and Fig. 13(f) shows the same except with the CGNN results omitted.

The accuracy in the context of the validation dataset is notably worse based on the percent error measurements. However, this is not surprising, as the percent error accuracy was initially poor throughout training. Additionally, the overall performance of these models is less similar than their performances during training. In particular, for the final half of percent error results, the 1 processor Speculative QGNN resides at approximately 350%, both cases of the Implementable QGNNs and CGNNs reside at approximately 400%, and the 2 processor Speculative QGNN resides at approximately 500%.

### D. Performance and hyperparameters

In determining the performance of the GNNs, it is necessary to consider their varying measurements of accuracy. In particular, their MSE measurements, combined with the constant overlap of particles observed in Fig. 12(a), would suggest that their accuracies are high. These combined observations indicate that each model is capable of following the general trend of the ground-truth. However, this must also be considered in the context of the percent error measurements and Fig. 12(b), the zoomed in view of the rightmost particle in Fig. 12(a.iii). The method of simulating particle interactions is via generating time steps with a small time increment between consecutive steps. For the ground-truth simulation, this time increment was 0.0001, meaning particle movement behaves approximately to this scale. Figure 12(b) shows that the vertical distance between the closest particles is 0.001 (arbitrary units). This difference is ten times greater than the time increment scale. Thus, this large difference indicates that the percent error measurements are also correct, meaning each model has a notable degree of inaccuracy. Considering the conclusions based on both the MSE error and percent error measurements, the GNN models are, hence, moderately inaccurate. To be precise, they are able to approximate the general trend of particle interactions while being a non-negligible percent off.

The moderate inaccuracy and high learning efficiency of each model suggest that they are able to quickly identify some simple features in the data and accurately make predictions based on them, which results in the high learning efficiency. However, simultaneously, there are more complex variables at work, which the models are inept at determining, resulting in an overall moderate degree of inaccuracy. We found that these complexities are related to the nature of the problem: particles in a box interacting under the influence of gravity. When moving in the x-direction, there are no forces on the particles except for collisions with boundaries or other particles. In contrast, falling, bouncing off another particle, or some combination of these actions is complex behavior. This is exemplified in Fig. 13, where the sudden peaks in the percent error correspond to particles falling under the influence of gravity and particles colliding, whereas the remaining portions of the graphs correspond to particles rolling. This is further confirmed in the raw data, where it was observed that the x-direction values of the particles tended to be considerably more accurate than the y-direction values [see also Fig. 12(b)]. Additionally, this suggests that the percent error in training plateaued around 100% as an effect of the x-direction values being accurate while the y-direction values were incorrect to a notable magnitude.

The moderate inaccuracy of the models does not condemn them as a whole. Rather rudimentary neural network structures were used throughout this project, with similarly simple learning rates. More advanced techniques, such as dropout and decaying learning rates, were purposely avoided. This lack of hyperparameter optimization is likely a large contributor to the current issues with these models. This notion is further supported by Fig. 11, which shows the percent error trajectories of the 2 processor CGNN for the learning rate of 0.01 compared against various learning rate variations. Proceeding down Fig. 11's legend, the first two variations follow the Adam learning rate algorithm described by Kingma and Ba,^{29} with the relevant variables given in the legend accounting for the difference in performance. The third variation is simply a decrease in the learning rate's magnitude. As shown by the first learning rate variation, a simple adjustment of this parameter already results in an increase in accuracy. As mentioned previously, the models implemented in this study were purposely kept simple for the sake of ease and efficiency in implementation. Thus, the basic learning rate of 0.01 was used throughout training and testing.
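For reference, a single Adam update step (Kingma and Ba^{29}) for one parameter is sketched below with the baseline learning rate of 0.01 used in this study; the decay rates are the standard defaults from the original paper, not values reported here:

```python
# One Adam update step for a single scalar parameter (standard default decay rates).

def adam_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)              # bias corrections for step t
    v_hat = v / (1 - b2 ** t)
    theta -= lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

theta, m, v = 1.0, 0.0, 0.0
theta, m, v = adam_step(theta, grad=2.0, m=m, v=v, t=1)
# the first step moves theta by roughly -lr regardless of the gradient's scale
assert abs(theta - 0.99) < 1e-6
```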

As described at the beginning of this study, following proof of a quantum GNN analog, the results of the CGNN and Speculative QGNN were obtained to compare against the Implementable QGNN. It is promising that the Implementable QGNN has a greater learning efficiency than both the CGNNs and Speculative QGNNs. Likewise, during training, the Implementable QGNN's accuracy performance appears to offer an advantage over the CGNNs, in addition to the Speculative QGNN (the ideal quantum-classical GNN analog) in situations containing redundancy. A purely theoretical approach to proving such an advantage is difficult to obtain; however, an empirical approach can suffice. Specifically, the Implementable QGNN's superiority is supported when the number of parameters in each model is considered. Examining the best set of performances, i.e., the 1 processor GNNs, the Implementable QGNN has 58 parameters, the Speculative QGNN has 76 parameters, and the CGNN has 431 parameters. Thus, the 1 processor Implementable QGNN not only has a greater learning efficiency and degree of accuracy than the 1 processor CGNN during training, but also gives this performance with approximately 1/7th the number of parameters contained in the 1 processor CGNN. However, the identical performances of the CGNN and Implementable QGNN when tested on validation data suggest that this question requires further experimentation to reach any definite conclusion. Furthermore, this study was completed using a quantum circuit simulator and, thus, would have to be implemented on actual quantum hardware to determine any real advantage.

## V. CONCLUSION

The aim of this project was to construct quantum analogs to the classical graph neural network, as based on the work of Sanchez-Gonzalez *et al.*^{16} That goal was realized via two quantum graph neural networks: one that was speculative and one that was implementable. These QGNNs were compared alongside the CGNN in the task of particle interaction simulation. For simplicity, the case of three particles contained within a box was used to generate the training data; likewise, the most basic form of these GNNs was implemented. Two sets of GNNs were tested. The first contained a single processor, and the second contained two processors. Overall, the models proved capable of learning simple characteristics in the data, resulting in a high learning efficiency. However, they were unable to determine the more complex behaviors that were simultaneously occurring, resulting in a moderate inaccuracy in predictions. These conclusions are evident in the discrepancy between the predictions in the x and y values, in which the x value predictions tended to be more accurate because they are governed by simpler behaviors.

In addition to the successful realization of QGNNs, the results of this study suggest that the Implementable QGNN could have an advantage over CGNNs in learning efficiency and accuracy. However, further testing is required to confirm this. Furthermore, it is likely that the overall moderate inaccuracy in predictions is not wholly a fault of the models but perhaps a result of not fine tuning the hyperparameters. This leads to a path for a potential future study. In particular, future research should implement these models under a variety of hyperparameters, observing the consequences on learning efficiency and accuracy.

## ACKNOWLEDGMENTS

The views expressed are those of the authors and do not reflect the official guidance or position of the United States Government, the Department of Defense, the United States Air Force, or the Griffiss Institute.

The appearance of external hyperlinks does not constitute endorsement by the United States Department of Defense of the linked websites, or the information, products, or services contained therein. The Department of Defense does not exercise any editorial, security, or other control over the information you may find at these locations.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Benjamin Collis:** Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **Saahil Patel:** Conceptualization (equal); Formal analysis (equal); Methodology (equal); Software (equal); Visualization (equal). **Daniel Koch:** Conceptualization (supporting); Methodology (supporting); Writing – review & editing (equal). **Massimiliano Cutugno:** Software (supporting). **Laura Wessing:** Funding acquisition (equal); Project administration (equal); Resources (equal); Supervision (equal). **Paul M. Alsing:** Funding acquisition (equal); Project administration (equal); Resources (equal); Supervision (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## REFERENCES

*Graph Representation Learning*