With the increasing prominence of data security in cloud storage, we propose a practical and robust cloud storage scheme that uses quantum random numbers as encryption keys, disperses the keys using Shamir's secret sharing scheme, applies erasure coding to the ciphertext, and securely transmits the data to the distributed clouds through networks protected by quantum key distribution. This system offers several key advantages, including quantum-level security, fault tolerance, and storage space savings. To validate its feasibility, we conduct comprehensive experimental tests covering essential functionalities such as encryption/decryption, key preservation, and data storage. By successfully demonstrating the effectiveness of our proposal, we aim to accelerate the application of quantum technology in cloud storage.
I. INTRODUCTION
As the explosion of big data continues unabated, cloud storage has evolved into a critical network service.1 Its appeal lies in facilitating seamless multi-device synchronization and enabling data backup, sharing, and collaboration in both personal and business scenarios. However, as more individuals and businesses entrust sensitive information to cloud services, data security and privacy become paramount. The potential consequences of data breaches or unauthorized access are severe, ranging from financial losses to reputational damage. Given the multitude of cyber threats and increasingly sophisticated attack techniques, robust security measures are essential to protect against these risks.
One promising strategy to fortify cloud storage security harnesses the potential of quantum communication technology. Quantum communication, notably quantum key distribution (QKD), offers secure communication modalities underpinned by the fundamental principles of quantum mechanics.2–5 After undergoing laboratory testing6–11 and verification in various field environments,12–15 QKD has proved applicable to information security and has gradually found widespread applications.16–21 To address the issue of secure cloud storage, a single-password authentication-secret sharing scheme based on QKD networks and Shamir secret sharing has been experimentally validated.22 A complementary approach, which introduces an information-theoretically secure integrity protection scheme with third-party timestamp verification, resolves the issue of data owners deceiving end-users.23 A distributed secure storage system based on this scheme holds promise for the secure storage of human genome data and for routine diagnosis based on whole-genome sequencing.24 However, because these schemes rely on the Shamir method alone for data protection, the data are fully revealed to anyone who obtains a number of shares at or above the threshold. Furthermore, the pursuit of information-theoretic security imposes substantial costs, such as the significant storage overhead of the Shamir method and the difficulty of meeting high-bandwidth requirements when QKD is combined with one-time-pad encryption for data transmission.
Addressing these constraints, our study presents a quantum-secure, fault-tolerant distributed cloud storage system. The data to be stored are encrypted using symmetric cryptography with quantum random numbers as keys. These keys are dispersed and saved using Shamir secret sharing,25 and the ciphertext is distributed and stored on different cloud servers using erasure coding, as proposed in Ref. 26. Our proposal offers the advantages of quantum security, fault tolerance, and storage space savings. Moreover, QKD networks can be employed to transmit the data blocks distributed among the various cloud servers, providing a dedicated virtual private network at the physical layer for data isolation between cloud storage nodes. Experimental demonstrations show that this scheme is well suited for implementation on already-installed QKD metropolitan area networks.
II. PROTOCOL
In our scheme, the user encrypts data to be stored using symmetric cryptography, with quantum random numbers generated as keys. The encryption keys are dispersed using Shamir secret sharing, with some parts saved locally and others saved together with the stored data on the cloud. Upon receiving the ciphertext, the server employs erasure coding to divide it into multiple data blocks, which are then stored in various cloud storage nodes. Leveraging QKD networks, the data blocks are securely transmitted to different cloud storage nodes, guaranteeing both the security of the transmission process and the isolation between the individual data blocks. In the specific execution process, our scheme can be divided into two phases, as shown in Fig. 1: the upload phase, where the user preprocesses data and uploads them to the server for cloud storage, and the download phase, where the user retrieves the data from the server and recovers them to their original form locally.
Schematic of our proposal, which can be divided into the upload phase and the download phase. EC SERVER represents the erasure coding server.
A. Upload phase
The upload phase includes both client-side operations and server-side operations. The encryption of the data to be stored and the protection of the key are mainly completed on the client-side, while the server-side is mainly responsible for performing erasure coding on the encrypted ciphertext and storing it in different cloud storage nodes. The main process is shown in Fig. 2.
Operation flow during the upload phase. EC: erasure coding, D: data to be stored, ENC: encryption, MAC: message authentication code, QRNG: quantum random number generator, SS: Shamir secret sharing, QKD: quantum key distribution, Sym-Crypto: symmetric cryptography.
We first introduce the key operations at the client side. Two strings of quantum random bits, K and R, are first obtained from a quantum random number generator. K is used as the encryption key, while R serves both as the authentication key and as the key-encryption key for K. KR is generated by XORing K and R. Utilizing specific rules, the key R is processed to generate a position n1. The user protects the key R with the Shamir secret sharing (SS) method, splitting it into shares to reduce the risk of a locally stored key being stolen through attack. Let us assume that the parameters are SS (3, 4), meaning the threshold is k = 3 and the total number of shares is n = 4. According to the SS scheme, key R is split into secret shares R1, R2, R3, and R4. Three or more shares are required to reconstruct the key R, while any smaller number of shares is insufficient. The user securely stores a designated number of shares locally to prevent attackers from acquiring enough shares to reconstruct key R. For example, the user keeps the two secret shares R1 and R2, while R3 and R4 are saved externally. The user XORs secret shares R1 and R2 to generate R12 and then generates a position n2 from R12 using specific rules. R4 and n2 are combined to form data R′4, where R′4 = (R4 ‖ n2). M14 is generated by computing an integrity-verification value over R′4 using R1, and M24 is generated likewise using R2. M14, M24, and R′4 are combined to form data B, where B = (R′4 ‖ M14 ‖ M24).
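For concreteness, the following is a minimal sketch of this SS (3, 4) split and its reconstruction, assuming Shamir's scheme is run byte-wise over the prime field GF(257); share values can equal 256, so a production implementation would instead work in GF(2^8). The share format (an evaluation point paired with a list of values) and all names are illustrative.

```python
import secrets

P = 257  # prime modulus; each byte of R is treated as an element of GF(257)

def ss_split(secret: bytes, k: int = 3, n: int = 4):
    """Split `secret` into n shares; any k of them reconstruct it."""
    shares = [(x, []) for x in range(1, n + 1)]
    for byte in secret:
        # random polynomial f of degree k-1 with f(0) = byte
        coeffs = [byte] + [secrets.randbelow(P) for _ in range(k - 1)]
        for x, vals in shares:
            vals.append(sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P)
    return shares  # R1..R4 as (x, value list) pairs

def ss_reconstruct(shares):
    """Recover the secret byte-wise by Lagrange interpolation at x = 0."""
    xs = [x for x, _ in shares]
    out = []
    for i in range(len(shares[0][1])):
        acc = 0
        for x_j, vals in shares:
            num = den = 1
            for x_m in xs:
                if x_m != x_j:
                    num = num * (-x_m) % P
                    den = den * (x_j - x_m) % P
            acc = (acc + vals[i] * num * pow(den, P - 2, P)) % P
        out.append(acc)
    return bytes(out)

R = secrets.token_bytes(32)               # stand-in for the quantum random key R
R1, R2, R3, R4 = ss_split(R)
assert ss_reconstruct([R1, R2, R3]) == R  # any three shares recover R
```

Any two shares, by contrast, leave one coefficient of the degree-two polynomial free and thus reveal nothing about R, which is why keeping only R1 and R2 locally is safe.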
For the data operations at the client side, the ciphertext E is generated by symmetrically encrypting the plaintext D using the key K. Symmetric algorithms with sufficiently long keys, such as AES-256, are widely regarded as quantum-secure. Subsequently, KR is inserted into ciphertext E at position n1 to form the modified ciphertext E′. R3 is inserted into ciphertext E′ at position n2 to form data E″. The user performs integrity verification on data E″ using key R, calculates a value M, and appends it to the end of data E″, forming the data N = (E″ ‖ M). The user sends data N and data B to the erasure coding server. Data N are processed by the server and stored in multiple cloud storages, while data B are backed up locally by the erasure coding server. When sending data N and data B to the erasure coding server, the user employs encrypted transmission, with the QKD network providing the key to ensure quantum security during transmission. At the current stage of large-scale QKD networks, trusted relays are essential components; for proof-of-principle experiments, user data can be transmitted to cloud storage nodes via point-to-point QKD links without relays. The user then deletes K, R, R3, R4, n1, and n2 and keeps only R1 and R2. The security of the secret shares R1 and R2 must be ensured, and at least one of them must not be lost.
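The client-side packing can be sketched as follows, in the same illustrative spirit: we assume HMAC-SHA256 for the integrity value M, a hash-based rule for the positions n1 and n2 (the "specific rules" above are not specified), and a toy keystream cipher as a stand-in for a quantum-secure cipher such as AES-256; the share R3 and the XOR value R12 are byte-string stand-ins for the SS outputs.

```python
import hashlib, hmac, secrets

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    # keystream stand-in for a real cipher such as AES-256; XORing twice decrypts
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def position(key: bytes, modulus: int) -> int:
    # the "specific rules" are unspecified; we assume a hash-based rule
    return int.from_bytes(hashlib.sha256(key).digest(), "big") % modulus

K, R = secrets.token_bytes(32), secrets.token_bytes(32)
KR = bytes(a ^ b for a, b in zip(K, R))          # KR = K XOR R
D = b"data to be stored"
E = toy_encrypt(K, D)                            # E = ENC_K(D)

n1 = position(R, len(E) + 1)                     # position n1 derived from R
E_prime = E[:n1] + KR + E[n1:]                   # E' = E with KR inserted at n1

R12 = secrets.token_bytes(32)                    # stand-in for R1 XOR R2
R3 = secrets.token_bytes(33)                     # stand-in for share R3 (index byte + 32 value bytes)
n2 = position(R12, len(E_prime) + 1)             # position n2 derived from R12
E_dprime = E_prime[:n2] + R3 + E_prime[n2:]      # E'' = E' with R3 inserted at n2

M = hmac.new(R, E_dprime, hashlib.sha256).digest()  # integrity value M over E''
N = E_dprime + M                                 # N = (E'' || M), sent to the server
```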
At the server side, the erasure coding server receives data N and data B sent by the user. The backup data B are stored locally by the erasure coding server. Data N are divided into blocks according to the erasure coding scheme. Assume that the erasure coding parameters are EC (3, 5), meaning the threshold is 3 and there are 5 blocks in total. Data N are divided into Di, i = 1, 2, 3, where N = (D1 ‖ D2 ‖ D3). The erasure coding server applies the erasure coding matrix to the data blocks Di, i = 1, 2, 3 and generates the redundant blocks Cj, j = 1, 2. Finally, the five blocks are randomly sent to the distributed cloud storage server cluster for dispersed storage, preferably with each block placed on a different storage node. Similarly, during the data exchange between the erasure coding server and the cloud storage nodes, QKD not only enhances security through encryption but also enables effective isolation among the different storage nodes.
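A minimal sketch of the EC (3, 5) encode/decode step follows, assuming a Reed-Solomon-style construction over GF(257): for each byte position, the k data blocks give the values of a degree-(k − 1) polynomial at points 1..k, and the redundant blocks are that polynomial's values at points k + 1..n. Because GF(257) values may equal 256, blocks are kept as integer lists here; a production system would work in GF(2^8).

```python
P = 257

def interp_at(points, x0):
    """Lagrange-evaluate at x0 the polynomial through `points` [(x, y), ...]."""
    acc = 0
    for x_j, y_j in points:
        num = den = 1
        for x_m, _ in points:
            if x_m != x_j:
                num = num * (x0 - x_m) % P
                den = den * (x_j - x_m) % P
        acc = (acc + y_j * num * pow(den, P - 2, P)) % P
    return acc

def ec_encode(data_blocks, n=5):
    """data_blocks: k equal-length blocks D1..Dk; returns n blocks (data + parity)."""
    k = len(data_blocks)
    blocks = [list(b) for b in data_blocks]        # data blocks sit at points x = 1..k
    for x0 in range(k + 1, n + 1):                 # redundant blocks at x = k+1..n
        pts = lambda i: [(x + 1, blocks[x][i]) for x in range(k)]
        blocks.append([interp_at(pts(i), x0) for i in range(len(blocks[0]))])
    return blocks

def ec_decode(available, k=3):
    """available: any k surviving (x, block) pairs; returns D1..Dk."""
    length = len(available[0][1])
    return [[interp_at([(x, blk[i]) for x, blk in available], x0)
             for i in range(length)] for x0 in range(1, k + 1)]

D1, D2, D3 = [10, 20], [30, 40], [50, 60]          # toy fragments of data N
blocks = ec_encode([D1, D2, D3])                   # 3 data + 2 redundant blocks
survivors = [(1, blocks[0]), (4, blocks[3]), (5, blocks[4])]
assert ec_decode(survivors) == [D1, D2, D3]        # any 3 of the 5 recover N
```

Any k of the n blocks lie on the same degree-(k − 1) polynomial and therefore suffice to re-interpolate the data, which is exactly the fault tolerance exploited in the download phase.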
B. Download phase
The download phase also includes server-side operations and client-side operations. The server-side mainly retrieves data blocks from the cloud storage nodes and reconstructs them through erasure coding operations. The decryption of data into plaintext is performed on the client-side. The main process is shown in Fig. 3.
Operation flow during the download phase. EC: erasure coding, D: data to be stored, DEC: decryption, MAC: message authentication code, QRNG: quantum random number generator, SS: Shamir secret sharing, QKD: quantum key distribution, Sym-Crypto: symmetric cryptography.
On the server side, after receiving a user request, the erasure coding server randomly requests data blocks from the cloud storage nodes, retrieving at least the threshold number of blocks. After enough data blocks have been received, the corresponding inverse matrix is used to recover the data blocks Di, i = 1, 2, 3, which are then merged into data N, where N = (D1 ‖ D2 ‖ D3). The data N are then encrypted with keys from the QKD network and sent back to the client. If the user also requests data B, the erasure coding server sends the locally saved data B back to the user. At the client side, the received data N are parsed into ciphertext E″ and the verification value M, where N = (E″ ‖ M). The user XORs the secret shares R1 and R2 remaining in hand to generate R12 and derives the position n2 from R12. The user extracts secret share R3 from ciphertext E″ at position n2 and reconstructs the original secret R under the Shamir (3, 4) secret sharing scheme (k = 3, n = 4) with R1, R2, and R3. The user performs integrity verification on ciphertext E″ using key R, calculates a value M′, and compares it with M. If they match, the next step is executed; if not, the operation is terminated and an error is reported. The user generates the position n1 from R, extracts KR from ciphertext E″ at position n1, and XORs KR with R to restore key K. Ciphertext E″ is then filtered to remove R3 starting from position n2, forming ciphertext E′. Ciphertext E′ is filtered again to remove KR starting from position n1, forming ciphertext E. The user finally decrypts ciphertext E with key K, recovering the plaintext data D.
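The client-side recovery mirrors the packing sketch in Sec. II A; continuing with the variables and helpers defined there (position, toy_encrypt, and the same length assumptions), a minimal sketch is given below. Note that this sketch removes R3 before extracting KR, one consistent way of keeping the insertion positions aligned; in the full scheme R is first rebuilt from R1, R2, and R3 via ss_reconstruct as sketched earlier.

```python
import hashlib, hmac

MAC_LEN, KR_LEN, SHARE_LEN = 32, 32, 33            # assumptions from the upload sketch

E_dprime, M_recv = N[:-MAC_LEN], N[-MAC_LEN:]      # parse N = (E'' || M)
n2 = position(R12, len(E_dprime) - SHARE_LEN + 1)  # regenerate n2 from R1 XOR R2
R3_out = E_dprime[n2:n2 + SHARE_LEN]               # extract secret share R3
E_prime = E_dprime[:n2] + E_dprime[n2 + SHARE_LEN:]
# in the full scheme, R = ss_reconstruct([R1, R2, R3]); R is already in hand here
assert hmac.compare_digest(
    hmac.new(R, E_dprime, hashlib.sha256).digest(), M_recv)  # M' == M check
n1 = position(R, len(E_prime) - KR_LEN + 1)        # regenerate n1 from R
KR_out = E_prime[n1:n1 + KR_LEN]                   # extract KR
E_out = E_prime[:n1] + E_prime[n1 + KR_LEN:]       # strip KR to obtain E
K_out = bytes(a ^ b for a, b in zip(KR_out, R))    # restore K = KR XOR R
assert toy_encrypt(K_out, E_out) == D              # the keystream cipher is its own inverse
```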
Our solution exhibits excellent fault tolerance: erasure coding ensures data availability even if some of the data blocks stored in the cloud are damaged. Furthermore, our scheme tolerates faults in local key preservation, acknowledging that users may lose or corrupt key fragments. In our approach, users are required to securely store a certain number of key fragments (R1 and R2). Even if one fragment (either R1 or R2) is lost or damaged, our system remains capable of data recovery and decryption, preserving data accessibility. A detailed procedural description can be found in the Appendix.
III. EXPERIMENT TEST AND RESULTS
We built a minimal test system to verify the functionality and performance of our proposal. In our system, both erasure coding processing and distributed cloud storage ran normally. If the number of selected pieces was at least the threshold, the original data could be reconstructed, while selecting fewer pieces resulted in a system error and no valid output. Instead of real-time quantum random number generation, we pre-loaded random numbers into the testing system, which does not affect the protocol-performance results. Commercial quantum random number generator (QRNG) devices27,28 can easily meet the bit-rate requirements of our system in practical applications. We tested both the processing time and the storage space occupancy, the two quantities users are most likely to be concerned about. The results are depicted in Fig. 4.
Experimental results. (a) The measured processing time of EC (3, 5) under different input data sizes. (b) The measured processing time for the fixed 100 MB data file under different selections of erasure coding parameters. (c) The measured storage space occupancy for storing files of various data sizes using different erasure coding parameters.
The total processing time includes the transmission time and the erasure coding processing time. The transmission time depends on the user's network bandwidth, while the erasure coding processing time depends mainly on the system performance. The processing time of erasure coding can be further divided into fragmentation delay and reconstruction delay. We tested the processing time of EC (3, 5) for files of different sizes; the results are shown in part (a) of Fig. 4. As the file size increases, the processing time increases linearly. Reconstruction is faster than fragmentation, consistent with the smaller amount of data it processes. Transferring a 100 MB file over a network with a typical user bandwidth of 100 Mbps takes about 8 s (often longer under real network conditions), so the additional few seconds of erasure coding processing are acceptable.
Our proposal does not restrict the choice of erasure coding parameters. In our testing, we measured the processing time of erasure coding for the parameter sets EC (2, 3), EC (3, 5), and EC (4, 6), chosen according to the system configuration. The results are shown in part (b) of Fig. 4. Processing a 100 MB data file with EC (2, 3), EC (3, 5), or EC (4, 6) takes about 3 s for fragmentation and about 2 s for reconstruction; the choice among these parameter sets does not significantly affect the processing time. However, if the erasure coding parameters were increased further, the processing time would be expected to grow.
The storage space required for erasure coding storage was also measured, as shown in part (c) of Fig. 4. Data sizes ranging from 4 to 100 MB were tested with erasure coding parameters EC (2, 3), EC (3, 5), and EC (4, 6). The results indicate that, regardless of the erasure coding parameters used, the storage space required by our scheme does not exceed twice the size of the original file, making it a very space-saving solution. It is worth noting that the overall space requirement is dominated by the EC method used for storing the ciphertext; although the secret sharing method is used for storing the keys, the keys are significantly smaller than the data.
IV. DISCUSSION
In Sec. III, the test results indicate that our solution is capable of achieving data storage or recovery within a reasonable time frame while significantly reducing storage space requirements, in line with the theoretical expectations. It is worth noting that our scheme uses keys generated by QKD to protect the data transmission links to the cloud storage servers, achieving secure isolation between data blocks that complements the dispersed storage in the cloud. This isolation helps prevent attackers from maliciously collecting multiple pieces of data, further reducing the risk of information leakage. Quantum communication networks based on QKD have been established and are gradually moving to large-scale application. Quantum metropolitan networks offer secure key services to a whole city within a radius of 45 km, achieving secure key rates of up to 49.5 kbps.20,21 Quantum intercity networks connect cities separated by thousands of kilometers,19 even across different countries,29 and enable cross-continental quantum communication through the relay of quantum satellites.30 The secure key rate of a quantum intercity network reaches 80 kbps. Quantum encryption devices support an encryption bandwidth of up to 12 Gbps.31 Assuming AES-256 is used for encryption and the key is refreshed 100 times per second,31 a far higher refresh rate than in classical practice, our solution can support a transmission bandwidth of 37.5 Gbps, effectively meeting the current demands of cloud storage. Therefore, our proposal is very suitable for implementation on established quantum communication networks.
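The 37.5 Gbps figure can be recovered from these numbers as follows (our reconstruction of the arithmetic, which the cited works do not spell out): each encryptor consumes 256-bit keys at 100 refreshes per second, so the 80 kbps intercity key rate sustains 3.125 such 12 Gbps encryptors,

\[
  256\,\mathrm{bit} \times 100\,\mathrm{s^{-1}} = 25.6\,\mathrm{kbps\ per\ encryptor},
  \qquad
  \frac{80\,\mathrm{kbps}}{25.6\,\mathrm{kbps}} \times 12\,\mathrm{Gbps}
  = 3.125 \times 12\,\mathrm{Gbps} = 37.5\,\mathrm{Gbps}.
\]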
In the following, we elucidate the theoretical merits of our proposal by undertaking a comparative analysis of three distinct cloud storage schemes: the commonly used storage mirroring (SM) scheme,32 the Shamir secret-sharing-based (SS) scheme,25 and our erasure-code-based (EC) scheme.
From the data confidentiality perspective. Assuming the source data are encrypted and the encryption algorithm is secure and cannot be cracked by methods such as quantum computing, the security of the three schemes is the same. If the source data are not encrypted, the confidentiality of the three schemes differs. Let the probability of data leakage at a single storage node be p (p ≪ 1). In the SM scheme, the risk of complete data leakage is approximately p multiplied by the number of copies, i.e., at least 2p. For the SS and EC schemes, the risk of complete data leakage is $\sum_{i=k}^{n} \binom{n}{i} p^{i} (1-p)^{n-i} \approx \binom{n}{k} p^{k}$, where k is the threshold and n is the total number of data blocks. For small p, the SS and EC schemes therefore have the advantage. Considering that each data block in the SS scheme is information-theoretically secure and leaks no partial information, whereas each data block in the EC scheme without source encryption leaks partial information, the ranking from the data confidentiality perspective is SS scheme > EC scheme > SM scheme.
From the data availability perspective. Let the probability of damage to a single storage node be q (q ≪ 1). In the SM scheme, the probability of data unavailability is $q^{m}$, where m is the copy number. For the SS and EC schemes, the probability of data unavailability is $\sum_{i=n-k+1}^{n} \binom{n}{i} q^{i} (1-q)^{n-i}$, i.e., the sum over all cases in which more than n − k storage nodes are damaged. Therefore, the robustness of the SM scheme is stronger than that of the SS and EC schemes, and the ranking from the data availability perspective is SM scheme > EC scheme = SS scheme.
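As a quick numeric check of the two rankings above (a sketch with illustrative numbers of our choosing: p = q = 0.01, triple mirroring for SM as a common replication default, and (k, n) = (3, 5) for SS/EC):

```python
from math import comb

p = q = 0.01
k, n, copies = 3, 5, 3

leak_sm = 1 - (1 - p) ** copies          # ~ copies * p for small p
leak_ss_ec = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

unavail_sm = q ** copies                 # data lost only if every copy is damaged
unavail_ss_ec = sum(comb(n, i) * q**i * (1 - q)**(n - i)
                    for i in range(n - k + 1, n + 1))

print(f"leakage:        SM {leak_sm:.2e}  vs  SS/EC {leak_ss_ec:.2e}")
print(f"unavailability: SM {unavail_sm:.2e}  vs  SS/EC {unavail_ss_ec:.2e}")
# leakage:        SM 2.97e-02  vs  SS/EC 9.85e-06
# unavailability: SM 1.00e-06  vs  SS/EC 9.85e-06
```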
From the storage space requirement perspective. In the SM scheme, the total storage equals data size × copy number (copy number ≥ 2). In the SS scheme, the total storage equals data size × share number (share number ≥ 3), since each Shamir share is as large as the original data. For the EC scheme, the total storage equals data size × n/k (e.g., 5/3 of the data size for EC (3, 5)), which stays below twice the data size. As mentioned earlier, the space required for storing the keys is negligible. Therefore, the ranking from the storage space requirement perspective is EC scheme < SM scheme < SS scheme.
The results of our comparative analysis are summarized in Table I. The EC scheme we propose stands out as a well-balanced solution in terms of performance while ensuring optimal storage efficiency, an aspect of significant importance in the big data era with its exponential growth in data size.
The comparison of three different cloud storage schemes. SM: storage mirroring scheme, SS: Shamir secret-sharing based scheme, and EC: erasure code-based scheme.
Scheme | Data confidentiality | Data availability | Storage efficiency
---|---|---|---
SM | ★ | ★★★ | ★★
SS | ★★★ | ★★ | ★
EC | ★★ | ★★ | ★★★
V. CONCLUSION
In summary, we propose a quantum-secure, fault-tolerant distributed cloud storage system. The user encrypts the data using symmetric cryptographic techniques with quantum random numbers as the key. The key is dispersed and saved using Shamir's secret sharing mechanism. The ciphertext is distributed and stored on different cloud servers using erasure coding, with the transmission links secured and isolated by QKD. This solution offers the advantages of quantum security, fault tolerance, and storage space savings. We have established a minimal system to test and validate the feasibility of our proposal. The results demonstrate that our solution executes within a time acceptable to users while significantly reducing the storage space required for the data. Our work takes a significant step toward the application of quantum technology in secure cloud storage.
Quantum secret sharing (QSS), as a convergence of quantum and cryptographic security technologies, is currently a prominent research area in quantum information technology.33 QSS schemes not only ensure the security of secret splitting but also guarantee the secure distribution of secrets. In scenarios necessitating multi-user key sharing, the integration of QSS may present an effective solution, which we leave for future research.
ACKNOWLEDGMENTS
We would like to thank Dr. Qiang Huang for his selfless support in both methodology and experimentation. We also thank Professor Xiongfeng Ma from Tsinghua University and Dr. Min-Han Li from CAS Quantum Network Co. Ltd. for helpful discussions. This work was supported by the Key R & D Plan of Shandong Province (Grant No. 2020CXGC010105) and the China Postdoctoral Science Foundation (Grant No. 2021M700315). J.C. acknowledges the support from the National Natural Science Foundation of China (Grant No. 12174216).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
C.-L.M., D.-D.L., and Y.L. contributed equally to this paper.
Chun-Li Ma: Methodology (lead); Visualization (equal). Dong-Dong Li: Investigation (equal); Visualization (equal); Writing – Original draft preparation (lead); Funding acquisition (equal). Yalin Li: Investigation (lead); Methodology (equal); Data curation (supporting). Yinghao Wu: Software (equal). Song-Yan Ding: Investigation (equal). Jun Wang: Investigation (equal); Resources (equal). Pei-Yuan Li: Methodology (equal); Data curation (equal). Song Zhang: Data curation (equal). Junjie Chen: Investigation (equal); Writing – review & editing (equal); Funding acquisition (supporting). Xiaoxing Zhang: Investigation (equal); Resources (equal). Jia-Yong Wang: Investigation (equal); Resources (equal). Jin Li: Investigation (equal). Qiang Li: Software (equal); Methodology (supporting). Zhi-Tong Chen: Methodology (supporting). Lei Zhou: Resources (equal). Mei-Sheng Zhao: Supervision (equal); Investigation (equal); Writing – review & editing (equal). Yong Zhao: Supervision (lead); Conceptualization (lead); Resources (lead); Funding acquisition (lead); Investigation (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
APPENDIX: SPECIAL CASE FOR THE DOWNLOAD PHASE
In this appendix, we present the data recovery and decryption procedure in the event of loss or damage of either R1 or R2. The user-side operational steps, depicted in Fig. 5, are described below. Without loss of generality, let us assume that R1 is lost. The user initiates a request to the EC server for the retrieval of data N and the locally stored backup data B. The received data B are parsed into three components, R′4, M14, and M24, where B = (R′4 ‖ M14 ‖ M24). To ensure data integrity, the user validates R′4 using the remaining secret share R2. If M24 does not match the computed value M′24, the operation is terminated and a failure is reported. The received data N are deconstructed into the encrypted data E″ and its MAC value M. R′4 is deconstructed into R4 and the position n2, which is used to extract R3 from the ciphertext E″. Using the Shamir (3, 4) scheme with R2, R3, and R4, the correct key R can be reconstructed. The integrity of E″ is then verified using the key R. If the computed value M′ does not match M, the operation is terminated and a failure is reported. Utilizing the specific rules, the key R is processed to generate the position n1, and the key data KR are extracted from the ciphertext E″ at position n1. The key K is reconstructed by performing an XOR operation between KR and R. The ciphertext E is formed by sequentially removing the data R3 at position n2 and the data KR at position n1 from E″. The user finally decrypts the ciphertext E with K, recovering the original plaintext data D.
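A minimal sketch of this fallback follows, under the same illustrative assumptions as the earlier sketches (33-byte share stand-ins, HMAC-SHA256 for M14/M24, and n2 serialized as four big-endian bytes inside R′4; the serialization is our encoding choice, not specified in the text).

```python
import hashlib, hmac, secrets

SHARE_LEN, MAC_LEN = 33, 32

# upload-time construction of B on the client (for context)
R1, R2, R4 = (secrets.token_bytes(SHARE_LEN) for _ in range(3))  # share stand-ins
n2 = 17                                                          # position stand-in
R4_prime = R4 + n2.to_bytes(4, "big")                 # R4' = (R4 || n2)
M14 = hmac.new(R1, R4_prime, hashlib.sha256).digest()
M24 = hmac.new(R2, R4_prime, hashlib.sha256).digest()
B = R4_prime + M14 + M24                              # B = (R4' || M14 || M24)

# download-time fallback: R1 is lost, so R4' is validated with R2 via M24
R4p, M14_r, M24_r = B[:-2 * MAC_LEN], B[-2 * MAC_LEN:-MAC_LEN], B[-MAC_LEN:]
assert hmac.compare_digest(hmac.new(R2, R4p, hashlib.sha256).digest(), M24_r)
R4_rec = R4p[:SHARE_LEN]                              # recovered share R4
n2_rec = int.from_bytes(R4p[SHARE_LEN:], "big")       # recovered position n2
assert (R4_rec, n2_rec) == (R4, n2)
# R3 is then cut out of E'' at n2_rec, R is rebuilt from (R2, R3, R4) via
# ss_reconstruct, and the remaining steps match the normal download flow
```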
Operation flow during the download phase when R1 is lost or damaged.