Data deduplication is the elimination of redundant, duplicated data. Each chunk of data is represented by a unique value, and the original file that contains the chunk refers to it through that value. Deduplication techniques have mainly been applied in cloud-based systems to reduce the storage space that data occupies and to make better use of connection bandwidth. In this paper, we introduce a data deduplication optimization technique applied to the data storage of cloud-based systems. The proposed technique combines Source-Based and In-Line deduplication. The Source-Based method is applied at the source that holds the data, whereas the In-Line method is applied in RAM, where the data resides momentarily before the I/O write. Moreover, the proposed technique applies a Content-Based chunking algorithm with variable-size chunks, using the Rabin-Karp Rolling Hash (RKRH). RKRH is used here as a data chunking algorithm that breaks data files into segments of different sizes. In general, the technique calculates the hash value of each data chunk, which serves as the chunk's fingerprint. A chunk availability check then determines whether the chunk already exists in storage: if it does, only a reference to the existing chunk is added; if it does not, the chunk is stored with its hash value as the key. The proposed technique also compresses each data chunk to reduce redundancy within the chunk itself. In practice, the proposed technique provides a data deduplication ratio of 33 percent and an average upload latency of five seconds. Finally, the proposed approach can be used with any type of data file, since it treats the file as a byte stream.
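The pipeline described in the abstract (content-defined chunking with a Rabin-Karp rolling hash, fingerprinting, an availability check, and per-chunk compression) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the window size, polynomial base, modulus, boundary mask, and chunk size limits are assumed values, SHA-256 and zlib stand in for the unspecified fingerprint hash and compressor, and the names `split_chunks` and `DedupStore` are hypothetical.

```python
import hashlib
import zlib
from typing import Dict, List

# All constants below are illustrative assumptions, not values from the paper.
WINDOW = 48            # bytes covered by the rolling hash window
BASE = 257             # polynomial base of the Rabin-Karp hash
MOD = (1 << 31) - 1    # modulus of the Rabin-Karp hash
MASK = (1 << 11) - 1   # cut when the low 11 bits of the hash are zero
MIN_CHUNK = 512        # never cut a chunk shorter than this
MAX_CHUNK = 8192       # always cut once a chunk reaches this size


def split_chunks(data: bytes) -> List[bytes]:
    """Split a byte stream into variable-size, content-defined chunks."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, b in enumerate(data):
        size = i - start + 1
        if size > WINDOW:
            # Remove the contribution of the byte that slid out of the window.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0  # restart the window inside the next chunk
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class DedupStore:
    """In-memory stand-in for the cloud storage back end (hypothetical)."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}     # fingerprint -> compressed chunk
        self.files: Dict[str, List[str]] = {}  # file name -> ordered fingerprints

    def put(self, name: str, data: bytes) -> None:
        recipe = []
        for chunk in split_chunks(data):
            fp = hashlib.sha256(chunk).hexdigest()  # assumed fingerprint hash
            if fp not in self.chunks:
                # New chunk: compress it and store it under its fingerprint.
                self.chunks[fp] = zlib.compress(chunk)
            # Known or new, the file only keeps a reference to the chunk.
            recipe.append(fp)
        self.files[name] = recipe

    def get(self, name: str) -> bytes:
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in self.files[name])
```

Calling `put` twice with identical content under two names stores each distinct chunk only once; the second file costs only its list of fingerprints. Because the input is handled as raw bytes, the sketch is independent of file type, in line with the byte-stream claim above.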
27 October 2020
PROCEEDINGS OF THE 2020 2ND INTERNATIONAL CONFERENCE ON SUSTAINABLE MANUFACTURING, MATERIALS AND TECHNOLOGIES
17–18 July 2020
Coimbatore, India
Research Article
Cloud based industrial file handling and duplication removal using source based deduplication technique
Samer O. Majed 1,a); Sawsan K. Thamer 1,b)
1,2 Department of Computer Science, College of Science, Al-Nahrain University, Baghdad, Iraq
a)Corresponding Author: [email protected]
AIP Conf. Proc. 2292, 030012 (2020)
Citation
Samer O. Majed, Sawsan K. Thamer; Cloud based industrial file handling and duplication removal using source based deduplication technique. AIP Conf. Proc. 27 October 2020; 2292 (1): 030012. https://doi.org/10.1063/5.0030989