Data deduplication is the process of eliminating redundant copies of data. It is carried out by representing each chunk of data with a unique value, which is referenced by the original file that contains the chunk. Deduplication techniques have mainly been applied in cloud-based systems to reduce the storage space required and to make better use of network bandwidth. In this paper, we introduce an optimized data deduplication technique for the data storage of cloud-based systems. The proposed technique optimizes deduplication by combining source-based and in-line deduplication. Source-based deduplication is performed at the source that holds the data, whereas in-line deduplication is performed in RAM, where the data reside temporarily before they are written by the I/O process. Moreover, the proposed technique applies a content-based, variable-size chunking algorithm that uses the Rabin-Karp rolling hash (RKRH) to break data files into segments of different sizes. In general, the technique computes a hash value for each data chunk, which serves as the chunk's fingerprint. A chunk-availability check then determines whether the chunk already exists in storage; if it does not, a reference to the chunk is added and its hash value is stored as a key in the storage. The proposed technique also compresses each data chunk to reduce redundancy within the chunk. In practice, the proposed technique achieves a data deduplication ratio of 33 percent and an average upload latency of five seconds. Finally, the proposed approach can be used with any data file type, since it processes files as byte streams.
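Content-based, variable-size chunking with a Rabin-Karp rolling hash can be sketched as follows. This is a minimal illustration, not the authors' implementation: the base, modulus, window width, cut mask, and minimum/maximum chunk sizes are all assumed values chosen for readability. A chunk boundary is declared wherever the hash of the last `WINDOW` bytes has its low bits equal to zero, so boundaries depend on content rather than on fixed offsets.

```python
from typing import List

# Illustrative parameters (assumptions, not values reported in the paper).
BASE = 257              # polynomial base of the rolling hash
MOD = (1 << 61) - 1     # large prime modulus (Mersenne prime 2^61 - 1)
WINDOW = 48             # sliding-window width in bytes
MASK = (1 << 12) - 1    # cut when the low 12 bits are zero: ~4 KiB expected chunk
MIN_SIZE = 1024         # never cut before this many bytes
MAX_SIZE = 16 * 1024    # always cut at this many bytes


def rabin_karp_chunks(data: bytes) -> List[bytes]:
    """Split a byte stream into variable-size, content-defined chunks."""
    chunks: List[bytes] = []
    # Weight of the oldest byte in a full window: BASE^(WINDOW-1) mod MOD.
    out_weight = pow(BASE, WINDOW - 1, MOD)
    start = 0   # offset where the current chunk begins
    h = 0       # rolling hash of the last WINDOW bytes of the current chunk
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Window is full: remove the byte that slides out on the left.
            h = (h - data[i - WINDOW] * out_weight) % MOD
        # Shift in the new byte on the right.
        h = (h * BASE + byte) % MOD
        size = i - start + 1
        # Content-defined cut point, bounded by MIN_SIZE and MAX_SIZE.
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])   # trailing partial chunk
    return chunks
```

Because a cut point depends only on the bytes inside the window, inserting a byte near the start of a file typically changes only the first chunk or two, whereas fixed-size chunking would shift every later chunk boundary and defeat deduplication of the rest of the file.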

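The fingerprinting, chunk-availability check, and per-chunk compression steps can be illustrated with a small in-memory store. This is a hedged sketch: the `ChunkStore` class and its methods are hypothetical, SHA-256 is assumed as the fingerprint function (the abstract only says "hash value"), `zlib` is assumed as the compressor, and the deduplication ratio is assumed to mean the fraction of submitted bytes that did not need to be stored again. In a source-based deployment the fingerprinting and lookup would run on the client so that only missing chunks are uploaded; here client and store share one process for brevity.

```python
import hashlib
import zlib
from typing import Dict, Iterable, List


class ChunkStore:
    """Hypothetical in-memory store keeping one compressed copy per fingerprint."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}        # fingerprint -> compressed chunk
        self.recipes: Dict[str, List[str]] = {}   # file name -> ordered fingerprints
        self.logical_bytes = 0   # bytes submitted, before deduplication
        self.stored_bytes = 0    # bytes of unique chunks, before compression

    @staticmethod
    def fingerprint(chunk: bytes) -> str:
        # SHA-256 is an assumption; any collision-resistant hash would do.
        return hashlib.sha256(chunk).hexdigest()

    def put_file(self, name: str, chunks: Iterable[bytes]) -> None:
        recipe: List[str] = []
        for chunk in chunks:
            fp = self.fingerprint(chunk)
            self.logical_bytes += len(chunk)
            if fp not in self.chunks:                   # chunk-availability check
                self.chunks[fp] = zlib.compress(chunk)  # store new chunk, compressed
                self.stored_bytes += len(chunk)
            recipe.append(fp)                           # file references chunk by key
        self.recipes[name] = recipe

    def get_file(self, name: str) -> bytes:
        # Rebuild the file by following its chunk references in order.
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in self.recipes[name])

    def dedup_ratio(self) -> float:
        # Assumed definition: fraction of submitted bytes not stored again.
        if self.logical_bytes == 0:
            return 0.0
        return 1 - self.stored_bytes / self.logical_bytes
```

For example, storing a file with chunks `[b"alpha", b"beta", b"gamma"]` and then a file with chunks `[b"alpha", b"delta"]` keeps four unique chunks; 19 of the 24 submitted bytes are stored, so `dedup_ratio()` returns 5/24 (about 21 percent) under this definition.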