Many systems and applications handle large volumes of duplicate items. Because real-world datasets are highly redundant, data deduplication can reduce storage requirements and improve the utilization of network bandwidth. However, existing deduplication systems operate on chunks ranging in size from 4 KB to over 16 KB, so they are not applicable to datasets consisting of short records. In this paper, we propose a new framework called SF-Dedup that deduplicates large sets of Mobile Internet records whose size can be smaller than 100 B, or even smaller than 10 B. SF-Dedup is a short-fingerprint, in-line deduplication scheme that resolves hash collisions. Experimental results show that SF-Dedup reduces the storage required and shortens query time on a relational database.
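The abstract gives no implementation details, so the following is only a minimal sketch of what a short-fingerprint, in-line, collision-resolving deduplication step could look like, not the authors' SF-Dedup design. The class name `ShortFingerprintStore`, the `fingerprint_bytes` parameter, the use of a truncated SHA-1 digest, and the in-memory bucket layout are all assumptions made for illustration.

```python
import hashlib


class ShortFingerprintStore:
    """Toy in-line deduplicating store keyed by a truncated hash (illustrative only)."""

    def __init__(self, fingerprint_bytes=4):
        # Hypothetical parameter: how many hash bytes are kept as the fingerprint.
        self.fingerprint_bytes = fingerprint_bytes
        # Maps fingerprint -> list of distinct records that share it.
        self.buckets = {}
        self.duplicates_skipped = 0

    def fingerprint(self, record: bytes) -> bytes:
        # Assumption: a truncated SHA-1 digest stands in for the short fingerprint.
        return hashlib.sha1(record).digest()[: self.fingerprint_bytes]

    def insert(self, record: bytes):
        """Store the record at write time unless an identical one already exists.

        Returns (fingerprint, slot); the slot tells colliding records apart.
        """
        fp = self.fingerprint(record)
        bucket = self.buckets.setdefault(fp, [])
        for slot, stored in enumerate(bucket):
            # A short fingerprint collides often, so matches are confirmed
            # by comparing the full record content.
            if stored == record:
                self.duplicates_skipped += 1
                return fp, slot
        bucket.append(record)
        return fp, len(bucket) - 1

    def unique_count(self) -> int:
        return sum(len(bucket) for bucket in self.buckets.values())


store = ShortFingerprintStore(fingerprint_bytes=1)
for rec in [b"GET /a", b"GET /b", b"GET /a"]:
    store.insert(rec)
```

In this sketch the fingerprint can be much shorter than the record's full digest because correctness does not depend on the fingerprint being unique: a collision only costs an extra content comparison within the bucket.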
5 June 2017
APPLIED MATHEMATICS AND COMPUTER SCIENCE: Proceedings of the 1st International Conference on Applied Mathematics and Computer Science
27–29 January 2017
Rome, Italy
Research Article
Content-level deduplication on mobile internet datasets
Ziyu Hou a)
1 School of Computer Science and Engineering, Beihang University, Beijing, China, 100191
Xunxun Chen
2 National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing, China, 100029
Yang Wang
2 National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing, China, 100029
a) Corresponding author: [email protected]
AIP Conf. Proc. 1836, 020086 (2017)
Citation
Ziyu Hou, Xunxun Chen, Yang Wang; Content-level deduplication on mobile internet datasets. AIP Conf. Proc. 5 June 2017; 1836 (1): 020086. https://doi.org/10.1063/1.4982026