Enhanced Secure Deduplication of Textual Data in Cloud Environments With DEDUCT: Integrating Advanced NLP, Machine Learning, and Cryptographic Authentication
B. Yamini Priyanka1, A.S. Gagan Reddy2, D. Jeswanth3, K. Sasi Kumar4, N. Mohan5
1Assistant Professor, Dept of Information Technology, SV College of Engineering, Tirupathi, India.
2B.Tech, Dept of Information Technology, SV College of Engineering, Tirupathi, India.
3B.Tech, Dept of Information Technology, SV College of Engineering, Tirupathi, India.
4B.Tech, Dept of Information Technology, SV College of Engineering, Tirupathi, India.
5B.Tech, Dept of Information Technology, SV College of Engineering, Tirupathi, India.
Email: 1yaminipriyanka.b@svce.edu.in, 2gaganreddyas@gmail.com,
3jashwanthroyal56@gmail.com, 4sasikumarkattamanchi@gmail.com,
5mohanneerugatti9@gmail.com
Corresponding Author/Guide: B. Yamini Priyanka, Assistant Professor
Abstract-Secure deduplication of textual data in cloud environments optimizes storage efficiency by eliminating redundant copies of text data while preserving data confidentiality and security. The approach divides large text files into smaller tokens or chunks, then identifies and removes duplicate segments, whether fully identical or near-identical, before the data is stored in the cloud. DEDUCT is a secure deduplication framework for textual data in cloud environments that combines client-side preprocessing with cloud-based pointer deduplication to reduce storage and communication overhead while preserving confidentiality. As the existing system, DEDUCT employs tokenization, transformation, CRC fingerprinting, and encryption to identify and securely store unique textual segments. Despite effective storage savings and security guarantees, challenges remain, including client resource constraints, adaptive adversarial threats, and optimization for near-duplicate detection. The proposed system enhances DEDUCT by integrating advanced NLP and machine learning techniques, cryptographic authentication structures, and energy-efficient optimizations. This advancement promises improved deduplication accuracy, stronger data integrity, reduced client overhead, heightened security, and scalability, enabling efficient and secure textual data management across diverse cloud applications and resource-constrained devices.
Keywords- Secure deduplication, tokens or chunks, client-side preprocessing, DEDUCT, energy-efficient optimizations
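To make the token-level pipeline described in the abstract concrete, the following is a minimal illustrative sketch of tokenization, CRC fingerprinting, and pointer-based deduplication. It is not the DEDUCT implementation: the names `PointerStore`, `deduplicate`, and `restore` are hypothetical, tokens are assumed to be whitespace-separated words, and the transformation and encryption stages are omitted. Because CRC32 is a checksum rather than a collision-resistant hash, the sketch keeps a small bucket per fingerprint so that colliding tokens are never merged.

```python
import zlib


def fingerprint(token: str) -> int:
    """Return the CRC32 checksum of a token's UTF-8 bytes."""
    return zlib.crc32(token.encode("utf-8"))


class PointerStore:
    """Hypothetical stand-in for the cloud-side store of unique tokens."""

    def __init__(self):
        # fingerprint -> list of distinct tokens sharing that CRC32 value
        self.buckets = {}

    def put(self, token: str) -> tuple:
        """Store a token once and return a pointer (fingerprint, index)."""
        fp = fingerprint(token)
        bucket = self.buckets.setdefault(fp, [])
        if token not in bucket:
            bucket.append(token)  # first occurrence: keep the token
        return (fp, bucket.index(token))

    def get(self, pointer: tuple) -> str:
        """Resolve a pointer back to its token."""
        fp, idx = pointer
        return self.buckets[fp][idx]

    def unique_count(self) -> int:
        """Number of distinct tokens actually stored."""
        return sum(len(bucket) for bucket in self.buckets.values())


def deduplicate(text: str, store: PointerStore) -> list:
    """Client side: split text into word tokens and emit one pointer per token."""
    return [store.put(token) for token in text.split()]


def restore(pointers: list, store: PointerStore) -> str:
    """Rebuild the text from pointers (whitespace is normalized to single spaces)."""
    return " ".join(store.get(p) for p in pointers)


if __name__ == "__main__":
    store = PointerStore()
    text = "the cloud stores the data and the cloud serves the data"
    pointers = deduplicate(text, store)
    print(len(pointers), "pointers,", store.unique_count(), "unique tokens stored")
    print(restore(pointers, store) == text)
```

In this sketch the file is represented only by its pointer list, so repeated words cost one pointer each rather than a second stored copy; a real deployment would additionally encrypt the stored tokens and handle near-duplicates, which this example does not attempt.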