International Scientific Journal of Engineering and Management

An International Scholarly || Multidisciplinary || Open Access || Indexing in all major Database & Metadata
The journal follows the UGC Guidelines and is evaluated for inclusion in the Web of Science
ISSN: 2583-6129

Impact Factor: 7.839

Data Duplication Detection and Removal System Using Machine Learning

  • Create Date 28 January 2026
  • Last Updated 28 January 2026

ANSH BALGOTRA

Department of Information Technology, Maharaja Agrasen Institute of Technology, New Delhi, India
anshbalgotra@gmail.com

 

 

Abstract— Missing data is a critical issue in many domains because it can lead to inaccurate analysis and flawed decision-making. Machine learning techniques are increasingly replacing traditional methods for handling missing values, offering more efficient solutions. Research in this area has explored various approaches to data imputation and analyzed their strengths and limitations. A systematic literature review of studies published from 2016 to 2021 identified key factors influencing the effectiveness of these methods, providing valuable insights for researchers and data analysts. In parallel, the rapid expansion of data storage and processing has created challenges in managing large-scale information, particularly in deduplication. Duplicate data originating from multiple sources reduces storage efficiency and retrieval accuracy. Cloud service providers have adopted data deduplication techniques to reduce storage costs and bandwidth usage. However, encryption for security conflicts with deduplication efficiency, since identical data encrypted under different keys no longer appears identical. To address this, hybrid chunking methods, such as the Two Threshold Two Divisor (TTTD) and Dynamic Prime Coding (DPC) algorithms, have been proposed. These techniques improve deduplication performance while balancing security requirements. Furthermore, entity resolution plays a crucial role in information integration, which aims to consolidate and organize data from diverse sources. Deduplication, a key step in this process, enhances data quality by identifying and eliminating redundant records. Research in this domain spans machine learning, data mining, and information retrieval, and covers both supervised and unsupervised approaches. By analyzing these methodologies, researchers can refine existing techniques to improve accuracy, processing speed, and computational efficiency.
Overall, advances in machine learning, deduplication, and entity resolution contribute to more effective data management, addressing challenges in missing data imputation, secure deduplication, and large-scale information integration.

 

Keywords— Missing Data, Data Quality, Machine Learning, Processing Speed, Computational Efficiency, Structured Data, Unstructured Data, Database Management, Encryption, Accuracy, Performance
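
As a rough illustration of the record-level deduplication step mentioned in the abstract, the sketch below removes exact and near-duplicate text records. It is not the method proposed in the paper: it uses a plain string-similarity baseline from the Python standard library in place of a learned model, and the names `normalize`, `deduplicate`, and the `threshold` value of 0.9 are assumptions chosen for this example.

```python
import hashlib
from difflib import SequenceMatcher


def normalize(record: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(record.lower().split())


def deduplicate(records, threshold=0.9):
    """Return records with exact and near duplicates removed (first copy wins).

    `threshold` is an illustrative similarity cutoff, not a tuned value.
    """
    seen_hashes = set()  # hashes of normalized records already kept
    kept = []            # original records that survive
    kept_norm = []       # their normalized forms, used for similarity checks
    for record in records:
        norm = normalize(record)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        if any(SequenceMatcher(None, norm, k).ratio() >= threshold
               for k in kept_norm):
            continue  # near duplicate of a record already kept
        seen_hashes.add(digest)
        kept.append(record)
        kept_norm.append(norm)
    return kept


if __name__ == "__main__":
    sample = [
        "John Smith, New Delhi",
        "john  smith, new delhi",   # same record, different case and spacing
        "Jon Smith, New Delhi",     # likely typo of the first record
        "Priya Rao, Mumbai",
    ]
    for row in deduplicate(sample):
        print(row)
```

The pairwise comparison is quadratic in the number of kept records, so this sketch only suits small datasets; a system at the scale the abstract describes would typically add a blocking or indexing step before comparing candidates.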

