A Comparative Study of Database Indexing Techniques for Large Scale Applications
Authors:
V Siri¹, Harshachandra V²
¹Professor, Department of Computer Science and Engineering, St. Martin’s Engineering College, Hyderabad, India, vemulasiricse@smec.ac.in
²Student, Department of Computer Science and Engineering, St. Martin’s Engineering College, Hyderabad, India, harshachandra.v@gmail.com
Abstract
Data volumes are growing quickly, which makes fast access to the needed information increasingly difficult. Many organizations now operate databases holding billions of records and must choose an indexing strategy that keeps response times low, supports many concurrent users, and uses resources efficiently. This study examines common database indexing methods: B+ Trees, Hash Indexes, Log-Structured Merge (LSM) Trees, Bitmap Indexes, Inverted Indexes, and the newer Learned Index structures. We look at how each method works, how it is structured, and how well it performs in settings such as e-commerce websites, healthcare systems, financial databases, and cloud environments. The comparison covers query performance, scalability with data volume, storage overhead, behavior under concurrent access, and handling of different types of data. We reviewed 20 studies from 2021 to 2025. The findings show that the effectiveness of each indexing method depends on data volume, user access patterns, and system configuration, and that no single method is best in all cases. Instead, hybrid approaches that account for the workload, together with machine learning techniques that enhance the indexes, appear to be the most promising direction for large-scale applications. This review is intended to help database and system designers make informed decisions about indexing their data.
Keywords: Database indexing, B+ Tree, Hash index, LSM Tree, Learned index, Query optimization, Large-scale databases, Distributed systems, Query performance