Agentic LLM Pipeline for Natural Language to SQL
Authors:
Jaya Prakash S K1
1 PES University, Hosur Rd, Konappana Agrahara, Bangalore, 560100, Karnataka, India
jayaprakash.s.krishnappa@gmail.com
https://github.com/JayaprakashSKJay/agentic-nl2sql-pipeline.git
Abstract. Natural Language to SQL (NL2SQL) has improved rapidly with the emergence of large language models, but deploying it reliably on real relational databases remains difficult. Direct one-shot generation systems often fail on complex joins, nested clauses, and schema grounding, and they usually give little insight into why a particular query was generated. In this work, I present an agentic NL2SQL pipeline that treats SQL generation not as a single text-generation step but as a sequence of reasoning and decision steps. The proposed system integrates multilingual input processing, voice-to-text assistance, schema-aware retrieval, and structured query reformulation. It also includes guarded SQL generation, validation, iterative refinement, and optional adaptive learning from successful executions. This design supports both real-time querying of relational databases and benchmark evaluation on the Spider dataset. Experiments show that the agentic workflow improves execution accuracy, robustness, and interpretability over a single-pass pipeline, especially on more challenging queries and on schemas with non-trivial join structure. Beyond the benchmark analysis, the system has been tested on a live HR-style relational database with realistic join, aggregation, ranking, and filter-based questions. The results suggest that retrieval-based grounding, staged reasoning, and validation-aware correction provide a reliable basis for natural-language interfaces to relational databases.
Keywords: NL2SQL, large language models, schema retrieval, query decomposition, iterative refinement, multilingual querying, voice interface, Spider benchmark.
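To make the validation and iterative refinement stages named in the abstract concrete, the following is a minimal sketch and not the implementation from the repository. It assumes a SQLite backend, and the names `validate_sql`, `generate_with_refinement`, and `stub_generate` are hypothetical. The stub stands in for the LLM call, and the `employees` table is an invented HR-style example.

```python
import sqlite3
from typing import Callable, Optional


def validate_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if the query passes the guard and SQLite can plan it,
    otherwise return an error message usable as refinement feedback."""
    # Guard (assumption): only read-only SELECT statements are accepted.
    if not sql.lstrip().lower().startswith("select"):
        return "only SELECT statements are allowed"
    try:
        # EXPLAIN QUERY PLAN compiles the statement without running it,
        # so unknown tables or columns surface as errors here.
        conn.execute("EXPLAIN QUERY PLAN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)


def generate_with_refinement(
    conn: sqlite3.Connection,
    question: str,
    generate: Callable[[str, Optional[str]], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask the generator for SQL, validate it, and feed any error back
    into the next round. Returns the first valid query, or None."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        sql = generate(question, feedback)
        feedback = validate_sql(conn, sql)
        if feedback is None:
            return sql
    return None


def stub_generate(question: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for the LLM: the first attempt misspells a
    column, and the retry (given feedback) corrects it."""
    if feedback is None:
        return "SELECT nme FROM employees"
    return "SELECT name FROM employees"


# Invented HR-style schema, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)")

final_sql = generate_with_refinement(conn, "List employee names", stub_generate)
```

In this sketch the first candidate fails validation because the column `nme` does not exist, and the second round returns a query that SQLite can plan. A real pipeline would replace `stub_generate` with a model call that receives the retrieved schema context along with the error message.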