Agentic LLM Pipeline for Natural Language to SQL

Published 26 May 2026

Updated 26 May 2026

Authors:

Jaya Prakash S K¹

¹ PES University, Hosur Rd, Konappana Agrahara, Bangalore, 560100, Karnataka, India
jayaprakash.s.krishnappa@gmail.com
https://github.com/JayaprakashSKJay/agentic-nl2sql-pipeline.git

Abstract. Natural Language to SQL (NL2SQL) has improved rapidly with the emergence of large language models, but deploying it reliably on real relational databases remains difficult. Direct one-shot generation systems often fail on complex joins, nested clauses, and schema grounding, and they give little insight into why a particular query was produced. In this research, I focus on an agentic NL2SQL pipeline that treats SQL generation not as a one-step text generation process but as a sequence of reasoning and decision steps. The proposed system integrates multilingual input processing, voice-to-text assistance, schema-aware retrieval, and structured query reformulation. It also includes guarded SQL generation, validation, iterative refinement, and optional adaptive learning from successful executions. The system supports both real-time querying of relational databases and benchmark evaluation on the Spider dataset. Experiments show that the agentic workflow improves execution accuracy, robustness, and interpretability compared with a single-pass pipeline, especially on more challenging queries and on schemas with non-trivial join structure. Beyond the benchmark analysis, the system was tested on a live HR-style relational database with realistic join, aggregation, ranking, and filter-based questions. The results suggest that retrieval-based grounding, staged reasoning, and validation-aware correction provide a reliable basis for natural-language interfaces to relational databases.

Keywords: NL2SQL, large language models, schema retrieval, query decomposition, iterative refinement, multilingual querying, voice interface, Spider benchmark.
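
As an illustration only, the guarded generation, validation, and iterative refinement stages named in the abstract might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation in the linked repository: the names `schema_text`, `is_read_only`, `validate`, `answer`, and `stub_generator` are hypothetical, the `employee` table is invented for the example, SQLite stands in for the target database, and a stub function stands in for the LLM call.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Hypothetical generator signature: (question, schema, last_error) -> candidate SQL.
Generator = Callable[[str, str, Optional[str]], str]


def schema_text(conn: sqlite3.Connection) -> str:
    """Collect CREATE TABLE statements so the generator is grounded in the schema."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def is_read_only(sql: str) -> bool:
    """Guard: accept a single SELECT statement only (a deliberately crude check)."""
    body = sql.strip().rstrip(";")
    return body.lower().startswith("select") and ";" not in body


def validate(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if SQLite can plan the query, otherwise an error message."""
    if not is_read_only(sql):
        return "only a single SELECT statement is allowed"
    try:
        # Planning the query surfaces syntax and schema errors without running it.
        conn.execute("EXPLAIN QUERY PLAN " + sql.strip().rstrip(";"))
        return None
    except sqlite3.Error as exc:
        return str(exc)


def answer(
    conn: sqlite3.Connection,
    question: str,
    generate: Generator,
    max_attempts: int = 3,
) -> Tuple[str, List[tuple]]:
    """Generate SQL, validate it, and feed any error back until it passes."""
    schema = schema_text(conn)
    error: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate(question, schema, error)
        error = validate(conn, sql)
        if error is None:
            return sql, conn.execute(sql).fetchall()
    raise RuntimeError(f"no valid SQL after {max_attempts} attempts: {error}")


def stub_generator(question: str, schema: str, last_error: Optional[str]) -> str:
    """Stand-in for an LLM: a misspelled column first, a fix after feedback."""
    if last_error is None:
        return "SELECT nme FROM employee"
    return "SELECT name FROM employee ORDER BY salary DESC LIMIT 1"


# Invented HR-style table, used only to exercise the loop.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee (name, salary) VALUES (?, ?)",
    [("Asha", 90.0), ("Ravi", 75.0)],
)
sql, rows = answer(conn, "Who earns the most?", stub_generator)
print(sql, rows)
```

In this sketch the first candidate fails validation because the column `nme` does not exist, and the error message is passed back to the generator for the second attempt. A real pipeline would replace the stub with a model call and would likely need a stricter guard than a prefix check.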