Data Automation Pipeline: A Kernel-Centric Neuro-Symbolic Architecture for Autonomous Data Science
- Create Date 21 March 2026
- Last Updated 21 March 2026
Kajjam Hariprasad
Department of CSE, Jyothishmathi Institute of Technology and Science, Karimnagar, Telangana, India
prasadkajjam5@gmail.com

Shivanath Hanumakonda
Department of CSE, Jyothishmathi Institute of Technology and Science, Karimnagar, Telangana, India
222.6A7shivanath@gmail.com

Pravalika Daivala
Department of CSE, Jyothishmathi Institute of Technology and Science, Karimnagar, Telangana, India
22.688pravalikadaivala@gmail.com

Vamshi Kadavergula
Department of CSE, Jyothishmathi Institute of Technology and Science, Karimnagar, Telangana, India
22271a66b7vamshi@gmail.com
Abstract—The proliferation of Large Language Models (LLMs) has catalyzed a paradigm shift from static code completion toward autonomous agentic execution. Despite demonstrable proficiency in generating syntactically valid Python, contemporary Co-Pilot systems remain fundamentally decoupled from the runtime state of the environments they serve, producing generation errors that neither static analysis nor model scaling can eliminate. This paper introduces DataCursor, a hybrid neuro-symbolic architecture that resolves this limitation through a Kernel-Centric Architecture (KCA) in which a persistent, stateful Jupyter Kernel functions as the authoritative symbolic oracle for all neural reasoning steps. A Context Extraction Pipeline (CEP) continuously harvests live kernel state—variable bindings, DataFrame schemas, and cell output traces—and packages the results into a structured context object that prefixes every LLM generation request. A Dual-Loop Control System (DLCS) pairs a deterministic symbolic execution loop with an adaptive neural recovery loop; whenever execution raises an exception, the outer loop re-conditions the generator on the error trace and produces a revised artifact. External tool integration is governed by the Model Context Protocol (MCP), providing process-isolated, hot-swappable satellite capabilities. A formal control-theoretic characterization of the DLCS is presented alongside a design validation and architectural robustness analysis. These results position runtime state injection as a practically deployable and theoretically grounded foundation for the next generation of autonomous data science tooling.

Index Terms—Autonomous Data Science, Neuro-Symbolic AI, Large Language Models, Jupyter Kernel, Model Context Protocol, Agentic Systems, Dual-Loop Control, Runtime Context Injection.
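The CEP/DLCS interaction described in the abstract can be sketched in a few lines of Python. This is an illustrative reconstruction only, not the paper's implementation: `extract_context`, `dual_loop`, and the stubbed `generate` callable are hypothetical names, and the real system would run code through a Jupyter Kernel rather than a bare `exec` over a namespace dictionary.

```python
import traceback

def extract_context(namespace):
    """Sketch of the Context Extraction Pipeline (CEP): summarize live
    state (variable bindings, DataFrame-like schemas) into a text block
    that prefixes every LLM generation request."""
    lines = []
    for name, value in namespace.items():
        if name.startswith("_"):
            continue  # skip private/internal bindings
        schema = getattr(value, "dtypes", None)  # DataFrame-like objects expose dtypes
        if schema is not None:
            lines.append(f"{name}: DataFrame columns={list(schema.index)}")
        else:
            lines.append(f"{name}: {type(value).__name__}")
    return "\n".join(lines)

def dual_loop(generate, task, namespace, max_retries=3):
    """Sketch of the Dual-Loop Control System (DLCS): the inner loop
    executes the generated artifact against live state (the symbolic
    oracle); on an exception, the outer loop re-conditions the
    generator on the error trace and retries."""
    error_trace = None
    for _ in range(max_retries):
        prompt = extract_context(namespace) + "\n" + task
        if error_trace:
            prompt += "\nPrevious attempt failed with:\n" + error_trace
        code = generate(prompt)       # neural step: LLM call (stubbed here)
        try:
            exec(code, namespace)     # symbolic step: stand-in for kernel execution
            return code               # success: accepted artifact
        except Exception:
            error_trace = traceback.format_exc()
    raise RuntimeError("DLCS did not converge within the retry budget")
```

In this sketch the kernel state is the sole arbiter of success: the generator is never trusted until its output survives execution, and every failure feeds the concrete traceback back into the next prompt.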