Regionalized StudentGPT for Acharjo: Curriculum-Aligned Multilingual AI Tutoring Using GPT-3.5 for Indian State Boards
Kinshuk Dutta
Global Head of EBX PreSales (Sales Engineer) & Domain Architects (TIBCO Platform), Cloud Software Group. Formerly Global Head of EBX & TCMD PreSales, TIBCO.
Alinjar Guha
Founder & CEO, Acharjo (since Sep 2021); formerly Head of Training & Placement, EIILM-Kolkata, Jalpaiguri Campus.
Publication Date: December 2022
Ankit Anand
Data Management Architect, Koch Industries. manaankit@gmail.com
Abstract
This paper introduces Regionalized StudentGPT for Acharjo, a multilingual, curriculum-aligned AI tutoring system designed for students under the West Bengal Board of Secondary Education (WBBSE) and Assam Higher Secondary Education Council (AHSEC). Building upon the foundational StudentGPT framework [1], which employed syllabus-driven fine-tuning of GPT-2, we extend the approach to GPT-3.5 for real-world deployment in regional Indian educational contexts. The system integrates bilingual corpora in English, Bengali, and Assamese, and combines curriculum-aware fine-tuning with retrieval-augmented generation (RAG) to maintain alignment with state-specific syllabi. Deployed via Acharjo's mobile-first platform, it uses low-bandwidth optimizations to address accessibility challenges in rural and semi-urban areas.
Key contributions include: (1) a syllabus-driven fine-tuning pipeline incorporating curriculum regularization on GPT-3.5; (2) a multilingual adaptation strategy leveraging IndicBERT and MarianMT for bilingual consistency; (3) seamless integration into Acharjo's mobile application; (4) a comprehensive evaluation framework assessing pedagogical accuracy, BLEU scores, perplexity, bilingual alignment, and user satisfaction; and (5) an ethical design framework compliant with IEEE 7000-2021, draft IEEE P7001/P7003/P7004 standards, India's National Education Policy (NEP) 2020, and the Draft Digital Personal Data Protection Bill (2022).
Empirical results, derived from experiments on 150K syllabus entries and 50K bilingual pairs, demonstrate a 22% improvement in pedagogical accuracy over GPT-2 baselines (p < 0.01, t-test), along with reduced perplexity (17.8 ± 1.2) and improved bilingual BERTScores (0.68 ± 0.03). Ablation studies confirm the efficacy of the custom losses: removing them lowers performance by 15-18%. Theoretical analyses include derivations of the loss functions and complexity bounds.
Keywords: StudentGPT, GPT-3.5, Indian education, regional NLP, syllabus-driven fine-tuning, NEP 2020, retrieval-augmented generation, ethical AI
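For readers unfamiliar with the perplexity figure quoted in the abstract, the following is a minimal sketch of the standard definition: the exponential of the average negative log-likelihood per token. The function name `perplexity` and the toy inputs are illustrative assumptions and are not taken from the paper's evaluation code.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-mean(log p)) for natural-log token probabilities.

    `token_log_probs` is a hypothetical input: one log-probability per
    token, as a language model would assign to a held-out sequence.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy check: a model that spreads probability uniformly over 4 choices
# at every step is, on average, "choosing among 4 options" per token.
uniform_over_four = [math.log(0.25)] * 8
print(perplexity(uniform_over_four))

# A more confident model (higher token probabilities) scores lower.
confident = [math.log(0.9)] * 8
print(perplexity(confident))
```

Under this definition a lower value means the model is less surprised by the text, so the uniform toy case yields a value close to 4 and the confident case a value close to 1.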