AI-Powered Medical Scribe System: Real-Time Clinical Documentation Using Large Language Models and Automatic Speech Recognition
Authors:
Mrs. Syamala Kumari M.
Yadla Murali Krishna, Barla Bhuvanesh Kiran, Majji Sai Nikhil, Tammina Deekshit Kumar.
Department of Information Engineering and Computational Technology, MVGR College of Engineering (A), Vizianagaram, Andhra Pradesh, India
Abstract: Clinical documentation is one of the most persistent bottlenecks in modern healthcare. Physicians spend an estimated 4.5 hours per day on electronic health record documentation, which reduces the time available for direct patient care. This work presents the AI Medical Scribe System, a full-stack application that automatically generates structured clinical notes from real-time doctor-patient conversations.
The system captures audio through a browser-based interface and streams binary audio chunks over WebSocket to a FastAPI backend. The Whisper Large v3 model, accessed through the Groq inference API, performs real-time speech-to-text transcription. The resulting transcript is then processed by the LLaMA 3.1 8B Instant large language model, which extracts clinically relevant information and generates structured SOAP (Subjective, Objective, Assessment, Plan) notes in JSON format.
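As an illustration of the final step of this pipeline, the sketch below shows one way a backend could check that the language model's reply is a well-formed SOAP note before storing it. It is a minimal sketch under stated assumptions, not the system's actual code: the lowercase key names, the SoapNote class, and the parse_soap_note function are hypothetical, and the JSON schema used by the system is not specified here.

```python
import json
from dataclasses import dataclass

# Assumed key names; the schema the system actually uses is not given in the paper.
SOAP_KEYS = ("subjective", "objective", "assessment", "plan")


@dataclass
class SoapNote:
    subjective: str
    objective: str
    assessment: str
    plan: str


def parse_soap_note(llm_output: str) -> SoapNote:
    """Parse a model reply into a SoapNote, rejecting malformed output."""
    # A model may wrap its JSON in prose or code fences, so keep only the
    # text between the first "{" and the last "}".
    start, end = llm_output.find("{"), llm_output.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    # json.JSONDecodeError is a subclass of ValueError, so callers can
    # handle every failure mode with a single except clause.
    data = json.loads(llm_output[start:end + 1])
    missing = [key for key in SOAP_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing SOAP sections: {missing}")
    return SoapNote(**{key: str(data[key]).strip() for key in SOAP_KEYS})
```

Validating the reply before persistence means that a truncated or malformed generation surfaces as an explicit error rather than as an incomplete note in the patient record.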
The application is built on a modern full-stack architecture: Next.js 16 for the frontend, FastAPI with Uvicorn for the backend, and MongoDB Atlas for cloud-based persistent storage. Additional features include patient record management, searchable session history, an editable SOAP note view, support for custom clinical sections, clinic profile customization with branding, and professional PDF report generation using jsPDF. JWT-based authentication with bcrypt password hashing provides secure multi-user access.
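To make the token-based authentication concrete, the sketch below issues and verifies an HS256-signed JSON Web Token using only the Python standard library. It is an illustrative sketch, not the system's implementation: the function names, the claim set (sub and exp), and the one-hour lifetime are assumptions, and a deployed service would normally rely on a maintained JWT library rather than hand-written code. Password hashing with bcrypt is a separate step and is not shown.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 with the trailing "=" padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(user_id: str, secret: bytes, ttl_seconds: int = 3600) -> str:
    """Return a signed token of the form header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: bytes) -> dict:
    """Return the payload if the signature and expiry are valid, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading bytes of a forged signature were correct.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return payload
```

In such a design the server would call issue_token after a successful login and verify_token on each protected request, so that no session state needs to be stored on the backend.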
End-to-end testing with 33 structured test cases covering all seven functional modules yielded a 100% pass rate. The system achieves its objective of reducing documentation time while maintaining clinical accuracy. Because all AI inference runs in the cloud, no local GPU hardware is required, which makes the system practical to deploy in small clinics and individual practices without infrastructure investment.
Keywords: Automatic Speech Recognition, Clinical Documentation, FastAPI, Large Language Model, Medical NLP, Real-Time Transcription, SOAP Notes, WebSocket.