AI-Powered Medical Scribe System: Real-Time Clinical Documentation Using Large Language Models and Automatic Speech Recognition

Published 12 April 2026

Updated 12 April 2026

Authors:

Mrs. Syamala Kumari M.

Yadla Murali Krishna, Barla Bhuvanesh Kiran, Majji Sai Nikhil, Tammina Deekshit Kumar.

Department of Information Engineering and Computational Technology, MVGR College of Engineering (A), Vizianagaram, Andhra Pradesh, India

Abstract: Clinical documentation is one of the most persistent bottlenecks in modern healthcare. Physicians spend an estimated 4.5 hours per day on electronic health record (EHR) documentation, which reduces the time available for direct patient care. This work presents the AI Medical Scribe System, a full-stack application that automates the generation of structured clinical notes from real-time doctor-patient conversations.

The system captures audio through a browser-based interface and streams binary audio chunks over WebSocket to a FastAPI backend. The Whisper Large v3 model, accessed through the Groq inference API, performs real-time speech-to-text transcription. The resulting transcript is then processed by the LLaMA 3.1 8B Instant large language model, which extracts clinically relevant information and generates structured SOAP (Subjective, Objective, Assessment, Plan) notes in JSON format.
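As a rough illustration of the two ends of this pipeline, the following Python sketch buffers incoming binary audio chunks until enough bytes are available to transcribe, and normalises a model reply into the four SOAP sections. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `AudioChunkBuffer` and `parse_soap_note`, the byte threshold, and the lowercase JSON key names are all hypothetical, and the calls to the transcription and language models are deliberately left out.

```python
import json
from dataclasses import dataclass, field

# Assumed key names for the SOAP JSON; the paper does not specify the schema.
SOAP_KEYS = ("subjective", "objective", "assessment", "plan")


@dataclass
class AudioChunkBuffer:
    """Accumulates binary audio chunks until enough bytes are ready to transcribe."""

    flush_threshold: int = 32_000  # assumed size in bytes, chosen for illustration
    _chunks: list = field(default_factory=list)
    _size: int = 0

    def add(self, chunk: bytes):
        """Store a chunk; return the joined audio once the threshold is reached, else None."""
        self._chunks.append(chunk)
        self._size += len(chunk)
        if self._size < self.flush_threshold:
            return None
        audio = b"".join(self._chunks)
        self._chunks, self._size = [], 0  # start a fresh batch
        return audio


def parse_soap_note(raw: str) -> dict:
    """Parse a model reply into a dict holding exactly the four SOAP sections."""
    text = raw.strip()
    # Models sometimes surround the JSON with extra text; keep the outermost object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])  # raises ValueError on invalid JSON
    # Normalise key case and fill any missing section with an empty string.
    lowered = {str(key).lower(): value for key, value in data.items()}
    return {key: str(lowered.get(key, "")).strip() for key in SOAP_KEYS}
```

In a WebSocket handler, each received binary message would be passed to `add`, and any non-`None` result forwarded to the transcription service. Filling missing sections with empty strings keeps the downstream editor and PDF export working even when the model omits a section, although a stricter design could reject incomplete notes instead.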

The application is built on a modern full-stack architecture: Next.js 16 for the frontend, FastAPI with Uvicorn for the backend, and MongoDB Atlas for cloud-based persistent storage. Additional features include patient record management, a searchable session history, an editable SOAP note editor, support for custom clinical sections, clinic profile customization with branding, and professional PDF report generation using jsPDF. JWT-based authentication with bcrypt password hashing provides secure multi-user access.
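To make the token side of the authentication scheme concrete, the sketch below signs and verifies an HS256 JSON Web Token using only the Python standard library. It is an illustrative sketch, not the system's actual code: the function names, the claim set (`sub`, `iat`, `exp`), and the one-hour lifetime are assumptions, and a production backend would more likely rely on a maintained library such as PyJWT. Password hashing with bcrypt is not shown because it requires a third-party package.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(user_id: str, secret: bytes, ttl_seconds: int = 3600, now=None) -> str:
    """Build a signed token whose payload identifies the user and an expiry time."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "iat": issued, "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes, now=None) -> dict:
    """Return the payload if the signature is valid and the token has not expired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

A login route would call `issue_token` after the submitted password has been checked against its stored hash, and each protected route would call `verify_token` on the bearer token before touching patient data. The optional `now` parameter exists only so that expiry can be tested deterministically.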

End-to-end testing with 33 structured test cases yielded a 100% pass rate across all seven functional modules. The system achieves its objective of reducing documentation time while maintaining clinical accuracy. Because all AI inference runs in the cloud, no local GPU hardware is required, which makes the system practical to deploy in small clinics and individual practices without infrastructure investment.

Keywords: Automatic Speech Recognition, Clinical Documentation, FastAPI, Large Language Model, Medical NLP, Real-Time Transcription, SOAP Notes, WebSocket.