AI YouTube Automation: A Seven-Stage End-to-End Pipeline for Autonomous Video Generation and Publishing with RAG-Enhanced Scripting

Published 8 April 2026

Updated 8 April 2026

Authors:

Adit Jaywant Ghodke, Sahil Arun Sawant, Anurag Devprakash Singh, Aban Khalid Khan

Department of Computer Science and Engineering
Universal College of Engineering, Mumbai, Maharashtra, India
ghodkejaywantadit@gmail.com, sahilsawant736@gmail.com, anuragsingh7250@gmail.com, abankhanak44@gmail.com

March 2026

Abstract: Content creation for video platforms such as YouTube remains resource-intensive, demanding expertise in writing, audio production, video editing, and distribution. Existing automation tools address individual stages of this workflow in isolation, leaving creators to integrate their outputs manually. This paper presents AI YouTube Automation, a fully automated, seven-stage pipeline that transforms a single text prompt into a published YouTube video with no human intervention at any intermediate stage. The system combines Retrieval-Augmented Generation (RAG) via the Groq API (Llama 3.3 70B) for factually grounded script generation, Microsoft Edge-TTS for neural narration, per-segment stock footage retrieval from the Pexels API, background music mixing, AI thumbnail generation, and autonomous upload through the YouTube Data API v3. The pipeline was developed with Vibe Coding, a structured AI-assisted development methodology, and comprises ten Python modules totalling 758 lines of core logic. Across at least 15 real-world production runs, the system achieved a mean end-to-end execution time of 258.9 seconds for a 90-second video, and every run completed its YouTube upload successfully. A key architectural contribution is a graceful degradation mechanism that keeps the pipeline running when auxiliary services partially fail. The system offers a practical, reproducible approach to fully autonomous educational video production at scale.

Keywords: video automation, retrieval-augmented generation, large language models, text-to-speech synthesis, YouTube API, content generation pipeline, Vibe Coding, segment-matched video assembly, graceful degradation, Groq AI
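The graceful degradation mechanism can be illustrated with a minimal sketch. It assumes that each stage reads a shared context dictionary and returns new keys to merge into it, and that only auxiliary stages (here background music and thumbnail generation) are marked non-critical. The stage names, the `Stage`/`RunReport` types, and the stub stage bodies are hypothetical. They make no calls to Groq, Edge-TTS, Pexels, or the YouTube Data API, and they are not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Context = Dict[str, object]


@dataclass
class Stage:
    """One pipeline stage (hypothetical structure, not the authors' code)."""
    name: str
    run: Callable[[Context], Context]  # returns new keys to merge into the context
    critical: bool = True              # a failing critical stage aborts the run


@dataclass
class RunReport:
    completed: List[str] = field(default_factory=list)
    degraded: Dict[str, str] = field(default_factory=dict)  # stage name -> error text
    aborted_at: Optional[str] = None


def run_pipeline(stages: List[Stage], context: Context) -> RunReport:
    """Run stages in order; skip failed non-critical stages instead of aborting."""
    report = RunReport()
    for stage in stages:
        try:
            # Shallow copy: a stage that fails midway cannot leave partial
            # top-level keys behind in the shared context.
            updates = stage.run(dict(context))
        except Exception as exc:
            if stage.critical:
                report.aborted_at = stage.name
                break
            # Graceful degradation: record the failure and continue without it.
            report.degraded[stage.name] = str(exc)
            continue
        context.update(updates)
        report.completed.append(stage.name)
    return report


def failing_stage(message: str) -> Callable[[Context], Context]:
    """Stub that simulates an unavailable external service."""
    def _run(ctx: Context) -> Context:
        raise RuntimeError(message)
    return _run


def build_demo_stages() -> List[Stage]:
    # Hypothetical seven-stage order; stub bodies stand in for RAG scripting,
    # TTS, stock footage, assembly, music, thumbnail and upload. No real APIs.
    return [
        Stage("script", lambda ctx: {"script": f"Script about {ctx['prompt']}"}),
        Stage("narration", lambda ctx: {"audio": "narration.mp3"}),
        Stage("footage", lambda ctx: {"clips": ["seg1.mp4", "seg2.mp4"]}),
        Stage("assembly", lambda ctx: {"video": "video.mp4"}),
        Stage("music", lambda ctx: {"video": "video_with_music.mp4"}, critical=False),
        Stage("thumbnail", failing_stage("thumbnail service unavailable"), critical=False),
        Stage("upload", lambda ctx: {"uploaded": ctx["video"]}),
    ]


if __name__ == "__main__":
    ctx: Context = {"prompt": "how black holes form"}
    report = run_pipeline(build_demo_stages(), ctx)
    print(report.completed)
    print(report.degraded)
    print(ctx["uploaded"])
```

Running the demo prints `['script', 'narration', 'footage', 'assembly', 'music', 'upload']`, then `{'thumbnail': 'thumbnail service unavailable'}`, then `video_with_music.mp4`. The simulated thumbnail outage is recorded, but the upload still proceeds. Replacing the script stage with a failing critical stage would instead stop the run at `script`. Which stages count as critical is an assumption made for this sketch.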