Autonomous Navigation AGV Using Vision-Language Models for Natural Language Guided Indoor Navigation
Yashwanth Aradhya
Department of Robotics and Artificial Intelligence
M S Ramaiah Institute of Technology,
Bangalore – 560054, India
Yash.aradhya140@gmail.com

Dr. Vishwanath Koti
Department of Robotics and Artificial Intelligence
M S Ramaiah Institute of Technology,
Bangalore – 560054, India
vkoti675@msrit.edu
Abstract:
Autonomous Guided Vehicles (AGVs) have become increasingly important in warehouse automation, industrial logistics, healthcare transportation, and smart manufacturing systems. Traditional AGV navigation techniques rely primarily on predefined routes, magnetic strips, guide wires, QR-code markers, and coordinate-based path planning. Although these approaches navigate reliably in structured environments, they often lack the flexibility required in dynamic indoor environments where obstacles and layouts change frequently. Furthermore, conventional AGV systems have limited ability to interpret human instructions expressed in natural language.
Recent advancements in Artificial Intelligence, Computer Vision, Natural Language Processing, and Vision-Language Models have enabled the development of intelligent robotic systems capable of understanding semantic instructions and performing autonomous decision-making. Vision-Language Models establish relationships between visual observations and textual instructions, allowing robots to identify navigation targets and interpret human commands in a more natural manner.
This paper presents an Autonomous Navigation AGV integrated with Vision-Language Models for natural-language-guided indoor navigation. The proposed system is built on a Raspberry Pi 5 embedded platform with a camera module, ultrasonic sensors, motor driver circuitry, and DC gear motors. Natural Language Processing techniques interpret navigation commands, while the Vision-Language Model combines visual perception and language understanding to identify target objects and destinations within the environment. Ultrasonic sensing provides obstacle detection and avoidance to ensure safe navigation.
Experimental evaluation was conducted under multiple indoor navigation scenarios involving command interpretation, target identification, obstacle avoidance, and autonomous movement. Performance analysis demonstrated high command recognition accuracy, reliable obstacle avoidance capability, and effective navigation performance. The proposed system provides a low-cost and scalable solution for intelligent indoor navigation and contributes toward the development of next-generation AI-enabled robotic transportation systems.
Keywords: Autonomous Guided Vehicle, Vision-Language Model, Natural Language Processing, Computer Vision, Autonomous Navigation, Raspberry Pi 5, Obstacle Avoidance, Artificial Intelligence, Human-Robot Interaction.