Autonomous Navigation AGV Using Vision-Language Models for Natural Language Guided Indoor Navigation

Published 16 June 2026

Updated 16 June 2026

Yashwanth Aradhya
Department of Robotics and Artificial Intelligence
M S Ramaiah Institute of Technology,
Bangalore – 560054, India
Yash.aradhya140@gmail.com

Dr. Vishwanath Koti
Department of Robotics and Artificial Intelligence
M S Ramaiah Institute of Technology,
Bangalore – 560054, India
vkoti675@msrit.edu

Abstract:

Autonomous Guided Vehicles (AGVs) have become increasingly important in warehouse automation, industrial logistics, healthcare transportation, and smart manufacturing. Traditional AGV navigation relies mainly on predefined routes, magnetic strips, guide wires, QR-code markers, and coordinate-based path planning. These approaches navigate reliably in structured environments, but they often lack the flexibility needed in dynamic indoor environments where obstacles and layouts change frequently. Conventional AGV systems also have little ability to understand human instructions expressed in natural language.

Recent advances in Artificial Intelligence, Computer Vision, Natural Language Processing, and Vision-Language Models have enabled robotic systems that can interpret semantic instructions and make decisions autonomously. Vision-Language Models (VLMs) relate visual observations to textual instructions, which allows a robot to locate the navigation target named in a command and to accept instructions phrased the way a person would naturally give them.
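
To make the idea of relating visual observations to text concrete, the following is a minimal sketch, assuming a CLIP-style model that maps both the instruction and candidate image regions into one shared embedding space. The function names, region labels, and embedding values are hypothetical toy data, not outputs of the authors' model; the sketch only shows how a target can be selected by cosine similarity.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for zero vectors; treat as "no match"
    return dot / (norm_a * norm_b)


def ground_instruction(
    text_embedding: list[float], region_embeddings: dict[str, list[float]]
) -> tuple[Optional[str], float]:
    """Return (label, score) of the region that best matches the instruction.

    Assumption: every embedding comes from the same (hypothetical) VLM encoder,
    so text and image vectors are directly comparable.
    """
    best_label: Optional[str] = None
    best_score = float("-inf")
    for label, embedding in region_embeddings.items():
        score = cosine_similarity(text_embedding, embedding)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


if __name__ == "__main__":
    # Hypothetical 3-D toy embeddings; real VLM embeddings have hundreds of dimensions.
    instruction = [0.9, 0.1, 0.0]  # stands in for e.g. "go to the red chair"
    regions = {
        "red_chair": [0.8, 0.2, 0.1],
        "table": [0.1, 0.9, 0.2],
        "door": [0.0, 0.2, 0.9],
    }
    print(ground_instruction(instruction, regions))
```

A practical system would typically also reject the best match when its score falls below a confidence threshold, so that an instruction naming an object that is not in view does not send the AGV toward the wrong region.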

This paper presents an Autonomous Navigation AGV that integrates Vision-Language Models for natural-language-guided indoor navigation. The system uses a Raspberry Pi 5 embedded platform connected to a camera module, ultrasonic sensors, a motor driver circuit, and DC gear motors. Natural Language Processing techniques interpret the navigation commands, while the Vision-Language Model combines visual perception with language understanding to identify target objects and destinations in the environment. Ultrasonic sensing provides obstacle detection and avoidance to keep navigation safe.
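
The interplay of command interpretation, target identification, and ultrasonic obstacle avoidance can be illustrated with a minimal, hardware-free sketch. It assumes a simple keyword-based parser as a stand-in for whatever NLP technique the system actually uses, an HC-SR04-style ultrasonic sensor that reports a round-trip echo time, and a 30 cm safety threshold; the command vocabulary, function names, action strings, and threshold are all hypothetical. The distance conversion uses the speed of sound in air (about 343 m/s at 20 °C) and halves the round-trip time.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed constants (not taken from the paper).
SPEED_OF_SOUND_CM_PER_S = 34300.0  # approx. speed of sound in air at 20 °C
SAFETY_DISTANCE_CM = 30.0          # hypothetical stop/avoid threshold


@dataclass
class Command:
    action: str            # "go_to", "stop", or "unknown"
    target: Optional[str]  # phrase to hand to the VLM for grounding


def parse_command(text: str) -> Command:
    """Hypothetical keyword-based parser standing in for the NLP stage."""
    cleaned = text.lower().strip()
    if re.search(r"\b(?:stop|halt)\b", cleaned):
        return Command("stop", None)  # stop requests take priority
    match = re.search(r"\b(?:go|move|navigate)\s+(?:to\s+)?(?:the\s+)?(.+)", cleaned)
    if match:
        return Command("go_to", match.group(1).rstrip(".!"))
    return Command("unknown", None)


def echo_to_distance_cm(echo_seconds: float) -> float:
    """Convert a round-trip ultrasonic echo time to a one-way distance in cm."""
    return echo_seconds * SPEED_OF_SOUND_CM_PER_S / 2.0


def choose_action(command: Command, target_visible: bool, echo_seconds: float) -> str:
    """One step of a simple reactive controller; the obstacle check comes first."""
    if command.action != "go_to":
        return "stop"
    if echo_to_distance_cm(echo_seconds) < SAFETY_DISTANCE_CM:
        return "avoid_obstacle"
    if not target_visible:
        return "rotate_and_search"  # e.g. turn in place until the VLM reports the target
    return "move_forward"


if __name__ == "__main__":
    cmd = parse_command("Go to the kitchen.")
    print(cmd)
    # A 0.001 s round trip is about 17 cm, inside the assumed safety distance.
    print(choose_action(cmd, target_visible=True, echo_seconds=0.001))
```

Placing the distance check before the target check means that, in this sketch, an obstacle always overrides goal seeking. Mapping each returned action string to motor-driver signals is omitted because it depends on the specific driver board and GPIO wiring.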

The system was evaluated in multiple indoor navigation scenarios covering command interpretation, target identification, obstacle avoidance, and autonomous movement. The results indicate high command-recognition accuracy, reliable obstacle avoidance, and effective navigation. The proposed system offers a low-cost, scalable solution for intelligent indoor navigation and contributes toward next-generation AI-enabled robotic transportation systems.

Keywords: Autonomous Guided Vehicle, Vision-Language Model, Natural Language Processing, Computer Vision, Autonomous Navigation, Raspberry Pi 5, Obstacle Avoidance, Artificial Intelligence, Human-Robot Interaction.