Sentinel – Scene Analysis using DETR Transform Model
- Create Date 19 March 2026
- Last Updated 19 March 2026
G. Pranavi
Assistant Professor
Dept. of Computer Science and Engineering
Jyothishmathi Institute of Technology and Science (JNTUH)
Karimnagar, Telangana, India
gunda.pranavi@gmail.com
B. Harish
UG Student
Dept. of Computer Science and Engineering
Jyothishmathi Institute of Technology and Science (JNTUH)
Karimnagar, Telangana, India
bhukyaharish903@gmail.com
V. Saketh
UG Student
Dept. of Computer Science and Engineering
Jyothishmathi Institute of Technology and Science (JNTUH)
Karimnagar, Telangana, India
sakethvasam8@gmail.com
Mandal Nikhitha
UG Student
Dept. of Computer Science and Engineering
Jyothishmathi Institute of Technology and Science (JNTUH)
Karimnagar, Telangana, India
226684nikitha@gmail.com
B. Keerthana
UG Student
Dept. of Computer Science and Engineering
Jyothishmathi Institute of Technology and Science (JNTUH)
Karimnagar, Telangana, India
keerthanaboga4@gmail.com
Abstract—This project presents a deep learning–based system for real-time object detection in dynamic environments, developed using the DETR (Detection Transformer) model and implemented with Streamlit as the frontend interface. The system accepts input from both image files and live camera streams, enabling object detection in static images as well as continuous video processing. The architecture uses ResNet-50 as the backbone network for feature extraction, providing robust visual feature representations prior to transformer-based analysis. In the proposed framework, input images or video frames are first processed through ResNet-50 to extract high-level feature maps. These feature maps are augmented with positional encodings and passed into the transformer encoder–decoder structure of DETR. The encoder captures global contextual dependencies using multi-head self-attention, while the decoder transforms a fixed set of object queries into predictions, each consisting of a class label and bounding box coordinates. Unlike conventional object detection approaches that rely on anchor boxes and non-maximum suppression, the model formulates detection as a set prediction problem and employs a bipartite matching loss to enforce a one-to-one correspondence between predictions and ground-truth objects. The system is deployed with a Streamlit-based frontend that provides an interactive, user-friendly interface for uploading images, processing video streams, and visualizing detection results in real time. Experimental evaluation demonstrates that the integration of deep convolutional feature extraction, transformer-based global reasoning, and an interactive frontend provides an efficient, scalable, and practical solution for real-time environmental object detection applications.
Index Terms—Deep Learning, DETR, ResNet-50, Real-Time Object Detection, Video Processing, Streamlit, Transformer.
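The one-to-one correspondence described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simplified matching cost (negative class probability plus a weighted L1 box distance, with the generalized IoU term omitted), and it replaces the Hungarian algorithm normally used for this step with a brute-force search that is only practical for a handful of queries. The names `pair_cost`, `match_predictions`, and the weight `box_weight` are hypothetical and chosen for illustration.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# Assumed box format: (cx, cy, w, h), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def l1_distance(a: Box, b: Box) -> float:
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pair_cost(class_probs: Sequence[float], pred_box: Box,
              gt_label: int, gt_box: Box, box_weight: float = 5.0) -> float:
    """Simplified cost of assigning one prediction to one ground-truth object.

    Lower is better: a high probability for the true class and a small
    box distance both reduce the cost. box_weight is an illustrative value.
    """
    return -class_probs[gt_label] + box_weight * l1_distance(pred_box, gt_box)


def match_predictions(pred_probs: Sequence[Sequence[float]],
                      pred_boxes: Sequence[Box],
                      gt_labels: Sequence[int],
                      gt_boxes: Sequence[Box]) -> List[Tuple[int, int]]:
    """Return (query_index, gt_index) pairs with minimal total cost.

    Brute force over all assignments; a stand-in for the Hungarian
    algorithm, which solves the same problem in polynomial time.
    """
    n_queries, n_gt = len(pred_boxes), len(gt_boxes)
    if n_gt > n_queries:
        raise ValueError("need at least as many queries as ground-truth objects")
    best_cost, best_assignment = float("inf"), ()
    # Each candidate assigns a distinct query to every ground-truth object.
    for queries in permutations(range(n_queries), n_gt):
        cost = sum(
            pair_cost(pred_probs[q], pred_boxes[q], gt_labels[g], gt_boxes[g])
            for g, q in enumerate(queries)
        )
        if cost < best_cost:
            best_cost, best_assignment = cost, queries
    return [(q, g) for g, q in enumerate(best_assignment)]


if __name__ == "__main__":
    # Three object queries, two ground-truth objects (made-up numbers).
    probs = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.05, 0.05, 0.9]]
    boxes = [(0.5, 0.5, 0.2, 0.2), (0.2, 0.3, 0.1, 0.1), (0.8, 0.8, 0.1, 0.1)]
    labels = [0, 1]
    targets = [(0.2, 0.3, 0.1, 0.1), (0.5, 0.5, 0.2, 0.2)]
    print(match_predictions(probs, boxes, labels, targets))  # [(1, 0), (0, 1)]
```

In this toy example, query 1 is matched to the first object and query 0 to the second. Query 2 is left unmatched, which in set-prediction training corresponds to a "no object" target, so duplicate detections are penalized during training rather than filtered afterwards by non-maximum suppression.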