Sentinel – Scene Analysis using DETR Transform Model

Published 19 March 2026

Updated 19 March 2026

G. Pranavi
Assistant Professor
Dept. of Computer Science and Engineering
Jyothishmathi Institute of Technology and Science (JNTUH)
Karimnagar, Telangana, India
gunda.pranavi@gmail.com

B. Harish
UG Student
Dept. of Computer Science and Engineering
Jyothishmathi Institute of Technology and Science (JNTUH)
Karimnagar, Telangana, India
bhukyaharish903@gmail.com

V. Saketh
UG Student
Dept. of Computer Science and Engineering
Jyothishmathi Institute of Technology and Science (JNTUH)
Karimnagar, Telangana, India
sakethvasam8@gmail.com

Mandal Nikhitha
UG Student
Dept. of Computer Science and Engineering
Jyothishmathi Institute of Technology and Science (JNTUH)
Karimnagar, Telangana, India
226684nikitha@gmail.com

B. Keerthana
UG Student
Dept. of Computer Science and Engineering
Jyothishmathi Institute of Technology and Science (JNTUH)
Karimnagar, Telangana, India
keerthanaboga4@gmail.com

Abstract: This project presents a deep learning–based system for real-time object detection in dynamic environments, built on the DETR (Detection Transformer) model with a Streamlit frontend. The system accepts both image files and live camera streams, so it can detect objects in static images as well as in continuous video. ResNet-50 serves as the backbone network for feature extraction, providing robust visual feature representations prior to transformer-based analysis. In the proposed framework, input images or video frames are first processed by ResNet-50 to extract high-level feature maps. These feature maps are augmented with positional encodings and passed to DETR's transformer encoder–decoder. The encoder captures global contextual dependencies using multi-head self-attention, while the decoder transforms a fixed set of learned object queries into predictions of class labels and bounding box coordinates. Unlike conventional object detection approaches that rely on anchor boxes and non-maximum suppression, the model formulates detection as a set prediction problem and is trained with a bipartite matching loss that enforces a one-to-one correspondence between predictions and ground-truth objects. The Streamlit-based frontend provides an interactive interface for uploading images, processing video streams, and visualizing detection results in real time. Experimental evaluation demonstrates that combining deep convolutional feature extraction, transformer-based global reasoning, and an interactive frontend yields an efficient, scalable, and practical solution for real-time environmental object detection.
Index Terms: Deep Learning, DETR, ResNet-50, Real-Time Object Detection, Video Processing, Streamlit, Transformer.
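
The one-to-one matching behind the set prediction formulation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the data layout, and the box weight of 5.0 are assumptions made for illustration, the cost uses only a class-probability term and an L1 box term (DETR's matching cost also includes a generalized IoU term, omitted here), and the optimal assignment is found by brute force over permutations rather than with the Hungarian algorithm, for which practical code would typically call something like SciPy's `linear_sum_assignment`.

```python
from itertools import permutations


def l1_box_cost(pred_box, gt_box):
    """L1 distance between two boxes given as normalized (cx, cy, w, h)."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))


def pair_cost(pred, gt, box_weight=5.0):
    """Cost of assigning one prediction to one ground-truth object.

    A high predicted probability for the true class lowers the cost;
    a large box error raises it. box_weight is an illustrative choice.
    """
    class_cost = -pred["probs"][gt["label"]]
    return class_cost + box_weight * l1_box_cost(pred["box"], gt["box"])


def match_predictions(preds, gts):
    """Return (pred_index, gt_index) pairs with minimum total cost.

    Brute force over all assignments, so only suitable for tiny examples.
    Assumes len(preds) >= len(gts), as with DETR's fixed query set.
    """
    best_cost, best_pairs = float("inf"), []
    for pred_indices in permutations(range(len(preds)), len(gts)):
        cost = sum(pair_cost(preds[p], gts[g]) for g, p in enumerate(pred_indices))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, g) for g, p in enumerate(pred_indices)]
    return best_pairs


# Hypothetical outputs of three object queries.
# Class indices (assumed): 0 = person, 1 = car, 2 = "no object".
preds = [
    {"probs": [0.10, 0.80, 0.10], "box": (0.70, 0.60, 0.20, 0.20)},
    {"probs": [0.05, 0.05, 0.90], "box": (0.50, 0.50, 0.10, 0.10)},
    {"probs": [0.90, 0.05, 0.05], "box": (0.30, 0.40, 0.10, 0.30)},
]

# Hypothetical ground truth: one person and one car.
gts = [
    {"label": 0, "box": (0.30, 0.40, 0.10, 0.30)},
    {"label": 1, "box": (0.72, 0.58, 0.20, 0.22)},
]

matches = match_predictions(preds, gts)
print(matches)  # prints [(2, 0), (0, 1)]
```

Here query 2 is matched to the person and query 0 to the car. Query 1 is left unmatched, and during training such unmatched queries are supervised to predict the "no object" class, which is why no non-maximum suppression step is needed to remove duplicates.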