A Modular Architecture for Scalable Multilingual Natural Language Processing
- Create Date 20 November 2025
1st Ravi Deo
Dept. of Computer Science and Engg.
Sharda University, India
ravideo9021@gmail.com
2nd Yash Bhadauria
Dept. of Computer Science and Engg.
Sharda University, India
bhadauriayash056@gmail.com
3rd Prof. (Dr.) V. Sathyasuntharam
Dept. of Computer Science and Engg.
Sharda University, India
sathiya4196@gmail.com
Abstract—The exponential growth of global digital content has created an urgent demand for Natural Language Processing (NLP) systems capable of operating seamlessly across a multitude of languages. However, the prevailing paradigm in NLP development remains predominantly monolingual, necessitating the construction of disparate, resource-intensive, and often inconsistent pipelines for individual languages. This fragmented approach is inherently inefficient and economically burdensome, and it imposes severe scalability constraints, thereby exacerbating the "digital language divide." This paper proposes a novel, unified Modular Multilingual NLP (MM-NLP) architecture designed to streamline cross-lingual analysis through a cohesive, high-throughput workflow. The proposed system integrates a hierarchical automatic language detection mechanism powered by fastText, a dynamic routing layer for language-specific tokenization, and a shared multilingual transformer backbone (XLM-RoBERTa) that exploits cross-lingual transfer learning. By establishing a unified high-dimensional vector space for textual representations, the pipeline enables task-specific heads, such as sentiment analysis and named entity recognition, to be applied agnostically across diverse linguistic inputs. We present comprehensive experimental results demonstrating that our zero-shot transfer approach achieves 88% of the performance of fully supervised monolingual models while reducing computational overhead by 75% and deployment complexity by an order of magnitude. Furthermore, we introduce a standardized fairness evaluation module, built on scikit-learn metrics, to detect and mitigate cross-lingual performance disparities.
Index Terms—Natural Language Processing, Multilingualism, Modular Architecture, Transformers, Zero-Shot Learning, Cross-lingual Transfer, Scalability, Computational Sustainability.
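The pipeline described in the abstract (language detection, routed tokenization, a shared encoder, and language-agnostic task heads) can be illustrated with a minimal structural sketch. All names below are hypothetical and self-contained stubs: `detect_language` stands in for a fastText identifier (e.g. the lid.176 model), `shared_encoder` stands in for the shared XLM-RoBERTa backbone, and the embeddings and sentiment head are placeholders, not the paper's actual models.

```python
from typing import Callable, Dict, List

def detect_language(text: str) -> str:
    """Toy language identifier; a real system would query fastText here."""
    lowered = text.lower()
    if any(w in lowered for w in ("gracias", "hola")):
        return "es"
    if any(w in lowered for w in ("bonjour", "merci")):
        return "fr"
    return "en"

# Dynamic routing layer: maps a detected language code to its tokenizer.
# Whitespace splitting is a stand-in for language-specific tokenizers.
TOKENIZERS: Dict[str, Callable[[str], List[str]]] = {
    "en": str.split,
    "es": str.split,
    "fr": str.split,
}

def shared_encoder(tokens: List[str]) -> List[float]:
    """Stub for the shared multilingual backbone: every language is
    mapped into one common vector space (placeholder embedding)."""
    return [float(len(t)) for t in tokens]

def sentiment_head(vector: List[float]) -> str:
    """Task-specific head operating on the shared space, so it needs
    no knowledge of the input language (placeholder rule)."""
    return "positive" if sum(vector) >= 0 else "negative"

def pipeline(text: str) -> str:
    lang = detect_language(text)      # 1. automatic language detection
    tokens = TOKENIZERS[lang](text)   # 2. routed, language-specific tokenization
    vector = shared_encoder(tokens)   # 3. shared cross-lingual representation
    return sentiment_head(vector)     # 4. language-agnostic task head

print(pipeline("gracias por todo"))  # routed via "es", same head as English
```

The key design point the sketch mirrors is that only steps 1 and 2 are language-aware; once text reaches the shared vector space, every downstream head is reused across all languages.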