A Modular Architecture for Scalable Multilingual Natural Language Processing
- Create Date 20 November 2025
1st Ravi Deo
Dept. of Computer Science and Engg.
Sharda University, India
ravideo9021@gmail.com
2nd Yash Bhadauria
Dept. of Computer Science and Engg.
Sharda University, India
bhadauriayash056@gmail.com
3rd Prof. (Dr.) V. Sathyasuntharam
Dept. of Computer Science and Engg.
Sharda University, India
sathiya4196@gmail.com
Abstract—The exponential growth of global digital content has created an urgent demand for Natural Language Processing (NLP) systems capable of operating seamlessly across a multitude of languages. However, the prevailing paradigm in NLP development remains predominantly monolingual, necessitating the construction of disparate, resource-intensive, and often inconsistent pipelines for individual languages. This fragmented approach is inherently inefficient and economically burdensome, and it imposes severe scalability constraints, thereby exacerbating the "digital language divide." This paper proposes a novel, unified Modular Multilingual NLP (MM-NLP) architecture designed to streamline cross-lingual analysis through a cohesive, high-throughput workflow. The proposed system integrates a hierarchical automatic language detection mechanism powered by fastText, a dynamic routing layer for language-specific tokenization, and a shared multilingual transformer backbone (XLM-RoBERTa) that exploits cross-lingual transfer learning. By establishing a unified high-dimensional vector space for textual representations, the pipeline enables task-specific heads, such as sentiment analysis and named entity recognition, to be applied agnostically across diverse linguistic inputs. We present comprehensive experimental results demonstrating that our zero-shot transfer approach achieves 88% of the performance of fully supervised monolingual models while reducing computational overhead by 75% and deployment complexity by an order of magnitude. Furthermore, we introduce a standardized fairness evaluation module, built on scikit-learn metrics, to detect and mitigate cross-lingual performance disparities.
Index Terms—Natural Language Processing, Multilingualism, Modular Architecture, Transformers, Zero-Shot Learning, Cross-lingual Transfer, Scalability, Computational Sustainability.
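The pipeline described in the abstract (language detection, routed tokenization, a shared encoder, and language-agnostic task heads) can be illustrated with a minimal structural sketch. All names below are hypothetical and self-contained stubs: `detect_language` stands in for a fastText identifier (e.g. the lid.176 model), `shared_encoder` stands in for the shared XLM-RoBERTa backbone, and the embeddings and sentiment head are placeholders, not the paper's actual models.

```python
from typing import Callable, Dict, List

def detect_language(text: str) -> str:
    """Toy language identifier; a real system would query fastText here."""
    lowered = text.lower()
    if any(w in lowered for w in ("gracias", "hola")):
        return "es"
    if any(w in lowered for w in ("bonjour", "merci")):
        return "fr"
    return "en"

# Dynamic routing layer: maps a detected language code to its tokenizer.
# Whitespace splitting is a stand-in for language-specific tokenizers.
TOKENIZERS: Dict[str, Callable[[str], List[str]]] = {
    "en": str.split,
    "es": str.split,
    "fr": str.split,
}

def shared_encoder(tokens: List[str]) -> List[float]:
    """Stub for the shared multilingual backbone: every language is
    mapped into one common vector space (placeholder embedding)."""
    return [float(len(t)) for t in tokens]

def sentiment_head(vector: List[float]) -> str:
    """Task-specific head operating on the shared space, so it needs
    no knowledge of the input language (placeholder rule)."""
    return "positive" if sum(vector) >= 0 else "negative"

def pipeline(text: str) -> str:
    lang = detect_language(text)      # 1. automatic language detection
    tokens = TOKENIZERS[lang](text)   # 2. routed, language-specific tokenization
    vector = shared_encoder(tokens)   # 3. shared cross-lingual representation
    return sentiment_head(vector)     # 4. language-agnostic task head

print(pipeline("gracias por todo"))  # routed via "es", same head as English
```

The key design point the sketch mirrors is that only steps 1 and 2 are language-aware; once text reaches the shared vector space, every downstream head is reused across all languages.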