Neural Network Powered Social Media Forensics
- Create Date 3 June 2025
- Last Updated 3 June 2025
Authors:
- HARISH
MASTER OF COMPUTER APPLICATION
Dr.M. G. R. EDUCATIONAL AND RESEARCH INSTITUTE
ABSTRACT: Cyberbullying has become a significant challenge on social media platforms because user-generated content is widespread and largely unregulated. Detecting it is particularly difficult because online language is often informal and ambiguous, full of slang, abbreviations, and emojis, and highly dependent on context. Traditional rule-based systems and machine learning models such as Naive Bayes, SVM, and Decision Trees, while foundational, have shown limited effectiveness because they rely on shallow feature representations like bag-of-words and TF-IDF, which lack semantic understanding. Deep learning approaches such as CNNs and LSTMs have improved performance by learning from word sequences and, to some extent, from context, but they still struggle with long-range dependencies and are computationally expensive. To overcome these limitations, this project proposes a fine-grained cyberbullying detection approach using DistilBERT, a compact and efficient version of BERT (Bidirectional Encoder Representations from Transformers). DistilBERT retains over 95% of BERT’s language understanding capabilities while being 40% smaller and 60% faster, making it suitable for real-time applications. By leveraging DistilBERT’s contextual embeddings, the model is expected to classify not only the presence of cyberbullying but also its specific type (such as threats, hate speech, insults, or sexual harassment), offering a more nuanced understanding of online abuse. The system is trained and evaluated on a large-scale, annotated tweet dataset. Data preprocessing involves cleaning, normalization, and tokenization, with careful treatment of hashtags, mentions, and emojis so that their semantic significance is retained. Model training uses the Hugging Face Transformers library, and performance is evaluated using accuracy, precision, recall, and F1 score.
Compared with traditional machine learning and deep learning baselines, this DistilBERT-based approach is hypothesized to achieve superior classification results, demonstrating the strength of transformer-based architectures in handling complex, real-world language tasks. Ultimately, this research contributes to the development of faster, more accurate, and context-aware cyberbullying detection systems that can be integrated into social media platforms to ensure safer digital interactions.
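As a rough illustration of two steps named in the abstract, tweet preprocessing and metric computation, the sketch below uses only the Python standard library. It is not the authors' pipeline: the placeholder tokens `<url>` and `<user>`, the function names, and the label names in the usage note are assumptions made for this example, and the DistilBERT tokenizer and model from Hugging Face Transformers are deliberately left out.

```python
import re

# Hypothetical placeholder tokens; the abstract does not specify a scheme.
URL_TOKEN = "<url>"
USER_TOKEN = "<user>"

_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_MENTION_RE = re.compile(r"@\w+")
_HASHTAG_RE = re.compile(r"#(\w+)")
_SPACE_RE = re.compile(r"\s+")


def preprocess_tweet(text):
    """Normalize a tweet while keeping hashtag words and emojis."""
    text = _URL_RE.sub(URL_TOKEN, text)       # links carry little signal
    text = _MENTION_RE.sub(USER_TOKEN, text)  # anonymize mentioned users
    text = _HASHTAG_RE.sub(r"\1", text)       # "#Loser" -> "Loser"
    text = text.lower()                       # emojis are left unchanged
    return _SPACE_RE.sub(" ", text).strip()


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)
```

For example, `preprocess_tweet("@Bob You are SO dumb 😡 #Loser http://t.co/x")` returns `"<user> you are so dumb 😡 loser <url>"`. With hypothetical labels `y_true = ["threat", "insult", "insult", "none"]` and `y_pred = ["threat", "insult", "none", "none"]`, `accuracy` gives 0.75 and `macro_f1` gives 7/9 (about 0.78). Macro averaging is one reasonable choice for fine-grained classes of unequal size; the abstract does not say which averaging the authors use.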
KEYWORDS: Cyberbullying Detection, DistilBERT, Natural Language Processing (NLP), Deep Learning, Social Media, Transformer Models.