Toxic to Positive Comment Rewriting using Supervised Fine-Tuning and Direct Preference Optimization
M. Vardhan
Dept. of Computer Science Engineering
RGUKT Basar
Basar, India B200242
P. Divya
Dept. of Computer Science Engineering
RGUKT Basar
Basar, India B200535
G. Krishna Reddy
Dept. of Computer Science Engineering
RGUKT Basar
Basar, India B200596
Abstract—The increased use of harmful, offensive, and disrespectful language online can be attributed to the rapid growth of social media platforms. Although many existing systems focus on detecting and removing toxic comments and other harmful content, this approach does not always encourage constructive communication. Text detoxification is a challenging Natural Language Processing (NLP) task that requires controlled text generation and contextual understanding. In this study, we propose a transformer-based system for rewriting toxic comments that uses both Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO).