Multi-Source Fake Review Detection using Deep Learning (LSTM)
Authors:
Maimoon Shaik
Department of AI
FET, Jain Deemed-to-be University, Bengaluru-562112 maimoonabc@gmail.com
Janaki Kandasamy
Department of CSE (AI)
FET, Jain Deemed-to-be University, Bengaluru-562112 k.janaki@jainuniversity.ac.in
Abstract—As the volume of user-generated content on the Internet grows, so does a vast body of reviews, articles, and feedback of every kind. Alongside it, fake reviews and fake feedback have emerged that mislead Internet users. This study addresses the problem of detecting such fake reviews by applying a Long Short-Term Memory (LSTM) neural network to text collected from multiple sources. The model takes text sequences as input and is trained on only 5,000 texts from the IMDb movie review dataset. Positive reviews are labeled as genuine and negative reviews as fake. A universal LSTM prediction wrapper is applied consistently to each source. The model achieves a classification accuracy of around 85% on unseen IMDb test samples. Somewhat unexpectedly, the genuine scores for professional news from sites such as BBC News (average confidence score = 0.984) and for technical forums such as GitHub (average confidence score = 0.937) are relatively similar, whereas scores for individual user posts from sources such as Hacker News and StackOverflow differ greatly. The model's sensitivity to text length is visible for the NPR RSS feed, where short pieces of text are heavily padded, which leads to high uncertainty.
Keywords—LSTM, fake review detection, deep learning, natural language processing, text classification, IMDb, multi-source scraping, sentiment analysis, recurrent neural networks, opinion spam, cross-domain detection
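The preprocessing and per-source scoring summarized in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sequence length `MAX_LEN`, the whitespace tokenizer, the choice of left-padding, and the helper names `build_vocab`, `encode`, `padding_fraction`, and `average_confidence` are all assumptions, and `score_fn` is a hypothetical stand-in for the trained LSTM. The sketch shows why short texts, such as RSS snippets, arrive at the model as mostly padding.

```python
# Illustrative sketch only. MAX_LEN, the tokenizer, the padding side, and all
# helper names are assumptions; none of them are taken from the paper.
from typing import Callable, Dict, List

PAD_ID = 0    # index reserved for padding positions
OOV_ID = 1    # index used for words missing from the vocabulary
MAX_LEN = 20  # hypothetical fixed sequence length fed to the LSTM


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map each distinct lowercase token to an integer id, starting at 2."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 2
    return vocab


def encode(text: str, vocab: Dict[str, int], max_len: int = MAX_LEN) -> List[int]:
    """Convert text to ids, keep the last max_len tokens, left-pad with PAD_ID."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    ids = [vocab.get(token, OOV_ID) for token in text.lower().split()]
    ids = ids[-max_len:]
    return [PAD_ID] * (max_len - len(ids)) + ids


def padding_fraction(encoded: List[int]) -> float:
    """Share of positions that hold padding rather than a real token."""
    return encoded.count(PAD_ID) / len(encoded)


def average_confidence(
    texts: List[str],
    vocab: Dict[str, int],
    score_fn: Callable[[List[int]], float],
    max_len: int = MAX_LEN,
) -> float:
    """Run every text from one source through the same encode-then-score path."""
    if not texts:
        raise ValueError("texts must not be empty")
    scores = [score_fn(encode(text, vocab, max_len)) for text in texts]
    return sum(scores) / len(scores)
```

Because `average_confidence` takes the scorer as an argument, the same wrapper can be reused unchanged for every source; in practice `score_fn` would wrap the trained model's prediction call. Comparing `padding_fraction` across sources is one simple way to check whether low-confidence sources are also the ones with the shortest texts.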