Improving Classification Efficiency of Fake News using Semi-Supervised Method
Abstract
Online news media is more accessible, cheaper, and faster to consume, but its quality is questionable because there is less moderation. Anybody with a computing device and an internet connection can take part in creating, contributing to, and spreading news on online portals. Social media has intensified the problem further. Owing to its volume, velocity, and veracity, online news content is beyond the reach of traditional moderation, that is, moderation by human experts. Different machine learning methods are therefore being tested and used to spot fake news. One of the main challenges in fake-news classification is obtaining labeled instances for this high volume of real-time data. In this study, we examine how semi-supervised machine learning can reduce the need for labeled instances at the cost of an acceptable drop in accuracy. The accuracy difference between the supervised classifier and the semi-supervised classifier is around 0.05 when the semi-supervised classifier uses only five percent of the labeled instances available to the supervised classifier. We tested our hypothesis with logistic regression, SVM, and random forest classifiers.
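The comparison described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not name the semi-supervised algorithm, so self-training (scikit-learn's `SelfTrainingClassifier`) is assumed here, and synthetic features from `make_classification` stand in for vectorized news articles. The function name, the dataset size, and the confidence threshold of 0.9 are all illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier


def compare_supervised_and_self_training(label_fraction=0.05, seed=0):
    """Hypothetical helper: compare a fully supervised model with a
    self-training model that sees only `label_fraction` of the labels."""
    # Synthetic stand-in for vectorized news articles (an assumption,
    # not the data used in the study).
    X, y = make_classification(
        n_samples=2000, n_features=20, n_informative=10, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )

    # Supervised baseline: trained on every available label.
    supervised = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Keep the true label for a small random subset only;
    # scikit-learn treats the label -1 as "unlabeled".
    rng = np.random.default_rng(seed)
    n_labeled = max(2, int(round(label_fraction * len(y_train))))
    labeled_idx = rng.choice(len(y_train), size=n_labeled, replace=False)
    y_partial = np.full_like(y_train, -1)
    y_partial[labeled_idx] = y_train[labeled_idx]

    # Self-training: the base classifier repeatedly pseudo-labels the
    # unlabeled instances it predicts with probability >= threshold.
    semi = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
    semi.fit(X_train, y_partial)

    return {
        "n_labeled": n_labeled,
        "supervised_accuracy": accuracy_score(y_test, supervised.predict(X_test)),
        "semi_supervised_accuracy": accuracy_score(y_test, semi.predict(X_test)),
    }


if __name__ == "__main__":
    print(compare_supervised_and_self_training())
```

The base estimator can be swapped for `RandomForestClassifier` or for `SVC(probability=True)`; self-training needs class probabilities, so an SVM must be configured to provide them. The accuracy gap on synthetic data will not necessarily match the figure reported in the abstract.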