Machine Learning Fake News Classification with Optimal Feature Selection

Author(s):  
Muhammad Fayaz ◽  
Atif Khan ◽  
Muhammad Bilal ◽  
Sanaullah Khan

Abstract Nowadays, information about current events and specific fields of interest, at home and abroad, is published in newspapers and on social media and broadcast on radio and television. Because of the explosive growth of online content, it has become difficult to tell what is real and what is fake. As a result, fake news has become epidemic, and it is immensely challenging for news producers and data-processing outlets to analyze and verify it so that people are not misled. Indeed, it is a big challenge for governments and the public to debate each situation case by case. Several websites have been developed to classify news as either real or fake, each according to its own logic and algorithm. A mechanism is needed for fact-checking rumors and statements, particularly those that get thousands of views and likes before being debunked and refuted by expert sources. Various machine learning techniques have been used to detect and correctly classify fake news. However, these approaches are restricted in terms of accuracy. This study applies a Random Forest (RF) classifier to predict whether news is fake or real. For this purpose, twenty-three (23) textual features are extracted from the ISOT Fake News Dataset. Four feature selection techniques (Chi2, univariate, information gain, and feature importance) are used to select the fourteen best features out of the twenty-three. The proposed model and other benchmark techniques are evaluated on the dataset using the best features. Experimental findings show that the proposed model outperformed state-of-the-art machine learning techniques such as GBM, XGBoost, and the Ada Boost Regression Model in terms of classification accuracy.
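As an illustration of the Chi2 selection step mentioned in the abstract, the sketch below scores non-negative features against class labels and keeps the top k. It is a minimal pure-Python version of the standard chi-squared statistic, not the authors' code; the toy feature matrix, the labels, and the function names `chi2_scores` and `select_k_best` are assumptions made for the example. In the study's setting, k would be 14 out of 23 features.

```python
from collections import defaultdict


def chi2_scores(X, y):
    """Chi-squared score per feature, for non-negative feature values.

    Observed = per-class sum of each feature.
    Expected = the class's share of the samples times the feature's total.
    A higher score means the feature's mass is spread less evenly over classes.
    """
    n_samples = len(X)
    n_features = len(X[0])
    class_counts = defaultdict(int)
    observed = defaultdict(lambda: [0.0] * n_features)
    for row, label in zip(X, y):
        class_counts[label] += 1
        for j, value in enumerate(row):
            observed[label][j] += value
    totals = [sum(observed[c][j] for c in class_counts) for j in range(n_features)]
    scores = []
    for j in range(n_features):
        score = 0.0
        for c, count in class_counts.items():
            expected = totals[j] * count / n_samples
            if expected > 0:
                score += (observed[c][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Return the indices of the k highest-scoring features, best first."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): 3 count features, label 1 = fake, 0 = real.
X = [[5, 1, 2], [6, 1, 2], [0, 1, 2], [1, 1, 2]]
y = [1, 1, 0, 0]
print(select_k_best(X, y, k=1))
```

Only the first toy feature differs between the two classes, so it is the one selected; the selected columns would then be passed to the classifier.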

2018 ◽  
Vol 7 (4.5) ◽  
pp. 654
Author(s):  
M. S. Satyanarayana ◽  
Aruna T.M ◽  
Divyaraj G.N

Accidents have become a major issue in developing countries like India. According to surveys, 60% of accidents are caused by overspeeding. Although the government has taken many initiatives, such as Traffic Awareness and Driving Awareness Week, the percentage of accidents has not gone down. This paper introduces a new technique to reduce the percentage of accidents. The technique is implemented using machine learning [1]. Machine learning based systems can be installed in all vehicles to avoid accidents at low cost [1]. The main objective of the system is to calculate the speed of the vehicle at three different locations, chosen according to where the vehicle's speed must be controlled. If the speed is greater than the designated speed for that road, the vehicle automatically detects the problem and notifies the driver so that the driver can reduce the speed. If the speed is less than or equal to the designated speed for that road, the vehicle passes without any disturbance. The system gives a beep along with a color indication to the driver in every scenario. The system also detects drowsiness: if the driver is driving at night and feels drowsy, the system detects it immediately and sounds an alarm to wake the driver. Although the system will not prevent 100% of accidents, it will at least reduce their percentage. Beyond avoiding accidents, the system also intelligently controls the speed of vehicles and creates awareness among drivers.
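The speed comparison described in this abstract can be sketched as a simple rule. The code below is only an illustration of that comparison: the `Indication` type, the colors, the messages, and the checkpoint names are hypothetical, and the abstract does not describe how speed is measured or how the machine learning component is used.

```python
from dataclasses import dataclass


@dataclass
class Indication:
    over_limit: bool
    color: str      # hypothetical color code shown to the driver
    message: str    # hypothetical text accompanying the beep


def check_speed(speed_kmh, limit_kmh):
    """Compare one speed reading with the designated speed for the road."""
    if speed_kmh > limit_kmh:
        return Indication(True, "red", "Over the designated speed: slow down")
    # Less than or equal to the designated speed: the vehicle passes undisturbed.
    return Indication(False, "green", "Speed within the designated limit")


def check_route(readings):
    """Apply the check at each checkpoint; readings are (name, speed, limit)."""
    return [(name, check_speed(speed, limit)) for name, speed, limit in readings]


# Three hypothetical checkpoints, as in the three locations the abstract mentions.
route = [("school zone", 42, 30), ("junction", 50, 50), ("highway", 78, 80)]
for name, indication in check_route(route):
    print(name, indication.color, indication.message)
```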


2021 ◽  
Author(s):  
Cao Truong Tran

Classification is a major task in machine learning and data mining. Many real-world datasets suffer from the unavoidable issue of missing values. Classification with incomplete data has to be handled carefully because inadequate treatment of missing values causes large classification errors. Most existing research on classification with incomplete data has focused on improving effectiveness, but has not adequately addressed the efficiency of applying classifiers to unseen instances, which is much more important than the act of creating classifiers. A common approach to classification with incomplete data is to use imputation methods to replace missing values with plausible values before building classifiers and classifying unseen instances. This approach provides complete data that can then be used by any classification algorithm, but sophisticated imputation methods are usually computationally intensive, especially in the application phase of classification. Another approach is to build a classifier that can work directly with missing values. This approach requires no time for estimating missing values, but it often generates inaccurate and complex classifiers when faced with numerous missing values. A more recent approach, which also avoids estimating missing values, is to build a set of classifiers from which applicable classifiers are selected for classifying unseen instances. However, this approach is also often inaccurate and takes a long time to find applicable classifiers when faced with numerous missing values. The overall goal of the thesis is to simultaneously improve the effectiveness and efficiency of classification with incomplete data by using evolutionary machine learning techniques for feature selection, clustering, ensemble learning, feature construction, and constructing classifiers.
The thesis develops approaches for improving imputation for classification with incomplete data by integrating clustering and feature selection with imputation. The approaches improve both the effectiveness and the efficiency of using imputation for classification with incomplete data. The thesis develops wrapper-based feature selection methods to improve the input space for classification algorithms that are able to work directly with incomplete data. The methods not only improve classification accuracy but also reduce the complexity of such classifiers. The thesis develops a feature construction method to improve the input space for classification algorithms with incomplete data by proposing interval genetic programming, that is, genetic programming with a set of interval functions. The method improves classification accuracy and reduces the complexity of classifiers. The thesis develops an ensemble approach to classification with incomplete data by integrating imputation, feature selection, and ensemble learning. The results show that the approach is more accurate and faster than previous common methods for classification with incomplete data. The thesis develops interval genetic programming to directly evolve classifiers for incomplete data. The results show that classifiers generated by interval genetic programming can be more effective and efficient than classifiers generated by the combination of imputation and traditional genetic programming. Interval genetic programming is also more effective than common classification algorithms that are able to work directly with incomplete data. In summary, the thesis develops a range of approaches for simultaneously improving the effectiveness and efficiency of classification with incomplete data by using a range of evolutionary machine learning techniques.
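To make the imputation approach concrete, here is a minimal sketch of the simplest variant, mean imputation. It is not one of the methods developed in the thesis. Missing values are represented as `None`, and the function names and toy data are assumptions for illustration. The column means are computed once on the training data and reused for unseen instances, which keeps the application step cheap.

```python
def column_means(rows):
    """Mean of the observed (non-None) values in each column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: fall back to 0.0 when a column has no observed value.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def impute(rows, means):
    """Replace each None with the precomputed mean of its column."""
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical training data with missing values.
train = [[1.0, None], [3.0, 4.0], [None, 8.0]]
means = column_means(train)
train_complete = impute(train, means)

# An unseen instance is completed with the training means, not its own data.
unseen_complete = impute([[None, 5.0]], means)
print(train_complete, unseen_complete)
```

Any classifier that expects complete data can then be trained on `train_complete` and applied to `unseen_complete`.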


Text mining uses machine learning (ML) and natural language processing (NLP) to recognize implicit knowledge in text; such knowledge serves many domains, such as translation, media search, and business decision making. Opinion mining (OM) is one of the promising text mining fields; it is used to discover polarity in text and has substantial benefits for business. ML techniques are divided into two approaches, supervised and unsupervised learning, and here we test feature selection (FS) for OM using four ML techniques. In this paper, we ran a number of experiments with four machine learning techniques on the same three Arabic-language corpora. The paper aims to increase the accuracy of opinion highlighting for the Arabic language by using enhanced feature selection approaches. The proposed FS model is adopted to enhance opinion highlighting. The experimental results show that the proposed approaches perform better at varying levels of supervision, i.e., with different techniques across distinct data domains. Multiple levels of comparison are carried out and discussed to better understand the impact of the proposed model on several ML techniques.


2017 ◽  
Author(s):  
Vinicius Da S. Segalin ◽  
Carina F. Dorneles ◽  
Mario A. R. Dantas

A well-known challenge with long-running queries in database environments is predicting how much time a query will take to execute. This prediction is relevant for several reasons. For instance, when a query is known to take longer to execute than desired, a resource reservation mechanism can be applied, which means reserving more resources so that the query executes in a shorter time on a future request. This research work presents a proposal in which an advance reservation mechanism in a cloud database environment, using machine learning techniques, provides resource recommendations. The proposed model is presented, along with some experiments that evaluate the benefits and efficiency of this enhanced proposal.


Inventions ◽  
2020 ◽  
Vol 5 (4) ◽  
pp. 57
Author(s):  
Attique Ur Rehman ◽  
Tek Tjing Lie ◽  
Brice Vallès ◽  
Shafiqur Rahman Tito

The recent advancement in computational capabilities and deployment of smart meters have caused non-intrusive load monitoring to revive itself as one of the promising techniques of energy monitoring. Toward effective energy monitoring, this paper presents a non-invasive load inference approach assisted by feature selection and ensemble machine learning techniques. For evaluation and validation purposes of the proposed approach, one of the major residential load elements having solid potential toward energy efficiency applications, i.e., water heating, is considered. Moreover, to realize the real-life deployment, digital simulations are carried out on low-sampling real-world load measurements: New Zealand GREEN Grid Database. For said purposes, MATLAB and Python (Scikit-Learn) are used as simulation tools. The employed learning models, i.e., standalone and ensemble, are trained on a single household’s load data and later tested rigorously on a set of diverse households’ load data, to validate the generalization capability of the employed models. This paper presents a comprehensive performance evaluation of the presented approach in the context of event detection, feature selection, and learning models. Based on the presented study and corresponding analysis of the results, it is concluded that the proposed approach generalizes well to the unseen testing data and yields promising results in terms of non-invasive load inference.
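The abstract does not say which event detection method is used. As a rough illustration of the general idea, the sketch below flags an event whenever the power reading changes by at least a threshold between consecutive samples. The readings and the 1000 W threshold are hypothetical values chosen to resemble a water heater switching on and off.

```python
def detect_events(power_w, threshold_w):
    """Return (index, delta) for each step change of at least threshold_w watts.

    A positive delta suggests an appliance switching on, a negative one off.
    """
    events = []
    for i in range(1, len(power_w)):
        delta = power_w[i] - power_w[i - 1]
        if abs(delta) >= threshold_w:
            events.append((i, delta))
    return events


# Hypothetical low-rate aggregate readings in watts.
readings = [120, 130, 3125, 3130, 3120, 125, 118]
events = detect_events(readings, threshold_w=1000)
print(events)
```

Features extracted around each detected event (for example the step size) could then be fed to the feature selection and learning stages.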


Technologies ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. 64
Author(s):  
Panagiotis Kantartopoulos ◽  
Nikolaos Pitropakis ◽  
Alexios Mylonas ◽  
Nicolas Kylilis

Social media has become very popular and important in people’s lives, as personal ideas, beliefs and opinions are expressed and shared through it. Unfortunately, social networks, and specifically Twitter, suffer from the massive presence and perpetual creation of fake users. Their goal is to deceive other users by various methods, or even to create a stream of fake news and opinions in order to push an idea on a specific subject, thus impairing the platform’s integrity. As such, machine learning techniques have been widely used in social networks to address this type of threat by automatically identifying fake accounts. Nonetheless, threat actors update their arsenal and launch a range of sophisticated attacks to undermine this detection procedure, during either the training or the test phase, rendering machine learning algorithms vulnerable to adversarial attacks. Our work examines the propagation of adversarial attacks in machine learning based detection of fake Twitter accounts, which is based on AdaBoost. Moreover, we propose and evaluate the use of k-NN as a countermeasure to remedy the effects of the adversarial attacks that we have implemented.
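For readers unfamiliar with k-NN, the sketch below shows its basic majority-vote rule in pure Python. The abstract does not specify how k-NN is applied as a countermeasure, so this shows only the underlying classifier; the two account features and their values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-normalised account features (two per account).
train_X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
train_y = ["real", "real", "fake", "fake", "fake"]

print(knn_predict(train_X, train_y, [0.15, 0.15]))
print(knn_predict(train_X, train_y, [0.9, 0.9]))
```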

