Hoax News Classification using Machine Learning Algorithms

Hoax news on social media has had a dramatic effect on our society in recent years. The impact of hoax news felt by many people, anxiety, financial loss, and loss of the right name. Therefore we need a detection system that can help reduce hoax news on social media. Hoax news classification is one of the stages in the construction of a hoax news detection system, and this unsupervised learning algorithm becomes a method for creating hoax news datasets, machine learning tools for data processing, and text processing for detecting data. The next will produce a classification of a hoax or not a Hoax based on the text inputted. Hoax news classification in this study uses five algorithms, namely Support Vector Machine, Naïve Bayes, Decision Tree, Logistic Regression, Stochastic Gradient Descent, and Neural Network (MLP). These five algorithms to produce the best algorithm that can use to detect hoax news, with the highest parameters, accuracy, F-measure, Precision, and recall. From the results of testing conducted on five classification algorithms produced shows that the NN-MPL algorithm has an average of 93% for the value of accuracy, F-Measure, and Precision, the highest compared to five other algorithms, but for the highest Recall value generated from the algorithm SVM which is 94%. the results of this experiment show that different effects for different classifiers, and that means that the more hoax data used as training data, the more accurate the system calculates accuracy in more detail.

Download Full-text

Earthquake Prediction using Machine Learning Algorithm

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e9110.018620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 4684-4688

Keyword(s):

Machine Learning ◽

Structural Damage ◽

Data Science ◽

Learning Algorithm ◽

Economic Loss ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Science Data ◽

Data Set

Per the statistics received from BBC, data varies for every earthquake occurred till date. Approximately, up to thousands are dead, about 50,000 are injured, around 1-3 Million are dislocated, while a significant amount go missing and homeless. Almost 100% structural damage is experienced. It also affects the economic loss, varying from 10 to 16 million dollars. A magnitude corresponding to 5 and above is classified as deadliest. The most life-threatening earthquake occurred till date took place in Indonesia where about 3 million were dead, 1-2 million were injured and the structural damage accounted to 100%. Hence, the consequences of earthquake are devastating and are not limited to loss and damage of living as well as nonliving, but it also causes significant amount of change-from surrounding and lifestyle to economic. Every such parameter desiderates into forecasting earthquake. A couple of minutes’ notice and individuals can act to shield themselves from damage and demise; can decrease harm and monetary misfortunes, and property, characteristic assets can be secured. In current scenario, an accurate forecaster is designed and developed, a system that will forecast the catastrophe. It focuses on detecting early signs of earthquake by using machine learning algorithms. System is entitled to basic steps of developing learning systems along with life cycle of data science. Data-sets for Indian sub-continental along with rest of the World are collected from government sources. Pre-processing of data is followed by construction of stacking model that combines Random Forest and Support Vector Machine Algorithms. Algorithms develop this mathematical model reliant on “training data-set”. Model looks for pattern that leads to catastrophe and adapt to it in its building, so as to settle on choices and forecasts without being expressly customized to play out the task. After forecast, we broadcast the message to government officials and across various platforms. The focus of information to obtain is keenly represented by the 3 factors – Time, Locality and Magnitude.

Download Full-text

Machine Learning Algorithms in Fraud Detection: Case Study on Retail Consumer Financing Company

Asia Pacific Fraud Journal ◽

10.21532/apfjournal.v6i2.216 ◽

2021 ◽

Vol 6 (2) ◽

pp. 213

Author(s):

Nadya Intan Mustika ◽

Bagus Nenda ◽

Dona Ramadhan

Keyword(s):

Machine Learning ◽

Random Forest ◽

Historical Data ◽

Learning Algorithm ◽

Fraud Detection ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Random Forest Algorithm ◽

Data Set

This study aims to implement a machine learning algorithm in detecting fraud based on historical data set in a retail consumer financing company. The outcome of machine learning is used as samples for the fraud detection team. Data analysis is performed through data processing, feature selection, hold-on methods, and accuracy testing. There are five machine learning methods applied in this study: Logistic Regression, K-Nearest Neighbor (KNN), Decision Tree, Random Forest, and Support Vector Machine (SVM). Historical data are divided into two groups: training data and test data. The results show that the Random Forest algorithm has the highest accuracy with a training score of 0.994999 and a test score of 0.745437. This means that the Random Forest algorithm is the most accurate method for detecting fraud. Further research is suggested to add more predictor variables to increase the accuracy value and apply this method to different financial institutions and different industries.

Download Full-text

Efficient detection of hacker community based on twitter data using complex networks and machine learning algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210458 ◽

2021 ◽

pp. 1-17

Author(s):

Ahmed Al-Tarawneh ◽

Ja’afer Al-Saraireh

Keyword(s):

Machine Learning ◽

Complex Networks ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Efficient Detection ◽

Suggested Keywords

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks, by gathering and classifying hackers’ tweets using machine-learning techniques. Previous approaches for detecting infected tweets are based on human efforts or text analysis, thus they are limited to capturing the hidden text between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection for the Twitter platform using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users with their followers who are sharing their posts that have similar interests from a hackers’ community on Twitter. The list is built based on a set of suggested keywords that are the commonly used terms by hackers in their tweets. After that, a complex network is generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes. The output of this process is utilized with a machine learning process by applying different algorithms. This research build and investigate an accurate dataset containing real users who belong to a hackers’ community. Correctly, classified instances were measured for accuracy using the average values of K-nearest neighbor, Naive Bayes, Random Tree, and the support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers, and determine if tweets pose a risk to future institutions and individuals to provide early warning of possible attacks.

Download Full-text

An IoT-Focused Intrusion Detection System Approach Based on Preprocessing Characterization for Cybersecurity Datasets

Sensors ◽

10.3390/s21020656 ◽

2021 ◽

Vol 21 (2) ◽

pp. 656

Author(s):

Xavier Larriva-Novo ◽

Víctor A. Villagrá ◽

Mario Vega-Barbas ◽

Diego Rivera ◽

Mario Sanz Rodrigo

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

High Performance ◽

Learning Algorithm ◽

Detection System ◽

Machine Learning Algorithms ◽

Statistical Characteristics ◽

Detection Techniques ◽

Traffic Characteristics ◽

Benchmark Datasets

Security in IoT networks is currently mandatory, due to the high amount of data that has to be handled. These systems are vulnerable to several cybersecurity attacks, which are increasing in number and sophistication. Due to this reason, new intrusion detection techniques have to be developed, being as accurate as possible for these scenarios. Intrusion detection systems based on machine learning algorithms have already shown a high performance in terms of accuracy. This research proposes the study and evaluation of several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm. This research uses for its evaluation two benchmark datasets, namely UGR16 and the UNSW-NB15, and one of the most used datasets, KDD99. The preprocessing techniques were evaluated in accordance with scalar and normalization functions. All of these preprocessing models were applied through different sets of characteristics based on a categorization composed by four groups of features: basic connection features, content characteristics, statistical characteristics and finally, a group which is composed by traffic-based features and connection direction-based traffic characteristics. The objective of this research is to evaluate this categorization by using various data preprocessing techniques to obtain the most accurate model. Our proposal shows that, by applying the categorization of network traffic and several preprocessing techniques, the accuracy can be enhanced by up to 45%. The preprocessing of a specific group of characteristics allows for greater accuracy, allowing the machine learning algorithm to correctly classify these parameters related to possible attacks.

Download Full-text

CAFD: Context-Aware Fault Diagnostic Scheme towards Sensor Faults Utilizing Machine Learning

Sensors ◽

10.3390/s21020617 ◽

2021 ◽

Vol 21 (2) ◽

pp. 617

Author(s):

Umer Saeed ◽

Young-Doo Lee ◽

Sana Ullah Jan ◽

Insoo Koo

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Diagnostic System ◽

Machine Learning Algorithms ◽

Support Vector ◽

Context Aware ◽

Sensor Faults ◽

Training Time ◽

Low Intensity ◽

Fault Diagnostic

Sensors’ existence as a key component of Cyber-Physical Systems makes it susceptible to failures due to complex environments, low-quality production, and aging. When defective, sensors either stop communicating or convey incorrect information. These unsteady situations threaten the safety, economy, and reliability of a system. The objective of this study is to construct a lightweight machine learning-based fault detection and diagnostic system within the limited energy resources, memory, and computation of a Wireless Sensor Network (WSN). In this paper, a Context-Aware Fault Diagnostic (CAFD) scheme is proposed based on an ensemble learning algorithm called Extra-Trees. To evaluate the performance of the proposed scheme, a realistic WSN scenario composed of humidity and temperature sensor observations is replicated with extreme low-intensity faults. Six commonly occurring types of sensor fault are considered: drift, hard-over/bias, spike, erratic/precision degradation, stuck, and data-loss. The proposed CAFD scheme reveals the ability to accurately detect and diagnose low-intensity sensor faults in a timely manner. Moreover, the efficiency of the Extra-Trees algorithm in terms of diagnostic accuracy, F1-score, ROC-AUC, and training time is demonstrated by comparison with cutting-edge machine learning algorithms: a Support Vector Machine and a Neural Network.

Download Full-text

Drill-Core Mineral Abundance Estimation Using Hyperspectral and High-Resolution Mineralogical Data

Remote Sensing ◽

10.3390/rs12071218 ◽

2020 ◽

Vol 12 (7) ◽

pp. 1218

Author(s):

Laura Tuşa ◽

Mahdi Khodadadzadeh ◽

Cecilia Contreras ◽

Kasra Rafiezadeh Shahi ◽

Margret Fuchs ◽

...

Keyword(s):

Machine Learning ◽

High Resolution ◽

Ore Deposits ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Drill Core ◽

Data Types ◽

Mineralogical Characterization ◽

Core Samples

Due to the extensive drilling performed every year in exploration campaigns for the discovery and evaluation of ore deposits, drill-core mapping is becoming an essential step. While valuable mineralogical information is extracted during core logging by on-site geologists, the process is time consuming and dependent on the observer and individual background. Hyperspectral short-wave infrared (SWIR) data is used in the mining industry as a tool to complement traditional logging techniques and to provide a rapid and non-invasive analytical method for mineralogical characterization. Additionally, Scanning Electron Microscopy-based image analyses using a Mineral Liberation Analyser (SEM-MLA) provide exhaustive high-resolution mineralogical maps, but can only be performed on small areas of the drill-cores. We propose to use machine learning algorithms to combine the two data types and upscale the quantitative SEM-MLA mineralogical data to drill-core scale. This way, quasi-quantitative maps over entire drill-core samples are obtained. Our upscaling approach increases result transparency and reproducibility by employing physical-based data acquisition (hyperspectral imaging) combined with mathematical models (machine learning). The procedure is tested on 5 drill-core samples with varying training data using random forests, support vector machines and neural network regression models. The obtained mineral abundance maps are further used for the extraction of mineralogical parameters such as mineral association.

Download Full-text

A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy037 ◽

2018 ◽

Vol 34 (3) ◽

pp. 569-581 ◽

Cited By ~ 1

Author(s):

Sujata Rani ◽

Parteek Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Media Analysis ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Tool ◽

Data Set ◽

Learning Techniques

Abstract In this article, an innovative approach to perform the sentiment analysis (SA) has been presented. The proposed system handles the issues of Romanized or abbreviated text and spelling variations in the text to perform the sentiment analysis. The training data set of 3,000 movie reviews and tweets has been manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system has been tested on 100 movie reviews and tweets, and it has been observed that SVM has performed best in comparison to other classifiers, and it has an accuracy of 68% for movie reviews and 82% in case of tweets. The results of the proposed system are very promising and can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can be used in other cultural/social benefits like predicting/fighting human riots.

Download Full-text

Automatic Grading of Stroke Symptoms for Rapid Assessment Using Optimized Machine Learning and 4-Limb Kinematics: Clinical Validation Study (Preprint)

10.2196/preprints.20641 ◽

2020 ◽

Author(s):

Eunjeong Park ◽

Kijeong Lee ◽

Taehwa Han ◽

Hyo Suk Nam

Keyword(s):

Machine Learning ◽

Medical Staff ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Grading System ◽

Operating Characteristics ◽

Stroke Patients ◽

Motor Weakness ◽

Automatic Grading

BACKGROUND Subtle abnormal motor signs are indications of serious neurological diseases. Although neurological deficits require fast initiation of treatment in a restricted time, it is difficult for nonspecialists to detect and objectively assess the symptoms. In the clinical environment, diagnoses and decisions are based on clinical grading methods, including the National Institutes of Health Stroke Scale (NIHSS) score or the Medical Research Council (MRC) score, which have been used to measure motor weakness. Objective grading in various environments is necessitated for consistent agreement among patients, caregivers, paramedics, and medical staff to facilitate rapid diagnoses and dispatches to appropriate medical centers. OBJECTIVE In this study, we aimed to develop an autonomous grading system for stroke patients. We investigated the feasibility of our new system to assess motor weakness and grade NIHSS and MRC scores of 4 limbs, similar to the clinical examinations performed by medical staff. METHODS We implemented an automatic grading system composed of a measuring unit with wearable sensors and a grading unit with optimized machine learning. Inertial sensors were attached to measure subtle weaknesses caused by paralysis of upper and lower limbs. We collected 60 instances of data with kinematic features of motor disorders from neurological examination and demographic information of stroke patients with NIHSS 0 or 1 and MRC 7, 8, or 9 grades in a stroke unit. Training data with 240 instances were generated using a synthetic minority oversampling technique to complement the imbalanced number of data between classes and low number of training data. We trained 2 representative machine learning algorithms, an ensemble and a support vector machine (SVM), to implement auto-NIHSS and auto-MRC grading. The optimized algorithms performed a 5-fold cross-validation and were searched by Bayes optimization in 30 trials. The trained model was tested with the 60 original hold-out instances for performance evaluation in accuracy, sensitivity, specificity, and area under the receiver operating characteristics curve (AUC). RESULTS The proposed system can grade NIHSS scores with an accuracy of 83.3% and an AUC of 0.912 using an optimized ensemble algorithm, and it can grade with an accuracy of 80.0% and an AUC of 0.860 using an optimized SVM algorithm. The auto-MRC grading achieved an accuracy of 76.7% and a mean AUC of 0.870 in SVM classification and an accuracy of 78.3% and a mean AUC of 0.877 in ensemble classification. CONCLUSIONS The automatic grading system quantifies proximal weakness in real time and assesses symptoms through automatic grading. The pilot outcomes demonstrated the feasibility of remote monitoring of motor weakness caused by stroke. The system can facilitate consistent grading with instant assessment and expedite dispatches to appropriate hospitals and treatment initiation by sharing auto-MRC and auto-NIHSS scores between prehospital and hospital responses as an objective observation.

Download Full-text

An analytical survey on the role of machine learning algorithms in case of intrusion detection

ACCENTS Transactions on Information Security ◽

10.19101/tis.2020.517002 ◽

2020 ◽

Vol 5 (19) ◽

pp. 32-35

Author(s):

Anand Vijay ◽

Kailash Patidar ◽

Manoj Yadav ◽

Rishi Kushwah

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Handling Mechanism ◽

The Impact

In this paper an analytical survey on the role of machine learning algorithms in case of intrusion detection has been presented and discussed. This paper shows the analytical aspects in the development of efficient intrusion detection system (IDS). The related study for the development of this system has been presented in terms of computational methods. The discussed methods are data mining, artificial intelligence and machine learning. It has been discussed along with the attack parameters and attack types. This paper also elaborates the impact of different attack and handling mechanism based on the previous papers.

Download Full-text

Fault detection for air conditioning system using machine learning

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v9.i1.pp109-116 ◽

2020 ◽

Vol 9 (1) ◽

pp. 109

Author(s):

Noor Asyikin Sulaiman ◽

Md Pauzi Abdullah ◽

Hayati Abdullah ◽

Muhammad Noorazlan Shah Zainudin ◽

Azdiana Md Yusop

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Air Conditioning ◽

Machine Learning Algorithms ◽

Coefficient Of Performance ◽

Support Vector ◽

Air Conditioning System ◽

Learning Classifier ◽

Negative Impacts ◽

The Impact

Air conditioning system is a complex system and consumes the most energy in a building. Any fault in the system operation such as cooling tower fan faulty, compressor failure, damper stuck, etc. could lead to energy wastage and reduction in the system’s coefficient of performance (COP). Due to the complexity of the air conditioning system, detecting those faults is hard as it requires exhaustive inspections. This paper consists of two parts; i) to investigate the impact of different faults related to the air conditioning system on COP and ii) to analyse the performances of machine learning algorithms to classify those faults. Three supervised learning classifier models were developed, which were deep learning, support vector machine (SVM) and multi-layer perceptron (MLP). The performances of each classifier were investigated in terms of six different classes of faults. Results showed that different faults give different negative impacts on the COP. Also, the three supervised learning classifier models able to classify all faults for more than 94%, and MLP produced the highest accuracy and precision among all.

Download Full-text