scholarly journals Fake News Data Exploration and Analytics

Electronics ◽  
2021 ◽  
Vol 10 (19) ◽  
pp. 2326
Author(s):  
Mazhar Javed Awan ◽  
Awais Yasin ◽  
Haitham Nobanee ◽  
Ahmed Abid Ali ◽  
Zain Shahzad ◽  
...  

Before the internet, people acquired their news from the radio, television, and newspapers. With the internet, the news moved online, and suddenly, anyone could post information on websites such as Facebook and Twitter. The spread of fake news has also increased with social media. It has become one of the most significant issues of this century. People use the method of fake news to pollute the reputation of a well-reputed organization for their benefit. The most important reason for such a project is to frame a device to examine the language designs that describe fake and right news through machine learning. This paper proposes models of machine learning that can successfully detect fake news. These models identify which news is real or fake and specify the accuracy of said news, even in a complex environment. After data-preprocessing and exploration, we applied three machine learning models; random forest classifier, logistic regression, and term frequency-inverse document frequency (TF-IDF) vectorizer. The accuracy of the TFIDF vectorizer, logistic regression, random forest classifier, and decision tree classifier models was approximately 99.52%, 98.63%, 99.63%, and 99.68%, respectively. Machine learning models can be considered a great choice to find reality-based results and applied to other unstructured data for various sentiment analysis applications.

Author(s):  
Sheikh Shehzad Ahmed

The Internet is used practically everywhere in today's digital environment. With the increased use of the Internet comes an increase in the number of threats. DDoS attacks are one of the most popular types of cyber-attacks nowadays. With the fast advancement of technology, the harm caused by DDoS attacks has grown increasingly severe. Because DDoS attacks may readily modify the ports/protocols utilized or how they function, the basic features of these attacks must be examined. Machine learning approaches have also been used extensively in intrusion detection research. Still, it is unclear what features are applicable and which approach would be better suited for detection. With this in mind, the research presents a machine learning-based DDoS attack detection approach. To train the attack detection model, we employ four Machine Learning algorithms: Decision Tree classifier (ID3), k-Nearest Neighbors (k-NN), Logistic Regression, and Random Forest classifier. The results of our experiments show that the Random Forest classifier is more accurate in recognizing attacks.


2020 ◽  
Vol 25 (5) ◽  
pp. 637-644
Author(s):  
Shiva Prasad Satla ◽  
Manchala Sadanandam ◽  
Buradagunta Suvarna

Many vulnerable, heinous acts that are coming about in the society especially at Roads, most specifically affecting women in the society, are more in recent days. Though new technologies are developing day by day, the fatality rate is not in control to date. Without proper guidance to the people about the particular place where there is a big scope of occurrence of a greater number of accidents, this menace cannot be regulated. It is required to highlight the District-wise data and Roads where the accidents and fatalities are more. The data would help the policymakers to put in place Focused Initiatives regarding those top dangerous roads to address the menace of rising road accidents and resultant fatalities. In this, we created a dataset in Andhra Pradesh where we include those attributes that are helpful for our analysis to predict which road is the most dangerous one. We applied various Machine Learning models such as Logistic regression, Random forest classifier, Gradient Boosting Classifier, Gaussian Naive Bayes, Decision Tree Classifier, K- Nearest Neighbour Classifier and SVM to predict the dangerous roads. It is observed that Logistic Regression provides good accuracy with 87.14.


2021 ◽  
Author(s):  
Ryan Moore ◽  
Kristin R. Archer ◽  
Leena Choi

AbstractPurposeAccelerometers are increasingly utilized in healthcare research to assess human activity. Accelerometry data are often collected by mailing accelerometers to participants, who wear the accelerometers to collect data on their activity. The devices are then mailed back to the laboratory for analysis. We develop models to classify days in accelerometry data as activity from actual human wear or the delivery process. These models can be used to automate the cleaning of accelerometry datasets that are adulterated with activity from delivery.MethodsFor the classification of delivery days in accelerometry data, we developed statistical and machine learning models in a supervised learning context using a large human activity and delivery labeled accelerometry dataset. We extracted several features, which were included to develop random forest, logistic regression, mixed effects regression, and multilayer perceptron models, while convolutional neural network, recurrent neural network, and hybrid convolutional recurrent neural network models were developed without feature extraction. Model performances were assessed using Monte Carlo cross-validation.ResultsWe found that a hybrid convolutional recurrent neural network performed best in the classification task with an F1 score of 0.960 but simpler models such as logistic regression and random forest also had excellent performance with F1 scores of 0.951 and 0.957, respectively.ConclusionThe models developed in this study can be used to classify days in accelerometry data as either human or delivery activity. An analyst can weigh the larger computational cost and greater performance of the convolutional recurrent neural network against the faster but slightly less powerful random forest or logistic regression. The best performing models for classification of delivery data are publicly available on the open source R package, PhysicalActivity.


2019 ◽  
Vol 8 (4) ◽  
pp. 1477-1483

With the fast moving technological advancement, the internet usage has been increased rapidly in all the fields. The money transactions for all the applications like online shopping, banking transactions, bill settlement in any industries, online ticket booking for travel and hotels, Fees payment for educational organization, Payment for treatment to hospitals, Payment for super market and variety of applications are using online credit card transactions. This leads to the fraud usage of other accounts and transaction that result in the loss of service and profit to the institution. With this background, this paper focuses on predicting the fraudulent credit card transaction. The Credit Card Transaction dataset from KAGGLE machine learning Repository is used for prediction analysis. The analysis of fraudulent credit card transaction is achieved in four ways. Firstly, the relationship between the variables of the dataset is identified and represented by the graphical notations. Secondly, the feature importance of the dataset is identified using Random Forest, Ada boost, Logistic Regression, Decision Tree, Extra Tree, Gradient Boosting and Naive Bayes classifiers. Thirdly, the extracted feature importance if the credit card transaction dataset is fitted to Random Forest classifier, Ada boost classifier, Logistic Regression classifier, Decision Tree classifier, Extra Tree classifier, Gradient Boosting classifier and Naive Bayes classifier. Fourth, the Performance Analysis is done by analyzing the performance metrics like Accuracy, FScore, AUC Score, Precision and Recall. The implementation is done by python in Anaconda Spyder Navigator Integrated Development Environment. Experimental Results shows that the Decision Tree classifier have achieved the effective prediction with the precision of 1.0, recall of 1.0, FScore of 1.0 , AUC Score of 89.09 and Accuracy of 99.92%.


Genus ◽  
2020 ◽  
Vol 76 (1) ◽  
Author(s):  
Fikrewold H. Bitew ◽  
Samuel H. Nyarko ◽  
Lloyd Potter ◽  
Corey S. Sparks

Abstract There is a dearth of literature on the use of machine learning models to predict important under-five mortality risks in Ethiopia. In this study, we showed spatial variations of under-five mortality and used machine learning models to predict its important sociodemographic determinants in Ethiopia. The study data were drawn from the 2016 Ethiopian Demographic and Health Survey. We used three machine learning models such as random forests, logistic regression, and K-nearest neighbors as well as one traditional logistic regression model to predict under-five mortality determinants. For each machine learning model, measures of model accuracy and receiver operating characteristic curves were used to evaluate the predictive power of each model. The descriptive results show that there are considerable regional variations in under-five mortality rates in Ethiopia. The under-five mortality prediction ability was found to be between 46.3 and 67.2% for the models considered, with the random forest model (67.2%) showing the best performance. The best predictive model shows that household size, time to the source of water, breastfeeding status, number of births in the preceding 5 years, sex of a child, birth intervals, antenatal care, birth order, type of water source, and mother’s body mass index play an important role in under-five mortality levels in Ethiopia. The random forest machine learning model produces a better predictive power for estimating under-five mortality risk factors and may help to improve policy decision-making in this regard. Childhood survival chances can be improved considerably by using these important factors to inform relevant policies.


In the growing era of technological world, the people are suffered with various diseases. The common disease faced by the population irrespective of the age is the heart disease. Though the world is blooming in technological aspects, the prediction and the identification of the heart disease still remains a challenging issue. Due to the deficiency of the availability of patient symptoms, the prediction of heart disease is a disputed charge. With this overview, we have used Heart Disease Prediction dataset extorted from UCI Machine Learning Repository for the analysis and comparison of various parameters in the classification algorithms. The parameter analysis of various classification algorithms of heart disease classes are done in five ways. Firstly, the analysis of dataset is done by exploiting the correlation matrix, feature importance analysis, Target distribution of the dataset and Disease probability based on the density distribution of age and sex. Secondly, the dataset is fitted to K-Nearest Neighbor classifier to analyze the performance for the various combinations of neighbors with and without PCA. Thirdly, the dataset is fitted to Support Vector classifier to analyze the performance for the various combinations of kernels with and without PCA. Fourth, the dataset is fitted to Decision Tree classifier to analyze the performance for the various combinations of features with and without PCA. Fifth, the dataset is fitted to Random Forest classifier to analyze the performance for the various levels of estimators with and without PCA. The implementation is done using python language under Spyder platform with Anaconda Navigator. Experimental results shows that for KNN classifier, the performance for 12 neighbours is found to be effective with 0.52 before applying PCA and 0.53 after applying PCA. For Support Vector classifier, the rbf kernel is found to be effective with the score of 0.519 with and without PCA. For Decision Tree classifier, before applying PCA, the score is 0.47 for 7 features and after applying PCA, the score is 0.49 for 4 features. For, Random Forest Classifier, before applying PCA, the score is 0.53 for 500 estimators and after applying PCA, the score is 0.52 for 500 estimators.


Author(s):  
Kallu Samatha ◽  
Muppidi Rohitha Reddy ◽  
Pattan Faizal Khan ◽  
Rayapati Akhil Chowdary ◽  
P.V.R.D Prasada Rao

Kidney diseases are increasing day by day among people. It is becoming a major health issue around the world. Not maintaining proper food habits and drinking less amount of water are one of the major reasons that contribute this condition. With this, it has become necessary to build up a system to foresee Chronic Kidney Diseases precisely. Here, we have proposed an approach for real time kidney disease prediction. Our aim is to find the best and efficient machine learning (ML) application that can effectively recognize and predict the condition of chronic kidney disease. We have used the data from UCI machine learning repository. In this work, five important machine learning classification techniques were considered for predicting chronic kidney disease which are KNN, Logistic Regression, Random Forest Classifier, SVM and Decision Tree Classifier. In this process, the data has been divided into two sections. In one section train dataset got trained and another section got evaluated by test dataset. The analysis results show that Decision Tree Classifier and Logistic Regression algorithms achieved highest performance than the other classifiers, obtaining the accuracy of 98.75% followed by random Forest, which stands at 97.5%.


2019 ◽  
Vol 8 (4) ◽  
pp. 10316-10320

Nowadays, heart disease has become a major disease among the people irrespective of the age. We are seeing this even in children dying due to the heart disease. If we can predict this even before they die, there may be huge chances of surviving. Everybody has various qualities of beat rate (pulse rate) and circulatory strain (blood pressure). We are living in a period of data. Due to the rise in the technology, the amount of data that is generated is increasing daily. Some terabytes of data are being produced and stored. For example, the huge amount of data about the patients is produced in the hospitals such as chest pain, heart rate, blood pressure, pulse rate etc. If we can get this data and apply some machine learning techniques, we can reduce the probability of people dying. In this paper we have done survey using different classification and grouping strategies, for example, KNN, Decision tree classifier, Gaussian Naïve Bayes, Support vector machine, Linear regression, Logistic regression, Random forest classifier, Random forest regression, linear descriptive analysis. We have taken the 14 attributes that are present in the dataset as an input and applying on the dataset which is taken from the UCI repository to develop and accurate model of predicting the heart disease contains colossal (huge) therapeutic (medical) information. In the proposed research, the exhibition of the conclusion model is acquired by using utilizing classification strategies. In this paper proposed an accuracy model to predict whether a person has coronary disease or not. This is implemented by comparing the accuracies of different machine-learning strategies such as KNN, Decision tree classifier, Gaussian Naïve Bayes, SVM, Logistic regression, Random forest classifier, Linear regression, Random forest regression, linear descriptive analysis


2021 ◽  
Vol 1 (1) ◽  
pp. 8-13
Author(s):  
Amir Mahmud Husein ◽  
Mawaddah Harahap

Peralihan pelanggan merupakan fenomena dimana pelanggan perusahaan berhenti membeli atau berinteraksi sehingga sangat penting bagi perusahaan khususnya perbankan untuk memprediksi kemungkinan churn pelanggan dan hasilnya dapat digunakan untuk membantu retensi pelanggan dan bagian dari strategi perusahaan. Makalah ini menyajikan analisis dan prediksi churn pelanggan dengan menggunakan lima model berbeda yaitu Kneighbors Classifier, Logistic Regression, Linear SVC, Random Tree Classifier dan Random Forest Classifier. Berdasarkan hasil pengujian pendekatan model Random Forest Classifier dan Kneighbors Classifier lebih baik dari pada model lain dengan akurasi sebesar 86% dan 84%. Rekayasa fitur dengan pendekatan Anova dan Chi Square memiliki pengaruh yang signifikan terhadap peningkatan kinerja model prediksi.


Sign in / Sign up

Export Citation Format

Share Document