Analyzing Fake News Based on Machine Learning Algorithms

Author(s): Pawar A B, Jawale M A, Kyatanavar D N

This research paper analyzes the use of natural language processing (NLP) techniques for detecting fake news. Fake news is misleading content spread by unreliable sources, and it can cause harm to individuals and society. The analysis uses a dataset obtained from the web resource OpenSources.co, which is mainly derived from Signal Media. Term frequency-inverse document frequency (TF-IDF) features of bi-grams were used in conjunction with a probabilistic context-free grammar (PCFG) on a set of 11,000 documents extracted as news articles. This set was tested on several classification algorithms: support vector machines (SVM), stochastic gradient descent (SGD), bounded decision trees, and gradient boosting with random forests. The experiments found that combining SGD with TF-IDF of bi-grams gives an accuracy of 77.2% in detecting fake content, while PCFG features had only a slight effect on recall.
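
To make the bi-gram TF-IDF features concrete, here is a minimal plain-Python sketch. It assumes lowercasing plus whitespace tokenization, raw bi-gram counts as term frequency, the smoothed IDF variant ln((1 + N) / (1 + df)) + 1, and L2 normalization; the abstract does not state the authors' exact tokenizer or weighting, and the function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    """Lowercase, split on whitespace, and return consecutive word pairs."""
    tokens = text.lower().split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def tfidf_bigrams(docs):
    """Return one {bigram: weight} dict per document, L2-normalised."""
    counts = [Counter(bigrams(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each bi-gram.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    # Smoothed inverse document frequency (assumed variant).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in df}
    vectors = []
    for c in counts:
        vec = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors


# Hypothetical toy corpus.
docs = [
    "breaking news the president said",
    "the president said nothing new",
    "aliens built the pyramids says source",
]
for vec in tfidf_bigrams(docs):
    print(sorted(vec.items(), key=lambda kv: -kv[1])[:3])
```

Bi-grams that occur in fewer documents (such as "breaking news") receive higher weights than shared ones (such as "president said"); the resulting sparse vectors would then be passed to a linear classifier such as one trained by SGD.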

Author(s): L. S. Koriashkina, H. V. Symonets

Purpose. To detect toxic comments posted under educational videos on YouTube by classifying unstructured text with a combination of machine learning methods. Methodology. Machine learning methods were used to clean and normalize the textual data and to represent it in a form suitable for computer processing. To classify comments as “toxic”, we used a logistic regression classifier, a linear support vector classifier trained with and without stochastic gradient descent, a random forest classifier, and a gradient boosting classifier. The classifiers were assessed using the confusion matrix, precision, recall, and F-measure, and cross-validation was used for a more general assessment. The work was implemented in the Python programming language. Findings. Based on these assessment indicators, the best-performing methods were the linear support vector machine (Linear SVM), trained with and without stochastic gradient descent. The described technologies can be used to analyze the comments under any educational video to detect toxic reviews. The approach can also be useful for identifying unwanted or even aggressive content on social networks or services where reviews are posted. Originality. The novelty lies in combining preprocessing methods for this specific type of text, taking into account features such as timecodes, emoji, and links, and in adapting machine learning classification methods to the analysis of Russian-language comments. Practical value. The approach optimizes (simplifies) the comment analysis process. Such processing is needed because of the growing volume of text data, especially in education, driven by quarantine conditions and the transition to distance learning.
The volume of educational Internet content already requires automated processing and analysis of feedback, and this need will only grow over time.
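
The Originality section mentions preprocessing that accounts for timecodes, emoji, and links. A minimal sketch of such a comment normalizer follows; the regular expressions, the placeholder tokens <url>, <time>, and <emoji>, and the function name normalize_comment are hypothetical, and the emoji pattern covers only a few common Unicode pictograph blocks rather than every emoji.

```python
import re

# Hypothetical patterns; the paper's exact rules are not given in the abstract.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Timecodes such as 1:23, 12:34 or 01:02:03.
TIMECODE_RE = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")
# Rough emoji coverage: pictograph/emoticon blocks plus misc symbols/dingbats.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def normalize_comment(text):
    """Replace links, timecodes and emoji with placeholder tokens,
    lowercase, drop punctuation, and collapse whitespace."""
    text = URL_RE.sub(" <url> ", text)        # links first, so URLs are not split
    text = TIMECODE_RE.sub(" <time> ", text)
    text = EMOJI_RE.sub(" <emoji> ", text)
    text = text.lower()
    # \w is Unicode-aware, so Cyrillic letters are kept.
    text = re.sub(r"[^\w<>\s]", " ", text)
    return " ".join(text.split())


print(normalize_comment("Great at 12:34! see https://example.com 😀"))
print(normalize_comment("Отличное видео!!! 👍"))
```

Keeping placeholders instead of deleting these elements lets a downstream classifier still use the fact that a comment contained a link, a timecode, or an emoji.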


Symmetry, 2021, Vol 13 (9), pp. 1652
Author(s): Wanida Panup, Rabian Wangkeeree

In this paper, we propose a stochastic gradient descent algorithm, called the stochastic gradient descent method-based generalized pinball support vector machine (SG-GPSVM), for solving data classification problems. The approach replaces the hinge loss function of the conventional support vector machine (SVM) with a generalized pinball loss function. We show that SG-GPSVM converges and that it approximates the conventional generalized pinball support vector machine (GPSVM). Further, the symmetric kernel method was adopted to evaluate the performance of SG-GPSVM as a nonlinear classifier. According to the experimental results, the proposed algorithm surpasses existing methods in terms of noise insensitivity, resampling stability, and accuracy in large-scale data scenarios.
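
The core idea, swapping the hinge loss for a generalized pinball loss and minimizing by stochastic gradient descent, can be sketched for a linear classifier without a bias term. This assumes the loss form max(tau1*u - eps1, 0, -tau2*u - eps2) with u = 1 - y*(w.x), an L2 regularizer, a fixed learning rate, and hypothetical parameter values and toy data; it is not the authors' exact SG-GPSVM (for instance, it omits the kernel extension).

```python
import random


def gen_pinball_loss(u, tau1, tau2, eps1, eps2):
    """Generalized pinball loss of u = 1 - y * f(x), written as
    max(tau1*u - eps1, 0, -tau2*u - eps2) (assumed form)."""
    return max(tau1 * u - eps1, 0.0, -tau2 * u - eps2)


def loss_slope(u, tau1, tau2, eps1, eps2):
    """A subgradient of the loss with respect to u."""
    if tau1 * u - eps1 > 0:
        return tau1
    if -tau2 * u - eps2 > 0:
        return -tau2
    return 0.0  # inside the insensitive zone


def sgd_gpsvm(X, y, lam=0.01, lr=0.01, epochs=200,
              tau1=1.0, tau2=0.5, eps1=0.1, eps2=0.1, seed=0):
    """Linear classifier trained by SGD on
    lam/2 * ||w||^2 + mean_i gen_pinball_loss(1 - y_i * w.x_i)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in random order each epoch
        for i in order:
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            g = loss_slope(1.0 - margin, tau1, tau2, eps1, eps2)
            # d(loss)/dw = g * (-y_i * x_i); the regularizer adds lam * w.
            w = [wj - lr * (lam * wj - g * y[i] * xj) for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Hypothetical, linearly separable toy data with labels in {-1, +1}.
X = [[2.0, 1.0], [1.5, 2.0], [3.0, 1.5], [-2.0, -1.0], [-1.5, -2.0], [-3.0, -1.5]]
y = [1, 1, 1, -1, -1, -1]
w = sgd_gpsvm(X, y)
print(w, [predict(w, x) for x in X])
```

Unlike the hinge loss, the -tau2 branch also penalizes points classified with a very large margin, which is the property that pinball-type losses use to reduce sensitivity to noise near the decision boundary.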


2021
Author(s): Ankit Ghosh, Alok Kole

Smart grid is an essential concept in transforming the electricity sector into an intelligent, digitalized energy network that can deliver optimal energy from the source to the consumers. Smart grids, being self-sufficient systems, are constructed by integrating information, telecommunication, and advanced power technologies with existing electricity systems. Artificial Intelligence (AI) is an important technology driver in smart grids. The application of AI techniques in smart grids is becoming more prominent because traditional modelling, optimization, and control techniques have their own limitations. Machine Learning (ML), a subset of AI, enables intelligent decision-making and response to sudden changes in customer energy demand, unexpected disruption of power supply, sudden variations in renewable energy output, or any other catastrophic event in a smart grid. This paper compares several state-of-the-art ML algorithms for predicting smart grid stability. The selected dataset contains results from simulations of smart grid stability. Enhanced ML algorithms such as Support Vector Machine (SVM), Logistic Regression, K-Nearest Neighbour (KNN), Naïve Bayes (NB), Decision Tree (DT), Random Forest (RF), the Stochastic Gradient Descent (SGD) classifier, XGBoost, and Gradient Boosting classifiers have been implemented to forecast smart grid stability. The models were compared using the following evaluation metrics: accuracy, precision, recall, F1-score, AUC-ROC, and AUC-PR. The test results are promising, with the XGBoost classifier outperforming all the other models with an accuracy of 97.5%, recall of 98.4%, precision of 97.6%, F1-score of 97.9%, AUC-ROC of 99.8%, and AUC-PR of 99.9%.
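
The evaluation metrics listed above can be computed from true labels and predicted scores as sketched below. The study's predictions are not available, so the inputs are toy values; the 0.5 threshold is an assumption, AUC-ROC is computed as the probability that a random positive is scored above a random negative, and AUC-PR is omitted for brevity.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, scores):
    """AUC-ROC as P(score of a random positive > score of a random
    negative), counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, scores, threshold=0.5):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "auc_roc": roc_auc(y_true, scores)}


# Toy example: 1 = stable grid, 0 = unstable (hypothetical scores).
print(classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, which is why studies such as this one report both kinds of metric.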


2021, Vol 7, pp. e712
Author(s): Babacar Gaye, Dezheng Zhang, Aziguli Wulamu

Employee satisfaction is very important for any organization to make sufficient progress in production and to achieve its goals. Organizations try to keep their employees satisfied by shaping their policies according to employees’ demands, which helps create a good environment for the collective. For this reason, it is beneficial for organizations to conduct and analyze staff satisfaction surveys, allowing them to gauge the levels of satisfaction among employees. Sentiment analysis can assist here, as it categorizes the sentiment of reviews as positive or negative. In this study, we perform experiments on the world’s big six companies and classify their employees’ reviews based on sentiment. For this, we propose an approach using lexicon-based and machine-learning-based techniques. First, we extracted the sentiments of employees from text reviews and labeled the dataset as positive or negative using TextBlob. Then we proposed a hybrid/voting model named Regression Vector-Stochastic Gradient Descent Classifier (RV-SGDC) for sentiment classification. RV-SGDC combines logistic regression, support vector machines, and stochastic gradient descent under a majority voting criterion. We also compared RV-SGDC with other machine learning models. Further, three feature extraction techniques, term frequency-inverse document frequency (TF-IDF), bag of words, and global vectors (GloVe), are used to train the learning models. We evaluated the performance of all models in terms of accuracy, precision, recall, and F1 score. The results revealed that RV-SGDC outperforms the other models, with an accuracy score of 0.97 using TF-IDF features, owing to its hybrid architecture.
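
RV-SGDC combines its three base models by majority voting. The sketch below shows only that voting step, assuming each trained base model has already produced one label per review; the prediction lists are hypothetical and the training of the base models is omitted.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard majority vote.

    predictions_per_model: one list of labels per model, all the same length.
    Returns one label per sample. With an odd number of models and binary
    labels there are no ties.
    """
    n_samples = len(predictions_per_model[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        voted.append(votes.most_common(1)[0][0])
    return voted


# Hypothetical outputs of the three base models on four reviews.
lr_preds = ["pos", "neg", "pos", "neg"]   # logistic regression
svm_preds = ["pos", "pos", "neg", "neg"]  # support vector machine
sgd_preds = ["neg", "neg", "pos", "neg"]  # SGD-trained classifier
print(majority_vote([lr_preds, svm_preds, sgd_preds]))
# → ['pos', 'neg', 'pos', 'neg']
```

With three voters, a single base model's mistake on a review is outvoted whenever the other two agree, which is the intuition behind this kind of hybrid voting model.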


2021, Vol 2021, pp. 1-10
Author(s): Arvin Hansrajh, Timothy T. Adeliyi, Jeanette Wing

The exponential growth in fake news and its inherent threat to democracy, public trust, and justice has escalated the need for fake news detection and mitigation. Detecting fake news is a complex challenge because it is intentionally written to mislead and deceive. Humans are not good at identifying fake news: they are reported to detect it at a rate of 54%, and an additional 4% is reported in the literature as being speculative. The significance of fighting fake news is exemplified during the present pandemic. Consequently, social networks are ramping up their use of detection tools and educating the public in recognising fake news. The literature shows that several machine learning algorithms have been applied to fake news detection with limited and mixed success. However, several advanced machine learning models have not yet been applied, even though recent studies demonstrate the efficacy of the ensemble machine learning approach; hence, the purpose of this study is to assist in the automated detection of fake news. An ensemble approach is adopted to help address the identified gap. This study proposes a blended machine learning ensemble model built from logistic regression, support vector machine, linear discriminant analysis, stochastic gradient descent, and ridge regression, which is then applied to a publicly available dataset to predict whether a news report is true or not. The proposed model is compared with popular classical machine learning models using performance metrics such as AUC, ROC, recall, accuracy, precision, and F1-score. The results show that the proposed model outperformed the other popular classical machine learning models.
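
In a blended ensemble, base models are trained on one split, their outputs on a separate hold-out split become the features of a meta-model, and the meta-model makes the final prediction. A minimal sketch of the blending step follows, assuming the base models have already produced probability scores on the hold-out split and assuming a logistic-regression meta-learner fitted by batch gradient descent; the abstract does not say which meta-model the authors used, and all scores and labels are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_scores, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on base-model scores by batch gradient descent.

    base_scores: one row per hold-out example, one column per base model.
    Returns weights with the bias as the last entry.
    """
    n = len(labels)
    w = [0.0] * (len(base_scores[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, y in zip(base_scores, labels):
            x = list(row) + [1.0]  # append constant input for the bias
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n  # gradient of the mean log-loss
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def blend_predict(w, row):
    """Probability that a news item is fake, from the base-model scores."""
    x = list(row) + [1.0]
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


# Hold-out scores from two hypothetical base models (columns) and labels
# (1 = fake). Model A is informative; model B is close to noise.
scores = [[0.9, 0.4], [0.2, 0.6], [0.8, 0.5], [0.1, 0.5], [0.7, 0.6], [0.3, 0.4]]
labels = [1, 0, 1, 0, 1, 0]
w = fit_meta_learner(scores, labels)
print(w, [round(blend_predict(w, r), 2) for r in scores])
```

The meta-learner learns how much to trust each base model, so an informative base model typically receives a larger weight than an uninformative one.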


2020, Vol 39 (6), pp. 8069-8078
Author(s): R. Dhanalakshmi, T. Sri Devi

Cognitive computing mirrors the human brain, which is made possible by natural language processing, pattern recognition, and data mining. By mirroring the human brain, a cognitive computing system helps solve complicated problems without much human supervision. In a fast-changing world, a major challenge every organization faces is retaining its employees. Employees may leave an organization because of low salary, overwork, lack of opportunities and recognition, work culture, work-life imbalance, and so on. A better way to retain employees is to understand their requirements and fulfill them. The proposed employee feedback sentiment analysis system collects employee feedback reviews from open forums and performs sentiment analysis using a Recurrent Neural Network with Long Short-Term Memory (RNN-LSTM). Through sentiment analysis, employee review comments are classified as positive or negative. A report is generated and sent to the organization's HR department through a web app or mobile app. The report contains the total numbers of positive and negative comments, as well as positive and negative counts with respect to aspects such as salary and work pressure. With this report, the organization can identify social sentiment about its brand and take corrective action to retain employees, which benefits both the organization and its employees. This paper also reports the performance of several models in training and predicting on the employee feedback dataset: Logistic Regression, Support Vector Machine, Random Forest Classifier, AdaBoost Classifier, Gradient Boosting Classifier, Decision Tree Classifier, and Gaussian Naïve Bayes. The classification report and accuracy of each model are captured. The dataset size was gradually increased from 200 to 1000, and the accuracy of each model was measured at each size. The accuracy of these machine learning algorithms ranged from 66% to 85%.
On training the RNN-LSTM algorithm with a dataset of size 30k, the accuracy was 88%. The deep learning RNN-LSTM algorithm was found to perform better with large datasets, and increasing the dataset size further improves its training and prediction performance. Thus, the objective of the proposed model, performing sentiment analysis on employee feedback review comments, is achieved.
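
To illustrate what the RNN-LSTM computes, the sketch below implements the forward pass of a single LSTM cell (forget, input, and output gates plus candidate cell values) and runs it over a sequence of input vectors. It assumes the standard LSTM gate equations; the weight layout, the function names, and the idea of feeding the final hidden state to a sigmoid readout for positive/negative sentiment are illustrative assumptions. The abstract gives no architecture details (embedding, layer sizes, training procedure).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.

    params maps each gate name ('f', 'i', 'o', 'g') to (W, b), where W has
    one row per hidden unit and acts on the concatenation [x, h_prev].
    """
    z = list(x) + list(h_prev)
    pre = {name: [s + b for s, b in zip(matvec(W, z), bias)]
           for name, (W, bias) in params.items()}
    f = [sigmoid(v) for v in pre["f"]]    # forget gate
    i = [sigmoid(v) for v in pre["i"]]    # input gate
    o = [sigmoid(v) for v in pre["o"]]    # output gate
    g = [math.tanh(v) for v in pre["g"]]  # candidate cell values
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def run_lstm(sequence, params, hidden_size):
    """Feed a sequence of input vectors; return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h


# Hypothetical sizes: 1-dimensional inputs (e.g. token embeddings), 2 hidden units.
zero_gate = ([[0.0] * 3 for _ in range(2)], [0.0, 0.0])
params = {name: zero_gate for name in "fiog"}
print(run_lstm([[0.1], [0.2], [0.3]], params, hidden_size=2))
```

The cell state c carries information across time steps through the forget gate, which is what lets an LSTM use context from earlier in a long review; a trained model would learn the gate weights instead of using the all-zero placeholders shown here.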


2020, Vol 12 (19), pp. 3265
Author(s): Rei Sonobe, Hiroto Yamashita, Harumi Mihara, Akio Morita, Takashi Ikka

Japanese horseradish (wasabi) grows in very specific conditions, and recent climate change has damaged wasabi production. In addition, the optimal culture methods are not well known, and it is becoming increasingly difficult for new farmers to cultivate it. Chlorophyll a, chlorophyll b, and carotenoid contents, as well as their allocation, could be adequate indicators for evaluating its production and environmental stress; thus, developing an in situ method to monitor photosynthetic pigments based on reflectance could be useful for agricultural management. Besides original reflectance (OR), five pre-processing techniques, namely first derivative reflectance (FDR), continuum removal (CR), de-trending (DT), multiplicative scatter correction (MSC), and standard normal variate transformation (SNV), were compared to assess the accuracy of the estimation. Furthermore, five machine learning algorithms were considered: random forest (RF), support vector machine (SVM), kernel-based extreme learning machine (KELM), Cubist, and Stochastic Gradient Boosting (SGB). To classify the samples under different pH or sulphur ion concentration conditions, the end of the red-edge bands was effective for OR, FDR, DT, MSC, and SNV, while a green-peak band was effective for CR. Overall, KELM and Cubist showed high performance, and incorporating pre-processing techniques was effective for obtaining estimates with high accuracy. The best combinations were DT–KELM for chl a (RPD = 1.511–5.17, RMSE = 1.23–3.62 μg cm−2) and chl a:b (RPD = 0.73–3.17, RMSE = 0.13–0.60); CR–KELM for chl b (RPD = 1.92–5.06, RMSE = 0.41–1.03 μg cm−2) and chl a:car (RPD = 1.31–3.23, RMSE = 0.26–0.50); SNV–Cubist for car (RPD = 1.63–3.32, RMSE = 0.31–1.89 μg cm−2); and DT–Cubist for chl:car (RPD = 1.53–3.96, RMSE = 0.27–0.74).
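
Two of the spectral pre-processing techniques (SNV and FDR) and the two accuracy measures quoted above (RMSE and RPD) can be sketched as follows. This assumes common definitions: SNV centres each spectrum on its mean and scales by its sample standard deviation, FDR uses forward differences between adjacent bands, and RPD is the standard deviation of the reference values divided by RMSE. The paper may use slightly different variants, and the CR, DT, and MSC transforms are not shown.

```python
import math
import statistics


def snv(spectrum):
    """Standard normal variate: centre a spectrum on its mean and scale by
    its sample standard deviation."""
    mu = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(r - mu) / sd for r in spectrum]


def first_derivative(spectrum, step_nm=1.0):
    """First-derivative reflectance by forward differences between adjacent
    bands spaced step_nm apart (one value shorter than the input)."""
    return [(b - a) / step_nm for a, b in zip(spectrum, spectrum[1:])]


def rmse(observed, predicted):
    """Root-mean-square error between reference and estimated values."""
    sq = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(sq) / len(sq))


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of reference values / RMSE."""
    return statistics.stdev(observed) / rmse(observed, predicted)


# Hypothetical reflectance values and pigment estimates.
print(snv([0.12, 0.15, 0.31, 0.42]))
print(first_derivative([0.12, 0.15, 0.31, 0.42], step_nm=1.0))
print(rmse([10.0, 12.0, 15.0], [11.0, 12.5, 14.0]))
print(rpd([10.0, 12.0, 15.0], [11.0, 12.5, 14.0]))
```

Because RPD divides the spread of the reference data by the prediction error, higher values indicate a model whose errors are small relative to the natural variation in pigment content.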

