Machine Learning Methods for Classification Textual Information

Author(s):  
Николай Кривошеев ◽  
Nikolay Krivosheev ◽  
Владимир Спицын ◽  
Vladimir Spicyn

A method for classifying textual information based on convolutional neural networks is considered. A text preprocessing algorithm is presented; preprocessing consists of lemmatizing words, removing stop words, processing text characters, and so on. The text is then converted word by word into dense vectors. Testing is carried out on the "The 20 Newsgroups" text dataset, a collection of approximately 20,000 news stories in English divided approximately evenly among 20 different categories. The best convolutional neural network used in this work reached an accuracy of approximately 74% on the test set, and its topology is given. Voting among the neural networks with the Bagging algorithm reached an accuracy of approximately 81.5%. Based on a review of similar solutions, a comparison is made with the following text classification algorithms: the support vector machine (SVM, 82.84%), the naive Bayes classifier (81%), the k-nearest neighbors algorithm (75.93%), and the bag-of-words model.
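The voting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each trained network has already produced one predicted label per test document, the names `bootstrap_indices` and `majority_vote` are hypothetical, and the tie-breaking rule is an assumption because the abstract does not describe one.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, seed=None):
    """Draw n_samples training indices with replacement (one bagging resample)."""
    rng = random.Random(seed)
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def majority_vote(predictions_per_model):
    """Combine per-model label predictions by majority vote.

    predictions_per_model: list of lists; each inner list holds one model's
    predicted label for every sample (all inner lists have the same length).
    Assumption: ties go to the label that appears first among the models.
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = [preds[i] for preds in predictions_per_model]
        # most_common(1) returns the first label that reaches the top count
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Three hypothetical models voting on three documents
model_predictions = [
    ["sci.space", "rec.autos", "sci.med"],
    ["sci.space", "sci.med", "sci.med"],
    ["rec.autos", "rec.autos", "sci.med"],
]
print(majority_vote(model_predictions))
```

In a full bagging setup, each network would be trained on its own resample drawn with `bootstrap_indices`, and only the predictions on the shared test set would be passed to `majority_vote`.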

Author(s):  
Hani Bani-Salameh ◽  
Shadi M. Alkhatib ◽  
Moawyiah Abdalla ◽  
Mo’taz Al-Hami ◽  
Ruaa Banat ◽  
...  

Background: Diabetes and hypertension are two of the most common diseases in the world. Because they adversely affect people of different age groups, they have become a cause for concern and must be predicted and diagnosed well in advance. Objective: This research aims to determine the effectiveness of artificial neural networks (ANNs) in predicting diabetes and blood pressure diseases and to identify the factors that have a high impact on these diseases. Sample: This work used two online datasets consisting of data collected from 768 individuals. We applied neural network algorithms to predict whether the individuals have these two diseases based on a set of factors. Diabetes prediction is based on five factors: age, weight, fat ratio, glucose, and insulin, while blood pressure prediction is based on six factors: age, weight, fat ratio, blood pressure, alcohol, and smoking. Method: A model based on the Multi-Layer Perceptron Neural Network (MLP) was implemented. The inputs of the network were the factors for each disease, while the output was the prediction of the disease's occurrence. The model's performance was compared with that of other classifiers, namely Support Vector Machine (SVM) and K-Nearest Neighbors (KNN). We used performance metrics to assess the accuracy and performance of the MLP. In addition, a tool was implemented to help diagnose the diseases and to understand the results. Result: The model predicted the two diseases with a correct classification rate (CCR) of 77.6% for diabetes and 68.7% for hypertension. The results indicate that the MLP correctly predicts the probability of being diseased or not, and that its performance can be significantly higher than that of both SVM and KNN. This shows the MLP's effectiveness in early disease prediction.


Water ◽  
2018 ◽  
Vol 10 (11) ◽  
pp. 1676 ◽  
Author(s):  
Zahra Alizadeh ◽  
Jafar Yazdi ◽  
Joong Kim ◽  
Abobakr Al-Shamiri

Monthly flow predictions provide an essential basis for efficient decision-making regarding water resource allocation. In this paper, the performance of several popular data-driven models for monthly flow prediction is assessed to identify the most appropriate model. The considered methods include feedforward neural networks (FFNNs), time delay neural networks (TDNNs), radial basis neural networks (RBFNNs), recurrent neural networks (RNNs), a grasshopper optimization algorithm (GOA)-based support vector machine (SVM), and a K-nearest neighbors (KNN) model. The performance of each model is evaluated in terms of several residual metrics using a monthly flow time series for two real case studies with different flow regimes. The results show that the KNN model outperforms the different neural network configurations for the first case study, whereas the RBFNN model performs better for the second case study in terms of the correlation coefficient. Based on the accuracy of the results, the KNN model is recommended for short-term predictions in the first case study, which has more input features, and the RBFNN model is suitable for the second case, which has fewer input features but more training observations.


2021 ◽  
pp. 107754632110131
Author(s):  
Somaye Mohammadi ◽  
Abdolreza Ohadi ◽  
Mostafa Irannejad-Parizi

Promoting safe tires with low external rolling noise increases the environmental efficiency of road transport. Although tire builders have been striving to reduce emitted noise, the complexity of the problem has made this difficult. This article aims to make the problem more tractable by relying on recent significant improvements in statistical science. To this end, the prediction ability of new methods in this field, including the support vector machine, the relevance vector machine, and the convolutional neural network, along with a new neural network architecture, is compared. Tire noise is measured under the coast-by condition. Two training strategies are proposed: extracting features from a tread pattern image and directly importing an image into the model. The relevance vector method trained using the first strategy provided the most accurate results, with an error of 0.62 dB(A) in predicting the total noise level. This precise model is used instead of experimentation to analyze the sensitivity of tire noise to its parameters using a small central composite design. The parametric study reveals striking tips for reducing noise, especially in terms of interactions between parameters that have not previously been shown. Finally, a novel two-stage approach for reducing noise by tread pattern optimization is proposed, inspired by two regression models derived from statistical investigation and variance analysis. Changes in the tread pattern specifications of two case studies and their randomization resulted in a reduction of 3.2 dB(A) for a high-noise tire and of 0.4 dB(A) for a quieter tire.


Biomolecules ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 500
Author(s):  
László Keresztes ◽  
Evelin Szögi ◽  
Bálint Varga ◽  
Viktor Farkas ◽  
András Perczel ◽  
...  

The amyloid state of proteins is widely studied with relevance to neurology, biochemistry, and biotechnology. In contrast with nearly amorphous aggregation, the amyloid state has a well-defined structure, consisting of parallel and antiparallel β-sheets in a periodically repeated formation. The understanding of the amyloid state is growing with the development of novel molecular imaging tools, such as cryogenic electron microscopy. Sequence-based amyloid predictors have been developed, mainly using artificial neural networks (ANNs) as the underlying computational technique. Even with a good neural-network-based predictor, it is very difficult to identify which attributes of the input amino acid sequence determine the decision of the network. Here, we present a linear Support Vector Machine (SVM)-based predictor for hexapeptides with correctness higher than 84%, i.e., it is at least as good as the best published ANN-based tools. Unlike those of artificial neural networks, the decisions of linear SVMs are much easier to analyze and, from a good predictor, we can infer rich biochemical knowledge. In the Budapest Amyloid Predictor webserver, the user inputs a hexapeptide, and the server outputs a prediction for the input plus the 6 × 19 = 114 distance-1 neighbors of the input hexapeptide.
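The count 6 × 19 = 114 follows from replacing each of the six positions with any of the 19 other standard amino acids. A minimal sketch of that enumeration is given here; the function name `distance_one_neighbors` and the example peptide are illustrative only and are not taken from the webserver.

```python
# One-letter codes of the 20 standard amino acids
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def distance_one_neighbors(peptide):
    """Return every sequence that differs from `peptide` at exactly one position."""
    neighbors = []
    for pos, original in enumerate(peptide):
        for residue in AMINO_ACIDS:
            if residue != original:
                # Substitute a single residue and keep the rest unchanged
                neighbors.append(peptide[:pos] + residue + peptide[pos + 1:])
    return neighbors


example = "STVIIE"  # arbitrary hexapeptide used only for illustration
neighbors = distance_one_neighbors(example)
print(len(neighbors))  # 114
```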


Author(s):  
Geraldo Braz Junior ◽  
Leonardo de Oliveira Martins ◽  
Aristófanes Correa Silva ◽  
Anselmo Cardoso Paiva

Female breast cancer is a major cause of death in Western countries. Computer-aided Detection (CAD) systems can help radiologists increase diagnostic accuracy. In this work, we present a comparison between two classifiers applied to the separation of normal and abnormal breast tissues in mammograms. The purpose of the comparison is to select the best prediction technique to be part of a CAD system. Each region of interest is classified as a normal or abnormal region by a Support Vector Machine (SVM) and by a Bayesian Neural Network (BNN). SVM is a machine-learning method, based on the principle of structural risk minimization, which shows good performance when applied to data outside the training set. A Bayesian Neural Network is a classifier that combines traditional neural network theory with Bayesian inference. To characterize breast tissue as normal or abnormal, we use a set of measures obtained by applying the semivariogram, semimadogram, covariogram, and correlogram functions. The results show that SVM gives the best performance for the classification of breast tissues in mammographic images. The tests indicate that SVM has more generalization power than the BNN classifier. BNN has a sensitivity of 76.19% and a specificity of 79.31%, while SVM has a sensitivity of 74.07% and a specificity of 98.77%. The accuracy rate on the tests is 78.70% for BNN and 92.59% for SVM.
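The three rates quoted above are all derived from a binary confusion matrix, with "abnormal" as the positive class. The sketch below shows the arithmetic. The counts passed in are illustrative assumptions chosen only because they round to the SVM percentages in the abstract; the abstract itself does not report the underlying confusion matrix, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of abnormal regions correctly flagged
    specificity = tn / (tn + fp)  # share of normal regions correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts (an assumption, not data from the paper)
sens, spec, acc = binary_metrics(tp=20, fn=7, tn=80, fp=1)
print(f"{sens:.2%} {spec:.2%} {acc:.2%}")  # 74.07% 98.77% 92.59%
```

The example shows how a classifier can have a modest sensitivity and still reach a high accuracy when normal regions dominate the test set and almost all of them are classified correctly.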


Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6491
Author(s):  
Le Zhang ◽  
Jeyan Thiyagalingam ◽  
Anke Xue ◽  
Shuwen Xu

Classification of clutter, especially in the context of shore-based radars, plays a crucial role in several applications. However, the task of distinguishing and classifying sea clutter from land clutter has historically been performed using clutter models and/or coastal maps. In this paper, we propose two machine learning approaches based on neural networks for sea-land clutter separation, namely the regularized randomized neural network (RRNN) and the kernel ridge regression neural network (KRR). We use a number of features, such as energy variation, discrete signal amplitude change frequency, autocorrelation performance, and other statistical characteristics of the respective clutter distributions, to improve the performance of the classification. Our evaluation, based on a unique mixed dataset comprising partially synthetic clutter data for land and real clutter data from the sea, shows improved classification accuracy. More specifically, the RRNN and KRR methods achieve 98.50% and 98.75% accuracy, respectively, outperforming the conventional support vector machine and extreme learning based solutions.


2020 ◽  
Vol 34 (29) ◽  
pp. 2050326
Author(s):  
Ning Cao ◽  
Jianjun Wang

The realization of exploratory innovation is a complex and nonlinear evolutionary problem. Existing works point out that it is closely related to knowledge governance and boundary-spanning search. However, the intricate relationship among them still lacks an exact quantitative explanation. Motivated by this, we use four machine learning methods, namely linear regression (LR), neural network (NN), support vector machine (SVM), and k-nearest neighbors (KNN), to explore how boundary-spanning search combined with knowledge governance influences innovation. Results show that SVM has the highest values of both stability and goodness of fit. The SVM results show that the combination of low knowledge governance and high boundary-spanning search boosts innovation most efficiently, while high knowledge governance combined with low boundary-spanning search has the most detrimental effect on innovation. Our results reveal that enhancing boundary-spanning search is essential and beneficial to innovation.


2019 ◽  
Vol 9 (11) ◽  
pp. 2347 ◽  
Author(s):  
Hannah Kim ◽  
Young-Seob Jeong

As the amount of textual data increases exponentially, it becomes more important to develop models that analyze text data automatically. Texts may carry various labels such as gender, age, country, sentiment, and so forth. Using such labels may benefit some industrial fields, so many studies of text classification have appeared. Recently, the Convolutional Neural Network (CNN) has been adopted for the task of text classification and has shown quite successful results. In this paper, we propose convolutional neural networks for the task of sentiment classification. Through experiments with three well-known datasets, we show that employing consecutive convolutional layers is effective for relatively longer texts, and that our networks are better than other state-of-the-art deep learning models.


Author(s):  
Bong-Hyun Kim ◽  
Kijin Yu ◽  
Peter C W Lee

Motivation: Cancer classification based on gene expression profiles has provided insight into the causes of cancer and cancer treatment. Recently, machine learning-based approaches have been attempted in downstream cancer analysis to address the large differences in gene expression values, as determined by single-cell RNA sequencing (scRNA-seq). Results: We designed cancer classifiers that can identify 21 types of cancers and normal tissues based on bulk RNA-seq as well as scRNA-seq data. Training was performed with 7398 cancer samples and 640 normal samples from 21 tumors and normal tissues in TCGA based on the 300 most significant genes expressed in each cancer. Then, we compared neural network (NN), support vector machine (SVM), k-nearest neighbors (kNN) and random forest (RF) methods. The NN performed consistently better than other methods. We further applied our approach to scRNA-seq transformed by kNN smoothing and found that our model successfully classified cancer types and normal samples. Availability and implementation: Cancer classification by neural network. Supplementary information: Supplementary data are available at Bioinformatics online.


Materials ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 3766 ◽  
Author(s):  
Shin-Hyung Song

In this research, hot deformation experiments on 316L stainless steel were carried out over a temperature range of 800–1000 °C and a strain rate range of 2 × 10⁻³ to 2 × 10⁻¹. The flow stress behavior of 316L stainless steel was found to be highly dependent on the strain rate and temperature. After the experimental study, the flow stress was modeled using the Arrhenius-type constitutive equation, a neural network approach, and the support vector regression algorithm. The present research mainly focused on a comparative study of these three algorithms for modeling the characteristics of hot deformation. The results indicated that the neural network approach and the support vector regression algorithm could model the flow stress better than the Arrhenius-type equation. The modeling efficiency of the support vector regression algorithm was also found to be higher than that of the neural network.
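For orientation, one commonly used Arrhenius-type constitutive relation for hot deformation is the hyperbolic-sine form below. The abstract does not state which form was fitted, so this is shown as an assumption rather than as the author's equation. Here $\dot{\varepsilon}$ is the strain rate, $\sigma$ the flow stress, $T$ the absolute temperature, $R$ the gas constant, $Q$ the activation energy of deformation, $A$, $\alpha$, and $n$ material constants, and $Z$ the Zener-Hollomon parameter.

```latex
\dot{\varepsilon} = A \left[ \sinh(\alpha \sigma) \right]^{n} \exp\!\left( -\frac{Q}{R T} \right),
\qquad
Z = \dot{\varepsilon} \exp\!\left( \frac{Q}{R T} \right) = A \left[ \sinh(\alpha \sigma) \right]^{n}
```

In this form, temperature and strain rate enter only through $Z$, which fits the abstract's observation that the flow stress depends strongly on both.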

