Diagnosis Model of Hydrogen Sulfide Poisoning Based on Support Vector Machine

Introduction: Hydrogen sulfide (H2S) is a lethal environmental and industrial poison. The reported mortality rate of occupational acute H2S poisoning in China is 23.1% to 50%. Because poisoning produces a large volume of metabolomic changes, intelligent algorithms are needed to mine the multivariate interactions among them. Methods: This paper first uses GC-MS metabolomics to detect changes in the urine components of poisoned and control rats to form a metabolic data set, and then trains a support vector machine (SVM) classifier on the hydrogen sulfide poisoning training data to obtain a classification model. A batch of rats (n = 15) was randomly selected and exposed to 20 ppm H2S gas for 40 days (twice daily, morning and evening, 1 hour per exposure) to prepare a chronic H2S poisoning rat model. The other rats (n = 15) were exposed to the same volume of air containing 0 ppm hydrogen sulfide as the control group. The treated urine samples were analyzed by GC-MS. Results: The method locates the optimal SVM parameters, which raises the SVM classification accuracy to 100%. The information gain attribute evaluation method is used to screen out the top 6 biomarkers contributing to the predicted category (glycerol, β-hydroxybutyric acid, arabinofuranose, pentitol, L-tyrosine, L-proline). Conclusion: The SVM diagnostic model of hydrogen sulfide poisoning constructed in this work performs well in both training time and prediction accuracy, and provides an intelligent decision-making method for the diagnosis of hydrogen sulfide poisoning.
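As a rough illustration of information gain ranking, the sketch below scores a single numeric feature by how much splitting it at a threshold reduces class entropy. The intensity values, the threshold, and the function names are hypothetical and are not taken from the paper, whose attribute evaluator may discretize features differently.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting one feature at `threshold`."""
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    # Weighted entropy of the two partitions (empty partitions contribute nothing).
    remainder = sum(len(part) / len(labels) * entropy(part) for part in (low, high) if part)
    return entropy(labels) - remainder

# Hypothetical intensities for one metabolite: 4 control rats, then 4 exposed rats.
labels = ["control"] * 4 + ["exposed"] * 4
separating = [0.9, 1.1, 1.0, 1.2, 2.8, 3.1, 2.9, 3.3]
uninformative = [1.0, 3.0, 1.1, 2.9, 1.2, 3.1, 0.9, 3.2]

print(information_gain(separating, labels, threshold=2.0))     # 1.0
print(information_gain(uninformative, labels, threshold=2.0))  # 0.0
```

A feature that separates the two groups perfectly scores the full 1 bit, while one that splits both groups evenly scores 0; ranking features by this score and keeping the top few is the general idea behind the screening step.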

AUGMENTATIVE AND ALTERNATIVE COMMUNICATION METHOD BASED ON TONGUE CLICKING FOR MUTE DISABILITIES

IIUM Engineering Journal ◽

10.31436/iiumej.v20i1.1021 ◽

2019 ◽

Vol 20 (1) ◽

pp. 119-128

Author(s):

NIK NUR WAHIDAH NIK HASHIM ◽

MUHAMMAD AMIRUL AMIN AZMI ◽

HAZLINA MD. YUSOF

Keyword(s):

Amplitude Modulation ◽

Augmentative And Alternative Communication ◽

Training Data ◽

Support Vector ◽

Classification Rate ◽

Data Set ◽

Zero Crossing ◽

Svm Classification ◽

Development Data ◽

Multiclass Svm

This paper presents a pilot study of a novel application that converts tongue-clicking sounds to words for people who are unable to speak. Fifteen speech features related to speech timing patterns, amplitude modulation, zero crossing, and peak detection were extracted. The experiments were conducted on three different patterns using binary Support Vector Machine (SVM) classification, with 10 recordings as training data and 10 recordings as development data. Peak size outperformed all other features, with an 85% classification rate for the pattern pair P1-P3, whereas multiple features produced a 100% classification rate for P1-P2 and P2-P3. A GUI-based system was developed to validate the trained classifier. A multiclass SVM was constructed from the best features obtained in the binary SVM classification, namely peak size and the skewness of amplitude modulation, and then tested on 15 recordings. The GUI-based multiclass SVM achieved a satisfactory performance of 67% correct classification on the test data set.
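Two of the feature families named in the abstract, zero crossing and peak detection, can be sketched in a few lines. The definitions here are assumptions: the abstract does not say how the authors compute either feature, so "peak size" is taken to be the largest absolute amplitude in a frame, and the sample values are made up.

```python
def zero_crossing_count(samples):
    """Count sign changes between consecutive samples (exact zeros are skipped)."""
    signs = [1 if s > 0 else -1 for s in samples if s != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def peak_size(samples):
    """Largest absolute amplitude in the frame (assumed definition)."""
    return max(abs(s) for s in samples)

# Hypothetical audio frame, amplitudes normalized to [-1, 1].
frame = [0.0, 0.4, 0.9, 0.2, -0.3, -0.8, -0.1, 0.5, 0.1, -0.2]

print(zero_crossing_count(frame))  # 3
print(peak_size(frame))            # 0.9
```

Each recording would be reduced to a small vector of such numbers before being passed to a classifier.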

Product Sentiment Assessment using Large Scale Cloud System

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1296.10812s19 ◽

2019 ◽

Vol 8 (12S) ◽

pp. 1076-1080

Keyword(s):

Feature Selection ◽

Large Scale ◽

Opinion Mining ◽

Information Gain ◽

Training Data ◽

Support Vector ◽

Data Set ◽

Selection For ◽

Support Decision Making ◽

Original Feature

Sentiment analysis obtains valuable information by extracting the sentiment or opinion expressed in a message. Sentiment classification exploits machine learning because of its ability to learn from a training data set and to predict and support decision making with high accuracy. Some algorithms, however, do not scale well to large datasets, and many disciplines now have to deal with big datasets that involve large numbers of features. Feature selection methods aim to eliminate noisy, irrelevant, or redundant features that can degrade classification performance, but most traditional methods lack the scalability to deliver results within a given time. In this work, Term Frequency (TF) is used for feature extraction. The focus is on feature selection for opinion mining using the Information Gain (IG) based method, compared with the method of. These feature selection methods reduce the original feature set by removing irrelevant features, which improves classification accuracy and lowers the running time of the learning algorithms. The proposed method is evaluated with a Support Vector Machine (SVM) based classifier. The experimental results show that the proposed method achieves better performance.
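A minimal sketch of Term Frequency extraction, assuming whitespace tokenization and length-normalized counts. The abstract does not specify the tokenizer or the normalization, so both are illustrative choices, and the review text is invented.

```python
from collections import Counter

def term_frequencies(text):
    """Map each term to its count divided by the number of tokens in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return {term: n / len(tokens) for term, n in counts.items()}

# Hypothetical product review.
tf = term_frequencies("great phone great battery poor screen")
print(sorted(tf, key=tf.get, reverse=True)[0])  # great
```

The resulting term weights form the original feature set that a selection method such as Information Gain would then prune.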

Evaluating Grayware Characteristics and Risks

Journal of Computer Networks and Communications ◽

10.1155/2011/569829 ◽

2011 ◽

Vol 2011 ◽

pp. 1-28 ◽

Cited By ~ 1

Author(s):

Zhongqiang Chen ◽

Zhanyan Liang ◽

Yuan Zhang ◽

Zhongrong Chen

Keyword(s):

Information Gain ◽

Feature Space ◽

Training Data ◽

Support Vector ◽

Learning Models ◽

Generalization Capability ◽

Self Organizing Maps ◽

Defense Strategies ◽

Security Applications ◽

Vector Machines

Grayware encyclopedias collect known species to provide information for incident analysis; however, their lack of categorization and generalization capability renders them ineffective in developing defense strategies against clustered strains. A grayware categorization framework is therefore proposed here, not only to classify grayware according to diverse taxonomic features but also to facilitate evaluation of grayware risk to cyberspace. Armed with Support Vector Machines, the framework builds learning models from training data extracted automatically from grayware encyclopedias and visualizes categorization results with Self-Organizing Maps. The features used in the learning models are selected with information gain, and the high dimensionality of the feature space is reduced by word stemming and stopword removal. The grayware categorizations on diversified features reveal that grayware typically attempts to improve its penetration rate by resorting to multiple installation mechanisms and reduced code footprints. The framework also shows that grayware evades detection by attacking victims' security applications and resists removal by enhancing its clotting capability with infected hosts. Our analysis further points out that species in the categories Spyware and Adware continue to dominate the grayware landscape and pose extremely critical threats to the Internet ecosystem.

A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy037 ◽

2018 ◽

Vol 34 (3) ◽

pp. 569-581 ◽

Cited By ~ 1

Author(s):

Sujata Rani ◽

Parteek Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Media Analysis ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Tool ◽

Data Set ◽

Learning Techniques

This article presents an innovative approach to sentiment analysis (SA). The proposed system handles Romanized or abbreviated text and spelling variations when performing sentiment analysis. The training data set of 3,000 movie reviews and tweets was manually labeled by native speakers of Hindi in three classes: positive, negative, and neutral. The system uses the WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques: Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system was tested on 100 movie reviews and tweets; SVM performed best among the classifiers, with an accuracy of 68% for movie reviews and 82% for tweets. The results of the proposed system are very promising, and the system can be used in emerging applications such as SA of product reviews and social media analysis. It can also be applied to other cultural and social ends, such as predicting or countering riots.

Improved Instance Selection Methods for Support Vector Machine Speed Optimization

Security and Communication Networks ◽

10.1155/2017/6790975 ◽

2017 ◽

Vol 2017 ◽

pp. 1-11 ◽

Cited By ~ 6

Author(s):

Andronicus A. Akinyelu ◽

Aderemi O. Adewumi

Keyword(s):

Support Vector Machine ◽

Machine Learning Algorithms ◽

Support Vector ◽

Instance Selection ◽

Training Time ◽

Svm Classification ◽

Support Vectors ◽

Increase With Increase ◽

Dataset Size ◽

Machine Speed

Support vector machine (SVM) is one of the top picks for pattern recognition and classification tasks. It has been used successfully to classify linearly separable and nonlinearly separable data with high accuracy. However, in terms of classification speed, SVMs are outperformed by many machine learning algorithms, especially when massive datasets are involved. SVM classification time scales linearly with the number of support vectors, and the number of support vectors increases with dataset size. Hence, SVM classification time can be greatly reduced if the model is trained on a reduced dataset. Instance selection techniques are among the most effective techniques for minimizing SVM training time. In this study, two instance selection techniques suitable for identifying relevant training instances are proposed. The techniques are evaluated on a dataset containing 4000 emails, and the results are compared with those of other existing techniques. The results reveal an excellent improvement in SVM classification speed.
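The linear dependence on the number of support vectors follows from the form of the kernel SVM decision function, which evaluates the kernel once per support vector. The sketch below uses a hand-made toy model with an RBF kernel; the support vectors, coefficients, and gamma are hypothetical and do not come from the study.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def svm_decision(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """f(x) = sum_i coef_i * K(sv_i, x) + bias: one kernel call per support vector."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)) + bias

# Hypothetical toy model; each coefficient already includes the label sign.
support_vectors = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0), (5.0, 5.0)]
dual_coefs = [-1.0, -1.0, 1.0, 1.0]

score = svm_decision((4.5, 4.5), support_vectors, dual_coefs, bias=0.0)
print("positive" if score > 0 else "negative")  # positive
```

Halving the number of support vectors halves the kernel evaluations per prediction, which is why training on a well-chosen subset of instances can speed up classification.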

An Incremental Isomap Method for Hyperspectral Dimensionality Reduction and Classification

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.87.7.445 ◽

2021 ◽

Vol 87 (6) ◽

pp. 445-455

Author(s):

Yi Ma ◽

Zezhong Zheng ◽

Yutang Ma ◽

Mingcang Zhu ◽

Ran Huang ◽

...

Keyword(s):

Manifold Learning ◽

Nearest Neighbor ◽

Hyperspectral Image ◽

Hyperspectral Data ◽

Training Data ◽

Support Vector ◽

Data Sets ◽

K Nearest Neighbor ◽

Data Set ◽

Data Points

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second best performance with support vector machine.
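To make the quadratic memory cost concrete, the short calculation below estimates the size of a dense N×N similarity matrix. The point counts and the 8-byte (double precision) entry size are illustrative assumptions, not figures from the article.

```python
def similarity_matrix_gib(n_points, bytes_per_entry=8):
    """Memory in GiB needed for a dense n_points x n_points matrix."""
    return n_points ** 2 * bytes_per_entry / 2 ** 30

print(round(similarity_matrix_gib(10_000), 2))   # 0.75
print(round(similarity_matrix_gib(100_000), 1))  # 74.5
```

A tenfold increase in data points costs a hundredfold increase in memory, which is the motivation for working from a sampled set of landmarks rather than from every pixel.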

Exploiting Rules to Enhance Machine Learning in Extracting Information From Multi-Institutional Prostate Pathology Reports

JCO Clinical Cancer Informatics ◽

10.1200/cci.20.00028 ◽

2020 ◽

pp. 865-874

Author(s):

Enrico Santus ◽

Tal Schuster ◽

Amir M. Tahmasebi ◽

Clara Li ◽

Adam Yala ◽

...

Keyword(s):

Machine Learning ◽

Hybrid Systems ◽

High Performance ◽

Feature Model ◽

Training Data ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Extreme Gradient Boosting ◽

Pathology Reports

PURPOSE Literature on clinical note mining has highlighted the superiority of machine learning (ML) over hand-crafted rules. Nevertheless, most studies assume the availability of large training sets, which is rarely the case. For this reason, in the clinical setting, rules are still common. We suggest 2 methods to leverage the knowledge encoded in pre-existing rules to inform ML decisions and obtain high performance, even with scarce annotations. METHODS We collected 501 prostate pathology reports from 6 American hospitals. Reports were split into 2,711 core segments, annotated with 20 attributes describing the histology, grade, extension, and location of tumors. The data set was split by institutions to generate a cross-institutional evaluation setting. We assessed 4 systems, namely a rule-based approach, an ML model, and 2 hybrid systems integrating the previous methods: a Rule as Feature model and a Classifier Confidence model. Several ML algorithms were tested, including logistic regression (LR), support vector machine (SVM), and eXtreme gradient boosting (XGB). RESULTS When training on data from a single institution, LR lags behind the rules by 3.5% (F1 score: 92.2% v 95.7%). Hybrid models, instead, obtain competitive results, with Classifier Confidence outperforming the rules by +0.5% (96.2%). When a larger amount of data from multiple institutions is used, LR improves by +1.5% over the rules (97.2%), whereas hybrid systems obtain +2.2% for Rule as Feature (97.7%) and +2.6% for Classifier Confidence (98.3%). Replacing LR with SVM or XGB yielded similar performance gains. CONCLUSION We developed methods to use pre-existing handcrafted rules to inform ML algorithms. These hybrid systems obtain better performance than either rules or ML models alone, even when training data are limited.
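One plausible reading of a "Rule as Feature" hybrid is that the output of each hand-written rule is appended to the learned model's input vector. The sketch below illustrates that reading only; the rule, the vocabulary, the bag-of-words encoding, and the function names are all hypothetical, since the abstract does not describe how the authors encode rule outputs.

```python
import re

def gleason_rule(segment):
    """Hypothetical hand-written rule: 1 if the text mentions a Gleason score."""
    return 1 if re.search(r"gleason\s+(score\s+)?\d", segment.lower()) else 0

def featurize(segment, vocabulary, rules):
    """Bag-of-words counts followed by one extra column per rule output."""
    tokens = segment.lower().split()
    bow = [tokens.count(term) for term in vocabulary]
    return bow + [rule(segment) for rule in rules]

vocabulary = ["adenocarcinoma", "benign", "gleason"]
print(featurize("Adenocarcinoma Gleason score 7", vocabulary, [gleason_rule]))  # [1, 0, 1, 1]
```

A classifier such as logistic regression trained on these vectors can then learn how much weight to give the rule, which may help most when annotated examples are scarce.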

Fast Linear Adaptive Skipping Training Algorithm for Training Artificial Neural Network

Mathematical Problems in Engineering ◽

10.1155/2013/346949 ◽

2013 ◽

Vol 2013 ◽

pp. 1-9 ◽

Cited By ~ 1

Author(s):

R. Manjula Devi ◽

S. Kuppuswami ◽

R. C. Suganthe

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Training Model ◽

Training Data ◽

Experimental Result ◽

Training Algorithms ◽

Data Set ◽

Training Time ◽

Input Sample ◽

Artificial Neural

Artificial neural networks have been used extensively as training models for solving pattern recognition tasks. However, training a complex neural network on a very large training data set requires an excessively long training time. In this correspondence, a new fast Linear Adaptive Skipping Training (LAST) algorithm for training artificial neural networks (ANN) is introduced. The core idea of this paper is to improve the training speed of an ANN by presenting only the input samples that were not classified correctly in the previous epoch, which dynamically reduces the number of input samples presented to the network at every epoch without affecting the network's accuracy. Decreasing the size of the training set in this way reduces the training time and thereby improves the training speed. The LAST algorithm also determines how many epochs a particular input sample has to skip, depending on the successful classification of that input sample. The LAST algorithm can be incorporated into any supervised training algorithm. Experimental results show that the training speed attained by the LAST algorithm is higher than that of other conventional training algorithms.
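The skipping idea can be sketched as per-sample bookkeeping. The schedule used here, in which a sample sits out as many epochs as its current streak of correct classifications, is an assumption suggested by the word "Linear" in the algorithm's name; the abstract does not state the exact rule, and the toy classification outcomes are invented.

```python
def presented_samples(skip_counters):
    """Indices of samples whose skip counter has run out."""
    return [i for i, remaining in enumerate(skip_counters) if remaining == 0]

def end_of_epoch(skip_counters, streaks, results):
    """Update counters; `results` maps each presented index to True if classified correctly."""
    for i in range(len(skip_counters)):
        if i not in results:
            skip_counters[i] -= 1            # sample was skipped this epoch
        elif results[i]:
            streaks[i] += 1                  # longer streak, longer skip (assumed linear)
            skip_counters[i] = streaks[i]
        else:
            streaks[i] = 0                   # a mistake resets the streak
            skip_counters[i] = 0

# Toy run: sample 0 is always classified correctly, sample 1 never is.
skips, streaks = [0, 0], [0, 0]
shown_per_epoch = []
for epoch in range(6):
    shown = presented_samples(skips)
    shown_per_epoch.append(shown)
    end_of_epoch(skips, streaks, {i: (i == 0) for i in shown})

print(shown_per_epoch)  # [[0, 1], [1], [0, 1], [1], [1], [0, 1]]
```

The easy sample is presented less and less often while the hard one is presented every epoch, so the per-epoch workload shrinks as the network learns.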

Identification of Leukemia Subtypes from Microscopic Images Using Convolutional Neural Network

Diagnostics ◽

10.3390/diagnostics9030104 ◽

2019 ◽

Vol 9 (3) ◽

pp. 104 ◽

Cited By ~ 11

Author(s):

Ahmed ◽

Yigit ◽

Isik ◽

Alpkocak

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Leukemia Data

Leukemia is a fatal cancer and has two main types: acute and chronic. Each type has two subtypes: lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for diagnosing all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which require a large training data set. Therefore, we also investigated the effects of data augmentation for synthetically increasing the number of training samples. We used two publicly available leukemia data sources: ALL-IDB and ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. In addition, we explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a set of experiments and used 5-fold cross-validation. The results showed that our CNN model achieves 88.25% and 81.74% accuracy in leukemia versus healthy classification and multiclass classification of all subtypes, respectively. Finally, we also showed that the CNN model performs better than the other well-known machine learning algorithms.
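For readers unfamiliar with 5-fold cross-validation, the sketch below generates the train and test index sets for each fold. It uses contiguous, unshuffled folds for clarity; in practice the samples are usually shuffled (and often stratified by class) first, and the sample count of 12 is just an example.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(12, k=5))
print([len(test) for _, test in folds])  # [3, 3, 2, 2, 2]
```

Each sample is used for testing exactly once, and the reported accuracy is typically the average over the five folds.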

Detection and Characterization of Physical Activity and Psychological Stress from Wristband Data

Signals ◽

10.3390/signals1020011 ◽

2020 ◽

Vol 1 (2) ◽

pp. 188-208

Author(s):

Mert Sevil ◽

Mudassir Rashid ◽

Mohammad Reza Askari ◽

Zacharie Maloney ◽

Iman Hajizadeh ◽

...

Keyword(s):

Physical Activity ◽

Signal Processing ◽

Feature Extraction ◽

Psychological Stress ◽

Training Data ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Linear Discriminant ◽

Physiological Variables

Wearable devices continuously measure multiple physiological variables to inform users of health and behavior indicators. The computed health indicators must rely on informative signals obtained by processing the raw physiological variables with powerful noise- and artifacts-filtering algorithms. In this study, we aimed to elucidate the effects of signal processing techniques on the accuracy of detecting and discriminating physical activity (PA) and acute psychological stress (APS) using physiological measurements (blood volume pulse, heart rate, skin temperature, galvanic skin response, and accelerometer) collected from a wristband. Data from 207 experiments involving 24 subjects were used to develop signal processing, feature extraction, and machine learning (ML) algorithms that can detect and discriminate PA and APS when they occur individually or concurrently, classify different types of PA and APS, and estimate energy expenditure (EE). Training data were used to generate feature variables from the physiological variables and develop ML models (naïve Bayes, decision tree, k-nearest neighbor, linear discriminant, ensemble learning, and support vector machine). Results from an independent labeled testing data set demonstrate that PA was detected and classified with an accuracy of 99.3%, and APS was detected and classified with an accuracy of 92.7%, whereas the simultaneous occurrences of both PA and APS were detected and classified with an accuracy of 89.9% (relative to actual class labels), and EE was estimated with a low mean absolute error of 0.02 metabolic equivalent of task (MET). The data filtering and adaptive noise cancellation techniques used to mitigate the effects of noise and artifacts on the classification results increased the detection and discrimination accuracy by 0.7% and 3.0% for PA and APS, respectively, and by 18% for EE estimation. The results demonstrate that the physiological measurements from wristband devices are susceptible to noise and artifacts, and elucidate the effects of signal processing and feature extraction on the accuracy of detection, classification, and estimation of PA and APS.
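The mean absolute error quoted for energy expenditure is the average magnitude of the difference between estimated and reference values. A minimal sketch follows; the MET values are made-up numbers and are not data from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical reference and estimated energy expenditure, in MET.
actual_met = [1.0, 2.5, 4.0, 6.0]
predicted_met = [1.1, 2.4, 4.2, 5.8]

print(round(mean_absolute_error(actual_met, predicted_met), 2))  # 0.15
```

Because the metric is expressed in the same unit as the target, an error of 0.02 MET can be read directly against typical activity intensities.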
