scholarly journals Classification of Gene Expression Data Set using Support Vectors Machine with RBF Kernel

2019 ◽  
Vol 8 (2) ◽  
pp. 2907-2913

The huge amount of data being generated by different organizations and its underlying advantages in multiple fields like decision making, data security, research purposes have made data classification a very important and mandatory process now-a-days. Data Classification is the process of grouping data of similar characteristics into categories. Classification can be done based on the output we are looking forward to. Hence it is considered very useful. Classifying data allows us to predict the nature of future data-sets and discover useful patterns among them. This project aims at classifying gene data sets. Gene data sets are the information collected from a set of genes put to a specific test. It can be used for medical research purposes; by studying the pattern in the datasets allows us to predict the kind of genes that are more vulnerable to a particular disease there by allowing us to prevent the manifestation of the disease right at its beginning, just as they say, prevention is better than cure. In this paper, such classification is effort using a supervised machine learning algorithm – SVM (Support Vector Machine). There are many algorithms in existence to perform classification but this algorithm has its own lead over the others. It is capable of both classification and regression. It works well with structured, semistructured and unstructured data too. It contains a kernel function which when used appropriately can solve any complex problem. The summary of this project is, taking gene data sets as input and obtaining classified clusters as output.

2021 ◽  
Vol 50 (8) ◽  
pp. 2479-2497
Author(s):  
Buvana M. ◽  
Muthumayil K.

One of the most symptomatic diseases is COVID-19. Early and precise physiological measurement-based prediction of breathing will minimize the risk of COVID-19 by a reasonable distance from anyone; wearing a mask, cleanliness, medication, balanced diet, and if not well stay safe at home. To evaluate the collected datasets of COVID-19 prediction, five machine learning classifiers were used: Nave Bayes, Support Vector Machine (SVM), Logistic Regression, K-Nearest Neighbour (KNN), and Decision Tree. COVID-19 datasets from the Repository were combined and re-examined to remove incomplete entries, and a total of 2500 cases were utilized in this study. Features of fever, body pain, runny nose, difficulty in breathing, shore throat, and nasal congestion, are considered to be the most important differences between patients who have COVID-19s and those who do not. We exhibit the prediction functionality of five machine learning classifiers. A publicly available data set was used to train and assess the model. With an overall accuracy of 99.88 percent, the ensemble model is performed commendably. When compared to the existing methods and studies, the proposed model is performed better. As a result, the model presented is trustworthy and can be used to screen COVID-19 patients timely, efficiently.


Author(s):  
Intisar Shadeed Al-Mejibli ◽  
Jwan K. Alwan ◽  
Dhafar Hamed Abd

Currently, the support vector machine (SVM) regarded as one of supervised machine learning algorithm that provides analysis of data for classification and regression. This technique is implemented in many fields such as bioinformatics, face recognition, text and hypertext categorization, generalized predictive control and many other different areas. The performance of SVM is affected by some parameters, which are used in the training phase, and the settings of parameters can have a profound impact on the resulting engine’s implementation. This paper investigated the SVM performance based on value of gamma parameter with used kernels. It studied the impact of gamma value on (SVM) efficiency classifier using different kernels on various datasets descriptions. SVM classifier has been implemented by using Python. The kernel functions that have been investigated are polynomials, radial based function (RBF) and sigmoid. UC irvine machine learning repository is the source of all the used datasets. Generally, the results show uneven effect on the classification accuracy of three kernels on used datasets. The changing of the gamma value taking on consideration the used dataset influences polynomial and sigmoid kernels. While the performance of RBF kernel function is more stable with different values of gamma as its accuracy is slightly changed.


2007 ◽  
Vol 25 (18_suppl) ◽  
pp. 15107-15107
Author(s):  
R. V. Iyer ◽  
B. Tennant ◽  
M. Ruiz ◽  
T. Szyperski ◽  
D. Trump ◽  
...  

15107 Background: HCC is a common and rapidly fatal cancer. Current screening tools are inadequate for identification of potentially curable cases. Our aim was to determine whether H-NMR can identify HCC compared to controls in the woodchuck (WC) model of hepatitis related HCC. Methods: Eastern WCs were bred and inoculated at birth with dilute sera from WCs that are chronic carriers of Woodchuck Hepatitis B Virus (WHV). This resulted in chronic hepatitis in ∼60% animals and all carriers developed HCC by 24–36 months. Serum from 10 chronic WHV carriers with HCC (group 1), 5 WHV carriers with no HCC (group 2) and 15 matched non-infected controls (group 3) was obtained. 45uL serum was diluted with 5uL of D2O containing 27mM formic acid + 0.9% saline. Spectra were collected on a 600 MHz INOVA spectrometer using a CapNMR flow probe with 10uL flow cell at 298K without knowledge of group assignments. The resulting 1D spectra were processed using Nuts from AcornNMR. Results: Principle component analysis and supervised PLS-DA was performed using Simca P+ from Umetrics. Despite general separation of groups, the Q2 value of this model was relatively low (0.20). We trained a Support Vector Machine (SVM) algorithm, a supervised machine-learning algorithm, to learn to identify the groups. Evaluation of the performance of the algorithm using 10-fold validation on the data set achieved a Kappa value of 0.43. This algorithm learnt to identify HCC [0.765 ROC, 0.8 sensitivity, and 0.727 positive predictive value (PPV)] and controls (0.75 ROC, 0.69 sensitivity and 0.73 PPV) but not the WHV carrier group, likely due to the small numbers. In a second analysis of 10 HCC and 15 controls, PLS-DA showed clear separation using three components (Q2= 0.5). The corresponding SVM model showed a kappa value of 0.52 and ROC values of 0.767 for both classes. Conclusions: Our preliminary results indicate that H-NMR spectra alone can be used to distinguish HCC from healthy controls using the machine-learning algorithm for classification. Further validation in a larger cohort of woodchucks is ongoing and confirmation of these preliminary findings would support investigation of this technique as a screening tool in patients at risk for developing HCC. No significant financial relationships to disclose.


2019 ◽  
Vol 8 (4) ◽  
pp. 8797-8801

In this we explore the effectiveness of language features to identify Twitter messages ' feelings. We assess the utility of existing lexical tools as well as capturing features of informal and innovative language knowledge used in micro blogging. We take a supervised approach to the problem, but to create training data, we use existing hash tags in the Twitter data. We Using three separate Twitter messaging companies in our experiments. We use the hash tagged data set (HASH) for development and training, which we compile from the Edinburgh Twitter corpus, and the emoticon data set (EMOT) from the I Sieve Corporation (ISIEVE) for evaluation. Twitter contains huge amount of data . This data may be of different types such as structured data or unstructured data. So by using this data and Appling pre processing techniques we can be able to read the comments from the users. And also the comments will be classified into three categories. They are positive negative and also the neutral comments.Today they use the processing of natural language, information, and text interpretation to derive and classify text feeling into pos itive, negative, and neutral categories. We can also examine the utility of language features to identify Twitter mess ages ' feelings. In addition, state-of - the-art approaches take into consideration only the tweet to be classified when classifying the feeling; they ignore its context (i.e. related tweets).Since tweets are usually short and more ambiguous, however, it is sometimes not enough to consider only the current tweet for classification of sentiments.Informal and innovative microblogging language. We take a sup ervised approach to the problem, but to create training data, we use existing hashtags in the Twitter data.This paper also contrasts sentiment analysis approaches in evaluating political views using Naïve Bayes supervised machine learning algorithm which performs in better analysis compared to other techniques Paper


2019 ◽  
Vol 23 (1) ◽  
pp. 12-21 ◽  
Author(s):  
Shikha N. Khera ◽  
Divya

Information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based loses to the companies. The aim of this research is to develop a model to predict employee attrition and provide the organizations opportunities to address any issue and improve retention. Predictive model was developed based on supervised machine learning algorithm, support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from Human Resource databases of three IT companies in India, including their employment status (response variable) at the time of collection. Accuracy results from the confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. Also, results show that the model performs better in predicting who will leave the firm as compared to predicting who will not leave the company.


A large volume of datasets is available in various fields that are stored to be somewhere which is called big data. Big Data healthcare has clinical data set of every patient records in huge amount and they are maintained by Electronic Health Records (EHR). More than 80 % of clinical data is the unstructured format and reposit in hundreds of forms. The challenges and demand for data storage, analysis is to handling large datasets in terms of efficiency and scalability. Hadoop Map reduces framework uses big data to store and operate any kinds of data speedily. It is not solely meant for storage system however conjointly a platform for information storage moreover as processing. It is scalable and fault-tolerant to the systems. Also, the prediction of the data sets is handled by machine learning algorithm. This work focuses on the Extreme Machine Learning algorithm (ELM) that can utilize the optimized way of finding a solution to find disease risk prediction by combining ELM with Cuckoo Search optimization-based Support Vector Machine (CS-SVM). The proposed work also considers the scalability and accuracy of big data models, thus the proposed algorithm greatly achieves the computing work and got good results in performance of both veracity and efficiency.


2020 ◽  
Author(s):  
Castro Mayleen Dorcas Bondoc ◽  
Tumibay Gilbert Malawit

Today many schools, universities and institutions recognize the necessity and importance of using Learning Management Systems (LMS) as part of their educational services. This research work has applied LMS in the teaching and learning process of Bulacan State University (BulSU) Graduate School (GS) Program that enhances the face-to-face instruction with online components. The researchers uses an LMS that provides educators a platform that can motivate and engage students to new educational environment through manage online classes. The LMS allows educators to distribute information, manage learning materials, assignments, quizzes, and communications. Aside from the basic functions of the LMS, the researchers uses Machine Learning (ML) Algorithms applying Support Vector Machine (SVM) that will classify and identify the best related videos per topic. SVM is a supervised machine learning algorithm that analyzes data for classification and regression analysis by Maity [1]. The results of this study showed that integration of video tutorials in LMS can significantly contribute knowledge and skills in the learning process of the students.


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigen vector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N2). We pres- ent in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature varia- tion algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k–nearest-neighbor classifier and achieving the second best performance with support vector machine.


GEOMATICA ◽  
2021 ◽  
pp. 1-23
Author(s):  
Roholah Yazdan ◽  
Masood Varshosaz ◽  
Saied Pirasteh ◽  
Fabio Remondino

Automatic detection and recognition of traffic signs from images is an important topic in many applications. At first, we segmented the images using a classification algorithm to delineate the areas where the signs are more likely to be found. In this regard, shadows, objects having similar colours, and extreme illumination changes can significantly affect the segmentation results. We propose a new shape-based algorithm to improve the accuracy of the segmentation. The algorithm works by incorporating the sign geometry to filter out the wrong pixels from the classification results. We performed several tests to compare the performance of our algorithm against those obtained by popular techniques such as Support Vector Machine (SVM), K-Means, and K-Nearest Neighbours. In these tests, to overcome the unwanted illumination effects, the images are transformed into colour spaces Hue, Saturation, and Intensity, YUV, normalized red green blue, and Gaussian. Among the traditional techniques used in this study, the best results were obtained with SVM applied to the images transformed into the Gaussian colour space. The comparison results also suggested that by adding the geometric constraints proposed in this study, the quality of sign image segmentation is improved by 10%–25%. We also comparted the SVM classifier enhanced by incorporating the geometry of signs with a U-Shaped deep learning algorithm. Results suggested the performance of both techniques is very close. Perhaps the deep learning results could be improved if a more comprehensive data set is provided.


Author(s):  
A. B.M. Shawkat Ali

From the beginning, machine learning methodology, which is the origin of artificial intelligence, has been rapidly spreading in the different research communities with successful outcomes. This chapter aims to introduce for system analysers and designers a comparatively new statistical supervised machine learning algorithm called support vector machine (SVM). We explain two useful areas of SVM, that is, classification and regression, with basic mathematical formulation and simple demonstration to make easy the understanding of SVM. Prospects and challenges of future research in this emerging area are also described. Future research of SVM will provide improved and quality access to the users. Therefore, developing an automated SVM system with state-of-the-art technologies is of paramount importance, and hence, this chapter will link up an important step in the system analysis and design perspective to this evolving research arena.


Sign in / Sign up

Export Citation Format

Share Document