Twitter Sentiment Recognition using Support Vector Machine

In this paper we explore the effectiveness of linguistic features for identifying the sentiment of Twitter messages. We assess the utility of existing lexical resources as well as features that capture the informal and creative language used in microblogging. We take a supervised approach to the problem, but to create training data we use existing hashtags in the Twitter data. We use three separate corpora of Twitter messages in our experiments. For development and training we use the hashtagged data set (HASH), which we compile from the Edinburgh Twitter corpus, and the emoticon data set (EMOT); for evaluation we use a data set from the iSieve Corporation (ISIEVE). Twitter contains a huge amount of data, which may be structured or unstructured. By applying preprocessing techniques to this data we can read users' comments and classify them into three categories: positive, negative, and neutral. Current work uses natural language processing, information processing, and text interpretation to extract the sentiment of a text and classify it into these categories. In addition, state-of-the-art approaches consider only the tweet to be classified when determining sentiment; they ignore its context (i.e., related tweets). Since tweets are usually short and ambiguous, however, considering only the current tweet is sometimes not enough for sentiment classification. This paper also contrasts sentiment analysis approaches for evaluating political views using the Naïve Bayes supervised machine learning algorithm, which performs better than other techniques.
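The idea of creating training data from existing hashtags can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which hashtags were used, so the two tag sets, the function names, and the rule of discarding tweets with mixed or missing signals are all hypothetical. The sketch also covers only positive and negative labels, not the neutral class.

```python
import re

# Hypothetical hashtag lists; the abstract does not name the hashtags used.
POSITIVE_TAGS = {"#happy", "#love", "#great"}
NEGATIVE_TAGS = {"#sad", "#fail", "#angry"}

HASHTAG_RE = re.compile(r"#\w+")


def label_from_hashtags(tweet):
    """Return 'positive', 'negative', or None when the hashtags give no clear signal."""
    tags = {t.lower() for t in HASHTAG_RE.findall(tweet)}
    has_pos = bool(tags & POSITIVE_TAGS)
    has_neg = bool(tags & NEGATIVE_TAGS)
    if has_pos == has_neg:  # neither kind or both kinds present: ambiguous
        return None
    return "positive" if has_pos else "negative"


def build_training_set(tweets):
    """Pair each confidently labelled tweet (hashtags stripped) with its label."""
    data = []
    for tweet in tweets:
        label = label_from_hashtags(tweet)
        if label is not None:
            # Remove the hashtags so the classifier cannot simply memorise them.
            text = " ".join(HASHTAG_RE.sub("", tweet).split())
            data.append((text, label))
    return data
```

Stripping the hashtags from the text is a design choice of this sketch: it keeps the label source out of the features that a supervised classifier would later see.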

Lost in Space: Geolocation in Event Data

Political Science Research and Methods ◽

10.1017/psrm.2018.23 ◽

2018 ◽

Vol 7 (04) ◽

pp. 871-888 ◽

Cited By ~ 6

Author(s):

Sophie J. Lee ◽

Howard Liu ◽

Michael D. Ward

Keyword(s):

Learning Algorithm ◽

Text Processing ◽

Contextual Information ◽

Training Data ◽

Supervised Machine Learning ◽

Model Parameters ◽

Event Data ◽

Data Set ◽

N Gram ◽

Automated Text Processing

Improving geolocation accuracy in text data has long been a goal of automated text processing. We depart from the conventional method and introduce a two-stage supervised machine-learning algorithm that evaluates each location mention to be either correct or incorrect. We extract contextual information from texts, i.e., N-gram patterns for location words, mention frequency, and the context of sentences containing location words. We then estimate model parameters using a training data set and use this model to predict whether a location word in the test data set accurately represents the location of an event. We demonstrate these steps by constructing customized geolocation event data at the subnational level using news articles collected from around the world. The results show that the proposed algorithm outperforms existing geocoders even in a case added post hoc to test the generality of the developed algorithm.
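As a rough illustration of the contextual information mentioned above (N-gram patterns around location words and mention frequency), the sketch below builds one feature record per location mention. The function name, the feature names, the window size, and the gazetteer passed in as `locations` are assumptions made for this example; the abstract does not specify the actual feature design or the classifier that consumes the features.

```python
from collections import Counter


def mention_features(tokens, locations, window=2):
    """Build one feature dict per location mention in a tokenised text.

    `locations` is a hypothetical gazetteer of lower-cased place names.
    """
    lowered = [t.lower() for t in tokens]
    counts = Counter(t for t in lowered if t in locations)
    features = []
    for i, tok in enumerate(lowered):
        if tok not in locations:
            continue
        features.append({
            "location": tok,
            # Words immediately before and after the mention (N-gram context).
            "left_ngram": tuple(lowered[max(0, i - window):i]),
            "right_ngram": tuple(lowered[i + 1:i + 1 + window]),
            # How often this place is mentioned in the whole text.
            "mention_count": counts[tok],
            # Relative position of the mention in the text (0 = start).
            "position": i / len(lowered),
        })
    return features
```

Each record could then be given a correct/incorrect label and passed to a supervised classifier, which is the prediction step the abstract describes.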

Extraction of Sea Ice Cover by Sentinel-1 SAR Based on SVM with Unsupervised Generation of Training Data

10.20944/preprints202005.0336.v1 ◽

2020 ◽

Author(s):

Xiaoming Li ◽

Yan Sun ◽

Qiang Zhang

Keyword(s):

Machine Learning ◽

Sea Ice ◽

Learning Algorithm ◽

Texture Features ◽

Open Water ◽

Ice Cover ◽

Training Data ◽

Support Vector ◽

Training Samples

In this paper, we focus on developing a novel method to extract sea ice cover (i.e., discrimination/classification of sea ice and open water) using Sentinel-1 (S1) cross-polarization (vertical-horizontal, VH, or horizontal-vertical, HV) data in extra wide (EW) swath mode, based on the support vector machine (SVM) machine learning algorithm. The classification basis includes the S1 radar backscatter coefficients and texture features that are calculated from S1 data using the gray level co-occurrence matrix (GLCM). Unlike previous methods, in which appropriate samples are manually selected to train the SVM to classify sea ice and open water, we propose a method for unsupervised generation of the training samples based on two GLCM texture features, entropy and homogeneity, that have contrasting characteristics over sea ice and open water. We eliminate most of the uncertainty in selecting training samples for machine learning and achieve automatic classification of sea ice and open water using S1 EW data. A comparison shows good agreement between the SAR-derived sea ice cover obtained with the proposed method and visual inspection, with accuracy reaching approximately 90%-95% based on a few cases. In addition, compared with the analyzed sea ice cover data from the Ice Mapping System (IMS), based on 728 S1 EW images, the accuracy of the sea ice cover extracted from S1 data is more than 80%.
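The two GLCM texture features named above can be computed as in the following minimal sketch. It assumes a small image already quantised to integer gray levels, a single pixel offset, a symmetric co-occurrence matrix, and the common definitions of entropy and homogeneity; the abstract does not give the window size, offsets, or quantisation the authors used, so those choices are illustrative.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalised, symmetric gray level co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def entropy(p):
    """Entropy of the GLCM: higher for irregular texture."""
    return -sum(v * math.log(v) for row in p for v in row if v > 0)


def homogeneity(p):
    """Homogeneity of the GLCM: higher when neighbouring pixels have similar levels."""
    return sum(
        v / (1 + (i - j) ** 2)
        for i, row in enumerate(p)
        for j, v in enumerate(row)
    )
```

A perfectly uniform patch gives zero entropy and homogeneity 1, while a patch whose neighbouring pixels alternate between levels gives higher entropy and lower homogeneity. That opposite behaviour is the kind of contrast the abstract relies on to pick training samples without manual selection.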

Application of Support Vector Machine in Determination of Real Estate Price

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.461.818 ◽

2012 ◽

Vol 461 ◽

pp. 818-821

Author(s):

Shi Hu Zhang

Keyword(s):

Support Vector Machine ◽

Real Estate ◽

Learning Algorithm ◽

Predictive Ability ◽

Training Data ◽

Small Samples ◽

Support Vector ◽

Data Set ◽

Real Estate Price

Real estate prices are a current focus of public concern. The support vector machine (SVM) is a relatively new machine learning algorithm; because of its excellent learning performance and its particular advantages in small-sample recognition, it is now used in many areas. Determining real estate prices is a complicated problem because of its non-linearity and the small quantity of training data. In this study, an SVM is proposed to forecast real estate prices in China. The experimental results indicate that the SVM method can achieve greater accuracy than the grey model and the artificial neural network when training data are limited. It was also found that the predictive ability of the SVM exceeded that of some traditional pattern recognition methods for the data set used here.

Prediction of COVID-19 Patient using Supervised Machine Learning Algorithm

Sains Malaysiana ◽

10.17576/jsm-2021-5008-28 ◽

2021 ◽

Vol 50 (8) ◽

pp. 2479-2497

Author(s):

Buvana M. ◽

Muthumayil K.

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Nasal Congestion ◽

Supervised Machine Learning ◽

Support Vector ◽

Data Set ◽

Physiological Measurement ◽

Machine Learning Classifiers ◽

Balanced Diet ◽

Learning Classifiers

COVID-19 is a highly symptomatic disease. Early and precise prediction based on physiological measurements of breathing can help minimize the risk of COVID-19, together with keeping a reasonable distance from others, wearing a mask, cleanliness, medication, a balanced diet, and staying safe at home when unwell. To evaluate the collected COVID-19 prediction datasets, five machine learning classifiers were used: Naïve Bayes, Support Vector Machine (SVM), Logistic Regression, K-Nearest Neighbour (KNN), and Decision Tree. COVID-19 datasets from the Repository were combined and re-examined to remove incomplete entries, and a total of 2500 cases were utilized in this study. Fever, body pain, runny nose, difficulty in breathing, sore throat, and nasal congestion are considered the most important features distinguishing patients who have COVID-19 from those who do not. We demonstrate the predictive performance of the five machine learning classifiers. A publicly available data set was used to train and assess the model. With an overall accuracy of 99.88 percent, the ensemble model performed commendably. Compared with existing methods and studies, the proposed model performed better. As a result, the model presented is trustworthy and can be used to screen COVID-19 patients in a timely and efficient manner.

Earthquake Prediction using Machine Learning Algorithm

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e9110.018620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 4684-4688

Keyword(s):

Machine Learning ◽

Structural Damage ◽

Data Science ◽

Learning Algorithm ◽

Economic Loss ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Science Data ◽

Data Set

According to statistics from the BBC, the figures vary for every earthquake that has occurred to date. Approximately, up to thousands are killed, about 50,000 are injured, and around 1-3 million are displaced, while a significant number go missing or become homeless. Almost 100% structural damage is experienced. Economic losses also result, varying from 10 to 16 million dollars. Earthquakes of magnitude 5 and above are classified as the deadliest. The most life-threatening earthquake to date took place in Indonesia, where about 3 million people died, 1-2 million were injured, and structural damage amounted to 100%. Hence, the consequences of earthquakes are devastating: they are not limited to loss of and damage to the living and the nonliving, but also cause significant change, from surroundings and lifestyle to the economy. Every such factor motivates earthquake forecasting. With even a couple of minutes' notice, individuals can act to protect themselves from injury and death, harm and economic losses can be reduced, and property and natural resources can be secured. In this work, an accurate forecaster is designed and developed: a system that forecasts the catastrophe. It focuses on detecting early signs of earthquakes using machine learning algorithms. The system follows the basic steps of developing learning systems along with the data science life cycle. Data sets for the Indian subcontinent and the rest of the world are collected from government sources. Pre-processing of the data is followed by construction of a stacking model that combines the Random Forest and Support Vector Machine algorithms. The algorithms build this mathematical model from the training data set. The model looks for patterns that lead to catastrophe and adapts to them, so that it can make decisions and forecasts without being explicitly programmed to perform the task. After a forecast, we broadcast the message to government officials and across various platforms.
The information to be obtained centers on three factors: time, locality, and magnitude.

Classification of Gene Expression Data Set using Support Vectors Machine with RBF Kernel

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2463.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 2907-2913

Keyword(s):

Learning Algorithm ◽

Data Classification ◽

Complex Problem ◽

Supervised Machine Learning ◽

Support Vector ◽

Data Sets ◽

Specific Test ◽

Data Set ◽

Rbf Kernel ◽

Gene Data

The huge amount of data being generated by different organizations, and its advantages in fields such as decision making, data security, and research, have made data classification a very important and necessary process. Data classification is the process of grouping data with similar characteristics into categories. Classification can be tailored to the output being sought, which makes it very useful. Classifying data allows us to predict the nature of future data sets and to discover useful patterns in them. This project aims at classifying gene data sets. Gene data sets are the information collected from a set of genes subjected to a specific test. They can be used for medical research: studying the patterns in the data sets allows us to predict which kinds of genes are more vulnerable to a particular disease, thereby allowing us to prevent the disease from manifesting at its very beginning; as the saying goes, prevention is better than cure. In this paper, such classification is attempted using a supervised machine learning algorithm, the support vector machine (SVM). Many algorithms exist for classification, but this one has advantages over the others. It is capable of both classification and regression. It works well with structured, semi-structured, and unstructured data. It relies on a kernel function which, when chosen appropriately, can solve complex problems. In summary, this project takes gene data sets as input and produces classified clusters as output.
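For reference, the RBF kernel named in the title is commonly defined as K(x, y) = exp(-gamma * ||x - y||^2). The sketch below evaluates it for plain Python lists; the function names and the default `gamma` value are illustrative assumptions, since the abstract does not report the kernel parameters used.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(samples, gamma=0.5):
    """Pairwise kernel values for a list of samples (the matrix an SVM solver works with)."""
    return [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]
```

The kernel equals 1 for identical samples and decays toward 0 as samples move apart; `gamma` controls how quickly that similarity falls off, which is why it is usually tuned together with the SVM's regularisation parameter.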

Weed Detection and Classification using ICA Based SVM Classifier

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c5410.018520 ◽

2020 ◽

Vol 8 (5) ◽

pp. 1557-1560

Keyword(s):

Support Vector Machine ◽

Classification Accuracy ◽

Learning Algorithm ◽

Feature Weighting ◽

Training Data ◽

Support Vector ◽

Svm Classifier ◽

Classification Problems ◽

Weed Detection ◽

Data Set

The support vector machine (SVM) is a well-known and efficient supervised learning algorithm for classification problems. However, the classification accuracy of an SVM classifier depends on its training parameters as well as on the training data set. The main objective of this paper is to optimize the SVM's parameters and feature weighting simultaneously in order to improve its performance. In this paper, an Imperialist Competitive Algorithm based Support Vector Machine (ICA-SVM) classifier is proposed for efficient weed detection and classification. This enhanced ICA-SVM classifier is able to select the appropriate input features and to optimize the parameters of the SVM, thereby improving classification accuracy. Experimental results show that the ICA-SVM classification algorithm greatly reduces computational complexity and improves classification accuracy.

Can 1H-Nuclear magnetic resonance (NMR) be used for early detection of hepatocellular cancer (HCC)?

Journal of Clinical Oncology ◽

10.1200/jco.2007.25.18_suppl.15107 ◽

2007 ◽

Vol 25 (18_suppl) ◽

pp. 15107-15107

Author(s):

R. V. Iyer ◽

B. Tennant ◽

M. Ruiz ◽

T. Szyperski ◽

D. Trump ◽

...

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Hepatocellular Cancer ◽

Screening Tools ◽

Supervised Machine Learning ◽

Support Vector ◽

Machine Learning Algorithm ◽

Data Set ◽

H Nmr ◽

Kappa Value

Background: HCC is a common and rapidly fatal cancer. Current screening tools are inadequate for identification of potentially curable cases. Our aim was to determine whether H-NMR can identify HCC compared to controls in the woodchuck (WC) model of hepatitis-related HCC. Methods: Eastern WCs were bred and inoculated at birth with dilute sera from WCs that are chronic carriers of Woodchuck Hepatitis B Virus (WHV). This resulted in chronic hepatitis in ∼60% of animals, and all carriers developed HCC by 24–36 months. Serum was obtained from 10 chronic WHV carriers with HCC (group 1), 5 WHV carriers with no HCC (group 2), and 15 matched non-infected controls (group 3). 45 µL of serum was diluted with 5 µL of D2O containing 27 mM formic acid + 0.9% saline. Spectra were collected on a 600 MHz INOVA spectrometer using a CapNMR flow probe with a 10 µL flow cell at 298 K, without knowledge of group assignments. The resulting 1D spectra were processed using Nuts from AcornNMR. Results: Principal component analysis and supervised PLS-DA were performed using Simca P+ from Umetrics. Despite general separation of groups, the Q2 value of this model was relatively low (0.20). We trained a Support Vector Machine (SVM), a supervised machine-learning algorithm, to identify the groups. Evaluation of the algorithm's performance using 10-fold validation on the data set gave a Kappa value of 0.43. The algorithm learnt to identify HCC [0.765 ROC, 0.8 sensitivity, and 0.727 positive predictive value (PPV)] and controls (0.75 ROC, 0.69 sensitivity, and 0.73 PPV) but not the WHV carrier group, likely because of the small numbers. In a second analysis of 10 HCC and 15 controls, PLS-DA showed clear separation using three components (Q2 = 0.5). The corresponding SVM model showed a Kappa value of 0.52 and ROC values of 0.767 for both classes.
Conclusions: Our preliminary results indicate that H-NMR spectra alone can be used to distinguish HCC from healthy controls using the machine-learning algorithm for classification. Further validation in a larger cohort of woodchucks is ongoing, and confirmation of these preliminary findings would support investigation of this technique as a screening tool in patients at risk of developing HCC. No significant financial relationships to disclose.
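The evaluation metrics quoted in the abstract (Kappa value, sensitivity, and PPV) can all be derived from a confusion matrix, as in the short sketch below. The function names and the row/column convention are choices made for this example, and any counts passed in are the caller's own; the abstract does not publish the underlying confusion matrices.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: true class, columns: predicted)."""
    n = sum(map(sum, confusion))
    k = len(confusion)
    # Fraction of samples on the diagonal, i.e. observed agreement.
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Agreement expected by chance from the row and column totals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (n * n)
    return (observed - expected) / (1 - expected)


def sensitivity(confusion, cls):
    """Share of true members of `cls` that were predicted as `cls`."""
    return confusion[cls][cls] / sum(confusion[cls])


def ppv(confusion, cls):
    """Positive predictive value: share of `cls` predictions that were correct."""
    return confusion[cls][cls] / sum(row[cls] for row in confusion)
```

As a made-up two-class example, `[[8, 2], [3, 12]]` (10 samples of the first class, 15 of the second) gives a sensitivity of 8/10 and a PPV of 8/11 for the first class. Kappa corrects the raw agreement for the agreement expected by chance, which is why it is more informative than plain accuracy on small, unbalanced groups.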

Probabilistic cosmic web classification using fast-generated training data

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa2008 ◽

2020 ◽

Vol 497 (4) ◽

pp. 5041-5060

Author(s):

Brandon Buncher ◽

Matias Carrasco Kind

Keyword(s):

Learning Algorithm ◽

Local Density ◽

Density Field ◽

Training Data ◽

Three Dimensions ◽

Supervised Machine Learning ◽

Data Generation ◽

Data Set ◽

Field Magnitude ◽

Web Classification

We present a novel method of robust probabilistic cosmic web particle classification in three dimensions using a supervised machine learning algorithm. Training data were generated using a simplified ΛCDM toy model with pre-determined algorithms for generating haloes, filaments, and voids. While this framework is not constrained by physical modelling, it can be generated substantially more quickly than an N-body simulation without loss in classification accuracy. For each particle in this data set, measurements were taken of the local density field magnitude and directionality. These measurements were used to train a random forest algorithm, which was used to assign class probabilities to each particle in a ΛCDM, dark matter-only N-body simulation with 256³ particles, as well as on another toy model data set. By comparing the trends in the ROC curves and other statistical metrics of the classes assigned to particles in each data set using different feature sets, we demonstrate that the combination of measurements of the local density field magnitude and directionality enables accurate and consistent classification of halo, filament, and void particles in varied environments. We also show that this combination of training features ensures that the construction of our toy model does not affect classification. The use of a fully supervised algorithm allows greater control over the information deemed important for classification, preventing issues arising from arbitrary hyperparameters and mode collapse in deep learning models. Due to the speed of training data generation, our method is highly scalable, making it particularly suited for classifying large data sets, including observed data.

Efficient Multilevel Polarity Sentiment Classification Algorithm using Support Vector Machine and Fuzzy Logic

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3772.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 5048-5051

Keyword(s):

Support Vector Machine ◽

Social Networking Sites ◽

Fuzzy Inference ◽

Learning Algorithm ◽

Online Reviews ◽

Sentiment Classification ◽

Support Vector ◽

Data Set ◽

Inference System

This paper discusses an efficient algorithm for sentiment classification of online text reviews posted on social networking sites and blogs, which are mostly unstructured and ungrammatical in nature. The model proposed in this paper uses the support vector machine supervised learning algorithm together with a fuzzy inference system to refine the degree of sentiment polarity of text reviews and to provide multilevel polarity categories. The model is also able to predict the degree of sentiment polarity of online reviews. The model's accuracy is validated on a Twitter data set and compared with an earlier model.
