scholarly journals Twitter Sentiment Recognition using Support Vector Machine

2019 ◽  
Vol 8 (4) ◽  
pp. 8797-8801

In this we explore the effectiveness of language features to identify Twitter messages ' feelings. We assess the utility of existing lexical tools as well as capturing features of informal and innovative language knowledge used in micro blogging. We take a supervised approach to the problem, but to create training data, we use existing hash tags in the Twitter data. We Using three separate Twitter messaging companies in our experiments. We use the hash tagged data set (HASH) for development and training, which we compile from the Edinburgh Twitter corpus, and the emoticon data set (EMOT) from the I Sieve Corporation (ISIEVE) for evaluation. Twitter contains huge amount of data . This data may be of different types such as structured data or unstructured data. So by using this data and Appling pre processing techniques we can be able to read the comments from the users. And also the comments will be classified into three categories. They are positive negative and also the neutral comments.Today they use the processing of natural language, information, and text interpretation to derive and classify text feeling into pos itive, negative, and neutral categories. We can also examine the utility of language features to identify Twitter mess ages ' feelings. In addition, state-of - the-art approaches take into consideration only the tweet to be classified when classifying the feeling; they ignore its context (i.e. related tweets).Since tweets are usually short and more ambiguous, however, it is sometimes not enough to consider only the current tweet for classification of sentiments.Informal and innovative microblogging language. We take a sup ervised approach to the problem, but to create training data, we use existing hashtags in the Twitter data.This paper also contrasts sentiment analysis approaches in evaluating political views using Naïve Bayes supervised machine learning algorithm which performs in better analysis compared to other techniques Paper

2018 ◽  
Vol 7 (04) ◽  
pp. 871-888 ◽  
Author(s):  
Sophie J. Lee ◽  
Howard Liu ◽  
Michael D. Ward

Improving geolocation accuracy in text data has long been a goal of automated text processing. We depart from the conventional method and introduce a two-stage supervised machine-learning algorithm that evaluates each location mention to be either correct or incorrect. We extract contextual information from texts, i.e., N-gram patterns for location words, mention frequency, and the context of sentences containing location words. We then estimate model parameters using a training data set and use this model to predict whether a location word in the test data set accurately represents the location of an event. We demonstrate these steps by constructing customized geolocation event data at the subnational level using news articles collected from around the world. The results show that the proposed algorithm outperforms existing geocoders even in a case added post hoc to test the generality of the developed algorithm.


Author(s):  
Xiaoming Li ◽  
Yan Sun ◽  
Qiang Zhang

In this paper, we focus on developing a novel method to extract sea ice cover (i.e., discrimination/classification of sea ice and open water) using Sentinel-1 (S1) cross-polarization (vertical-horizontal, VH or horizontal-vertical, HV) data in extra wide (EW) swath mode based on the machine learning algorithm support vector machine (SVM). The classification basis includes the S1 radar backscatter coefficients and texture features that are calculated from S1 data using the gray level co-occurrence matrix (GLCM). Different from previous methods where appropriate samples are manually selected to train the SVM to classify sea ice and open water, we proposed a method of unsupervised generation of the training samples based on two GLCM texture features, i.e. entropy and homogeneity, that have contrasting characteristics on sea ice and open water. We eliminate the most uncertainty of selecting training samples in machine learning and achieve automatic classification of sea ice and open water by using S1 EW data. The comparison shows good agreement between the SAR-derived sea ice cover using the proposed method and a visual inspection, of which the accuracy reaches approximately 90% - 95% based on a few cases. Besides this, compared with the analyzed sea ice cover data Ice Mapping System (IMS) based on 728 S1 EW images, the accuracy of extracted sea ice cover by using S1 data is more than 80%.


2012 ◽  
Vol 461 ◽  
pp. 818-821
Author(s):  
Shi Hu Zhang

The problem of real estate prices are the current focus of the community's concern. Support Vector Machine is a new machine learning algorithm, as its excellent performance of the study, and in small samples to identify many ways, and so has its unique advantages, is now used in many areas. Determination of real estate price is a complicated problem due to its non-linearity and the small quantity of training data. In this study, support vector machine (SVM) is proposed to forecast the price of real estate price in China. The experimental results indicate that the SVM method can achieve greater accuracy than grey model, artificial neural network under the circumstance of small training data. It was also found that the predictive ability of the SVM outperformed those of some traditional pattern recognition methods for the data set used here.


2021 ◽  
Vol 50 (8) ◽  
pp. 2479-2497
Author(s):  
Buvana M. ◽  
Muthumayil K.

One of the most symptomatic diseases is COVID-19. Early and precise physiological measurement-based prediction of breathing will minimize the risk of COVID-19 by a reasonable distance from anyone; wearing a mask, cleanliness, medication, balanced diet, and if not well stay safe at home. To evaluate the collected datasets of COVID-19 prediction, five machine learning classifiers were used: Nave Bayes, Support Vector Machine (SVM), Logistic Regression, K-Nearest Neighbour (KNN), and Decision Tree. COVID-19 datasets from the Repository were combined and re-examined to remove incomplete entries, and a total of 2500 cases were utilized in this study. Features of fever, body pain, runny nose, difficulty in breathing, shore throat, and nasal congestion, are considered to be the most important differences between patients who have COVID-19s and those who do not. We exhibit the prediction functionality of five machine learning classifiers. A publicly available data set was used to train and assess the model. With an overall accuracy of 99.88 percent, the ensemble model is performed commendably. When compared to the existing methods and studies, the proposed model is performed better. As a result, the model presented is trustworthy and can be used to screen COVID-19 patients timely, efficiently.


2020 ◽  
Vol 8 (6) ◽  
pp. 4684-4688

Per the statistics received from BBC, data varies for every earthquake occurred till date. Approximately, up to thousands are dead, about 50,000 are injured, around 1-3 Million are dislocated, while a significant amount go missing and homeless. Almost 100% structural damage is experienced. It also affects the economic loss, varying from 10 to 16 million dollars. A magnitude corresponding to 5 and above is classified as deadliest. The most life-threatening earthquake occurred till date took place in Indonesia where about 3 million were dead, 1-2 million were injured and the structural damage accounted to 100%. Hence, the consequences of earthquake are devastating and are not limited to loss and damage of living as well as nonliving, but it also causes significant amount of change-from surrounding and lifestyle to economic. Every such parameter desiderates into forecasting earthquake. A couple of minutes’ notice and individuals can act to shield themselves from damage and demise; can decrease harm and monetary misfortunes, and property, characteristic assets can be secured. In current scenario, an accurate forecaster is designed and developed, a system that will forecast the catastrophe. It focuses on detecting early signs of earthquake by using machine learning algorithms. System is entitled to basic steps of developing learning systems along with life cycle of data science. Data-sets for Indian sub-continental along with rest of the World are collected from government sources. Pre-processing of data is followed by construction of stacking model that combines Random Forest and Support Vector Machine Algorithms. Algorithms develop this mathematical model reliant on “training data-set”. Model looks for pattern that leads to catastrophe and adapt to it in its building, so as to settle on choices and forecasts without being expressly customized to play out the task. After forecast, we broadcast the message to government officials and across various platforms. The focus of information to obtain is keenly represented by the 3 factors – Time, Locality and Magnitude.


2019 ◽  
Vol 8 (2) ◽  
pp. 2907-2913

The huge amount of data being generated by different organizations and its underlying advantages in multiple fields like decision making, data security, research purposes have made data classification a very important and mandatory process now-a-days. Data Classification is the process of grouping data of similar characteristics into categories. Classification can be done based on the output we are looking forward to. Hence it is considered very useful. Classifying data allows us to predict the nature of future data-sets and discover useful patterns among them. This project aims at classifying gene data sets. Gene data sets are the information collected from a set of genes put to a specific test. It can be used for medical research purposes; by studying the pattern in the datasets allows us to predict the kind of genes that are more vulnerable to a particular disease there by allowing us to prevent the manifestation of the disease right at its beginning, just as they say, prevention is better than cure. In this paper, such classification is effort using a supervised machine learning algorithm – SVM (Support Vector Machine). There are many algorithms in existence to perform classification but this algorithm has its own lead over the others. It is capable of both classification and regression. It works well with structured, semistructured and unstructured data too. It contains a kernel function which when used appropriately can solve any complex problem. The summary of this project is, taking gene data sets as input and obtaining classified clusters as output.


2020 ◽  
Vol 8 (5) ◽  
pp. 1557-1560

Support vector machine (SVM) is a commonly known efficient supervised learning algorithm for classification problems. However, the classification accuracy of the SVM classifier depends on its training parameters and the training data set as well. The main objective of this paper is to optimize its parameters and feature weighting in order to improve the strength of the SVM simultaneously. In this paper, the Imperialist Competitive Algorithm based Support Vector Machine (ICA-SVM) classifier is proposed to classify the efficient weed detection. This enhanced ICA-SVM classifier is able to select the appropriate input features and to optimize the parameters of SVM and is improving the classification accuracy. Experimental results show that the ICA-SVM classification algorithm reduces the computational complexity tremendously and improves classification Accuracy.


2007 ◽  
Vol 25 (18_suppl) ◽  
pp. 15107-15107
Author(s):  
R. V. Iyer ◽  
B. Tennant ◽  
M. Ruiz ◽  
T. Szyperski ◽  
D. Trump ◽  
...  

15107 Background: HCC is a common and rapidly fatal cancer. Current screening tools are inadequate for identification of potentially curable cases. Our aim was to determine whether H-NMR can identify HCC compared to controls in the woodchuck (WC) model of hepatitis related HCC. Methods: Eastern WCs were bred and inoculated at birth with dilute sera from WCs that are chronic carriers of Woodchuck Hepatitis B Virus (WHV). This resulted in chronic hepatitis in ∼60% animals and all carriers developed HCC by 24–36 months. Serum from 10 chronic WHV carriers with HCC (group 1), 5 WHV carriers with no HCC (group 2) and 15 matched non-infected controls (group 3) was obtained. 45uL serum was diluted with 5uL of D2O containing 27mM formic acid + 0.9% saline. Spectra were collected on a 600 MHz INOVA spectrometer using a CapNMR flow probe with 10uL flow cell at 298K without knowledge of group assignments. The resulting 1D spectra were processed using Nuts from AcornNMR. Results: Principle component analysis and supervised PLS-DA was performed using Simca P+ from Umetrics. Despite general separation of groups, the Q2 value of this model was relatively low (0.20). We trained a Support Vector Machine (SVM) algorithm, a supervised machine-learning algorithm, to learn to identify the groups. Evaluation of the performance of the algorithm using 10-fold validation on the data set achieved a Kappa value of 0.43. This algorithm learnt to identify HCC [0.765 ROC, 0.8 sensitivity, and 0.727 positive predictive value (PPV)] and controls (0.75 ROC, 0.69 sensitivity and 0.73 PPV) but not the WHV carrier group, likely due to the small numbers. In a second analysis of 10 HCC and 15 controls, PLS-DA showed clear separation using three components (Q2= 0.5). The corresponding SVM model showed a kappa value of 0.52 and ROC values of 0.767 for both classes. Conclusions: Our preliminary results indicate that H-NMR spectra alone can be used to distinguish HCC from healthy controls using the machine-learning algorithm for classification. Further validation in a larger cohort of woodchucks is ongoing and confirmation of these preliminary findings would support investigation of this technique as a screening tool in patients at risk for developing HCC. No significant financial relationships to disclose.


2020 ◽  
Vol 497 (4) ◽  
pp. 5041-5060
Author(s):  
Brandon Buncher ◽  
Matias Carrasco Kind

ABSTRACT We present a novel method of robust probabilistic cosmic web particle classification in three dimensions using a supervised machine learning algorithm. Training data were generated using a simplified ΛCDM toy model with pre-determined algorithms for generating haloes, filaments, and voids. While this framework is not constrained by physical modelling, it can be generated substantially more quickly than an N-body simulation without loss in classification accuracy. For each particle in this data set, measurements were taken of the local density field magnitude and directionality. These measurements were used to train a random forest algorithm, which was used to assign class probabilities to each particle in a ΛCDM, dark matter-only N-body simulation with 2563 particles, as well as on another toy model data set. By comparing the trends in the ROC curves and other statistical metrics of the classes assigned to particles in each data set using different feature sets, we demonstrate that the combination of measurements of the local density field magnitude and directionality enables accurate and consistent classification of halo, filament, and void particles in varied environments. We also show that this combination of training features ensures that the construction of our toy model does not affect classification. The use of a fully supervised algorithm allows greater control over the information deemed important for classification, preventing issues arising from arbitrary hyperparameters and mode collapse in deep learning models. Due to the speed of training data generation, our method is highly scalable, making it particularly suited for classifying large data sets, including observed data.


This paper discusses an efficient algorithm for sentiment classification of online text reviews posted in social networking sites and blogs which are mostly in unstructured and ungrammatical in nature. Model proposed in this paper utilizes support vector machine supervised learning algorithm and fuzzy inference system for enhancing the degree of sentiment polarity of text reviews and providing multilevel polarity categories. Model is also able to predict degree of sentiment polarity of online reviews. The model accuracy is validated on twitter data set and compared with another earlier model.


Sign in / Sign up

Export Citation Format

Share Document