You, thou and thee

This study builds a prediction model to identify which linguistic and extra-linguistic features influence pronoun choice in the plays of Shakespeare. In the English of Shakespeare’s time, the now-archaic distinction between you and thou persisted, and it is usually reported as being determined by the relative social status and personal closeness of speaker and addressee. It remains to be determined, however, whether statistical machine learning supports this traditional explanation. Twenty-three features are investigated, selected from several areas of linguistics, such as pragmatics, sociolinguistics and conversation analysis. The three algorithms used (Naive Bayes, decision tree and support vector machine) were chosen to illustrate a range of possible models in light of their contrasting assumptions and learning biases. Two predictions are performed, first on a binary (you/thou) distinction and then on a trinary (you/thou/thee) distinction. Of the three algorithms, the support vector machine models score best. The features identified as the best predictors of pronoun choice are the words in the direct linguistic context. Several other features are also shown to influence the pronoun prediction, including the names of the speaker and addressee, the status differential, and positive and negative sentiment.


Comparing Supervised Machine Learning Strategies and Linguistic Features to Search for Very Negative Opinions

Information ◽

10.3390/info10010016 ◽

2019 ◽

Vol 10 (1) ◽

pp. 16 ◽

Cited By ~ 3

Author(s):

Sattam Almatarneh ◽

Pablo Gamallo

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Empirical Study ◽

Learning Strategies ◽

Supervised Machine Learning ◽

Support Vector ◽

Word Embeddings ◽

Linguistic Features ◽

Machine Learning Classifiers ◽

Supervised Machine Learning Classifiers

In this paper, we examine the performance of several classifiers in searching for very negative opinions. More precisely, we conduct an empirical study that analyzes the influence of three types of linguistic features (n-grams, word embeddings, and polarity lexicons) and their combinations when they are used to feed different supervised machine learning classifiers: Naive Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM). The experiments we have carried out show that SVM clearly outperforms NB and DT on all datasets, both when each feature type is used individually and when the types are combined.
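As a rough illustration of two of the feature types named in this abstract, the sketch below builds word n-gram counts and adds two polarity-lexicon counts. The lexicon, the function names and the example sentence are hypothetical and are not taken from the paper; word embeddings and the classifiers themselves are left out to keep the example dependency-free.

```python
from collections import Counter

# Hypothetical toy polarity lexicon; a real study would use a published one.
NEGATIVE_WORDS = {"terrible", "awful", "worst", "broken"}
POSITIVE_WORDS = {"great", "good", "excellent"}


def ngrams(tokens, n):
    """Return the list of word n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=2):
    """Combine n-gram counts with two lexicon-based polarity counts."""
    tokens = text.lower().split()
    features = Counter()
    # n-gram features: one count per distinct unigram, bigram, ...
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            features["ngram=" + " ".join(gram)] += 1
    # Lexicon features: how many tokens appear in each polarity list.
    features["lex_negative"] = sum(t in NEGATIVE_WORDS for t in tokens)
    features["lex_positive"] = sum(t in POSITIVE_WORDS for t in tokens)
    return features


feats = extract_features("the worst phone and a terrible screen")
```

The resulting dictionary of counts is the kind of sparse representation that could then be vectorized and passed to an NB, DT or SVM classifier.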


Comparing Supervised Machine Learning Strategies and Linguistic Features to Search for Very Negative Opinions

10.20944/preprints201811.0436.v1 ◽

2018 ◽

Author(s):

Sattam Almatarneh ◽

Pablo Gamallo

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Empirical Study ◽

Learning Strategies ◽

Supervised Machine Learning ◽

Support Vector ◽

Word Embeddings ◽

Linguistic Features ◽

Machine Learning Classifiers ◽

Supervised Machine Learning Classifiers

In this paper, we examine the performance of several classifiers in searching for very negative opinions. More precisely, we conduct an empirical study that analyzes the influence of three types of linguistic features (n-grams, word embeddings, and polarity lexicons) and their combinations when they are used to feed different supervised machine learning classifiers: Support Vector Machine (SVM), Naive Bayes (NB), and Decision Tree (DT).


Driver Recognition on Segway

International Journal of Artificial Life Research ◽

10.4018/jalr.2012010107 ◽

2012 ◽

Vol 3 (1) ◽

pp. 76-88

Author(s):

Hiroshi Sato ◽

Julien Rossignol

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Support Vector ◽

Learning Approach ◽

Statistical Machine Learning ◽

Recognition Method ◽

Human Behaviors ◽

Machine Learning Approach ◽

User Friendly

Statistical machine learning approaches to understanding human behavior have attracted considerable attention in recent years. If the authors understand more about humans, they can make more user-friendly machines. In this paper, the authors propose a method for recognizing drivers from records of their manipulations using a support vector machine. The authors demonstrate the efficiency of the method using the Segway. Recognition performance is quite good, especially when the authors introduce pre-processing with FFT.
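The FFT pre-processing step mentioned in this abstract can be pictured as turning a time series of control inputs into a frequency-magnitude vector before classification. The sketch below is only an assumption about what such a step might look like: it uses a naive discrete Fourier transform (which yields the same values an FFT would, just more slowly), and the signal and all names are made up for illustration.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return normalized DFT magnitudes of a real-valued signal.

    A naive O(n^2) transform, used only to stay dependency-free;
    an FFT computes the same values faster.
    """
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):  # non-redundant bins for real input
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        mags.append(abs(total) / n)
    return mags


# Hypothetical manipulation trace: one oscillation cycle every 8 samples.
trace = [math.sin(2 * math.pi * t / 8) for t in range(32)]
features = dft_magnitudes(trace)

# Index of the dominant frequency bin in the feature vector.
peak_bin = max(range(len(features)), key=features.__getitem__)
```

A vector such as `features` could then serve as the input to a support vector machine in place of the raw samples.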


Cancer Diagnosis and Disease Gene Identification via Statistical Machine Learning

Current Bioinformatics ◽

10.2174/1574893615666200207094947 ◽

2020 ◽

Vol 15 ◽

Author(s):

Liuyuan Chen ◽

Juntao Li ◽

Mingming Chang

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Cancer Diagnosis ◽

Gene Selection ◽

Disease Gene ◽

Group Lasso ◽

Elastic Net ◽

Support Vector ◽

Multinomial Regression ◽

Statistical Machine Learning

Diagnosing cancer and identifying disease genes using DNA microarray gene expression data are hot topics in current bioinformatics. This paper is devoted to the latest developments in cancer diagnosis and gene selection via statistical machine learning. The support vector machine is first introduced for binary cancer diagnosis. Then the 1-norm support vector machine, doubly regularized support vector machine, adaptive huberized support vector machine and other extensions are presented to improve the performance of gene selection. Lasso, elastic net, partly adaptive elastic net, group lasso, sparse group lasso, adaptive sparse group lasso and other sparse regression methods are also introduced for performing simultaneous binary cancer classification and gene selection. In addition to introducing three strategies for reducing multiclass to binary, methods that directly consider all classes of data in one learning model (multi-class support vector, sparse multinomial regression, adaptive multinomial regression and so on) are presented for performing multiple cancer diagnosis. Limitations and promising directions are also discussed.


Sentiment Analysis using Feature Based Support Vector Machine – A Proposed Method

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1463.0982s1119 ◽

2019 ◽

Vol 8 (2S11) ◽

pp. 3671-3676 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Vital Role ◽

Support Vector ◽

Features Selection ◽

Linguistic Features ◽

Good Tool ◽

Business Decisions ◽

The People ◽

Feature Based

Business decisions for any service or product depend on the sentiments of the people. The mood of people towards any event, service or product is expressed in sentiments. Sentiment text contains different linguistic features of the sentence. A sentiment sentence also contains other features that play a vital role in deciding the polarity of sentiments. Features such as duplication of sentiment and unknown emoticons may change the polarity of a sentiment. If feature selection is done properly, better sentiments can be extracted for decision making. Directed preprocessing will feed filtered input to any machine learning approach. The support vector machine has proved to be a good machine learning tool for better sentiment analysis. Better use of parts of speech (POS), followed by guided preprocessing and evaluation, will yield less erroneous sentiment polarity.


IMPLEMENTASI SUPPORT VECTOR MACHINE PADA PREDIKSI HARGA SAHAM GABUNGAN (IHSG) [Implementation of Support Vector Machine for Predicting the Composite Stock Price Index (IHSG)]

Jurnal Ilmiah Teknologi dan Rekayasa ◽

10.35760/tr.2020.25i1.2571 ◽

2020 ◽

Vol 25 (1) ◽

pp. 24-38

Author(s):

Eka Patriya

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Support Vector Regression ◽

Support Vector

Stocks are financial market instruments that many investors choose as an alternative source of funds; however, stocks traded on financial markets often undergo large price fluctuations (rises and falls). Investors therefore stand not only to make gains but also to suffer losses in the future. One indicator that investors need to watch when investing in stocks is the movement of the Composite Stock Price Index (Indeks Harga Saham Gabungan, IHSG). Analyzing the IHSG is important for investors, with the aim of finding a trend or pattern in past stock price movements that may recur, so that it can be used to predict future stock price movements. One method that can be used to predict stock price movements accurately is machine learning. In this study, a model for predicting the IHSG closing price was built using the Support Vector Regression (SVR) algorithm; it yields good predictive and generalization ability, with training and testing RMSE values of 14.334 and 20.281, and training and testing MAPE values of 0.211% and 0.251%. The results of this study are expected to help investors make decisions when devising stock investment strategies.
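For readers unfamiliar with the two error measures reported in this abstract, the sketch below computes RMSE and MAPE using their standard definitions. The price series is made up purely for illustration and has no connection to the study's data; the function names are likewise illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


# Made-up closing prices for illustration only; not data from the study.
actual = [6000.0, 6100.0, 6050.0, 6200.0]
predicted = [6010.0, 6080.0, 6065.0, 6190.0]

print(f"RMSE = {rmse(actual, predicted):.3f}")
print(f"MAPE = {mape(actual, predicted):.3f}%")
```

RMSE is expressed in index points, while MAPE is scale-free, which is why the two reported figures differ so much in magnitude.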


Predicting Future Occurrence of Acute Hypotensive Episodes Using Noninvasive and Invasive Features

Military Medicine ◽

10.1093/milmed/usaa418 ◽

2021 ◽

Vol 186 (Supplement_1) ◽

pp. 445-451

Author(s):

Yifei Sun ◽

Navid Rashedi ◽

Vikrant Vaze ◽

Parikshit Shah ◽

Ryan Halter ◽

...

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Real World ◽

Short Term Memory ◽

Model Performance ◽

Learning Technologies ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbor ◽

Continuous Map

Introduction: Early prediction of the acute hypotensive episode (AHE) in critically ill patients has the potential to improve outcomes. In this study, we apply different machine learning algorithms to the MIMIC III Physionet dataset, containing more than 60,000 real-world intensive care unit records, to test commonly used machine learning technologies and compare their performances. Materials and Methods: Five classification methods including K-nearest neighbor, logistic regression, support vector machine, random forest, and a deep learning method called long short-term memory are applied to predict an AHE 30 minutes in advance. An analysis comparing model performance when including versus excluding invasive features was conducted. To further study the pattern of the underlying mean arterial pressure (MAP), we apply a regression method to predict the continuous MAP values using linear regression over the next 60 minutes. Results: Support vector machine yields the best performance in terms of recall (84%). Including the invasive features in the classification improves the performance significantly with both recall and precision increasing by more than 20 percentage points. We were able to predict the MAP with a root mean square error (a frequently used measure of the differences between the predicted values and the observed values) of 10 mmHg 60 minutes in the future. After converting continuous MAP predictions into AHE binary predictions, we achieve a 91% recall and 68% precision. In addition to predicting AHE, the MAP predictions provide clinically useful information regarding the timing and severity of the AHE occurrence. Conclusion: We were able to predict AHE with precision and recall above 80% 30 minutes in advance with the large real-world dataset. The prediction of regression model can provide a more fine-grained, interpretable signal to practitioners. Model performance is improved by the inclusion of invasive features in predicting AHE, when compared to predicting the AHE based on only the available, restricted set of noninvasive technologies. This demonstrates the importance of exploring more noninvasive technologies for AHE prediction.
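The step of converting continuous MAP predictions into binary AHE predictions, and then scoring them by precision and recall, can be sketched as follows. The abstract does not state the MAP level that defines an episode, so the 65 mmHg cutoff below is a hypothetical placeholder, and the MAP values are invented for illustration.

```python
# Hypothetical cutoff: the abstract does not state the MAP level that defines an AHE.
AHE_MAP_THRESHOLD_MMHG = 65.0


def to_ahe_labels(map_values, threshold=AHE_MAP_THRESHOLD_MMHG):
    """Flag an episode (1) wherever MAP falls below the threshold, else 0."""
    return [1 if m < threshold else 0 for m in map_values]


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class of binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up MAP values in mmHg, for illustration only.
observed_map = [72.0, 60.0, 58.0, 80.0, 63.0]
predicted_map = [70.0, 62.0, 66.0, 64.0, 61.0]

y_true = to_ahe_labels(observed_map)
y_pred = to_ahe_labels(predicted_map)
precision, recall = precision_recall(y_true, y_pred)
```

Recall measures how many true episodes were flagged, and precision measures how many flagged episodes were real, which is why the two figures in the abstract can diverge.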
