N-gram support vector machines for scalable procedure and diagnosis classification, with applications to clinical free text data from the intensive care unit

Abstract: Early admission to the neurosciences intensive care unit (NSICU) is associated with improved patient outcomes. Natural language processing offers new possibilities for mining free text in electronic health record data. We sought to develop a machine learning model using both tabular and free text data to identify patients requiring NSICU admission shortly after arrival to the emergency department (ED). We conducted a single-center, retrospective cohort study of adult patients at the Mount Sinai Hospital, an academic medical center in New York City. All patients presenting to our institutional ED between January 2014 and December 2018 were included. Structured (tabular) demographic, clinical, and bed movement record data, together with free text data from triage notes, were extracted from our institutional data warehouse. A machine learning model was trained to predict the likelihood of NSICU admission at 30 min from arrival to the ED. We identified 412,858 patients presenting to the ED over the study period, of whom 1900 (0.5%) were admitted to the NSICU. The daily median number of ED presentations was 231 (IQR 200–256) and the median time from ED presentation to the decision for NSICU admission was 169 min (IQR 80–324). A model trained only with text data had an area under the receiver operating characteristic curve (AUC) of 0.90 (95% confidence interval (CI) 0.87–0.91). A structured data-only model had an AUC of 0.92 (95% CI 0.91–0.94). A combined model trained on structured and text data had an AUC of 0.93 (95% CI 0.92–0.95). At a false positive rate of 1:100 (99% specificity), the combined model had a sensitivity of 58% for identifying NSICU admission. A machine learning model using structured and free text data can predict NSICU admission soon after ED arrival. This may improve ED and NSICU resource allocation. Further studies should validate our findings.
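
The operating point quoted in the abstract (sensitivity at 99% specificity) can be read off a set of model scores as in the minimal sketch below. It is an illustration only, not the authors' code: the function name `sensitivity_at_specificity`, the "predict positive when score > threshold" rule, and the tie handling are all assumptions.

```python
def sensitivity_at_specificity(y_true, scores, target_specificity=0.99):
    """Return (threshold, sensitivity) at the lowest threshold that keeps
    specificity at or above the target. Hypothetical helper for illustration.

    A case is predicted positive when its score is strictly above the threshold.
    """
    negatives = sorted(s for y, s in zip(y_true, scores) if y == 0)
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative example")

    # Number of false positives we may tolerate among the negatives.
    max_fp = int((1.0 - target_specificity) * len(negatives) + 1e-9)
    max_fp = min(max_fp, len(negatives) - 1)

    # Threshold = the (max_fp + 1)-th highest negative score, so at most
    # max_fp negatives score strictly above it.
    threshold = negatives[len(negatives) - 1 - max_fp]

    true_positives = sum(1 for s in positives if s > threshold)
    return threshold, true_positives / len(positives)
```

With 100 negative cases, a 99% specificity target allows one false positive, so the threshold sits at the second-highest negative score, and sensitivity is the fraction of positive cases scoring above it.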

Automatic language identification using support vector machines and phonetic N-gram

2008 International Conference on Audio, Language and Image Processing ◽

10.1109/icalip.2008.4590023 ◽

2008 ◽

Cited By ~ 4

Author(s):

Yan Deng ◽

Jia Liu

Keyword(s):

Support Vector Machines ◽

Language Identification ◽

Support Vector ◽

Vector Machines ◽

N Gram

Automatic Detection of Toxic South African Tweets Using Support Vector Machines with N-Gram Features

2019 6th International Conference on Soft Computing & Machine Intelligence (ISCMI) ◽

10.1109/iscmi47871.2019.9004298 ◽

2019 ◽

Author(s):

Oluwafemi Oriola ◽

Eduan Kotze

Keyword(s):

Support Vector Machines ◽

South African ◽

Automatic Detection ◽

Support Vector ◽

Vector Machines ◽

N Gram

Automatic Assessment of Students’ Free-Text Answers with Support Vector Machines

Trends in Applied Intelligent Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-642-13022-9_24 ◽

2010 ◽

pp. 235-243 ◽

Cited By ~ 6

Author(s):

Wen-Juan Hou ◽

Jia-Hao Tsao ◽

Sheng-Yang Li ◽

Li Chen

Keyword(s):

Support Vector Machines ◽

Support Vector ◽

Free Text ◽

Automatic Assessment ◽

Vector Machines

Influential post identification on Instagram through caption and hashtag analysis

Measurement and Control ◽

10.1177/0020294019877489 ◽

2020 ◽

Vol 53 (3-4) ◽

pp. 409-415 ◽

Cited By ~ 2

Author(s):

Benyamin Bashari ◽

Ehsan Fazl-Ersi

Keyword(s):

Social Networks ◽

Support Vector Machines ◽

Significant Role ◽

Network Topology ◽

Support Vector ◽

Alternative Source ◽

Text Data ◽

Vector Machines ◽

Source Of Information ◽

Embedding Methods

Influencer marketing through social networks is becoming an important alternative to traditional ways of advertising. Various solutions have been proposed that often take advantage of graph-based approaches to discover influencers in social networks. This paper designs a new method for the discovery of influential users in Instagram, by focusing on user-generated posts as an alternative source of information, to potentially augment the existing solutions based on network topology or connections. The text associated with each Instagram post potentially consists of a set of hashtags and a descriptive caption. Various word embedding methods such as Co-occurrence and fastText are examined to represent captions and hashtags. These representations are combined within a support vector machines framework to distinguish influential posts from non-influential ones. Extensive experiments show that the text data can play a significant role in identifying influential posts, and further demonstrate the strength of the proposed method for discovering influencers on Instagram.

Penerapan Support Vector Machine (SVM) untuk Pengkategorian Penelitian (Application of Support Vector Machine (SVM) to Research Categorization)

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v1i1.11 ◽

2017 ◽

Vol 1 (1) ◽

pp. 19-25

Author(s):

Fithri Selva Jumeilah

Keyword(s):

Support Vector Machine ◽

Support Vector Machines ◽

Input Data ◽

Secondary Data ◽

Support Vector ◽

Training Process ◽

Knowledge Model ◽

Term Weighting ◽

Text Data ◽

Vector Machines

Research output at every college continues to grow and is stored in both softcopy and hardcopy. This research should be categorized so that people who need references can find it more easily. Categorization requires a text mining method, and one option is Support Vector Machines (SVM). To learn the characteristics of each category, the study uses secondary data consisting of a collection of research abstracts. The data are pre-processed in several stages: case folding converts all letters to lowercase, stop word removal removes very common words, tokenizing discards punctuation, and stemming finds root words by removing prefixes and suffixes. The pre-processed data are then converted into numerical form in the term weighting stage, which weights the contribution of each word. The term-weighted data serve as the training data and test data. Training is performed on text data whose class or category is known. The Support Vector Machines algorithm transforms this input data into a rule, function, or knowledge model that can be used for prediction. The results show that SVM categorizes the research very well, with a test accuracy of 90%.
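
The pre-processing and term weighting stages described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the stop word list and suffix list are toy English examples (the study itself stems by removing both prefixes and suffixes), the names `preprocess`, `stem`, and `tf_idf` are hypothetical, and TF-IDF is assumed as the term weighting scheme because the abstract does not name one.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "is", "for", "to", "in"}  # assumed toy list
SUFFIXES = ("ing", "ed", "es", "s")  # assumed toy suffix list, not a real stemmer


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Case folding, tokenizing (drops punctuation), stop word removal, stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document (assumed TF-IDF weighting)."""
    tokenized = [preprocess(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

The resulting weight dictionaries are the numerical form that a classifier such as an SVM would take as input. A term that occurs in every document receives weight zero, which is the intended effect of the inverse document frequency factor.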

Support Vector Machines versus Multi-layer Perceptrons for Reducing False Alarms in Intensive Care Units

International Journal of Computer Applications ◽

10.5120/7675-0969 ◽

2012 ◽

Vol 49 (11) ◽

pp. 41-47 ◽

Cited By ~ 4

Author(s):

Ben Rejab Fahmi ◽

Nouira Kaouther ◽

Abdelwahed Trabelsi

Keyword(s):

Intensive Care ◽

Support Vector Machines ◽

Intensive Care Units ◽

Support Vector ◽

False Alarms ◽

Vector Machines

Sarcasm Detection From Twitter Database Using Text Mining Algorithms

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i11.6144 ◽

2021 ◽

Vol 12 (11) ◽

pp. 1916-1924

Author(s):

Tamanna Siddiqui et al.

Keyword(s):

Social Media ◽

Logistic Regression ◽

Support Vector Machines ◽

Decision Tree ◽

Support Vector ◽

Decision Tree Classifier ◽

Text Data ◽

Data Set ◽

Tree Classifier ◽

Vector Machines

Sarcasm is defined as a cutting, often ironic remark intended to express ridicule or dislike. Sarcasm detection is the task of correctly labeling text as 'sarcasm' or 'non-sarcasm'. It is a challenging task owing to the lack of facial expressions and intonation in text. Social media and micro-blogging websites are widely mined for opinions about a target because a huge amount of text data is published openly on platforms such as Twitter. Such large, openly available text data can be used for a variety of research. Here we classify sarcasm using a text data set downloaded from Kaggle, comprising 1984 tweets collected from Twitter that are already labeled. In this paper, we use these data to train classifiers with different algorithms to assess how well machine learning models distinguish sarcasm from non-sarcasm. The process starts with text pre-processing and feature extraction (TF-IDF), followed by different classification algorithms: a Decision Tree classifier, a Multinomial Naïve Bayes classifier, Support Vector Machines, and a Logistic Regression classifier. After tuning each model on the TF-IDF features, we achieve 0.94% with Multinomial NB, 0.93% with the Decision Tree classifier, 0.97% with Logistic Regression, and 0.42% with Support Vector Machines (SVM). All models improved with tuning except the SVM, which has the lowest accuracy. The evaluation shows that the results are accurate for identifying people's sarcastic expressions.
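
One of the classifiers named in this abstract, Multinomial Naïve Bayes, can be sketched in a few lines. This is an illustration only and not the authors' code: it works on raw token counts rather than the TF-IDF features used in the paper, it uses add-one (Laplace) smoothing as an assumed default, and the names `train_multinomial_nb` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit log priors and smoothed log likelihoods from tokenized documents.

    docs is a list of token lists; labels is a parallel list of class names.
    alpha is the additive smoothing constant (assumed default of 1.0).
    """
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
    vocab = {term for counts in term_counts.values() for term in counts}

    model = {}
    for label, n_docs in class_counts.items():
        total = sum(term_counts[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(labels))
        log_likelihood = {
            term: math.log((term_counts[label][term] + alpha) / total)
            for term in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, tokens):
    """Return the label with the highest posterior score; unseen terms are skipped."""
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(
            log_likelihood[t] for t in tokens if t in log_likelihood
        )
    return max(model, key=score)
```

Training on a handful of labeled, tokenized tweets and calling `predict` on a new token list returns either the 'sarcasm' or the 'non-sarcasm' label, which mirrors the binary labeling task the abstract describes.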

Multi-class Classification of Cancer Stages from Free-text Histology Reports using Support Vector Machines

2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society ◽

10.1109/iembs.2007.4353497 ◽

2007 ◽

Cited By ~ 12

Author(s):

Anthony Nguyen ◽

Darren Moore ◽

Iain McCowan ◽

Mary-Jane Courage

Keyword(s):

Support Vector Machines ◽

Support Vector ◽

Free Text ◽

Vector Machines ◽

Multi Class Classification
