Exploiting Language Models to Classify Events from Twitter

Computational Intelligence and Neuroscience ◽

10.1155/2015/401024 ◽

2015 ◽

Vol 2015 ◽

pp. 1-11 ◽

Cited By ~ 4

Author(s):

Duc-Thuan Vo ◽

Vo Thuan Hai ◽

Cheol-Young Ock

Keyword(s):

Latent Dirichlet Allocation ◽

Nearest Neighbor ◽

Language Models ◽

K Nearest Neighbor ◽

Text Corpora ◽

Common Term ◽

Selectional Preferences ◽

Linguistic Relations ◽

Relationship Of ◽

Learning Language

Classifying events on Twitter is challenging because tweet texts carry a large amount of temporal data, considerable noise, and a wide variety of topics. In this paper, we propose a method to classify events from Twitter. We first find the distinguishing terms between tweets in events and measure their similarities using learned language models such as ConceptNet and a latent Dirichlet allocation method for selectional preferences (LDA-SP), which have been widely studied for computational linguistic relations on large text corpora. The relationships among term words in tweets are discovered by checking them against each model. We then propose a method to compute the similarity between tweets based on tweet features, including common term words and the relationships among their distinguishing term words. This similarity is explicit and convenient to use with k-nearest neighbor techniques for classification. Experiments on the Edinburgh Twitter Corpus show that our method achieves competitive results for classifying events.
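
As a rough illustration of how a tweet-to-tweet similarity can feed a k-nearest neighbor classifier, the sketch below uses plain Jaccard overlap of term sets. This is an assumption made for brevity: it stands in for the paper's similarity measure, which also draws on ConceptNet and LDA-SP relationships and is not reproduced here. The function names and the toy tweets are hypothetical.

```python
from collections import Counter


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of terms two tweets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_classify(query: set[str], labeled: list[tuple[set[str], str]], k: int = 3) -> str:
    """Return the majority label among the k most similar labeled tweets."""
    ranked = sorted(labeled, key=lambda item: jaccard(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up data: term sets and event labels are illustrative only.
train = [
    ({"earthquake", "japan", "tsunami"}, "disaster"),
    ({"tsunami", "warning", "coast"}, "disaster"),
    ({"flood", "rescue", "coast"}, "disaster"),
    ({"final", "goal", "match"}, "sports"),
    ({"match", "score", "league"}, "sports"),
]
predicted = knn_classify({"tsunami", "coast", "alert"}, train, k=3)
```

Any similarity function with the same signature could replace `jaccard` without changing the voting step.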

Harvesting Online Reviews to Identify the Competitor Set in a Service Business: Evidence From the Hotel Industry

Journal of Service Research ◽

10.1177/1094670520975143 ◽

2020 ◽

pp. 109467052097514

Author(s):

Fei Ye ◽

Qian Xia ◽

Minhao Zhang ◽

Yuanzhu Zhan ◽

Yina Li

Keyword(s):

Latent Dirichlet Allocation ◽

Nearest Neighbor ◽

Service Industry ◽

Analytical Framework ◽

Online Reviews ◽

Service Industries ◽

K Nearest Neighbor ◽

Allocation Model ◽

Customer Reviews ◽

Latent Dirichlet Allocation Model

In today’s global service industry, online reviews posted by consumers offer critical information that influences subsequent consumers’ purchasing decisions and firms’ operation strategies. However, little research has been done on how the same information can be used to identify key competitors and improve services to increase competitiveness. In this article, we propose an analytical framework based on an improved k-nearest neighbor model and a latent Dirichlet allocation model for service managers to harvest online reviews to identify their key competitors and to evaluate the strengths and weaknesses of their businesses. With a sample comprising over 8 million customer reviews of 6,409 hotels in 50 Chinese cities from Ctrip.com, we validate the effectiveness of the proposed approach in the analysis of a hotel’s service competitiveness and its key competitors. The findings indicate that the importance of particular attributes of a hotel varies in different segments according to hotel star ratings. This study extends the literature by bridging online reviews and competitor identification for service industries. It also contributes to practice by offering a systematic and effective way for managers to identify their key competitors, monitor market preferences, ensure service quality, and formulate effective marketing strategies.
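
One way to picture the competitor-identification step is a nearest-neighbor search over per-hotel topic proportions. The sketch below is only that: it assumes each hotel has already been summarized as a vector of topic weights (for example from a topic model), and it uses ordinary cosine similarity rather than the improved k-nearest neighbor model the article proposes. Hotel names and vectors are invented.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_competitors(target: str, profiles: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k hotels whose topic profiles are most similar to the target's."""
    scored = [
        (name, cosine(profiles[target], vec))
        for name, vec in profiles.items()
        if name != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]


# Hypothetical topic proportions per hotel (e.g. location, service, price).
profiles = {
    "hotel_a": [0.7, 0.2, 0.1],
    "hotel_b": [0.6, 0.3, 0.1],
    "hotel_c": [0.1, 0.1, 0.8],
    "hotel_d": [0.5, 0.4, 0.1],
}
competitors = nearest_competitors("hotel_a", profiles, k=2)
```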

The Comparative Experimental Study of Multilabel Classification for Diagnosis Assistant Based on Chinese Obstetric EMRs

Journal of Healthcare Engineering ◽

10.1155/2018/7273451 ◽

2018 ◽

Vol 2018 ◽

pp. 1-9 ◽

Cited By ~ 4

Author(s):

Kunli Zhang ◽

Hongchao Ma ◽

Yueshu Zhao ◽

Hongying Zan ◽

Lei Zhuang

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Latent Dirichlet Allocation ◽

Nearest Neighbor ◽

Fertility Level ◽

K Nearest Neighbor ◽

Multilabel Classification ◽

Multilabel Learning ◽

Chief Complaints ◽

Physical Examinations

Obstetric electronic medical records (EMRs) contain massive amounts of medical data and health information. The information extraction and diagnosis assistants of obstetric EMRs are of great significance in improving the fertility level of the population. The admitting diagnosis in the first course record of the EMR is reasoned from various sources, such as chief complaints, auxiliary examinations, and physical examinations. This paper treats the diagnosis assistant as a multilabel classification task based on the analyses of obstetric EMRs. The latent Dirichlet allocation (LDA) topic and the word vector are used as features and the four multilabel classification methods, BP-MLL (backpropagation multilabel learning), RAkEL (RAndom k labELsets), MLkNN (multilabel k-nearest neighbor), and CC (chain classifier), are utilized to build the diagnosis assistant models. Experimental results conducted on real cases show that the BP-MLL achieves the best performance with an average precision up to 0.7413 ± 0.0100 when the number of label sets and the word dimensions are 71 and 100, respectively. The result of the diagnosis assistant can be introduced as a supplementary learning method for medical students. Additionally, the method can be used not only for obstetric EMRs but also for other medical records.

Application of Data Mining for Optimal Drug Inventory in a Hospital

SinkrOn ◽

10.33395/sinkron.v4i1.10236 ◽

2019 ◽

Vol 4 (1) ◽

pp. 207

Author(s):

Dewi Sahputri Siringo-Ringo ◽

Razana Baringin Daud Tambunan ◽

Dian Yulizar ◽

Tri Agustina Daulay ◽

Amir Mahmud Husein

Keyword(s):

Data Mining ◽

Inventory Management ◽

Nearest Neighbor ◽

Emergency Services ◽

Disease Classification ◽

K Nearest Neighbor ◽

Optimal Drug ◽

Hospital Revenue ◽

Individual Health Services ◽

Relationship Of

A hospital is a health care institution that provides complete individual health services, including inpatient, outpatient, and emergency services. Drug inventory management is critical to a hospital's survival: suboptimal management of medical supplies, including medicines, affects both medical services and finances, because 70% of hospital revenue comes from drugs. In this study, we apply data mining, focusing on a comparison of the K-Means and K-Nearest Neighbor (KNN) algorithms for disease classification. The classification results are then used to map the correlation between diseases and drugs using Apriori. In testing, K-Means was more accurate than KNN. The Apriori method finds the relationship between diseases and drugs based on support and confidence values, and the resulting rules are expected to serve as a reference for drug purchase recommendations so that drugs are neither overstocked nor out of stock.
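
The support and confidence values that Apriori relies on can be worked through on a tiny example. The sketch below computes both for a single disease-to-drug rule. The records, disease names, and drug names are invented for illustration, and the code covers only the two measures, not the full Apriori candidate-generation procedure.

```python
def support(transactions: list[set[str]], itemset: set[str]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions: list[set[str]], antecedent: set[str], consequent: set[str]) -> float:
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


# Hypothetical patient records: one diagnosis plus the drugs dispensed.
records = [
    {"flu", "paracetamol"},
    {"flu", "paracetamol", "vitamin_c"},
    {"flu", "vitamin_c"},
    {"hypertension", "amlodipine"},
]
# Rule "flu -> paracetamol": appears in 2 of 4 records, and in 2 of 3 flu records.
rule_support = support(records, {"flu", "paracetamol"})
rule_confidence = confidence(records, {"flu"}, {"paracetamol"})
```

Here the rule's support is 2/4 and its confidence is 2/3; a purchasing rule would typically be kept only when both exceed chosen thresholds.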

Machine Learning Verdict of EEG Signals in Brain Computer Interface

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1838114 ◽

2018 ◽

pp. 429-441

Author(s):

M. Jeyanthi ◽

C. Velayutham

Keyword(s):

Nearest Neighbor ◽

Technology Development ◽

Vital Role ◽

Svm Classifier ◽

K Nearest Neighbor ◽

Data Mining Technique ◽

Data Set ◽

Eeg Data ◽

Irrelevant Attributes

In science and technology development, brain-computer interfaces (BCI) play a vital role in research. Classification is a data mining technique used to predict group membership for data instances. Analysis of BCI data is challenging because feature extraction and classification of these data are more difficult than for raw data. In this paper, we extract statistical Haralick features from the raw EEG data. The features are then normalized, and binning is used to improve the accuracy of the predictive models by reducing noise and eliminating some irrelevant attributes. Classification is then performed on a BCI dataset using several techniques: Naïve Bayes, a k-nearest neighbor classifier, and an SVM classifier. Finally, we propose the SVM classification algorithm for the BCI dataset.
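
The normalization and binning steps can be sketched for a single feature column as follows. The abstract does not say which normalization or binning scheme was used, so min-max scaling and equal-width bins are assumptions here, and the sample values are invented.

```python
def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(normalized: list[float], n_bins: int = 4) -> list[int]:
    """Map values in [0, 1] to bin indices 0..n_bins-1 (1.0 falls in the last bin)."""
    return [min(int(v * n_bins), n_bins - 1) for v in normalized]


# Hypothetical values of one extracted feature across four EEG trials.
feature = [2.0, 4.0, 6.0, 10.0]
scaled = min_max_normalize(feature)
binned = equal_width_bins(scaled, n_bins=4)
```

The binned indices would then replace the raw feature values as classifier input.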

Determining Priority Areas for Birth Certificate Services Using the K-NN and K-Means Methods

Komputasi: Jurnal Ilmiah Ilmu Komputer dan Matematika ◽

10.33751/komputasi.v17i1.1735 ◽

2020 ◽

Vol 17 (1) ◽

pp. 319-328

Author(s):

Ade Muchlis Maulana Anwar ◽

Prihastuti Harsani ◽

Aries Maesya

Keyword(s):

Nearest Neighbor ◽

Information Gain ◽

Birth Certificate ◽

Population Data ◽

Community Services ◽

Birth Certificates ◽

Similar Data ◽

K Nearest Neighbor ◽

Civil Registration ◽

The Family

Population data are individual or aggregate data structured as a result of population registration and civil registration activities. A birth certificate is a civil registration deed that records the birth of a baby; the birth is reported so that the child can be registered on the Family Card and given a Population Identification Number (NIK) as a basis for obtaining other community services. Of the 570,637 integrated birth certificate reports in the 2018 Population Administration Information System (SIAK), 503,946 were reported late and only 66,691 were reported on time. Clustering is a method that places similar data in the same group and dissimilar data in other groups. K-Nearest Neighbor is a method for classifying objects based on the training data closest in distance to the test data. K-means is a method that divides a number of objects into groups according to their distance from each group's center point. In preprocessing, the data are cleaned by filling blank values with the most frequent value, and attributes are selected using the information gain method. Using 10,000 birth certificate records from 2019, the k-nearest neighbor method was applied to predict reporting delays and the k-means method to group priority service areas. The predictions reached an accuracy of 74.00%, and k-means with K = 2 produced a Davies-Bouldin index of 1.179.
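
The two preprocessing steps named above, filling blanks with the most frequent value and ranking attributes by information gain, can be sketched in a few lines. The column names and values below are invented, and the code assumes categorical attributes with missing entries represented as `None`.

```python
import math
from collections import Counter


def fill_with_mode(column: list) -> list:
    """Replace None entries with the most frequent non-missing value."""
    mode = Counter(v for v in column if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def entropy(labels: list) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature: list, labels: list) -> float:
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical attribute with one blank entry, and the reporting status to predict.
region = fill_with_mode(["urban", None, "rural", "urban"])
status = ["late", "late", "on_time", "late"]
gain = information_gain(region, status)
```

In this toy case the attribute separates the two classes perfectly, so its gain equals the full entropy of the labels; attributes with gain near zero would be candidates for removal.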

A Scalable K-Nearest Neighbor Algorithm for Recommendation System Problems

2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO) ◽

10.23919/mipro48935.2020.9245195 ◽

2020 ◽

Author(s):

A. Sagdic ◽

C. Tekinbas ◽

E. Arslan ◽

T. Kucukyilmaz

Keyword(s):

Recommendation System ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm

Optimizing Error Rate in Intrusion Detection System Using Artificial Neural Network Algorithm

International Journal of Emerging Research in Management and Technology ◽

10.23956/ijermt.v6i9.102 ◽

2018 ◽

Vol 6 (9) ◽

pp. 152

Author(s):

S. Vijaya Rani ◽

G. N. K. Suresh Babu

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Intrusion Detection ◽

Error Rate ◽

Learning Process ◽

Nearest Neighbor ◽

Detection System ◽

Support Vector ◽

K Nearest Neighbor ◽

Artificial Neural

Hackers illegally penetrate the servers and networks of corporate and financial institutions to gain money and extract vital information. Attacks range from a single computing system to many systems. Attackers gain access by sending malicious packets into the network through viruses, worms, Trojan horses, and similar means. They scan a network with various tools and collect information about the network and its hosts. It is therefore essential to detect attacks as they enter a network. The methods available for intrusion detection include Naive Bayes, decision trees, Support Vector Machines, K-Nearest Neighbor, and Artificial Neural Networks. A neural network consists of processing units connected in a complex manner and is able to store information and make it available for use. It acts like the human brain and acquires knowledge from the environment through a training and learning process. Many algorithms are available for the learning process. This work analyzes malicious packets and predicts the error rate in detecting them using artificial neural network algorithms.

Designing an Application to Predict On-Time Graduation for New Students Using Data Mining Techniques (Case Study: Student Academic Data of STMIK Dipanegara Makassar)

Creative Information Technology Journal ◽

10.24076/citec.2014v1i4.27 ◽

2015 ◽

Vol 1 (4) ◽

pp. 270

Author(s):

Muhammad Syukri Mustafa ◽

I. Wayan Simpen

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Test Results ◽

K Nearest Neighbor ◽

Accuracy Rate ◽

Sample Data ◽

New Students ◽

K Nearest Neighbor Algorithm ◽

Using Data ◽

Existing Data

This research aims to predict whether new students are likely to complete their studies on time, using data mining analysis with the K-Nearest Neighbor (KNN) algorithm to explore accumulated historical data. The resulting application uses a range of attributes, including national examination (Ujian Nasional, UN) scores, school or region of origin, gender, parents' occupation and income, number of siblings, and others. By applying KNN analysis, a prediction can be made from the proximity between existing historical data and new data as to whether a student is likely to complete their studies on time. In tests applying the KNN algorithm, with sample data on alumni who graduated from 2004 to 2010 as the old cases and data on alumni who graduated in 2011 as the new cases, the accuracy rate was 83.36%.

Design and Development of a Classification Website for Online Marketplace Product Search Using the K-Nearest Neighbor Algorithm

Jurnal Teknik Informatika dan Sistem Informasi ◽

10.28932/jutisi.v3i3.655 ◽

2017 ◽

Vol 3 (3) ◽

Author(s):

Danny Sebastian

Keyword(s):

Nearest Neighbor ◽

K Nearest Neighbor

K-Nearest Neighbor Classification Based on Particle Swarm Optimization for Eligibility for Rehabilitation Assistance for Uninhabitable Houses in Lenek Duren Village, Aikmel District, East Lombok Regency

Infotek : Jurnal Informatika dan Teknologi ◽

10.29408/jit.v2i2.1417 ◽

2019 ◽

Vol 2 (2) ◽

pp. 79-85

Author(s):

Suhartini Suhartini ◽

Hariman Bahtiar

Keyword(s):

Particle Swarm Optimization ◽

Nearest Neighbor ◽

Particle Swarm ◽

K Nearest Neighbor ◽

Swarm Optimization