CLASSIFICATION OF COVID-19 SYMPTOM FOR CHATBOT USING BERT

2021 · Vol 10 (2) · pp. 1065-1069
Author(s): H. Park, G. Moon, K. Kim

Coronavirus disease (COVID-19) has been a significant worldwide disaster from December 2019 to the present. Information on COVID-19 is obtained through news media or social media, and researchers are conducting various studies in an effort to shorten the time needed to become aware of the COVID-19 disaster situation. In this paper, we build a chatbot that can be used in emergencies, using a COVID-19 data set, and investigate with deep learning how the situation is changing.
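A minimal sketch of the core classification step, using the Hugging Face transformers library: a pre-trained BERT model with a sequence-classification head maps a user utterance to a symptom class. The model name, the five-class label set, and the example utterance are illustrative assumptions rather than the paper's configuration, and the classification head is randomly initialised here, so it would still need fine-tuning on the COVID-19 data set.

```python
# Hedged sketch of BERT-based symptom classification for a chatbot.
# Model name and num_labels=5 are assumptions, not the paper's setup.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=5)
model.eval()

def classify(utterance: str) -> int:
    """Return the index of the predicted symptom class for one user message."""
    inputs = tokenizer(utterance, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)
    return int(logits.argmax(dim=-1))

print(classify("I have had a dry cough and a fever since Monday"))
```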

2021 · Vol 8 (1)
Author(s): Yahya Albalawi, Jim Buckley, Nikola S. Nikolov

This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processing techniques applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the traditional machine learning classifiers KNN, SVM, Multinomial NB, and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another only for testing the generality of our models. Our results point to the conclusion that only four of the 26 pre-processing techniques improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier, with an F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model, with an F1 score of 75.2% and accuracy of 90.7%, compared to the F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with a lower accuracy of 70.89%. Our results also show that the performance of the best traditional classifier we trained is comparable to that of the deep learning methods on the first data set, but significantly worse on the second.
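For illustration, a hedged Keras sketch of the best-performing architecture described above: a BLSTM over frozen pre-trained word embeddings with a binary (health-related vs. not) output. The vocabulary size, sequence length, and layer width are assumptions, and a zero matrix stands in for the Mazajak vectors, which would be loaded separately.

```python
# Sketch of a BLSTM text classifier with frozen pre-trained embeddings.
# All sizes are illustrative; embedding_matrix is a placeholder for Mazajak vectors.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embed_dim, max_len = 50_000, 300, 50
embedding_matrix = np.zeros((vocab_size, embed_dim))  # stand-in for real vectors

model = keras.Sequential([
    layers.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, embed_dim,
                     embeddings_initializer=keras.initializers.Constant(embedding_matrix),
                     trainable=False),                 # frozen pre-trained embeddings
    layers.Bidirectional(layers.LSTM(128)),            # the BLSTM layer
    layers.Dense(1, activation="sigmoid"),             # health-related vs. not
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```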


2019 · Vol 7 (9) · pp. 318
Author(s): Yu, Zhao, Chin

The southeastern coast of China suffers many typhoon disasters every year, causing huge casualties and economic losses. In addition, collecting statistics on typhoon disaster situations is hard work for the government. At the same time, near-real-time disaster-related information can be obtained from developed social media platforms like Twitter and Weibo. Many cases have proved that citizens are able to organize themselves promptly on the spot and begin to share disaster information when a disaster strikes, producing massive VGI (volunteered geographic information) about the disaster situation, which could be valuable for disaster response if it could be exploited efficiently and properly. However, this social media information has features such as large quantity, high noise, and unofficial modes of expression that make it difficult to extract useful information. To solve this problem, we first designed a new classification system based on the characteristics of social media data such as Sina Weibo data, and built a microblog dataset of typhoon damage with corresponding category labels. Secondly, we used this dataset to train a deep learning model, constructing a typhoon disaster mining model based on a deep learning network that can automatically extract information about the disaster situation. The model differs from general classification systems in that it automatically selects disaster-related microblogs from a large volume of microblog data and further subdivides them into different types of disasters to facilitate subsequent emergency response and loss estimation. The advantages of the model include a wide application range, high reliability, strong pertinence, and fast speed. The results of this study provide a new approach to typhoon disaster assessment in the southeastern coastal areas of China and supply necessary information to complement authoritative information acquisition channels.
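As a rough sketch of the two-stage mining described above, the snippet below first filters disaster-related posts and then assigns each a damage category. TF-IDF with logistic regression stands in for the paper's deep network, and the toy training texts are invented so the example runs end-to-end.

```python
# Illustrative two-stage pipeline: relevance filtering, then damage typing.
# Stand-in classifiers; the paper's deep network and Weibo data are not reproduced.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

relevance_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
damage_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())

# toy fits; real training would use the labelled microblog dataset
relevance_clf.fit(["roads flooded by the typhoon", "great coffee this morning"], [1, 0])
damage_clf.fit(["roads flooded by the typhoon", "power lines down in the storm"],
               ["flooding", "infrastructure"])

def mine_disaster_posts(posts):
    """Stage 1: keep disaster-related posts. Stage 2: assign a damage category."""
    related = [p for p, keep in zip(posts, relevance_clf.predict(posts)) if keep == 1]
    return list(zip(related, damage_clf.predict(related))) if related else []

print(mine_disaster_posts(["typhoon flooded the main road", "lovely sunset today"]))
```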


2021
Author(s): Tomochika Fujisawa, Victor Noguerales, Emmanouil Meramveliotakis, Anna Papadopoulou, Alfried P Vogler

Complex bulk samples of invertebrates from biodiversity surveys present a great challenge for taxonomic identification, especially if obtained from unexplored ecosystems. High-throughput imaging combined with machine learning for rapid classification could overcome this bottleneck. Developing such procedures requires that taxonomic labels from an existing source data set are used for model training and prediction of an unknown target sample. Yet the feasibility of transfer learning for the classification of unknown samples remains to be tested. Here, we assess the efficiency of deep learning and domain transfer algorithms for family-level classification of below-ground bulk samples of Coleoptera from understudied forests of Cyprus. We trained neural network models with images from local surveys versus global databases of above-ground samples from tropical forests and evaluated how prediction accuracy was affected by: (a) the quality and resolution of images, (b) the size and complexity of the training set, and (c) the transferability of identifications across very disparate source-target pairs that do not share any species or genera. Within-dataset classification accuracy reached 98% and depended on the number and quality of training images and on dataset complexity. The accuracy of between-dataset predictions was reduced to a maximum of 82% and depended greatly on the standardisation of the imaging procedure. When the source and target images were of similar quality and resolution, albeit from different faunas, the reduction of accuracy was minimal. Application of algorithms for domain adaptation significantly improved the prediction performance of models trained on non-standardised, low-quality images. Our findings demonstrate that existing databases can be used to train models and successfully classify images from unexplored biota when the imaging conditions and classification algorithms are carefully considered. Also, our results provide guidelines for data acquisition and algorithmic development for high-throughput image-based biodiversity surveys.
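A brief sketch of the transfer-learning setup under discussion: a CNN pretrained on a source domain has its backbone frozen and its final layer replaced with a family-level head for the target fauna. ResNet-50 and the 12-family head are assumptions for illustration; the paper's actual networks and domain-adaptation algorithms are not reproduced here.

```python
# Sketch of transfer learning: freeze a pretrained backbone, retrain a new head.
# n_families and the ResNet-50 choice are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

n_families = 12  # assumed number of beetle families in the target survey
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)  # pretrained source model
for p in model.parameters():
    p.requires_grad = False                                # freeze transferred weights
model.fc = nn.Linear(model.fc.in_features, n_families)    # new trainable family-level head

x = torch.randn(1, 3, 224, 224)   # one dummy specimen image
print(model(x).shape)             # -> torch.Size([1, 12])
```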


2021 · Vol 38 (1) · pp. 1-11
Author(s): Hafzullah İş, Taner Tuncer

Detecting malicious account interaction in social networks is highly important in political, social, and economic terms. This paper analyzed the profile structure of social media users using their data interactions. A total of 10 parameters, including diameter, density, reciprocity, centrality, and modularity, were used to comprehensively characterize the interactions of Twitter users. Moreover, a new data set was formed by visualizing the data obtained with these parameters. User profiles were classified using Convolutional Neural Network models with deep learning. Users were divided into active, passive, and malicious classes. Success rates for the algorithms used in the classification were estimated based on the hyperparameters and application platforms. The best model had a success rate of 98.67%. The methodology demonstrated that Twitter user profiles can be classified successfully through user interaction-based parameters. It is expected that this paper will contribute to the published literature in terms of behavioral analysis and the determination of malicious accounts in social networks.
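A small sketch of the feature-extraction step, computing several of the ten interaction parameters named above with networkx on a toy directed interaction graph. The full parameter set, the visualization of these features as images, and the CNN classifier are not reproduced.

```python
# Compute a few of the interaction parameters (density, reciprocity, diameter,
# centrality) on a toy graph of retweets/mentions between users.
import networkx as nx

G = nx.DiGraph([("a", "b"), ("b", "a"), ("b", "c"), ("c", "a")])  # toy interactions

features = {
    "density": nx.density(G),
    "reciprocity": nx.reciprocity(G),
    "diameter": nx.diameter(G.to_undirected()),  # diameter needs a connected graph
    "mean_degree_centrality": sum(nx.degree_centrality(G).values()) / G.number_of_nodes(),
}
print(features)
```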


2019 · Vol 11 (01n02) · pp. 1950002
Author(s): Rasim M. Alguliyev, Ramiz M. Aliguliyev, Fargana J. Abdullayeva

Recently, data collected from social media has made it possible to analyze social events and make predictions about real events based on the analysis of users' sentiments and opinions. Most cyber-attacks are carried out by hackers on the basis of discussions on social media. This paper proposes a method that predicts the occurrence of DDoS attacks by finding relevant texts in social media. To perform high-precision classification of texts into positive and negative classes, a 13-layer CNN model and an improved LSTM method are used. To predict the occurrence of DDoS attacks on the following day, the negative and positive sentiments in social networking texts are used. To evaluate the efficiency of the proposed method, experiments were conducted on Twitter data. The proposed method achieved a recall, precision, F-measure, training loss, training accuracy, testing loss, and test accuracy of 0.85, 0.89, 0.87, 0.09, 0.78, 0.13, and 0.77, respectively.
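To make the prediction step concrete, a toy sketch of the final stage: daily counts of negative and positive tweets (the output of the sentiment classifier) feed a simple model that flags likely DDoS activity on the following day. Logistic regression and the toy counts are illustrative stand-ins, not the paper's model.

```python
# Next-day attack prediction from daily sentiment aggregates (toy example).
import numpy as np
from sklearn.linear_model import LogisticRegression

# daily features: [n_negative_tweets, n_positive_tweets] about the target
X = np.array([[120, 30], [15, 80], [200, 25], [10, 90]])
y = np.array([1, 0, 1, 0])  # 1 = DDoS attack observed the next day (toy labels)

clf = LogisticRegression().fit(X, y)
print(clf.predict([[150, 20]]))  # predict tomorrow's risk from today's counts
```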


Author(s): Rafly Indra Kurnia, Abba Suganda Girsang

This study classifies user comments based on the ratings given to a provider application on the Google Play Store. User comments are classified with Word2vec and a deep learning algorithm, in this case Long Short-Term Memory (LSTM), using two rating schemes: a 1-5 scale, where 1 is the lowest rating and 5 the highest, and a 1-3 scale, where 1 (negative) combines ratings 1 and 2, 2 (neutral) corresponds to rating 3, and 3 (positive) combines ratings 4 and 5. SMOTE oversampling is used to handle the imbalanced data. The dataset consists of 16,369 user comments on the MyTelkomsel application, collected from the play.google.com site, where each Indonesian-language comment carries a rating; training and testing data are drawn from these comments. Such review data is very useful for companies making business decisions. Comparable data can be obtained from social media, but social media does not provide a rating feature for user comments; the goal of this research is therefore to enable data from social media such as Twitter or Facebook to be used to quickly estimate overall user satisfaction from ratings inferred from comments. The best F1 score and precision obtained using 5 classes with LSTM and SMOTE were 0.62 and 0.70, and the best F1 score and precision obtained using 3 classes with LSTM and SMOTE were 0.86 and 0.87.
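A hedged sketch of the class-balancing step: SMOTE oversampling applied to fixed-length document vectors (for example, averaged Word2vec embeddings) before the LSTM classifier is trained. Random features stand in for the MyTelkomsel review vectors, and all dimensions are assumptions.

```python
# SMOTE oversampling on document vectors to balance rating classes (toy data).
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 300))         # 100 reviews x 300-dim Word2vec means
y = np.array([1] * 90 + [5] * 10)       # heavily imbalanced ratings (toy)

X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print(np.bincount(y_res)[[1, 5]])       # both classes now have 90 samples
```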


Author(s): Guimei Wang, Xuehui Li, Lijie Yang

Real-time and accurate measurement of coal quantity is key to the energy saving and speed regulation of a belt conveyor. The electronic belt scale and the nuclear scale are the commonly used methods for detecting coal quantity. However, the electronic belt scale uses contact measurement with low accuracy and a large error range, and although nuclear detection methods have high accuracy, they carry huge potential safety hazards due to radiation. For these reasons, this paper presents a method of coal quantity detection and classification based on machine vision and deep learning. The method uses an industrial camera to collect dynamic images of the coal on the conveyor belt as it is irradiated by a laser transmitter. After preprocessing, skeleton extraction, laser line thinning, connection of disconnected segments, image fusion, and filling, the collected images are processed to obtain coal-flow cross-sectional images. From the cross-sectional area and the belt speed of the conveyor, the coal volume per unit time is obtained, realizing dynamic coal quantity detection. On this basis, to realize dynamic classification of coal quantity, the coal-flow cross-section images corresponding to different coal quantities are divided into classes to establish the coal quantity data set. A Dense-VGG network for dynamic coal classification is then built on the VGG16 network. After training, the dynamic classification performance of the method is verified on an experimental platform. The experimental results show that the classification accuracy reaches 94.34% and the processing time of a single frame image is 0.270 s.
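The volume computation described above reduces to cross-sectional area times belt speed; a worked sketch follows. The pixel-to-metre scale, belt speed, and toy cross-section mask are illustrative assumptions, not the paper's calibration.

```python
# Volume flow from a laser-line cross-section: area (m^2) * belt speed (m/s).
import numpy as np

def coal_flow_rate(cross_section_px: int, m_per_px: float, belt_speed: float) -> float:
    """Volume flow in m^3/s from a pixel-count cross-section."""
    area_m2 = cross_section_px * m_per_px ** 2   # pixel count -> physical area
    return area_m2 * belt_speed

mask = np.zeros((480, 640), dtype=np.uint8)
mask[300:340, 100:540] = 1                       # toy filled cross-section mask
print(coal_flow_rate(int(mask.sum()), m_per_px=0.002, belt_speed=2.5))  # m^3/s
```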


2017
Author(s): Ariel Rokem, Yue Wu, Aaron Lee

Deep learning algorithms have tremendous potential utility in the classification of biomedical images. For example, images acquired with retinal optical coherence tomography (OCT) can be used to accurately classify patients with age-related macular degeneration (AMD), and distinguish them from healthy control patients. However, previous research has suggested that large amounts of data are required in order to train deep learning algorithms, because of the large number of parameters that need to be fit. Here, we show that a moderate amount of data (data from approximately 1,800 patients) may be enough to reach close-to-maximal performance in the classification of AMD patients from OCT images. These results suggest that deep learning algorithms can be trained on moderate amounts of data, provided that images are relatively homogeneous and the effective number of parameters is sufficiently small. Furthermore, we demonstrate that in this application, cross-validation with a separate test set that is not used in any part of the training does not differ substantially from cross-validation with a validation data set used to determine the optimal stopping point for training.
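A short sketch of the evaluation protocol discussed above: a test set is held out and never touched during training, while a separate validation split is used only to pick the early-stopping point. The data shapes and model are random stand-ins, not the OCT images or network from the study.

```python
# Train/validation/test protocol with early stopping (toy stand-in data).
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers, callbacks

X = np.random.rand(1000, 64)
y = np.random.randint(0, 2, 1000)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_tmp, y_tmp, test_size=0.2, random_state=0)

model = keras.Sequential([layers.Input(shape=(64,)),
                          layers.Dense(32, activation="relu"),
                          layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_tr, y_tr, validation_data=(X_val, y_val), epochs=50, verbose=0,
          callbacks=[callbacks.EarlyStopping(patience=3, restore_best_weights=True)])
print(model.evaluate(X_test, y_test, verbose=0))  # the test set is touched exactly once
```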

