Method for Classification of Unstructured Data in Telecommunication Services

Author(s):
Motoi Iwashita
Ken Nishimatsu
Shinsuke Shimogawa


Algorithms
2018
Vol 11 (10)
pp. 158
Author(s):
Sathya Madhusudhanan
Suresh Jaganathan
Jayashree L S

Unstructured data are irregular information with no predefined data model. Streaming data, which arrive continuously over time, are unstructured, and classifying them is a tedious task because they lack class labels and accumulate over time. As the data keep growing, it becomes difficult to train and create a model from scratch each time. Incremental learning, a self-adaptive approach, reuses the previously learned model, then learns and accommodates the information in newly arrived data to produce an updated model, which avoids retraining. The incrementally learned knowledge helps to classify the unstructured data. In this paper, we propose CUIL (Classification of Unstructured data using Incremental Learning), a framework which clusters the metadata, assigns a label to each cluster, and then incrementally builds a model using the Extreme Learning Machine (ELM), a feed-forward neural network, for each batch of data that arrives. The proposed framework trains the batches separately, significantly reducing memory use and training time, and is tested with metadata created for standard image datasets such as MNIST, STL-10, CIFAR-10, Caltech101, and Caltech256. The tabulated results show that the proposed work achieves greater accuracy and efficiency.
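The batch-wise idea in this abstract can be illustrated with an online-sequential ELM (OS-ELM) style recursion, in which fixed random hidden weights map each batch to a feature space and only the output weights are updated. This is a generic sketch of that well-known technique, not the CUIL implementation; class and parameter names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

class IncrementalELM:
    """OS-ELM-style sketch: a single-hidden-layer feed-forward network whose
    output weights are updated recursively per batch, so earlier batches
    never need to be replayed."""

    def __init__(self, n_inputs, n_hidden, n_outputs, ridge=1e-3):
        # Random, fixed hidden-layer weights (the ELM trick).
        self.W = rng.standard_normal((n_inputs, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.P = None        # running inverse-covariance accumulator
        self.beta = None     # learned output weights
        self.ridge = ridge

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def partial_fit(self, X, T):
        H = self._hidden(X)
        if self.beta is None:
            # First batch: regularized least-squares solution.
            self.P = np.linalg.inv(H.T @ H + self.ridge * np.eye(H.shape[1]))
            self.beta = self.P @ H.T @ T
        else:
            # Later batches: Woodbury-style recursive update.
            K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
            self.P = self.P - self.P @ H.T @ K @ H @ self.P
            self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return (self._hidden(X) @ self.beta).argmax(axis=1)
```

Because each `partial_fit` call consumes one batch and discards it, memory stays bounded while the model accumulates knowledge across batches, which is the property the abstract highlights.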


2016
Vol 71 (2)
pp. 160-171
Author(s):
A. A. Baranov
L. S. Namazova-Baranova
I. V. Smirnov
D. A. Devyatkin
A. O. Shelmanov
...

The paper presents a system for intelligent analysis of clinical information. The authors describe the methods implemented in the system for clinical information retrieval, intelligent diagnostics of chronic diseases, estimation of the importance of patient features, and detection of hidden dependencies between features. Results of the experimental evaluation of these methods are also presented.

Background: Healthcare facilities generate a large flow of both structured and unstructured data which contain important information about patients. Test results are usually retained as structured data, but some data are retained in the form of natural-language texts (medical history, the results of physical examination, and the results of other examinations, such as ultrasound, ECG, or X-ray studies). Many tasks arising in clinical practice can be automated by applying methods for intelligent analysis of the accumulated structured and unstructured data, which leads to improvement of healthcare quality.

Aims: The creation of a complex system for intelligent data analysis in a multidisciplinary pediatric center.

Materials and methods: The authors propose methods for information extraction from clinical texts in Russian. The methods are based on deep linguistic analysis. They retrieve terms for diseases, symptoms, areas of the body, and drugs. The methods can recognize additional attributes such as "negation" (indicating that the disease is absent), "no patient" (indicating that the disease refers to a family member of the patient, not to the patient), "severity of illness", "disease course", and "body region to which the disease refers". The authors use a set of hand-crafted templates and various techniques based on machine learning to retrieve information using a medical thesaurus. The extracted information is used to solve the problem of automatic diagnosis of chronic diseases. A machine-learning method for classification of patients with similar nosology and a method for determining the most informative patient features are also proposed.

Results: The authors processed anonymized health records from the pediatric center to evaluate the proposed methods. The results show the applicability of the information extracted from the texts for solving practical problems. The records of patients with allergic, glomerular, and rheumatic diseases were used for experimental assessment of the automatic diagnostic method. The authors also determined the most appropriate machine-learning methods for classification of patients in each disease group, as well as the most informative disease signs. Using additional information extracted from clinical texts together with structured data was found to improve the quality of diagnosis of chronic diseases. The authors also obtained characteristic combinations of disease signs.

Conclusions: The proposed methods have been implemented in the intelligent data processing system of a multidisciplinary pediatric center. The experimental results show the ability of the system to improve the quality of pediatric healthcare.
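The template-based attribute recognition described above (e.g. "negation" and "no patient") can be sketched with simple trigger patterns. This is a toy illustration only: the disease terms and trigger words below are hypothetical English stand-ins for the paper's Russian thesaurus and templates.

```python
import re

# Hypothetical mini-thesaurus and trigger templates (illustrative only).
DISEASES = ["asthma", "gastritis", "dermatitis"]
NEGATION = re.compile(r"\b(no|denies|without)\b", re.I)
FAMILY = re.compile(r"\b(mother|father|sister|brother)\b", re.I)

def extract(sentence):
    """Find disease mentions and attach attribute flags from trigger words."""
    found = []
    for term in DISEASES:
        if term in sentence.lower():
            attrs = set()
            if NEGATION.search(sentence):
                attrs.add("negation")      # the disease is stated to be absent
            if FAMILY.search(sentence):
                attrs.add("no patient")    # the disease refers to a relative
            found.append((term, attrs))
    return found
```

In a real system these flat patterns would be replaced by templates over a deep linguistic analysis of the sentence, as the abstract describes, but the output structure (term plus attribute set) is the same.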


2011
Vol 403-408
pp. 3724-3728
Author(s):
Chantima Ekwong
Sageemas Na Wichain
Choochart Haruechaiyasak

According to the education laws of Thailand, the Office for National Education Standards and Quality Assessment is responsible for externally assessing educational institutions in order to develop quality and educational standards. The external quality assessment reports contain both structured and unstructured data. In this paper, we focus on the analysis of the unstructured data, i.e., automatically classifying strength and weakness points. We propose and evaluate two different classification models: Flat Classification and Hierarchical Classification. Three algorithms, Naive Bayes, Support Vector Machines (SVM), and Decision Tree, were used in the experiments. The results showed that classification via the Hierarchical Classification model using SVM yielded the best performance. The classification of strength and weakness points yielded F-measures of 0.843 and 0.893, respectively. The proposed approach can be applied as a decision-support function for quality assessment in vocational education.
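The difference between the two schemes can be sketched as follows: a hierarchical classifier first routes each report to a top-level class (strength vs. weakness) and then applies a per-branch model, whereas a flat classifier predicts all leaf labels in one step. The sketch below uses a tiny nearest-centroid model as a stand-in for the paper's Naive Bayes/SVM/Decision Tree classifiers, and the class and label names are invented for illustration.

```python
import numpy as np

class NearestCentroid:
    """Tiny stand-in classifier (the paper used Naive Bayes, SVM, Decision Tree)."""
    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.centroids = np.array(
            [X[[i for i, v in enumerate(y) if v == c]].mean(axis=0)
             for c in self.labels])
        return self

    def predict(self, x):
        return self.labels[np.linalg.norm(self.centroids - x, axis=1).argmin()]

class HierarchicalClassifier:
    """Two-level scheme: a top-level model decides strength vs. weakness,
    then a per-branch model picks the subtopic within that branch."""
    def fit(self, X, top, sub):
        self.top_model = NearestCentroid().fit(X, top)
        self.sub_models = {}
        for t in set(top):
            idx = [i for i, v in enumerate(top) if v == t]
            self.sub_models[t] = NearestCentroid().fit(X[idx], [sub[i] for i in idx])
        return self

    def predict(self, x):
        t = self.top_model.predict(x)           # level 1: strength/weakness
        return t, self.sub_models[t].predict(x) # level 2: subtopic
```

Each second-level model is trained only on the documents of its branch, which is what lets the hierarchical scheme specialize and, in the paper's experiments, outperform the flat alternative.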


2020
Author(s):
Michela Cameletti
Silvia Fabris
Stephan Schlosser
Daniele Toninelli

Abstract: In the era of social media, the huge availability of digital data (e.g., posts sent through social networks or unstructured data scraped from websites) makes it possible to develop new types of research in a wide range of fields. These data offer advantages such as reduced collection costs, short retrieval times, and almost real-time outputs. Nevertheless, their collection and analysis can be challenging. For example, particular approaches are required to select posts related to specific topics; moreover, retrieving the information of interest from Twitter posts can be a difficult task. The main aim of this paper is to propose an unsupervised dictionary-based method to filter tweets related to a specific topic, namely the environment. We start from the tweets sent by a selection of Official Social Accounts clearly linked with the subject of interest. Then, a list of keywords is identified in order to build a topic-oriented dictionary. We test the performance of our method by applying the dictionary to more than 54 million geolocated tweets posted in Great Britain between January and May 2019.
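The two-step pipeline (derive a keyword dictionary from topic-related seed accounts, then filter the full tweet stream against it) can be sketched in a few lines. This is a simplified illustration, not the authors' implementation; the stopword list, `top_k` cutoff, and example texts are assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "of", "to", "and", "in", "we", "our"}

def build_dictionary(seed_tweets, top_k=4):
    """Step 1: build a topic dictionary from the most frequent content
    words in tweets posted by topic-related seed accounts (a simplified
    stand-in for the paper's Official Social Accounts step)."""
    counts = Counter(
        w for t in seed_tweets
        for w in re.findall(r"[a-z']+", t.lower())
        if w not in STOPWORDS
    )
    return {w for w, _ in counts.most_common(top_k)}

def filter_tweets(tweets, dictionary):
    """Step 2: keep only tweets containing at least one dictionary keyword."""
    return [t for t in tweets
            if dictionary & set(re.findall(r"[a-z']+", t.lower()))]
```

The approach is unsupervised in the sense that no tweet ever needs a manual topic label: the seed accounts themselves provide the topical signal.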


2020
Author(s):
Erick Esteven Montelongo González

The existence of large volumes of data generated by the health area presents an important opportunity for analysis, which can yield information to support physicians in the decision-making process for the diagnosis or treatment of diseases such as cancer. The present work describes a methodology for the classification of patients with liver, lung, and breast cancer using machine-learning models, in order to identify the model that performs best. The methodology considers three classification models: Support Vector Machines (SVM), Multi-Layer Perceptron (MLP), and AdaBoost, using both structured and unstructured information from the patients' clinical records. Results show that the best classification model is MLP using only unstructured data, achieving 89% precision and showing the usefulness of this type of data in the classification of cancer patients.
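A model-comparison step of this kind is commonly set up as a cross-validated benchmark over the candidate classifiers. The sketch below assumes scikit-learn and uses a synthetic feature matrix as a stand-in for the real clinical-record features; it is not the author's pipeline.

```python
# Minimal sketch of comparing SVM, MLP, and AdaBoost by cross-validation,
# assuming scikit-learn; synthetic data replaces the clinical features.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

models = {
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=2000, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Mean 3-fold cross-validation score per model; the best one is kept.
scores = {name: cross_val_score(model, X, y, cv=3).mean()
          for name, model in models.items()}
best = max(scores, key=scores.get)
```

In the paper's setting the same harness would be run twice, once on features from structured fields and once on features extracted from the unstructured record text, which is how the MLP-on-unstructured-data result was identified.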


2015
Vol 77 (18)
Author(s):
Nurul Fathiyah Shamsudin
Halizah Basiron
Zurina Saaya
Ahmad Fadzli Nizam Abdul Rahman
Mohd Hafiz Zakaria
...

Sentiment analysis is the computational study of people's opinions or feedback, attitudes, and emotions toward entities, individuals, issues, events, topics, and their attributes. Much research has been conducted for languages such as English, Spanish, French, and German; however, little research has been conducted to harvest the information in Malay text and structure it into meaningful data. The objective of this paper is to introduce a lexicon-based method for analysing the sentiment of Facebook comments in Malay. Three lexicon-based techniques are implemented in order to identify the sentiment of Facebook comments: term counting, term score summation, and average on comments. Accuracy, precision, and recall are computed and compared for all techniques. The results show that the average on comments method outperforms the other two techniques.
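The three techniques differ only in how lexicon scores are aggregated over a comment, which is easy to show side by side. The tiny English lexicon below is a hypothetical stand-in for the paper's Malay lexicon, and the exact scores are invented for illustration.

```python
# Hypothetical mini-lexicon: word -> polarity score (illustrative only;
# the paper uses a Malay sentiment lexicon).
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}

def term_counting(comment):
    """Compare the counts of positive and negative terms."""
    words = comment.lower().split()
    pos = sum(1 for w in words if LEXICON.get(w, 0) > 0)
    neg = sum(1 for w in words if LEXICON.get(w, 0) < 0)
    return "positive" if pos > neg else "negative" if neg > pos else "neutral"

def term_score_summation(comment):
    """Sum the lexicon scores over the whole comment."""
    s = sum(LEXICON.get(w, 0.0) for w in comment.lower().split())
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

def average_on_comment(comment):
    """Average the scores over sentiment-bearing words only."""
    scores = [LEXICON[w] for w in comment.lower().split() if w in LEXICON]
    if not scores:
        return "neutral"
    m = sum(scores) / len(scores)
    return "positive" if m > 0 else "negative" if m < 0 else "neutral"
```

The three aggregations can disagree on the same comment (e.g. one strong negative word against two mild positives), which is why comparing their accuracy, precision, and recall, as the paper does, is informative.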


