Efficient text feature extraction by integrating the average linkage and K-medoids clustering

Modern Physics Letters B ◽

10.1142/s0217984921501517 ◽

2021 ◽

pp. 2150151

Author(s):

Dasong Sun

Keyword(s):

Feature Extraction ◽

Text Classification ◽

Experimental Results ◽

The Other ◽

Central Feature ◽

Number Of Clusters ◽

Average Linkage ◽

Text Feature

By clustering feature words, we can not only reduce the dimension of feature subsets but also eliminate feature redundancy. However, for a feature set with very large dimensions, it is difficult for the traditional K-medoids algorithm to accurately estimate the value of K. Moreover, the clustering results of the average linkage (AL) algorithm cannot be divided again, and the AL algorithm cannot be directly used for text classification. To overcome the limitations of AL and K-medoids, in this paper we combine the two algorithms so that they complement each other. In particular, for the purpose of text classification, we improve the AL algorithm and propose the [Formula: see text] testing statistics to obtain the approximate number of clusters. Finally, the central feature words are preserved, and the other feature words are deleted. The experimental results show that the new algorithm largely eliminates the redundancy of the feature. Compared with the traditional TF-IDF algorithms, the new algorithm improves text classification performance.
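
The combination described in this abstract can be illustrated with a minimal sketch. It is not the author's improved AL algorithm, and it does not reproduce the proposed testing statistic: the number of clusters `k` is simply passed in, the distance matrix is toy data, and the function names are hypothetical. It only shows the general idea of grouping feature words by average linkage and keeping one central word (the medoid) per group.

```python
from itertools import combinations

def average_linkage(dist, k):
    """Merge clusters by smallest average pairwise distance until k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters

def medoid(cluster, dist):
    """Return the member with the smallest total distance to its cluster."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))

def central_features(words, dist, k):
    """Keep one central word per cluster; all other words are dropped."""
    clusters = average_linkage(dist, k)
    return sorted(words[medoid(c, dist)] for c in clusters)

# Toy data (assumed): pairwise distances between five feature words.
words = ["car", "auto", "vehicle", "apple", "banana"]
dist = [
    [0.0, 0.1, 0.2, 0.9, 0.9],
    [0.1, 0.0, 0.3, 0.9, 0.9],
    [0.2, 0.3, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.9, 0.2, 0.0],
]
kept = central_features(words, dist, k=2)
print(kept)
```

In this toy run the three vehicle-related words collapse into one cluster and the two fruit words into another, so only one representative of each survives.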

Applied-Information Technology with Distributed Text Feature Extraction Method Based on MapReduce

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.1046.444 ◽

2014 ◽

Vol 1046 ◽

pp. 444-448 ◽

Cited By ~ 1

Author(s):

Lu Chen ◽

Tao Zhang ◽

Yuan Yuan Ma ◽

Cheng Zhou

Keyword(s):

Information Technology ◽

Feature Extraction ◽

Text Classification ◽

Extraction Method ◽

Text Processing ◽

Rapid Development ◽

Internet Technology ◽

Feature Extraction Method ◽

Computing Model ◽

Text Feature

With the rapid development of Internet and information technology and the emergence of large volumes of document data, text classification techniques for handling massive amounts of data are becoming increasingly important. This paper presents a distributed text feature extraction method based on the MapReduce distributed computing model. It addresses the text size limits and inadequate performance encountered in mass text processing, and offers a new way of thinking for research on text feature extraction methods.
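
The abstract does not say which feature statistics are computed, so the following is only a single-process sketch of the MapReduce pattern applied to one common text feature, document frequency. The phase functions and the sample documents are assumptions for illustration; a real deployment would run the map and reduce phases on a cluster framework rather than in one Python process.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit (term, doc_id) once per distinct term in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(term, doc_ids):
    """Document frequency: number of distinct documents containing the term."""
    return term, len(set(doc_ids))

# Toy corpus (assumed).
docs = {1: "big data text", 2: "text mining", 3: "big text"}
pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
df = dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())
print(df)
```

Because each mapper only sees one document and each reducer only sees one term, the work can be partitioned across machines without any shared state.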

Comparison and Improvements of Feature Extraction Methods for Text Categorization

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.599-601.1824 ◽

2014 ◽

Vol 599-601 ◽

pp. 1824-1828

Author(s):

Juan Wang ◽

Zhi Xun Zhang ◽

Yong Dong Wang

Keyword(s):

Feature Extraction ◽

Mutual Information ◽

Text Classification ◽

Text Categorization ◽

Information Gain ◽

Extraction Methods ◽

Improved Method ◽

Document Frequency ◽

Text Feature

Feature extraction is a key point of text categorization[1]. The accuracy of extraction directly affects the accuracy of text classification. This paper introduces and compares four commonly used methods of text feature extraction: IG (information gain), MI (mutual information), CHI (chi-square statistic), and DF (document frequency), and proposes an improved method based on CHI. Experimental results show that the proposed method can improve the accuracy of text categorization.
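
For reference, the standard CHI score for a term and a category can be computed from a 2x2 contingency table of document counts. The sketch below shows that textbook formula only; it is not the improved method the paper proposes, and the argument names are chosen here for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: term or category is constant
    return n * (a * d - c * b) ** 2 / denom

# A term that appears in every in-category document and nowhere else.
print(chi_square(10, 0, 0, 10))
# A term whose presence is independent of the category.
print(chi_square(5, 5, 5, 5))
```

A higher score means the term's presence depends more strongly on the category, which is why terms are usually ranked by this value and only the top-scoring ones are kept.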

Feature Extraction and Classification of EHG between Pregnancy and Labour Group Using Hilbert-Huang Transform and Extreme Learning Machine

Computational and Mathematical Methods in Medicine ◽

10.1155/2017/7949507 ◽

2017 ◽

Vol 2017 ◽

pp. 1-9 ◽

Cited By ~ 14

Author(s):

Lili Chen ◽

Yaru Hao

Keyword(s):

Feature Extraction ◽

Analytic Function ◽

Extreme Learning Machine ◽

Maximum Amplitude ◽

Experimental Results ◽

Intrinsic Mode Functions ◽

Hilbert Huang Transform ◽

Learning Machine ◽

Labour Group

Preterm birth (PTB) is the leading cause of perinatal mortality and long-term morbidity, which results in significant health and economic problems. The early detection of PTB has great significance for its prevention. The electrohysterogram (EHG) related to uterine contraction is a noninvasive, real-time, and automatic novel technology which can be used to detect, diagnose, or predict PTB. This paper presents a method for feature extraction and classification of EHG between pregnancy and labour group, based on Hilbert-Huang transform (HHT) and extreme learning machine (ELM). For each sample, each channel was decomposed into a set of intrinsic mode functions (IMFs) using empirical mode decomposition (EMD). Then, the Hilbert transform was applied to each IMF to obtain an analytic function. The maximum amplitude of the analytic function was extracted as the feature. The identification model was constructed based on ELM. Experimental results reveal that the best classification performance of the proposed method can reach an accuracy of 88.00%, a sensitivity of 91.30%, and a specificity of 85.19%. The area under the receiver operating characteristic (ROC) curve is 0.88. Finally, experimental results indicate that the method developed in this work could be effective in the classification of EHG between pregnancy and labour group.
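
The feature described here, the maximum amplitude of the analytic function, can be sketched as follows. The sketch skips EMD entirely and assumes a single IMF is already available (a synthetic cosine stands in for it); the slow direct DFT and the function names are illustrative choices, not the authors' implementation.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct O(n^2) discrete Fourier transform (inverse scaled by 1/n)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def analytic_signal(x):
    """Analytic signal: double positive frequencies, zero negative ones."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if 0 < k < n / 2:
            spectrum[k] *= 2
        elif k > n / 2:
            spectrum[k] = 0
    return dft(spectrum, inverse=True)

def max_amplitude(imf):
    """Feature: largest magnitude of the analytic signal of one IMF."""
    return max(abs(z) for z in analytic_signal(imf))

# Synthetic stand-in for one IMF (assumed): amplitude 3, 4 cycles, 64 samples.
imf = [3.0 * math.cos(2 * math.pi * 4 * t / 64) for t in range(64)]
peak = max_amplitude(imf)
print(round(peak, 6))
```

For a pure tone the envelope is constant, so the extracted value simply recovers the tone's amplitude; on a real IMF the envelope varies and the maximum picks out the strongest burst.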

Text Classification of Gujarati Newspaper Headlines

International Journal of Asian Language Processing ◽

10.1142/s2717554520500204 ◽

2021 ◽

pp. 2050020

Author(s):

Stuti Mehta ◽

Suman K. Mitra

Keyword(s):

Feature Extraction ◽

Language Processing ◽

Text Classification ◽

Low Resource ◽

Textual Data ◽

Gujarati Language ◽

News Headlines ◽

Embedding Methods ◽

Insight Into

Text classification is an extremely important area of Natural Language Processing (NLP). This paper studies various methods for embedding and classification in the Gujarati language. The dataset comprises Gujarati news headlines classified into various categories. Different embedding methods for the Gujarati language and various classifiers are used to classify the headlines into the given categories. Gujarati is a low-resource language that is not commonly worked upon. This paper deals with one of the most important NLP tasks, classification, and also gives an idea of various embedding techniques for the Gujarati language, since they help in feature extraction for classification. The paper first performs embedding to get a valid representation of the textual data and then uses existing robust classifiers to classify the embedded data. Additionally, the paper provides an insight into how various NLP tasks can be performed over a low-resource language like Gujarati. Finally, the paper carries out a comparative analysis of the performance of various existing embedding and classification methods to get an idea of which combination gives a better outcome.

Automated seizure diagnosis system based on feature extraction and channel selection using EEG signals

Brain Informatics ◽

10.1186/s40708-021-00123-7 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Athar A. Ein Shoka ◽

Monagi H. Alkinani ◽

A. S. El-Sherbeny ◽

Ayman El-Sayed ◽

Mohamed M. Dessouky

Keyword(s):

Feature Extraction ◽

Channel Selection ◽

Ensemble Classifier ◽

Test Method ◽

The Other ◽

Continuous Case ◽

Knn Classifier ◽

Fourth Step ◽

Magnetic Resonance Imaging Mri

A seizure is an abnormal electrical activity of the brain. Neurologists can diagnose a seizure using several methods, such as neurological examination, blood tests, computerized tomography (CT), magnetic resonance imaging (MRI) and electroencephalogram (EEG). Medical data, such as the EEG signal, usually include a number of features and attributes that do not contain important information. This paper proposes an automatic seizure classification system based on extracting the most significant EEG features for seizure diagnosis. The proposed algorithm consists of five steps. The first step is channel selection, which minimizes dimensionality by selecting the most affected channels using the variance parameter. The second step is feature extraction, which extracts the 11 most relevant features from the selected channels. The third step is to average the 11 features extracted from each channel. The fourth step is the classification of the averaged features. Finally, the fifth step is cross-validation and testing of the proposed algorithm by dividing the dataset into training and testing sets. This paper presents a comparative study of seven classifiers. These classifiers were tested using two different methods: random case testing and continuous case testing. In the random case process, the KNN classifier had greater precision, specificity, and positive predictability than the other classifiers. Still, the ensemble classifier had a higher sensitivity and a lower miss rate (2.3%) than the other classifiers. For the continuous case test method, the ensemble classifier had higher metric parameters than the other classifiers. In addition, the ensemble classifier was able to detect all seizure cases without any mistake.
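
The first three steps of this pipeline can be sketched in a few lines. Several details are assumptions, since the abstract does not give them: channels are ranked by population variance and the highest-variance ones are treated as "most affected", and `max` and `min` stand in for the 11 features, which are not listed. The function and channel names are hypothetical.

```python
from statistics import mean, pvariance

def select_channels(signals, n_keep):
    """Step 1: keep the n_keep channels with the largest variance."""
    ranked = sorted(signals, key=lambda name: pvariance(signals[name]), reverse=True)
    return ranked[:n_keep]

def average_features(signals, channels, extractors):
    """Steps 2-3: compute each feature per channel, then average over channels."""
    return [mean(f(signals[ch]) for ch in channels) for f in extractors]

# Toy recordings (assumed): three channels, four samples each.
signals = {
    "ch1": [0.0, 0.0, 0.0, 0.0],
    "ch2": [1.0, -1.0, 1.0, -1.0],
    "ch3": [2.0, -2.0, 2.0, -2.0],
}
chosen = select_channels(signals, n_keep=2)
features = average_features(signals, chosen, [max, min])
print(chosen, features)
```

The resulting feature vector has one entry per extractor regardless of how many channels were kept, which is what lets a single classifier consume recordings with different channel selections.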

Consistency in Latency Measurements and Interpretation of ABR Tracings

American Journal of Audiology ◽

10.1044/1059-0889.0601.57 ◽

1997 ◽

Vol 6 (1) ◽

pp. 57-62 ◽

Cited By ~ 2

Author(s):

Wayne O. Olsen ◽

Terri L. Pratt ◽

Christopher D. Bauch

Keyword(s):

The Other

Multichannel ABR recordings for 30 otoneurologic patients were reviewed independently by three audiologists to assess interjudge consistency in determining absolute latencies and overall interpretation of ABR results. Four months later, the tracings were reviewed a second time to evaluate intrajudge consistency in interpretation of ABR waveforms. Interjudge agreement in marking latencies for waves I, III, and V within 0.2 ms was on the order of 90% or better. Intrajudge consistency was slightly higher. Only rarely did inter- or intrajudge differences in latency measurements exceed 0.3 ms. Agreement in overall interpretation of ABR results as "normal" or "abnormal" was unanimous for 90% of the patients. Across pairs of judges, the agreement for "normal" and "abnormal" classification of the ABR tracings was 97%. Intrajudge consistency for "normal" and "abnormal" categorization of the ABR results was 100% for one judge, 97% for the other two judges.

RHYNCHOLITES AND THE PROBLEM OF NARROW AND BROAD CONCEPTION OF TAXONS

Proceedings of higher educational establishments Geology and Exploration ◽

10.32454/0016-7762-2018-1-12-17 ◽

2018 ◽

pp. 12-17 ◽

Cited By ~ 3

Author(s):

I. R. Khuzina ◽

V. N. Komarov

Keyword(s):

Sexual Dimorphism ◽

Morphological Characteristics ◽

Mass Scale ◽

Individual Variability ◽

Point Of View ◽

The Other ◽

New Taxon ◽

Artificial System ◽

Broad Understanding

The paper considers a point of view based on the conception of a broad understanding of taxa. According to this point of view, rhyncholites of the subgenera Dentatobeccus and Microbeccus are accepted to be synonymous with the genus Rhynchoteuthis, and the subgenus Romanovichella is considered to be synonymous with the genus Palaeoteuthis. The criteria that influence the different approaches to the classification of rhyncholites have been analyzed (such as age and individual variability, sexual dimorphism, pathological and teratological features, and degree of disintegration of material); underestimating them can lead to inaccuracy. Depriving the subgenera Dentatobeccus, Microbeccus and Romanovichella, which possess very distinctive morphological characteristics, of their independent status and reducing them to synonyms has been noted to be unjustified. An artificial system (any suggested variant), with all its drawbacks, is the only possible system for rhyncholites. The main criterion that minimizes its negative sides and justifies the separation of a new taxon is the availability of mass-scale material. The narrow understanding of the genus, used within sensible limits, has been emphasized as simplifying both the communication of the concept of the genus to other investigators and the recognition of rhyncholites for practical tasks.

A Brief Survey on Text Classification Using Various Machine Learning Techniques

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v8i1.521 ◽

2018 ◽

Vol 8 (1) ◽

pp. 14

Author(s):

Padmavathi .S ◽

M. Chidambaram

Keyword(s):

Machine Learning ◽

Text Classification ◽

Fixed Number ◽

Machine Learning Techniques ◽

Online Information ◽

Rule Based ◽

Learning Techniques ◽

Machine Learning Approach ◽

Rule Based Approach

Text classification has grown more significant in managing and organizing text data due to the tremendous growth of online information. It classifies documents into a fixed number of predefined categories. The rule-based approach and the machine learning approach are the two ways of performing text classification. In the rule-based approach, documents are classified based on manually defined rules. In the machine learning approach, classification rules or classifiers are defined automatically using example documents. It has higher recall and is a quicker process. This paper presents an investigation of text classification using different machine learning techniques.

Somatoform disorders: diseases of the civilization

Vestnik nevrologii, psihiatrii i nejrohirurgii (Bulletin of Neurology, Psychiatry and Neurosurgery) ◽

10.33920/med-01-2002-03 ◽

2020 ◽

pp. 25-30

Author(s):

I. Kukhtevich

Keyword(s):

Health Professionals ◽

Somatoform Disorders ◽

Neuropsychiatric Disorders ◽

The Other ◽

Significant Part ◽

Organic Basis ◽

Autonomic Disorders ◽

The One ◽

Somatic Diseases

Functional autonomic disorders occupy a significant part in the practice of neurologists and professionals of other specialties as well. However, there is no generally accepted classification of such disorders. In this paper the authors tried to show that functional autonomic pathology corresponds to the concept of somatoform disorders combining syndromes manifested by visceral, borderline psychopathological, neurological symptoms that do not have an organic basis. The relevance of the problem of somatoform disorders is that on the one hand many health professionals are not familiar enough with manifestations of borderline neuropsychiatric disorders, often forming functional autonomic disorders, and on the other hand they overestimate somatoform symptoms that are similar to somatic diseases.

STUDY OF ORNAMENTS IN HUNTO SULTAN AMAY MOSQUE GORONTALO

ARTic ◽

10.34010/artic.v4i0.2418 ◽

2019 ◽

Vol 4 ◽

pp. 167-176

Author(s):

Risti Puspita Sari Hunowu

Keyword(s):

Qualitative Method ◽

The Other ◽

Visual Form ◽

Research Material ◽

Other Hand ◽

The World ◽

Existing Form ◽

The City ◽

Dutch Colonial

This research is aimed at studying the Hunto Sultan Amay Mosque located in Gorontalo City. The Hunto Sultan Amay Mosque is the oldest mosque in the city of Gorontalo. It was built as proof of Sultan Amay's love for a daughter and is a representation of Islam in Gorontalo. The researcher investigates the visual form of the Hunto Sultan Amay Mosque, which was originally like an ancient mosque in the archipelago. This can be seen from the shape of the roof, which initially was an overlapping roof and was then converted into a dome like those of other mosques in the world; we can be sure the Hunto Sultan Amay Mosque has used a dome roof since the arrival of the Dutch colonists. The researcher used a qualitative method, observing the existing form of the mosque building in detail with an aesthetic approach, reviewing objects, and selecting ornaments and classifying their shapes, so that this selection became a reference for the author as research material. Based on the analysis of this thesis, the form of the Hunto Sultan Amay Mosque, like that of the mosques located in the archipelago, and the ornaments in the Hunto Sultan Amay Mosque as a decorative structure support the grandeur of a mosque. On the other hand, the Hunto Mosque ornaments reveal a teaching. This teaching is manifested in the form of motifs and does not depict living beings in a realist or naturalist manner. The decorative forms of the Hunto Sultan Amay Mosque in general tend toward floral forms, geometric ornaments, and calligraphic ornaments dominated by the distinctive colors of Islam, namely gold, white, red, yellow and green.