Feature Extraction Method Based on Social Network Analysis

An effective feature extraction method is key to improving the accuracy of a prediction model. From the Gene Expression Omnibus (GEO) database, we obtained microarray gene expression data (13,487 genes) for 238 colorectal cancer (CRC) and normal samples. Twelve gene modules were obtained by weighted gene co-expression network analysis (WGCNA) on 173 samples. By calculating the Pearson correlation coefficient (PCC) between the characteristic genes of each module and colorectal cancer, we identified a key module that was highly correlated with CRC. We screened hub genes from the key module by considering module membership, gene significance, and intramodular connectivity, and selected 10 hub genes as one type of feature for the classifier. We then applied a variational autoencoder (VAE) to 1,159 genes with significantly different expression and mapped the data into a 10-dimensional representation, which served as a second type of feature for the cancer classifier. Both types of features were fed to a support vector machine (SVM) classifier for CRC, which reached an accuracy of 0.9692 with an AUC of 0.9981. The result shows the high accuracy of the two-step feature extraction method, which combines hub genes obtained by WGCNA with a 10-dimensional representation learned by the VAE.
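The module-selection step described above can be illustrated with a minimal sketch. It assumes each module is summarized by one value per sample (the "characteristic gene", often called a module eigengene) and that the trait is coded 1 for CRC and 0 for normal. The module names, the toy values, and the helper `key_module` are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def key_module(eigengenes, trait):
    """Return the module whose summary profile has the largest |PCC| with the trait."""
    scores = {name: pearson(values, trait) for name, values in eigengenes.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores


# Toy data (hypothetical): six samples, trait 1 = CRC, 0 = normal.
trait = [1, 1, 1, 0, 0, 0]
eigengenes = {
    "blue": [0.9, 0.8, 0.7, 0.1, 0.2, 0.0],
    "brown": [0.5, 0.1, 0.4, 0.3, 0.6, 0.2],
}

best, scores = key_module(eigengenes, trait)
print(best, scores)
```

In this toy example the "blue" profile tracks the trait closely, so it is the one picked as the key module; the actual study would compute the module summaries with WGCNA rather than supply them by hand.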


Research on Feature Extraction Method of Social Network Text

Journal of New Media ◽ 10.32604/jnm.2021.018923 ◽ 2021 ◽ Vol 3 (2) ◽ pp. 73-80

Author(s): Zheng Zhang ◽ Shu Zhou

Keyword(s): Feature Extraction ◽ Social Network ◽ Extraction Method ◽ Feature Extraction Method


Natural Disaster on Twitter: Role of Feature Extraction Method of Word2Vec and Lexicon Based for Determining Direct Eyewitness

Trends in Sciences ◽ 10.48048/tis.2021.680 ◽ 2021 ◽ Vol 18 (23) ◽ pp. 680

Author(s): Mohammad Reza Faisal ◽ Radityo Adi Nugroho ◽ Rahmat Ramadhani ◽ Friska Abadi ◽ Rudy Herteno ◽ ...

Keyword(s): Feature Extraction ◽ Social Network ◽ Natural Disasters ◽ Natural Disaster ◽ High Dimension ◽ Text Classification ◽ Extraction Method ◽ Hybrid Approach ◽ Feature Extraction Method ◽ High Dimension Data

Researchers have collected Twitter data to study a wide range of topics, one of which is natural disasters. Existing research developed a social network sensor that separates natural disaster information from direct eyewitnesses, information from non-eyewitnesses, and non-natural-disaster information. It can be used as a tool for early warning or monitoring when natural disasters occur. The main component of the social network sensor is tweet text classification. As in text classification research in general, the challenge is the feature extraction method that converts Twitter text into structured data. The commonly used strategy is vector space representation, which can produce high-dimensional data. This research focuses on a feature extraction method that resolves the high-dimensionality issue. We propose a hybrid approach of word2vec-based and lexicon-based feature extraction to produce new features. The experimental results show that the proposed method has fewer features and improves classification performance, with an average AUC of 0.84 using 150 features. This value was obtained using only the word2vec-based method. In the end, this research shows that the lexicon-based features did not contribute to the improvement in the performance of social network sensor predictions for natural disasters.

HIGHLIGHTS

Text classification is generally used only for sentiment analysis; it is still rarely used to determine direct eyewitnesses in cases of natural disasters.

One of the common problems in text mining research is that features extracted with the vector space representation method generate high-dimensional data.

A hybrid approach of word2vec-based and lexicon-based feature extraction was tested in order to find a method that generates new low-dimensional features and also improves classification performance.
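One way to picture a hybrid of word2vec-based and lexicon-based features is sketched below. The abstract does not say how word vectors are aggregated or which lexicon features are used, so this sketch assumes the tweet vector is the mean of its token embeddings and that the lexicon contributes a single hit count. The toy embedding table, the lexicon, and the function `tweet_features` are hypothetical; a real system would load vectors from a trained word2vec model.

```python
def tweet_features(tokens, embeddings, lexicon, dim):
    """Concatenate a mean word embedding with a lexicon hit count.

    tokens:     list of lower-cased words from one tweet
    embeddings: dict mapping word -> list of `dim` floats (stand-in for word2vec)
    lexicon:    set of words taken to signal a first-person eyewitness report
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if vectors:
        # Average each dimension over the tokens that have an embedding.
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    else:
        # No known tokens: fall back to a zero vector of the same size.
        mean = [0.0] * dim
    hits = sum(1 for t in tokens if t in lexicon)
    return mean + [float(hits)]


# Toy inputs (hypothetical): 2-dimensional embeddings and a tiny lexicon.
embeddings = {"flood": [1.0, 0.0], "street": [0.0, 1.0]}
lexicon = {"i", "my", "see"}

features = tweet_features("i see flood on my street".split(), embeddings, lexicon, dim=2)
print(features)  # [0.5, 0.5, 3.0]
```

The feature vector length is the embedding dimension plus the number of lexicon features, which stays fixed regardless of vocabulary size; that is the property that keeps the representation low-dimensional compared with a vector space model.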
