high dimension data
Recently Published Documents


TOTAL DOCUMENTS

56
(FIVE YEARS 15)

H-INDEX

4
(FIVE YEARS 1)

2021 ◽  
Vol 18 (23) ◽  
pp. 680
Author(s):  
Mohammad Reza Faisal ◽  
Radityo Adi Nugroho ◽  
Rahmat Ramadhani ◽  
Friska Abadi ◽  
Rudy Herteno ◽  
...  

Researchers have collected Twitter data to study a wide range of topics, one of which is natural disasters. In existing research, a social network sensor was developed to separate natural disaster information from direct eyewitnesses, information from non-eyewitnesses, and non-natural-disaster information. It can be used as a tool for early warning or monitoring when natural disasters occur. The main component of the social network sensor is tweet text classification. As in text classification research in general, the challenge is the feature extraction method that converts Twitter text into structured data. The commonly used strategy is vector space representation, which can produce high-dimension data. This research focuses on feature extraction methods that resolve the high-dimension data issue. We propose a hybrid approach of word2vec-based and lexicon-based feature extraction to produce new features. The experimental results show that the proposed method produces fewer features and improves classification performance, with an average AUC of 0.84 using 150 features. This value was obtained using only the word2vec-based method. In the end, this research shows that the lexicon-based features did not contribute to the improvement in the performance of social network sensor predictions for natural disasters.

HIGHLIGHTS
Text classification is generally used only for sentiment analysis; it is still rarely used to identify direct eyewitnesses of natural disasters.
A common problem in text mining research is that features extracted with the vector space representation method produce high-dimension data.
A hybrid approach of word2vec-based and lexicon-based feature extraction was tested in order to find a method that generates new low-dimensional features and also improves classification performance.
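
The hybrid feature extraction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy embedding table, the lexicon, and the function names (`embed_tweet`, `lexicon_features`, `hybrid_features`) are all hypothetical, and a real system would load vectors from a trained word2vec model. The point it shows is that averaging word vectors yields a fixed-length feature vector whose size does not grow with the vocabulary, unlike a vector space (bag-of-words) representation.

```python
from typing import Dict, List, Set

# Hypothetical toy embeddings with 3 dimensions. A trained word2vec model
# would supply longer vectors for a much larger vocabulary.
EMBEDDINGS: Dict[str, List[float]] = {
    "flood": [1.0, 0.0, 0.0],
    "water": [0.5, 0.5, 0.0],
    "street": [0.0, 1.0, 0.0],
    "news": [0.0, 0.0, 1.0],
}

# Hypothetical lexicon of first-person words that may hint at an eyewitness.
EYEWITNESS_LEXICON: Set[str] = {"i", "my", "we", "our", "see", "saw"}


def embed_tweet(tokens: List[str], embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of all known tokens (zeros if none are known)."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    known = 0
    for token in tokens:
        vector = embeddings.get(token)
        if vector is None:
            continue  # out-of-vocabulary tokens are skipped
        known += 1
        for i, value in enumerate(vector):
            total[i] += value
    if known == 0:
        return total
    return [value / known for value in total]


def lexicon_features(tokens: List[str], lexicon: Set[str]) -> List[float]:
    """Return the count and the share of tokens found in the lexicon."""
    hits = sum(1 for token in tokens if token in lexicon)
    ratio = hits / len(tokens) if tokens else 0.0
    return [float(hits), ratio]


def hybrid_features(text: str) -> List[float]:
    """Concatenate embedding-based and lexicon-based features."""
    tokens = text.lower().split()
    return embed_tweet(tokens, EMBEDDINGS) + lexicon_features(tokens, EYEWITNESS_LEXICON)


features = hybrid_features("I saw flood water")
print(len(features), features)
```

The feature length here is always the embedding dimension plus two, whatever the tweet contains. Dropping the `lexicon_features` part gives the word2vec-only variant that the abstract reports as sufficient.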


Author(s):  
Haoyang Cheng ◽  
Wenquan Cui

Heteroscedasticity often appears in high-dimensional data analysis. To obtain sparse dimension reduction directions for high-dimensional data with heteroscedasticity, we propose a new sparse sufficient dimension reduction method called Lasso-PQR. From the candidate matrix derived by the principal quantile regression (PQR) method, we construct a new artificial response variable made up of the top eigenvectors of the candidate matrix. We then apply Lasso regression to obtain sparse dimension reduction directions. For the “large [Formula: see text] small [Formula: see text]” case in which [Formula: see text], we use principal projection to solve the dimension reduction problem in a lower-dimensional subspace and then project back to the original dimension reduction problem. Theoretical properties of the methodology are established. Comparisons with several existing methods in simulations and real data analysis demonstrate the advantages of our method for high-dimensional data with heteroscedasticity.


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Mohammed Qaraad ◽  
Souad Amjad ◽  
Ibrahim I.M. Manhrawy ◽  
Hanaa Fathi ◽  
Bayoumi A. Hassan ◽  
...  

2020 ◽  
Vol 62 (12) ◽  
pp. 4717-4746
Author(s):  
Rodrigo Rocha Silva ◽  
Celso Massaki Hirata ◽  
Joubert de Castro Lima

Author(s):  
Ning Zhou ◽  
Jianhui Zhang ◽  
Binqiang Wang ◽  
Jia Xiao

Mobile crowd sensing (MCS) is an emerging paradigm that leverages sensor-equipped smart mobile terminals (e.g., smartphones, tablets, and intelligent wearable devices) to collect information. Compared with traditional data collection methods, such as constructing wireless sensor network infrastructures, MCS offers lower data collection costs, easier system maintenance, and better scalability. However, because of its limited capabilities, a mobile crowd terminal supports only a limited set of data types, so it may fail to support high-dimension data collection tasks. This paper proposes a task allocation algorithm to solve the problem of high-dimensional data collection in mobile crowd sensing networks. The low-cost and balance-participating algorithm (LCBPA) aims to reduce the data collection cost and improve the equality of node participation by trading off between the two. The LCBPA works in two stages. In the first stage, it divides the high-dimensional data into fine-grained, lower-dimensional data, that is, it divides an m-dimension data collection task into k sub-tasks by K-means, where k < m. In the second stage, it assigns nodes with different sensing capabilities to perform the sub-tasks. Simulation results show that the proposed method improves the task completion ratio while minimizing the cost of data collection.
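
The two-stage idea in this abstract can be sketched as below. This is an illustration under stated assumptions, not the LCBPA itself: the abstract does not say which feature the K-means step clusters on, so the sketch assumes a hypothetical per-dimension collection cost, and the node names, capabilities, costs, and the tie-breaking rule (fewest assignments first, then lowest cost) are all invented for the example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical m = 6 data dimensions with an assumed collection cost each.
DIMENSION_COST: Dict[str, float] = {
    "temperature": 1.0, "humidity": 1.5,
    "noise": 4.0, "light": 4.5,
    "gps": 9.0, "camera": 10.0,
}

# Hypothetical nodes: which dimensions each can sense, and its cost per sub-task.
CAPABILITIES: Dict[str, Set[str]] = {
    "phone_a": {"temperature", "humidity", "noise", "light"},
    "phone_b": {"noise", "light", "gps", "camera"},
    "watch_c": {"temperature", "humidity"},
}
NODE_COST: Dict[str, float] = {"phone_a": 2.0, "phone_b": 3.0, "watch_c": 1.0}


def kmeans_1d(values: List[float], centroids: List[float], iterations: int = 10) -> List[int]:
    """Plain 1-D K-means; returns the cluster index of each value."""
    centroids = list(centroids)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every value.
        labels = [min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def split_task(costs: Dict[str, float], initial: List[float]) -> List[List[str]]:
    """Stage 1: group the m dimensions into k sub-tasks (k = len(initial))."""
    names = list(costs)
    labels = kmeans_1d([costs[n] for n in names], initial)
    subtasks: List[List[str]] = [[] for _ in initial]
    for name, label in zip(names, labels):
        subtasks[label].append(name)
    return subtasks


def assign(subtasks: List[List[str]]) -> Dict[int, Optional[str]]:
    """Stage 2: give each sub-task to a capable node, balancing participation."""
    load = {node: 0 for node in CAPABILITIES}
    plan: Dict[int, Optional[str]] = {}
    for index, dims in enumerate(subtasks):
        capable = [n for n in CAPABILITIES if set(dims) <= CAPABILITIES[n]]
        if not capable:
            plan[index] = None  # no single node can sense every dimension
            continue
        chosen = min(capable, key=lambda n: (load[n], NODE_COST[n]))
        load[chosen] += 1
        plan[index] = chosen
    return plan


subtasks = split_task(DIMENSION_COST, initial=[1.0, 4.0, 9.0])  # k = 3 < m = 6
plan = assign(subtasks)
print(subtasks)
print(plan)
```

No single node in this toy setup can sense all six dimensions, but after the split every sub-task fits on one node, which is the situation the abstract describes.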


2019 ◽  
Author(s):  
E. Coissac ◽  
C. Gonindard-Melodelima

Motivation: Molecular biology and ecology studies can produce high dimension data. Estimating correlations and shared variation between such data sets is an important step in disentangling the relationships between different elements of a biological system. Unfortunately, classical approaches are susceptible to producing falsely inferred correlations. Results: Here we propose a corrected version of the Procrustean correlation coefficient that is robust to high dimensional data. This allows for a correct estimation of the shared variation between two data sets and the partial correlation coefficients between a set of matrix data. Availability: The proposed corrected coefficients are implemented in the ProcMod R package available on CRAN. The git repository is hosted at https://git.metabarcoding.org/lecasofts/[email protected]

