A LSTM-cBiGAN based hybrid sampling method for time series customer classification

This study examined the network structure of resilience. Network analysis was used to estimate several properties of resilience in network models. Particular attention is given to positive affect, which, according to the broaden-and-build theory of positive emotions (Fredrickson, 1998), should play a central role in resilience. A total of 31 participants took part, and time series data were collected three times a day for two weeks following the experience sampling method. Temporal and contemporaneous individual network models were estimated, which showed that positive affect plays a less central role than expected on the basis of the broaden-and-build theory of positive emotions. The presence of partial correlations gives an indication of causality between the nodes of the networks.

HSDP: A Hybrid Sampling Method for Imbalanced Big Data Based on Data Partition

Complexity ◽ 10.1155/2021/6877284 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-9

Author(s): Liping Chen ◽ Jiabao Jiang ◽ Yong Zhang

Keyword(s): Big Data ◽ Sampling Method ◽ Sampling Methods ◽ Data Partition ◽ Minority Class ◽ F Measure ◽ Better Than ◽ Hybrid Sampling

Classical classifiers are ineffective at classifying imbalanced big datasets. Resampling the datasets to balance the sample distribution before training the classifier is one of the most popular approaches to this problem. This paper proposes an effective and simple hybrid sampling method based on data partition (HSDP). First, all the data samples are partitioned into different data regions. Then, the samples in the noise minority samples region are removed, and the samples in the boundary minority samples region are selected as oversampling seeds to generate the synthetic samples. Finally, a weighted oversampling process is conducted in which synthetic samples are generated within the same cluster as the oversampling seed. The weight of each selected minority class sample is the ratio of the proportion of majority class samples among its neighbors to the sum of these proportions over all selected samples. Generating synthetic samples in the same cluster as the oversampling seed guarantees that the new synthetic samples lie inside the minority class area. Experiments on eight datasets show that HSDP is better than or comparable with typical sampling methods in terms of F-measure and G-mean.
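
The seed-weighting rule in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `seed_weights`, the neighbourhood size `k`, the Euclidean distance, and the label convention `majority_label=0` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def seed_weights(points, labels, seed_indices, k=5, majority_label=0):
    """Weight each oversampling seed by the majority-class share among its
    k nearest neighbours, normalised so that the weights sum to 1.

    Hypothetical helper: k and the Euclidean metric are assumptions.
    Requires at least two samples so every seed has a neighbour.
    """
    proportions = []
    for i in seed_indices:
        # Distances from seed i to every other sample (the seed itself is excluded).
        others = [(math.dist(points[i], points[j]), j)
                  for j in range(len(points)) if j != i]
        neighbours = [j for _, j in sorted(others)[:k]]
        majority_share = sum(labels[j] == majority_label
                             for j in neighbours) / len(neighbours)
        proportions.append(majority_share)

    total = sum(proportions)
    if total == 0:
        # No seed has majority neighbours: fall back to uniform weights.
        return [1.0 / len(proportions)] * len(proportions)
    return [p / total for p in proportions]
```

Under this reading, seeds surrounded by more majority class samples receive larger weights and would therefore contribute more synthetic samples.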

An Efficient Hybrid Sampling Method for Neural Network-Based Microwave Component Modeling and Optimization

IEEE Microwave and Wireless Components Letters ◽ 10.1109/lmwc.2020.2995858 ◽ 2020 ◽ Vol 30 (7) ◽ pp. 625-628

Author(s): Zhen Zhang ◽ Qingsha S. Cheng ◽ Hongcai Chen ◽ Fan Jiang

Keyword(s): Neural Network ◽ Sampling Method ◽ Modeling And Optimization ◽ Microwave Component ◽ Hybrid Sampling

Hybrid sampling method for autoregressive classification trees under density-weighted curvature distance

Enterprise Information Systems ◽ 10.1080/17517575.2020.1762245 ◽ 2020 ◽ pp. 1-20

Author(s): Hua Ye ◽ Xilong Qu ◽ Shengzong Liu ◽ Guang Li

Keyword(s): Sampling Method ◽ Classification Trees ◽ Hybrid Sampling

Projection Word Embedding Model With Hybrid Sampling Training for Classifying ICD-10-CM Codes: Longitudinal Observational Study (Preprint)

10.2196/preprints.14499 ◽ 2019

Author(s): Chin Lin ◽ Yu-Sheng Lou ◽ Dung-Jang Tsai ◽ Chia-Cheng Lee ◽ Chia-Jung Hsu ◽ ...

Keyword(s): General Hospital ◽ Sampling Method ◽ Model Performance ◽ Word Embedding ◽ Superior Performance ◽ Word Embeddings ◽ Technology Improvement ◽ ICD-10 ◽ F Measure ◽ Hybrid Sampling

BACKGROUND Most current state-of-the-art models for searching the International Classification of Diseases, Tenth Revision Clinical Modification (ICD-10-CM) codes use word embedding technology to capture useful semantic properties. However, they are limited by the quality of initial word embeddings. Word embedding trained by electronic health records (EHRs) is considered the best, but the vocabulary diversity is limited by previous medical records. Thus, we require a word embedding model that maintains the vocabulary diversity of open internet databases and the medical terminology understanding of EHRs. Moreover, we need to consider the particularity of the disease classification, wherein discharge notes present only positive disease descriptions.

OBJECTIVE We aimed to propose a projection word2vec model and a hybrid sampling method. In addition, we aimed to conduct a series of experiments to validate the effectiveness of these methods.

METHODS We compared the projection word2vec model and traditional word2vec model using two corpora sources: English Wikipedia and PubMed journal abstracts. We used seven published datasets to measure the medical semantic understanding of the word2vec models and used these embeddings to identify the three-character-level ICD-10-CM diagnostic codes in a set of discharge notes. On the basis of embedding technology improvement, we also tried to apply the hybrid sampling method to improve accuracy. The 94,483 labeled discharge notes from the Tri-Service General Hospital of Taipei, Taiwan, from June 1, 2015, to June 30, 2017, were used. To evaluate the model performance, 24,762 discharge notes from July 1, 2017, to December 31, 2017, from the same hospital were used. Moreover, 74,324 additional discharge notes collected from seven other hospitals were tested. The F-measure, which is the major global measure of effectiveness, was adopted.

RESULTS In medical semantic understanding, the original EHR embeddings and PubMed embeddings exhibited superior performance to the original Wikipedia embeddings. After projection training technology was applied, the projection Wikipedia embeddings exhibited an obvious improvement but did not reach the level of original EHR embeddings or PubMed embeddings. In the subsequent ICD-10-CM coding experiment, the model that used both projection PubMed and Wikipedia embeddings had the highest testing mean F-measure (0.7362 and 0.6693 in Tri-Service General Hospital and the seven other hospitals, respectively). Moreover, the hybrid sampling method was found to improve the model performance (F-measure=0.7371/0.6698).

CONCLUSIONS The word embeddings trained using EHR and PubMed could understand medical semantics better, and the proposed projection word2vec model improved the ability of medical semantics extraction in Wikipedia embeddings. Although the improvement from the projection word2vec model in the real ICD-10-CM coding task was not substantial, the models could effectively handle emerging diseases. The proposed hybrid sampling method enables the model to behave like a human expert.
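
For readers unfamiliar with the F-measure reported in this abstract, the following is a minimal sketch of the standard definition computed from confusion counts. The function name `f_measure` and the default `beta=1.0` (the balanced F1 score) are assumptions; the abstract does not state which beta or which averaging scheme across ICD-10-CM codes the authors used.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-measure from true positives, false positives and false negatives.

    beta=1.0 gives the balanced F1 score (harmonic mean of precision and
    recall). Whether the study used this setting is an assumption.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was predicted correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

A "mean F-measure" such as the one quoted would then be an average of this quantity over codes or test sets, depending on the scheme chosen.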

Combining Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) and Hybrid Sampling in Handling Multi-Class Imbalance and Overlapping

JOIV International Journal on Informatics Visualization ◽ 10.30630/joiv.5.1.420 ◽ 2021 ◽ Vol 5 (1)

Author(s): Hartono Hartono ◽ Erianto Ongko

Keyword(s): Sampling Method ◽ Hybrid Approach ◽ Class Imbalance ◽ Classifier Ensembles ◽ Class Imbalance Problem ◽ Minority Class ◽ Imbalance Problem ◽ Classifier Performance ◽ R Value ◽ Hybrid Sampling

Class imbalance is one of the main problems in classification because the number of samples in the majority class far exceeds the number of samples in the minority class. The class imbalance problem in a multi-class dataset is much more difficult to handle than in a two-class dataset, and it becomes even more complicated when it is accompanied by overlapping. One method that has proven reliable in dealing with this problem is the Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) method, a hybrid approach that combines sampling and classifier ensembles. In terms of diversity among classifiers, hybrid approaches that combine sampling and classifier ensembles give better results. HAR-MI delivers excellent results in handling multi-class imbalance. The HAR-MI method uses SMOTE to increase the number of samples in the minority class. However, SMOTE has a weakness: over-fitting occurs when the dataset is extremely imbalanced and has a large number of attributes. To overcome this over-fitting, the Hybrid Sampling method was proposed. HAR-MI is combined with Hybrid Sampling to increase the number of samples in the minority class and at the same time reduce the number of noise samples in the majority class. The preprocessing stage of HAR-MI uses the Minimizing Overlapping Selection under Hybrid Sampling (MOSHS) method, and the processing stage uses Different Contribution Sampling. The results obtained are compared with the results of Neighbourhood-based undersampling. Overlapping and classifier performance are measured using Augmented R-Value, the Matthews Correlation Coefficient (MCC), Precision, Recall, and F-Value. The results show that HAR-MI with Hybrid Sampling gives better results in terms of Augmented R-Value, Precision, Recall, and F-Value.
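
The SMOTE step mentioned in this abstract can be illustrated with a minimal sketch of the usual interpolation idea: a synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is not HAR-MI, MOSHS, or the authors' code; the function name `smote_like`, the neighbourhood size `k`, the Euclidean distance, and the fixed random `seed` are assumptions for illustration.

```python
import math
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper: requires at least two minority samples; k and the
    Euclidean metric are illustrative assumptions.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base point.
        ranked = sorted((math.dist(base, p), j)
                        for j, p in enumerate(minority) if j != i)
        # Choose one of the k nearest minority neighbours at random.
        _, j = rng.choice(ranked[:k])
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (q - b)
                          for b, q in zip(base, minority[j])])
    return synthetic
```

Because every synthetic point is built only from existing minority samples, heavy oversampling of a very small minority class tends to repeat the same local structure, which is consistent with the over-fitting concern raised in the abstract.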

RANDOM SAMPLING METHOD FOR CRYPTOCURRENCY MARKET TIME SERIES FORECASTING

Systems and Means of Informatics ◽ 10.14357/08696527190406 ◽ 2019

Keyword(s): Time Series ◽ Random Sampling ◽ Sampling Method ◽ Time Series Forecasting

A LSTM-cBiGAN based hybrid sampling method for time series customer classification

An Effective Method for Imbalanced Time Series Classification: Hybrid Sampling

HYBS: A Novel Hybrid Sampling Method for Learning from Imbalanced Data Sets

A hybrid sampling method for imbalanced data

De Netwerkstructuur van Resilience: Positief Affect in Relatie tot Emoties

HSDP: A Hybrid Sampling Method for Imbalanced Big Data Based on Data Partition

An Efficient Hybrid Sampling Method for Neural Network-Based Microwave Component Modeling and Optimization

Hybrid sampling method for autoregressive classification trees under density-weighted curvature distance

Projection Word Embedding Model With Hybrid Sampling Training for Classifying ICD-10-CM Codes: Longitudinal Observational Study (Preprint)

Combining Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) and Hybrid Sampling in Handling Multi-Class Imbalance and Overlapping

RANDOM SAMPLING METHOD FOR CRYPTOCURRENCY MARKET TIME SERIES FORECASTING
