A LSTM-cBiGAN based hybrid sampling method for time series customer classification

2021 ◽  
Author(s):  
Xin Pan ◽  
Bing Zhu ◽  
Yanbing Liu ◽  
Feijie Huang
2018 ◽  
Author(s):  
Jasper Naberman

This study examined the network structure of resilience. Network analysis was used to estimate several properties of resilience in network models. Particular attention was paid to positive affect, which, according to the broaden-and-build theory of positive emotions (Fredrickson, 1998), should play a central role in resilience. A total of 31 participants took part; time series data were collected three times a day for two weeks following the experience sampling method. Temporal and contemporaneous individual network models were estimated, which showed that positive affect plays a less central role than expected on the basis of the broaden-and-build theory of positive emotions. The presence of partial correlations gives an indication of causality between the nodes of the networks.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Liping Chen ◽  
Jiabao Jiang ◽  
Yong Zhang

Classical classifiers are ineffective on imbalanced big-dataset classification problems. Resampling the dataset to balance the sample distribution before training the classifier is one of the most popular approaches to this problem. This paper proposes a simple and effective hybrid sampling method based on data partition (HSDP). First, all data samples are partitioned into different data regions. Then, the samples in the noise minority region are removed, and the samples in the boundary minority region are selected as oversampling seeds for generating synthetic samples. Finally, a weighted oversampling process is conducted in which synthetic samples are generated within the same cluster as the oversampling seed. The weight of each selected minority class sample is the ratio between the proportion of majority class samples among its neighbors and the sum of these proportions over all selected samples. Generating synthetic samples in the same cluster as the oversampling seed guarantees that the new samples lie inside the minority class area. Experiments on eight datasets show that HSDP is better than or comparable with typical sampling methods in terms of F-measure and G-mean.
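
The weighting rule in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `seed_weights`, the use of Euclidean k-nearest neighbours, the value of `k`, and the uniform fallback when no seed has a majority neighbour are all assumptions made for illustration. Only the ratio itself (majority proportion among a seed's neighbours, divided by the sum of those proportions) comes from the abstract.

```python
import numpy as np

def seed_weights(X, y, seed_idx, k=5, minority=1):
    """Hypothetical helper: weight each oversampling seed by the share of
    majority-class points among its k nearest neighbours, normalised so
    that the weights sum to 1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    props = []
    for i in seed_idx:
        dist = np.linalg.norm(X - X[i], axis=1)  # Euclidean distance (assumed)
        dist[i] = np.inf                         # exclude the seed itself
        neighbours = np.argsort(dist)[:k]
        props.append(np.mean(y[neighbours] != minority))
    props = np.array(props)
    total = props.sum()
    if total == 0:
        # Assumed fallback: no seed has a majority neighbour, so weight uniformly.
        return np.full(len(props), 1.0 / len(props))
    return props / total

# Toy data: label 1 is the minority class, seeds are samples 0 and 3.
X_demo = [[0.0], [1.0], [2.0], [3.0], [3.5], [11.0]]
y_demo = [1, 0, 0, 1, 1, 1]
weights = seed_weights(X_demo, y_demo, seed_idx=[0, 3], k=2)
print(weights)
```

Under these assumptions, a seed surrounded by more majority samples receives a larger weight and would therefore be chosen more often when generating synthetic samples.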


2019 ◽  
Author(s):  
Chin Lin ◽  
Yu-Sheng Lou ◽  
Dung-Jang Tsai ◽  
Chia-Cheng Lee ◽  
Chia-Jung Hsu ◽  
...  

BACKGROUND Most current state-of-the-art models for searching the International Classification of Diseases, Tenth Revision Clinical Modification (ICD-10-CM) codes use word embedding technology to capture useful semantic properties. However, they are limited by the quality of initial word embeddings. Word embedding trained by electronic health records (EHRs) is considered the best, but the vocabulary diversity is limited by previous medical records. Thus, we require a word embedding model that maintains the vocabulary diversity of open internet databases and the medical terminology understanding of EHRs. Moreover, we need to consider the particularity of the disease classification, wherein discharge notes present only positive disease descriptions.
OBJECTIVE We aimed to propose a projection word2vec model and a hybrid sampling method. In addition, we aimed to conduct a series of experiments to validate the effectiveness of these methods.
METHODS We compared the projection word2vec model and traditional word2vec model using two corpora sources: English Wikipedia and PubMed journal abstracts. We used seven published datasets to measure the medical semantic understanding of the word2vec models and used these embeddings to identify the three-character-level ICD-10-CM diagnostic codes in a set of discharge notes. On the basis of embedding technology improvement, we also tried to apply the hybrid sampling method to improve accuracy. The 94,483 labeled discharge notes from the Tri-Service General Hospital of Taipei, Taiwan, from June 1, 2015, to June 30, 2017, were used. To evaluate the model performance, 24,762 discharge notes from July 1, 2017, to December 31, 2017, from the same hospital were used. Moreover, 74,324 additional discharge notes collected from seven other hospitals were tested. The F-measure, which is the major global measure of effectiveness, was adopted.
RESULTS In medical semantic understanding, the original EHR embeddings and PubMed embeddings exhibited superior performance to the original Wikipedia embeddings. After projection training technology was applied, the projection Wikipedia embeddings exhibited an obvious improvement but did not reach the level of original EHR embeddings or PubMed embeddings. In the subsequent ICD-10-CM coding experiment, the model that used both projection PubMed and Wikipedia embeddings had the highest testing mean F-measure (0.7362 and 0.6693 in Tri-Service General Hospital and the seven other hospitals, respectively). Moreover, the hybrid sampling method was found to improve the model performance (F-measure=0.7371/0.6698).
CONCLUSIONS The word embeddings trained using EHR and PubMed could understand medical semantics better, and the proposed projection word2vec model improved the ability of medical semantics extraction in Wikipedia embeddings. Although the improvement from the projection word2vec model in the real ICD-10-CM coding task was not substantial, the models could effectively handle emerging diseases. The proposed hybrid sampling method enables the model to behave like a human expert.


Author(s):  
Hartono Hartono ◽  
Erianto Ongko

Class imbalance is one of the main problems in classification because the number of samples in the majority class is far greater than the number of samples in the minority class. The class imbalance problem in a multi-class dataset is much more difficult to handle than in a two-class dataset. This multi-class imbalance problem is even more complicated if it is accompanied by overlapping. One method that has proven reliable in dealing with this problem is the Hybrid Approach Redefinition-Multiclass Imbalance (HAR-MI) method, a hybrid approach that combines sampling and classifier ensembles. In terms of diversity among classifiers, hybrid approaches that combine sampling and classifier ensembles give better results, and HAR-MI delivers excellent results in handling multi-class imbalance. The HAR-MI method uses SMOTE to increase the number of samples in the minority class. However, SMOTE has a weakness: on an extremely imbalanced dataset with a large number of attributes, over-fitting occurs. To overcome the over-fitting problem, the Hybrid Sampling method was proposed. HAR-MI is combined with Hybrid Sampling to increase the number of samples in the minority class and at the same time reduce the number of noise samples in the majority class. The preprocessing stage of HAR-MI uses the Minimizing Overlapping Selection under Hybrid Sampling (MOSHS) method, and the processing stage uses Different Contribution Sampling. The results are compared with those obtained using neighbourhood-based undersampling. Overlapping and classifier performance are measured using the Augmented R-Value, the Matthews Correlation Coefficient (MCC), Precision, Recall, and F-Value. The results showed that HAR-MI with Hybrid Sampling gave better results in terms of Augmented R-Value, Precision, Recall, and F-Value.
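
As a rough illustration of four of the evaluation measures named in this abstract, the sketch below computes Precision, Recall, F-Value, and MCC from a binary confusion matrix. It is an assumption-laden simplification: the paper addresses multi-class data, the function name `binary_metrics` is hypothetical, returning 0.0 for undefined ratios is a convention chosen here, and the Augmented R-Value is not covered.

```python
import math

def binary_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: precision, recall, F-value and MCC for one
    positive class, using the standard confusion-matrix definitions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    # Undefined ratios are reported as 0.0 (a convention assumed here).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_value = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "f_value": f_value, "mcc": mcc}

# Toy imbalanced example: 3 minority (label 1) and 5 majority (label 0) samples.
scores = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(scores)
```

For a multi-class problem, one common option is to apply such a function per class in a one-vs-rest fashion and average the results; whether the paper does this is not stated in the abstract.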

