Overcoming the ordinal imbalanced data problem by combining data processing and stacked generalizations

2021 ◽  
pp. 100241
Author(s):  
Marine Desprez ◽  
Kyle Zawada ◽  
Daniel Ramp
Author(s):  
Marcel Tilly ◽  
Stephan Reiff-Marganiec

The deluge of intelligent objects providing continuous access to data and services, together with the demand from developers and consumers to handle these data, requires us to think about new communication paradigms and middleware. In hyper-scale systems such as the Internet of Things, large-scale sensor networks, or even mobile networks, one emerging requirement is to process, procure, and provide information with almost zero latency. This work introduces new middleware concepts that enable fast communication by limiting information flow through filtering based on policy obligations, combined with data processing techniques adopted from complex event processing.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Tianjun Li ◽  
Long Chen ◽  
Min Gan

Abstract. Background: Mass spectra are usually acquired by liquid chromatography-mass spectrometry (LC-MS) analysis in isotope-labeled proteomics experiments. In such experiments, the mass profiles of labeled (heavy) and unlabeled (light) peptide pairs are represented by isotope clusters (2D or 3D) that provide valuable information about the studied biological samples under different conditions. The core task of quality control in a quantitative LC-MS experiment is to filter out low-quality peptides with questionable profiles. Classification approaches are commonly used for this problem. However, previous quality control methods often ignore or mishandle the data imbalance problem. In this study, we introduced a quality control framework based on the extreme gradient boosting machine (XGBoost) and carefully addressed the imbalanced data problem within it. Results: In the XGBoost-based framework, we suggest applying the synthetic minority over-sampling technique (SMOTE) to re-balance the data and using the balanced data to train the boosted trees as the classifier. The classifier is then applied to other data for peptide quality assessment. Experimental results show that the proposed framework significantly increases the reliability of peptide heavy-light ratio estimation. Conclusions: Our results indicate that this framework is a powerful method for peptide quality assessment. For feature extraction, the extracted ion chromatogram (XIC) based features contribute to the peptide quality assessment. For the imbalanced data problem, SMOTE yields much better classification performance. Finally, XGBoost is well suited to peptide quality control. Overall, the proposed framework provides reliable results for further proteomics studies.
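
The re-balancing step described above can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a minority sample and one of its k nearest minority neighbours. This is not the authors' code. The function names `smote` and `rebalance`, the tuple-based feature vectors, and the default `k=5` are assumptions made for illustration, and the XGBoost training step is left out. In practice a tested library implementation would normally be used, and only the training split would be oversampled.

```python
import random
from math import dist


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority neighbours (the core idea of SMOTE)."""
    if n_synthetic == 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base point.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def rebalance(majority, minority, k=5, seed=0):
    """Oversample the minority class until it matches the majority class size."""
    deficit = max(0, len(majority) - len(minority))
    return list(minority) + smote(minority, deficit, k=k, seed=seed)
```

The balanced minority set returned by `rebalance`, together with the untouched majority set, would then be passed to whatever classifier is being trained, boosted trees in the case of the framework described here.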


2019 ◽  
Vol 9 (20) ◽  
pp. 4216 ◽  
Author(s):  
Zhen Chen ◽  
Xiaoyan Han ◽  
Chengwei Fan ◽  
Zirun He ◽  
Xueneng Su ◽  
...  

In recent years, machine learning methods have shown great potential for real-time transient stability status prediction (TSSP). However, most existing studies overlook the imbalanced data problem in TSSP. To address this issue, this paper proposes a novel data segmentation-based ensemble classification (DSEC) method for TSSP. First, the effects of the imbalanced data problem on the decision boundary and classification performance of TSSP are investigated in detail. Then, a three-step DSEC method is presented. In the first step, a data segmentation strategy divides the stable samples into multiple non-overlapping stable subsets, ensuring that no stable subset contains more samples than the unstable set; each stable subset is then combined with the unstable set to form a training subset. In the second step, an AdaBoost classifier is built on each training subset. In the final step, the decision values from the AdaBoost classifiers are aggregated to determine the transient stability status. Experiments conducted on the Northeast Power Coordinating Council 140-bus system show that the proposed approach can significantly improve the classification performance of TSSP with imbalanced data.
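
The three steps above can be sketched as follows. This is a hedged illustration rather than the paper's implementation: the function names are hypothetical, a trivial centroid-distance scorer stands in for the AdaBoost classifier so that the example needs only the standard library, and summing the decision values is an assumed aggregation rule, since the abstract does not say how the values are combined.

```python
import random


def segment_stable(stable, n_unstable, seed=0):
    """Step 1: split stable samples into non-overlapping subsets,
    each no larger than the unstable set."""
    if n_unstable < 1:
        raise ValueError("need at least one unstable sample")
    shuffled = list(stable)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + n_unstable] for i in range(0, len(shuffled), n_unstable)]


def train_centroid_scorer(stable_subset, unstable):
    """Placeholder base learner (NOT AdaBoost): returns a decision function
    whose value is positive when x is closer to the unstable centroid."""
    def centroid(points):
        return [sum(coords) / len(points) for coords in zip(*points)]

    c_stable, c_unstable = centroid(stable_subset), centroid(unstable)

    def score(x):
        d_stable = sum((a - b) ** 2 for a, b in zip(x, c_stable))
        d_unstable = sum((a - b) ** 2 for a, b in zip(x, c_unstable))
        return d_stable - d_unstable

    return score


def fit_dsec(stable, unstable, train=train_centroid_scorer, seed=0):
    """Step 2: train one base learner per (stable subset + unstable set)."""
    subsets = segment_stable(stable, len(unstable), seed)
    return [train(subset, unstable) for subset in subsets]


def predict_dsec(scorers, x):
    """Step 3: aggregate decision values (summation is an assumption)."""
    total = sum(score(x) for score in scorers)
    return "unstable" if total > 0 else "stable"
```

Passing a different `train` callable to `fit_dsec` swaps in another base learner with the same interface, which is where a boosted classifier would go in a setup closer to the one described.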


2017 ◽  
Vol 102 (2) ◽  
pp. 937-950 ◽  
Author(s):  
Lijuan Zhou ◽  
Ran Li ◽  
Shudong Zhang ◽  
Hua Wang

2012 ◽  
Vol 6-7 ◽  
pp. 1036-1040
Author(s):  
Bao An Li

The big data problem has attracted widespread attention from both industry and academia in recent years. As the volume of data produced by various industries and sectors grows rapidly, demands on data processing and analysis capabilities are increasing, and the question of how to meet these data challenges and discover new opportunities has received wide attention. As a traditional industry, oil drilling and refining enterprises face large amounts of data generated by the operational status of their systems. This paper introduces an approach to massive data processing for oil enterprises based on cloud computing and the Internet of Things.

