dataset analysis
Recently Published Documents





2021 ◽  
Hannes Westermann ◽  
Jaromír Šavelka ◽  
Vern R. Walker ◽  
Kevin D. Ashley ◽  
Karim Benyekhlef

Machine learning research typically starts with a fixed data set created early in the process. The focus of the experiments is finding a model and training procedure that result in the best possible performance in terms of some selected evaluation metric. This paper explores how changes in a data set influence the measured performance of a model. Using three publicly available data sets from the legal domain, we investigate how changes to their size, the train/test splits, and the human labelling accuracy impact the performance of a trained deep learning classifier. Our experiments suggest that analyzing how data set properties affect performance can be an important step in improving the results of trained classifiers, and leads to better understanding of the obtained results.
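
The three data set manipulations named in this abstract (size, train/test split, and labelling accuracy) can be sketched as small helper functions. This is only an illustrative sketch under stated assumptions, not the authors' procedure: the function names `flip_labels`, `random_split`, and `subsample`, the uniform random label flipping, and the seeds are all hypothetical choices made for the example.

```python
import random


def flip_labels(labels, classes, noise_rate, seed=0):
    """Return a copy of `labels` with a fraction `noise_rate` replaced by a
    different class, simulating reduced human labelling accuracy.
    (Hypothetical helper; uniform random flipping is an assumption.)"""
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    noisy = []
    for i, y in enumerate(labels):
        if i in flip_idx:
            # Pick any class other than the original one.
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy


def random_split(n_items, test_fraction, seed):
    """Return (train_indices, test_indices) for one shuffled split.
    Changing `seed` gives a different train/test split of the same data."""
    rng = random.Random(seed)
    idx = list(range(n_items))
    rng.shuffle(idx)
    n_test = round(test_fraction * n_items)
    return idx[n_test:], idx[:n_test]


def subsample(train_idx, fraction, seed):
    """Return a random subset of the training indices, to vary data set size."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(train_idx)))
    return rng.sample(train_idx, k)
```

Training the same classifier on each variant (several seeds, several values of `fraction` and `noise_rate`) and comparing the chosen evaluation metric would then show how sensitive the measured performance is to each property.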

2021 ◽  
Vol 2078 (1) ◽  
pp. 012027
Ze yuan Liu ◽  
Xin long Li

Abstract The remarkable advances in ensemble machine learning methods, such as random forest algorithms, have led to significant progress in the analysis of large data. However, these algorithms use only the existing features during learning, which places an upper limit on accuracy no matter how good the algorithm is. Moreover, classification accuracy is especially low when one type of observation makes up a much smaller proportion of the training dataset than the other types. The aim of the present study is to design a hierarchical classifier that tries to extract new features by using ensemble machine learning regressors and statistical methods within the overall machine learning process. In stage 1, all the categorical variables are characterized by a random forest algorithm to create a new variable through regression analysis, while the remaining numerical variables serve as the sample for a factor analysis (FA) process that calculates the factor values of each observation. Then, in stage 2, all the features are learned by a random forest classifier. Diverse datasets consisting of categorical and numerical variables are used to evaluate the method. The experimental results show that the classification accuracy increased by 8.61%. The method also significantly improves the classification accuracy of observations that make up a low proportion of the training dataset.
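
The imbalance problem mentioned in this abstract is easier to see when accuracy is reported per class rather than overall. The following is a minimal sketch, assuming plain Python lists of true and predicted labels; the function names and the 95/5 example data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of all observations that were classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly classified observations within each true class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / total[c] for c in total}


# Hypothetical example: 95 majority observations, 5 minority observations,
# and a classifier that always predicts the majority class.
y_true = ["majority"] * 95 + ["minority"] * 5
y_pred = ["majority"] * 100

print(overall_accuracy(y_true, y_pred))    # high overall accuracy
print(per_class_accuracy(y_true, y_pred))  # minority class is never correct
```

In this made-up case the overall accuracy looks high even though no minority observation is classified correctly, which is why a per-class view is useful when judging improvements on low-proportion classes.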

2021 ◽  
Tsukasa Fukunaga ◽  
Wataru Iwasaki

Motivation: Phylogenetic profiling is a powerful computational method for revealing the functions of genes of unknown function. Although conventional similarity evaluation measures in phylogenetic profiling have shown high prediction accuracy, they have two estimation biases: an evolutionary bias and a spurious correlation bias. Existing studies have focused on the evolutionary bias, but the spurious correlation bias has not been analyzed. Results: To eliminate the spurious correlation bias, we applied an evaluation measure based on the inverse Potts model (IPM) to phylogenetic profiling. We also proposed an evaluation measure that uses the IPM to remove both the evolutionary and spurious correlation biases. In an empirical dataset analysis, we demonstrated that these IPM-based evaluation measures improved the prediction performance of phylogenetic profiling. In addition, we found that integrating several evaluation measures, including the IPM-based evaluation measures, gave better performance than any single evaluation measure.

CHEST Journal ◽  
2021 ◽  
Vol 160 (4) ◽  
pp. A518
Aoife McDonagh ◽  
Laura Walsh ◽  
Terence O'Connor

2021 ◽  
Jiawei Liao ◽  
Julei Ma ◽  
Xingguo Zhang ◽  
Peng Shu

Abstract Background: Constitutively activated STAT3 (signal transducer and activator of transcription 3) has been observed in multiple myeloma (MM). However, the regulator of STAT3 in MM remains enigmatic. Methods: We applied public dataset analysis and identified USP25 (ubiquitin carboxyl-terminal hydrolase 25) as a potential regulator of STAT3. We then used western blot and IP to confirm the relationship between USP25 and STAT3, and a cell cycle assay to assess the effect of USP25 on the MM cell cycle. Results: USP25 is highly expressed in MM CD138+ cells and supports MM cell proliferation. At the protein level, USP25 takes part in the IL-6/USP25/STAT3 axis and can directly down-regulate STAT3 ubiquitination. Using truncated forms of USP25, we also showed that the UCH (ubiquitin carboxyl-terminal hydrolase) domain of USP25 is critical for USP25-STAT3 binding and that the UIM (ubiquitin interacting motif) domain is required for STAT3 ubiquitination. We further showed that the cell cycle changes caused by USP25 require STAT3 and cyclinD1, suggesting that USP25 inhibition is promising for MM patients with abnormal STAT3 and cyclinD1.
