Subsampling and Aggregation: A Solution to the Scalability Problem in Distance-Based Prediction for Mixed-Type Data

Mathematics, 2021, Vol 9 (18), pp. 2247
Author(s): Amparo Baíllo, Aurea Grané

The distance-based linear model (DB-LM) extends classical linear regression to mixed-type predictors, and to settings where the only available information is a distance matrix between regressors (as sometimes happens with big data). The main drawback of DB methods is their computational cost, which is driven largely by the eigendecomposition of the Gram matrix. In this context, ensemble regression techniques offer a useful alternative to fitting the model to the whole sample. This work analyzes the performance of three subsampling and aggregation techniques in DB regression on two large real datasets. We also analyze, via simulations, the performance of bagging and DB logistic regression in the classification problem with mixed-type features and large sample sizes.
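The idea of replacing one large eigendecomposition with several small ones can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the subsample size, the number of subsamples, and the use of squared Euclidean distance as a stand-in for a mixed-type dissimilarity (such as Gower's) are all assumptions. Each subsample is fitted by regressing the response on principal coordinates of its Gram matrix, new points are mapped with Gower's interpolation formula, and the predictions are averaged.

```python
import numpy as np

def sq_euclidean(A, B):
    """Squared Euclidean distances between rows of A and rows of B.
    Stand-in only: any squared dissimilarity could be plugged in instead."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff ** 2, axis=2)

def fit_dblm(D2, y, tol=1e-8):
    """Fit a distance-based linear model from an n x n squared-distance matrix."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    G = -0.5 * J @ D2 @ J                        # doubly centered Gram matrix
    vals, vecs = np.linalg.eigh(G)               # the costly step for large n
    keep = vals > tol * vals.max()               # drop null / negative directions
    lam = vals[keep]
    X = vecs[:, keep] * np.sqrt(lam)             # principal coordinates (centered)
    beta, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)
    return {"X": X, "lam": lam, "g": np.diag(G), "beta": beta,
            "intercept": y.mean()}

def predict_dblm(model, d2_new):
    """Predict from squared distances (n_new x n_train) to the training points,
    using Gower's interpolation formula for the new principal coordinates."""
    X_new = 0.5 * (model["g"] - d2_new) @ model["X"] / model["lam"]
    return model["intercept"] + X_new @ model["beta"]

def subsample_and_aggregate(Z, y, Z_new, dist2, n_sub=10, size=200, seed=0):
    """Average DB-LM predictions over random subsamples drawn without replacement."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_sub):
        idx = rng.choice(len(y), size=size, replace=False)
        model = fit_dblm(dist2(Z[idx], Z[idx]), y[idx])
        preds.append(predict_dblm(model, dist2(Z_new, Z[idx])))
    return np.mean(preds, axis=0)
```

With n observations split into subsamples of size m, each eigendecomposition costs on the order of m^3 rather than n^3, which is where the saving comes from; how the subsamples are drawn and combined is the design choice the three techniques in the paper would differ on.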


2021, Vol 2021 (1)
Author(s): Yang Yu, Hongqing Zhu

Abstract Due to the complex morphology and characteristics of retinal vessels, it remains challenging for most existing algorithms to detect them accurately. This paper proposes a supervised retinal vessel extraction scheme using constrained-based nonnegative matrix factorization (NMF) and a three-dimensional (3D) modified attention U-Net architecture. The proposed method detects the retinal vessels in three major steps. First, we apply Gaussian filtering and gamma correction to the green channel of retinal images to suppress background noise and adjust image contrast. Then, we develop a new within-class and between-class constrained NMF algorithm to extract neighborhood feature information for every pixel and reduce the feature dimension. With these constraints, the method gathers similar features within a class and separates features between classes, which improves the feature description of each pixel. Next, we formulate the segmentation task as a classification problem and solve it with a 3D modified attention U-Net used as a two-label classifier to reduce computational cost. The proposed network upsamples the input to raise image resolution before encoding and, after three max-pooling layers, downsamples to restore the image to its original size. In addition, the attention gates (AGs) set in these layers contribute to more accurate segmentation by preserving details while suppressing noise. Finally, experimental results on three publicly available datasets, DRIVE, STARE, and HRF, demonstrate better performance than most existing methods.
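The first of the three steps (green channel, Gaussian filtering, gamma correction) can be sketched with NumPy alone. This is an illustrative sketch under assumptions: the abstract gives no parameter values, so `sigma=1.0` and `gamma=0.8` are placeholders, the input is assumed to be an 8-bit RGB array, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at about 3 sigma and normalized to sum to 1."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2-D array with reflected borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    # Filter along rows, then along columns; "valid" removes the padding again.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def preprocess(rgb, sigma=1.0, gamma=0.8):
    """Green channel -> Gaussian smoothing -> gamma correction, scaled to [0, 1].
    sigma and gamma are assumed values, not taken from the paper."""
    green = rgb[..., 1].astype(np.float64) / 255.0
    smooth = gaussian_blur(green, sigma)
    return np.clip(smooth, 0.0, 1.0) ** gamma
```

A gamma below 1 brightens dark regions and a gamma above 1 darkens them, so the value would in practice be tuned to the contrast of the dataset at hand.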





2012, Vol 12 (9), pp. 2856-2866
Author(s): Wei-Shen Tai, Chung-Chian Hsu


Author(s): Yu Zhang, Cangzhi Jia, Melissa Jane Fullwood, Chee Keong Kwoh

Abstract The development of deep sequencing technologies has led to the discovery of novel transcripts. Many in silico methods have been developed to assess the coding potential of these transcripts so that their functions can be investigated further. Existing methods perform well at distinguishing the majority of long noncoding RNAs (lncRNAs) from coding RNAs (mRNAs) but poorly on RNAs with small open reading frames (sORFs). Here, we present DeepCPP (deep neural network for coding potential prediction), a deep learning method for RNA coding potential prediction. Extensive evaluations on four previous datasets and six new datasets constructed from different species show that DeepCPP outperforms other state-of-the-art methods, especially on sORF-type data. It overcomes the bottleneck of sORF mRNA identification by improving accuracy by more than 4.31, 37.24 and 5.89% on newly discovered human, vertebrate and insect data, respectively. We also show that discontinuous k-mers, together with our newly proposed nucleotide bias and minimal distribution similarity feature selection method, play crucial roles in this classification problem. Taken together, DeepCPP is an effective method for RNA coding potential prediction.
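The abstract does not define "discontinuous k-mer". One common reading is a gapped k-mer, in which only some positions of a sliding window are kept. The sketch below follows that reading as an assumption; the mask notation, the function name, and the restriction to an uppercase ACGT alphabet are illustrative choices, not details from the paper.

```python
from collections import Counter
from itertools import product

def gapped_kmer_freqs(seq, mask="101"):
    """Relative frequencies of gapped k-mers in seq.

    mask marks the window positions that are kept ("1") or skipped ("0"),
    so "101" pairs each nucleotide with the one two positions downstream.
    Windows containing characters outside ACGT are ignored.
    """
    keep = [i for i, m in enumerate(mask) if m == "1"]
    span = len(mask)
    counts = Counter()
    for start in range(len(seq) - span + 1):
        window = seq[start:start + span]
        key = "".join(window[i] for i in keep)
        if set(key) <= set("ACGT"):
            counts[key] += 1
    total = sum(counts.values())
    # Fixed key order gives a feature vector of constant length 4 ** len(keep).
    keys = ["".join(p) for p in product("ACGT", repeat=len(keep))]
    return {k: (counts[k] / total if total else 0.0) for k in keys}
```

For example, `gapped_kmer_freqs("ACGTAC", mask="101")` sees the four windows ACG, CGT, GTA and TAC, so the keys AG, CT, GA and TC each receive frequency 0.25 and the remaining twelve keys are 0.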



Author(s): Aichetou Bouchareb, Marc Boullé, Fabrice Clérot, Fabrice Rossi


Author(s): G. Caruso, S. A. Gattone, A. Balzanella, T. Di Battista


Author(s): Sahar Behzadi, Nikola S. Müller, Claudia Plant, Christian Böhm

