model selection method
Recently Published Documents


Total documents: 61 (five years: 21)
H-index: 9 (five years: 2)

2021 ◽ Vol 2005 (1) ◽ pp. 012082
Author(s): Bing Yang ◽ Hong Yang ◽ Chengmei Zhang ◽ Yajie Wang ◽ Miao Hao

Author(s): Zhe Liu ◽ Lifen Jia

Regression analysis, which estimates the relationships among variables, has been widely used to model growth curves, and cross-validation, as a model selection method, assesses the generalization ability of regression models. Classical methods assume that the observed values of the variables are precise numbers, whereas in many cases data are collected imprecisely. This paper therefore explores the Chapman-Richards growth model, one of the widely used growth models, with imprecise observations under the framework of uncertainty theory. The least squares estimates of the unknown parameters in this model are given, and cross-validation with imprecise observations is proposed. Estimates of the expected value and variance of the uncertain error are then obtained from the residuals, and ways to predict the value of the response variable from newly observed values of the predictor variables are discussed. Finally, a numerical example illustrates the approach.
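As a rough illustration of cross-validation used for model selection on a Chapman-Richards curve, the following Python sketch may help. It is not the paper's method: it treats observations as precise numbers rather than uncertain variables, and the grid search, the synthetic data, the parameter values, and every function name are assumptions made for illustration. It compares the full curve with a restricted variant (exponent fixed at 1) by k-fold prediction error.

```python
import math
import random


def chapman_richards(x, a, b, c):
    """Chapman-Richards curve: y = a * (1 - exp(-b * x)) ** c."""
    return a * (1.0 - math.exp(-b * x)) ** c


def fit_least_squares(xs, ys, b_grid, c_grid):
    """Grid search over (b, c). For fixed (b, c) the model is linear in a,
    so the least squares value of a has a closed form."""
    best = None
    for b in b_grid:
        for c in c_grid:
            g = [(1.0 - math.exp(-b * x)) ** c for x in xs]
            denom = sum(gi * gi for gi in g)
            if denom == 0.0:
                continue
            a = sum(gi * yi for gi, yi in zip(g, ys)) / denom
            sse = sum((yi - a * gi) ** 2 for gi, yi in zip(g, ys))
            if best is None or sse < best[0]:
                best = (sse, a, b, c)
    return best[1], best[2], best[3]


def cv_mse(xs, ys, b_grid, c_grid, k=5):
    """Mean squared prediction error under k-fold cross-validation."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # interleaved folds
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        a, b, c = fit_least_squares(train_x, train_y, b_grid, c_grid)
        total += sum(
            (ys[i] - chapman_richards(xs[i], a, b, c)) ** 2 for i in held_out
        )
    return total / n


def demo(seed=0):
    """Synthetic data (hypothetical parameters a=30, b=0.2, c=3 plus noise)."""
    rng = random.Random(seed)
    xs = [float(x) for x in range(1, 21)]
    ys = [chapman_richards(x, 30.0, 0.2, 3.0) + rng.gauss(0.0, 0.3) for x in xs]
    b_grid = [0.05 * i for i in range(1, 11)]
    full = cv_mse(xs, ys, b_grid, [1.0, 2.0, 3.0, 4.0])
    restricted = cv_mse(xs, ys, b_grid, [1.0])  # exponent fixed at 1
    return full, restricted


if __name__ == "__main__":
    full, restricted = demo()
    print("CV error, full model:", full)
    print("CV error, restricted model:", restricted)
```

The candidate with the smaller cross-validated error would be preferred. Extending this to imprecise observations would require the uncertainty-theoretic definitions from the paper, which the sketch does not attempt.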


Author(s): Yukiteru Ono ◽ Kiyoshi Asai ◽ Michiaki Hamada

Abstract: Motivation: Recent high-throughput long-read sequencers, such as PacBio and Oxford Nanopore sequencers, produce longer reads with more errors than short-read sequencers. In addition to the high error rates of the reads, the non-uniformity of errors causes difficulties in various downstream analyses that use long reads. Many useful simulators that characterize and simulate long-read error patterns have been developed. However, there is still room for improvement in simulating the non-uniformity of errors. Results: To capture the characteristics of errors in reads from long-read sequencers, we introduce a generative model for quality scores that uses a hidden Markov model together with a recent model selection method called factorized information criteria. We evaluated the simulator from several perspectives, and the results indicate that it successfully simulates reads that are consistent with real reads. Availability and implementation: The source code of PBSIM2 is freely available from https://github.com/yukiteruono/pbsim2. Supplementary information: Supplementary data are available at Bioinformatics online.
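To make the idea of a generative hidden Markov model for quality scores concrete, here is a minimal Python sketch. It is not PBSIM2's implementation: the two hidden states, all probabilities, the score values, and the function names are hypothetical, and the model selection step (choosing the number of hidden states with factorized information criteria) is not shown.

```python
import random


def draw(dist, rng):
    """Draw one outcome from a dict mapping outcome -> probability."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]


def sample_quality_scores(length, start, trans, emit, rng):
    """Sample a quality-score sequence from a discrete HMM."""
    state = draw(start, rng)
    scores = []
    for _ in range(length):
        scores.append(draw(emit[state], rng))  # emit a score from the current state
        state = draw(trans[state], rng)        # then move to the next hidden state
    return scores


# Hypothetical two-state model: a "good" stretch and a "bad" stretch of a read.
START = {"good": 0.8, "bad": 0.2}
TRANS = {
    "good": {"good": 0.95, "bad": 0.05},
    "bad": {"good": 0.10, "bad": 0.90},
}
EMIT = {
    "good": {12: 0.2, 15: 0.5, 18: 0.3},
    "bad": {3: 0.4, 6: 0.4, 9: 0.2},
}

if __name__ == "__main__":
    print(sample_quality_scores(40, START, TRANS, EMIT, random.Random(1)))
```

Because the hidden state tends to persist, low scores come in runs rather than being spread evenly, which is one simple way a model of this kind can express non-uniform errors along a read.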


Author(s): Saeideh Khatiry Goharoodi ◽ Kevin Dekemele ◽ Mia Loccufier ◽ Luc Dupre ◽ Guillaume Crevecoeur

2020 ◽ Vol 9 (2-3) ◽ pp. 53-83
Author(s): Alsayed Algergawy ◽ Samira Babalou ◽ Friederike Klan ◽ Birgitta König-Ries

Abstract: Ontologies are the backbone of the Semantic Web. As a result, the number of existing ontologies and the number of topics they cover have increased considerably. Consequently, reusing these ontologies becomes preferable to constructing new ontologies from scratch. However, a user might be interested in only a part, or a set of parts, of a given ontology. Therefore, ontology modularization, i.e., splitting up an ontology into smaller parts that can be used independently, becomes a necessity. In this paper, we introduce a new approach to partitioning an ontology based on a seeding-based scheme, which is developed and implemented in the Ontology Analysis and Partitioning Tool (OAPT). The tool proceeds as follows. First, before a candidate ontology is partitioned, OAPT optionally analyzes the input ontology to determine whether it is worth considering, using a predefined set of criteria that quantify the semantic and structural richness of the ontology. After that, the seeding-based partitioning algorithm is applied to modularize the ontology into a set of modules. To help decide on a suitable number of modules to generate, we give the user a recommendation based on an information-theoretic model selection method. We demonstrate the effectiveness of OAPT and validate the performance of the partitioning approach through an extensive set of experiments. The results demonstrate the quality and efficiency of the proposed tool.


Author(s): Debbrata K. Saha ◽ Eswar Damaraju ◽ Barnaly Rashid ◽ Anees Abrol ◽ Sergey M. Plis ◽ ...

Abstract: Recent work has focused on the study of dynamic (vs. static) brain connectivity in resting fMRI data. In this work, we focus on the temporal correlation between time courses extracted from coherent networks or components, called functional network connectivity (FNC). Dynamic functional network connectivity (dFNC) is most commonly estimated using a sliding window-based approach to capture short periods of FNC change. These data are then clustered to estimate transient connectivity patterns, or states. Determining the number of states is a challenging problem, and the elbow criterion is a widely used approach to determine the optimal number. We present an alternative approach that uses classification performance (e.g., healthy controls versus patients) as the measure for selecting the optimal number of states (clusters). We apply different classification strategies to distinguish healthy controls (HC) from patients with schizophrenia (SZ) for different numbers of states (i.e., varying the model order in the clustering algorithm), and we compute cross-validated accuracy for each model order to evaluate classification performance. Our results are consistent with our earlier work, which shows that overall accuracy improves when dynamic connectivity measures are used separately or in combination with static connectivity measures. The results also show that the optimal model order for classification differs from the one obtained with the standard k-means model selection method, and that such optimization improves the resulting cross-validated accuracy. The optimal model order obtained from the proposed approach also gives significantly improved classification performance over the traditional model selection method. In sum, the observed results suggest that if one's goal is to perform classification, using the proposed approach as a criterion for selecting the optimal number of states in dynamic connectivity analysis leads to improved accuracy in hold-out data.
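The idea of choosing the number of states by cross-validated classification accuracy, rather than by an elbow criterion, can be sketched as follows. This is a toy Python illustration under strong assumptions, not the authors' pipeline: each window is reduced to a single scalar, the data are synthetic, the features are state occupancy fractions, the classifier is a nearest-class-mean rule, and all function names and parameters are hypothetical.

```python
import random


def nearest(v, cents):
    """Index of the centroid closest to scalar v."""
    return min(range(len(cents)), key=lambda j: abs(v - cents[j]))


def kmeans_1d(values, k, iters=20):
    """Lloyd's k-means on scalars, initialised at evenly spaced quantiles."""
    srt = sorted(values)
    cents = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v, cents)].append(v)
        cents = [sum(g) / len(g) if g else cents[j] for j, g in enumerate(groups)]
    return cents


def occupancy(windows, cents):
    """Fraction of a subject's windows assigned to each state."""
    counts = [0] * len(cents)
    for v in windows:
        counts[nearest(v, cents)] += 1
    return [c / len(windows) for c in counts]


def loo_accuracy(subjects, labels, k):
    """Leave-one-subject-out accuracy; states are re-estimated on each training set."""
    correct = 0
    for i in range(len(subjects)):
        train = [j for j in range(len(subjects)) if j != i]
        cents = kmeans_1d([v for j in train for v in subjects[j]], k)
        means = {}
        for lab in sorted(set(labels[j] for j in train)):
            rows = [occupancy(subjects[j], cents) for j in train if labels[j] == lab]
            means[lab] = [sum(col) / len(rows) for col in zip(*rows)]
        x = occupancy(subjects[i], cents)
        pred = min(means, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, means[lab])))
        correct += int(pred == labels[i])
    return correct / len(subjects)


def select_order(subjects, labels, candidates):
    """Return the candidate k with the highest cross-validated accuracy."""
    acc = {k: loo_accuracy(subjects, labels, k) for k in candidates}
    return max(candidates, key=lambda k: acc[k]), acc


def make_subjects(rng, n_per_group=8, n_windows=30):
    """Synthetic groups that differ in how often a 'high' state occurs."""
    subjects, labels = [], []
    for lab, p_high in (("HC", 0.3), ("SZ", 0.6)):
        for _ in range(n_per_group):
            subjects.append([rng.gauss(1.0 if rng.random() < p_high else 0.0, 0.2)
                             for _ in range(n_windows)])
            labels.append(lab)
    return subjects, labels


if __name__ == "__main__":
    subjects, labels = make_subjects(random.Random(0))
    print(select_order(subjects, labels, [2, 3, 4]))
```

Clustering is refit inside each fold so that the held-out subject does not influence the states, which keeps the accuracy estimate honest for comparing model orders.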

