A Novel Differential Essential Genes Prediction Method Based on Random Forests Model

Author(s):  
Jiang Xie ◽  
Jiamin Sun ◽  
Jiaxin Li ◽  
Fuzhang Yang ◽  
Haozhe Li ◽  
...  
Yuxin Guo ◽  
Ying Ju ◽  
Dong Chen ◽  
Lihong Wang

Genes, the nucleotide sequences that encode a polypeptide chain or functional RNA, are the basic genetic units controlling biological traits. They underpin the basic structures and functions of organisms, and they store information related to biological factors and processes such as blood type, gestation, growth, and apoptosis. The environment and genetics jointly affect important physiological processes such as reproduction, cell division, and protein synthesis. Genes are related to a wide range of phenomena including growth, decline, illness, aging, and death. Over the course of evolution, one class of genes has been conserved across multiple species. These genes are often located on the dominant strand of DNA and tend to have higher expression levels. The proteins they encode usually either perform very important functions or are responsible for maintaining and repairing these essential functions. Such genes are called persistent genes. Among them, the genes that are irreplaceable for an organism's life activities are the essential genes. For example, when starch is the only source of energy, the genes related to starch digestion are essential genes. Without them, the organism will die because it cannot obtain enough energy to maintain basic functions. The function of the proteins encoded by these genes is thought to be fundamental to life. Nowadays, DNA can be extracted from blood, saliva, or tissue cells for genetic testing, and detailed genetic information can be obtained using the most advanced scientific instruments and technologies. The information gained from genetic testing is useful for assessing the potential risks of disease and for helping to determine the prognosis and development of diseases. Such information is also useful for developing personalized medication and providing targeted health guidance to improve quality of life. Therefore, identifying important and essential genes is of great theoretical and practical significance.
In this paper, the research status of essential genes and the essential genome database of bacteria are reviewed, the computational prediction method of essential genes based on communication coding theory is expounded, and the significance and practical application value of essential genes are discussed.


Author(s):  
Haewon BYEON

Background: We aimed to develop a model predicting the participation of the elderly in a cognitive health program using the random forest algorithm and to present baseline information for enhancing cognitive health. Methods: This study analyzed the raw data of the Seoul Welfare Panel Study (SWPS) (20), a survey of Seoul residents conducted by the Seoul Welfare Foundation from June 1 to August 31, 2015. Subjects were 2,111 community-dwelling persons (879 men and 1,232 women) aged 60 years and older who had not been diagnosed with dementia. The outcome variable was the intention to participate in a cognitive health promotion program. A prediction model was developed using random forests, and its results were compared with those of a decision tree analysis based on classification and regression tree (CART). Results: The random forests model identified education level, subjective health, subjective friendship, subjective family bond, mean monthly family income, age, smoking, living with a spouse or not, depression history, drinking, and regular exercise as the major predictors. On the test data, the accuracy of the random forests model was 72.3% and that of the CART model was 70.9%. Conclusion: To implement a cognitive health promotion program effectively, a customized program that considers the characteristics of the subjects should be developed on the basis of the model predicting participation.


2015 ◽  
Vol 42 (24) ◽  
pp. 9412-9425 ◽  
Author(s):  
Shisheng Zhong ◽  
Xiaolong Xie ◽  
Lin Lin

Author(s):  
Mark J. van der Laan ◽  
Eric C Polley ◽  
Alan E. Hubbard

When trying to learn a model for the prediction of an outcome given a set of covariates, a statistician has many estimation procedures in their toolbox. A few examples of these candidate learners are least squares, least angle regression, random forests, and spline regression. Previous articles (van der Laan and Dudoit (2003); van der Laan et al. (2006); Sinisi et al. (2007)) theoretically validated the use of cross-validation to select an optimal learner among many candidate learners. Motivated by this use of cross-validation, we propose a new prediction method that creates a weighted combination of many candidate learners to build the super learner. This article proposes a fast algorithm for constructing a super learner in prediction, which uses V-fold cross-validation to select the weights that combine an initial set of candidate learners. In addition, this paper contains a practical demonstration of the adaptivity of this so-called super learner to various true data-generating distributions. This approach to constructing a super learner generalizes to any parameter that can be defined as a minimizer of a loss function.
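The idea in the abstract above, choosing combination weights by V-fold cross-validation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' algorithm: there are only two toy candidate learners (a constant mean and a one-covariate least-squares line), the loss is squared error, the weights are restricted to a convex combination searched over a coarse grid, and the data are synthetic. The names `fit_mean`, `fit_line`, `cv_predictions`, and `super_learner` are hypothetical.

```python
import random

def fit_mean(xs, ys):
    """Candidate learner 1: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate learner 2: least-squares line on a single covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)

def cv_predictions(xs, ys, learners, v=5):
    """Out-of-fold predictions: each point is predicted by models
    trained on the other v-1 folds."""
    n = len(xs)
    preds = [[0.0] * n for _ in learners]
    for fold in range(v):
        train = [i for i in range(n) if i % v != fold]
        held_out = [i for i in range(n) if i % v == fold]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        for k, fit in enumerate(learners):
            model = fit(tx, ty)
            for i in held_out:
                preds[k][i] = model(xs[i])
    return preds

def mse(pred, ys):
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)

def super_learner(xs, ys, learners, v=5, grid=21):
    """Pick convex weights for exactly two learners by minimizing the
    cross-validated squared error, then refit both on all the data."""
    p0, p1 = cv_predictions(xs, ys, learners, v)
    best_a, best_risk = 0.0, float("inf")
    for g in range(grid):
        a = g / (grid - 1)
        combo = [a * u + (1 - a) * w for u, w in zip(p0, p1)]
        risk = mse(combo, ys)
        if risk < best_risk:
            best_a, best_risk = a, risk
    weights = [best_a, 1 - best_a]
    models = [fit(xs, ys) for fit in learners]
    return weights, lambda x: sum(w * m(x) for w, m in zip(weights, models))

# Synthetic data (assumed for the demo): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

weights, predict = super_learner(xs, ys, [fit_mean, fit_line])
print("weights (mean, line):", weights)
print("prediction at x = 3:", predict(3.0))
```

Because the synthetic outcome is linear in the covariate, most of the weight should fall on the line learner. A fuller version would accept any number of candidate learners and solve for the weights by constrained regression on the out-of-fold predictions instead of a grid search.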


2013 ◽  
Vol 859 ◽  
pp. 280-283
Author(s):  
Shiang Hau Wu ◽  
Jiann Jong Guo

This study analyzed the keywords of oil exploration research paper abstracts from 2012 and 2013 and used the random forests model for classification analysis, in order to identify the important keywords and the similarities between the research trends of the two years. The study makes two contributions. First, it used text mining to explore the content of oil exploration research paper abstracts. Second, it applied AdaBoost classification analysis to explore the relationship between the two years' keywords.

