Using Decision Tree Aggregation with Random Forest Model to Identify Gut Microbes Associated with Colorectal Cancer

Genes ◽  
2019 ◽  
Vol 10 (2) ◽  
pp. 112 ◽  
Author(s):  
Dongmei Ai ◽  
Hongfei Pan ◽  
Rongbao Han ◽  
Xiaoxin Li ◽  
Gang Liu ◽  
...  

The imbalance of human gut microbiota has been associated with colorectal cancer. In recent years, metagenomics research has provided a large amount of scientific data enabling us to study the dedicated roles of gut microbes in the onset and progression of cancer. We removed unrelated and redundant features during feature selection by mutual information. We then trained a random forest classifier on a large metagenomics dataset of colorectal cancer patients and healthy people assembled from published reports and extracted and analysed the information from the learned decision trees. We identified key microbial species associated with colorectal cancers. These microbes included Porphyromonas asaccharolytica, Peptostreptococcus stomatis, Fusobacterium, Parvimonas sp., Streptococcus vestibularis and Flavonifractor plautii. We obtained the optimal splitting abundance thresholds for these species to distinguish between healthy and colorectal cancer samples. This extracted consensus decision tree may be applied to the diagnosis of colorectal cancers.
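The idea of an "optimal splitting abundance threshold" can be illustrated with a minimal sketch. It assumes a CART-style Gini impurity criterion on a single species, which the abstract does not specify; the abundance values, labels, and function names below are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of binary labels (0 = healthy, 1 = colorectal cancer)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_threshold(abundance: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted_gini) for the best split 'abundance <= threshold'."""
    pairs = sorted(zip(abundance, labels))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        threshold = (lo + hi) / 2.0  # midpoint between neighbouring values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical relative abundances of one species (illustrative only).
abundance = [0.001, 0.002, 0.004, 0.020, 0.030, 0.050]
labels = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split_threshold(abundance, labels)
print(threshold, impurity)
```

In a random forest, each tree may choose a different threshold for the same species, so a consensus tree would need to aggregate these per-tree thresholds in some way; the sketch covers only the single-split step.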

2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 2601-2601
Author(s):  
Tao Zhou ◽  
Libin Chen ◽  
Jing Guo ◽  
Mengmeng Zhang ◽  
Huanhuan Liu ◽  
...  

2601 Background: Microsatellite instability (MSI) is a common genomic alteration in several tumors, such as colorectal cancer, endometrial carcinoma, and stomach cancer. Tumors are classified as microsatellite instability-high (MSI-H) or microsatellite stable (MSS) based on the degree of polymorphism in microsatellite lengths. MSI is a predictive biomarker for immunotherapy efficacy in advanced/metastatic solid tumors, especially in colorectal cancer (CRC) patients. Several computational approaches based on target panel sequencing data have been used to detect MSI; however, they are considerably affected by the sequencing depth and panel size. Methods: We developed MSIFinder, a Python package for automatic MSI classification that applies a random forest classifier (RFC), a machine learning method, to genome sequencing data. We included 19 MSI-H and 25 MSS samples as training sets. First, an RFC model was built from 54 feature markers selected from the training sets. Second, the classifier was validated using a test set comprising 21 MSI-H and 379 MSS samples. Results: With this test set, MSIFinder achieved a sensitivity (recall) of 1.0, a specificity of 0.997, an accuracy of 0.998, a positive predictive value (PPV) of 0.954, an F1 score of 0.977, and an area under the curve (AUC) of 0.999. We found that MSIFinder is less affected by low sequencing depth and can achieve a concordance of 0.993 at a sequencing depth of 100×. Furthermore, we found that MSIFinder is less affected by the panel size and can achieve a concordance of 0.99 when the panel size is 0.5 M (million bases). Conclusions: These results indicate that MSIFinder is a robust MSI classification tool that is little affected by panel size and sequencing depth. Furthermore, MSIFinder can provide reliable MSI detection for scientific and clinical purposes.[Table: see text]


2021 ◽  
Vol 23 (08) ◽  
pp. 532-537
Author(s):  
Cherlakola Abhinav Reddy ◽  
Sai Nitesh Gadiraju ◽  
Dr. Samala Nagaraj ◽  
...  

Social media has increasingly become integral to the way billions of people experience news and events, often bypassing journalists, the traditional gatekeepers of breaking news. Real-world events create a corresponding spike of posts (tweets) on Twitter. This places great importance on the credibility of information found on social media platforms like Twitter. We used several supervised learning techniques, including Naïve Bayes, Decision Trees, and Support Vector Machines, to classify tweets as real or fake news. For our machine learning models, we used tweet and user features as predictors. We achieved a precision of 88% using the Random Forest classifier and 88% using the Decision Tree. However, we believe that analyzing user accounts would increase the accuracy of our models.


Author(s):  
Nitika Kapoor ◽  
Parminder Singh

Data mining extracts useful information from data, and prediction analysis uses current information to predict future possibilities. The authors propose a hybrid classifier for heart disease prediction that combines a random forest and a decision tree classifier. The prediction technique has three steps: data pre-processing, feature extraction, and classification. In this research, the random forest classifier is applied for feature extraction, and the decision tree classifier generates the final prediction from the extracted features. The authors show the results of the proposed model using the Python platform and compare them with those of the support vector machine (SVM) and k-nearest neighbour (KNN) classifiers.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tao Zhou ◽  
Libin Chen ◽  
Jing Guo ◽  
Mengmeng Zhang ◽  
Yanrui Zhang ◽  
...  

Background: Microsatellite instability (MSI) is a common genomic alteration in colorectal cancer, endometrial carcinoma, and other solid tumors. MSI is characterized by a high degree of polymorphism in microsatellite lengths owing to a deficiency in the mismatch repair system. Based on the degree of instability, tumors can be classified as microsatellite instability-high (MSI-H) or microsatellite stable (MSS). MSI is a predictive biomarker for immunotherapy efficacy in advanced/metastatic solid tumors, especially in colorectal cancer patients. Several computational approaches based on target panel sequencing data have been used to detect MSI; however, they are considerably affected by the sequencing depth and panel size. Results: We developed MSIFinder, a Python package for automatic MSI classification that applies a random forest classifier (RFC), a machine learning method, to genome sequencing data. We included 19 MSI-H and 25 MSS samples as training sets. First, we selected 54 feature markers from the training sets, built an RFC model, and validated the classifier using a test set comprising 21 MSI-H and 379 MSS samples. With this test set, MSIFinder achieved a sensitivity (recall) of 1.0, a specificity of 0.997, an accuracy of 0.998, a positive predictive value of 0.954, an F1 score of 0.977, and an area under the curve of 0.999. To further verify the robustness and effectiveness of the model, we used a prospective cohort consisting of 18 MSI-H samples and 122 MSS samples. MSIFinder achieved a sensitivity (recall) of 1.0 and a specificity of 1.0. We found that MSIFinder is less affected by a low sequencing depth and can achieve a concordance of 0.993 at a sequencing depth of 100×. Furthermore, we found that MSIFinder is less affected by the panel size and can achieve a concordance of 0.99 when the panel size is 0.5 M (million bases).
Conclusion: These results indicate that MSIFinder is a robust and effective MSI classification tool that can provide reliable MSI detection for scientific and clinical purposes.
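The reported test-set metrics can be checked with a short worked calculation. The confusion-matrix counts below are an assumption: the abstract does not state them, but 21 true positives, 1 false positive, 378 true negatives, and 0 false negatives are consistent with the 21 MSI-H and 379 MSS test samples and reproduce the reported values to within 0.001. The function name is hypothetical.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)           # recall: fraction of MSI-H samples detected
    specificity = tn / (tn + fp)           # fraction of MSS samples correctly rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    ppv = tp / (tp + fp)                   # positive predictive value (precision)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "f1": f1,
    }


# Assumed counts (inferred, not stated in the abstract): 21 MSI-H, 379 MSS.
metrics = classification_metrics(tp=21, fp=1, tn=378, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts a single misclassified MSS sample accounts for both the specificity just below 1 and the positive predictive value of about 0.95, which shows how sensitive the PPV is when the positive class is small.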


Modelling sentiment in context is one of the most important parts of sentiment analysis, and various classifiers help to detect and classify it. Taking sarcasm into account would make sentiment detection more accurate. However, detecting sarcasm in people's reviews is a challenging task, and undetected sarcasm may lead to wrong decisions or misclassification. This paper uses Decision Tree and Random Forest classifiers and compares the performance of both. Here we treat the random forest as a hybrid decision tree classifier. We propose, with appropriate reasoning, that the random forest classifier performs better than an ordinary decision tree classifier.
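One way to see why a forest can outperform a single tree is its aggregation step, sketched below as majority voting over several trees. This is only the voting step, not a full random forest (there is no bootstrap sampling or tree training), and the three hand-written "trees", the feature names, and the review are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Review = Dict[str, float]
Tree = Callable[[Review], str]


def forest_predict(trees: List[Tree], sample: Review) -> str:
    """Return the class predicted by the majority of trees."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hypothetical one-rule trees; an odd count avoids ties for two classes.
trees: List[Tree] = [
    lambda r: "positive" if r["positive_words"] > r["negative_words"] else "negative",
    lambda r: "negative" if r["sarcasm_score"] > 0.5 else "positive",
    lambda r: "negative" if r["exclamation_marks"] >= 3 and r["sarcasm_score"] > 0.3 else "positive",
]

# A sarcastic review: positive wording, but strong sarcasm cues.
review: Review = {
    "positive_words": 3,
    "negative_words": 0,
    "sarcasm_score": 0.8,
    "exclamation_marks": 4,
}
prediction = forest_predict(trees, review)
print(prediction)
```

The first tree alone is misled by the positive wording, while the vote of all three trees is not; this is the intuition behind comparing a random forest against a single decision tree.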


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 1185
Author(s):  
Yen-Siang Leow ◽  
Kok-Why Ng ◽  
Yih-Jian Yoong ◽  
Seng-Beng Ng

Background: Thalassemia is a hereditary blood disease in which abnormal red blood cells (RBCs) carry insufficient oxygen throughout the body. Conventional methods of thalassemia detection, namely a complete blood count (CBC) test and peripheral blood smear imaging, still have many weaknesses. Methods: This paper proposes a hybrid segmentation method that combines adaptive thresholding and Canny edge detection to segment the RBCs. Morphological operations are then performed to clean up residual artifacts. Shape and texture features are extracted using the segmented masks and the gray-level co-occurrence matrix. Data imbalance treatment is applied to address the imbalanced distribution of cell-type classes. In the data resampling layer, the synthetic minority oversampling technique (SMOTE), adaptive synthetic sampling (ADASYN), and random oversampling (ROS) are performed and evaluated using the decision tree and logistic regression. In the classification layer, the decision tree, random forest classifier, and support vector machine (SVM) are assessed and compared for the best classification performance. Results: The proposed method outperforms the other methods in the image segmentation layer, with a structural similarity index measure (SSIM) of 89.88%. In the data resampling layer, ADASYN is employed as it is more accurate than SMOTE and ROS. The random forest classifier is chosen at the classification layer as it is more accurate than the decision tree and SVM. Conclusions: The proposed method is tested on the latest dataset, erythrocyteIDB3, and it resolves the data imbalance caused by under-represented cell classes.
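Of the three resampling techniques named above, random oversampling (ROS) is the simplest to sketch: minority-class samples are duplicated at random until every class matches the size of the largest one. The function name, the two shape features, and the class labels below are hypothetical and are not taken from the paper or the erythrocyteIDB3 dataset. SMOTE and ADASYN differ in that they synthesize new points by interpolation rather than duplicating existing ones.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]


def random_oversample(
    features: Sequence[Sample], labels: Sequence[str], seed: int = 0
) -> Tuple[List[Sample], List[str]]:
    """Duplicate random minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_x = list(features)
    out_y = list(labels)
    for cls, count in counts.items():
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Hypothetical (area, circularity) features for segmented cells.
features: List[Sample] = [(50.0, 0.90), (52.0, 0.88), (49.0, 0.91), (51.0, 0.90), (30.0, 0.40)]
labels = ["normal", "normal", "normal", "normal", "target_cell"]
balanced_x, balanced_y = random_oversample(features, labels)
print(Counter(balanced_y))
```

Resampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.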


2020 ◽  
Vol 9 (12) ◽  
pp. 4038
Author(s):  
Audrius Dulskas ◽  
Vytautas Gaizauskas ◽  
Inga Kildusiene ◽  
Narimantas Evaldas Samalavicius ◽  
Giedre Smailyte

Purpose: In this study, we analyzed the mortality and survival of colorectal cancer patients in Lithuania. Methods: This was a national cohort study based on population-based data from the Lithuanian Cancer Registry. Overall, 20,980 colorectal cancer patients were included. We examined the changes in colorectal cancer mortality and survival rates between 1998 and 2012 according to cancer anatomical sub-sites and stages. We calculated the 5-year relative survival estimates using period analysis. Results: Overall, 20,980 colorectal cancer cases reported from 1998 to 2012 were included in the study. The total number of newly diagnosed colorectal cancers increased by 12.1% from 1998–2002 to 2008–2012. Localized cancers were the most common, and their share increased from 33.9% to 42.0%. The number of cancers with regional metastases and advanced cancers decreased by 11.1% and 15.5%, respectively. An increased number of new cases was observed for almost all colon cancer sub-sites. The overall 5-year relative survival rate increased from 37.9% in 1998–2002 to 51.5% in 2008–2012. We showed an increase in survival rates for all stages and all sub-sites. In the most recent period, patients with localized disease had a 5-year survival rate of 78.6%, while survival estimates for advanced cancer patients remained low at 6.6%. Conclusion: Although survival rates varied among colorectal cancer patients according to disease stage and sub-site, survival rates increased for all stages and sub-sites.

