Prediction of driver variants in the cancer genome via machine learning methodologies

Author(s):  
Mark F Rogers ◽  
Tom R Gaunt ◽  
Colin Campbell

Abstract Sequencing technologies have led to the identification of many variants in the human genome which could act as disease-drivers. As a consequence, a variety of bioinformatics tools have been proposed for predicting which variants may drive disease, and which may be causatively neutral. After briefly reviewing generic tools, we focus on a subset of these methods specifically geared toward predicting which variants in the human cancer genome may act as enablers of unregulated cell proliferation. We consider the resultant view of the cancer genome indicated by these predictors and discuss ways in which these types of prediction tools may be progressed by further research.

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Madeleine Darbyshire ◽  
Zachary du Toit ◽  
Mark F. Rogers ◽  
Tom R. Gaunt ◽  
Colin Campbell

Abstract For cancers, such as common solid tumours, variants in the genome give a selective growth advantage to certain cells. It has recently been argued that the mean count of coding single nucleotide variants acting as disease-drivers in common solid tumours is frequently small, but varies significantly by cancer type (hypermutation is excluded from this study). In this paper we investigate this proposal using integrative machine-learning-based classifiers that we recently proposed for predicting the disease-driver status of single nucleotide variants (SNVs) in the human cancer genome. We find that predicted driver counts are compatible with this proposal, have similar variabilities by cancer type and, to a certain extent, the drivers are identifiable by these machine learning methods. We further discuss predicted driver counts stratified by stage of disease and driver counts in non-coding regions of the cancer genome, in addition to driver-genes.


2020 ◽  
Vol 2020 ◽  
pp. 1-6
Author(s):  
Erdal Cosgun ◽  
Min Oh

Background. Next-generation sequencing (NGS) enables massively parallel processing at lower cost than other sequencing technologies. In subsequent analysis of NGS data, one of the major concerns is the reliability of variant calls. Although researchers can use the raw quality scores from variant calling, they are forced to begin further analysis without any pre-evaluation of those quality scores. Method. We present a machine learning approach for estimating quality scores of variant calls derived from BWA+GATK. We analyzed correlations between the quality score and the GATK annotations, identifying informative annotations that were used as features to predict variant quality scores. To test the predictive models, we simulated 24 sets of paired-end Illumina sequencing reads at 30x coverage. In addition, 24 sets of human genome sequencing reads generated by Illumina paired-end sequencing with at least 30x coverage were obtained from the Sequence Read Archive. Results. Using BWA+GATK, VCFs were derived from the simulated and real sequencing reads. We observed that the prediction models learned by RFR outperformed the other algorithms on both simulated and real data. The quality scores of variant calls were highly predictable from informative features of GATK Annotation Modules in the simulated human genome VCF data (R2: 96.7%, 94.4%, and 89.8% for RFR, MLR, and NNR, respectively). The robustness of the proposed data-driven models was consistently maintained in the real human genome VCF data (R2: 97.8% and 96.5% for RFR and MLR, respectively).
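The R2 figures quoted in this abstract are coefficients of determination. As a minimal sketch of how such a value is typically computed from observed and predicted quality scores, the plain-Python function below may help. The function name `r_squared` and the example numbers are assumptions made for illustration and are not taken from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observed values around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance; R2 is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical quality scores, for illustration only.
observed_scores = [30.0, 45.0, 60.0, 52.0]
predicted_scores = [32.0, 44.0, 58.0, 53.0]
print(f"R2 = {r_squared(observed_scores, predicted_scores):.3f}")
```

A value close to 1 means the predictions account for most of the variance in the observed scores; multiplying by 100 gives the percentage form used in the abstract.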


2016 ◽  
Vol 2016 ◽  
pp. 1-14 ◽  
Author(s):  
Siyu Han ◽  
Yanchun Liang ◽  
Ying Li ◽  
Wei Du

Long noncoding RNA (lncRNA) is a class of noncoding RNA longer than 200 nucleotides that has attracted considerable interest in recent years. Many studies have confirmed that the human genome contains many thousands of lncRNAs, which exert great influence over some critical regulators of cellular processes. With the advent of high-throughput sequencing technologies, a great quantity of sequences awaits analysis. Thus, many programs have been developed to distinguish coding from long noncoding transcripts. Different programs are generally designed for different circumstances, and it is sensible and practical to select a method appropriate to the situation at hand. In this review, several popular methods and their advantages, disadvantages, and application scopes are summarised to help readers choose a suitable method and obtain more reliable results.


Author(s):  
He Zhu ◽  
Hongwei Zhang ◽  
Youliang Pei ◽  
Zhibin Liao ◽  
Furong Liu ◽  
...  

Abstract Background Hepatocellular carcinoma (HCC) is a common type of malignant human cancer with high morbidity and poor prognosis, causing numerous deaths per year worldwide. Growing evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with hepatocarcinogenesis and metastasis. However, the roles, functions, and working mechanisms of most lncRNAs in HCC remain poorly defined. Methods Real-time quantitative polymerase chain reaction (qRT-PCR) was used to detect the expression level of CCDC183-AS1 in HCC tissues and cell lines. Cell proliferation, migration and invasion ability were evaluated by CCK-8 and transwell assay, respectively. Animal experiments were used to explore the role of CCDC183-AS1 and miR-589-5p in vivo. Bioinformatic analysis, dual-luciferase reporter assay and RNA immunoprecipitation (RIP) assay were performed to confirm the regulatory relationship between CCDC183-AS1, miR-589-5p and SKP1. Results Significantly upregulated expression of CCDC183-AS1 was observed in both HCC tissues and cell lines. HCC patients with higher expression of CCDC183-AS1 had a poorer overall survival rate. Functionally, overexpression of CCDC183-AS1 markedly promoted HCC cell proliferation, migration and invasion in vitro and tumor growth and metastasis in vivo, whereas the downregulation of CCDC183-AS1 exerted opposite effects. MiR-589-5p inhibitor counteracted the inhibitory effects on proliferation, migration and invasion induced by CCDC183-AS1 silencing. Mechanistically, CCDC183-AS1 acted as a ceRNA by sponging miR-589-5p to offset its inhibitory effect on the target gene SKP1, thereby promoting the tumorigenesis of HCC. Conclusions CCDC183-AS1 functions as an oncogene to promote HCC progression through the CCDC183-AS1/miR-589-5p/SKP1 axis. Our study provides a novel potential therapeutic target for HCC patients.


Open Medicine ◽  
2020 ◽  
Vol 15 (1) ◽  
pp. 921-931
Author(s):  
Juan Zhao ◽  
Xue-Bin Zeng ◽  
Hong-Yan Zhang ◽  
Jie-Wei Xiang ◽  
Yu-Song Liu

Abstract Long non-coding RNA forkhead box D2 adjacent opposite strand RNA 1 (FOXD2-AS1) has emerged as a potential oncogene in several tumors. However, its biological function and potential regulatory mechanism in glioma have not been fully investigated to date. In the present study, RT-qPCR was conducted to detect the levels of FOXD2-AS1 and microRNA (miR)-506-5p, and western blot assays were performed to measure the expression of CDK2, cyclinE1, P21, matrix metalloproteinase (MMP)7, MMP9, N-cadherin, E-cadherin and vimentin in glioma cells. A luciferase reporter assay was performed to verify the direct targeting of miR-506-5p by FOXD2-AS1. Subsequently, cell viability was analyzed using the CCK-8 assay. Cell migration and invasion were analyzed using Transwell and wound healing assays, respectively. The results demonstrated that FOXD2-AS1 was significantly overexpressed in glioma cells, particularly in U251 cells. Knockdown of FOXD2-AS1 in glioma cells significantly inhibited cell proliferation, migration, invasion and epithelial–mesenchymal transition (EMT) and regulated the expression of CDK2, cyclinE1, P21, MMP7 and MMP9. Next, a possible mechanism for these results was explored, and it was observed that FOXD2-AS1 binds to and negatively regulates miR-506-5p, which is known to be a tumor-suppressor gene in certain human cancer types. Furthermore, overexpression of miR-506-5p significantly inhibited cell proliferation, migration, invasion and EMT, and these effects could be reversed by transfecting FOXD2-AS1 into the cells. In conclusion, our data suggested that FOXD2-AS1 contributed to glioma proliferation, metastasis and EMT via competitively binding to miR-506-5p. FOXD2-AS1 may be a promising target for therapy in patients with glioma.


Author(s):  
Hannah Bolinger ◽  
David Tran ◽  
Kenneth Harary ◽  
George C. Paoli ◽  
Giselle Guron ◽  
...  

Traditional microbiological testing methods are slow, and many molecular-based techniques rely on culture-based enrichment to overcome low limits of detection. Recent advancements in sequencing technologies may make it possible to utilize machine learning (ML) to identify patterns in microbiome data to potentially predict the presence or absence of pathogens. In this study, 299 poultry rinsate samples from various points in the processing chain were analyzed to determine if microbiota could inform about a sample’s risk for containing Salmonella. Samples were culture confirmed as Salmonella-positive or -negative following modified USDA MLG protocols. The culture confirmation result was used as a reference to compare with 16S sequencing data. Pre-chill samples tested positive (71/82) at a higher frequency than post-chill samples (30/217) and contained greater microbial diversity. Due to their larger sample size, post-chill samples were analyzed more deeply. Analysis of variance (ANOVA) identified a significant effect of chilling on the number of genera (p<0.001), but analysis of similarities (ANOSIM) failed to provide evidence for microbial dissimilarity between pre- and post-chill samples (p=0.001, R=0.443). Various ML models were trained using post-chill samples to predict if a sample contained Salmonella based on the samples’ microbiota pre-enrichment. The optimal model was a Random Forest-based model with a performance as follows: accuracy (88%), sensitivity (85%), specificity (90%). While the algorithms described in this paper are prototypes, these risk-based algorithms demonstrate the potential and need for further studies to provide insight alongside diagnostic tests. Combining risk-based information with diagnostic tools can help poultry processors make informed decisions to help identify and prevent the spread of Salmonella. These data add to the growing body of literature exploring novel ways to utilize microbiome data for predictive food safety.
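Accuracy, sensitivity and specificity, as reported for the Random Forest model above, are all derived from the same four confusion-matrix counts. The sketch below shows the standard definitions in plain Python. The function name `classification_metrics` and the counts passed to it are hypothetical and do not reproduce the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn: reference-positive samples predicted positive/negative.
    tn/fp: reference-negative samples predicted negative/positive.
    """
    if (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative reference sample")
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # share of true positives that were detected
    specificity = tn / (tn + fp)  # share of true negatives that were cleared
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only; not data from the study.
acc, sens, spec = classification_metrics(tp=8, fp=1, tn=9, fn=2)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Reporting all three matters when classes are imbalanced, as with the post-chill samples (30 positive of 217): a model that always predicted "negative" would still score high accuracy while having zero sensitivity.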


2004 ◽  
Vol 47 (21) ◽  
pp. 5126-5139 ◽  
Author(s):  
Allison B. Edsall ◽  
Arasambattu K. Mohanakrishnan ◽  
Donglai Yang ◽  
Philip E. Fanwick ◽  
Ernest Hamel ◽  
...  

2019 ◽  
Author(s):  
Raphael Leman ◽  
Hélène Tubeuf ◽  
Sabine Raad ◽  
Isabelle Tournier ◽  
Céline Derambure ◽  
...  

Abstract Background: Branch points (BPs) map within short motifs upstream of acceptor splice sites (3’ss) and are essential for splicing of pre-mature mRNA. Several BP-dedicated bioinformatics tools, including HSF, SVM-BPfinder, BPP, Branchpointer, LaBranchoR and RNABPS were developed during the last decade. Here, we evaluated their capability to detect the position of BPs, and also to predict the impact on splicing of variants occurring upstream of 3’ss. Results: We used a large set of constitutive and alternative human 3’ss collected from Ensembl (n = 264,787 3’ss) and from in-house RNAseq experiments (n = 51,986 3’ss). We also gathered an unprecedented collection of functional splicing data for 120 variants (62 unpublished) occurring in BP areas of disease-causing genes. Branchpointer showed the best performance to detect the relevant BPs upstream of constitutive and alternative 3’ss (99.48 % and 65.84 % accuracies, respectively). For variants occurring in a BP area, BPP emerged as having the best performance to predict effects on mRNA splicing, with an accuracy of 89.17 %. Conclusions: Our investigations revealed that Branchpointer was optimal to detect BPs upstream of 3’ss, and that BPP was most relevant to predict splicing alteration due to variants in the BP area. Keywords: Branch Point, Prediction, RNA, Benchmark, HSF, SVM-BPfinder, BPP, Branchpointer, LaBranchoR, RNABPS, Variants


2013 ◽  
Author(s):  
Benjamin P. Berman ◽  
Yaping Liu ◽  
Theresa K. Kelly

Background: Nucleosome organization and DNA methylation are two mechanisms that are important for proper control of mammalian transcription, as well as epigenetic dysregulation associated with cancer. Whole-genome DNA methylation sequencing studies have found that methylation levels in the human genome show periodicities of approximately 190 bp, suggesting a genome-wide relationship between the two marks. A recent report (Chodavarapu et al., 2010) attributed this to higher methylation levels of DNA within nucleosomes. Here, we analyzed a number of published datasets and found a more compelling alternative explanation, namely that methylation levels are highest in linker regions between nucleosomes. Results: Reanalyzing the data from Chodavarapu et al. (2010), we found that nucleosome-associated methylation could be strongly confounded by known sequence-related biases of next-generation sequencing technologies. By accounting for these biases and using an unrelated nucleosome profiling technology, NOMe-seq, we found that genome-wide methylation was actually highest within linker regions occurring between nucleosomes in multi-nucleosome arrays. This effect was consistent among several methylation datasets generated independently using two unrelated methylation assays. Linker-associated methylation was most prominent within long Partially Methylated Domains (PMDs) and the positioned nucleosomes that flank CTCF binding sites. CTCF-adjacent nucleosomes retained the correct positioning in regions completely devoid of CpG dinucleotides, suggesting that DNA methylation is not required for proper nucleosome positioning. Conclusions: The biological mechanisms responsible for DNA methylation patterns outside of gene promoters remain poorly understood. We identified a significant genome-wide relationship between nucleosome organization and DNA methylation, which can be used to more accurately analyze and understand the epigenetic changes that accompany cancer and other diseases.

