Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data

‘Prime editing’ proposes to replace traditional programmable nucleases (CRISPR-Cas9) with a catalytically impaired Cas9 (dCas9) fused to an engineered reverse transcriptase, together with a guide RNA encoding both the target site and the desired change. With just a ‘nick’ on one strand, it is hypothesized, the negative, uncontrollable effects arising from double-strand DNA breaks (DSBs), namely translocations, complex proteins, integrations and p53 activation, will be eliminated. However, the sequencing data provided (accession ID PRJNA565979) reveal plasmid integration, indicating that DSBs occur. Moreover, examining only 16 off-target sites is inadequate to assert that prime editing is more precise. Plasmid integration occurs in all three versions (PE1/2/3). Interestingly, dCas9, which is known to be toxic in E. coli and yeast, is shown to have residual endonuclease activity. This also affects other studies that use dCas9, such as those on base editors and (de)methylation systems. Previous work using hRad51–Cas9 nickases also shows significant integration at on-target sites, as well as off-target integration [1]. Thus, we show that the cellular response to nicking involves DSBs and subsequent plasmid/Cas9 integration. This is an unacceptable outcome for any in vivo application in human therapy.

Faculty Opinions recommendation of VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718272765.793499663 ◽

2014 ◽

Author(s):

Gary Bader ◽

Mohamed Helmy

Keyword(s):

Next Generation Sequencing ◽

Network Analysis ◽

Next Generation Sequencing Data ◽

Cancer Genes ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

Faculty Opinions recommendation of Family-based association test using both common and rare variants and accounting for directions of effects for sequencing data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718882382.793500875 ◽

2014 ◽

Author(s):

Melanie Bahlo

Keyword(s):

Rare Variants ◽

Association Test ◽

Sequencing Data ◽

Family Based

Faculty Opinions recommendation of Coalescent Inference Using Serially Sampled, High-Throughput Sequencing Data from Intrahost HIV Infection.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726132071.793531014 ◽

2017 ◽

Author(s):

Sarah Rowland-Jones ◽

Sophie Andrews

Keyword(s):

Hiv Infection ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data

Faculty Opinions recommendation of Bioinformatory-assisted analysis of next-generation sequencing data for precision medicine in pancreatic cancer.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.727775566.793536095 ◽

2017 ◽

Author(s):

Steve Pereira

Keyword(s):

Pancreatic Cancer ◽

Next Generation Sequencing ◽

Precision Medicine ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Assisted Analysis ◽

Generation Sequencing

Clustering Count-based RNA Methylation Data Using a Nonparametric Generative Model

Current Bioinformatics ◽

10.2174/1574893613666180601080008 ◽

2018 ◽

Vol 14 (1) ◽

pp. 11-23 ◽

Cited By ~ 3

Author(s):

Lin Zhang ◽

Yanling He ◽

Huaizhi Wang ◽

Hui Liu ◽

Yufei Huang ◽

...

Keyword(s):

Clustering Analysis ◽

Methylation Level ◽

Optimal Number ◽

Generative Model ◽

Methylation Data ◽

Sequencing Data ◽

Number Of Clusters ◽

Rna Methylation ◽

Clustering Effect ◽

Optimal Number Of Clusters

Background: The RNA methylome has been discovered as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuit of the epitranscriptome remains uncharted, a clustering effect in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect epitranscriptomic regulation. Count-based RNA methylation sequencing data has unique features, such as low read coverage, which call for novel clustering approaches. Objective: Beyond handling low read coverage, a clustering analysis of count-based RNA methylation sequencing data must also preserve the integer nature of the measurements. Method: We proposed a nonparametric generative model together with its Gibbs sampling solution for clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the clustering effect in methylation level using the original count-based measurements rather than an estimated continuous methylation level. In addition, it adopts a nonparametric Dirichlet process to automatically determine an optimal number of clusters, avoiding the common model selection problem in clustering analysis. Results: When tested on a simulated system, the method demonstrated improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. On a real dataset, it also revealed two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, two known regulatory components of the RNA m6A methyltransferase complex. Conclusion: Our proposed DPBBM method not only properly handles the count-based measurements of RNA methylation data from sites of very low read coverage, but also learns an optimal number of clusters adaptively from the data analyzed.
Availability: The source code and documentation of the DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.
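
To illustrate why a beta-binomial likelihood suits low-coverage count data, the following minimal Python sketch scores one site against a fixed, finite set of clusters. It is not the DPBBM package or its Gibbs sampler: the function names, the representation of a site as `k` methylation-supporting reads out of `n` total reads, and the fixed cluster weights and parameters are all assumptions made for illustration. The Dirichlet process step that chooses the number of clusters is omitted.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def beta_binomial_logpmf(k, n, alpha, beta):
    """Log-probability of k successes in n trials under BetaBinomial(alpha, beta).

    Working on the raw counts (k, n) keeps their integer nature, so a site
    with n = 3 reads is treated as less informative than one with n = 300.
    """
    return log_choose(n, k) + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def cluster_responsibilities(k, n, weights, params):
    """Posterior probability of each cluster for a single site.

    weights: mixing proportions of the clusters (assumed known here).
    params:  list of (alpha, beta) pairs, one per cluster (assumed known here).
    """
    logs = [
        log(w) + beta_binomial_logpmf(k, n, a, b)
        for w, (a, b) in zip(weights, params)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [exp(x - top) for x in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical example: a low-methylation and a high-methylation cluster.
example = cluster_responsibilities(9, 10, [0.5, 0.5], [(1.0, 9.0), (9.0, 1.0)])
```

In a full sampler, responsibilities of this kind would be one ingredient of the cluster-assignment step, with the cluster parameters and the number of clusters updated as well.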

IsoDetect: Detection of splice isoforms from third generation long reads based on short feature sequences

Current Bioinformatics ◽

10.2174/1574893615666200316101205 ◽

2020 ◽

Vol 15 ◽

Author(s):

Hongdong Li ◽

Wenjing Zhang ◽

Yuwen Luo ◽

Jianxin Wang

Keyword(s):

Sequence Similarity ◽

Detection Methods ◽

Sequence Information ◽

Third Generation ◽

Sequencing Data ◽

Splice Isoforms ◽

Third Generation Sequencing ◽

Long Reads ◽

Feature Sequence ◽

Generation Sequencing

Aims: Accurately detect isoforms from third generation sequencing data. Background: Transcriptome annotation is the basis for the analysis of gene expression and regulation. The transcriptome annotation of many organisms, including humans, is far from complete, due partly to the challenge of identifying isoforms that are produced from the same gene through alternative splicing. Third generation sequencing (TGS) reads provide an unprecedented opportunity for detecting isoforms because their length exceeds that of most isoforms. One limitation of current TGS read-based isoform detection methods is that they rely exclusively on sequence reads, without incorporating the sequence information of known isoforms. Objective: Develop an efficient method for isoform detection. Method: Based on annotated isoforms, we propose a splice isoform detection method called IsoDetect. First, the sequence at each exon-exon junction is extracted from annotated isoforms as a “short feature sequence”, which is used to distinguish different splice isoforms. Second, we align these feature sequences to long reads and divide the long reads into groups that contain the same set of feature sequences, thereby avoiding pair-wise comparison among the large number of long reads. Third, clustering and consensus generation are carried out based on sequence similarity. For long reads that do not contain any short feature sequence, clustering analysis based on sequence similarity is performed to identify isoforms. Result: Tested on two datasets from Calypte anna and zebra finch, IsoDetect showed higher speed and compelling accuracy compared with four existing methods. Conclusion: IsoDetect is a promising method for isoform detection. Other: This paper was accepted by the CBC2019 conference.
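
The grouping step described in the Method can be sketched in a few lines of Python. This is an illustration only, not the IsoDetect implementation: the function name and the toy sequences are hypothetical, and exact substring matching stands in for the alignment of feature sequences to long reads that the abstract describes.

```python
from collections import defaultdict


def group_reads_by_features(reads, features):
    """Group long reads by the set of junction feature sequences they contain.

    reads:    dict mapping read ID -> read sequence.
    features: dict mapping feature name -> short exon-exon junction sequence.

    Exact substring search is used here as a stand-in for alignment
    (an assumption of this sketch). Reads with no feature end up under
    the empty set and would be clustered separately by sequence similarity.
    """
    groups = defaultdict(list)
    for read_id, seq in reads.items():
        present = frozenset(name for name, feat in features.items() if feat in seq)
        groups[present].append(read_id)
    return dict(groups)


# Hypothetical toy data: two junction features and four reads.
toy_features = {"e1e2": "ACGTAC", "e2e3": "GGATCC"}
toy_reads = {
    "r1": "TTACGTACTTGGATCCAA",
    "r2": "ACGTACCCGGATCC",
    "r3": "TTACGTACTT",
    "r4": "TTTTTTTT",
}
toy_groups = group_reads_by_features(toy_reads, toy_features)
```

Because each read is assigned to a group in a single pass over the features, reads only need to be compared with others in the same group, which is how the grouping avoids all-against-all comparison.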

Germline EGFR variants are over-represented in adolescents and young adults (AYA) with adrenocortical carcinoma

Human Molecular Genetics ◽

10.1093/hmg/ddaa268 ◽

2020 ◽

Author(s):

Sara Akhavanfard ◽

Lamis Yehia ◽

Roshan Padmanabhan ◽

Jordan P Reynolds ◽

Ying Ni ◽

...

Keyword(s):

Young Adults ◽

Adrenocortical Carcinoma ◽

Kinase Inhibitor ◽

Mapk Pathway ◽

Adolescents And Young Adults ◽

Control Group ◽

Precision Oncology ◽

Sequencing Data ◽

Germline Variants ◽

Mutant Cells

Adrenocortical carcinoma (ACC) is a rare endocrine tumor with poor overall prognosis and 1.5-fold overrepresentation in females. In children, ACC is associated with inherited cancer syndromes, with 50–80% of childhood ACC associated with TP53 germline variants. ACC in adolescents and young adults (AYA) is rarely due to germline TP53, IGF2, PRKAR1A and MEN1 variants. We analyzed exome sequencing data from 21 children (<15y), 32 AYA (15-39y), and 60 adults (>39y) with ACC, and retained all pathogenic, likely pathogenic, and highly prioritized variants of uncertain significance. We engineered a stable lentiviral-mutant ACC cell line harboring an EGFR variant (p.Asp1080Asn) from a 21-year-old female with aggressive ACC and no germline TP53 variant. We found that 4.8% of the children (P = 0.004) and 6.2% of AYA (P < 0.0001), all female participants, harbored germline EGFR variants, compared to only 0.3% of the control group. Expanding our analysis to the RTK-RAS-MAPK pathway, we found that, among the three arms of this pathway, the RTK genes have the highest number of highly prioritized germline variants in these individuals. We showed that EGFR mutant cells migrate faster and are characterized by a stem-like phenotype compared to wild-type cells. While EGFR inhibitors did not affect the stemness of mutant cells, Sunitinib, a multireceptor tyrosine kinase inhibitor, significantly reduced their stem-like behavior. Our data suggest that EGFR could be a novel underlying germline predisposition factor for ACC, especially in the Childhood-AYA (C-AYA) population. Further clinical validation can improve precision oncology management of this disease, which is known to have limited therapeutic options.
