large gene
Recently Published Documents


Total documents: 371 (five years: 109)
H-index: 51 (five years: 7)

Author(s): Paul Zaharias, Tandy Warnow

With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the last few years, new methods have been developed that aim to enable highly accurate phylogeny estimation on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g., incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements.
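The review mentions divide-and-conquer techniques for alignment and tree estimation. One common building block of such pipelines is to split a guide tree at its centroid edge (the edge whose removal divides the taxa most evenly) and recurse until every subset is small enough to handle directly, similar in spirit to the decomposition used by SATé and PASTA. The sketch below is a hypothetical, minimal illustration, assuming the tree is an adjacency dict with string node labels and that `max_size` is at least 1; `tree_from_edges`, `component` and `decompose` are illustrative names, not the API of any published tool.

```python
# Hypothetical illustration: centroid-edge decomposition of an unrooted tree.
# The tree is an adjacency dict {node: [neighbors]}; `taxa` is the set of leaf labels.


def tree_from_edges(edges):
    """Build an undirected adjacency dict from (a, b) edge pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def component(adj, start, cut):
    """Nodes reachable from `start` without crossing the edge `cut` = (a, b)."""
    a, b = cut
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if {x, y} == {a, b}:
                continue  # do not walk across the cut edge
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def decompose(adj, taxa, max_size):
    """Split the tree at centroid edges until every part has <= max_size taxa."""
    part = taxa & set(adj)
    if len(part) <= max_size:
        return [part] if part else []
    best = None
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                side = len(component(adj, v, (u, v)) & part)
                score = max(side, len(part) - side)  # size of the larger side
                if best is None or score < best[0]:
                    best = (score, u, v)
    _, u, v = best
    parts = []
    for root in (u, v):
        comp = component(adj, root, (u, v))
        # Induced subtree on this side of the cut edge.
        sub = {x: [y for y in adj[x] if y in comp] for x in comp}
        parts.extend(decompose(sub, taxa, max_size))
    return parts
```

Each subset could then be aligned or analyzed on its own and the results merged; the merging step, where published methods differ most, is not shown. The brute-force edge scan is quadratic in tree size per level, which is fine for a sketch but not for trees with hundreds of thousands of leaves.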


2021
Author(s): Damianos Melidis, Christian Landgraf, Anja Schoener-Heinisch, Gunnar Schmidt, Sandra von Hardenberg, ...

Since next-generation sequencing (NGS) has become widely available, large gene panels containing up to several hundred genes can be sequenced cost-efficiently. However, interpreting the often large number of sequence variants detected by NGS is laborious, error-prone and often not comparable across laboratories. To address this challenge, the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) introduced standards and guidelines for the interpretation of sequence variants. Gene- and disease-specific refinements for hereditary hearing loss have since been developed. With more than 200 genes associated with hearing disorders, manual inspection of possible causative variants is especially difficult and time-consuming. We developed GenOtoScope, an open-source bioinformatics tool that automates all ACMG/AMP criteria that can be assessed without further individual patient information or human curator investigation, including the refined loss-of-function criterion (“PVS1”). Two interfaces are provided: (i) a command-line application to classify sequence variants in batches for a set of patients and (ii) a user-friendly website to classify single variants. We compared the performance of our tool with two other variant classification tools on two hearing loss data sets, manually annotated either by the ClinGen Hearing Loss Gene Curation Expert Panel or by the diagnostics unit of our human genetics department. GenOtoScope achieved the best average accuracy and precision on both data sets. Compared with the second-best tool, GenOtoScope improved accuracy by 25.75% and 4.57% and precision by 52.11% and 12.13% on the two data sets, respectively. The web interface is freely accessible. The command-line application, along with all source code, documentation and example outputs, is available via the project GitHub page.
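Under the ACMG/AMP 2015 guidelines, the criteria a variant meets are combined into one of five classes (pathogenic, likely pathogenic, uncertain significance, likely benign, benign) by fixed combining rules. The sketch below encodes those combining rules as a minimal Python function, assuming criteria arrive as plain codes such as "PVS1" or "PM2". It is an illustration, not GenOtoScope's implementation, and it does not parse strength-modified codes from gene- or disease-specific refinements (for example, a PVS1 applied at reduced strength): such a code would be counted at the default strength of its prefix.

```python
from collections import Counter

# Criterion-code prefix -> evidence strength (ACMG/AMP 2015 naming).
# Assumption: codes carry default strengths; strength modifiers are not parsed.
_PREFIXES = [
    ("PVS", "pvs"), ("PS", "ps"), ("PM", "pm"), ("PP", "pp"),
    ("BA", "ba"), ("BS", "bs"), ("BP", "bp"),
]


def _strength(code):
    for prefix, level in _PREFIXES:
        if code.startswith(prefix):
            return level
    raise ValueError(f"unknown criterion: {code}")


def classify(criteria):
    """Combine met criteria (e.g. ["PVS1", "PM2"]) into a five-tier class."""
    n = Counter(_strength(c) for c in criteria)
    pvs, ps, pm, pp = n["pvs"], n["ps"], n["pm"], n["pp"]
    ba, bs, bp = n["ba"], n["bs"], n["bp"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"  # contradictory evidence
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"  # no rule satisfied
```

For example, `classify(["PVS1", "PM2"])` falls under the likely-pathogenic rule (one very strong plus one moderate), while adding a strong criterion such as PS3 would move the variant to pathogenic.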


2021
Author(s): Pablo Mier, Jean-Fred Fontaine, Marah Stoldt, Romain Libbrecht, Carlotta Martelli, ...

The gene family of insect odorant receptors (ORs) has greatly expanded in the course of evolution. ORs allow insects to detect volatile chemicals and therefore play an important role in social interactions, the detection of enemies and prey, and foraging. The sequences of several thousand ORs are known, but specific functions or ligands have been identified for only very few of them. To advance the functional characterization of ORs, we compiled, curated and aligned the sequences of 3,902 ORs from 21 insect species. Using machine learning on sets of functionally characterized proteins from the fly Drosophila melanogaster, the mosquito Anopheles gambiae and the ant Harpegnathos saltator, we identified the amino acid positions that best predict the response to ligands. We studied the conservation of these predicted relevant residues across all OR subfamilies and show that the subfamilies that expanded strongly in social insects exhibit high levels of conservation in their binding sites. This indicates that ORs of social insect families are typically finely tuned and sensitive to very similar odorants. Our novel approach provides a powerful tool for using functional information from a limited number of genes to investigate the functional evolution of large gene families.
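The study identifies predictive positions with machine learning and then examines their conservation across subfamilies. As a much simpler stand-in for the machine-learning step (not the authors' method), each alignment column can be scored by the mutual information between its residue and a binary ligand-response label, and conservation at the top-ranked columns can be measured with Shannon entropy, where lower entropy means a more conserved position. Function names and the toy alignment used with them are hypothetical; gaps count as an ordinary symbol in the mutual-information score and are ignored in the entropy.

```python
import math
from collections import Counter


def entropy(column):
    """Shannon entropy (bits) of one alignment column; gaps ('-') are ignored.
    Lower values mean a more conserved position."""
    counts = Counter(a for a in column if a != "-")
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(column, labels):
    """Mutual information (bits) between residue identity and a response label."""
    n = len(column)
    joint = Counter(zip(column, labels))
    p_res = Counter(column)
    p_lab = Counter(labels)
    mi = 0.0
    for (res, lab), c in joint.items():
        p = c / n
        mi += p * math.log2(p / ((p_res[res] / n) * (p_lab[lab] / n)))
    return mi


def rank_positions(alignment, labels, top_k):
    """Indices of the top_k columns most informative about the labels (ties by index)."""
    columns = list(zip(*alignment))
    scored = sorted(range(len(columns)),
                    key=lambda i: (-mutual_information(columns[i], labels), i))
    return scored[:top_k]


def conservation_at(alignment, positions):
    """Entropy at selected columns of a (typically larger) family alignment."""
    columns = list(zip(*alignment))
    return [entropy(columns[i]) for i in positions]
```

In this framing, positions would be ranked on the small set of characterized receptors and `conservation_at` applied to each subfamily's alignment. Mutual information on a few dozen sequences is noisy, which is one reason a trained classifier is a more robust choice in practice.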


2021
Author(s): Ines Sofia Calado Baptista, Vinodh Kandavalli, Vatsala Chauhan, Mohamed Nasurudeen Mohamed Bahrudeen, Bilena Lima de Brito Almeida, ...

Escherichia coli uses the ability of σ factors to recognize specific DNA sequences to quickly control large gene cohorts. While most genes respond to only one σ factor, approximately 5% have a dual σ factor preference. The group present in significant numbers is the ‘σ70+38 genes’, which respond both to σ70, which controls housekeeping genes, and to σ38, which activates genes during stationary growth and stresses. We show that σ70+38 genes are upregulated in stationary growth almost as strongly as genes responsive to σ38 alone. Also, their response strengths to σ38 are predictable from their promoter sequences. Next, we propose a sequence- and σ38-level-dependent analytical model of σ70+38 genes that applies in the exponential and stationary phases and in the transition period between the two growth phases. Finally, we propose a general model, applicable to other σ factors as well. This model can guide the design of synthetic circuits with sequence-dependent sensitivity and plasticity to transitions between the exponential and stationary growth phases.
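The abstract does not give the form of the authors' σ38-level-dependent model. Purely as a hypothetical illustration of how promoter sequence and σ38 level could enter such a model, the sketch below uses a textbook competitive-binding expression in which σ70- and σ38-bearing holoenzymes compete for one promoter: the dissociation constants `k70` and `k38` stand in for sequence effects, and the rates `r70` and `r38` for transcription from each bound state. All parameter names and values are assumptions, not fitted quantities from the study.

```python
def promoter_activity(s70, s38, k70, k38, r70=1.0, r38=1.0):
    """Mean activity of a promoter that either holoenzyme can occupy.

    Hypothetical competitive-binding form:
        (r70*w70 + r38*w38) / (1 + w70 + w38), with w = level / dissociation constant.
    k70, k38: assumed sequence-dependent dissociation constants (arbitrary units).
    r70, r38: assumed transcription rates from the sigma70- and sigma38-bound states.
    """
    w70 = s70 / k70
    w38 = s38 / k38
    return (r70 * w70 + r38 * w38) / (1.0 + w70 + w38)


def fold_change(s38_low, s38_high, s70, k70, k38, r70=1.0, r38=1.0):
    """Activity ratio between a high-sigma38 (stationary-like) and a low-sigma38 state."""
    high = promoter_activity(s70, s38_high, k70, k38, r70, r38)
    low = promoter_activity(s70, s38_low, k70, k38, r70, r38)
    return high / low
```

In this toy form, raising `s38` (as on entry into stationary phase) increases activity whenever `r38` exceeds the current activity, and a smaller `k38`, i.e. a promoter sequence with higher σ38 affinity, strengthens that response.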


Blood, 2021, Vol 138 (Supplement 1), p. 3969
Author(s): So Hyun Julie Park, Mingming Cao, Yankai Zhang, Vivien A. Sheehan, Gang Bao

Abstract Introduction: Several gene-editing strategies have been developed to cure sickle cell disease (SCD), including the use of CRISPR/Cas9 to edit beta-globin (HBB), gamma-globin (HBG), or B-cell lymphoma/leukemia 11A (BCL11A) in hematopoietic stem and progenitor cells (HSPCs) from patients with SCD. Although high gene-editing rates can be achieved and off-target effects reduced, new challenges in applying gene-editing strategies, including unintended gene modifications, must be addressed to cure SCD with high efficacy and safety. To date, owing to limitations in sequencing methods, studies of CRISPR/Cas9 genome editing for treating SCD have identified only small insertions/deletions (INDELs); the extent and consequences of unintended large gene modifications are generally unknown. Here we provide accurate quantification and profiling of unintended gene modifications caused by Cas9-induced double-stranded breaks (DSBs) in SCD HSPCs, including large deletions, insertions, and complex chromosomal rearrangements, and compare different editing approaches. Methods: R-66S gRNA targets the sickle mutation in HBB. R-02 gRNA generates a DSB 16 bp away from the sickle mutation site. SD-02 gRNA introduces a 13-bp Hereditary Persistence of Fetal Hemoglobin (HPFH) deletion as a major INDEL in the HBG1/HBG2 promoter to reactivate fetal hemoglobin (HbF). BCL11A gRNA targets the GATA1 site at the BCL11A erythroid enhancer to induce HbF. The R-66S, R-02, SD-02, and BCL11A gRNAs were each complexed with SpyCas9 and delivered as ribonucleoproteins (RNPs) to SCD HSPCs. To accurately quantify CRISPR/Cas9-induced large modifications in gene-edited SCD HSPCs, we used PacBio Single Molecule, Real-Time (SMRT) sequencing with Unique Molecular Identifiers (UMIs). The 5-6 kb region around the Cas9 cut site was dual-UMI tagged using two PCR cycles. The second and third PCRs were performed with minimal cycle numbers to enrich the UMI-tagged template molecules.
The SMRTbell library, composed of edited and unedited SCD HSPC samples, was sequenced on a PacBio Sequel II 8M flowcell in circular consensus sequencing (CCS) mode. The PacBio subreads were converted to HiFi reads and subjected to UMI consensus read generation and variant calling. Results: SMRT-seq with UMIs revealed high rates and broad spectra of unintended large deletions (> 200 bp) induced by Cas9 cutting at the HBB, HBG1, and BCL11A genes in RNP-treated samples: 31.7% with R-66S RNP, 17.4% with R-02 RNP, 13.3% with SD-02 RNP, and 40% with BCL11A RNP. The large deletions have a very broad distribution of sizes and locations. In addition, we found large insertions (> 50 bp) and local complex chromosomal rearrangements at the Cas9 cut sites. Therefore, the current assessment of gene-editing rates using short-read next-generation sequencing (NGS) misses a substantial proportion of Cas9-induced large gene modifications, resulting in inaccurate measures of both allele and genotype frequencies. Discussion: We found that unintended on-target large deletions occur at high rates at HBB, HBG1, and BCL11A in gene-edited SCD HSPCs. These results raise significant safety concerns regarding gene editing of HSPCs to treat SCD. Our results demonstrate the importance of detecting and quantifying all possible CRISPR/Cas9 gene-editing outcomes to ensure the efficient and safe translation of gene-editing-based strategies to cure SCD and other human diseases. Additional work is required to determine the functional consequences of the unintended gene modifications and the persistence of the unintended large gene modifications at the on-target cut sites.
Disclosures: Sheehan: Forma Therapeutics: Research Funding; Beam Therapeutics: Research Funding; Novartis: Research Funding.
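The UMI step can be pictured as grouping reads by their molecular tag, calling one consensus outcome per original template molecule, and only then counting large deletions, so PCR duplicates and individual read errors do not inflate the rate. The sketch below is a simplified, hypothetical version: it assumes each read has already been reduced to a (UMI, largest deletion length in bp) pair, treats the dual-UMI pair as a single string key, uses a strict-majority vote per UMI family, and applies the abstract's > 200 bp threshold. It is not the PacBio tooling or the authors' consensus pipeline, which operate on full HiFi read sequences.

```python
from collections import Counter, defaultdict


def consensus_calls(reads, min_family_size=2):
    """Collapse reads into one call per UMI family.

    reads: iterable of (umi, largest_deletion_bp) pairs, 0 meaning no deletion.
    Families smaller than min_family_size, or without a strict-majority call,
    are dropped rather than guessed.
    """
    families = defaultdict(list)
    for umi, del_bp in reads:
        families[umi].append(del_bp)
    calls = {}
    for umi, sizes in families.items():
        if len(sizes) < min_family_size:
            continue
        size, count = Counter(sizes).most_common(1)[0]
        if 2 * count > len(sizes):  # strict majority
            calls[umi] = size
    return calls


def large_deletion_fraction(calls, threshold_bp=200):
    """Fraction of consensus molecules whose deletion exceeds threshold_bp."""
    if not calls:
        return 0.0
    return sum(1 for bp in calls.values() if bp > threshold_bp) / len(calls)
```

Deletion sizes called from individual reads can differ by a few base pairs, so in practice sizes would be binned or breakpoints clustered before the majority vote.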


2021
Author(s): Alberto Cenci, Mairenys Concepción-Hernández, Geert Angenon, Mathieu Rouard

GDSL-type esterase/lipase (GELP) enzymes have multiple functions in plants, ranging from developmental processes to responses to biotic and abiotic stresses. Genes encoding GELPs belong to a large gene family with several tens to more than a hundred members per species in angiosperms. Here, we applied iterative phylogenetic analyses to three monocot and five dicot genomes to identify 10 main clusters subdivided into 44 expert-curated reference orthogroups (OGs). Our results show that some GELP OGs expanded while others were maintained as single-copy genes. This semi-automatic approach proves effective for characterizing large gene families and provides a solid classification framework for the GELP members in angiosperms. The orthogroup-based reference will be useful for comparative studies, for inferring gene functions and for better understanding the evolutionary history of this gene family.
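Once genes are assigned to orthogroups, the "expanded versus single-copy" distinction comes down to counting copies per species. The sketch below assumes a hypothetical input of (gene ID, species, orthogroup) tuples and uses a deliberately crude rule: an orthogroup is single-copy if every species has exactly one member, expanded if any species has two or more, and partial otherwise. This rule is an assumption for illustration and is not the classification used in the study.

```python
from collections import defaultdict


def copy_numbers(assignments):
    """{orthogroup: {species: number of member genes}} from (gene, species, og) tuples."""
    table = defaultdict(lambda: defaultdict(int))
    for _gene, species, og in assignments:
        table[og][species] += 1
    return table


def classify_orthogroups(assignments, species):
    """Crude label per orthogroup based on per-species copy numbers (assumed rule)."""
    labels = {}
    for og, counts in copy_numbers(assignments).items():
        copies = [counts.get(sp, 0) for sp in species]
        if all(c == 1 for c in copies):
            labels[og] = "single-copy"
        elif any(c > 1 for c in copies):
            labels[og] = "expanded"
        else:
            labels[og] = "partial"  # absent from some species, never duplicated
    return labels
```

Passing the species list explicitly matters: an orthogroup with no member in some genome would otherwise look single-copy simply because that species never appears in its counts.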


2021, Vol 10 (40)
Author(s): Lina Assad, Karolis Matjošaitis, Harald Gross

Escherichia coli Stbl4 is widely used as a laboratory strain for heterologous expression of large gene clusters. Because no genome sequence was publicly available, we report here the draft genome sequence of Stbl4, including its F-plasmid. It should serve as a useful reference for researchers working with Stbl4.


2021, Vol 11 (1)
Author(s): Marta Lucchetta, Marco Pellegrini

Abstract Computational drug repositioning aims to rank and select existing drugs for novel diseases or for novel uses in old diseases. In silico drug screening could considerably speed up the shortlisting of promising candidates in response to outbreaks of diseases such as COVID-19, for which no satisfactory cure has yet been found. We describe DrugMerge, a methodology for preclinical computational drug repositioning based on merging multiple drug rankings obtained with an ensemble of disease active subnetworks. DrugMerge uses differential transcriptomic data on drugs and diseases in the context of a large gene co-expression network. Experiments with four benchmark diseases show that, in all four cases, our method ranks in first position drugs in clinical use for the specified disease. Applying DrugMerge to COVID-19 produced rankings with many drugs currently in clinical trials for COVID-19 in top positions, thus showing that DrugMerge can mimic human expert judgment.
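The abstract does not say how DrugMerge merges the rankings produced from different active subnetworks. A standard rank-aggregation technique that illustrates the idea is the Borda count, in which each list awards points by position and drugs are re-sorted by total score. The sketch below is that swapped-in technique, not DrugMerge's actual merging rule; the drug names used with it are placeholders.

```python
from collections import defaultdict


def borda_merge(rankings):
    """Merge best-first ranked lists with a Borda count.

    In a list of length n, the item at 0-based position p earns n - p points;
    items absent from a list earn nothing from it. Ties break alphabetically.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += n - pos
    return sorted(scores, key=lambda item: (-scores[item], item))
```

Because longer lists award more points, subnetworks that rank more drugs weigh more heavily in this scheme; normalizing each list's scores by its length is a common variant when that is undesirable.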


2021, Vol 22 (S11)
Author(s): Sung-Gwon Lee, Dokyun Na, Chungoo Park

Abstract Background Recently, high-throughput RNA sequencing has been used extensively to elucidate the transcriptome landscape and dynamics of cell types in different species. In particular, for most non-model organisms lacking complete reference genomes with high-quality annotation of genetic information, reference-free (RF) de novo transcriptome analyses, rather than reference-based (RB) approaches, are widely used, and RF analyses have contributed substantially to understanding the mechanisms regulating key biological processes and functions. To date, numerous bioinformatics studies have assessed the workflow, production rate, and completeness of transcriptome assemblies within and between RF and RB datasets. However, the degree of consistency and variability of gene expression levels obtained through these two approaches has not been adequately documented. Results In the present study, we evaluated the differences in expression profiles obtained with RF and RB approaches and found that the former tends to be satisfactorily replaced by the latter, both with respect to transcriptome repertoires and from a gene expression quantification perspective. However, we urge caution in interpreting these findings: genes that are lowly expressed, have long coding sequences, or belong to large gene families must be validated carefully whenever gene expression levels are calculated using the RF method. Conclusions Our empirical results make important contributions toward addressing transcriptome-related biological questions in non-model organisms.

