manual curation Latest Research Papers

Incidental germline findings during molecular profiling of tumor tissues for precision oncology: molecular survey and methodological obstacles

Journal of Translational Medicine ◽ 10.1186/s12967-022-03230-z ◽ 2022 ◽ Vol 20 (1)

Author(s): Alexandra Lebedeva ◽ Yulia Shaykhutdinova ◽ Daria Seriak ◽ Ekaterina Ignatova ◽ Ekaterina Rozhavskaya ◽ ...

Keyword(s): Normal Tissue ◽ Sanger Sequencing ◽ Computational Prediction ◽ Bioinformatic Analysis ◽ Molecular Profiling ◽ Precision Oncology ◽ Germline Variants ◽ Manual Curation ◽ Uncertain Significance ◽ Tumor Types

Abstract Background: A fraction of patients referred for complex molecular profiling of biopsied tumors may harbor germline variants in genes associated with the development of hereditary cancer syndromes (HCS). Neither the bioinformatic analysis nor the reporting of such incidental germline findings is standardized. Methods: Data from next-generation sequencing (NGS) of biopsied tumor samples referred for complex molecular profiling were analyzed for germline variants in HCS-associated genes. Analysis of variant origin was performed using bioinformatic algorithms followed by manual curation. When possible, the origin of the variant was validated by Sanger sequencing of a sample of normal tissue. The variants’ pathogenicity was assessed according to ACMG/AMP. Results: Tumors were sampled from 183 patients (males: 75 [41.0%]; females: 108 [59.0%]; mean [SD] age, 57.7 [13.3] years) and analysed by targeted NGS. The most common tumor types were colorectal (19%), pancreatic (13%), and lung cancer (10%). A total of 56 sequence variants in genes associated with HCS were detected in 40 patients. Of them, 17 variants found in 14 patients were predicted to be of germline origin, with 6 variants interpreted as pathogenic (PV) or likely pathogenic (LPV), and 9 as variants of uncertain significance (VUS). For 41 out of 42 (97%) missense variants in HCS-associated genes, the results of computational prediction of variant origin were concordant with those of experimental examination. We estimate that Sanger sequencing of a sample of normal tissue would be required for ~1–7% of the total assessed cases (those with PV or LPV), while referral for genetic counselling would be needed in ~2–15% of the total assessed cases (PV, LPV or VUS found in HCS genes). Conclusion: Incidental findings of pathogenic germline variants are common in data from cancer patients referred for complex molecular profiling. We propose an algorithm for the management of patients with newly detected variants in genes associated with HCS.

Identifying the essential nutritional requirements of the probiotic bacteria Bifidobacterium animalis and Bifidobacterium longum through genome-scale modeling

npj Systems Biology and Applications ◽ 10.1038/s41540-021-00207-4 ◽ 2021 ◽ Vol 7 (1)

Author(s): Marie Schöpping ◽ Paula Gaspar ◽ Ana Rute Neves ◽ Carl Johan Franzén ◽ Ahmad A. Zeidan

Keyword(s): Bifidobacterium Longum ◽ Defined Medium ◽ Nutritional Requirements ◽ Modeling Framework ◽ Carbohydrate Utilization ◽ Bifidobacterium Animalis ◽ Manual Curation ◽ Constraint Based Modeling ◽ Complex Culture ◽ Genome Scale

Abstract Although bifidobacteria are widely used as probiotics, their metabolism and physiology remain to be explored in depth. In this work, strain-specific genome-scale metabolic models were developed for two industrially and clinically relevant bifidobacteria, Bifidobacterium animalis subsp. lactis BB-12® and B. longum subsp. longum BB-46, and subjected to iterative cycles of manual curation and experimental validation. A constraint-based modeling framework was used to probe the metabolic landscape of the strains and identify their essential nutritional requirements. Both strains showed an absolute requirement for pantethine as a precursor for coenzyme A biosynthesis. Menaquinone-4 was found to be essential only for BB-46 growth, whereas nicotinic acid was only required by BB-12®. The model-generated insights were used to formulate a chemically defined medium that supports the growth of both strains to the same extent as a complex culture medium. Carbohydrate utilization profiles predicted by the models were experimentally validated. Furthermore, model predictions were quantitatively validated in the newly formulated medium in lab-scale batch fermentations. The models and the formulated medium represent valuable tools to further explore the metabolism and physiology of the two species, investigate the mechanisms underlying their health-promoting effects and guide the optimization of their industrial production processes.

Vitis OneGenE: A Causality-Based Approach to Generate Gene Networks in Vitis vinifera Sheds Light on the Laccase and Dirigent Gene Families

Biomolecules ◽ 10.3390/biom11121744 ◽ 2021 ◽ Vol 11 (12) ◽ pp. 1744

Author(s): Stefania Pilati ◽ Giulia Malacarne ◽ David Navarro-Payá ◽ Gabriele Tomè ◽ Laura Riscica ◽ ...

Keyword(s): Gene Networks ◽ Functional Characterization ◽ Gene Families ◽ Stilbene Synthase ◽ Breeding Programs ◽ Transcriptomic Data ◽ Network Analyses ◽ Manual Curation ◽ Inference Methods

The abundance of transcriptomic data and the development of causal inference methods have paved the way for gene network analyses in grapevine. Vitis OneGenE is a transcriptomic data mining tool that finds direct correlations between genes, thus producing association networks. As a proof of concept, the stilbene synthase gene regulatory network obtained with OneGenE has been compared with published co-expression analysis and experimental data, including cistrome data for MYB stilbenoid regulators. As a case study, the two secondary metabolism pathways of stilbenoids and lignin synthesis were explored. Several isoforms of laccase, peroxidase, and dirigent protein genes, putatively involved in the final oxidative oligomerization steps, were identified as specifically belonging to one or the other of these pathways. Manual curation of the predicted sequences using the latest available genome assembly, together with the integration of phylogenetic and OneGenE analyses, identified a group of laccases exclusively present in grapevine and related to stilbenoids. Here we show how network analysis by OneGenE can accelerate knowledge discovery by suggesting new candidates for functional characterization and application in breeding programs.

PURC v2.0: a program for improved sequence inference for polyploid phylogenetics and other manifestations of the multiple-copy problem

10.1101/2021.11.18.468666 ◽ 2021

Author(s): Peter W Schafran ◽ Fay-Wei W Li ◽ Carl Rothfels

Keyword(s): Traditional Approach ◽ Consensus Sequence ◽ Operational Taxonomic Unit ◽ Phylogenetic Inference ◽ Biological Sequences ◽ Multiple Copy ◽ Sequencing Data ◽ Sequencing Errors ◽ Manual Curation ◽ Similarity Thresholds

Inferring the true biological sequences from amplicon mixtures remains a difficult bioinformatic problem. The traditional approach is to cluster sequencing reads by similarity thresholds and treat the consensus sequence of each cluster as an "operational taxonomic unit" (OTU). Recently, this approach has been improved upon by model-based methods that correct PCR and sequencing errors in order to infer "amplicon sequence variants" (ASVs). To date, ASV approaches have been used primarily in metagenomics, but they are also useful for identifying allelic or paralogous variants and for determining homeologs in polyploid organisms. To facilitate the usage of ASV methods among polyploidy researchers, we incorporated ASV inference alongside OTU clustering in PURC v2.0, a major update to PURC (Pipeline for Untangling Reticulate Complexes). In addition to preserving original PURC functions, PURC v2.0 allows users to process PacBio CCS/HiFi reads through DADA2 to generate and annotate ASVs for multiplexed data, with outputs including separate alignments for each locus ready for phylogenetic inference. In addition, PURC v2.0 features faster demultiplexing than the original version and has been updated to be compatible with Python 3. In this chapter we present results indicating that PURC v2.0 (using the ASV approach) is more likely to infer the correct biological sequences in comparison to the earlier OTU-based PURC, and describe how to prepare sequencing data, run PURC v2.0 under several different modes, and interpret the output. We expect that PURC v2.0 will provide biologists with a method for generating multi-locus "moderate data" datasets that are large enough to be phylogenetically informative and small enough for manual curation.
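The traditional OTU approach described in this abstract (cluster reads by a similarity threshold, then take each cluster's consensus) can be illustrated with a minimal Python sketch. This is not PURC's implementation: the function names, the greedy seed-based strategy, and the restriction to pre-aligned, equal-length reads are all assumptions made for illustration.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned, equal-length reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_reads(reads, threshold):
    """Greedy clustering: a read joins the first cluster whose seed
    (first member) it matches at or above the threshold; otherwise it
    starts a new cluster."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            if identity(read, cluster[0]) >= threshold:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def consensus(cluster):
    """Most common base at each position across the cluster's reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


# Toy reads (hypothetical data): three near-identical reads and one outlier.
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTTTTGGGGG"]
clusters = cluster_reads(reads, threshold=0.9)
otus = [consensus(c) for c in clusters]
```

In this toy input the third read differs from the seed at one position, so it is absorbed into the first cluster and its variant base is outvoted in the consensus. That loss of low-frequency variants is the kind of behaviour the model-based ASV methods mentioned in the abstract aim to avoid.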

NCBITaxonomy.jl - rapid biological names finding and reconciliation

10.32942/osf.io/uvbfj ◽ 2021

Author(s): Timothée Poisot ◽ Rory Gibb ◽ Sadie Jane Ryan ◽ Colin Carlson

Keyword(s): Quality Of Life ◽ Programming Language ◽ String Matching ◽ R Package ◽ Amount Of Information ◽ Manual Curation

NCBITaxonomy.jl is a package designed to facilitate the reconciliation and cleaning of taxonomic names, using a local copy of the NCBI taxonomic backbone (Federhen 2012, Schoch et al. 2020). The basic search functions are coupled with quality-of-life functions, including case-insensitive search and custom fuzzy string matching, to increase the amount of information that can be extracted automatically while allowing efficient manual curation and inspection of results. NCBITaxonomy.jl works with version 1.6 of the Julia programming language (Bezanson et al. 2017), and relies on the Apache Arrow format to store a local copy of the NCBI raw taxonomy files. The design of NCBITaxonomy.jl has been inspired by similar efforts, like the R package taxadb (Norman et al. 2020), which provides an offline alternative to packages like taxize (Chamberlain and Szöcs 2013).
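The idea of combining case-insensitive search with fuzzy string matching can be sketched in a few lines. The example below is written in Python with the standard-library `difflib` module and does not reproduce the NCBITaxonomy.jl API; the function name `match_name`, the 0.8 cutoff, and the three-name list are assumptions for illustration only.

```python
import difflib


def match_name(query, names, cutoff=0.8):
    """Return the best-matching name, or None if nothing is close enough.

    Tries an exact case-insensitive match first, then falls back to
    fuzzy matching on the lower-cased names.
    """
    lookup = {name.lower(): name for name in names}
    key = query.strip().lower()
    if key in lookup:
        return lookup[key]
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None


# A tiny stand-in for a local taxonomy table (hypothetical subset).
names = ["Diaphorina citri", "Danaus plexippus", "Vitis vinifera"]

exact = match_name("diaphorina CITRI", names)   # differs only in case
fuzzy = match_name("Vitis vinefera", names)     # one-letter typo
missing = match_name("Homo sapiens", names)     # not in the table
```

Returning `None` for poor matches, rather than the nearest name regardless of score, leaves the ambiguous cases for manual inspection.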

Annotation of Putative Circadian Rhythm-Associated Genes in Diaphorina citri (Hemiptera : Liviidae)

10.1101/2021.10.09.463768 ◽ 2021

Author(s): Max Reynolds ◽ Lucas de Oliveira ◽ Thompson Paris ◽ Chad Vosburg ◽ Crissy Massimino ◽ ...

Keyword(s): Circadian Rhythm ◽ Crop Yields ◽ Bacterial Pathogen ◽ Danaus Plexippus ◽ Diaphorina Citri ◽ Citrus Greening Disease ◽ Manual Curation ◽ Molecular Therapeutics ◽ Candidatus Liberibacter ◽ Liberibacter Asiaticus

The circadian rhythm is a process involving multiple genes that generates an internal molecular clock, allowing organisms to anticipate environmental conditions produced by the earth's rotation on its axis. This report presents the results of the manual curation of twenty-seven genes likely associated with circadian rhythm in the genome of Diaphorina citri, the Asian citrus psyllid. This insect acts as the vector of the bacterial pathogen Candidatus Liberibacter asiaticus (CLas), the causal agent of citrus greening disease (Huanglongbing). This disease is the most severe detriment to citrus industries and has drastically decreased crop yields worldwide. Based on the genes identified in the psyllid genome, namely cry1 and cry2, D. citri likely possesses a circadian model similar to that of the lepidopteran butterfly, Danaus plexippus. Manual annotation of these genes will allow future molecular therapeutics to be developed that can disrupt the psyllid biology.

Humans and machines in biomedical knowledge curation: hypertrophic cardiomyopathy molecular mechanisms’ representation

BioData Mining ◽ 10.1186/s13040-021-00279-2 ◽ 2021 ◽ Vol 14 (1)

Author(s): Mila Glavaški ◽ Lazar Velicki

Keyword(s): Hypertrophic Cardiomyopathy ◽ Molecular Mechanisms ◽ Centrality Measures ◽ Biomedical Knowledge ◽ Topological Parameters ◽ Manual Curation ◽ Single Reading ◽ Extraction Performance ◽ Reading System ◽ High Level

Abstract Background: Biomedical knowledge is dispersed in scientific literature and is growing constantly. Curation is the extraction of knowledge from unstructured data into a computable form and can be done manually or automatically. Hypertrophic cardiomyopathy (HCM) is the most common inherited cardiac disease, with genotype–phenotype associations still incompletely understood. We compared human- and machine-curated HCM molecular mechanisms’ models and examined the performance of different machine approaches for that task. Results: We created six models representing HCM molecular mechanisms using different approaches and made them publicly available, analyzed them as networks, and tried to explain the models’ differences by analyzing factors that affect the quality of machine-curated models (query constraints and reading systems’ performance). A result of this work is also the Interactive HCM map, the only publicly available knowledge resource dedicated to HCM. Sizes and topological parameters of the networks differed notably, and a low consensus was found in terms of centrality measures between networks. Consensus about the most important nodes was achieved only with respect to one element (calcium). Models with a reduced level of noise were generated and cooperatively working elements were detected. REACH and TRIPS reading systems showed much higher accuracy than Sparser, but at the cost of extraction performance. TRIPS proved to be the best single reading system for text segments about HCM, in terms of the compromise between accuracy and extraction performance. Conclusions: Different approaches in curation can produce models of the same disease with diverse characteristics, and they give rise to utterly different conclusions in subsequent analysis. The final purpose of the model should direct the choice of curation techniques. Manual curation represents the gold standard for information extraction in biomedical research and is most suitable when only high-quality elements for models are required. Automated curation provides more substance, but a high level of noise is expected. Different curation strategies can reduce the level of human input needed. Biomedical knowledge would benefit overwhelmingly, especially given its rapid growth, if computers were able to assist in analysis on a larger scale.

O-JMeSH: creating a bilingual English-Japanese controlled vocabulary of MeSH UIDs through machine translation and mutual information

Genomics & Informatics ◽ 10.5808/gi.21014 ◽ 2021 ◽ Vol 19 (3) ◽ pp. e26

Author(s): Felipe Soares ◽ Yuka Tateisi ◽ Terue Takatsuki ◽ Atsuko Yamaguchi

Keyword(s): Mutual Information ◽ Machine Translation ◽ Computational Approach ◽ Controlled Vocabulary ◽ Bilingual Dictionaries ◽ Transformation Rules ◽ Manual Curation ◽ Mesh Terms ◽ Average Accuracy ◽ The Individual

Previous approaches to creating a controlled vocabulary for Japanese have resorted to existing bilingual dictionaries and transformation rules to allow such mappings. However, given the possible new terms introduced due to coronavirus disease 2019 (COVID-19) and the emphasis on respiratory and infection-related terms, coverage might not be guaranteed. In this work, we propose creating a Japanese bilingual controlled vocabulary based on MeSH terms assigned to COVID-19 related publications. To this end, we resorted to manual curation of several bilingual dictionaries and a computational approach based on machine translation of sentences containing such terms and the ranking of possible translations for the individual terms by mutual information. Our results show that we achieved nearly 99% occurrence coverage in LitCovid, while our computational approach presented an average accuracy of 63.33% for all terms, and 84.51% for drugs and chemicals.
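Ranking candidate translations by an association score can be sketched as follows. The abstract does not specify how mutual information was computed, so this example substitutes pointwise mutual information (PMI) over co-occurrence counts as a stand-in; the function name, the input format, and the placeholder tokens are hypothetical.

```python
import math
from collections import Counter


def rank_by_pmi(pairs, source_term):
    """Rank candidate translations of source_term by pointwise mutual information.

    pairs: list of (source_term, candidate) co-occurrences, e.g. one entry
    per translated sentence in which both were observed.
    PMI(s, t) = log2( p(s, t) / (p(s) * p(t)) ).
    """
    total = len(pairs)
    joint = Counter(pairs)
    source_counts = Counter(s for s, _ in pairs)
    target_counts = Counter(t for _, t in pairs)
    scores = {}
    for (s, t), n in joint.items():
        if s == source_term:
            p_joint = n / total
            p_indep = (source_counts[s] / total) * (target_counts[t] / total)
            scores[t] = math.log2(p_joint / p_indep)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder tokens stand in for real English terms and Japanese candidates.
pairs = (
    [("fever", "cand_a")] * 3
    + [("fever", "cand_b")] * 1
    + [("cough", "cand_b")] * 4
)
ranked = rank_by_pmi(pairs, "fever")
```

Here `cand_a` ranks first because it co-occurs only with "fever", whereas `cand_b` appears mostly alongside a different term and so receives a negative score.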

Beyond the reach of homology: successive computational filters find yeast pheromone genes

10.1101/2021.09.28.462209 ◽ 2021

Author(s): Sriram Srikant ◽ Rachelle Gaudet ◽ Andrew W Murray

Keyword(s): Proteolytic Processing ◽ Mating Types ◽ Computational Pipeline ◽ Gene Encoding ◽ Closely Related Species ◽ Homologous Sequences ◽ Manual Curation ◽ Fungal Genomes ◽ Strong Candidate ◽ Time Required

The mating of fungi depends on pheromones that mediate communication between two mating types. Most species use short peptides as pheromones, which are either unmodified (e.g., α-factor in Saccharomyces cerevisiae) or C-terminally farnesylated (e.g., a-factor in S. cerevisiae). Peptide pheromones have been found by genetics or biochemistry in a small number of fungi, but their short sequences and modest conservation make it impossible to detect homologous sequences in most species. To overcome this problem, we used a four-step computational pipeline to identify candidate a-factor genes in sequenced genomes of the Saccharomycotina, the fungal clade that contains most of the yeasts: we require that candidate genes have a C-terminal prenylation motif, are fewer than 100 amino acids long, contain a proteolytic processing motif upstream of the potential mature pheromone sequence, and that closely related species contain highly conserved homologs of the potential mature pheromone sequence. Additional manual curation exploits the observation that many species carry more than one a-factor gene, encoding identical or nearly identical pheromones. From 332 fungal genomes, we identified strong candidate pheromone genes in 238 genomes, covering 13 clades that are separated from each other by at least 100 million years, the time required for evolution to remove detectable sequence homology. For one small clade, the Yarrowia, we demonstrated that our algorithm found the a-factor genes: deleting all four related genes in the a-mating type of Yarrowia lipolytica prevents mating.
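The successive-filter idea in this abstract can be sketched as a chain of predicates applied to predicted protein sequences. The example below is a minimal illustration and not the authors' pipeline: it implements only the length and C-terminal motif filters, encodes the prenylation motif as a simplified CaaX pattern (the residue set chosen for "a" is an assumption), and leaves the proteolytic-processing and conservation filters as caller-supplied functions because the abstract does not define them.

```python
import re

# Simplified CaaX pattern: Cys, two aliphatic residues, any final residue.
# The aliphatic set used here is an assumption for illustration.
CAAX = re.compile(r"C[AVILM][AVILM].$")


def is_short(protein, max_len=100):
    """Keep proteins fewer than max_len amino acids long."""
    return len(protein) < max_len


def has_caax(protein):
    """Keep proteins ending in the simplified CaaX motif."""
    return bool(CAAX.search(protein))


def filter_candidates(proteins, extra_filters=()):
    """Apply filters one after another; each sees only the previous survivors.

    proteins: dict mapping a gene name to its amino-acid sequence.
    extra_filters: additional predicates (hypothetical hooks for the
    processing-motif and conservation steps).
    """
    survivors = dict(proteins)
    for keep in (is_short, has_caax, *extra_filters):
        survivors = {name: seq for name, seq in survivors.items() if keep(seq)}
    return survivors


# Synthetic sequences, not real genes.
proteins = {
    "short_caax": "MKTAYDPACVIA",
    "short_no_motif": "MKTAYDPAGGGG",
    "long_caax": "M" * 120 + "CVIA",
}
candidates = filter_candidates(proteins)
```

Ordering cheap filters first keeps the expensive steps, such as a cross-species conservation check, confined to a small set of survivors.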

Systems biology analysis of lung fibrosis-related genes in the bleomycin mouse model

Scientific Reports ◽ 10.1038/s41598-021-98674-6 ◽ 2021 ◽ Vol 11 (1)

Author(s): Dmitri Toren ◽ Hagai Yanai ◽ Reem Abu Taha ◽ Gabriela Bunu ◽ Eugen Ursu ◽ ...

Keyword(s): Data Mining ◽ Systems Biology ◽ Pulmonary Fibrosis ◽ Lung Fibrosis ◽ Therapeutic Strategy ◽ Meta Analysis ◽ Systematic Analysis ◽ Manual Curation ◽ Age Related ◽ Human Pathology

Abstract Tissue fibrosis is a major driver of pathology in aging and is involved in numerous age-related diseases. The lungs are particularly susceptible to fibrotic pathology, which is currently difficult to treat. The mouse bleomycin-induced fibrosis model was developed to investigate lung fibrosis and has been widely used over the years. However, a systematic analysis of the accumulated results has not been performed. We undertook comprehensive data mining and subsequent manual curation, resulting in a collection of 213 genes (available at the TiRe database, www.tiredb.org), which when manipulated had a clear impact on bleomycin-induced lung fibrosis. Our meta-analysis highlights the age component in pulmonary fibrosis and strong links of related genes with longevity. The results support the validity of the bleomycin model to human pathology and suggest the importance of a multi-target therapeutic strategy for pulmonary fibrosis treatment.
