manual curation
Recently Published Documents


TOTAL DOCUMENTS

234
(FIVE YEARS 141)

H-INDEX

18
(FIVE YEARS 7)

2022 ◽  
Vol 20 (1) ◽  
Author(s):  
Alexandra Lebedeva ◽  
Yulia Shaykhutdinova ◽  
Daria Seriak ◽  
Ekaterina Ignatova ◽  
Ekaterina Rozhavskaya ◽  
...  

Abstract. Background: A fraction of patients referred for complex molecular profiling of biopsied tumors may harbor germline variants in genes associated with the development of hereditary cancer syndromes (HCS). Neither the bioinformatic analysis nor the reporting of such incidental germline findings is standardized. Methods: Data from next-generation sequencing (NGS) of biopsied tumor samples referred for complex molecular profiling were analyzed for germline variants in HCS-associated genes. Variant origin was analyzed using bioinformatic algorithms followed by manual curation. When possible, the origin of the variant was validated by Sanger sequencing of a sample of normal tissue. The variants’ pathogenicity was assessed according to ACMG/AMP. Results: Tumors were sampled from 183 patients (males: 75 [41.0%]; females: 108 [59.0%]; mean [SD] age, 57.7 [13.3] years) and analyzed by targeted NGS. The most common tumor types were colorectal (19%), pancreatic (13%), and lung cancer (10%). A total of 56 sequence variants in genes associated with HCS were detected in 40 patients. Of these, 17 variants found in 14 patients were predicted to be of germline origin, with 6 variants interpreted as pathogenic (PV) or likely pathogenic (LPV), and 9 as variants of uncertain significance (VUS). For 41 of 42 (97%) missense variants in HCS-associated genes, the results of computational prediction of variant origin were concordant with those of experimental examination. We estimate that Sanger sequencing of a sample of normal tissue would be required for ~1–7% of the total assessed cases with PV or LPV, whereas referral for genetic counselling would be necessary in ~2–15% of the total assessed cases (PV, LPV or VUS found in HCS genes). Conclusion: Incidental findings of pathogenic germline variants are common in data from cancer patients referred for complex molecular profiling.
We propose an algorithm for the management of patients with newly detected variants in genes associated with HCS.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Marie Schöpping ◽  
Paula Gaspar ◽  
Ana Rute Neves ◽  
Carl Johan Franzén ◽  
Ahmad A. Zeidan

Abstract. Although bifidobacteria are widely used as probiotics, their metabolism and physiology remain to be explored in depth. In this work, strain-specific genome-scale metabolic models were developed for two industrially and clinically relevant bifidobacteria, Bifidobacterium animalis subsp. lactis BB-12® and B. longum subsp. longum BB-46, and subjected to iterative cycles of manual curation and experimental validation. A constraint-based modeling framework was used to probe the metabolic landscape of the strains and identify their essential nutritional requirements. Both strains showed an absolute requirement for pantethine as a precursor for coenzyme A biosynthesis. Menaquinone-4 was found to be essential only for BB-46 growth, whereas nicotinic acid was only required by BB-12®. The model-generated insights were used to formulate a chemically defined medium that supports the growth of both strains to the same extent as a complex culture medium. Carbohydrate utilization profiles predicted by the models were experimentally validated. Furthermore, model predictions were quantitatively validated in the newly formulated medium in lab-scale batch fermentations. The models and the formulated medium represent valuable tools to further explore the metabolism and physiology of the two species, investigate the mechanisms underlying their health-promoting effects and guide the optimization of their industrial production processes.


Biomolecules ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. 1744
Author(s):  
Stefania Pilati ◽  
Giulia Malacarne ◽  
David Navarro-Payá ◽  
Gabriele Tomè ◽  
Laura Riscica ◽  
...  

The abundance of transcriptomic data and the development of causal inference methods have paved the way for gene network analyses in grapevine. Vitis OneGenE is a transcriptomic data mining tool that finds direct correlations between genes, thus producing association networks. As a proof of concept, the stilbene synthase gene regulatory network obtained with OneGenE has been compared with published co-expression analysis and experimental data, including cistrome data for MYB stilbenoid regulators. As a case study, the two secondary metabolism pathways of stilbenoids and lignin synthesis were explored. Several isoforms of laccase, peroxidase, and dirigent protein genes, putatively involved in the final oxidative oligomerization steps, were identified as specifically belonging to either one of these pathways. Manual curation of the predicted sequences exploiting the last available genome assembly, and the integration of phylogenetic and OneGenE analyses, identified a group of laccases exclusively present in grapevine and related to stilbenoids. Here we show how network analysis by OneGenE can accelerate knowledge discovery by suggesting new candidates for functional characterization and application in breeding programs.


2021 ◽  
Author(s):  
Peter W Schafran ◽  
Fay-Wei W Li ◽  
Carl Rothfels

Inferring the true biological sequences from amplicon mixtures remains a difficult bioinformatic problem. The traditional approach is to cluster sequencing reads by similarity thresholds and treat the consensus sequence of each cluster as an "operational taxonomic unit" (OTU). Recently, this approach has been improved upon by model-based methods that correct PCR and sequencing errors in order to infer "amplicon sequence variants" (ASVs). To date, ASV approaches have been used primarily in metagenomics, but they are also useful for identifying allelic or paralogous variants and for determining homeologs in polyploid organisms. To facilitate the usage of ASV methods among polyploidy researchers, we incorporated ASV inference alongside OTU clustering in PURC v2.0, a major update to PURC (Pipeline for Untangling Reticulate Complexes). In addition to preserving original PURC functions, PURC v2.0 allows users to process PacBio CCS/HiFi reads through DADA2 to generate and annotate ASVs for multiplexed data, with outputs including separate alignments for each locus ready for phylogenetic inference. In addition, PURC v2.0 features faster demultiplexing than the original version and has been updated to be compatible with Python 3. In this chapter we present results indicating that PURC v2.0 (using the ASV approach) is more likely to infer the correct biological sequences in comparison to the earlier OTU-based PURC, and describe how to prepare sequencing data, run PURC v2.0 under several different modes, and interpret the output. We expect that PURC v2.0 will provide biologists with a method for generating multi-locus "moderate data" datasets that are large enough to be phylogenetically informative and small enough for manual curation.


2021 ◽  
Author(s):  
Timothée Poisot ◽  
Rory Gibb ◽  
Sadie Jane Ryan ◽  
Colin Carlson

NCBITaxonomy.jl is a package designed to facilitate the reconciliation and cleaning of taxonomic names, using a local copy of the NCBI taxonomic backbone (Federhen 2012, Schoch et al. 2020). The basic search functions are coupled with quality-of-life functions, including case-insensitive search and custom fuzzy string matching, to increase the amount of information that can be extracted automatically while allowing efficient manual curation and inspection of results. NCBITaxonomy.jl works with version 1.6 of the Julia programming language (Bezanson et al. 2017), and relies on the Apache Arrow format to store a local copy of the NCBI raw taxonomy files. The design of NCBITaxonomy.jl has been inspired by similar efforts, like the R package taxadb (Norman et al. 2020), which provides an offline alternative to packages like taxize (Chamberlain and Szöcs 2013).
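
The idea of combining case-insensitive search with fuzzy string matching can be illustrated with a minimal sketch. This is not the NCBITaxonomy.jl API (the package is written in Julia); it is a Python illustration using the standard-library `difflib` module, and the function name `match_name`, the `cutoff` value, and the three-name reference list are assumptions made for the example.

```python
import difflib
from typing import List, Optional

def match_name(query: str, names: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the reference name best matching `query`, or None.

    Hypothetical helper: tries an exact case-insensitive match first,
    then falls back to fuzzy matching on the lower-cased names.
    """
    # Map lower-cased names back to their canonical spelling.
    lookup = {name.lower(): name for name in names}
    key = query.strip().lower()
    if key in lookup:
        return lookup[key]
    # Fuzzy fallback: keep only the single closest name above the cutoff.
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None

# Toy reference list (assumed for illustration only).
reference = ["Diaphorina citri", "Danaus plexippus", "Yarrowia lipolytica"]
print(match_name("diaphorina CITRI", reference))
print(match_name("Danaus plexipus", reference))
```

Names that fall below the cutoff return `None`, which is where manual curation and inspection would take over in a workflow of this kind.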


2021 ◽  
Author(s):  
Max Reynolds ◽  
Lucas de Oliveira ◽  
Thompson Paris ◽  
Chad Vosburg ◽  
Crissy Massimino ◽  
...  

The circadian rhythm is a process involving multiple genes that generates an internal molecular clock, allowing organisms to anticipate environmental conditions produced by the earth's rotation on its axis. This report presents the results of the manual curation of twenty-seven genes likely associated with circadian rhythm in the genome of Diaphorina citri, the Asian citrus psyllid. This insect acts as the vector of the bacterial pathogen Candidatus Liberibacter asiaticus (CLas), the causal agent of citrus greening disease (Huanglongbing). This disease is the most severe detriment to citrus industries and has drastically decreased crop yields worldwide. Based on the genes identified in the psyllid genome, namely cry1 and cry2, D. citri likely possesses a circadian model similar to that of the lepidopteran butterfly, Danaus plexippus. Manual annotation of these genes will allow future molecular therapeutics to be developed that can disrupt the psyllid biology.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Mila Glavaški ◽  
Lazar Velicki

Abstract. Background: Biomedical knowledge is dispersed in scientific literature and is growing constantly. Curation is the extraction of knowledge from unstructured data into a computable form and can be done manually or automatically. Hypertrophic cardiomyopathy (HCM) is the most common inherited cardiac disease, with genotype–phenotype associations still incompletely understood. We compared human- and machine-curated models of HCM molecular mechanisms and examined the performance of different machine approaches for that task. Results: We created six models representing HCM molecular mechanisms using different approaches and made them publicly available, analyzed them as networks, and tried to explain the models’ differences by analyzing factors that affect the quality of machine-curated models (query constraints and reading systems’ performance). A further result of this work is the Interactive HCM map, the only publicly available knowledge resource dedicated to HCM. Sizes and topological parameters of the networks differed notably, and a low consensus was found in terms of centrality measures between networks. Consensus about the most important nodes was achieved only with respect to one element (calcium). Models with a reduced level of noise were generated and cooperatively working elements were detected. The REACH and TRIPS reading systems showed much higher accuracy than Sparser, but at the cost of extraction performance. TRIPS proved to be the best single reading system for text segments about HCM, in terms of the compromise between accuracy and extraction performance. Conclusions: Different approaches to curation can produce models of the same disease with diverse characteristics, and they give rise to utterly different conclusions in subsequent analysis. The final purpose of the model should direct the choice of curation techniques.
Manual curation represents the gold standard for information extraction in biomedical research and is most suitable when only high-quality elements for models are required. Automated curation provides more substance, but a high level of noise is expected. Different curation strategies can reduce the level of human input needed. Biomedical knowledge would benefit overwhelmingly, especially given its rapid growth, if computers were able to assist in analysis on a larger scale.


2021 ◽  
Vol 19 (3) ◽  
pp. e26
Author(s):  
Felipe Soares ◽  
Yuka Tateisi ◽  
Terue Takatsuki ◽  
Atsuko Yamaguchi

Previous approaches to creating a controlled vocabulary for Japanese have resorted to existing bilingual dictionaries and transformation rules to allow such mappings. However, given the possible new terms introduced due to coronavirus disease 2019 (COVID-19) and the emphasis on respiratory and infection-related terms, coverage might not be guaranteed. In this work, we propose creating a Japanese bilingual controlled vocabulary based on MeSH terms assigned to COVID-19 related publications. To this end, we resorted to manual curation of several bilingual dictionaries and a computational approach based on machine translation of sentences containing such terms and the ranking of possible translations for the individual terms by mutual information. Our results show that we achieved nearly 99% occurrence coverage in LitCovid, while our computational approach presented an average accuracy of 63.33% for all terms, and 84.51% for drugs and chemicals.
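
Ranking candidate translations by mutual information can be sketched as follows. The abstract does not say which variant of mutual information was used, so this example assumes pointwise mutual information (PMI) computed over sentence-level co-occurrence; the function name `rank_by_pmi`, the input layout, and the romanized toy tokens are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Set, Tuple

def rank_by_pmi(pairs: List[Tuple[Set[str], List[str]]], term: str) -> List[str]:
    """Rank target tokens by PMI with a source term.

    `pairs` holds (source terms in the sentence, tokens of its translation).
    PMI(term, tok) = log( p(term, tok) / (p(term) * p(tok)) ), with
    probabilities estimated as sentence-level relative frequencies.
    """
    n = len(pairs)
    term_count = sum(1 for src, _ in pairs if term in src)
    target_counts: Counter = Counter()
    joint_counts: Counter = Counter()
    for src, tgt in pairs:
        tokens = set(tgt)  # count each token once per sentence
        target_counts.update(tokens)
        if term in src:
            joint_counts.update(tokens)
    scores = {
        tok: math.log((joint * n) / (term_count * target_counts[tok]))
        for tok, joint in joint_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy parallel data (assumed): romanized tokens stand in for translated text.
toy_pairs = [
    ({"fever"}, ["hatsunetsu", "no"]),
    ({"fever"}, ["hatsunetsu", "ga"]),
    ({"cough"}, ["seki", "no"]),
    ({"cough"}, ["seki", "ga"]),
]
print(rank_by_pmi(toy_pairs, "fever")[0])
```

Function words that appear alongside every term score near zero, while a token that co-occurs mainly with the queried term rises to the top of the ranking.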


2021 ◽  
Author(s):  
Sriram Srikant ◽  
Rachelle Gaudet ◽  
Andrew W Murray

The mating of fungi depends on pheromones that mediate communication between two mating types. Most species use short peptides as pheromones, which are either unmodified (e.g., α-factor in Saccharomyces cerevisiae) or C-terminally farnesylated (e.g., a-factor in S. cerevisiae). Peptide pheromones have been found by genetics or biochemistry in a small number of fungi, but their short sequences and modest conservation make it impossible to detect homologous sequences in most species. To overcome this problem, we used a four-step computational pipeline to identify candidate a-factor genes in sequenced genomes of the Saccharomycotina, the fungal clade that contains most of the yeasts: we require that candidate genes have a C-terminal prenylation motif, are fewer than 100 amino acids long, contain a proteolytic processing motif upstream of the potential mature pheromone sequence, and that closely related species contain highly conserved homologs of the potential mature pheromone sequence. Additional manual curation exploits the observation that many species carry more than one a-factor gene, encoding identical or nearly identical pheromones. From 332 fungal genomes, we identified strong candidate pheromone genes in 238 genomes, covering 13 clades that are separated from each other by at least 100 million years, the time required for evolution to remove detectable sequence homology. For one small clade, the Yarrowia, we demonstrated that our algorithm found the a-factor genes: deleting all four related genes in the a-mating type of Yarrowia lipolytica prevents mating.
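
The first two filters of such a pipeline (a C-terminal prenylation motif and a length below 100 amino acids) can be sketched in a few lines. This is an illustration, not the authors' implementation: the regular expression assumes a simple CaaX definition (cysteine, two aliphatic residues, any final residue), the function names are hypothetical, and the processing-motif and conservation steps are omitted because the abstract does not specify them.

```python
import re
from typing import Dict, List

# Assumed CaaX definition: Cys, two aliphatic residues (A/V/I/L/M), any last residue.
CAAX = re.compile(r"C[AVILM]{2}[A-Z]$")
MAX_LENGTH = 100  # candidates must be fewer than 100 amino acids long

def is_candidate(protein: str) -> bool:
    """Apply the length and C-terminal motif filters to one protein sequence."""
    seq = protein.upper().rstrip("*")  # drop a trailing stop-codon marker if present
    return len(seq) < MAX_LENGTH and CAAX.search(seq) is not None

def filter_candidates(proteins: Dict[str, str]) -> List[str]:
    """Return the names of proteins passing both filters, in input order."""
    return [name for name, seq in proteins.items() if is_candidate(seq)]

# Made-up toy sequences, not real pheromone genes.
toy_proteins = {
    "short_caax": "MKTSSEKKDNYIIKGVFWDPACVIA",
    "no_motif": "MKTSSEKKDNYIIKGVFWDPA",
    "too_long": "M" + "A" * 110 + "CVIA",
}
print(filter_candidates(toy_proteins))
```

Cheap sequence-level filters like these narrow the candidate set before the more expensive cross-species conservation check and the final manual curation.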


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Dmitri Toren ◽  
Hagai Yanai ◽  
Reem Abu Taha ◽  
Gabriela Bunu ◽  
Eugen Ursu ◽  
...  

Abstract. Tissue fibrosis is a major driver of pathology in aging and is involved in numerous age-related diseases. The lungs are particularly susceptible to fibrotic pathology, which is currently difficult to treat. The mouse bleomycin-induced fibrosis model was developed to investigate lung fibrosis and has been widely used over the years. However, a systematic analysis of the accumulated results has not been performed. We undertook comprehensive data mining and subsequent manual curation, resulting in a collection of 213 genes (available at the TiRe database, www.tiredb.org), which when manipulated had a clear impact on bleomycin-induced lung fibrosis. Our meta-analysis highlights the age component in pulmonary fibrosis and strong links of related genes with longevity. The results support the validity of the bleomycin model for human pathology and suggest the importance of a multi-target therapeutic strategy for pulmonary fibrosis treatment.

