COSMOS Next Generation – a public knowledge base leveraging chemical and biological data to support the regulatory assessment of chemicals

2021 ◽  
pp. 100175
Author(s):  
C. Yang ◽  
M.T.D. Cronin ◽  
K.B. Arvidson ◽  
B. Bienfait ◽  
S.J. Enoch ◽  
...  
2019 ◽  
Vol 23 (3) ◽  
pp. 312-319
Author(s):  
P. S. Demenkov ◽  
O. V. Saik ◽  
T. V. Ivanisenko ◽  
N. A. Kolchanov ◽  
A. V. Kochetov ◽  
...  

The development of highly efficient technologies in genomics, transcriptomics, proteomics and metabolomics, as well as new technologies in agriculture, has led to an “information explosion” in plant biology and crop production, including potato production. Only a small part of this information reaches formalized databases (for example, UniProt, NCBI Gene, BioGRID, IntAct, etc.). One of the main sources of reliable biological data is the scientific literature. The well-known PubMed database contains more than 18 thousand abstracts of articles on potato. Effective use of the knowledge contained in so many non-formalized natural-language documents requires modern intelligent methods of analysis. However, the literature offers no evidence of widespread use of intelligent methods for automatically extracting knowledge from scientific publications on crops such as potato. Earlier we developed the SOLANUM TUBEROSUM knowledge base (http://www-bionet.sysbio.cytogen.ru/and/plant/). The information integrated into the knowledge base about the molecular genetic mechanisms underlying the selection of significant traits helps to accelerate the identification of candidate genes for potato breeding traits and the development of diagnostic markers for breeding. This article searches for new potential participants in the molecular genetic mechanisms of resistance to adverse factors in plants. Prioritization of candidate genes showed that the PHYA, GF14, CNIH1, RCI1A, ABI5, CPK1, RGS1, NHL3, GRF8, and CYP21-4 genes are the most promising for further testing of their relationships with resistance to adverse factors. The analysis showed that the molecular genetic relationships responsible for the formation of significant agricultural traits are complex and include many direct and indirect interactions. 
The construction of associative gene networks and their analysis using the SOLANUM TUBEROSUM knowledge base is the basis for searching for target genes for targeted mutagenesis and marker-oriented selection of potato varieties with valuable agricultural characteristics.


2021 ◽  
Author(s):  
Hyungtaek Jung ◽  
Brendan Jeon ◽  
Daniel Ortiz-Barrientos

Storing and manipulating Next Generation Sequencing (NGS) file formats is an essential but difficult task in biological data analysis. The easyfm (easy file manipulation) toolkit (https://github.com/TaekAndBrendan/easyfm) makes manipulating commonly used NGS files more accessible to biologists. It enables them to perform end-to-end reproducible data analyses using a free standalone desktop application (available on Windows, Mac and Linux). Unlike existing tools (e.g. Galaxy), the Graphical User Interface (GUI)-based easyfm does not depend on any high-performance computing (HPC) system and can be operated without an internet connection. This specific benefit allows easyfm to seamlessly integrate visual and interactive representations of NGS files, supporting a wider scope of bioinformatics applications in the life sciences.


2006 ◽  
Vol 22 (23) ◽  
pp. 2971-2972 ◽  
Author(s):  
S. B. Hedges ◽  
J. Dudley ◽  
S. Kumar

2021 ◽  
pp. 425-437
Author(s):  
Jiahui Zhang ◽  
Changming Zhang ◽  
Erzhi Gao ◽  
Qing Zhou

Background: At least 10% of adults and most of the children who receive renal replacement therapy have inherited kidney diseases. These disorders substantially decrease their quality of life and have a large effect on the health-care system. Multisystem complications, together with the challenges typical of rare disorders, including variable phenotypes and fragmented clinical and biological data, make genetic diagnosis of inherited kidney disorders difficult. In current clinical practice, genetic diagnosis is important for clinical management, estimating disease development, and applying personalized treatment for patients. Summary: Inherited kidney diseases comprise hundreds of different disorders. Here, we have summarized various monogenic kidney disorders. These disorders are caused by mutations in genes coding for a wide range of proteins including receptors, channels/transporters, enzymes, transcription factors, and structural components that might also have a role in extrarenal organs (bone, eyes, brain, skin, ear, etc.). With the development of next-generation sequencing technologies, genetic testing and analysis have become more accessible, promoting our understanding of the pathophysiologic mechanisms of inherited kidney diseases. However, challenges exist in interpreting the significance of genetic variants and translating them to guide clinical management. Alport syndrome is chosen as an example to introduce the practical application of genetic testing and diagnosis in inherited kidney diseases, considering its clinical features, genetic backgrounds, and genetic testing for making a genetic diagnosis. 
Key Messages: Recent advances in genomics have highlighted the complexity of Mendelian disorders, which is due to allelic heterogeneity (distinct mutations in the same gene produce distinct phenotypes), locus heterogeneity (mutations in distinct genes result in similar phenotypes), reduced penetrance, variable expressivity, modifier genes, and/or environmental factors. Implementation of precision medicine in clinical nephrology can improve the clinical diagnostic rate and treatment efficiency of kidney diseases, which requires nephrologists to have a good understanding of genetics.


2017 ◽  
Author(s):  
Siva Ratna Kumari Narisetti

Multi-level 'OMICS' data integration for multiple organisms has been one of the major challenges in the era of advanced next generation sequencing and high performance technologies. Biological data are being produced tremendously fast thanks to the availability of these high throughput sequencing technologies at low price and high speed. However, these data are often stored individually across different web resources based on data type and organism, making it difficult to find and integrate them. Many websites store data of different types and display them as pie charts or plain text, but limit their data to a single fixed organism. Such web-based multi-omics analysis is an efficient and easy way of analyzing data, but it is of little use to researchers working with other organisms or with complex data. Complex multi-omics data require extensive data management, exhaustive computational analysis, and effective integration in a one-stop interactive, web-based portal to browse, access, analyze, integrate and share knowledge about genomics and molecular mechanisms, with ultimate links to phenotypes and traits for many different organisms. To achieve this, we have developed Knowledge Base Commons (KBCommons), a platform that automates the process of establishing the database and making the tools available for organisms via a dedicated web resource. KBCommons currently supports four categories: Plants and Crops; Animals and Pets; Humans and Diseases; and Microbes and Viruses. It has four main functionalities: Browse KBCommons, Contribute to KB, Add version to KB, and Create a new KB. Using KBCommons, data from different research groups working on different organisms can be shared and accessed by all. KBCommons is an automated framework built on the widely used Laravel PHP framework. This makes it efficient at handling complex and diverse biological datasets. 
In the Browse KBCommons section, all existing organisms are displayed under each category, along with an indication of which organisms can be used as model organisms. KBCommons also displays the logo of each organism together with its existing versions, giving detailed information on all existing organisms. The user can browse the existing data of each organism using various tools, including Blast, Multiple Sequence Alignment, Motif Sampler, etc., by going to that particular page. Users can also visualize gene expression and differential expression data via pie charts and plain text. Add version to KB and Create a new KB are related because they follow similar steps; in both, users must supply the corresponding data in each section. When an organism of interest does not yet exist, the user can create a new Knowledge Base for it with 6 essential files of Genome Sequence, protein coding sequence for Amino acid, gene coding sequence for Nucleotide and Spliced mRNA transcripts, mRNA sequences in GFF3, and a functional annotation file. In Add version to KB, if an organism already exists, the user can add a new version to the existing KB by supplying these 6 essential files for the new version. In Contribute to KB, the user can upload multi-omics data including Transcriptomics -- RNA-Seq and Microarray; Proteomics -- Mass Spectrometry and 2DGel; Epigenomics -- Bisulphite Sequencing, Methylation Array, and MBD-Seq Array. We support gene expression, protein expression, or methylation data as well as differential expression comparison for each data type. We also support different entities including miRNA/sRNA, Metabolite, SNP/GWAS, Plant introduction lines/Animal strains, and Phenotype/Trait/Diseases.


Blood ◽  
2016 ◽  
Vol 128 (22) ◽  
pp. 2434-2434 ◽  
Author(s):  
Francois Girodon ◽  
Fabrice Airaud ◽  
Garrec Céline ◽  
Pacault Mathilde ◽  
Dumont Solenne ◽  
...  

Abstract Introduction: Erythrocytoses are characterized by an elevated red cell mass. The most widely studied disease is Polycythemia Vera (PV), a myeloproliferative neoplasm due to the acquired JAK2-V617F mutation. However, other types of erythrocytoses exist and are of major importance. They can be either inherited (Congenital Erythrocytosis-CE) or diagnosed in adult patients with no family history (Idiopathic Erythrocytosis-IE). CE/IE are not associated with myeloproliferation but they can be associated with severe thrombo-embolic or haemorrhagic events, pulmonary arterial hypertension and, rarely, tumours. The 8 genes identified so far as causing CE lie at the crossroads of major biological pathways (metabolism, inflammation, oncogenesis) and are implicated in multiple diseases. These genes are involved (i) in the regulation of the hypoxia pathway, PHD2 (also called EGLN1), HIF-2A (EPAS1), VHL, (ii) in proliferation and differentiation of erythroid progenitors (EPOR), or (iii) in mature cell function, haemoglobins (HBB, HBA1, HBA2) or bisphosphoglycerate mutase (BPGM). However, in 80% of cases the cause remains unknown, meaning that no proper diagnosis can be made, no prognosis or advice can be provided to CE/IE patients and their families, and no curative treatment exists. Method: We created and developed a national network in France to (i) identify, (ii) collect and (iii) analyze the genomic abnormalities in patients suspected of CE/IE. The selection of patients was performed using a clinical and biological data sheet including mandatory further tests in order to exclude patients with PV or obvious secondary erythrocytosis related to lung, cardiac or renal disorders. Next generation sequencing (NGS) has been used to analyse the presence of mutations in 17 genes (VHL, PHD1, PHD2 and PHD3, HIF1A, HIF2A, HIF3A, FH, BPGM, and 8 other candidate genes). SureDesign software (Agilent, Santa Clara, CA) was used to design the custom HaloPlex capture assay. 
For sequence capture, the HaloPlex Target Enrichment System Kit (Agilent®) for Illumina sequencing was used, according to the manufacturer's instructions. Results: To date, samples from 103 patients have been recorded, among whom 46 have been tested using the NGS approach. Variants of unknown significance have been detected in 10 (21%) patients [9 males and 1 female; median age 50 y. (12-71)], including 4 in PHD genes, 5 in HIF genes, and 1 in the JAK2 gene. Among patients with variants, a familial history of erythrocytosis was noted in 2. No independent thrombotic complication was reported in the 10 patients. The proportion of variants detected (21%) was close to the classical rate of genomic abnormalities usually observed in CE/IE. In 2 patients (one with a PHD2 variant and one with a JAK2 variant), the erythropoietin level was low, whereas for the others it was normal. Of note, the median age of the patients was surprisingly high, suggesting that the diagnosis had not previously been made because no tests were available. Indeed, the diagnostic approaches using NGS techniques led to a considerable time gain and facilitated the identification of certain molecular abnormalities associated with CE/IE. Conclusion: NGS is a useful tool to explore mutations in CE/IE, but identifies genetic variants in only 20% of patients with such disorders. In vitro, in cellulo and in vivo (including zebrafish models) functional studies are currently being performed to validate the clinical relevance of these variants. Further examinations, including whole exome sequencing, are planned to reach a correct diagnosis in the remaining 80% of CE/IE patients without identified genomic abnormalities. Disclosures No relevant conflicts of interest to declare.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8699 ◽  
Author(s):  
Angelo D. Armijos Carrion ◽  
Damien D. Hinsinger ◽  
Joeri S. Strijk

Background With the rapid increase in availability of genomic resources offered by Next-Generation Sequencing (NGS) and the availability of free online genomic databases, efficient and standardized metadata curation approaches have become increasingly critical for the post-processing stages of biological data. Especially in organelle-based studies using circular chloroplast genome datasets, the assembly of the main structural regions in random order and orientation represents a major limitation in our ability to easily generate “ready-to-align” datasets for phylogenetic reconstruction, at both small and large taxonomic scales. In addition, current practices discard the most variable regions of the genomes to facilitate the alignment of the remaining coding regions. Nevertheless, no software is currently available to perform curation to such a degree, through simple detection, organization and positioning of the main plastome regions, making it a time-consuming and error-prone process. Here we introduce a fast and user-friendly software, ECuADOR, a Perl script specifically designed to automate the detection and reorganization of newly assembled plastomes obtained from any source available (NGS, Sanger sequencing or assembler output). Methods ECuADOR uses a sliding-window approach to detect long repeated sequences in draft sequences, identifies the inverted repeat regions (IRs) even in the case of artifactual breaks or sequencing errors, and automates the rearrangement of the sequence to the widely used LSC–IRb–SSC–IRa order. This facilitates rapid post-editing steps such as creation of genome alignments, detection of variable regions, SNP detection and phylogenomic analyses. Results ECuADOR was successfully tested on plant families throughout the angiosperm phylogeny by curating 161 chloroplast datasets. 
ECuADOR first identified and reordered the central regions (LSC–IRb–SSC–IRa) for each dataset and then produced a new annotation for the chloroplast sequences. The process took less than 20 min with a maximum memory requirement of 150 MB and an accuracy of over 99%. Conclusions ECuADOR is the only de novo, one-step recognition and reordering tool that facilitates the post-processing analysis of extranuclear genomes from NGS data. The program is available at https://github.com/BiodivGenomic/ECuADOR/.
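The idea of locating inverted repeats with fixed-size windows can be illustrated with a minimal Python sketch. This is not ECuADOR's code (ECuADOR is a Perl script, and its actual algorithm and parameters are not described here); the function names, the `window` parameter and the exact-match seeding strategy are assumptions made for illustration only. The sketch indexes every window of the sequence, looks for a downstream window that is its reverse complement, and extends the match inward to estimate the two repeat arms.

```python
# Complement table for upper-case DNA (assumption: input uses only A, C, G, T, N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_inverted_repeat(seq: str, window: int = 8):
    """Find the longest exact inverted repeat reachable from window-sized seeds.

    Returns (start_a, start_b, length), where seq[start_a:start_a+length] is the
    reverse complement of seq[start_b:start_b+length], or None if no seed matches.
    """
    # Index the start positions of every window in the sequence.
    index = {}
    for i in range(len(seq) - window + 1):
        index.setdefault(seq[i:i + window], []).append(i)

    best = None
    for i in range(len(seq) - window + 1):
        target = reverse_complement(seq[i:i + window])
        for j in index.get(target, []):
            if j < i + window:
                continue  # the second arm must lie downstream without overlap
            end = j + window  # exclusive end of the second arm
            length = window
            # Extend inward: arm A grows to the right, arm B grows to the left.
            while (i + length < end - 1 - length
                   and seq[i + length] == reverse_complement(seq[end - 1 - length])):
                length += 1
            if best is None or length > best[2]:
                best = (i, end - length, length)
    return best
```

With the two arm coordinates in hand, a curation tool could slice the sequence into the single-copy and repeat regions and reassemble them in a fixed order; that step, and tolerance to sequencing errors, are left out of this sketch.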


Algorithms ◽  
2020 ◽  
Vol 13 (6) ◽  
pp. 151
Author(s):  
Bruno Carpentieri

The memory and network traffic consumed by newly sequenced biological data have recently grown considerably. Genomic projects such as HapMap and 1000 Genomes have contributed to the very large rise of databases and network traffic related to genomic data and to the development of new efficient technologies. The large-scale sequencing of DNA samples has brought new attention and produced new research, and thus the interest of the scientific community in genomic data has greatly increased. In a very short time, researchers have developed hardware tools, analysis software, algorithms, private databases, and infrastructures to support research in genomics. In this paper, we analyze different approaches for compressing digital files generated by Next-Generation Sequencing tools containing nucleotide sequences, and we discuss and evaluate the compression performance of generic compression algorithms by comparing them with Quip, a system designed by Jones et al. specifically for genomic file compression. Moreover, we present a simple but effective technique for the compression of DNA sequences in which we only consider the relevant DNA data, and we experimentally evaluate its performance.
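To make the idea of compressing only the nucleotide data concrete, here is a minimal Python sketch of 2-bit packing, a common baseline for DNA sequences. It is not the technique presented in the paper, nor Quip; the function names `pack` and `unpack` and the restriction to the four bases A, C, G and T are assumptions made for illustration.

```python
# 2-bit code per base (assumption: the sequence contains only A, C, G, T).
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | _CODE[base]
        # Left-align a short final chunk so decoding reads bases in order.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(_BASES[(byte >> shift) & 3])
    return "".join(bases[:length])
```

The original length has to be stored alongside the packed bytes, because the last byte may contain padding. Compared with one byte per base in a text file, this representation takes a quarter of the space before any general-purpose compressor is applied.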

