Mash Sketched Reference Dataset for Genome-Based Taxonomy and Comparative Genomics

Author(s):  
Ayixon Sánchez-Reyes ◽  
Maikel Gilberto Fernández-López

The analysis of curated genomic, metagenomic, and proteomic data is of paramount importance in the fields of biology, medicine, education, and bioinformatics. Although this type of data is usually hosted in raw form in free international repositories, accessing it demands substantial computing, storage, and processing capacity from the domestic user. The purpose of the study is to offer the scientific community a comprehensive set of genomic and proteomic reference data in an accessible and easy-to-use form. A representative type-material set of genomes, proteomes, and metagenomes associated with the major groups of Bacteria, Archaea, Virus, and Fungi was downloaded directly from https://www.ncbi.nlm.nih.gov/assembly/ and from the Genome Taxonomy Database. Sketched databases were subsequently created with the Mash software and stored as compact reduced representations. Our dataset condenses nearly 100 GB of disk space into 585.78 MB and represents 87,476 genomic/proteomic records from eight informative contexts, which have been prefiltered to make them accessible, usable, and user-friendly with respect to computational resources. Potential uses of this dataset include, but are not limited to, microbial species delimitation, estimation of genomic distances, identification of genomic novelties, and paired comparisons between proteomes, genomes, and metagenomes.
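Mash reduces each sequence to a MinHash sketch and derives a distance from the estimated Jaccard index of two sketches. The following is a minimal Python sketch of that idea, not Mash's own implementation: the function names are hypothetical, `blake2b` stands in for the MurmurHash3 function that Mash uses, and canonical (reverse-complement) k-mers are ignored for brevity.

```python
import hashlib
import math


def kmer_hashes(seq, k):
    """Hash every k-mer of seq to a 64-bit integer (blake2b is an assumption)."""
    hashes = set()
    for i in range(len(seq) - k + 1):
        digest = hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return hashes


def sketch(seq, k=21, s=1000):
    """Bottom-s sketch: keep only the s smallest k-mer hashes."""
    return sorted(kmer_hashes(seq, k))[:s]


def jaccard_estimate(sketch_a, sketch_b, s=1000):
    """Estimate the Jaccard index from the s smallest hashes of the union."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    merged = sorted(set_a | set_b)[:s]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


def mash_distance(j, k=21):
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); D = 1 when j = 0."""
    if j <= 0.0:
        return 1.0
    return max(0.0, -math.log(2.0 * j / (1.0 + j)) / k)
```

Because only the sketches need to be stored, a collection of full genomes can be compared using a small fraction of the original disk space, which is the kind of reduction the abstract reports.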

2020 ◽  
Vol 17 (03) ◽  
pp. 2050010
Author(s):  
Saeed Saeedvand ◽  
Hadi S. Aghdasi ◽  
Jacky Baltes

Although there are several popular and capable humanoid robot designs available in the kid-size range, they lack some important characteristics, most notably affordability, user-friendliness, a wide-angle camera, sufficient computational resources for advanced AI algorithms, and mechanical robustness and stability. Recent advances in 3D printer technology enable researchers to move from model to physical implementation relatively easily. Therefore, we introduce a novel fully 3D printed open platform humanoid robot design named ARC. In this paper, we discuss the mechanical structure and software architecture. We show the capabilities of the ARC design in a series of experimental evaluations.


Author(s):  
Maryam Hamzeh-Mivehroud ◽  
Babak Sokouti ◽  
Siavoush Dastmalchi

The current chapter introduces different aspects of the molecular docking technique in order to give readers an overview of the topics that will be dealt with throughout this volume. Like many other fields of science, molecular docking has experienced a lag period of slow and steady growth in the attention of the scientific community and in its frequency of application, followed by a pronounced era of exponential expansion in theory, methodology, areas of application, and performance, due to developments in related technologies such as computational resources and theoretical as well as experimental biophysical methods. In the following sections, the evolution of molecular docking will be reviewed, and its different components, including methods, search algorithms, scoring functions, validation of the methods, and areas of application, plus a few case studies, will be touched on briefly.


2019 ◽  
Vol 6 (1) ◽  
pp. 205316801983208 ◽  
Author(s):  
Cesar Zucco ◽  
Mariana Batista ◽  
Timothy J. Power

How do political actors value different portfolios? We propose a new approach to measuring portfolio salience by analysing paired comparisons using the Bradley–Terry model. Paired-comparison data are easy to collect using surveys that are user-friendly, rapid, and inexpensive. We implement the approach with serving legislators in Brazil, a particularly difficult case to assess portfolio salience due to the large number of cabinet positions. Our estimates of portfolio values are robust to variations in implementation of the method. Legislators and academics have broadly similar views of the relative worth of cabinet posts. Respondent valuations of portfolios deviate considerably from what would be predicted by objective measures such as budget, policy influence, and opportunities for patronage. Substantively, we show that portfolio salience varies greatly and affects the calculation of formateur advantage and coalescence/proportionality rule measures.
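As a rough illustration of how paired-comparison data can be turned into a ranking, the Bradley–Terry model assigns each item i a positive worth p_i such that item i is preferred over item j with probability p_i / (p_i + p_j). The sketch below fits these worths with the standard iterative (minorization-maximization) update. The function name and the `wins` matrix layout are assumptions made for this example, and it is not the authors' estimation code.

```python
def fit_bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry worths from a matrix of pairwise preference counts.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns worths normalized to sum to 1.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pair contributes (number of comparisons) / (p_i + p_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(updated)
        p = [x / norm for x in updated]
    return p
```

With two items compared four times and the first preferred three times, the fitted worths are 0.75 and 0.25, matching the observed preference share.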


Online Review ◽  
1986 ◽  
Vol 10 (3) ◽  
pp. 163-164 ◽  
Author(s):  
Maria Faraone
Keyword(s):  
Set Up ◽  

If you find yourself drowning in reference data, unable to organize information to your satisfaction, REF‐11 may be the package to help you. With REF‐11, users can set up a file of bibliographic references, and search the file. It is quite user friendly. An optional utility ($30) produces bibliographies from the files. A second utility ($25) stores text material in REF‐11 files.


Author(s):  
N. Soyama ◽  
K. Muramatsu ◽  
M. Daigo ◽  
F. Ochiai ◽  
N. Fujiwara

Validating the accuracy of land cover products using a reliable reference dataset is an important task. A reliable reference dataset is produced with information derived from ground truth data. Recently, the amount of ground truth data derived from information collected by volunteers has been increasing globally. The acquisition of volunteer-based reference data demonstrates great potential. However, the information given by volunteers provides only limited useful vegetation information for producing a complete reference dataset based on the plant functional type (PFT) with five specialized forest classes. In this study, we examined the availability and applicability of FLUXNET information to produce reference data with higher levels of reliability. FLUXNET information was especially useful for interpreting forest classes, compared with the reference dataset built from information given by volunteers.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Wenjing Ma ◽  
Kenong Su ◽  
Hao Wu

Background: Cell type identification is one of the most important questions in single-cell RNA sequencing (scRNA-seq) data analysis. With the accumulation of public scRNA-seq data, supervised cell type identification methods have gained increasing popularity due to better accuracy, robustness, and computational performance. Despite all the advantages, the performance of the supervised methods relies heavily on several key factors: feature selection, prediction method, and, most importantly, choice of the reference dataset. Results: In this work, we perform extensive real data analyses to systematically evaluate these strategies in supervised cell identification. We first benchmark nine classifiers along with six feature selection strategies and investigate the impact of reference data size and number of cell types in cell type prediction. Next, we focus on how discrepancies between reference and target datasets and how data preprocessing such as imputation and batch effect correction affect prediction performance. We also investigate the strategies of pooling and purifying reference data. Conclusions: Based on our analysis results, we provide guidelines for using supervised cell typing methods. We suggest combining all individuals from available datasets to construct the reference dataset and using a multi-layer perceptron (MLP) as the classifier, along with the F-test as the feature selection method. All the code used for our analysis is available on GitHub (https://github.com/marvinquiet/RefConstruction_supervisedCelltyping).


2020 ◽  
Author(s):  
Abhishek Agarwal ◽  
Piyush Agrawal ◽  
Aditi Sharma ◽  
Vinod Kumar ◽  
Chirag Mugdal ◽  
...  

IndiaBioDb (https://webs.iiitd.edu.in/raghava/indiabiodb/) is a manually curated comprehensive repository of bioinformatics resources developed and maintained by Indian researchers. This repository maintains information about 543 freely accessible functional resources that include around 258 biological databases. Each entry provides complete details about a resource, including the name of the resource, web link, publication details, information about the corresponding author, name of the institute, and type of resource. A user-friendly searching module has been integrated, which allows users to search our repository on any field. In order to retrieve categorized information, we integrated a browsing facility in this repository. This database can be used to extract useful information on the current state of bioinformatics across all research labs funded by government and private bodies of India. In addition to the web interface, we also developed mobile to facilitate the scientific community.


2021 ◽  
pp. 1-12
Author(s):  
Emily R. Mears ◽  
Renee R. Handley ◽  
Matthew J. Grant ◽  
Suzanne J. Reid ◽  
Benjamin T. Day ◽  
...  

Background: The pathological mechanism of cellular dysfunction and death in Huntington’s disease (HD) is not well defined. Our transgenic HD sheep model (OVT73) was generated to investigate these mechanisms and for therapeutic testing. One particular cohort of animals has undergone focused investigation resulting in a large interrelated multi-omic dataset, with statistically significant changes observed comparing OVT73 and control ‘omic’ profiles and reported in literature. Objective: Here we make this dataset publicly available for the advancement of HD pathogenic mechanism discovery. Methods: To enable investigation in a user-friendly format, we integrated seven multi-omic datasets from a cohort of 5-year-old OVT73 (n = 6) and control (n = 6) sheep into a single database utilising the programming language R. It includes high-throughput transcriptomic, metabolomic and proteomic data from blood, brain, and other tissues. Results: We present the ‘multi-omic’ HD sheep database as a queryable web-based platform that can be used by the wider HD research community (https://hdsheep.cer.auckland.ac.nz/). The database is supported with a suite of simple automated statistical analysis functions for rapid exploratory analyses. We present examples of its use that validate its integrity relative to results previously reported. The data may also be downloaded for user-determined analysis. Conclusion: We propose the use of this online database as a hypothesis generator and method to confirm/refute findings made from patient samples and alternate model systems, to expand our understanding of HD pathogenesis. Importantly, additional tissue samples are available for further investigation of this cohort.


Author(s):  
Çaglar Bayık ◽  
Kazimierz Becek ◽  
Çetin Mekik ◽  
Mustafa Özendi

The digital elevation model (DEM) is one of the key geospatial datasets used in many fields of engineering and science for countless applications. In this contribution, we assess the vertical accuracy of the Advanced Land Observing Satellite (ALOS) World 3D-30m (AW3D30) DEM using the runway method (RWYM). The RWYM utilizes the longitudinal profiles of runways, which are reliable and ubiquitous reference data. The reference dataset used in this project consists of 36 runways located at various points throughout the world. The same dataset was previously used to test the accuracy of WorldDEM™. Our study indicates that AW3D30 has a remarkably high RMSE of 1.78 m (one σ). However, while analyzing the results, it has become apparent that it also contains a widespread elevation anomaly. We conclude that this anomaly is the result of uncompensated sensor noise and the data processing algorithm (downsampling of the higher resolution data). We believe that this issue should be communicated to the user community. Also, we would like to note that the traditional accuracy assessment of a DEM, e.g., statistical assessment of the elevation differences = model – reference, does not allow for identification of this type of anomaly in a DEM.
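The traditional assessment mentioned above reduces the elevation differences (model minus reference) to summary statistics such as the mean error and the RMSE. A minimal Python sketch of that calculation follows. The function name and the sample elevations in the usage note are hypothetical and are not taken from the study.

```python
import math


def vertical_accuracy(model, reference):
    """Return (mean error, RMSE) of elevation differences model - reference.

    Both inputs are equal-length sequences of elevations in metres,
    sampled at the same points (for example along a runway profile).
    """
    if len(model) != len(reference) or not model:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [m - r for m, r in zip(model, reference)]
    mean_error = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean_error, rmse
```

For example, hypothetical model elevations of 10.0, 12.0, and 11.0 m against reference elevations of 9.0, 12.0, and 13.0 m give differences of 1, 0, and -2 m. Because these summaries collapse all differences into one or two numbers, a spatially structured anomaly can leave them largely unchanged, which is consistent with the limitation the abstract notes.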


2021 ◽  
Author(s):  
Sebastien Riquier ◽  
Chloe Bessiere ◽  
Benoit Guibert ◽  
Anne-Laure Bouge ◽  
Anthony Boureux ◽  
...  

The huge body of publicly available RNA-seq libraries is a treasure of functional information that allows the expression of known or novel transcripts in tissues to be quantified. However, transcript quantification commonly relies on alignment methods requiring a lot of computational resources and processing time, which do not scale easily to large datasets. K-mer decomposition constitutes a new way to process RNA-seq data for the identification of transcriptional signatures, as k-mers can be used to quantify gene expression accurately in a less resource-consuming way. We present the Kmerator Suite, a set of three tools designed to extract specific k-mer signatures, quantify these k-mers in RNA-seq datasets, and quickly visualize the characteristics of large datasets. The core tool, Kmerator, produces specific k-mers for 97% of human genes, enabling the measurement of gene expression with high accuracy in simulated datasets. KmerExploR, a direct application of Kmerator, uses a set of predictor-gene-specific k-mers to infer metadata, including library protocol, sample features, or contaminations, from RNA-seq datasets. KmerExploR results are visualised through a user-friendly interface. Moreover, we demonstrate that the Kmerator Suite can be used for advanced queries targeting known or new biomarkers such as mutations, gene fusions, or long non-coding RNAs for human health applications.
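The notion of a specific k-mer can be illustrated with a toy Python sketch: a k-mer of a target sequence counts as specific if it occurs in none of the other sequences considered. The function names and the example sequences are hypothetical, and this is not how Kmerator itself is implemented.

```python
def kmer_set(seq, k):
    """Return the set of all k-mers contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_kmers(target, others, k):
    """Return the sorted k-mers of target that occur in none of the other sequences."""
    background = set()
    for seq in others:
        background |= kmer_set(seq, k)
    return sorted(kmer_set(target, k) - background)
```

Counting how often such specific k-mers appear in sequencing reads then gives an alignment-free proxy for the expression of the target, which is the general approach the abstract describes.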

